Secure Shared Data Analysis Environment on Kubernetes at MAX IV (CS3 2024, CERN)

 
JupyterHub on Kubernetes as a platform for developing a secure shared environment for data analysis at MAX IV

Andrii Salnikov, Zdenek Matej, Dmitrii Ermakov, Jason Brudvik

CS3 2024, CERN
 
 
Interactive data analysis environment

Container images with pre-defined and custom kernels

Kubernetes cluster:
as a resource pool: moderate CPU, large RAM, V100/A100 GPUs
as a deployment platform: review/prod/next lifecycle, CI testing of notebook images
as a runtime environment: shared service for staff and researchers, remote-desktop-style experience, resource overcommit

Andrii Salnikov, JupyterHub@MAX IV, CS3 2024
 
Goals and technical requirements

Key objective: a fully unprivileged container environment that operates seamlessly with existing LDAP user credentials

Functional requirements:
Integration with MAX IV storage systems (home, group, data)
Run any notebook images without modifications
Ensure available resources visibility
Efficient sharing of available GPU resources between users
Observability of usage metrics

Operation requirements:
Zero to JupyterHub with Kubernetes Helm chart without modifications: just custom hooks and a proper values.yaml
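The "stock chart plus custom hooks" idea above can be sketched as follows. This is a minimal illustration, not the deployment's actual code: a stand-in spawner object replaces kubespawner.KubeSpawner so the hook can run outside a cluster, and the HOME path layout is an assumption.

```python
# Minimal sketch of the "no chart modifications" approach: all
# site-specific behavior lives in a KubeSpawner pre_spawn hook that a
# Zero to JupyterHub values.yaml registers via hub.extraConfig.
# StubSpawner stands in for kubespawner.KubeSpawner; its attribute
# names (environment, extra_pod_config) match KubeSpawner's.

class StubSpawner:
    """Stand-in for kubespawner.KubeSpawner, just enough for the hook."""
    def __init__(self, user_name):
        self.user_name = user_name
        self.environment = {}       # env vars injected into the container
        self.extra_pod_config = {}  # raw additions to the Pod spec

def pre_spawn_hook(spawner):
    # Site-specific bootstrap: point HOME at shared storage; a wrapper
    # startup script in the image picks these up on launch.
    spawner.environment["HOME"] = f"/home/{spawner.user_name}"
    spawner.environment["SHELL"] = "/bin/bash"

# In values.yaml this would be wired up roughly as:
#   hub:
#     extraConfig:
#       custom-hooks: |
#         c.KubeSpawner.pre_spawn_hook = pre_spawn_hook
spawner = StubSpawner("alice")
pre_spawn_hook(spawner)
print(spawner.environment["HOME"])  # /home/alice
```

The point is that the upstream Helm chart stays untouched; everything site-specific is injected through configuration.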
 
 
Existing LDAP credentials

UID/GIDs from token to securityContext
NSS data sync from LDAP to configMap, mounted inside the container
Environment variables to define the HOME directory, etc.
Wrapper startup script to bootstrap the environment
Storage mounts are simply defined in the Helm values.
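The "NSS data from LDAP to configMap" step can be illustrated with a small sketch: render passwd(5)/group(5)-format content that could be stored in a ConfigMap and mounted over /etc/passwd and /etc/group, so the unprivileged UID resolves to a real user name inside the container. The entries and field values here are illustrative, not MAX IV's actual data.

```python
# Render NSS files (passwd/group format) from LDAP-derived records.
# Output is suitable for a Kubernetes ConfigMap mounted over
# /etc/passwd and /etc/group in the user container.

def render_passwd(users):
    """users: list of dicts with keys name, uid, gid, home, shell."""
    return "\n".join(
        f"{u['name']}:x:{u['uid']}:{u['gid']}:{u['name']}:{u['home']}:{u['shell']}"
        for u in users
    )

def render_group(groups):
    """groups: list of dicts with keys name, gid, members (list of names)."""
    return "\n".join(
        f"{g['name']}:x:{g['gid']}:{','.join(g['members'])}"
        for g in groups
    )

print(render_passwd([{"name": "alice", "uid": 1234, "gid": 1000,
                      "home": "/home/alice", "shell": "/bin/bash"}]))
# alice:x:1234:1000:alice:/home/alice:/bin/bash
```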
 
LXCFS: Resources visibility

LXCFS is a FUSE filesystem offering overlay files for cpuinfo, meminfo, uptime, etc.
Deployed as a DaemonSet at the Kubernetes level
Makes container CPU and RAM limits visible
Mounted to /proc and /sys in the pre_spawn hook
Additional environment variables defined in startup scripts
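The LXCFS wiring done in the pre_spawn hook can be sketched as below: bind the DaemonSet's FUSE-backed files over the matching /proc entries so in-container tools (top, free, nproc) report the Pod's limits instead of the node's totals. The hostPath root is an assumption; the real path depends on how the LXCFS DaemonSet is deployed.

```python
# Build KubeSpawner-style volume and volume_mounts entries that overlay
# per-container LXCFS files onto /proc inside the user Pod.

LXCFS_HOSTPATH = "/var/lib/lxcfs"  # assumed hostPath exposed by the DaemonSet
PROC_FILES = ["cpuinfo", "meminfo", "stat", "uptime", "swaps", "loadavg"]

def lxcfs_volume():
    """The hostPath volume shared with the LXCFS DaemonSet."""
    return {"name": "lxcfs",
            "hostPath": {"path": LXCFS_HOSTPATH, "type": "Directory"}}

def lxcfs_volume_mounts():
    """One mount per overlaid /proc file, via subPath into the volume."""
    return [
        {"name": "lxcfs", "mountPath": f"/proc/{f}", "subPath": f"proc/{f}"}
        for f in PROC_FILES
    ]

mounts = lxcfs_volume_mounts()
print(mounts[1]["mountPath"])  # /proc/meminfo
```

In the real hook these lists would be appended to spawner.volumes and spawner.volume_mounts before the Pod is created.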
 
GPU sharing: MortalGPU development

Kubernetes device plugin for GPU memory overcommit, while maintaining an allocation limit per GPU workload - the approach used for sharing RAM on Kubernetes.
Fork of MetaGPU with a development focus on interactive workloads run by mortals (with operations support by mortal admins)

Provides:
Device Plugin: represents a GPU (or MIG partition) as a configurable number of meta-devices (e.g. 320 of mortalgpu/v100)
Memory enforcement based on the usage monitoring data
Kubernetes-aware observability in general and container-scoped resource usage in particular: mgctl tool and Prometheus exporter
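The meta-device model can be made concrete with some back-of-the-envelope arithmetic: a GPU (or MIG partition) advertised as N meta-devices maps a Pod's mortalgpu/v100 request to a proportional GPU-memory allocation limit, analogous to RAM requests and limits. The numbers below are illustrative (a 32 GiB V100 split into the 320 meta-devices mentioned above).

```python
# Arithmetic behind the meta-device model: requesting k out of N
# meta-devices of a GPU entitles the workload to k/N of its memory.

def meta_device_memory_mib(gpu_memory_mib, meta_devices):
    """GPU memory represented by a single meta-device."""
    return gpu_memory_mib / meta_devices

def allocation_limit_mib(requested, gpu_memory_mib=32768, meta_devices=320):
    """Memory limit enforced for a workload requesting N meta-devices."""
    return requested * meta_device_memory_mib(gpu_memory_mib, meta_devices)

# One meta-device of a 32 GiB V100 split 320 ways is 102.4 MiB;
# requesting 40 meta-devices caps the workload at 4 GiB of GPU memory.
print(allocation_limit_mib(40))  # 4096.0
```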
 
 
JupyterHub with MortalGPU

Kubernetes DaemonSet
GPU RAM resource requests and limits, defined the same way as RAM
Multiple MortalGPU resources available (different GPUs and partitions)
Wrapper over mgctl to provide nvidia-smi output for container processes only
 
 
Compute Instance profiles and RBAC
 
Extra containers = extra features: Walltime enforcement

KubeSpawner is capable of running additional containers in the user Pod
Isolated walltime countdown container terminating the user server via the JupyterHub API
Using the JupyterHub RBAC feature
Developed a UI extension to show values to the end user
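The countdown sidecar's final act, stopping the user's server through the hub's REST API, can be sketched as follows. The endpoint shape follows the JupyterHub REST API (DELETE /hub/api/users/{name}/server); the hub address and token are placeholders, and in the deployment the token would carry only an RBAC scope allowing this one call.

```python
# Build the JupyterHub REST API call a walltime sidecar would issue
# to stop the user's single-user server once the countdown expires.

def stop_server_request(hub_url, username, api_token):
    """Return (method, url, headers) for the server-stop API call."""
    return (
        "DELETE",
        f"{hub_url}/hub/api/users/{username}/server",
        {"Authorization": f"token {api_token}"},
    )

method, url, headers = stop_server_request(
    "http://hub:8081", "alice", "walltime-scoped-token")
print(method, url)  # DELETE http://hub:8081/hub/api/users/alice/server
```

Keeping this logic in its own container means the user's JupyterLab process never holds the API token.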
 
 
Use case: NorduGrid ARC client

PoC: small grid for transparent HPC usage
ARC with OAuth2 JWT token auth:
Map to self at MAX IV resources
Map to a pool on external resources
Additional challenge: existing data sharing to external sites with JWT auth, following user permissions

Idea: use JupyterHub as an "oidc-agent"
 
Extra containers = extra features: OIDC-agent for ARC

KeyCloak Authenticator to refresh access tokens
Isolated token-helper container with privileges to read auth_state
Using the JupyterHub RBAC feature
API to provide only access tokens to the JupyterLab container
Wrapper to use in the ARC CLI transparently
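The core of the token-helper idea can be sketched in a few lines: the sidecar holds the RBAC privilege to read auth_state (which contains the long-lived refresh token), but its API hands only the short-lived access token to the JupyterLab container. The key names mirror typical OAuth2/KeyCloak responses and are assumptions here.

```python
# Filter auth_state so that only the access token (never the refresh
# token) crosses the boundary into the user's JupyterLab container.

def access_token_only(auth_state):
    """Expose the access token and its expiry, nothing else."""
    return {
        "access_token": auth_state["access_token"],
        "expires_at": auth_state.get("expires_at"),
    }

state = {"access_token": "eyJhbGciOi...", "refresh_token": "top-secret",
         "expires_at": 1700000000}
public = access_token_only(state)
assert "refresh_token" not in public
print(public["access_token"])
```

The ARC CLI wrapper in the user container would then fetch a fresh access token from this helper API before each grid operation.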
 
KubePie: sharing existing data over https
 
Idea: own web server for each user, with the correct UID/GIDs
Sounds crazy? But we do run such Pods for each user in JupyterHub!

KubePie harnesses Kubernetes' scalability and deployment capabilities by running, managing and securing web servers for every user
KubePie relies strictly on the OpenID Connect flow or OAuth2 bearer tokens for user identification:
OAuth2 used in the ARC PoC for data transfers
Claims-based user mapping during Pod instantiation (admission)
Other auth credentials accessible via OIDC: WebDAV with S3-like credentials is implemented as an example
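The claims-based user mapping at Pod instantiation could look roughly like this: an admission step reads identity claims from the validated OIDC token and derives the securityContext for the per-user web server Pod. The claim names (uid, gid, groups) are illustrative; a real deployment maps whatever claims its identity provider emits.

```python
# Derive a Pod securityContext from OIDC token claims, so each user's
# web server runs unprivileged under that user's own UID/GIDs.

def security_context_from_claims(claims):
    """Map identity claims to a Kubernetes securityContext dict."""
    return {
        "runAsUser": int(claims["uid"]),
        "runAsGroup": int(claims["gid"]),
        "supplementalGroups": [int(g) for g in claims.get("groups", [])],
        "runAsNonRoot": True,
    }

ctx = security_context_from_claims(
    {"uid": "1234", "gid": "1000", "groups": ["2000", "3000"]})
print(ctx["runAsUser"], ctx["supplementalGroups"])
```

Because the mapping happens at admission, the served files are accessed with exactly the permissions the user already has on the shared storage.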
KubePie: Baking process

KubePie@MAX IV is running on the Data Acquisition Kubernetes Cluster
 
Conclusions

Extensibility of both JupyterHub and Kubernetes allows building data analysis platforms matching organizational needs in functionality and security.
LXCFS on Kubernetes brings allocated-resources visibility to both interactive and batch containerized workloads.
Flexible and observable GPU sharing with MortalGPU enriches interactive shared environments with CUDA capabilities.
Compute Instance profiles and RBAC extend the usage patterns of the shared platform, improving the end-user experience.
Additional containers in the running Pod open a way to securely add features beyond the usual JupyterHub capabilities.
 
 
Thank you for your attention!

Mail to: andrii.salnikov@maxiv.lu.se

Source code and deployment configuration can be found on gitlab.com

We are working towards establishing a similar deployment for providing an EOSC service as an Open Data analysis platform

Ask me about: