Isolation and Virtualization in Operating Systems

 
Isolation with Namespaces
 
Marion Sudvarg, Chris Gill, Brian Kocoloski, James Orr
CSE 522S – Advanced Operating Systems
Washington University in St. Louis
St. Louis, MO 63130
 
Virtualization
 
Virtualization refers to the act of creating a virtual (rather than actual) version of something

Examples of virtualization we’ve already seen:
Virtual address spaces (virtual memory) create the illusion of full access to complete, contiguous system memory
Context switching and scheduling create the illusion of full access to the system CPU(s)

Provides isolation between multiple processes

Virtualization
 
Virtual machine: emulation of a full computer system

Rather than virtualize only specific resources to support multiprocessing, we can virtualize the entire platform to support multiple operating systems

Why would we want to?
Provide complete computing environments to isolated sets of tasks
Virtual test environments (e.g., OS development for specific hardware platforms)
Cloud computing (server consolidation + software packaging)
Linux enthusiasts who still can’t decide which distribution is best

Containers
 
Virtual machines provide strong isolation
Each VM hosts its own operating system kernel
Each VM exists in a separate memory space on top of a hardware virtualization layer
Virtual machines are resource-intensive
Each VM contains its own kernel and applications
How can we provide self-contained, isolated execution environments that share a kernel?
Containers (e.g., LXC, Docker)
Leverage kernel mechanisms that provide isolation
Exist above the shared kernel; VMs exist below the kernel (each VM’s virtual hardware sits beneath its own guest kernel)

Kernel Isolation Mechanisms
 
chroot: early mechanism that isolates a process to a subset of the directory hierarchy
cgroups: Linux kernel mechanism that isolates resource usage (presented later)
Namespaces
Extend the idea of virtualization to other resources
Create the illusion that a group of processes has its own isolated instance of a global resource
Leveraged by containers

chroot
 
Every process has a root directory, /
By default, this is the root of the VFS unified hierarchy
The chroot() system call changes the root directory of the calling process
Its children inherit the new root, and it persists across exec(), so programs launched with fork() and exec() stay inside the jail
The new environment is known as a “chroot jail”
Early Unix isolation technique (introduced in 1979)
Not a completely secure mechanism: there are various ways to break out of the jail (even by unprivileged processes)!
Namespaces provide more complete isolation
 
Namespace Types
 
UTS: Hostname
IPC: System V IPC and POSIX message queues
Mount: The set of filesystem mount points
PID: Process ID number space
User: User and group ID number spaces (next time)
cgroups: View of configured cgroups (discussed later)
Network: Networking system resources (discussed later with container networking)
 
UTS Namespaces
 
UNIX Time-Sharing Namespaces
Allow a group of processes to present a different hostname
Enables a container to present itself as a separate host or server
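[Figure: processes on one machine split into two groups; one group presents the hostname pihost001.cec.wustl.edu, while the other, in its own UTS namespace, presents c92.containers.wustl.edu]
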
IPC Namespaces
 
Isolates certain IPC mechanisms:
System V message queues
POSIX message queues
System V semaphores
System V shared memory (different from the POSIX shared memory covered in CSE 422!)

What about other IPC we’ve discussed? It is isolated by other namespace types or techniques:
Pipes: isolated by access to the underlying file descriptor
FIFOs/UNIX domain sockets/POSIX shared memory: isolated by access to the associated file
Internet domain sockets: isolated by network namespaces
Creating Namespaces
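
clone() creates (forks) a new process; optional flags specify new namespaces for it, e.g. clone(…, …, CLONE_NEWUTS, …)
unshare() allows the caller to separate into a new namespace
Namespace flags: CLONE_NEWUTS, CLONE_NEWIPC, CLONE_NEWNS (mount), CLONE_NEWPID, CLONE_NEWCGROUP, CLONE_NEWNET (network), CLONE_NEWUSER
Settings are cloned: e.g., a new UTS namespace starts with the parent’s hostname, and a new mount namespace starts with a copy of the parent’s mount points
[Figure: pid 37, with hostname domain1, calls clone(…, …, CLONE_NEWUTS, …) to create pid 38, which initially also sees domain1; after pid 38 calls sethostname(domain2, len), only pid 38 sees domain2]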
 
Joining a Namespace
 
 
The proc filesystem contains information about namespace membership for each process:
Each file in the /proc/PID/ns directory is a special symlink that refers to one of the process’s namespaces
Get a file descriptor by open()ing the file
Passing the fd to setns() allows the caller to join that namespace
 
PID Namespaces
 
Isolate the set of process IDs
Nested: all are descendants of the original (initial) PID namespace
A process has a PID in its own namespace and in each ancestor of its namespace
unshare() and setns() place the caller’s children, but not the caller, into the new/specified PID namespace
A new proc filesystem can be mounted by a process/shell in the new PID namespace:
mount -t proc proc /mount_point
 
 
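[Figure: a process with PID 37 in PID NS0 creates a child with CLONE_NEWPID; the child has PID 38 in NS0 and PID 1 in the new NS1. That child creates another process with CLONE_NEWPID, which has PID 39 in NS0, PID 2 in NS1, and PID 1 in the new NS2]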
 
The init Process

First process in a new PID namespace has PID 1
Becomes the “init” process for that namespace
Only receives signals for which it has established a handler
Processes in its ancestor namespaces can additionally send it SIGKILL and SIGSTOP (given appropriate permissions)
If it terminates (e.g., on SIGKILL), the kernel kills all other processes in its namespace
Becomes parent to orphaned processes in its namespace
Reaps terminated children (and terminated orphans)
Note: the behavior of unshare() and setns() implies that a process other than init might have a parent outside its PID namespace
 
 
Mount Namespaces
 
Allow processes to have a unique view of the global directory hierarchy
Unlike chroot, this is not restricted to just a subtree of the global hierarchy
chroot: requested file paths are “appended” to the chroot root directory
Mount namespaces: requested file paths are resolved using that namespace’s mount points
A process’s mount points are enumerated in /proc/PID/mountinfo
 
Shared Subtrees and Peer Groups
Shared subtrees allow propagation of mount/unmount events between namespaces
Each peer group is a set of mount points related through namespace replication or binding
 
A mount point is marked with a propagation type:
MS_SHARED: Shares events with other mount points in its peer group
MS_PRIVATE: Does not share events
MS_SLAVE: Receives from, but does not propagate to, its peer group
MS_UNBINDABLE: Private, and can’t be the source for a bind mount
Propagation allows containers to share mount points (e.g., shared storage, USB/CD-ROM devices)
 
mount --make-private /
mount --make-shared /dev/sda3 /X
mount --make-shared /dev/sda5 /Y
unshare -m --propagation unchanged    (second shell: creates Mount NS 1)
mkdir /Z                              (first shell, in Mount NS 0)
mount --bind /X /Z
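[Figure: Mount NS 0 contains /, X, Y, and the bind mount Z; Mount NS 1 contains /, X*, and Y*, its copies of X and Y. Peer Group 1: X, X*, and Z (related by replication and the bind mount). Peer Group 2: Y and Y* (related by replication)]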
Mount Namespaces and Containers
 
Mounting a different device (or bind-mounting a directory) onto the root mount point “/” provides an isolated filesystem to a container:
mount("/containers/c01/files", "/", "", MS_BIND, 0)
Additional necessary OS files and directories (e.g., devices, libraries) are bind-mounted into the container’s mount namespace
 
[Figure: A view of a Docker container’s contents. Zero-byte files are required for Linux containers and will be bind-mounted from the host to the container’s mount namespace.]
From Docker: Up & Running: Shipping Reliable Containers in Production, 2nd Edition, by Sean P. Kane & Karl Matthias, 2018, pg. 71
Combining Mount and PID Namespaces
 
We’ve already seen that the corresponding proc filesystem can be mounted from a new PID namespace:
mount -t proc proc /mount_point
But what if we want to mount it at /proc? (necessary for ps, top, htop)
If a process in a new PID namespace mounts it at /proc, this affects the entire system
Answer: use a mount namespace!
Creating a new mount namespace in conjunction with a PID namespace allows a process in that namespace to unmount the existing /proc filesystem and mount a new one
Only processes in that namespace will see the change!
 
Reading Assignments
 
LSP pp. 15-16: A brief review of filesystems and namespaces
LKD pp. 285-288: A review of VFS data structures – pay attention to how the mnt_namespace structure is used
man 2 chroot: Coverage of isolation with the chroot() syscall
LWN Namespaces Series: An overview of namespaces from the author of LPI. Read the following parts:
Part 1: Namespaces Overview
Part 2: The Namespaces API
Part 3: PID Namespaces
Part 4: More on PID Namespaces
Mount Namespaces and Shared Subtrees
Mount Namespaces, Mount Propagation, and Unbindable Mounts
man 2 unshare: Coverage of the unshare() syscall with examples
man 2 setns: Coverage of the setns() syscall with examples
man 8 umount: Coverage of the umount utility to unmount filesystems and bind mounts
man 2 umount: Coverage of the underlying umount system call
man 2 pivot_root: Coverage of the pivot_root() system call for setting a new root mount
We provide condensed PDFs focusing on relevant sections of these man pages:
man 2 clone: Coverage of the clone() syscall, with attention to namespaces
man 2 mount: Special attention to the section on changing propagation type
man 8 mount: Special attention to the section on shared subtree operations
 
 
Studio Exercises Today
 
Create new namespaces
clone(), unshare()
Join an existing namespace
Open a /proc/PID/ns symbolic link
Join with setns()
UTS namespaces
sethostname(), gethostname()
PID namespace
Create a simple init process
Mount namespaces
Isolate the /proc mount for a new PID namespace

Put this all together into a simple container environment!
 