Understanding Isolation and Virtualization in Operating Systems

This text delves into the concepts of isolation and virtualization in operating systems. It covers topics such as virtual memory, virtual machines, containers, and kernel isolation mechanisms like chroot and cgroups. The discussion explores how these techniques provide isolation between processes, create virtual environments, and manage resources efficiently. By leveraging these technologies, operating systems can offer secure and efficient execution environments for various tasks.


Uploaded on Jul 10, 2024


Presentation Transcript


  1. Isolation with Namespaces
     - Marion Sudvarg, Chris Gill, Brian Kocoloski, James Orr
     - CSE 522S Advanced Operating Systems
     - Washington University in St. Louis, St. Louis, MO 63130

  2. Virtualization
     - Virtualization refers to the act of creating a virtual (rather than actual) version of something
     - Examples of virtualization we've already seen:
       - Virtual address spaces (virtual memory): create the illusion of full access to complete, contiguous system memory
       - Context switching and scheduling: create the illusion of full access to the system CPU(s)
     - Provides isolation between multiple processes

  3. Virtualization
     - Virtual machine: emulation of a full computer system
     - Rather than virtualize only specific resources to support multi-processing, we can virtualize the entire platform to support multiple operating systems
     - Why would we want to?
       - Provide complete computing environments to isolated sets of tasks
       - Virtual test environments (e.g. OS development for specific hardware platforms)
       - Cloud computing (server consolidation + software packaging)
       - Linux enthusiasts who still can't decide which distribution is best

  4. Containers
     - Virtual machines provide strong isolation
       - Each VM hosts its own operating system kernel
       - Each VM exists in a separate memory space on top of a hardware virtualization layer
     - Virtual machines are resource-intensive
       - Each VM contains its own kernel and applications
     - How can we provide self-contained, isolated execution environments that share a kernel?
     - Containers (e.g. LXC, Docker)
       - Leverage kernel mechanisms that provide isolation
       - Exist above the kernel; VMs exist below the kernel

  5. Kernel Isolation Mechanisms
     - chroot: early mechanism that isolates a process to a subset of the directory hierarchy
     - cgroups: Linux kernel mechanism that isolates resource usage (presented later)
     - Namespaces
       - Extend the idea of virtualization to other resources
       - Create the illusion that groups of processes share an isolated instance of a global resource
       - Leveraged by containers

  6. chroot
     - Every process has a root directory /
       - By default, this is the root of the VFS unified hierarchy
     - The chroot() system call changes the root of the calling process
       - Its children then inherit the new root: useful for fork() and exec()
     - The new environment is known as a chroot jail
     - Early Unix isolation technique (introduced in 1979)
     - Not a completely secure mechanism. Various ways to break out of jail (even by unprivileged processes)!
     - Namespaces provide more complete isolation
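The chroot() behavior described on this slide can be sketched as follows. This is a minimal illustration, not course code: the function name `list_root_in_jail` is hypothetical, and the sketch assumes a Linux system where chroot() requires root (CAP_SYS_CHROOT), so it returns None when the call is refused.

```python
import os


def list_root_in_jail(new_root):
    """Fork a child, chroot() it into new_root, and return its view of "/".

    Returns a sorted list of names, or None if chroot() was not permitted.
    (Hypothetical helper for illustration only.)
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only this process (and any children it creates) gets the new root.
        status = 1
        try:
            os.close(read_fd)
            os.chroot(new_root)
            os.chdir("/")  # move the working directory inside the jail
            names = sorted(os.listdir("/"))
            os.write(write_fd, "\n".join(names).encode())
            status = 0
        except OSError:
            pass  # e.g. PermissionError when not running as root
        finally:
            os._exit(status)

    # Parent: its own root directory is unchanged.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    _, wait_status = os.waitpid(pid, 0)
    if os.waitstatus_to_exitcode(wait_status) != 0:
        return None
    text = b"".join(chunks).decode()
    return text.split("\n") if text else []
```

Calling the function on a directory that holds a single file returns just that file's name when run as root, because the child resolves "/" relative to the jail. The chdir("/") after chroot() matters: a working directory left outside the new root is one of the classic ways out of the jail.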

  8. Namespace Types
     - UTS: Hostname
     - IPC: System V IPC and POSIX message queues
     - Mount: The set of filesystem mount points
     - PID: Process ID number space
     - User: User and group ID number spaces (next time)
     - cgroups: View of configured cgroups (discussed later)
     - Network: Networking system resources (discussed later with container networking)

  9. UTS Namespaces
     - UNIX Time-Sharing namespaces
     - Allow a group of processes to present a different hostname
     - [Figure: processes divided between two UTS namespaces, one presenting the hostname pihost001.cec.wustl.edu and the other c92.containers.wustl.edu]
     - Enables a container to present as a separate host or server

  10. IPC Namespaces
      - Isolate certain IPC mechanisms
        - System V message queues
        - POSIX message queues
        - System V semaphores
        - System V shared memory (different from POSIX shared memory covered in CSE 422!)
      - What about other IPC we've discussed? Isolated by other namespace types or techniques.
        - Pipes: isolated by access to underlying file descriptor
        - FIFOs/UNIX domain sockets/POSIX shared memory: isolated by access to associated file
        - Internet domain sockets: isolated by network namespaces

  11. Creating Namespaces
      - clone() creates (forks) a new process, with optional flags to specify new namespaces (all settings cloned)
      - unshare() allows the caller to separate into a new namespace (again, all settings cloned)
      - Namespace flags: CLONE_NEWUTS, CLONE_NEWIPC, CLONE_NEWNS (mount), CLONE_NEWPID, CLONE_NEWCGROUP, CLONE_NEWNET (network), CLONE_NEWUSER
      - [Figure: pid 37, with hostname domain1, calls clone( , ,CLONE_NEWUTS, ) to create pid 38, which starts with a copy of hostname domain1. After pid 38 calls sethostname(domain2,len), it presents hostname domain2.]
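The hostname example on this slide can be approximated with unshare() rather than clone(). The following is a sketch under stated assumptions: the helper name `hostname_in_new_uts` is hypothetical, unshare() is reached through ctypes so that the example does not depend on a recent Python version, and the call is expected to fail (returning None) without CAP_SYS_ADMIN.

```python
import ctypes
import os
import socket

CLONE_NEWUTS = 0x04000000  # flag value from <linux/sched.h>


def hostname_in_new_uts(new_name):
    """Fork a child that unshares its UTS namespace, then renames itself.

    Returns the hostname the child observes, or None if the kernel refused.
    (Hypothetical helper for illustration only.)
    """
    libc = ctypes.CDLL(None, use_errno=True)
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.close(read_fd)
            if libc.unshare(CLONE_NEWUTS) == 0:
                # The new namespace starts with a copy of the old hostname;
                # this change is visible only inside it.
                socket.sethostname(new_name)
                os.write(write_fd, socket.gethostname().encode())
                status = 0
        except OSError:
            pass
        finally:
            os._exit(status)

    os.close(write_fd)
    data = os.read(read_fd, 256)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data.decode() if data else None
```

When the call succeeds, the child reports the new name while `socket.gethostname()` in the parent still returns the original one, which is the isolation the figure depicts.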

  12. Joining a Namespace
      - The proc filesystem contains information about namespace membership for each process:
        - Each file in the /proc/PID/ns directory is a special symlink to a file corresponding to the namespace
        - Get a file descriptor by open()ing the file
        - Passing the fd to setns() allows the caller to associate with that namespace
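A short sketch of inspecting and joining namespaces through /proc/PID/ns. The helper names are hypothetical. Reading the symlinks of your own process needs no privileges; `join_namespace` relies on `os.setns()`, which exists only in Python 3.12 and later and normally requires CAP_SYS_ADMIN, so treat that function as an untested outline.

```python
import os


def namespace_links(pid="self"):
    """Map each entry of /proc/PID/ns to its symlink target, e.g. 'uts:[4026531838]'."""
    ns_dir = f"/proc/{pid}/ns"
    links = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            links[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # skip entries that cannot be read
    return links


def same_namespace(pid_a, pid_b, ns_type):
    """True if both processes are members of the same namespace of ns_type."""
    return namespace_links(pid_a)[ns_type] == namespace_links(pid_b)[ns_type]


def join_namespace(pid, ns_type):
    """Open /proc/PID/ns/<ns_type> and hand the descriptor to setns()."""
    fd = os.open(f"/proc/{pid}/ns/{ns_type}", os.O_RDONLY)
    try:
        os.setns(fd)  # Python 3.12+; raises OSError if not permitted
    finally:
        os.close(fd)
```

Two processes are in the same namespace exactly when their symlink targets match, so comparing link text is enough to check membership before attempting a join.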

  13. PID Namespaces
      - Isolate the set of process IDs
      - Nested: all are descendants of the original PID namespace
      - A process has a PID in its own namespace, and in each ancestor of its namespace
      - unshare() and setns() place the caller's children, but not the caller, into the new/specified PID namespace
      - [Figure: a chain of three processes created with CLONE_NEWPID. The first has PID 37 in NS0. The second has PID 38 in NS0 and PID 1 in NS1. The third has PID 39 in NS0, PID 2 in NS1, and PID 1 in NS2.]
      - A new proc filesystem can be mounted by a process/shell in the new PID namespace: mount -t proc proc /mount_point
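The rule that unshare() moves only the caller's children into the new PID namespace can be observed with a double fork. This is a sketch with a hypothetical function name; it assumes Linux, calls unshare() through ctypes, and returns None when the kernel refuses (for example, without CAP_SYS_ADMIN).

```python
import ctypes
import os

CLONE_NEWPID = 0x20000000  # flag value from <linux/sched.h>


def first_pid_in_new_namespace():
    """Return the PID seen by the first process in a fresh PID namespace.

    A helper process calls unshare(CLONE_NEWPID); the helper itself keeps its
    PID, but the next child it forks becomes PID 1 of the new namespace.
    (Hypothetical helper for illustration only.)
    """
    libc = ctypes.CDLL(None, use_errno=True)
    read_fd, write_fd = os.pipe()
    helper = os.fork()
    if helper == 0:
        status = 1
        try:
            os.close(read_fd)
            if libc.unshare(CLONE_NEWPID) == 0:
                child = os.fork()
                if child == 0:
                    try:
                        # Report the PID as seen from inside the new namespace.
                        os.write(write_fd, str(os.getpid()).encode())
                    finally:
                        os._exit(0)
                os.waitpid(child, 0)
                status = 0
        finally:
            os._exit(status)

    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(helper, 0)
    return int(data) if data else None
```

The extra helper process keeps the caller untouched: without it, every later child of the calling process would land in the new namespace.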

  14. The init Process
      - First process in a new PID namespace has PID 1
      - Becomes the init process for that namespace
        - Only responds to signals for which it has established a handler
        - Processes in its ancestor namespaces can additionally send it SIGKILL and SIGSTOP (given appropriate permissions)
        - If it is killed by SIGKILL, the kernel also kills every other process in its namespace
        - Becomes parent to orphaned processes in its namespace
        - Reaps terminated children (and terminated orphans)
      - Note: the behavior of unshare() and setns() implies that a process other than init might have a parent outside its PID namespace
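The reaping duty of an init process amounts to a wait loop. A minimal sketch follows; the function name `reap_all_children` is hypothetical, and a real init would also install handlers for the signals it wants to receive, which is omitted here.

```python
import os


def reap_all_children():
    """Block until every child has terminated; return {pid: exit_code}.

    waitpid(-1, ...) collects any child of the caller. When the caller is
    PID 1 of a PID namespace, that includes orphans re-parented to it.
    (Hypothetical helper for illustration only.)
    """
    reaped = {}
    while True:
        try:
            pid, status = os.waitpid(-1, 0)
        except ChildProcessError:
            # ECHILD: no children left to wait for.
            return reaped
        reaped[pid] = os.waitstatus_to_exitcode(status)
```

For a child killed by a signal, `os.waitstatus_to_exitcode()` returns the negated signal number, so the result also shows which children were terminated rather than exiting on their own.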

  15. Mount Namespaces
      - Allow processes to have a unique view of the global directory hierarchy
      - Unlike chroot, this is not restricted to just a subtree of the global hierarchy
        - chroot: Requested file paths are appended to the chroot root directory.
        - Mount namespaces: Requested file paths are accessed based on that namespace's mount points
      - A process's mount points are enumerated in /proc/PID/mountinfo
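To make the /proc/PID/mountinfo listing concrete, here is a small parser sketch. The function name and dictionary keys are hypothetical; the field layout follows the proc(5) man page, where a lone "-" separates the variable-length optional fields from the filesystem type and source.

```python
def parse_mountinfo(text):
    """Parse the contents of /proc/PID/mountinfo into a list of dicts.

    (Hypothetical helper for illustration only.)
    """
    mounts = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split()
        # Fields 0-5 are fixed; optional fields run until the "-" separator.
        sep = fields.index("-", 6)
        mounts.append({
            "mount_id": int(fields[0]),
            "parent_id": int(fields[1]),
            "device": fields[2],           # major:minor
            "root": fields[3],             # root of the mount within its filesystem
            "mount_point": fields[4],      # path as seen in this mount namespace
            "options": fields[5].split(","),
            "optional": fields[6:sep],     # e.g. shared:N, master:N
            "fstype": fields[sep + 1],
            "source": fields[sep + 2],
        })
    return mounts
```

Passing it the text of /proc/self/mountinfo lists the mount points of the calling process's mount namespace. Paths containing spaces appear escaped (as \040) in the file, so splitting on whitespace is safe.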

  16. Shared Subtrees and Peer Groups
      - Shared subtrees allow propagation of mount/unmount events between namespaces
      - Peer groups are sets of mount points related through namespace replication or binding
      - [Figure: Mount NS 0 runs mount --make-private /, mount --make-shared /dev/sda3 /X, and mount --make-shared /dev/sda5 /Y, giving it mount points X and Y. unshare -m then creates Mount NS 1, which holds the replicas X* and Y*. A bind mount (mkdir /Z; mount --bind /X /Z) adds Z in Mount NS 1. The mount points are grouped into Peer Group 1 and Peer Group 2.]
      - A mount point is marked with a propagation type:
        - MS_SHARED: Shares events with other mount points in its peer group
        - MS_PRIVATE: Does not share events
        - MS_SLAVE: Receives from, but does not propagate to, its peer group
        - MS_UNBINDABLE: Private, and can't be the source for a bind mount
      - Allows containers to share mount points (e.g. shared storage, USB/CD-ROM device)
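The propagation type of a mount can be read back from the optional fields of its /proc/PID/mountinfo line: `shared:N` names a peer group, `master:N` marks a slave of peer group N, `unbindable` marks an unbindable mount, and no tag at all means private. The sketch below uses hypothetical function names, and the peer group IDs in its usage example are made up for illustration.

```python
def propagation_type(optional_fields):
    """Classify a mount from the optional fields of its mountinfo line.

    (Hypothetical helper for illustration only.)
    """
    tags = {field.split(":")[0] for field in optional_fields}
    if "unbindable" in tags:
        return "unbindable"
    shared = "shared" in tags
    slave = "master" in tags
    if shared and slave:
        return "shared+slave"  # receives from a master and shares with its own peers
    if shared:
        return "shared"
    if slave:
        return "slave"
    return "private"


def peer_groups(mounts):
    """Group mount points by peer group ID.

    mounts: iterable of (mount_point, optional_fields) pairs.
    """
    groups = {}
    for mount_point, optional_fields in mounts:
        for field in optional_fields:
            if field.startswith("shared:"):
                group_id = int(field.split(":")[1])
                groups.setdefault(group_id, []).append(mount_point)
    return groups
```

For example, `peer_groups([("/X", ["shared:1"]), ("/Y", ["shared:2"]), ("/Z", ["shared:1"])])` places /X and /Z in one group and /Y in another, which is what a bind mount of a shared mount point produces.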

  17. Mount Namespaces and Containers
      - Mounting a different device (or bind-mounting a directory) to the root mount point / provides an isolated filesystem to a container.
        - mount("/containers/c01/files", "/", "", MS_BIND, 0)
      - Additional necessary OS files and directories (e.g. devices, libraries, etc.) are bind-mounted into the container's mount namespace
      - [Figure: a view of a Docker container's contents. Zero-byte files are required for Linux containers and will be bind-mounted from the host to the container's mount namespace. From Docker: Up & Running: Shipping Reliable Containers in Production, 2nd Edition by Sean P. Kane & Karl Matthias, 2018, pg. 71]

  18. Combining Mount and PID Namespaces
      - We've already seen that the corresponding proc filesystem can be mounted from a new PID namespace: mount -t proc proc /mount_point
      - But what if we want to mount it to /proc? (necessary for ps, top, htop)
      - If a process from a new PID namespace mounts to /proc, this affects the entire system
      - Answer: use a mount namespace!
        - Creating a new mount namespace in conjunction with a PID namespace allows a process in that namespace to unmount the existing /proc filesystem and mount a new one
        - Only processes in that namespace will see the change!
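A sketch of combining the two namespace types. Several assumptions apply: the function name is hypothetical, the system calls are reached through ctypes, root privileges (CAP_SYS_ADMIN) are required, and any failure yields None. Unlike the slide, the sketch mounts the new proc instance over /proc instead of unmounting the old one first, and it first marks the whole tree private so that nothing propagates back to the original mount namespace.

```python
import ctypes
import os

CLONE_NEWNS = 0x00020000   # new mount namespace
CLONE_NEWPID = 0x20000000  # new PID namespace (takes effect for children)
MS_REC = 0x4000            # apply recursively
MS_PRIVATE = 0x40000       # private propagation


def pids_visible_in_container():
    """Return the PIDs listed in a fresh /proc inside new mount + PID namespaces.

    (Hypothetical helper for illustration only.)
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_ulong, ctypes.c_void_p]
    read_fd, write_fd = os.pipe()
    helper = os.fork()
    if helper == 0:
        status = 1
        try:
            os.close(read_fd)
            if libc.unshare(CLONE_NEWNS | CLONE_NEWPID) == 0:
                child = os.fork()  # becomes PID 1 of the new PID namespace
                if child == 0:
                    try:
                        private = libc.mount(b"none", b"/", None,
                                             MS_REC | MS_PRIVATE, None)
                        if private == 0 and libc.mount(b"proc", b"/proc",
                                                       b"proc", 0, None) == 0:
                            pids = sorted(int(n) for n in os.listdir("/proc")
                                          if n.isdigit())
                            os.write(write_fd,
                                     " ".join(map(str, pids)).encode())
                    finally:
                        os._exit(0)
                os.waitpid(child, 0)
                status = 0
        finally:
            os._exit(status)

    os.close(write_fd)
    data = os.read(read_fd, 4096)
    os.close(read_fd)
    os.waitpid(helper, 0)
    return [int(p) for p in data.split()] if data else None
```

When it succeeds, the list contains only processes of the new PID namespace, starting at PID 1, while /proc in the original namespace is unchanged. Both namespaces disappear once the helper and its child exit.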

  19. Reading Assignments
      - LSP pp. 15-16: A brief review of filesystems and namespaces
      - LKD pp. 285-288: A review of VFS data structures; pay attention to how the mnt_namespace structure is used
      - man 2 chroot: Coverage of isolation with the chroot() syscall
      - LWN Namespaces Series: An overview of namespaces from the author of LPI. Read the following parts:
        - Part 1: Namespaces Overview
        - Part 2: The Namespaces API
        - Part 3: PID Namespaces
        - Part 4: More on PID Namespaces
        - Mount Namespaces and Shared Subtrees
        - Mount Namespaces, Mount Propagation, and Unbindable Mounts
      - man 2 unshare: Coverage of the unshare() syscall with examples
      - man 2 setns: Coverage of the setns() syscall with examples
      - man 8 umount: Coverage of the umount utility to unmount filesystems and bind mounts
      - man 2 umount: Coverage of the underlying umount system call
      - man 2 pivot_root: Coverage of the pivot_root() system call for setting a new root mount
      - We provide condensed PDFs focusing on relevant sections of these man pages:
        - man 2 clone: Coverage of the clone() syscall, with attention to namespaces
        - man 2 mount: Special attention to the section on changing propagation type
        - man 8 mount: Special attention to the section on shared subtree operations

  20. Studio Exercises Today
      - Create new namespaces
        - clone(), unshare()
      - Join an existing namespace
        - Open a /proc/PID/ns symbolic link
        - Join with setns()
      - UTS namespaces
        - sethostname(), gethostname()
      - PID namespace
        - Create a simple init process
      - Mount namespaces
        - Isolate the /proc mount for a new PID namespace
      - Put this all together into a simple container environment!
