Enhancing Userspace NVM File Systems with Trio Architecture
This presentation examines library file systems (LibFSes) as a way to build high-performance and secure userspace NVM file systems. A LibFS gives applications direct access to NVM, minimizes software overhead, and allows unprivileged, per-application (private) customization; its key challenge is providing a secure mechanism for sharing files. The talk covers these benefits, why LibFS is an ideal fit for NVM, the customization it enables, and how the Trio architecture addresses the security concerns.
Presentation Transcript
Enabling High-Performance and Secure Userspace NVM File Systems with the Trio Architecture. Diyu Zhou, Vojtech Aschenbrenner, Tao Lyu, Jian Zhang, Sudarsun Kannan, and Sanidhya Kashyap.
Library file systems for NVM. NVM is accessible through ordinary load/store instructions, so a library file system (LibFS) linked into the application can manage it directly from userspace, alongside the traditional kernel file system path.
LibFS is an ideal fit for NVM: direct access. NVM access latency is on the order of hundreds of nanoseconds, so software overhead must be minimized; with a LibFS, applications access NVM directly instead of going through the kernel.
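To make the direct-access point concrete, the sketch below shows the kind of userspace path a LibFS builds on: an NVM-backed file is mapped with mmap, updated with ordinary stores, and persisted with cache-line flushes and a fence. The path /mnt/pmem/foo and the use of clflush rather than clwb are illustrative assumptions, not details of Trio.

    /* Minimal sketch, assuming an NVM file exposed through a DAX mount at
     * /mnt/pmem (hypothetical path). x86-64, plain C. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <emmintrin.h>   /* _mm_clflush, _mm_sfence */

    int main(void)
    {
        int fd = open("/mnt/pmem/foo", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        /* Map the file; on a DAX file system this maps NVM directly,
         * so loads and stores reach the medium with no page cache. */
        char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }

        /* Ordinary store: no system call on the data path. */
        memcpy(buf, "hello nvm", 10);

        /* Persist: flush the dirty cache line, then order with a fence.
         * Real NVM code would prefer clwb or non-temporal stores; clflush
         * keeps the sketch portable. */
        _mm_clflush(buf);
        _mm_sfence();

        munmap(buf, 4096);
        close(fd);
        return 0;
    }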
LibFS is an ideal fit for NVM: customization. A generic file system delivers good overall performance, but exploiting the extremely high performance of NVM calls for application-specific designs.
LibFS enables unprivileged customization. No special privilege is needed to customize a LibFS, so all applications can benefit.
LibFS enables private customization. With a single file system shared by all applications, customization for one application can slow down another.
LibFS enables private customization. A private LibFS for each application confines the effect of a customization to that application.
Key challenge for LibFS: secure sharing. Malicious applications or LibFSes can corrupt shared file system state.
Requirements to achieve secure sharing: enforce access permissions (e.g., owner rwx, group r-x, others r--, enforced through the MMU) and preserve metadata invariants (checked by a trusted entity).
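As a worked example of the first requirement, the helper below evaluates classic owner/group/other permission bits against a requesting UID/GID. It is a generic POSIX-style check written for illustration, not Trio's enforcement code, which relies on the MMU and the kernel controller.

    #include <stdbool.h>
    #include <sys/types.h>

    /* Return true if (uid, gid) may access a file owned by (f_uid, f_gid)
     * with permission bits `mode`, for the requested rwx bits in `want`
     * (e.g., want = 04 for read, 02 for write, 01 for execute). */
    static bool may_access(mode_t mode, uid_t f_uid, gid_t f_gid,
                           uid_t uid, gid_t gid, unsigned want)
    {
        unsigned granted;

        if (uid == f_uid)
            granted = (mode >> 6) & 07;   /* owner bits, e.g. rwx */
        else if (gid == f_gid)
            granted = (mode >> 3) & 07;   /* group bits, e.g. r-x */
        else
            granted = mode & 07;          /* other bits, e.g. r-- */

        return (granted & want) == want;
    }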
Design goals of the Trio architecture: maximize the degree of direct access; support unprivileged and private customization; provide secure sharing even in the presence of malicious applications.
Tension between direct access and secure sharing: when should the metadata invariants be validated?
Tension between customization and secure sharing: how should the metadata invariants be validated?
Existing NVM LibFSes with metadata mediation: metadata updates pass through a trusted entity, which limits direct access; customization requires changing that trusted entity, needs special privilege, and affects all applications.
Existing NVM LibFSes with direct access: a change to the shared LibFS affects all applications, and secure sharing is weak because applications are assumed to be trusted and attacks are only mitigated rather than prevented.
Obstacle: maintaining the file system abstraction. Root cause: the whole file system state is maintained in a single component, whether a trusted entity or a shared LibFS.
Key insight: file system state partitioning. Auxiliary state supports supplementary functionality (e.g., caches) and can be rebuilt from the core state; core state is essential for file system integrity and functionality and cannot be recovered if lost.
Key insight: file system state partitioning. Auxiliary state, kept for performance (file descriptors, caches, locks, bitmaps), is supplementary and can be rebuilt. Core state, kept for security (permissions, directory entries, file contents), is essential for correctness and must not be lost.
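The split can be pictured as two groups of data structures. The sketch below is an illustrative C layout under assumptions of my own (field names and types are hypothetical), separating what lives in the shared NVM core state from what each LibFS rebuilds privately in DRAM.

    #include <stdint.h>
    #include <sys/types.h>

    /* Core state: lives in NVM, shared, essential for correctness/security. */
    struct core_inode {                 /* hypothetical layout */
        mode_t   mode;                  /* permissions */
        uid_t    uid;
        gid_t    gid;
        uint64_t size;
        uint64_t index_page;            /* NVM offset of the index page that points
                                           to data pages (file content or dir entries) */
    };

    /* Auxiliary state: lives in DRAM, private to one LibFS, rebuilt on demand. */
    struct aux_file {                   /* hypothetical layout */
        struct core_inode *inode;       /* mapped core state */
        void   *dentry_hash;            /* per-directory hash table / radix tree */
        void   *log_tails;              /* per-core log tails */
        void   *locks;                  /* fine-grained locks */
        int     fd_refcount;            /* file-descriptor bookkeeping */
    };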
Components in the Trio architecture: LibFSes implement the file system; the kernel controller enforces access permissions; the verifier checks metadata invariants.
Shared core state and private auxiliary state. The core state, with its data layout and data structures defined by Trio, is shared by all components; the LibFS, kernel controller, and verifier each maintain their own private auxiliary state.
State partitioning enables direct access. On the first access, the kernel controller maps the file's core state into the application (e.g., map("/foo")), and the LibFS builds its auxiliary state from that core state; afterwards, reads and writes go directly to NVM.
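A minimal sketch of this first-access path, assuming hypothetical signatures for the file_map interface listed at the end of this transcript: the controller maps the core state once, the LibFS rebuilds its auxiliary state, and later reads and writes touch NVM directly.

    #include <stddef.h>
    #include <sys/types.h>

    /* Hypothetical signatures; the real Trio/ArckFS API may differ. */
    struct core_state;                        /* mapped NVM core state */
    struct aux_state;                         /* private DRAM auxiliary state */

    struct core_state *file_map(const char *path);          /* kernel controller */
    struct aux_state  *build_aux(struct core_state *core);  /* LibFS-local rebuild */
    void libfs_read(struct aux_state *aux, void *buf, size_t len, off_t off);
    void libfs_write(struct aux_state *aux, const void *buf, size_t len, off_t off);

    void example_first_access(void)
    {
        /* First access: one trip to the kernel controller to map "/foo",
         * which also enforces access permissions via the MMU mapping. */
        struct core_state *core = file_map("/foo");

        /* Rebuild auxiliary state (hash table, log tails, locks) from core state. */
        struct aux_state *aux = build_aux(core);

        /* From now on, reads and writes are ordinary loads/stores on NVM. */
        char buf[64];
        libfs_read(aux, buf, sizeof buf, 0);
        libfs_write(aux, "hello", 5, 0);
    }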
State partitioning enables customization. Only LibFSes implement the file system, and each LibFS is private to its application, maximizing the benefit of unprivileged, private customization.
State partitioning resolves the design tensions. Issue: when to validate the metadata invariants? Solution: a file is either concurrently read or exclusively written, so invariants are validated only when a modified file is shared. Issue: how to validate the metadata invariants? Solution: the verifier understands the data structures of the core state and validates only the core state.
State partitioning enables secure sharing. Validation happens only when a file is shared, and only the core state is validated: if a LibFS corrupts a shared file such as /foo, the verifier detects the corruption and fixes it before another application builds its auxiliary state on top of that file.
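The validation policy can be sketched as a small check on the sharing path: nothing is validated on private reads and writes; only when a modified file is handed to another application does the verifier walk its core state. The function names below follow the interface table at the end of this transcript, with hypothetical signatures, and fix_core_state is an assumed repair step.

    #include <stdbool.h>

    struct core_state;

    /* Hypothetical signatures mirroring the LibFS interface table. */
    bool file_check(struct core_state *core);     /* verifier: validate core state */
    void fix_core_state(struct core_state *core); /* hypothetical repair step */

    /* Invoked when a modified file is about to be shared with (mapped into)
     * another application. */
    void on_share_request(struct core_state *core)
    {
        /* Metadata invariants are validated only now, and only on the core
         * state; private auxiliary state never needs checking. */
        if (!file_check(core))
            fix_core_state(core);   /* corruption detected: repair before sharing */

        /* ... the controller then maps the (now valid) core state for the
         *     next application ... */
    }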
ArckFS: a Trio-based POSIX-like file system. Its design covers: designing the core and auxiliary state; handling POSIX file system operations; detecting and fixing metadata corruption; ensuring crash consistency; adapting to Intel Optane persistent memory.
Layout of the core state. The core state is kept simple to maximize the degree of customization and minimize metadata validation overhead. It consists of a super block, a shadow inode table holding access permissions and general per-file information, and file pages holding files and directories.
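An illustrative on-NVM layout matching this description; all sizes, offsets, and field names below are assumptions made for the sketch, not ArckFS's actual format.

    #include <stdint.h>
    #include <sys/types.h>

    #define PAGE_SIZE 4096u               /* assumed NVM page size */

    struct super_block {                  /* general info about the FS instance */
        uint64_t magic;
        uint64_t total_pages;
        uint64_t inode_table_page;        /* where the shadow inode table starts */
    };

    struct shadow_inode {                 /* one entry in the shadow inode table */
        mode_t   mode;                    /* access permissions */
        uid_t    uid;
        gid_t    gid;
        uint64_t size;                    /* general per-file info */
        uint64_t index_page;              /* first index page of the file */
    };

    /* File pages hold either file content or directory entries; index pages
     * hold the NVM page numbers of those data pages. */
    struct index_page {
        uint64_t data_page[PAGE_SIZE / sizeof(uint64_t) - 1];
        uint64_t next_index_page;         /* chain for large files */
    };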
A directory's core state and auxiliary state. Core state (used for customization and validation): a file is a set of index pages plus data pages; index pages hold the addresses of data pages, and data pages hold file content or directory entries. Auxiliary state (used for multicore scalability): a scalable data structure such as a radix tree or hash table over the entries, multiple tails for logging, and fine-grained locking.
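A sketch of how a LibFS might rebuild the auxiliary hash table for a directory from its core state: directory entries are scanned out of the data pages (reached through the index pages) and inserted into a DRAM hash keyed by name. The entry layout and helper names are assumptions.

    #include <stddef.h>
    #include <stdint.h>

    struct dirent_core {                  /* assumed directory-entry layout in a data page */
        uint64_t inode_no;                /* 0 = unused slot */
        char     name[56];
    };

    /* Hypothetical DRAM hash table used as auxiliary state. */
    struct name_hash;
    void name_hash_insert(struct name_hash *h, const char *name, uint64_t ino);

    /* Rebuild the per-directory hash table by scanning the directory's data pages. */
    void rebuild_dir_aux(struct name_hash *aux,
                         const struct dirent_core *entries, size_t n_entries)
    {
        for (size_t i = 0; i < n_entries; i++) {
            if (entries[i].inode_no != 0)                 /* skip free slots */
                name_hash_insert(aux, entries[i].name, entries[i].inode_no);
        }
    }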
A customized LibFS for small files: KVFS. With the POSIX interface of ArckFS, reading a small file takes fd = open(file_name, ...); read(fd, buf, size); close(fd); and pays per-call and file-descriptor overhead. KVFS instead exposes GET/SET interfaces, e.g., get(file_name, buf, size), served from a per-directory hash table in its auxiliary state, while operating on the same core state as ArckFS.
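A sketch of how KVFS's get() could collapse the POSIX open/read/close sequence into a single lookup against the per-directory hash table; all names and signatures below are hypothetical illustrations of the idea, not KVFS's real code.

    #include <stdint.h>
    #include <string.h>

    struct name_hash;                     /* per-directory hash table (aux state) */
    struct shadow_inode;                  /* core-state inode (see layout sketch above) */

    /* Hypothetical helpers over the shared core state. */
    struct shadow_inode *name_hash_lookup(struct name_hash *h, const char *name);
    const void *file_data(const struct shadow_inode *inode);   /* mapped NVM data */
    uint64_t    file_size(const struct shadow_inode *inode);

    /* get(): one hash lookup + one memcpy; no open/read/close, no file descriptor. */
    long kvfs_get(struct name_hash *dir, const char *name, void *buf, size_t size)
    {
        struct shadow_inode *inode = name_hash_lookup(dir, name);
        if (!inode)
            return -1;                                   /* not found */

        size_t n = file_size(inode);
        if (n > size)
            n = size;
        memcpy(buf, file_data(inode), n);                /* direct load from NVM */
        return (long)n;
    }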
Performance evaluation and baselines. Does Trio improve NVM file system performance? Baselines: NOVA, a kernel NVM file system, and SplitFS, a userspace file system whose metadata operations are mediated by the kernel and so carry metadata mediation overhead.
Microbenchmark: basic file system operations. [Chart: throughput (ops/ms) of NOVA, SplitFS, and ArckFS for a 4KB read and for creating an empty file.] For 4KB reads, ArckFS performs similarly to SplitFS and is 12% faster than NOVA; for creating an empty file, direct access makes ArckFS 6.2x faster than the kernel-mediated baselines. Direct access maximizes NVM file system performance.
Multicore scalability of metadata operations. [Chart: throughput (ops/ms) versus thread count (1-16) for ArckFS, NOVA, and SplitFS on a read-intensive workload (Webproxy) and a write-intensive workload (Varmail).] ArckFS scales with the number of threads, reaching about 2.4x (Webproxy) and 2.2x (Varmail) over the baselines, which are limited by a kernel bottleneck. Trio enables scalability through kernel bypass.
Macrobenchmark: customization. [Chart: Webproxy throughput (ops/ms) for NOVA, ArckFS, and KVFS.] KVFS outperforms ArckFS by 1.3x; customization enabled by Trio can further improve performance.
Conclusion. Desired properties of library file systems for NVM: direct access, unprivileged private customization, and secure sharing. Trio partitions file system state into core and auxiliary state to achieve all of them: the core state, essential for file system correctness and shared among all components, provides security; the auxiliary state, rebuilt from the core state and private to each component, provides performance. Publicly available: https://github.com/vmexit/trio-sosp-23ae. Thank you!
Sharing cost (backup). [Charts: latency (µs) of sharing a regular file (4KB to 100GB) and a directory (10 to 100,000 entries), plus a breakdown of the sharing cost into map, unmap, verifier, and auxiliary-state rebuild.]
Sharing cost with microbenchmark.
Workload | NOVA | ArckFS
Write 4KB concurrently to a 2MB file | 1.92 GiB/s | 1.90 GiB/s
Write 4KB concurrently to a 1GB file | 1.91 GiB/s | 0.25 GiB/s
Create empty files under a dir with 10 entries | 8.2 µs | 7.8 µs
Create empty files under a dir with 100 entries | 8.4 µs | 32.7 µs
Interfaces for LibFSes.
Operation | Description
file_map | Map a file to the LibFS address space.
file_unmap | Unmap a file from the LibFS address space.
file_check | Check the metadata invariants of the specified file.
page_alloc | Obtain pages from the kernel controller.
page_free | Free pages to the kernel controller.
inode_alloc | Obtain inodes from the kernel controller.
inode_free | Free inodes to the kernel controller.
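Putting the table together, a LibFS creating a new small file might drive these interfaces roughly as follows. Only the operation names come from the table; the signatures and the helper flow are guesses for illustration.

    #include <stdint.h>

    struct core_state;

    /* Hypothetical signatures for the operations in the table above. */
    uint64_t inode_alloc(void);                       /* obtain an inode from the controller */
    uint64_t page_alloc(unsigned n_pages);            /* obtain NVM pages from the controller */
    struct core_state *file_map(const char *path);    /* map a file's core state */
    void file_unmap(struct core_state *core);         /* unmap it again */
    int  file_check(struct core_state *core);         /* ask the verifier to validate it */

    void libfs_create_small_file(const char *path, const void *data, unsigned len)
    {
        uint64_t ino  = inode_alloc();                /* new shadow-inode slot */
        uint64_t page = page_alloc(1);                /* one fresh NVM data page */

        struct core_state *core = file_map(path);     /* map the parent directory's core state */

        /* ... with ordinary stores: add the directory entry for `ino`, point the
         *     new inode's index page at `page`, copy `data`/`len` bytes into it,
         *     then flush + fence to persist (see the direct-access sketch above) ... */
        (void)data; (void)len; (void)ino; (void)page; /* elided in this sketch */

        file_check(core);                             /* optional self-check before sharing */
        file_unmap(core);                             /* release the mapping when done */
    }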