Re-Animator: Versatile System Call Tracing and Replaying

Re-Animator is a research project focusing on creating a high-fidelity system call capturing system with minimized overheads. The project aims to capture long-running applications and provide scalable and verifiable system call replaying. It introduces two prototype system call tracing systems and highlights the motivation behind system call tracing and the challenges faced in existing systems.



Presentation Transcript


  1. Re-Animator: Versatile High-Fidelity System-Call Tracing and Replaying
     - Ibrahim Umit Akgun
     - Research Proficiency Examination (RPE)
     - File systems and Storage Lab, Dept. of Computer Science, Stony Brook University
     - http://www.fsl.cs.stonybrook.edu/
     - 5/20/2019

  2. Contributions
     - A high-fidelity system-call capturing system
     - Minimized overheads
     - Capturing long-running applications
     - Scalable, verifiable system-call replayer
     - Two prototype system-call tracing systems:
       1. Based on ptrace and strace
       2. Based on Linux tracepoints and LTTng
     - Open source, portable, extensible
     - Submitted to SOSP '19

  3. Outline
     - Introduction
     - Related Work
     - Design: Design Overview, Fidelity, Low-overhead, Scalable & Verifiable, Portable
     - Evaluation
     - Conclusion & Future Work

  4. Motivation (1 of 2): Why do we need system-call tracing & replaying?
     - Benchmarking applications under different HW/SW configurations
     - Revealing hidden patterns during application execution
     - Finding performance bottlenecks
     - Analyzing application characteristics (e.g., system-call use, security vulnerabilities)
     - Reproducing bugs

  5. Motivation (2 of 2): What are the problems in existing system-call tracing & replaying systems?
     - Hard to capture accurate information and verify it
     - Lack of data capturing (raw data buffers), e.g., for system calls like read, write, stat, getdents
     - High overheads
     - No scalable and versatile replay tool
     - No portable trace format
     - No offline analysis tools

  7. Related Work Overview
     - Ptrace: ptrace, strace, truss, tusc, StraceNT
     - Shared-library interposition: CLUSTER '13, //TRACE (FAST '07)
     - In-kernel techniques: Ktrace, dtrace, sysdig, systemtap
     - Replayer fidelity: //TRACE (FAST '07), ROOT (SOSP '13), hfplayer (FAST '17)
     - Scalability
     - Portable trace format

  8. Related Work (1 of 2): //TRACE (FAST '07)
     - Focuses on replaying system calls
     - I/O throttling
     - Requires sampling
     - Library interposer
     - Limitations:
       - Requires multiple runs
       - Does not work with statically linked libraries

  9. Related Work (2 of 2): ROOT (SOSP '13)
     - Focuses on replaying system calls
     - Finding inter-system-call dependencies
     - Re-ordering system calls
     - Cross-platform (limited)
     - Compiler-based approach
     - Limitations:
       - Memory bounded
       - Does not capture buffers
       - Can't replay multi-process applications

  11. Design Goals (1 of 2)
      - Capturing & replaying system calls for storage-system benchmarking
      - Replaying:
        - Timing
        - Overlapping system-call execution time
        - Preserving ordering (e.g., hfplayer, FAST '17)
      - On-disk state correctness:
        - Deduplication
        - Compression
      - Minimizing overhead: adding minimum interference

  12. Design Goals (2 of 2)
      - Scalability:
        - Designed to capture long-running applications
        - Designed to replay large system-call traces
        - Tested with traces of hundreds of gigabytes
      - Verifiability:
        - Return values
        - Buffer contents
        - On-disk state verification
      - Portability: supporting versatile tools for analyzing and converting traces
      - Ease of use and extensibility:
        - RA-Library can be integrated with any other tracing tool
        - Easy to extend to support unimplemented system calls

  14. Fidelity: RA-Strace
      [Architecture diagram. User space: the application, and strace with the Re-Animator library (strace integration). Kernel space: the system-call handling layer, ptrace, and the Linux kernel core. The numbered steps 1-8 are labeled: system call, signal, process_vm_readv(application_pid, buffer_ptr), and async write to DataSeries.]

  15. Fidelity: RA-LTTng
      [Architecture diagram. User space: the Re-Animator Tracer (configuration: pgid, #sub-buffers), the application, the LTTng session daemon, and the LTTng consumer daemon. Kernel space: sub-buffers, the Re-Animator kernel module integration, the Linux kernel system-call tracepoints, and the LTTng kernel modules. System-call events carry a record-id with the syscall args; captured buffers carry the same record-id with the buffer_content. Several steps are marked async. Outputs: captured buffers and CTF.]
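
The diagram suggests that a system-call event and its separately captured buffer are tied together by a shared record-id. A minimal sketch of such a join, assuming both streams are already decoded into dictionaries; the field names (`record_id`, `content`) and the `attach_buffers` helper are hypothetical and do not reflect the actual CTF or Re-Animator layout:

```python
from typing import Dict, Iterable, List


def attach_buffers(events: Iterable[dict], buffers: Iterable[dict]) -> List[dict]:
    """Join syscall events with captured buffers on a shared record id.

    The field names used here are illustrative assumptions.
    """
    by_id: Dict[int, bytes] = {b["record_id"]: b["content"] for b in buffers}
    joined = []
    for event in events:
        merged = dict(event)
        # Events with no captured buffer (e.g. close) get None.
        merged["buffer"] = by_id.get(event["record_id"])
        joined.append(merged)
    return joined


events = [
    {"record_id": 1, "name": "write", "args": (3, 5)},
    {"record_id": 2, "name": "close", "args": (3,)},
]
buffers = [{"record_id": 1, "content": b"hello"}]
joined = attach_buffers(events, buffers)
```

Keeping the buffer stream separate from the event stream means the event records stay small; the join only needs to happen offline, when the trace is converted or analyzed.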

  17. Low Overhead: RA-Strace
      - Efficient multithreaded memory allocation (TCMalloc)
      - Per-system-call DataSeries object caching
      - Per-system-call extent design, multi-threaded writing
      - Async write
      [Architecture diagram with the same components as slide 14: the application, strace with the Re-Animator library, the system-call handling layer, ptrace, the Linux kernel core, and DataSeries.]
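
The asynchronous, per-system-call write path can be illustrated with a small sketch. It assumes a single background thread and in-memory lists standing in for extents; the `AsyncTraceWriter` class and its methods are hypothetical and are not the DataSeries or Re-Animator API:

```python
import queue
import threading
from collections import defaultdict


class AsyncTraceWriter:
    """Hypothetical sketch: the tracer enqueues a record and returns at once;
    a background thread groups records into per-system-call extents."""

    def __init__(self):
        self._queue = queue.Queue()
        self.extents = defaultdict(list)  # syscall name -> list of records
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, syscall, fields):
        # Called on the tracing path: only an enqueue, no I/O.
        self._queue.put((syscall, fields))

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: stop the worker
                break
            syscall, fields = item
            self.extents[syscall].append(fields)

    def close(self):
        self._queue.put(None)
        self._worker.join()


writer = AsyncTraceWriter()
writer.record("read", {"fd": 3, "ret": 4096})
writer.record("write", {"fd": 4, "ret": 512})
writer.record("read", {"fd": 3, "ret": 0})
writer.close()
```

The point of the design is that the traced application only waits for the enqueue, while grouping and writing happen off the critical path.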

  18. Low Overhead: RA-LTTng
      [Architecture diagram with the same components as slide 15: the Re-Animator Tracer (configuration), the application, the LTTng session and consumer daemons, user and kernel sub-buffers, the Re-Animator kernel module integration, the Linux kernel system-call tracepoints, the LTTng kernel modules, captured buffers, and CTF.]

  19. Low Overhead: RA-LTTng (continued)
      [The same architecture diagram as slide 18, zoomed in on the workqueue.]

  21. Verifiable & Scalable Replayer
      - Verifiable:
        - Checks return values and buffer contents
        - Limitations: atime, mtime, ctime; procfs/sysfs (e.g., /sys/block/sda/sda1/stat)
        - FD table consistency check
      - Scalable:
        - Supports multiple processes
        - User-space FD tables
        - Concurrent lock-free design
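
A user-space FD table combined with return-value and buffer checks can be sketched as follows. This is a minimal single-threaded illustration under assumptions: the record layout and the `Replayer` class are hypothetical, only four system calls are handled, and `os.pread` requires a Unix-like system. The fourth record deliberately expects the wrong buffer so that the verifier has something to report.

```python
import os
import tempfile


class Replayer:
    """Replays a tiny trace and records any divergence from it."""

    def __init__(self):
        self.fd_table = {}    # traced fd -> fd opened during replay
        self.mismatches = []  # (record index, what differed)

    def _check_ret(self, index, rec, ret):
        if ret != rec["ret"]:
            self.mismatches.append((index, "return value"))

    def replay(self, trace):
        for index, rec in enumerate(trace):
            name = rec["name"]
            if name == "open":
                # The live fd usually differs from the traced one, so map it.
                live_fd = os.open(rec["path"], rec["flags"], 0o600)
                self.fd_table[rec["ret"]] = live_fd
            elif name == "write":
                ret = os.write(self.fd_table[rec["fd"]], rec["data"])
                self._check_ret(index, rec, ret)
            elif name == "pread":
                fd = self.fd_table[rec["fd"]]
                data = os.pread(fd, rec["count"], rec["offset"])
                self._check_ret(index, rec, len(data))
                if data != rec["data"]:
                    self.mismatches.append((index, "buffer contents"))
            elif name == "close":
                os.close(self.fd_table.pop(rec["fd"]))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    flags = os.O_RDWR | os.O_CREAT
    trace = [
        {"name": "open", "path": path, "flags": flags, "ret": 7},
        {"name": "write", "fd": 7, "data": b"hello", "ret": 5},
        {"name": "pread", "fd": 7, "count": 5, "offset": 0, "data": b"hello", "ret": 5},
        {"name": "pread", "fd": 7, "count": 5, "offset": 0, "data": b"HELLO", "ret": 5},
        {"name": "close", "fd": 7, "ret": 0},
    ]
    replayer = Replayer()
    replayer.replay(trace)
```

After the run, `replayer.mismatches` holds one entry for the tampered record, and the FD table is empty again because the traced `close` removed the mapping.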

  22. RA-Replayer
      [Architecture diagram: a reader thread batch-reads records from DataSeries into concurrent priority queues; execution threads take records from the queues and issue them to the Linux kernel; a verifier checks the results. The diagram also notes a check for overlapping timelines.]
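
The ordering and overlap ideas from this slide can be sketched in a few lines. The example assumes each record carries hypothetical `enter` and `exit` timestamps and that each per-thread list is already sorted by entry time; `heapq.merge` stands in for the concurrent priority queues, and both helper functions are illustrative rather than the replayer's actual logic:

```python
import heapq


def merge_by_entry_time(per_thread):
    """Merge per-thread record lists (each sorted by entry time) into one order."""
    return list(heapq.merge(*per_thread, key=lambda rec: rec["enter"]))


def overlapping_pairs(ordered):
    """Return (earlier id, later id) for records whose time ranges overlap."""
    pairs = []
    active = []  # records that have not exited before the current entry
    for rec in ordered:
        active = [a for a in active if a["exit"] > rec["enter"]]
        pairs.extend((a["id"], rec["id"]) for a in active)
        active.append(rec)
    return pairs


thread_a = [
    {"id": 1, "enter": 0, "exit": 10},
    {"id": 3, "enter": 12, "exit": 15},
]
thread_b = [
    {"id": 2, "enter": 5, "exit": 8},
    {"id": 4, "enter": 14, "exit": 20},
]
ordered = merge_by_entry_time([thread_a, thread_b])
pairs = overlapping_pairs(ordered)
```

Here `ordered` lists the records as 1, 2, 3, 4, and `pairs` is `[(1, 2), (3, 4)]`: those calls were in flight at the same time in the trace, so a replayer aiming for timing fidelity would need to issue them from different execution threads.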

  24. Portable
      - DataSeries
      - Versatile
      - Ease of use:
        - Includes a set of useful DS tools
        - Integrated with multiple capturing platforms
        - Capturing as many system calls as possible
        - Enables other research opportunities
      - Open source:
        - Major revision to the SNIA syscall format design document
        - Plan to release code

  26. Source Code: ~20,000 lines of C/C++ in total
      - RA-Strace: 1,783
      - RA-Library: 3,928
      - RA-Replayer: 7,760
      - RA-LTTng kernel: 1,135
      - LTTng user-level tools: 1,624
      - Babeltrace2ds: 2,347

  27. Testbed & Benchmarks
      - Testbed:
        - Intel Xeon quad-core (8 hardware threads)
        - 4GB RAM
        - 148 GB SAS 15400 RPM boot drive
        - 200 GB SSD test drive
        - 500 GB SAS 7200 RPM trace drive
      - Micro-benchmarks: FIO, Filebench
      - Macro-benchmarks: LevelDB dbbench (mmap disabled), MySQL sysbench

  28. Evaluation: FIO sequential read
      [Two bar charts of run time (minutes) versus number of threads (1 and 8), each showing elapsed, user, and system time. Left: Vanilla versus RA-LTTng, annotated "Low overhead (18%)". Right: Strace versus RA-Strace, annotated "Tracing overhead including buffer capturing (4×)" and "time difference between 1 and 8 threads (~40%)".]

  29. Evaluation: LevelDB
      [Bar chart of latency (ms/op) for Vanilla, RA-Strace, and RA-LTTng at 1GB, 2GB, 4GB, and 8GB. Annotations: "High overhead (9.7×)", "High overhead (16.7×)", "Overhead is 9%", and "Application becomes I/O bound".]

  30. Evaluation: MySQL
      [Bar chart of counts (log10 scale) of total queries and transactions for Vanilla, Strace, RA-Strace, and RA-LTTng.]

  31. Evaluation: FIO Micro-Benchmark Replayer
      [Four bar charts (Random Read, Sequential Read, Random Write, Sequential Write) comparing Vanilla and Replayed run time (minutes), broken down into elapsed, user, and sys, for 1 and 8 threads.]

  33. Conclusion
      - Tracing and replaying infrastructure
      - Key design ideas:
        - Fidelity
        - Long-running applications
        - Accuracy
        - Minimizing overheads
        - Scalable & verifiable
        - Portable

  34. Future Work
      - Mmap integration
      - Ktrace integration
      - Offline analysis on long-term traces
      - Making babeltrace multithreaded

  35. Q&A

  36. Backup: Related Work

  37. DataSeries Architecture

  38. FIO Random Read: 1, 2, 4, and 8 threads
      [Two bar charts of run time (minutes) versus number of threads (1, 2, 4, 8), each showing elapsed, user, and kernel time. One compares Vanilla with LTTng; the other compares Strace with FSL-Strace.]

  39. Replayer: Random Read
      [Bar chart comparing Vanilla and Replayed run time (minutes), broken down into elapsed, user, and sys, for 1, 2, 4, and 8 threads.]
