Re-Animator: Versatile High-Fidelity Storage System Tracing and Replaying


Re-Animator is a system for capturing and replaying system calls that aims to benchmark storage systems, analyze application characteristics, and reproduce bugs. It addresses challenges in capturing accurate information, data buffers, overheads, replay tools, trace formats, and offline analysis. With design goals focusing on benchmarking, correctness, minimizing interference, scalability, verifiability, and portability, Re-Animator offers ease of use, extensibility, and integration with other tracing tools.



Presentation Transcript


  1. Re-Animator: Versatile High-Fidelity Storage-System Tracing and Replaying. 13th ACM International Systems and Storage Conference (SYSTOR 2020), October 13, 2020. Ibrahim Umit Akgun (Stony Brook University), Geoff Kuenning (Harvey Mudd College), Erez Zadok (Stony Brook University).

  2. Motivation (1 of 2). Why do we need system-call tracing and replaying? Benchmarking applications under different HW/SW configurations; revealing hidden patterns during application execution; finding performance bottlenecks; analyzing application characteristics (e.g., system-call use, security vulnerabilities); reproducing bugs.

  3. Motivation (2 of 2). What are the problems with system-call tracing and replaying systems? Accurate information is hard to capture and verify; data (raw data buffers) is not captured, e.g., for system calls like read, write, stat, and getdents; overheads are high; there is no scalable and versatile replay tool, no portable trace format, and no offline analysis tools.

  4. Design Goals (1 of 2). Capturing and replaying system calls for storage-system benchmarking: replaying, timing, logical file-system state correctness. Low overhead and fidelity: adding minimum interference, blocking mode, memory-mapped files.

  5. Design Goals (2 of 2). Scalability: designed to capture long-running applications and to replay large system-call traces; tested with traces of hundreds of gigabytes. Verifiability: return values, buffer contents (including mmap), logical file-system state verification. Portability: DataSeries, which supports versatile tools for analysis and conversion. Ease of use and extensibility: Re-Animator can be integrated with any other tracing tool and is easy to extend to support unimplemented system calls.

  6. Fidelity: RA-LTTng (Re-Animator Tracer). [Architecture diagram with numbered steps 1-7. User space: Application, LTTng session daemon (configuration: pgid, #sub-buffers), LTTng consumer daemon, sub-buffers. Kernel: Linux kernel system-call tracepoints, LTTng kernel modules, Re-Animator kernel module integration. Flow labels: "record-id, syscall args"; "record-id, buffer_content"; "async"; "allocates". Outputs: CTF (Linux Common Trace Format) and captured buffers.]

  7. Low Overhead: RA-LTTng (Re-Animator Tracer). [Architecture diagram with numbered steps 1-7. User space: Application, LTTng session daemon (configuration), LTTng consumer daemon, sub-buffers. Kernel: Linux kernel system-call tracepoints, LTTng kernel modules, Re-Animator kernel module integration. Outputs: CTF and captured buffers.]

  8. Low Overhead: RA-LTTng (Re-Animator Tracer). [Architecture diagram. User space: Application, LTTng session daemon (configuration), LTTng consumer daemon, sub-buffers. Kernel: Linux kernel system-call tracepoints, LTTng kernel modules, Re-Animator kernel module integration, with a zoomed-in view labeled "workqueue". Outputs: CTF and captured buffers.]

  9. Verifiable & Scalable Replayer. Verifiable: checks return values and buffer contents; limitations: atime, mtime, ctime, and procfs/sysfs entries (e.g., /sys/block/sda/sda1/stat); FD table consistency check. Scalable: supports multiple processes; user-space FD tables; concurrent lock-free design.

  10. Evaluation - LevelDB. [Figure: bar chart, y-axis ms/op (0 to 350). Annotations: "Overhead is 9%", "High overhead", "Application becomes I/O bound"; labeled values 9.7 and 10.5.]

  11. Evaluation - Blocking mode. [Figure: bar chart of user, system, and elapsed time in seconds, with a data table.] 1MB: overhead 1.72x, 26788 yields, user 80 s, system 91 s, elapsed 169 s. 2MB: overhead 1.22x, 3794 yields, user 68 s, system 65 s, elapsed 115 s. 4MB: overhead 1.00x, 53 yields, user 66 s, system 47 s, elapsed 94 s.

  12. Evaluation - HDD vs. NVMe. [Figure: two bar-chart panels, "mmap enabled" and "mmap disabled"; y-axis Time (minutes); configurations vanilla, hdd-data, hdd-nodata, nvme-data, nvme-nodata; bar labels range from 12.8 to 31.8 minutes.]

  13. Conclusion. Tracing and replaying infrastructure. Key design ideas: fidelity; long-running applications; accuracy; minimizing overheads; scalable and verifiable; portable.

  14. Re-Animator: Versatile High-Fidelity Storage-System Tracing and Replaying. Q&A. 13th ACM International Systems and Storage Conference (SYSTOR 2020). Ibrahim Umit Akgun (Stony Brook University), Geoff Kuenning (Harvey Mudd College), Erez Zadok (Stony Brook University). Source code: https://github.com/sbu-fsl/fsl-lttng, https://github.com/sbu-fsl/fsl-strace

  15. Portable: DataSeries. Versatile. Ease of use. Includes a set of useful DS tools. Captures as many system calls as possible. Enables other research opportunities. Open source. Major revision to the SNIA system-call format design document.

  16. Testbed & Benchmarks. Testbed: Intel Xeon quad-core (8 hardware threads); 4 GB RAM (configured in total 24 GB); 148 GB SAS 15400 RPM boot drive; 200 GB SSD test drive; 500 GB SAS 7200 RPM trace drive; MZ1LV960HCJH-000MU 960 GB M.2 NVMe. Micro-benchmarks: FIO, Filebench. Macro-benchmarks: LevelDB dbbench, MySQL sysbench.

  17. Evaluation - FIO sequential read. [Figure: four bar-chart panels of run time (minutes) against number of threads (1 and 8), showing elapsed, user, and system time for Vanilla, RA-LTTng, Strace, and RA-Strace. Annotations: "Low overhead (18%)"; "Tracing overhead including buffer capturing (4 )"; "time difference between 1 and 8 threads (~40%)".]

  18. Evaluation - MySQL. [Figure: bar chart of total queries and transactions, y-axis Count (log10), for Vanilla, Strace, RA-Strace, and RA-LTTng; bar labels range from 106 to 38.58M; annotations "-1.8", "-22E3", "-3E3".]
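
The replayer checks listed on slide 9 (return values, buffer contents, and a user-space FD table) can be illustrated with a minimal Python sketch. This is not Re-Animator's implementation: the `Record` layout, the `replay` and `demo_trace` names, and the four supported calls are assumptions made for illustration, and the real replayer works on DataSeries traces and many more system calls.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    """One traced system call (hypothetical, simplified layout)."""
    name: str                   # "open", "write", "read", or "close"
    retval: int                 # return value recorded at trace time
    traced_fd: int = -1         # fd argument seen at trace time
    path: Optional[str] = None  # open only
    flags: int = 0              # open only
    count: int = 0              # read only: requested byte count
    data: bytes = b""           # buffer captured for read/write


def replay(records: List[Record]) -> List[str]:
    """Replay records in order and return a list of verification mismatches."""
    fd_table: Dict[int, int] = {}  # traced fd -> fd in the replaying process
    errors: List[str] = []
    for i, rec in enumerate(records):
        if rec.name == "open":
            # The replayed fd number may differ from the traced one,
            # so remember the mapping instead of comparing numbers.
            fd_table[rec.retval] = os.open(rec.path, rec.flags, 0o600)
            continue
        fd = fd_table.get(rec.traced_fd)
        if fd is None:
            errors.append(f"record {i}: fd {rec.traced_fd} not in FD table")
            continue
        if rec.name == "write":
            ret = os.write(fd, rec.data)
        elif rec.name == "read":
            buf = os.read(fd, rec.count)
            ret = len(buf)
            if buf != rec.data:
                errors.append(f"record {i}: buffer contents differ")
        elif rec.name == "close":
            os.close(fd)
            del fd_table[rec.traced_fd]
            ret = 0
        else:
            errors.append(f"record {i}: unsupported call {rec.name}")
            continue
        if ret != rec.retval:
            errors.append(f"record {i}: return value {ret} != traced {rec.retval}")
    if fd_table:
        errors.append("FD table not empty at end of trace")
    return errors


def demo_trace(path: str) -> List[Record]:
    """A tiny hand-written trace: write a file, then read it back."""
    wflags = os.O_CREAT | os.O_WRONLY | os.O_TRUNC
    return [
        Record("open", retval=3, path=path, flags=wflags),
        Record("write", retval=5, traced_fd=3, data=b"hello"),
        Record("close", retval=0, traced_fd=3),
        Record("open", retval=3, path=path, flags=os.O_RDONLY),
        Record("read", retval=5, traced_fd=3, count=5, data=b"hello"),
        Record("close", retval=0, traced_fd=3),
    ]


def run_demo() -> List[str]:
    """Replay the demo trace in a fresh temporary directory."""
    with tempfile.TemporaryDirectory() as d:
        return replay(demo_trace(os.path.join(d, "f")))
```

Calling `run_demo()` replays the six records and returns the mismatches it found. Keeping the traced-to-replayed fd mapping in user space is what lets the sketch tolerate the replaying process receiving different descriptor numbers than the traced application did.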
