Accelerating Critical OS Services in Virtualized Systems

The research discusses accelerating critical OS services in virtualized systems using flexible micro-sliced cores. It highlights the challenges of server consolidation, the virtual time discontinuity problem, and the limits of simply shortening the time slice, and proposes dividing CPUs into two pools: a micro-sliced pool that serves critical OS services to minimize waiting time, and a normal pool that handles the main work of applications. Key challenges include precise detection of urgent tasks, guest OS transparency, and dynamic adjustment of the micro-sliced cores.



Presentation Transcript


  1. Accelerating Critical OS Services in Virtualized Systems with Flexible Micro-sliced Cores. Jeongseob Ahn*, Chang Hyun Park, Taekyung Heo, Jaehyuk Huh*

  2. Challenge of Server Consolidation [chart: fraction of time vs. CPU utilization] Consolidation improves system utilization; however, resources are contended.

  3. Challenge of Server Consolidation (cont.) [chart: fraction of time vs. CPU utilization] Consolidation improves system utilization; however, resources are contended.

  4. So, What Can Happen? Virtual time discontinuity. [diagram: vCPU 0 and vCPU 1 time-share pCPU 0; a vCPU alternates between running and waiting across VMEXIT/VMENTER, so its virtual time advances discontinuously relative to physical time]

  5. So, What Can Happen? Virtual time discontinuity: processing time is amplified. [tables: average spinlock waiting time for gmake grows from 1.03 μs (solo) to 420.13 μs (co-run*); TLB synchronization latency (μs) for dedup and vips rises sharply from solo to co-run*; waiting times (μs) in kernel components (page reclaim, page allocator, dentry, runqueue) grow by orders of magnitude under co-run*; for iPerf I/O, jitter grows from 0.0043 ms to 9.2507 ms and throughput drops from 936.3 to 435.6 Mbits/sec] * Concurrently running with Swaptions of PARSEC

  6. How about Shortening Time Slice? [diagram: vCPU 0, 1, and 2 time-share pCPU 0; a shorter time slice reduces each vCPU's waiting time] Waiting time = (# of shared vCPUs - 1) x time slice. Shortening the time slice is very simple and powerful, but the overhead of frequent context switches is significant.
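The waiting-time relation on this slide can be checked with a quick calculation. This is a sketch; the helper name is invented, and the 30 ms and 0.1 ms slice values are the Xen default and micro-sliced settings given on the experimental-setup slide.

```python
# Worst-case waiting time for a vCPU when N vCPUs time-share one pCPU:
#   waiting_time = (N - 1) * time_slice
def waiting_time_ms(num_shared_vcpus, time_slice_ms):
    return (num_shared_vcpus - 1) * time_slice_ms

# With Xen's default 30 ms slice, 3 vCPUs sharing a pCPU wait up to 60 ms;
# a 0.1 ms micro-slice cuts that to 0.2 ms.
print(waiting_time_ms(3, 30))   # 60
print(waiting_time_ms(3, 0.1))  # 0.2
```

This also makes the trade-off visible: the reduction is linear in the slice length, but so is the growth in context-switch frequency.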

  7. Approach: Dividing CPUs into Two Pools [diagram: pCPU 0 in the normal pool and pCPU 3 in the micro-sliced pool, each time-sharing vCPUs 0, 1, and 2] Micro-sliced pool: schedules vCPUs quickly but briefly with a shortened time slice, serving critical OS services to minimize their waiting time. Normal pool: serves the main work of applications.

  8. Challenges in Serving Critical OS Services on Micro-sliced Cores 1. Precise detection of urgent tasks 2. Guest OS transparency 3. Dynamic adjustment of micro-sliced cores

  9. Detecting Critical OS Services Examining the instruction pointer (a.k.a. PC, e.g., 0x8106ed62) whenever a vCPU yields its pCPU. [table: # of yields, solo vs. co-run*: gmake 79,440 vs. 295,262,662; exim 157,023 vs. 24,102,495; dedup 290,406 vs. 164,578,839; vips 644,643 vs. 57,650,538] * Concurrently running with Swaptions of PARSEC

  10. Profiling Virtual CPU Scheduling Logs Investigating frequently preempted regions by combining the vCPU scheduling trace (with instruction pointers) and kernel symbol tables. Critical guest OS components, by module and operation: sched: scheduler_ipi(), resched_curr(); mm: flush_tlb_all(), get_page_from_freelist(); irq: irq_enter(), irq_exit(); spinlock: __raw_spin_unlock(), __raw_spin_unlock_irq(). Instruction pointers and kernel symbols enable precise detection of vCPUs preempted while executing critical OS services, without guest OS modification. The detailed table is in the paper.
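The detection idea above can be sketched in a few lines: resolve the preempted vCPU's instruction pointer to the enclosing kernel symbol and check it against the critical-operation table. The symbol addresses below are made up for illustration; a real system would read them from the guest's kernel symbol table (e.g., System.map or kallsyms).

```python
import bisect

# Hypothetical (address, symbol) pairs, sorted by address, in the style of a
# guest kernel symbol table. Real addresses would come from the guest image.
SYMBOLS = [
    (0x8106E000, "scheduler_ipi"),
    (0x8106ED00, "__raw_spin_unlock"),
    (0x8106F000, "flush_tlb_all"),
    (0x81070000, "get_page_from_freelist"),
]
ADDRS = [addr for addr, _ in SYMBOLS]

# Critical guest OS operations identified by profiling (from the table above).
CRITICAL = {"scheduler_ipi", "resched_curr", "flush_tlb_all",
            "get_page_from_freelist", "irq_enter", "irq_exit",
            "__raw_spin_unlock", "__raw_spin_unlock_irq"}

def symbol_for(ip):
    """Map an instruction pointer to the enclosing kernel symbol."""
    i = bisect.bisect_right(ADDRS, ip) - 1
    return SYMBOLS[i][1] if i >= 0 else None

def is_critical(ip):
    """True if the vCPU was preempted inside a critical OS service."""
    return symbol_for(ip) in CRITICAL

print(is_critical(0x8106ED62))  # preempted inside __raw_spin_unlock -> True
```

Note the check needs no guest cooperation at run time: the hypervisor already sees the instruction pointer on every yield, and the symbol table is read once, offline.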

  11. Accelerating Critical Sections [diagram: pCPUs P0 to P3] When a yield occurs, investigate the preempted vCPUs and schedule the selected vCPU on the micro-sliced pool.
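The yield-handling flow on this slide might be sketched as follows. This is a toy model, not Xen code; the pCPU and vCPU structures, field names, and the criticality stub are invented for illustration.

```python
# Toy model of the yield path: on a yield, the preempted vCPUs on that pCPU
# are examined, and any vCPU preempted inside a critical OS service is moved
# to the micro-sliced pool, where it is scheduled quickly but briefly.

def is_critical(vcpu):
    # Stand-in for the instruction-pointer/kernel-symbol check described
    # on the previous slides.
    return vcpu.get("preempted_in_critical_service", False)

def on_yield(pcpu, micro_sliced_pool):
    for vcpu in list(pcpu["runqueue"]):   # copy: we mutate the runqueue
        if vcpu["state"] == "preempted" and is_critical(vcpu):
            pcpu["runqueue"].remove(vcpu)
            micro_sliced_pool.append(vcpu)

pcpu0 = {"runqueue": [
    {"id": 0, "state": "preempted", "preempted_in_critical_service": True},
    {"id": 1, "state": "running"},
]}
micro = []
on_yield(pcpu0, micro)
print([v["id"] for v in micro])  # [0]
```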

  12. Accelerating Critical TLB Synchronizations [diagram: pCPUs P0 to P3] When a yield occurs, investigate the preempted vCPUs, schedule the selected vCPU on the micro-sliced pool, and dynamically adjust the micro-sliced cores based on profiling.
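The dynamic adjustment mentioned on this slide might look like a simple feedback loop over the profiled yield rate. The thresholds and step size below are made-up illustration values; the actual policy is described in the paper.

```python
# Sketch of profiling-driven sizing of the micro-sliced pool: grow it when
# vCPUs are frequently preempted inside critical services, shrink it (giving
# cores back to the normal pool) when they are not. Threshold values are
# hypothetical, not from the paper.
HIGH_YIELDS_PER_SEC = 100_000
LOW_YIELDS_PER_SEC = 1_000

def adjust_pool(micro_cores, yields_per_sec, total_cores):
    if yields_per_sec > HIGH_YIELDS_PER_SEC and micro_cores < total_cores - 1:
        return micro_cores + 1   # grow the micro-sliced pool
    if yields_per_sec < LOW_YIELDS_PER_SEC and micro_cores > 0:
        return micro_cores - 1   # shrink it
    return micro_cores

print(adjust_pool(1, 250_000, 12))  # 2
print(adjust_pool(1, 500, 12))      # 0
```

Shrinking to zero micro-sliced cores is what keeps the overhead low for applications that rarely use OS services, as the conclusions note.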

  13. Detecting Critical I/O Events [diagram: pIRQ, vIRQ, vIPI] I/O handling consists of a chain of operations (physical IRQ, then virtual IRQ, then virtual IPI) involving potentially multiple vCPUs.
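A toy model of the chain idea: since I/O handling can span several vCPUs, every vCPU touched by the chain is treated as urgent so the whole chain completes quickly. The event representation and function below are invented for illustration, not hypervisor code.

```python
# An I/O handling chain: a physical interrupt (pIRQ) is injected as a
# virtual interrupt (vIRQ) into one vCPU, which may send a virtual IPI
# (vIPI) to another vCPU to finish the work.
def vcpus_on_io_chain(events):
    """Collect, in order, the vCPUs touched by one I/O handling chain."""
    urgent = []
    for ev in events:
        if ev["type"] in ("vIRQ", "vIPI") and ev["vcpu"] not in urgent:
            urgent.append(ev["vcpu"])
    return urgent

chain = [
    {"type": "pIRQ", "vcpu": None},  # physical interrupt, hypervisor level
    {"type": "vIRQ", "vcpu": 0},     # injected into vCPU 0
    {"type": "vIPI", "vcpu": 3},     # vCPU 0 kicks vCPU 3
]
print(vcpus_on_io_chain(chain))  # [0, 3]
```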

  14. Experimental Environments Testbed: 12 HW threads (Intel Xeon); 2 VMs with 12 vCPUs each, a 2-to-1 consolidation ratio; Xen hypervisor 4.7. Benchmarking workloads: dedup and vips from PARSEC; exim and gmake from MOSBENCH. Pool configuration: normal pool 30 ms time slice (Xen default); micro-sliced pool 0.1 ms.

  15. Performance of Micro-sliced Cores [Our schemes] [chart: normalized execution time under baseline, static-ideal, and dynamic for dedup, vips, and gmake (VM-1), each co-running with swaptions (VM-2)]

  16. Performance of Micro-sliced Cores [Our schemes] (cont.) [charts: normalized execution time under baseline, static-ideal, and dynamic for dedup, vips, and gmake (VM-1), each co-running with swaptions (VM-2); for exim co-running with swaptions, throughput improvement and # of yield exceptions under baseline, static-ideal, and dynamic, with the dynamic scheme within an 8% gap of static-ideal]

  17. I/O Performance [charts: iPerf TCP and UDP bandwidth (Mbps) and jitter (ms), baseline vs. micro-sliced; workloads: iPerf with lookbusy on VM-1, lookbusy on VM-2]

  18. Conclusions We introduced a new approach to mitigating the virtual time discontinuity problem, with three distinct contributions: precise detection of urgent tasks, guest OS transparency, and dynamic adjustment of the micro-sliced cores. Overhead is very low for applications that do not frequently use OS services.

  19. Thank You! jsahn@ajou.ac.kr Accelerating Critical OS Services in Virtualized Systems with Flexible Micro-sliced Cores. Jeongseob Ahn*, Chang Hyun Park, Taekyung Heo, Jaehyuk Huh*
