Dynamic Core Boosting for Heterogeneous Computing


This presentation explores the challenges of workload heterogeneity in parallel programming, focusing on how asymmetric hardware affects performance and synchronization. It covers modeling workload imbalance and boosting critical paths for efficient computation on heterogeneous multicores.


Uploaded on Sep 18, 2024



Presentation Transcript


  1. Embracing Heterogeneity with Dynamic Core Boosting. Hyoun Kyu Cho and Scott Mahlke, University of Michigan, May 20, 2014. University of Michigan Electrical Engineering and Computer Science

  2. Parallel Programming. [Figure: a workload and four cores, Core1 to Core4]

  3. Workload Imbalance Among Threads. Asymmetric S/W: control flow divergence; non-deterministic memory latencies; synchronization operations. Asymmetric H/W: heterogeneous multicores; core-to-core process variation.

  4. Performance Impact of Asymmetric H/W. [Figure: symmetric 8 cores vs. 8 cores with variations]

  5. CPU Time Wasted for Synchronization. [Figure: homogeneous vs. heterogeneous]

  6. Thread Criticality due to Workload Imbalance. [Figure: timelines of threads T1 to T5, showing idle time at a barrier]

  7. Accelerating Critical Path w/ Core Boosting. [Figure: timelines of threads T1 to T5, showing the barrier and idle time]
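The barrier timelines on slides 6 and 7 can be illustrated with a small model. This is a minimal sketch, not the authors' model from slide 8 (which the transcript does not preserve): it assumes each thread's time to reach the barrier is known in advance, and that boosting a thread for the whole interval divides its time by the boosting rate. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Time at which the slowest thread reaches the barrier. */
double barrier_time(const double *t, size_t n) {
    assert(n > 0);
    double max = t[0];
    for (size_t i = 1; i < n; i++)
        if (t[i] > max) max = t[i];
    return max;
}

/* Total time all threads spend idle while waiting at the barrier. */
double total_idle(const double *t, size_t n) {
    double end = barrier_time(t, n);
    double idle = 0.0;
    for (size_t i = 0; i < n; i++)
        idle += end - t[i];
    return idle;
}

/* Index of the critical thread, i.e. the last one to arrive. */
size_t critical_thread(const double *t, size_t n) {
    assert(n > 0);
    size_t c = 0;
    for (size_t i = 1; i < n; i++)
        if (t[i] > t[c]) c = i;
    return c;
}

/* Assumption: boosting for the whole interval scales the critical
   thread's arrival time by 1/rate. Returns the boosted thread's index. */
size_t boost_critical(double *t, size_t n, double rate) {
    assert(rate >= 1.0);
    size_t c = critical_thread(t, n);
    t[c] /= rate;
    return c;
}
```

With arrival times of 6, 8, 12, 7, and 9 time units, the barrier releases at 12 and the threads idle for 18 units in total. Boosting the critical thread at 1.5x brings it to 8, so the barrier releases at 9 (another thread is now critical) and total idle time drops to 7.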

  8. Modeling Workload Imbalance & Boosting

  9. Boosting Assignment. Data parallel programs [Figure: five workers]. Pipeline parallel programs [Figure: Stage1 to Stage4].

  10. Boosting Data Parallel Programs: greedy scheduling.
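The slide names greedy scheduling without giving details, so the following is only one plausible reading, sketched under stated assumptions: at every scheduling point the single boost goes to the unfinished thread that would take longest to reach the barrier at its base speed. The unit-time-step simulation, the `remaining` and `speed` arrays, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NO_THREAD ((size_t)-1)

/* Greedy choice: among threads that still have work, pick the one with
   the largest estimated time to the barrier (remaining work / speed).
   Ties go to the lowest index. Returns NO_THREAD when all are done. */
size_t pick_boost_target(const double *remaining, const double *speed, size_t n) {
    size_t best = NO_THREAD;
    double worst_eta = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (remaining[i] <= 0.0) continue;
        assert(speed[i] > 0.0);
        double eta = remaining[i] / speed[i];
        if (best == NO_THREAD || eta > worst_eta) {
            best = i;
            worst_eta = eta;
        }
    }
    return best;
}

/* Simulate one barrier interval in unit time steps. In each step the
   chosen thread runs at speed * boost_rate; the others run at base speed.
   Returns the number of steps until every thread has arrived. */
unsigned simulate_interval(double *remaining, const double *speed,
                           size_t n, double boost_rate) {
    unsigned steps = 0;
    for (;;) {
        size_t target = pick_boost_target(remaining, speed, n);
        if (target == NO_THREAD) return steps;
        for (size_t i = 0; i < n; i++) {
            if (remaining[i] <= 0.0) continue;
            double rate = (i == target) ? speed[i] * boost_rate : speed[i];
            remaining[i] -= rate;
        }
        steps++;
    }
}
```

For four equal-speed threads with 4, 4, 8, and 4 units of work, the interval takes 8 steps without boosting (`boost_rate` of 1.0) and 6 steps with a 1.5x boost, because the boost stays on the lagging thread until it is the only one left.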

  11. Boosting Pipeline Parallel Programs: epoch-based scheduling. Monitors CPU utilization with a H/W performance counter; assigns the boosting budget at the end of each epoch.
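The epoch-based scheme can be sketched as follows. The slide does not say how the budget is divided, so this example assumes the simplest policy: the whole boosting budget for the next epoch goes to the pipeline stage with the highest CPU utilization in the epoch just ended. The `stage_counters` struct stands in for hardware performance counter readings; it and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-stage counters sampled over one epoch (stand-ins for H/W counters). */
typedef struct {
    uint64_t busy_cycles;   /* cycles the stage spent doing useful work */
    uint64_t total_cycles;  /* all cycles elapsed in the epoch */
} stage_counters;

/* Fraction of the epoch the stage was busy; 0 if nothing was sampled. */
double utilization(const stage_counters *c) {
    if (c->total_cycles == 0) return 0.0;
    return (double)c->busy_cycles / (double)c->total_cycles;
}

/* End-of-epoch decision: return the stage to boost during the next epoch
   (assumed policy: the busiest stage), then reset the counters so the
   next epoch is measured from scratch. */
size_t end_of_epoch(stage_counters *stages, size_t n) {
    assert(n > 0);
    size_t busiest = 0;
    double best = utilization(&stages[0]);
    for (size_t i = 1; i < n; i++) {
        double u = utilization(&stages[i]);
        if (u > best) {
            best = u;
            busiest = i;
        }
    }
    for (size_t i = 0; i < n; i++) {
        stages[i].busy_cycles = 0;
        stages[i].total_cycles = 0;
    }
    return busiest;
}
```

The reasoning behind this policy is that in a pipeline the bottleneck stage is rarely starved for input, so it shows the highest utilization while the other stages spend part of the epoch waiting on their queues.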

  12. Dynamic Core Boosting

  13. Progress Monitoring Example

          pthread_barrier_wait(barrier);
          period = calc_period_LID_007(start, end);
          for (i = start; i < end; i++) {
              compute();
              if (side_exit) {
                  /* loop ends early: report maximum progress */
                  SET_PROGRESS_TO(MAX_PROGRESS_007);
                  break;
              }
              /* report progress once every `period` iterations */
              if (((end - i) % period) == 0)
                  PROGRESS_STEP_FORWARD;
          }
          pthread_barrier_wait(barrier);

  14. Evaluation Methodology. Asymmetry emulation with Dynamic Binary Translation: slow down proportionally instead of accelerating. 8 cores with frequency variation; 1 core boosted, boosting rate = 1.5x. Compares Heterogeneous, Reactive, and DCB.

  15. Performance Improvement. [Figure: normalized execution time for Heterogeneous, Reactive, and DCB; y-axis from 0.5 to 1.0]

  16. Synchronization Overheads. [Figure: relative CPU time for Heterogeneous, Reactive, and DCB; y-axis from 0% to 80%]

  17. Thread Arrival Time

  18. Conclusion. DCB mitigates workload imbalance in performance-asymmetric CMPs: accelerating critical threads; coordinating compiler, runtime, and architecture for near-optimal assignment. Overall, it improves performance by 33%, outperforming a reactive boosting scheme by 10%.

  19. Thank you!

  20. Core Boosting with Frequency Scaling. Transition time < 10 ns [Dreslinski'12]

  21. Asymmetry Emulation with DBT

  22. Evaluation Platform Accuracy. [Figure: relative error; y-axis from 0% to 12%]
