Dynamic Core Boosting for Heterogeneous Computing
This presentation explores the challenges of workload heterogeneity in parallel programming, focusing on how asymmetric hardware affects performance and synchronization. It covers modeling workload imbalance and boosting critical paths for efficient computation on heterogeneous multicores.
Uploaded on Sep 18, 2024
Presentation Transcript
Embracing Heterogeneity with Dynamic Core Boosting. Hyoun Kyu Cho and Scott Mahlke, University of Michigan, Electrical Engineering and Computer Science. May 20, 2014.
Parallel Programming [diagram: a workload distributed across Core1, Core2, Core3, and Core4]
Workload Imbalance Among Threads: asymmetric S/W (control flow divergence, non-deterministic memory latencies, synchronization operations); asymmetric H/W (heterogeneous multicores, core-to-core process variation)
Performance Impact of Asymmetric H/W: symmetric 8 cores vs. 8 cores w/ variations
CPU Time Wasted for Synchronization: homogeneous vs. heterogeneous
Thread Criticality due to Workload Imbalance [timeline figure: threads T1 to T5, barrier, idle time]
Accelerating Critical Path w/ Core Boosting [timeline figure: threads T1 to T5, barrier, idle time]
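The effect of boosting the critical thread can be illustrated with a small calculation. The following is a minimal sketch, not the authors' model: it assumes each thread has a fixed amount of work and a constant speed, and the names barrier_time and barrier_idle are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Time until every thread has reached the barrier: the slowest
 * (critical) thread decides. Assumes work[i] / speed[i] is the
 * time thread i needs; this model is an illustration only. */
double barrier_time(const double *work, const double *speed, size_t n)
{
    assert(n > 0);
    double t_max = 0.0;
    for (size_t i = 0; i < n; i++) {
        double t = work[i] / speed[i];
        if (t > t_max)
            t_max = t;
    }
    return t_max;
}

/* Total CPU time that threads spend waiting idle at the barrier. */
double barrier_idle(const double *work, const double *speed, size_t n)
{
    double t_max = barrier_time(work, speed, n);
    double idle = 0.0;
    for (size_t i = 0; i < n; i++)
        idle += t_max - work[i] / speed[i];
    return idle;
}
```

With five threads where T1 has twice the work of the others, raising only T1's speed to 1.5x shortens the barrier interval and also cuts the idle time of the other four threads, which is the intuition behind boosting the critical path.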
Modeling Workload Imbalance & Boosting
Boosting Assignment: data parallel programs (parallel workers); pipeline parallel programs (Stage1 through Stage4)
Boosting Data Parallel Programs: greedy scheduling
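The slide names greedy scheduling but does not spell out the selection rule. One plausible reading, shown here purely as an assumption, is to boost whichever unfinished thread has reported the least progress toward the next barrier. The function pick_boost_target and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical greedy rule: boost the unfinished thread that has
 * reported the least progress. Returns n when every thread has
 * already reached max_progress (nothing left to boost). */
size_t pick_boost_target(const unsigned *progress, size_t n,
                         unsigned max_progress)
{
    assert(progress != NULL || n == 0);
    size_t target = n;
    for (size_t i = 0; i < n; i++) {
        if (progress[i] >= max_progress)
            continue;                 /* already waiting at the barrier */
        if (target == n || progress[i] < progress[target])
            target = i;               /* ties keep the lowest index */
    }
    return target;
}
```

A runtime could call such a function each time a thread reports progress and move the boost to the returned thread.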
Boosting Pipeline Parallel Programs: epoch-based scheduling; monitors CPU utilization with H/W performance counters; assigns boosting budget at the end of each epoch
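The slide does not say how the budget is divided among pipeline stages. As a rough sketch under that caveat, the function below splits a fixed number of boost slices in proportion to each stage's busy cycles from the previous epoch. The proportional policy, the name assign_boost_budget, and the slice unit are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical end-of-epoch policy: split `total` boost slices among
 * n pipeline stages in proportion to the busy cycles each stage
 * accumulated during the epoch (e.g. read from performance counters). */
void assign_boost_budget(const unsigned long *busy, size_t n,
                         unsigned total, unsigned *out)
{
    assert(n > 0);
    unsigned long long sum = 0;
    size_t busiest = 0;
    for (size_t i = 0; i < n; i++) {
        sum += busy[i];
        if (busy[i] > busy[busiest])
            busiest = i;
    }
    unsigned given = 0;
    for (size_t i = 0; i < n; i++) {
        /* Proportional share, rounded down; an idle pipeline splits evenly. */
        out[i] = sum ? (unsigned)((unsigned long long)busy[i] * total / sum)
                     : total / (unsigned)n;
        given += out[i];
    }
    out[busiest] += total - given;    /* leftover slices go to the busiest stage */
}
```

Giving the rounding leftover to the busiest stage keeps the whole budget in use while favoring the likely bottleneck.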
Dynamic Core Boosting
Progress Monitoring Example:
    pthread_barrier_wait(barrier);
    period = calc_period_LID_007(start, end);
    for (i = start; i < end; i++) {
        compute();
        if (side_exit) {
            /* Early exit: report the loop as complete. */
            SET_PROGRESS_TO(MAX_PROGRESS_007);
            break;
        }
        /* Report one progress step every `period` iterations. */
        if (((end - i) % period) == 0)
            PROGRESS_STEP_FORWARD;
    }
    pthread_barrier_wait(barrier);
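The instrumentation macros on this slide are not defined in the transcript. To make the counting logic concrete, here is a self-contained sketch with hypothetical stand-ins: MAX_PROGRESS, calc_period, and run_monitored_loop are assumed names, a plain counter replaces the progress-reporting macros, and the clamp at MAX_PROGRESS is an added safeguard rather than something shown on the slide.

```c
#include <assert.h>

#define MAX_PROGRESS 10   /* hypothetical number of progress steps per loop */

/* Hypothetical stand-in for calc_period_LID_007: iterations per step. */
static int calc_period(int start, int end)
{
    int period = (end - start) / MAX_PROGRESS;
    return period > 0 ? period : 1;
}

/* Runs the instrumented loop and returns the progress reported by the
 * time the thread reaches the barrier. An exit_at outside [start, end)
 * means no side exit is taken. */
int run_monitored_loop(int start, int end, int exit_at)
{
    assert(start <= end);
    int progress = 0;
    int period = calc_period(start, end);
    for (int i = start; i < end; i++) {
        /* compute() would run here */
        if (i == exit_at) {
            progress = MAX_PROGRESS;  /* SET_PROGRESS_TO(MAX_PROGRESS) */
            break;
        }
        if (((end - i) % period) == 0 && progress < MAX_PROGRESS)
            progress++;               /* PROGRESS_STEP_FORWARD */
    }
    return progress;
}
```

Under these assumptions, a loop that takes a side exit reports the same final progress as one that runs to completion, so the runtime does not keep treating an already finished thread as lagging.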
Evaluation Methodology: asymmetry emulation with dynamic binary translation (slows cores down proportionally instead of accelerating them); 8 cores with frequency variation; 1 core boosted, boosting rate = 1.5x; compares Heterogeneous, Reactive, and DCB
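The slide says the emulation slows cores down proportionally rather than speeding any core up. One way to read this, offered only as an assumption about the arithmetic and not as the authors' implementation, is to treat the fastest possible configuration (the fastest core, boosted) as native speed and slow every other configuration relative to it. The function emulation_slowdown and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical emulation arithmetic: compute the slowdown factor for
 * each core so that the fastest base core running boosted corresponds
 * to native hardware speed. Pass boosted_core == n for "no core boosted". */
void emulation_slowdown(const double *base_speed, size_t n,
                        size_t boosted_core, double boost_rate,
                        double *slowdown_out)
{
    assert(n > 0 && boost_rate >= 1.0);
    double max_base = base_speed[0];
    for (size_t i = 1; i < n; i++)
        if (base_speed[i] > max_base)
            max_base = base_speed[i];
    double reference = max_base * boost_rate;   /* emulated as native speed */
    for (size_t i = 0; i < n; i++) {
        double speed = base_speed[i] * (i == boosted_core ? boost_rate : 1.0);
        slowdown_out[i] = reference / speed;
    }
}
```

With a boosting rate of 1.5x, the boosted fastest core runs with a slowdown of 1.0, while an unboosted core of the same base speed is slowed by 1.5.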
Performance Improvement [chart: normalized execution time for Heterogeneous, Reactive, and DCB]
Synchronization Overheads [chart: relative CPU time for Heterogeneous, Reactive, and DCB]
Thread Arrival Time
Conclusion: DCB mitigates workload imbalance in performance-asymmetric CMPs by accelerating critical threads and by coordinating compiler, runtime, and architecture for near-optimal assignment. Overall, it improves performance by 33%, outperforming a reactive boosting scheme by 10%.
Thank you!
Core Boosting with Frequency Scaling: transition time < 10 ns [Dreslinski'12]
Asymmetry Emulation with DBT
Evaluation Platform Accuracy [chart: relative error]