Redesigning the GPU Memory Hierarchy for Multi-Application Concurrency

Slide Note

This presentation introduces MASK, a redesign of the GPU memory hierarchy that lets multiple applications share a GPU concurrently. It covers the challenges of GPU sharing with address translation, including high-latency page walks, high TLB contention, and inefficient caching, and notes that address translation is latency-sensitive. It then describes the translation-aware memory hierarchy proposed to address these sources of inefficiency.

Uploaded by quentin on Aug 08, 2024


Presentation Transcript

MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency. Rachata Ausavarungnirun, Vance Miller, Joshua Landgraf, Jayneel Gandhi, Adwait Jog, Christopher J. Rossbach, Saugata Ghose, Onur Mutlu. GPU 2 (Virginia EF), Tuesday 2PM-3PM.

Enabling GPU Sharing with Address Translation: Four GPU cores, shared by App 1 and App 2, issue virtual addresses. Page table walkers translate them using the page table (in main memory), which leads to high-latency page walks.

State-of-the-Art Translation Support in GPUs: Each of the four GPU cores has a private TLB, and all cores share a shared TLB. Behind the TLBs sit the page table walkers and the page table (in main memory), used by both App 1 and App 2. Page walks remain high-latency.
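The lookup order implied by this slide can be sketched in a few lines of Python. This is a minimal illustration, not the hardware design: the class name `TranslationPath`, the 4 KiB page size, the single private TLB, and the dictionary-based page table are all assumptions made for the example.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size in bytes (not stated in the slides)


@dataclass
class TranslationPath:
    """Hypothetical model: one core's private TLB, a shared TLB, a page table."""
    page_table: dict                                  # (app_id, virtual page) -> frame
    private_tlb: dict = field(default_factory=dict)
    shared_tlb: dict = field(default_factory=dict)
    page_walks: int = 0                               # counts the slow path

    def translate(self, app_id, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        key = (app_id, vpn)                           # tag entries by application
        if key in self.private_tlb:                   # fastest: private TLB hit
            frame = self.private_tlb[key]
        elif key in self.shared_tlb:                  # next: shared TLB hit
            frame = self.shared_tlb[key]
            self.private_tlb[key] = frame
        else:                                         # slowest: walk the page table
            self.page_walks += 1
            frame = self.page_table[key]
            self.shared_tlb[key] = frame
            self.private_tlb[key] = frame
        return frame * PAGE_SIZE + offset


path = TranslationPath(page_table={(1, 0): 7, (2, 0): 3})
first = path.translate(1, 100)    # misses both TLBs, triggers a page walk
second = path.translate(1, 200)   # same page, served by the private TLB
other = path.translate(2, 100)    # different application, needs its own walk
```

Tagging each entry with the application identifier keeps App 1 and App 2 from reusing each other's translations, which is why the third lookup triggers a second walk even though the virtual page number is the same.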

Three Sources of Inefficiency in Translation: (1) high TLB contention; (2) inefficient caching (bypass); (3) address translation is latency-sensitive. MASK: A Translation-aware Memory Hierarchy.
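To make the first source concrete, the toy simulation below shows how a shared TLB that serves one application well can thrash when two applications interleave their accesses. It is an illustrative sketch only: the LRU replacement policy, the 4-entry capacity, and the access pattern are assumptions chosen for the example, not measurements or details from the presentation.

```python
from collections import OrderedDict


def hit_rate(accesses, capacity):
    """Hit rate of an LRU-managed TLB (assumed policy) over an access stream."""
    tlb = OrderedDict()
    hits = 0
    for key in accesses:
        if key in tlb:
            hits += 1
            tlb.move_to_end(key)          # mark as most recently used
        else:
            if len(tlb) >= capacity:
                tlb.popitem(last=False)   # evict the least recently used entry
            tlb[key] = True
    return hits / len(accesses)


def app_stream(app_id, pages, rounds):
    """An application that loops over `pages` virtual pages `rounds` times."""
    return [(app_id, page) for _ in range(rounds) for page in range(pages)]


CAPACITY = 4  # hypothetical shared TLB size, in entries

alone = app_stream(1, pages=4, rounds=10)
# Interleave two applications, one access each in turn.
interleaved = [
    access
    for pair in zip(app_stream(1, 4, 10), app_stream(2, 4, 10))
    for access in pair
]

alone_rate = hit_rate(alone, CAPACITY)         # working set fits in the TLB
shared_rate = hit_rate(interleaved, CAPACITY)  # combined working set does not
```

In this contrived case the single application misses only on its first pass, while the interleaved stream cycles through eight entries in a four-entry TLB and never hits. Real workloads are less extreme, but the mechanism is the same.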

Our Solution: MASK, a Translation-aware Memory Hierarchy

Three Components of MASK: (1) TLB-fill Tokens, at the shared TLB, reduce TLB contention. (2) Translation-aware L2 Bypass, applied to translation data at the L2 data cache, improves L2 cache utilization. (3) Address-space-aware Memory Scheduler, applied to translation data at main memory, lowers address translation latency. MASK improves performance by 57.8%.
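The slides name the first component but do not describe how tokens are assigned or used. One simple reading, assumed here purely for illustration, is that an application may insert a new entry into the shared TLB only while it holds fill tokens, so an application without tokens cannot evict other applications' entries. The class `TokenGatedTLB` and its consumable per-application budget are hypothetical and should not be taken as the authors' mechanism.

```python
class TokenGatedTLB:
    """Hypothetical shared TLB whose fills are gated by per-application tokens."""

    def __init__(self, tokens_per_app):
        # app_id -> number of fills still allowed (assumed budget model)
        self.tokens = dict(tokens_per_app)
        self.entries = {}          # (app_id, virtual page) -> frame
        self.skipped_fills = 0     # fills refused for lack of tokens

    def lookup(self, app_id, vpn):
        """Return the cached frame, or None on a miss."""
        return self.entries.get((app_id, vpn))

    def fill(self, app_id, vpn, frame):
        """Insert a translation only if the application still holds a token."""
        if self.tokens.get(app_id, 0) > 0:
            self.tokens[app_id] -= 1
            self.entries[(app_id, vpn)] = frame
            return True
        self.skipped_fills += 1    # translation is used once but not cached
        return False


tlb = TokenGatedTLB({1: 2, 2: 0})   # App 1 may fill twice, App 2 not at all
filled_a = tlb.fill(1, 0, frame=7)
filled_b = tlb.fill(2, 0, frame=3)  # refused: App 2 holds no tokens
```

Under this assumption, lookups are never restricted; only insertions are, which limits how much one application can disturb the entries another application depends on.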

