TOM: Enabling Programmer-Transparent Near-Data Processing in GPU Systems


This paper discusses Transparent Offloading and Mapping (TOM) for enabling programmer-transparent near-data processing in GPU systems. It addresses the opportunity of processing data directly in 3D-stacked memories, the challenges involved, and introduces a new mechanism for identifying and deciding what code portions to offload. The approach includes a compiler that identifies code portions to potentially offload based on memory profiles.


Uploaded on Sep 26, 2024



Presentation Transcript


  1. Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems. Kevin Hsieh, Eiman Ebrahimi, Gwangsun Kim, Niladrish Chatterjee, Mike O'Connor, Nandita Vijaykumar, Onur Mutlu, Stephen W. Keckler

  2. Opportunity. [Diagram labels: main GPU, SM (streaming multiprocessor), 3D-stacked memory (memory stack), logic layer, logic layer SM, crossbar switch, vault controllers.] Processing data directly in 3D-stacked memories is a promising direction.

  3. The Problem. [Diagram: main GPU and 3D-stacked memory system, as on slide 2.] However, it requires significant programmer effort.

  4. Key Challenge 1. [Diagram: main GPU and 3D-stacked memory system, as on slide 2.]

  5. Key Challenge 1. Challenge 1: Which operations should be executed on the logic layer SMs? [Diagram: main GPU and 3D-stacked memory system, as on slide 2.]

  6. Key Challenge 2. Challenge 2: How should data be mapped to different 3D memory stacks? [Diagram: main GPU and 3D-stacked memory system, as on slide 2.]

  7. Our Approach: TOM. Component 1: A new programmer-transparent mechanism to identify and decide what code portions to offload.

  8. Our Approach: TOM. Component 1: A new programmer-transparent mechanism to identify and decide what code portions to offload. The compiler identifies code portions to potentially offload based on memory profile.

  9. Our Approach: TOM. Component 1: A new programmer-transparent mechanism to identify and decide what code portions to offload. The compiler identifies code portions to potentially offload based on memory profile. The runtime system decides whether or not to offload each code portion based on runtime characteristics.

  10. Our Approach: TOM. Component 1: A new programmer-transparent mechanism to identify and decide what code portions to offload. The compiler identifies code portions to potentially offload based on memory profile. The runtime system decides whether or not to offload each code portion based on runtime characteristics. Component 2: A new, simple, programmer-transparent data mapping mechanism to maximize code/data co-location.

  11. Our Approach: TOM. Component 1: A new programmer-transparent mechanism to identify and decide what code portions to offload. The compiler identifies code portions to potentially offload based on memory profile. The runtime system decides whether or not to offload each code portion based on runtime characteristics. Component 2: A new, simple, programmer-transparent data mapping mechanism to maximize code/data co-location. Key Results: 30% average (76% max) performance improvement in GPU workloads.

  12. Talk at Monday 2:50pm (Session 3B). Transparent Offloading and Mapping (TOM). Kevin Hsieh, Eiman Ebrahimi, Gwangsun Kim, Niladrish Chatterjee, Mike O'Connor, Nandita Vijaykumar, Onur Mutlu, Stephen W. Keckler
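
The slides name the two TOM components but do not give their internals. The following is only a rough sketch of how such a flow might look. Every name in it (`CandidateBlock`, `estimated_traffic_saving`, `runtime_should_offload`, `stack_for_address`) is hypothetical, and the cost model is an assumption rather than the authors' method. It assumes a code portion is worth offloading when the memory traffic it avoids exceeds the cost of shipping its live-in and live-out registers, and that data is mapped to a memory stack by a fixed group of consecutive address bits.

```python
from dataclasses import dataclass


@dataclass
class CandidateBlock:
    """Hypothetical compile-time memory profile of one code portion."""
    name: str
    loads: int          # load instructions per thread
    stores: int         # store instructions per thread
    live_in_regs: int   # registers sent to the memory stack on offload
    live_out_regs: int  # registers sent back to the main GPU afterwards


def estimated_traffic_saving(block: CandidateBlock) -> int:
    """Assumed cost model, in register-sized words per thread.

    Without offloading, each load/store crosses the off-chip link once.
    With offloading, only the live-in/live-out registers cross it.
    """
    avoided = block.loads + block.stores
    added = block.live_in_regs + block.live_out_regs
    return avoided - added


def compile_time_candidates(blocks):
    """Compiler step (sketch): keep blocks whose profile predicts a saving."""
    return [b for b in blocks if estimated_traffic_saving(b) > 0]


def runtime_should_offload(block, link_busy: bool, stack_sm_busy: bool) -> bool:
    """Runtime step (sketch): offload a candidate only when the assumed
    runtime conditions (link and logic-layer SM availability) allow it."""
    return (estimated_traffic_saving(block) > 0
            and not link_busy
            and not stack_sm_busy)


def stack_for_address(addr: int, num_stacks: int, bit_pos: int) -> int:
    """Data mapping (sketch): pick the stack from consecutive address bits
    starting at bit_pos. Assumes num_stacks is a power of two."""
    return (addr >> bit_pos) & (num_stacks - 1)


if __name__ == "__main__":
    blocks = [
        CandidateBlock("stream_copy", loads=4, stores=2,
                       live_in_regs=2, live_out_regs=1),
        CandidateBlock("compute_heavy", loads=1, stores=0,
                       live_in_regs=3, live_out_regs=2),
    ]
    for b in compile_time_candidates(blocks):
        print(b.name, runtime_should_offload(b, link_busy=False,
                                             stack_sm_busy=False))
    print(stack_for_address(0x3000, num_stacks=4, bit_pos=12))
```

Under this mapping, two arrays that one code portion accesses at the same index land in the same stack whenever their base addresses differ by a multiple of `num_stacks << bit_pos` bytes. That is one simple way to picture the code/data co-location that Component 2 aims for.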
