Is RRTMGP Suited for GPU?

 
Expectations
 
1. Embarrassingly parallel
   Columns can be split up and computed in parallel.

2. Memory-intensive computations
   Memory is faster on GPU than on CPU.

Answer to the title: Yes
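
Point 1 can be illustrated with a minimal Python sketch: since no column depends on another, the columns can be cut into chunks, each chunk handled by a separate worker, and the results concatenated. The per-column function `column_flux`, the chunk size, and the toy input are hypothetical stand-ins, not RRTMGP code.

```python
from concurrent.futures import ThreadPoolExecutor


def column_flux(column):
    """Hypothetical per-column computation: uses only this column's layers."""
    return sum(layer * 0.5 for layer in column)


def process_chunk(chunk):
    """Process a contiguous block of columns independently of all others."""
    return [column_flux(col) for col in chunk]


def split(columns, chunk_size):
    """Cut the column list into contiguous chunks."""
    return [columns[i:i + chunk_size] for i in range(0, len(columns), chunk_size)]


def parallel_flux(columns, chunk_size=4, workers=4):
    """Compute every column's result; the chunks run concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunk_results = list(pool.map(process_chunk, split(columns, chunk_size)))
    # map() preserves chunk order, so flattening restores the column order.
    return [value for chunk in chunk_results for value in chunk]


# Toy input: 10 columns with 3 layers each (illustrative sizes only).
columns = [[float(c + k) for k in range(3)] for c in range(10)]
serial = [column_flux(col) for col in columns]
parallel = parallel_flux(columns)
```

The parallel result matches the serial one because the chunks share no state. On a GPU the same decomposition would map columns to threads rather than to a thread pool.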
 
 
Context
 
 
Speed-up GPU vs CPU
 
The expected speed-up will be determined largely by memory performance.
GDDR is faster than DDR. This advantage is expected to grow further in future GPU generations.
Memory bandwidth is approximately 1.5 to 2 times higher on GPU.
Drawback: GDDR capacity is typically smaller than DDR capacity.
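
As a rough sketch of the reasoning on this slide: if a kernel is limited by memory traffic, its run time is roughly the bytes moved divided by the bandwidth, so the speed-up approaches the bandwidth ratio. The bandwidth figures and data volume below are placeholders chosen only to reproduce the 1.5 to 2 ratio quoted above; they are not measurements.

```python
def memory_bound_time(bytes_moved, bandwidth_bytes_per_s):
    """Run-time estimate for a kernel limited purely by memory traffic."""
    return bytes_moved / bandwidth_bytes_per_s


def estimated_speedup(bytes_moved, cpu_bandwidth, gpu_bandwidth):
    """Ratio of CPU to GPU time under the memory-bound assumption."""
    cpu_time = memory_bound_time(bytes_moved, cpu_bandwidth)
    gpu_time = memory_bound_time(bytes_moved, gpu_bandwidth)
    return cpu_time / gpu_time


# Placeholder numbers (assumptions, not measurements).
BYTES_MOVED = 8e9        # hypothetical data volume of one pass
CPU_BANDWIDTH = 100e9    # hypothetical DDR bandwidth in bytes/s
GPU_BANDWIDTH_LOW = 150e9   # 1.5 times the CPU figure
GPU_BANDWIDTH_HIGH = 200e9  # 2 times the CPU figure

speedup_low = estimated_speedup(BYTES_MOVED, CPU_BANDWIDTH, GPU_BANDWIDTH_LOW)
speedup_high = estimated_speedup(BYTES_MOVED, CPU_BANDWIDTH, GPU_BANDWIDTH_HIGH)
print(f"estimated speed-up: {speedup_low:.1f}x to {speedup_high:.1f}x")
```

The estimate ignores host-to-device transfers and compute time, so it is an upper bound for a purely memory-bound kernel rather than a prediction.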
 
Computations in RRTMGP
 
Multiple components to parallelize
Gas optics, flux solver, etc.
Multiple sub-components, each with its own logic
 
Computations are relatively lightweight
Terms and factors from multiple sources, often arrays, are combined using basic arithmetic.

Static data, e.g. k-coefficients, can be parked in GPU memory
 
Scale
 
Dimension         Approx. size
Columns           40,000+
Layers            100
Pseudo-spectral   250
Other             10
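
To put these sizes next to the limited GDDR capacity mentioned earlier, the sketch below estimates the footprint of a single array spanning columns, layers, and pseudo-spectral points. It assumes 8-byte (double precision) values and leaves out the "other" dimension; the 4 GB budget in the batching helper is a hypothetical figure, not a property of any particular GPU.

```python
N_COLUMNS = 40_000
N_LAYERS = 100
N_GPOINTS = 250
BYTES_PER_VALUE = 8  # assumption: double precision


def array_elements(*dims):
    """Number of elements in a dense array with the given extents."""
    total = 1
    for extent in dims:
        total *= extent
    return total


def columns_per_batch(budget_bytes):
    """How many whole columns of one such array fit in a memory budget."""
    bytes_per_column = array_elements(N_LAYERS, N_GPOINTS) * BYTES_PER_VALUE
    return budget_bytes // bytes_per_column


elements = array_elements(N_COLUMNS, N_LAYERS, N_GPOINTS)
size_gb = elements * BYTES_PER_VALUE / 1e9
batch = columns_per_batch(4 * 10**9)  # hypothetical 4 GB budget

print(f"{elements:,} elements, {size_gb:.1f} GB per array")
print(f"{batch:,} columns fit in a 4 GB budget")
```

Under these assumptions one full-size array already occupies about 8 GB, which suggests processing the columns in batches rather than all at once.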
 
Memory Access Patterns
 
Memory access is mostly sequential.
Local interpolations interfere with perfectly sequential memory access, but these disruptions are at a local scale only.

The index order of the arrays changes between components.
Gas optics: (pseudo-spectral, layer, column)
Flux solver: (column, layer, pseudo-spectral)
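
The effect of the two index orders can be sketched by computing element strides, assuming the arrays use Fortran's column-major layout (first index contiguous) and the approximate sizes from the Scale slide. The helper names are illustrative only.

```python
def column_major_strides(shape):
    """Element strides for a column-major array: the first index is contiguous."""
    strides = []
    step = 1
    for extent in shape:
        strides.append(step)
        step *= extent
    return tuple(strides)


def linear_index(index, shape):
    """Zero-based offset of a multi-index in column-major storage."""
    return sum(i * s for i, s in zip(index, column_major_strides(shape)))


N_GPOINTS, N_LAYERS, N_COLUMNS = 250, 100, 40_000

gas_optics_shape = (N_GPOINTS, N_LAYERS, N_COLUMNS)   # (pseudo-spectral, layer, column)
flux_solver_shape = (N_COLUMNS, N_LAYERS, N_GPOINTS)  # (column, layer, pseudo-spectral)

gas_strides = column_major_strides(gas_optics_shape)
flux_strides = column_major_strides(flux_solver_shape)

# Distance in memory between neighbouring columns in each layout.
column_stride_gas = gas_strides[2]
column_stride_flux = flux_strides[0]
print(gas_strides, flux_strides)
```

In the gas optics order, neighbouring columns are 25,000 elements apart while pseudo-spectral points are adjacent; in the flux solver order it is the columns that are adjacent. Which loop should be innermost therefore differs between the two components.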
 
Lessons learned
 
 
Overview
 
Compilers struggle with newer FORTRAN,
OpenACC, and libraries.
FORTRAN 2003
NetCDF library for I/O
 
For OpenACC we tested:
PGI and Cray
 
Without OpenACC we tested:
Intel, PGI, GNU, Cray, and NAG
 
Success: Cray and OpenACC
 
We got gas optics to work to the extent that it compiled and computed the correct answers to 15-digit precision on GPU.
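
One way such a comparison could be scripted is sketched below. It interprets "15-digit precision" as a relative difference of at most 1e-15 between CPU and GPU values, which is an assumption about what was checked; the sample values are made up.

```python
import math


def agree_to_digits(reference, candidate, digits=15):
    """True if every pair of values agrees to the given relative precision."""
    if len(reference) != len(candidate):
        return False
    rel_tol = 10.0 ** (-digits)
    # abs_tol=0.0 means exact zeros in the reference must be matched exactly.
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=0.0)
        for r, c in zip(reference, candidate)
    )


# Made-up sample values standing in for CPU and GPU results.
cpu_values = [1.0, 2.5, 1234.5678]
gpu_matching = [1.0, 2.5, 1234.5678]
gpu_drifted = [1.0, 2.5, 1234.5678 * (1.0 + 1e-12)]

matches = agree_to_digits(cpu_values, gpu_matching)
drift_detected = not agree_to_digits(cpu_values, gpu_drifted)
```

A tolerance this tight is close to the limit of double precision, so it only passes when the GPU performs essentially the same operations in the same order as the CPU.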
 
This worked with $ACC PARALLEL; $ACC KERNELS crashes
Error messages could be better
 
Issues
Member variables do not work with OpenACC
Function calls within parallel regions are not supported by the compiler
Optional arguments do not work with OpenACC
Defining dynamic dimensions of variables in member functions
 
PGI and NetCDF
 
Failure: PGI and NetCDF do not play nice
ERROR: Segmentation fault
pgi/15.3 netcdf/4.3.3.1 on Janus @ rc.colorado.edu
 
This prevented us from testing OpenACC with PGI.
The PGI compiler is one of the prime choices for OpenACC.
 
Q: What is the standard NetCDF library for Python?
netCDF4, scipy.io.netcdf, or Scientific.IO.NetCDF
 
Intel
 
Does not support OpenACC for practical purposes.
A few hiccups with the FORTRAN 2003 standard, but overall “thumbs up”.
Side note: The compiler is sometimes too lenient in the syntax it accepts
intel/15.0.2 netcdf/4.3.3.1
 
GNU
 
Does not support OpenACC for practical purposes.
A few hiccups with the FORTRAN 2003 standard, but overall “thumbs up”.
Does not support some FORTRAN 2003 implicit memory allocations
Expected to be slower than other compilers
gnu/4.9.2 netcdf/4.3.3.1
 
Extra slides
 
 
Parallelism in RRTMGP
 
Columns
Layers
Pseudo-spectral (gpts)
other
 
Strategies for OpenACC Parallelism
 
Solver
 
Gas Optics
 
OpenACC – example gas optics
 
Future Outlook
 
C++ implementation, Hackathon, etc.
http://www.openacc.org/content/openacc-hackathon-tu-dresdenforschungzentrum-julich

RRTMGP offers the potential for significant speed-up on GPU due to its embarrassingly parallel nature and memory-intensive computations. The access patterns, memory performance, and scale dimensions also contribute to its suitability for GPU utilization. However, challenges are present in compiler compatibility and certain limitations with OpenACC. Successful implementations have been achieved with Cray and OpenACC, demonstrating promising results in gas optics computations.

  • RRTMGP
  • GPU computing
  • parallelization
  • compiler challenges
  • OpenACC

Uploaded on Mar 01, 2025


