E3SM Next-Gen Architecture & Strategy
Mark Taylor
mataylo@sandia.gov
E3SM Machine Roadmap
CPUs:
BER purchased dedicated systems
Anvil, CompyMcNodeFace:    6.2M node-hours per year
Great machines for moderate resolutions – fastest performance
Not large enough for high-res science campaigns
KNL Architecture (Cori, Theta)
Most v1/v2 simulations will be run on KNL through 2021
2019 allocation:   10.5M  node-hours  (ERCAP, INCITE, ALCC)
Our future is GPUs:
2019 allocation:  6M  GPU hours
OLCF:   Summit        2019
NERSC:  Perlmutter    2020
ALCF:   Aurora        2021
Strategy
CPU systems are not going away - other modeling centers
may continue to run on large CPU systems
DOE will be nearly 100% GPU systems by 2021
E3SM computational mission: an Earth system model that runs
effectively on DOE computers
E3SM must run on GPU systems
E3SM phase 2 will focus on simulation regimes where
GPU systems can outperform similar size CPU systems
Where can GPUs outperform CPUs?
Good understanding of how GPU and KNL architectures perform
based on detailed benchmarks
100% of the code running on KNL
25% of the code running well on GPUs (atmosphere dycore).  50% partially
complete (ocean/ice components)
GPU systems require a large amount of work per node in order to
outperform CPU systems (per watt)
E3SM has excellent MPI performance, allowing us to obtain
throughput by running in the strong scaling limit (little work per node).
In this regime, GPUs are not competitive with CPUs (per watt); a rough
work-per-node estimate follows below
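As a rough back-of-envelope illustration of the work-per-node argument, the sketch below compares the number of atmosphere columns available per node at 100 km and 3 km grid spacing. The grid spacings, node count, and busy-GPU threshold are illustrative assumptions for this sketch, not E3SM benchmark numbers.

```cpp
// Back-of-envelope estimate of grid columns per node at two atmosphere
// resolutions.  All numbers are rough assumptions for illustration only,
// not measured E3SM configurations.
#include <cstdio>

int main() {
    const double earth_surface_km2 = 5.1e8;  // approximate surface area of Earth
    const double dx_low_km = 100.0;          // typical low-res grid spacing
    const double dx_hi_km  = 3.0;            // SCREAM-like cloud-resolving spacing
    const double nodes     = 4000.0;         // hypothetical GPU machine size

    const double cols_low = earth_surface_km2 / (dx_low_km * dx_low_km);
    const double cols_hi  = earth_surface_km2 / (dx_hi_km  * dx_hi_km);

    // A GPU needs on the order of 1e4-1e5 independent work items per node to
    // stay busy; in the strong scaling limit the low-res model falls far short.
    std::printf("100 km: %.2e columns total, %.0f per node\n", cols_low, cols_low / nodes);
    std::printf("  3 km: %.2e columns total, %.0f per node\n", cols_hi,  cols_hi  / nodes);
    return 0;
}
```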
Where can GPUs outperform CPUs?
Ultra-high resolution  (SCREAM project)
At 3km resolutions and higher, there will be sufficient work per node to keep the
GPU happy
At these resolutions, GPU systems could be up to 3x faster than CPU systems
Large allocations could afford O(10)-year simulations
Can start to address many large uncertainties in climate models, but doesn’t
replace climate models
Low and High Resolutions (E3SM v2/v3)
At these resolutions, maximum throughput will be obtained in the strong scaling
regime, where GPUs will not provide any significant benefit over CPUs
If maximum throughput is not required, GPUs will allow more efficient simulations
Best estimate:  If willing to run 2-5x slower, can obtain more simulations per Watt.
Increased complexity
GPUs will allow us to increase complexity with proportionally less increase in cost
e.g.: super-parameterization, more tracers
Code Readiness 1
E3SMv1
All configurations run well on CPUs, KNL systems
low-res model for climate science: simulates O(5000) years per year
high-res model for limited climate science: O(100) years
KNL provides a lot of cycles but with similar performance to CPUs (per
watt)
E3SMv2:
Atmosphere dycore and MPAS components running on the GPU
Ocean/Ice “G” cases running on GPUs (2019)
Expect performance (per watt) similar to CPUs
Coupled “B” or atmosphere/land “F” cases:
Can we port the physics to GPU in time for v2?
Can we get competitive performance? 
Code Readiness 2
E3SM-MMF   “superparameterization”
Atm/lnd “F” case running on GPUs (2019)
Performance estimated at 2x faster (per watt) than CPU
SCREAM
NH dycore running on GPUs 2019
idealized test cases
NH cloud resolving, prescribed aerosol physics:  2020 or 2021
3km global model, simulation length: O(10) years
First global CRM climate sensitivity studies?
GPUs will outperform CPUs
Summary
We have a lot of capability to use GPU systems:
E3SM-MMF running well on GPU systems
MPAS G cases planned for 2019
SCREAM (UHR atmosphere) will run well on GPU systems ~2020
What about multi-decadal simulation campaigns on GPU systems?
The problem is the atmosphere physics. Several options:
(1) Port v1/v2 physics to GPUs, run slower but more efficiently
(2) Run these problems on CPU systems
(3) Use E3SM-MMF (superparameterization)
(4) Expand SCREAM effort to support lower resolutions  
Backup Slides
Code Readiness 3
E3SMv3: Full GPU support, 2 possible atmosphere options
OpenACC port of MPAS components
Atmosphere (1): NH C++ dycore + OpenACC port of v2 physics
Atmosphere (2): E3SM-MMF
E3SMv4
Atmosphere: merge SCREAM and E3SMv3 aerosols/convection physics
MPAS components: transition to task-based parallel version?
GPU Porting Work
Super-parameterization (SP) in atmosphere using
Fortran+OpenACC/OpenMP under ECP (first prototype by end of 2018)
Ongoing porting of MPAS ocean/ice using Fortran/OpenACC under ECP
(completion early 2019)
SCREAM physics (C++/Kokkos)  First prototype by end of 2019
SCREAM NH dynamics (C++/Kokkos)  completion: 2019
E3SM Performance group: feasibility study evaluating v1/v2 physics
using Fortran/OpenACC (Spring 2019)
GPU Simulations
E3SM-MMF Super-parameterization:   2019
Outperform CPU systems
NH dynamics 3km                                 2019
Outperform CPU systems
SCREAM 3km model                            2020
Outperform CPU systems
MPAS G cases                                      2019
Competitive with CPU systems?
E3SM v2 low & high resolution          2021?
If we can get the E3SM v2 physics ported (a large, expensive effort)
GPUs won't speed up the model, but will enable more efficient simulations
A21 Strategy
Fortran/OpenACC code:
Early indications suggest that the code restructuring needed to obtain good
OpenACC GPU performance is also appropriate for A21
Push loops down the call stack to expose maximum parallelism
Keep loop bodies to relatively few lines of code to reduce GPU register pressure
and ease compiler processing for improved vectorization and A21 loop analysis
(a sketch of this pattern follows this list)
Transition OpenACC code to OpenMP by 2021
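A minimal sketch of the restructuring pattern described in the bullets above, written in C++ with OpenACC directives for brevity; the E3SM code in question is Fortran, and the routine name, fields, and sizes here are hypothetical placeholders, not actual E3SM physics.

```cpp
// Sketch: the column loop is pushed down into the kernel and the loop body is
// kept short, exposing ncol*nlev-way parallelism with low register pressure.
// Without an OpenACC compiler the pragma is ignored and the code runs serially.
#include <vector>

constexpr int ncol = 16384;  // hypothetical columns owned by this node
constexpr int nlev = 72;     // hypothetical vertical levels

void saturation_adjust(double* t, double* q) {
  #pragma acc parallel loop gang vector collapse(2) copy(t[0:ncol*nlev], q[0:ncol*nlev])
  for (int i = 0; i < ncol; ++i) {
    for (int k = 0; k < nlev; ++k) {
      const int idx = i * nlev + k;
      const double dq = 0.1 * q[idx];      // placeholder physics update
      q[idx] -= dq;
      t[idx] += (2.5e6 / 1004.0) * dq;     // latent heating, L/cp
    }
  }
}

int main() {
  std::vector<double> t(ncol * nlev, 280.0), q(ncol * nlev, 1.0e-3);
  saturation_adjust(t.data(), q.data());
  return 0;
}
```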
C++/Kokkos code
Current code obtains excellent performance on CPU, KNL, and GPU architectures,
suggesting this abstraction is sufficiently powerful to also obtain good performance
on A21 (a minimal illustration follows below)
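For comparison, a minimal Kokkos kernel illustrating the portability abstraction referred to above: the same parallel_for runs on CPU, KNL, or GPU depending on the backend chosen at build time. The kernel name, field, and problem sizes are hypothetical and not taken from the SCREAM/HOMME code.

```cpp
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int ncol = 16384, nlev = 72;          // hypothetical problem size
    Kokkos::View<double**> theta("theta", ncol, nlev);

    // One kernel over (column, level); the backend maps it to GPU threads or
    // CPU vector lanes without source changes.
    Kokkos::parallel_for(
        "init_theta",
        Kokkos::MDRangePolicy<Kokkos::Rank<2>>({0, 0}, {ncol, nlev}),
        KOKKOS_LAMBDA(const int i, const int k) {
          theta(i, k) = 300.0 + 0.01 * k;       // placeholder initialization
        });
    Kokkos::fence();
  }
  Kokkos::finalize();
  return 0;
}
```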