Enhancing I/O Performance Through Adaptive Data Compression in Climate Simulations

Adaptive Compression to Improve I/O
Performance for Climate Simulations
Swati Singhal
1
Alan Sussman
UMIACS and Department of Computer Science
 
D
ata reduction is growing concern for scientific computing
Motivating Example: Meso-Scale Climate Simulation Application from
Department of Atmospheric and Oceanic Sciences
 
at UMD
An ensemble simulation with data assimilation on 1.3 million grid points that
forecasts for a nine hour period, repeatedly
Simulation time
: ~65 mins on a cluster for a 9 hour forecast (~60% of the
time is spend in I/O and extraneous work)
Generates ~238 GB of data per simulation
 
 
P
ossible solution
Data compression
Reduces data volume for I/O and increases effective I/O bandwidth
Scientific Data Compression?
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
S
c
i
e
n
t
i
f
i
c
 
d
a
t
a
 
o
f
t
e
n
 
a
r
e
 
m
u
l
t
i
d
i
m
e
n
s
i
o
n
a
l
 
a
r
r
a
y
s
 
o
f
 
 
f
l
o
a
t
i
n
g
 
p
o
i
n
t
 
n
u
m
b
e
r
s
,
 
s
t
o
r
e
d
i
n
 
s
e
l
f
-
d
e
s
c
r
i
b
i
n
g
 
d
a
t
a
 
f
o
r
m
a
t
s
 
(
e
.
g
.
,
 
n
e
t
C
D
F
,
 
H
D
F
,
 
e
t
c
.
)
D
i
f
f
i
c
u
l
t
 
t
o
 
c
o
m
p
r
e
s
s
 
-
>
 
h
i
g
h
 
e
n
t
r
o
p
y
 
i
n
 
l
o
w
e
r
 
o
r
d
e
r
 
b
y
t
e
s
0
.
0
0
5
8
9
0
.
0
0
5
9
0
W
h
a
t
 
m
e
t
h
o
d
s
 
a
r
e
 
a
v
a
i
l
a
b
l
e
 
t
o
 
c
o
m
p
r
e
s
s
 
s
c
i
e
n
t
i
f
i
c
 
d
a
t
a
?
T
w
o
 
c
a
t
e
g
o
r
i
e
s
 
o
f
 
d
a
t
a
 
c
o
m
p
r
e
s
s
i
o
n
 
m
e
t
h
o
d
s
L
o
s
s
y
 
c
o
m
p
r
e
s
s
i
o
n
E
.
g
.
,
 
Z
F
P
 
(
L
i
n
s
t
r
o
m
)
,
 
S
Z
 
(
D
i
)
,
I
S
A
B
E
L
A
 
(
L
a
k
s
h
m
i
n
a
r
a
s
i
m
h
a
n
)
L
o
s
s
l
e
s
s
 
c
o
m
p
r
e
s
s
i
o
n
E
.
g
.
,
 
Z
L
I
B
,
 
L
Z
O
,
 
B
Z
I
P
2
,
 
F
P
C
 
(
B
u
r
t
s
c
h
e
r
)
,
 
 
 
 
 
 
 
 
 
 
 
 
 
 
I
S
O
B
A
R
 
(
S
c
h
e
n
d
e
l
)
Scientific Data Compression
Lossy methods 
provide high compression but
precision is lost
Lossless methods retain 
precision but are not
sufficient to achieve high compression
   - 
require preprocessing techniques
Hard to achieve good
compression ratio.
Scientific Data Compression
Which compression method to use for the given data?
Typically one-time offline analysis on small subset of data
Criteria is based on either compression ratio or compression speed
depending on application needs
One compression method for all the data variables
ISSUES?
Manual effort required to select a compression scheme
 
Limited measurements to define performance criteria
 
Loss of compression benefits -> best compression method differs
for different variables (and may also change for same variable
over time)
                       
Can we do better?
Best compression method differs for different variables
Results from a single WRF output file
Log
 
scale
Log
 
scale
Compression
Speed in MB/s
Compression ratio
ACOMPS:
Adaptive Compression Scheme
An Adaptive compression tool that 
Supports a set of lossless compression methods combined with different memory
preprocessing techniques
Automatically selects the best compression method for each variable in the
dataset
Allows flexible criteria to select the best compression method
Allows compressing data in smaller units/chunks for selective decompression to
increase effective I/O bandwidth
ACOMPS: Adaptive Compression Scheme
Preprocessing techniques supported 
***
 Lossless compression
methods supported 
***
Total 9 compression
techniques
***Other preprocessing and compression(both Lossy
and Lossless) techniques can be added
Bytes segregation (B)
Byte-Wise segregation (BW)
Byte-Wise segregation and XOR (BWXOR)
B - LZO
B - ZLIB
B - BZIP2
BW - LZO
BW - ZLIB
BW - BZIP2
BWXOR - LZO
BWXOR - ZLIB
BWXOR - BZIP2
LZO
ZLIB
BZIP2
Segregate compressible bytes and group these bytes
based on their position in the floating point number.
------------------
Identify and segregate compressible bytes(based on
skewness) for compression
Byte-Wise segregation + XOR
----------
----------------------
------------------
----------
First compressible byte of all grid
cells
Second compressible byte of all grid
cells
Grid cell 1
Grid cell 2
----------------------
-----------------
 
------
 
-----------------
ACOMPS : Adaptive Compression Scheme
performance
x
 = compression_speed
x
 * W
CS
 + compression_ratio
x
 * W
CR
User tunable parameters:
W
CR
 
 => compression ratio
weighting for deciding best
compression method.
W
CS
 => compression speed
weighting for deciding best
compression method.
 => small delta limit to define
acceptance range.
Criteria to evaluate the performance of any compression technique X
 
Variable 
A
 to be
compressed at 
time step 0
Compress the data using technique
Best
A
 and
record 
latestPerf
A
 = performance
BestA
Determine the best technique, 
T
x
.
Record
Best
A
 = T
x
BestPerf
A
 = performance
Tx
ACOMPS : Adaptive Compression Scheme
performance
x
 = compression_speed
x
 * W
CS
 + compression_ratio
x
 * W
CR
User tunable parameters:
W
CR
 
 => compression ratio
weighting for deciding best
compression method.
W
CS
 => compression speed
weighting for deciding best
compression method.
 => small delta limit to define
acceptance range.
Criteria to evaluate the performance of any compression technique X
Variable 
A
 to be
compressed at 
time step t
Compress the data using technique
Best
A
 and
record 
latestPerf
A
 = performance
BestA
No
Determine the best technique, 
T
x
.
Record
Best
A
 = T
x
BestPerf
A
 = performance
Tx
BestPerf
A 
- 
 <
latestPerf
A 
 < BestPerf
A 
+
Compression performance
didn’t change beyond the
limit 
. Continue to use
the current Best
A
YES
 
WRF-LETKF based climate simulations
WRF n
LETKFm
WRF n
LETKF m
Binary
netCDF
Binary
 
initial
conditions
Merge with the initial conditions to guide the simulation 
convert
Revised state
output based on
the observed data
OBSERV
ATIONS
n parallel WRF ensembles
m parallel LETKF processes
DRBSD-2
 
WRF-LETKF based climate simulations
WRF n
LETKFm
WRF n
LETKF m
Binary
netCDF
Binary
 
initial
conditions
Merge with the initial conditions to guide the simulation 
convert
Revised state
output based on
the observed data
OBSERV
ATIONS
n parallel WRF ensembles
m parallel LETKF processes
Example: n = 55, m = 400.  Max grid size : 181 x 151 x 51
Single cycle WRF + single cycle LETKF (9 hours simulation time)
Total simulation time (single cycle WRF + single cycle LETKF)  : ~ 65 mins on cluster
High conversion cost (9 x 55) files => 36.7 minutes  => ~56% of the total simulation time
Large output data size : ~283 GB
 
WRF-LETKF based climate simulations
WRF n
LETKFm
WRF n
LETKF m
Binary
netCDF
Binary
 
initial
conditions
Merge with the initial conditions to guide the simulation 
convert
Revised state
output based on
the observed data
OBSERV
ATIONS
n parallel WRF ensembles
m parallel LETKF processes
WRF ADIOS I/O
Plugin
LETKF ADIOS
I/O Plugin
ADIOS
 +
ACOMPS data
transformation plugin
 Any format supported by ADIOS.
 No conversion required
Results
Deepthought2
Campus cluster at UMD
Number of nodes 
: 484 with 20 cores/node + 4 nodes with 40 core/node
Memory/node
 ~ 128 GB (DDR3 at 1866 Mhz)
Processor :
 dual Intel Ivy Bridge E5-2680v2 at 2.8 GHz
Parallel File system 
: Lustre
Domain size : 181 x 151 grid cells
Vertical levels : 51
Majority of variables are float type
3D variable : XLAT, XLONG, F, T
4D variables: U, V, W, P, PB, RAINC..
Experimental Setup
Climate simulations with  WRF-LETKF
WRF Ensemble 
n = 55 
=> each uses 1 node
No. of MPI processes = (55 x 20) = 1100
LETKF => uses 20 nodes
No. of MPI processes = (20 x 20) = 400
Adaptive Vs Non-adaptive methods :   Output sizes
ACOMPS achieves better
compression
77% improvement in
size over original
13% better than
ADIOS + Bzip2
Only CR (Best compression ratio,  slower) => W
CR
 = 1, W
cs
 = 0
Only CS (Best speed, not as good compression) => W
CR
 = 0, W
cs
 = 1
Best compression method differs for different variables
Adaptive Vs Non-adaptive methods :  Compression time
ACOMPS incurs low
overhead
Close to fastest
ADIOS + LZO
Lower overhead
than   ADIOS +
Bzip2 with  much
better compression
Only CR(Best compression ratio,  slower) => W
CR
 = 1, W
cs
 = 0
Only CS(Best speed, not as good compression) => W
CR
 = 0, W
cs
 = 1
Extend to support more compression methods including both lossless and lossy
compression methods
Thoroughly analyze how the best compression method for a given variable
changes over time
How often it is advantageous to do the re-analysis?
How to enhance the criteria to decide when to re-evaluate in order to adapt
to the changes quickly
Parallelize the analysis phase using threads
Future Directions
Adaptive Compression to Improve I/O
Performance for Climate Simulations
Swati Singhal
swati@cs.umd.edu
17
Alan Sussman
als@cs.umd.edu
UMIACS and Department of Computer Science
Lossless Compression
E. R. Schendel, 
Y. Jin, N. Shah, J. Chen, C. Chang, S.-H. Ku, S. Ethier, S. Klasky, R. Latham, R. Ross, and N. F. Samatova, 
“ISOBAR 
precon-
ditioner for effective and high-throughput lossless data compression,” 
(ICDE
. 2012)
M. Burtscher 
and P. Ratanaworabhan, “
FPC:
 A high-speed compressor for double-precision floating-point data,” 
IEEE Transactions on
Com- puters
, 2009
 Martin Burtscher and Paruj Ratanaworabhan
. gFPC
: A Self-Tuning Compression Algorithm. In Data Compression Conference (DCC),
2010
I. H. Witten, R. M. Neal, and J. G. Cleary, “Arithmetic coding for data compression,” 
Communications of the ACM 
1987.
S. Bhattacherjee
, A. Deshpande, and A. Sussman, “
PStore:
 An efficient storage framework for managing scientific data
 (SSDBM 
2014)
S. W. Son,
 Z. Chen, W. Hendrix, A. Agrawal, W. keng Liao, and A. Choudhary, “Data compression for the exascale computing era –
survey,” 
Journal of Supercomputing Frontiers and Innovations
, 2014.
Lossy Compression
Sheng Di and Franck Cappello. Fast Error-Bounded Lossy HPC Data Compression with SZ. IPDPS, 2016
Woody Austin, Grey Ballard, and Tamara G. Kolda. Parallel tensor compression for large-scale scientific data. IPDPS, 2016
Martin Burtscher, Hari Mukka, Annie Yang, and Farbod Hesaaraki. Real-time synthesis of compression algorithms for
scientific data. High Performance Computing, Networking, Storage and Analysis, SC16, 2016
Related Work
Slide Note
Embed
Share

This research focuses on improving I/O performance for climate simulations by employing adaptive data compression techniques. Scientific data compression methods, such as lossy and lossless compression, are explored to reduce data volume and increase effective I/O bandwidth. The study highlights the challenges in selecting the best compression method for different data variables and suggests potential solutions for enhancing compression efficiency in scientific computing applications.

  • Data Compression
  • I/O Performance
  • Scientific Computing
  • Climate Simulations
  • Adaptive Techniques

Uploaded on Sep 15, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Adaptive Compression to Improve I/O Performance for Climate Simulations Swati Singhal Alan Sussman UMIACS and Department of Computer Science The 2nd International Workshop on Data Reduction for Big Scientific Data 1

  2. Scientific Data Compression? Data reduction is growing concern for scientific computing Motivating Example: Meso-Scale Climate Simulation Application from Department of Atmospheric and Oceanic Sciences at UMD An ensemble simulation with data assimilation on 1.3 million grid points that forecasts for a nine hour period, repeatedly Simulation time: ~65 mins on a cluster for a 9 hour forecast (~60% of the time is spend in I/O and extraneous work) Generates ~238 GB of data per simulation Possible solution Data compression Reduces data volume for I/O and increases effective I/O bandwidth

  3. Scientific Data Compression Scientific data often are multidimensional arrays of floating point numbers, stored in self-describing data formats (e.g., netCDF, HDF, etc.) Difficult to compress -> high entropy in lower order bytes 0.00589 0.00590 00111011 1100000 100000000 11100111 Hard to achieve good compression ratio. 00111011 1100000 101010100 11001010 What methods are available to compress scientific data? Two categories of data compression methods Lossy compression E.g., ZFP (Linstrom), SZ (Di), ISABELA (Lakshminarasimhan) Lossless compression E.g., ZLIB, LZO, BZIP2, FPC (Burtscher), ISOBAR (Schendel) Lossy methods provide high compression but precision is lost Lossless methods retain precision but are not sufficient to achieve high compression - require preprocessing techniques

  4. Scientific Data Compression Which compression method to use for the given data? Typically one-time offline analysis on small subset of data Criteria is based on either compression ratio or compression speed depending on application needs One compression method for all the data variables ISSUES? Manual effort required to select a compression scheme Limited measurements to define performance criteria Loss of compression benefits -> best compression method differs for different variables (and may also change for same variable over time) Can we do better?

  5. Best compression method differs for different variables Results from a single WRF output file COMPRESSION SPEED (MB/S) COMPRESSION RATIO T SR F T SR F 1000 10000.00 100 1000.00 100.00 10 10.00 1.00 1 Prep1 + LZO Prep1 + BZIP2 Prep2 + BZIP2 Prep2 + LZO Prep3 + BZIP2 Prep1 + LZO Prep1 + BZIP2 Prep2 + BZIP2 Prep2 + LZO Prep3 + BZIP2 Log scale Log scale

  6. ACOMPS: Adaptive Compression Scheme An Adaptive compression tool that Supports a set of lossless compression methods combined with different memory preprocessing techniques Automatically selects the best compression method for each variable in the dataset Allows flexible criteria to select the best compression method Allows compressing data in smaller units/chunks for selective decompression to increase effective I/O bandwidth

  7. ACOMPS: Adaptive Compression Scheme Preprocessing techniques supported *** Lossless compression methods supported *** Total 9 compression techniques ---------------------- Grid cell 1 Grid cell 2 Bytes segregation (B) Identify and segregate compressible bytes(based on skewness) for compression ---------------------- B - LZO B - ZLIB B - BZIP2 LZO ZLIB Byte-Wise segregation (BW) BW - LZO BW - ZLIB BW - BZIP2 Segregate compressible bytes and group these bytes based on their position in the floating point number. ------------------ BZIP2 ---------- Second compressible byte of all grid cells First compressible byte of all grid cells BWXOR - LZO BWXOR - ZLIB BWXOR - BZIP2 ------------------ Byte-Wise segregation and XOR (BWXOR) ----------------- Byte-Wise segregation + XOR ---------- ***Other preprocessing and compression(both Lossy and Lossless) techniques can be added ----------------- ------

  8. ACOMPS : Adaptive Compression Scheme Criteria to evaluate the performance of any compression technique X performancex = compression_speedx * WCS + compression_ratiox * WCR Variable A to be compressed at time step 0 User tunable parameters: WCR=> compression ratio weighting for deciding best compression method. WCS => compression speed weighting for deciding best compression method. Determine the best technique, Tx. Record BestA = Tx BestPerfA = performanceTx => small delta limit to define acceptance range. Compress the data using technique BestA and record latestPerfA = performanceBestA

  9. ACOMPS : Adaptive Compression Scheme Criteria to evaluate the performance of any compression technique X performancex = compression_speedx * WCS + compression_ratiox * WCR Variable A to be compressed at time step t User tunable parameters: WCR=> compression ratio weighting for deciding best compression method. BestPerfA - < latestPerfA < BestPerfA + YES Compression performance didn t change beyond the limit . Continue to use the current BestA No WCS => compression speed weighting for deciding best compression method. Determine the best technique, Tx. Record BestA = Tx BestPerfA = performanceTx => small delta limit to define acceptance range. Compress the data using technique BestA and record latestPerfA = performanceBestA

  10. WRF-LETKF based climate simulations Merge with the initial conditions to guide the simulation convert initial conditions LETKFm LETKF m WRF n WRF n Binary Binary netCDF Revised state output based on the observed data m parallel LETKF processes OBSERV ATIONS n parallel WRF ensembles

  11. WRF-LETKF based climate simulations Merge with the initial conditions to guide the simulation convert initial conditions LETKFm LETKF m WRF n WRF n Binary Binary netCDF Revised state output based on the observed data m parallel LETKF processes OBSERV ATIONS n parallel WRF ensembles Example: n = 55, m = 400. Max grid size : 181 x 151 x 51 Single cycle WRF + single cycle LETKF (9 hours simulation time) Total simulation time (single cycle WRF + single cycle LETKF) : ~ 65 mins on cluster High conversion cost (9 x 55) files => 36.7 minutes => ~56% of the total simulation time Large output data size : ~283 GB DRBSD-2

  12. WRF-LETKF based climate simulations Merge with the initial conditions to guide the simulation convert initial conditions LETKFm LETKF m WRF n WRF n Binary Binary netCDF Revised state output based on the observed data m parallel LETKF processes OBSERV ATIONS n parallel WRF ensembles Any format supported by ADIOS. No conversion required LETKF ADIOS I/O Plugin WRF ADIOS I/O Plugin ADIOS + ACOMPS data transformation plugin

  13. Experimental Setup Deepthought2 Campus cluster at UMD Number of nodes : 484 with 20 cores/node + 4 nodes with 40 core/node Memory/node ~ 128 GB (DDR3 at 1866 Mhz) Processor : dual Intel Ivy Bridge E5-2680v2 at 2.8 GHz Parallel File system : Lustre Results Climate simulations with WRF-LETKF WRF Ensemble n = 55 => each uses 1 node No. of MPI processes = (55 x 20) = 1100 Domain size : 181 x 151 grid cells Vertical levels : 51 LETKF => uses 20 nodes No. of MPI processes = (20 x 20) = 400 Majority of variables are float type 3D variable : XLAT, XLONG, F, T 4D variables: U, V, W, P, PB, RAINC..

  14. Adaptive Vs Non-adaptive methods : Output sizes OUTPUT SIZES (IN GIGABYTES) 77% improvement in size over original 283 191 13% better than ADIOS + Bzip2 80.03 73 72.6 67.5 62.8 netCDF + Binary ADIOS ADIOS + Zlib ADIOS + Bzip2 ADIOS + LZO ADIOS + ACOMPS (Only CS) ADIOS + ACOMPS (Only CR) Only CR (Best compression ratio, slower) => WCR = 1, Wcs = 0 Only CS (Best speed, not as good compression) => WCR = 0, Wcs = 1 ACOMPS achieves better compression

  15. Best compression method differs for different variables Adaptive Vs Non-adaptive methods : Compression time LETKF END to END TIMES 70.00 conversion 60.00 Lower overhead than ADIOS + Bzip2 with much better compression WRF TIME IN MINUTES 50.00 Close to fastest ADIOS + LZO 40.00 WRF preprocessing 30.00 20.00 10.00 0.00 netCDF + Binary ADIOS ADIOS + Zlib ADIOS + Bzip2 ADIOS + LZO ADIOS + ACOMPS (Only CS) ADIOS + ACOMPS (Only CR) ACOMPS incurs low overhead Only CR(Best compression ratio, slower) => WCR = 1, Wcs = 0 Only CS(Best speed, not as good compression) => WCR = 0, Wcs = 1

  16. Future Directions Extend to support more compression methods including both lossless and lossy compression methods Thoroughly analyze how the best compression method for a given variable changes over time How often it is advantageous to do the re-analysis? How to enhance the criteria to decide when to re-evaluate in order to adapt to the changes quickly Parallelize the analysis phase using threads

  17. Adaptive Compression to Improve I/O Performance for Climate Simulations Alan Sussman als@cs.umd.edu Swati Singhal swati@cs.umd.edu UMIACS and Department of Computer Science The 2nd International Workshop on Data Reduction for Big Scientific Data 17

  18. Related Work Lossless Compression E. R. Schendel, Y. Jin, N. Shah, J. Chen, C. Chang, S.-H. Ku, S. Ethier, S. Klasky, R. Latham, R. Ross, and N. F. Samatova, ISOBAR precon- ditioner for effective and high-throughput lossless data compression, (ICDE. 2012) M. Burtscher and P. Ratanaworabhan, FPC: A high-speed compressor for double-precision floating-point data, IEEE Transactions on Com- puters, 2009 Martin Burtscher and Paruj Ratanaworabhan. gFPC: A Self-Tuning Compression Algorithm. In Data Compression Conference (DCC), 2010 I. H. Witten, R. M. Neal, and J. G. Cleary, Arithmetic coding for data compression, Communications of the ACM 1987. S. Bhattacherjee, A. Deshpande, and A. Sussman, PStore: An efficient storage framework for managing scientific data (SSDBM 2014) S. W. Son, Z. Chen, W. Hendrix, A. Agrawal, W. keng Liao, and A. Choudhary, Data compression for the exascale computing era survey, Journal of Supercomputing Frontiers and Innovations, 2014. Lossy Compression Sheng Di and Franck Cappello. Fast Error-Bounded Lossy HPC Data Compression with SZ. IPDPS, 2016 Woody Austin, Grey Ballard, and Tamara G. Kolda. Parallel tensor compression for large-scale scientific data. IPDPS, 2016 Martin Burtscher, Hari Mukka, Annie Yang, and Farbod Hesaaraki. Real-time synthesis of compression algorithms for scientific data. High Performance Computing, Networking, Storage and Analysis, SC16, 2016

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#