Enhancing I/O Performance Through Adaptive Data Compression in Climate Simulations



Adaptive Compression to Improve I/O Performance for Climate Simulations
Swati Singhal, Alan Sussman
UMIACS and Department of Computer Science
The 2nd International Workshop on Data Reduction for Big Scientific Data

Scientific Data Compression?
• Data reduction is a growing concern for scientific computing
• Motivating example: a meso-scale climate simulation application from the Department of Atmospheric and Oceanic Sciences at UMD
  • An ensemble simulation with data assimilation on 1.3 million grid points that repeatedly forecasts a nine-hour period
  • Simulation time: ~65 minutes on a cluster for a 9-hour forecast (~60% of the time is spent in I/O and extraneous work)
  • Generates ~283 GB of data per simulation
• Possible solution: data compression
  • Reduces the data volume for I/O and increases the effective I/O bandwidth
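The claim that compression raises effective I/O bandwidth can be made concrete with a simple serial model. The sketch assumes the data are first compressed and then written, with no overlap between the two. The rates passed in are hypothetical inputs, not measurements from this study. The model also shows the flip side: if the compressor is slower than the file system, the effective bandwidth drops below the raw bandwidth.

```python
def effective_bandwidth(size_gb, write_gbps, ratio, compress_gbps):
    """Effective bandwidth (GB of original data per second) for compress-then-write.

    Assumes compression and I/O run one after the other (no overlap);
    all rates are hypothetical inputs, not measurements.
    """
    compress_time = size_gb / compress_gbps       # time to compress the raw data
    write_time = (size_gb / ratio) / write_gbps   # time to write the compressed data
    return size_gb / (compress_time + write_time)


if __name__ == "__main__":
    size = 283.0  # GB, the output size reported for this workflow
    # Very fast compressor: effective bandwidth approaches ratio * raw bandwidth.
    print(effective_bandwidth(size, write_gbps=1.0, ratio=4.0, compress_gbps=float("inf")))
    # Slow compressor: compression time dominates and the benefit disappears.
    print(effective_bandwidth(size, write_gbps=1.0, ratio=4.0, compress_gbps=0.5))
```

This trade-off is one reason a selection criterion needs to weigh compression speed as well as compression ratio.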

Scientific Data Compression
• Scientific data are often multidimensional arrays of floating point numbers, stored in self-describing data formats (e.g., netCDF, HDF)
• Difficult to compress -> high entropy in the lower-order bytes, so it is hard to achieve a good compression ratio
  [Figure: float32 bit patterns of two nearby values
   0.00589 -> 00111011 11000001 00000000 11100111
   0.00590 -> 00111011 11000001 01010100 11001010
   The high-order bytes match; the low-order bytes differ.]
• What methods are available to compress scientific data? Two categories:
  • Lossy compression, e.g., ZFP (Lindstrom), SZ (Di), ISABELA (Lakshminarasimhan): provides high compression, but precision is lost
  • Lossless compression, e.g., ZLIB, LZO, BZIP2, FPC (Burtscher), ISOBAR (Schendel): retains precision, but on its own is not sufficient to achieve high compression and requires preprocessing techniques
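The byte-entropy argument can be checked directly. The sketch below interprets the slide's bit patterns for 0.00589 and 0.00590 as big-endian IEEE 754 single precision and reproduces them with `struct`. It then measures the Shannon entropy of each byte position over a smooth field. The field is hypothetical synthetic data, not WRF output.

```python
import math
import struct
from collections import Counter


def float32_bits(x):
    """Big-endian IEEE 754 single-precision bit pattern, grouped into bytes."""
    return " ".join(f"{b:08b}" for b in struct.pack(">f", x))


def byte_entropies(values):
    """Shannon entropy (bits) of each byte position across float32 values."""
    raw = struct.pack(f">{len(values)}f", *values)
    entropies = []
    for k in range(4):
        column = raw[k::4]  # byte k (0 = most significant) of every value
        counts = Counter(column)
        n = len(column)
        entropies.append(-sum(c / n * math.log2(c / n) for c in counts.values()))
    return entropies


if __name__ == "__main__":
    print(float32_bits(0.00589))
    print(float32_bits(0.00590))
    # A smooth, slowly varying synthetic field (hypothetical data).
    field = [0.005 + i * 1e-6 for i in range(1000)]
    print([round(h, 2) for h in byte_entropies(field)])
```

For this field the sign/exponent byte is constant (entropy 0), while the lowest-order byte is close to the 8-bit maximum. This is why general-purpose codecs struggle on raw floating point data.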

Scientific Data Compression
• Which compression method should be used for the given data?
  • Typically a one-time offline analysis on a small subset of the data
  • The criterion is either compression ratio or compression speed, depending on application needs
  • One compression method for all the data variables
• Issues?
  • Manual effort is required to select a compression scheme
  • Limited measurements are used to define the performance criteria
  • Loss of compression benefits -> the best compression method differs for different variables (and may also change for the same variable over time)
• Can we do better?
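The "loss of compression benefits" point follows from a simple inequality. Choosing the smallest output per variable can never produce a larger total than any single method applied to every variable, because each term in the per-variable sum is a minimum over the same choices. The sizes below are hypothetical numbers chosen for illustration; only the variable names come from the deck.

```python
def total_size_single(sizes, method):
    """Total compressed size if one method is used for every variable."""
    return sum(per_method[method] for per_method in sizes.values())


def total_size_per_variable(sizes):
    """Total compressed size if each variable uses its own smallest-output method."""
    return sum(min(per_method.values()) for per_method in sizes.values())


if __name__ == "__main__":
    # Hypothetical compressed sizes (MB) for three variables under two methods.
    sizes = {
        "T": {"zlib": 40.0, "bzip2": 35.0},
        "U": {"zlib": 30.0, "bzip2": 33.0},
        "RAINC": {"zlib": 2.0, "bzip2": 1.5},
    }
    best_single = min(total_size_single(sizes, m) for m in ("zlib", "bzip2"))
    print(best_single, total_size_per_variable(sizes))
```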

Best compression method differs for different variables
Results from a single WRF output file
[Figure: compression speed (MB/s, log scale) and compression ratio (log scale) for variables T, SR, and F, each compressed with Prep1 + LZO, Prep1 + BZIP2, Prep2 + BZIP2, Prep2 + LZO, and Prep3 + BZIP2]
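A measurement like the one in this figure can be sketched as a small harness. It records compression speed (MB/s) and compression ratio for every codec on every variable. LZO is not in the Python standard library, so `lzma` stands in for it here. The two synthetic fields are hypothetical stand-ins for WRF variables, and no preprocessing step is applied.

```python
import bz2
import lzma
import math
import time
import zlib
from array import array

# Stand-in codecs from the Python standard library. LZO is not available
# there, so lzma takes its slot (an assumption, not the ACOMPS codec set).
CODECS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def measure(raw, compress):
    """Return (compression speed in MB/s, compression ratio) for one codec."""
    start = time.perf_counter()
    out = compress(raw)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid dividing by zero
    return len(raw) / 1e6 / elapsed, len(raw) / len(out)


def benchmark(variables):
    """Measure every codec on every variable; variables maps name -> list of floats."""
    results = {}
    for name, values in variables.items():
        raw = array("f", values).tobytes()  # float32, like most WRF variables
        results[name] = {codec: measure(raw, fn) for codec, fn in CODECS.items()}
    return results


if __name__ == "__main__":
    # Hypothetical fields: one smooth, one with an irregular pattern.
    n = 100_000
    fields = {
        "smooth": [math.sin(i / 5000) for i in range(n)],
        "irregular": [math.sin(i * 12.9898) * 43758.5453 % 1.0 for i in range(n)],
    }
    for var, per_codec in benchmark(fields).items():
        best_ratio = max(per_codec, key=lambda c: per_codec[c][1])
        fastest = max(per_codec, key=lambda c: per_codec[c][0])
        print(var, "best ratio:", best_ratio, "fastest:", fastest)
```

Timings from a single run are noisy. A real selection step would likely repeat the measurement or use a representative chunk of each variable.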

ACOMPS: Adaptive Compression Scheme
• An adaptive compression tool that
  • Supports a set of lossless compression methods combined with different memory preprocessing techniques
  • Automatically selects the best compression method for each variable in the dataset
  • Allows flexible criteria to select the best compression method
  • Allows compressing data in smaller units/chunks for selective decompression, to increase effective I/O bandwidth
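Chunked compression is what makes selective decompression possible. Each chunk is compressed independently, so reading a range of values only requires decompressing the chunks that overlap it. The sketch below uses zlib, a float32 layout, and a hypothetical chunk size of 1024 values. ACOMPS's actual chunk size and on-disk layout are not described here.

```python
import zlib
from array import array

ITEM_SIZE = 4  # bytes per float32 value


def compress_chunks(values, chunk_elems=1024):
    """Compress a float32 variable as independent chunks of chunk_elems values."""
    raw = array("f", values).tobytes()
    step = chunk_elems * ITEM_SIZE
    return [zlib.compress(raw[i:i + step]) for i in range(0, len(raw), step)]


def chunks_for_range(start, stop, chunk_elems=1024):
    """Indices of the chunks that overlap values[start:stop] (requires stop > start)."""
    return range(start // chunk_elems, (stop - 1) // chunk_elems + 1)


def read_range(chunks, start, stop, chunk_elems=1024):
    """Return values[start:stop], decompressing only the overlapping chunks."""
    needed = chunks_for_range(start, stop, chunk_elems)
    out = array("f")
    for c in needed:
        out.frombytes(zlib.decompress(chunks[c]))
    offset = needed[0] * chunk_elems
    return out[start - offset:stop - offset].tolist()


if __name__ == "__main__":
    field = [i * 0.5 for i in range(5000)]  # hypothetical variable
    chunks = compress_chunks(field)
    print(len(chunks), list(chunks_for_range(1500, 2100)))
    print(read_range(chunks, 1500, 1505))
```

Smaller chunks make reads more selective but give each codec less context, so the compression ratio per chunk tends to fall.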

ACOMPS: Adaptive Compression Scheme
Preprocessing techniques supported***
• Bytes segregation (B): identify and segregate compressible bytes (based on skewness) for compression
• Byte-Wise segregation (BW): segregate compressible bytes and group these bytes based on their position in the floating point number (the first compressible byte of all grid cells, then the second compressible byte of all grid cells)
• Byte-Wise segregation and XOR (BWXOR): byte-wise segregation + XOR
Lossless compression methods supported***
• LZO, ZLIB, BZIP2
Total 9 compression techniques
• B - LZO, B - ZLIB, B - BZIP2
• BW - LZO, BW - ZLIB, BW - BZIP2
• BWXOR - LZO, BWXOR - ZLIB, BWXOR - BZIP2
***Other preprocessing and compression techniques (both lossy and lossless) can be added
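The BW and BWXOR ideas can be sketched as invertible byte transforms applied before a lossless codec. This is a minimal sketch under several assumptions. It uses a big-endian float32 layout and groups all four byte positions, rather than only the "compressible" ones. It also assumes the XOR step XORs each value's bit pattern with the previous value's, because the slide does not say what is XORed. Bytes segregation (B), which picks bytes by skewness, is not shown.

```python
import struct
import zlib


def to_words(values):
    """Round values to float32 and return their 32-bit integer bit patterns."""
    n = len(values)
    return list(struct.unpack(f">{n}I", struct.pack(f">{n}f", *values)))


def from_words(words):
    """Inverse of to_words: 32-bit patterns back to float32-precision floats."""
    n = len(words)
    return list(struct.unpack(f">{n}f", struct.pack(f">{n}I", *words)))


def byte_wise_segregate(words):
    """BW sketch: byte 0 (most significant) of every value, then byte 1, and so on."""
    planes = [bytearray() for _ in range(4)]
    for w in words:
        for k in range(4):
            planes[k].append((w >> (24 - 8 * k)) & 0xFF)
    return b"".join(planes)


def byte_wise_merge(data):
    """Inverse of byte_wise_segregate."""
    n = len(data) // 4
    return [
        (data[i] << 24) | (data[n + i] << 16) | (data[2 * n + i] << 8) | data[3 * n + i]
        for i in range(n)
    ]


def xor_with_previous(words):
    """Assumed XOR step: XOR each bit pattern with its predecessor (first with 0)."""
    return [w ^ p for w, p in zip(words, [0] + words[:-1])]


def undo_xor(words):
    """Inverse of xor_with_previous: a running XOR restores the originals."""
    out, prev = [], 0
    for x in words:
        prev ^= x
        out.append(prev)
    return out


if __name__ == "__main__":
    field = [0.005 + i * 1e-6 for i in range(10_000)]  # hypothetical smooth field
    words = to_words(field)
    plain = zlib.compress(struct.pack(f">{len(field)}f", *field))
    bwxor = zlib.compress(byte_wise_segregate(xor_with_previous(words)))
    print(len(plain), len(bwxor))
```

Grouping bytes by position puts the near-constant high-order bytes next to each other, where a codec finds long runs. The XOR step turns matching high-order bits of neighboring values into zeros.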

ACOMPS: Adaptive Compression Scheme
Criteria to evaluate the performance of any compression technique X:
$$\mathrm{performance}_X = \mathrm{compression\_speed}_X \cdot W_{CS} + \mathrm{compression\_ratio}_X \cdot W_{CR}$$
User tunable parameters:
• $W_{CR}$ => compression ratio weighting for deciding the best compression method
• $W_{CS}$ => compression speed weighting for deciding the best compression method
• $\Delta$ => small delta limit to define the acceptance range
Variable A to be compressed at time step 0:
• First, determine the best technique, $T_X$. Record $\mathrm{Best}_A = T_X$ and $\mathrm{BestPerf}_A = \mathrm{performance}_{T_X}$.
• Then compress the data using technique $\mathrm{Best}_A$ and record $\mathrm{latestPerf}_A = \mathrm{performance}_{\mathrm{Best}_A}$.
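The time-step-0 analysis can be sketched as follows. Each candidate technique is scored with the weighted criterion (compression speed times W_CS plus compression ratio times W_CR), and the highest score is kept as Best_A and BestPerf_A. The technique table, including the "none" baseline, is hypothetical. ACOMPS would plug in its nine preprocessing + codec combinations instead.

```python
import time
import zlib


def performance(speed_mb_s, ratio, w_cs, w_cr):
    """performance_X = compression_speed_X * W_CS + compression_ratio_X * W_CR."""
    return speed_mb_s * w_cs + ratio * w_cr


def evaluate(compress, raw, w_cs, w_cr):
    """Compress raw bytes once and score the result with the weighted criterion."""
    start = time.perf_counter()
    out = compress(raw)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid dividing by zero
    return performance(len(raw) / 1e6 / elapsed, len(raw) / len(out), w_cs, w_cr)


def select_best(raw, techniques, w_cs, w_cr):
    """Try every technique on the data; return (Best_A, BestPerf_A)."""
    scores = {name: evaluate(fn, raw, w_cs, w_cr) for name, fn in techniques.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    raw = bytes(1_000_000)  # hypothetical, highly compressible variable
    techniques = {"none": lambda b: b, "zlib": zlib.compress}  # hypothetical table
    print(select_best(raw, techniques, w_cs=0.0, w_cr=1.0))  # "Only CR" weighting
```

In this sketch speed (MB/s) and ratio are summed directly, as in the formula. Because the two quantities are on very different scales, the weights also act as scale factors.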

ACOMPS: Adaptive Compression Scheme
Same performance criterion and user tunable parameters ($W_{CS}$, $W_{CR}$, $\Delta$) as above.
Variable A to be compressed at time step t:
• Compress the data using technique $\mathrm{Best}_A$ and record $\mathrm{latestPerf}_A = \mathrm{performance}_{\mathrm{Best}_A}$.
• Check whether $\mathrm{BestPerf}_A - \Delta < \mathrm{latestPerf}_A < \mathrm{BestPerf}_A + \Delta$.
  • Yes: compression performance did not change beyond the limit $\Delta$. Continue to use the current $\mathrm{Best}_A$.
  • No: determine the best technique, $T_X$, again. Record $\mathrm{Best}_A = T_X$ and $\mathrm{BestPerf}_A = \mathrm{performance}_{T_X}$.
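The per-variable loop with the small delta limit can be sketched as a small state machine. `evaluate(technique, data)` is a hypothetical hook that compresses the data with one technique and returns its weighted performance. The flowchart leaves open whether a re-analysis also recompresses the current step. This sketch assumes the new Best_A takes effect from the next time step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class VariableState:
    best: Optional[str] = None  # Best_A
    best_perf: float = 0.0      # BestPerf_A
    latest_perf: float = 0.0    # latestPerf_A


class AdaptiveSelector:
    """Per-variable technique selection with a +/- delta acceptance range."""

    def __init__(self, techniques: Iterable[str],
                 evaluate: Callable[[str, object], float], delta: float):
        self.techniques = list(techniques)
        self.evaluate = evaluate  # hypothetical hook: (technique, data) -> performance
        self.delta = delta
        self.states: Dict[str, VariableState] = {}
        self.analyses = 0  # number of full analyses run so far

    def _analyze(self, state: VariableState, data) -> None:
        """Try every technique and record Best_A and BestPerf_A."""
        self.analyses += 1
        scores = {t: self.evaluate(t, data) for t in self.techniques}
        state.best = max(scores, key=scores.get)
        state.best_perf = scores[state.best]

    def compress_step(self, variable: str, data) -> str:
        """Compress one time step of a variable; return the technique used."""
        state = self.states.setdefault(variable, VariableState())
        if state.best is None:  # time step 0: analyze before compressing
            self._analyze(state, data)
        used = state.best
        state.latest_perf = self.evaluate(used, data)
        lo, hi = state.best_perf - self.delta, state.best_perf + self.delta
        if not (lo < state.latest_perf < hi):
            # Performance drifted beyond delta: re-analyze (used from the next step).
            self._analyze(state, data)
        return used
```

A fake `evaluate` that looks up precomputed scores is enough to exercise the control flow. For example, `AdaptiveSelector(["lzo", "zlib"], lambda t, d: d[t], delta=0.1)` re-analyzes only when the current technique's score leaves the acceptance range.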

WRF-LETKF based climate simulations
[Figure: workflow of n parallel WRF ensembles and m parallel LETKF processes. Data pass between them as binary and netCDF files, with a convert step in between. LETKF uses the OBSERVATIONS to produce a revised state output based on the observed data, which is merged with the initial conditions to guide the simulation.]

WRF-LETKF based climate simulations
• Example: n = 55, m = 400; max grid size: 181 x 151 x 51
• Single cycle of WRF + single cycle of LETKF (9 hours of simulation time)
• Total simulation time (single WRF cycle + single LETKF cycle): ~65 minutes on the cluster
• High conversion cost: (9 x 55) files => 36.7 minutes => ~56% of the total simulation time
• Large output data size: ~283 GB
DRBSD-2
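The conversion figures on this slide follow from simple arithmetic. This reading assumes the factor 9 counts files per ensemble member and 55 is the number of WRF ensemble members:

$$
N_{\text{files}} = 9 \times 55 = 495, \qquad
\frac{T_{\text{conversion}}}{T_{\text{total}}} = \frac{36.7\ \text{min}}{65\ \text{min}} \approx 0.565
$$

So conversion alone takes a little over half of one WRF + LETKF cycle, which is the cost the ADIOS-based design removes.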

WRF-LETKF based climate simulations
[Figure: the same workflow, with WRF and LETKF reading and writing through a WRF ADIOS I/O plugin and a LETKF ADIOS I/O plugin on top of ADIOS + the ACOMPS data transformation plugin]
• Any format supported by ADIOS can be used
• No conversion required
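Architecturally, compression sits inside a shared I/O layer, so neither the producer (WRF) nor the consumer (LETKF) handles it, and no separate convert step is needed. The classes below are hypothetical and are not the ADIOS API. A dictionary stands in for files on the parallel file system, and zlib stands in for an adaptive per-variable choice.

```python
import zlib
from typing import Dict, Protocol


class Transform(Protocol):
    """Hypothetical data-transformation hook (not the real ADIOS API)."""

    def encode(self, data: bytes) -> bytes: ...

    def decode(self, data: bytes) -> bytes: ...


class ZlibTransform:
    """Stand-in transform; an adaptive scheme would pick a technique per variable here."""

    def encode(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def decode(self, data: bytes) -> bytes:
        return zlib.decompress(data)


class IOLayer:
    """Hypothetical I/O layer shared by the writer and the reader."""

    def __init__(self, transform: Transform):
        self.transform = transform
        self.store: Dict[str, bytes] = {}  # stands in for files on Lustre

    def write(self, name: str, data: bytes) -> None:
        self.store[name] = self.transform.encode(data)

    def read(self, name: str) -> bytes:
        return self.transform.decode(self.store[name])


if __name__ == "__main__":
    io = IOLayer(ZlibTransform())
    io.write("T", bytes(40_000))      # producer side (e.g., a WRF member)
    print(len(io.store["T"]), len(io.read("T")))  # consumer side (e.g., LETKF)
```

Because both sides go through the same layer, the stored format is an internal detail of that layer rather than something each application must convert.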

Results
Experimental Setup
• Deepthought2: campus cluster at UMD
  • Number of nodes: 484 with 20 cores/node + 4 nodes with 40 cores/node
  • Memory/node: ~128 GB (DDR3 at 1866 MHz)
  • Processor: dual Intel Ivy Bridge E5-2680v2 at 2.8 GHz
  • Parallel file system: Lustre
• Climate simulations with WRF-LETKF
  • WRF ensemble: n = 55 => each member uses 1 node; no. of MPI processes = (55 x 20) = 1100
  • LETKF => uses 20 nodes; no. of MPI processes = (20 x 20) = 400
  • Domain size: 181 x 151 grid cells; vertical levels: 51
  • Majority of variables are float type
  • 3D variables: XLAT, XLONG, F, T …
  • 4D variables: U, V, W, P, PB, RAINC …

Adaptive vs. Non-adaptive Methods: Output Sizes
[Figure: output sizes (in gigabytes) for netCDF + Binary, ADIOS, ADIOS + Zlib, ADIOS + Bzip2, ADIOS + LZO, ADIOS + ACOMPS (Only CS), and ADIOS + ACOMPS (Only CR); bar values: 283, 191, 80.03, 73, 72.6, 67.5, 62.8]
• ACOMPS achieves better compression
  • 77% reduction in size relative to the original output
  • 13% smaller than ADIOS + Bzip2
• Only CR (best compression ratio, slower) => $W_{CR} = 1$, $W_{CS} = 0$
• Only CS (best speed, not as good compression) => $W_{CR} = 0$, $W_{CS} = 1$
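The two percentages can be checked against the chart values under three assumptions: the 283 GB bar is the original netCDF + Binary output, the 62.8 GB bar is ADIOS + ACOMPS (Only CR), and the 72.6 GB bar is ADIOS + Bzip2:

$$
1 - \frac{62.8\ \text{GB}}{283\ \text{GB}} \approx 0.778, \qquad
1 - \frac{62.8\ \text{GB}}{72.6\ \text{GB}} \approx 0.135
$$

That gives about 78% and 13.5%, in line with the slide's 77% and 13%.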

Adaptive vs. Non-adaptive Methods: Compression Time
[Figure: end-to-end times in minutes (0 to 70), with components for WRF preprocessing, WRF, conversion, and LETKF, for netCDF + Binary, ADIOS, ADIOS + Zlib, ADIOS + Bzip2, ADIOS + LZO, ADIOS + ACOMPS (Only CS), and ADIOS + ACOMPS (Only CR)]
• ACOMPS incurs low overhead
  • Close to the fastest configuration, ADIOS + LZO
  • Lower overhead than ADIOS + Bzip2, with much better compression
• Only CR (best compression ratio, slower) => $W_{CR} = 1$, $W_{CS} = 0$
• Only CS (best speed, not as good compression) => $W_{CR} = 0$, $W_{CS} = 1$

Future Directions
• Extend ACOMPS to support more compression methods, including both lossless and lossy methods
• Thoroughly analyze how the best compression method for a given variable changes over time
  • How often is it advantageous to redo the analysis?
  • How can the criteria for deciding when to re-evaluate be enhanced, so that ACOMPS adapts to changes quickly?
• Parallelize the analysis phase using threads

Adaptive Compression to Improve I/O Performance for Climate Simulations
Swati Singhal (swati@cs.umd.edu), Alan Sussman (als@cs.umd.edu)
UMIACS and Department of Computer Science
The 2nd International Workshop on Data Reduction for Big Scientific Data

Related Work
Lossless Compression
• E. R. Schendel, Y. Jin, N. Shah, J. Chen, C. Chang, S.-H. Ku, S. Ethier, S. Klasky, R. Latham, R. Ross, and N. F. Samatova, "ISOBAR preconditioner for effective and high-throughput lossless data compression," ICDE, 2012.
• M. Burtscher and P. Ratanaworabhan, "FPC: A high-speed compressor for double-precision floating-point data," IEEE Transactions on Computers, 2009.
• M. Burtscher and P. Ratanaworabhan, "gFPC: A self-tuning compression algorithm," Data Compression Conference (DCC), 2010.
• I. H. Witten, R. M. Neal, and J. G. Cleary, "Arithmetic coding for data compression," Communications of the ACM, 1987.
• S. Bhattacherjee, A. Deshpande, and A. Sussman, "PStore: An efficient storage framework for managing scientific data," SSDBM, 2014.
• S. W. Son, Z. Chen, W. Hendrix, A. Agrawal, W.-k. Liao, and A. Choudhary, "Data compression for the exascale computing era – survey," Journal of Supercomputing Frontiers and Innovations, 2014.
Lossy Compression
• S. Di and F. Cappello, "Fast error-bounded lossy HPC data compression with SZ," IPDPS, 2016.
• W. Austin, G. Ballard, and T. G. Kolda, "Parallel tensor compression for large-scale scientific data," IPDPS, 2016.
• M. Burtscher, H. Mukka, A. Yang, and F. Hesaaraki, "Real-time synthesis of compression algorithms for scientific data," SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, 2016.
