Introduction to HDF5: Hierarchical Data Format

A Brief Introduction to HDF5
Quincey Koziol
Director of Core Software and HPC
The HDF Group
koziol@hdfgroup.org
March 5, 2015
HPC Oil & Gas Workshop
http://bit.ly/HDF5-HPCOGW-2015
Why use HDF5?
Challenging Data:
Application data that pushes the limits of traditional
solutions.
Software Solutions:
For very large and/or complex data
With very fast access requirements
Easily share data across platforms
Use different programming languages and OSs.
Take advantage of the tools that understand HDF5.
Enable long-term preservation of data.
HDF5 is like …
What is HDF5?
 
HDF5 == Hierarchical Data Format, v5
 
 
A flexible data model
Structures for data organization and specification
Open source software
Implements the data model
Portable file format
Designed for high volume or complex data
HDF5 Data Model
Groups – provide structure among objects
Datasets – where the primary data goes
Data arrays
Rich set of datatype options
Flexible, efficient storage and I/O
Attributes - for metadata
 
Everything else is built essentially from these parts.
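To make the three building blocks concrete, here is a toy in-memory model of groups, datasets, and attributes. It is only a sketch of the data model's shape and is not the HDF5 library or any of its language bindings; the class names, method names, and the "/survey/traces" path are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Toy dataset: a data array, a datatype label, and attributes."""
    data: list
    dtype: str
    attrs: dict = field(default_factory=dict)


@dataclass
class Group:
    """Toy group: a named collection of groups and datasets, plus attributes."""
    members: dict = field(default_factory=dict)
    attrs: dict = field(default_factory=dict)

    def create_group(self, name):
        group = Group()
        self.members[name] = group
        return group

    def create_dataset(self, name, data, dtype):
        dataset = Dataset(list(data), dtype)
        self.members[name] = dataset
        return dataset

    def lookup(self, path):
        # Walk a slash-separated path such as "/survey/traces".
        node = self
        for part in path.strip("/").split("/"):
            if part:
                node = node.members[part]
        return node


# Hypothetical content: one group holding one dataset with one attribute.
root = Group()
survey = root.create_group("survey")
traces = survey.create_dataset("traces", [0.1, 0.4, -0.2], "float32")
traces.attrs["units"] = "mV"
```

The hierarchy comes entirely from groups containing other objects, the primary data lives in datasets, and attributes carry small pieces of metadata on either kind of object.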
HDF5 Software
HDF5 home page:
 
  
http://hdfgroup.org/HDF5/
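Useful Tools For New Users
h5dump, h5ls: Tools to dump or list contents of HDF5 file
HDFView: Java browser for HDF5 files
http://www.hdfgroup.org/hdf-java-html/hdfview/
HDF5 Examples (C, Fortran, Java, Python, Matlab)
http://www.hdfgroup.org/ftp/HDF5/examples/
h5cc, h5c++, h5fc: Scripts to compile applications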
Recent HPC Success Story
Performance results on Blue Waters @ NCSA
I/O Kernel of a DOE Plasma Physics
application
Running on 298,048 cores
~10 Trillion particles
Single 291 TB HDF5 file
Achieved 52 GB/s
~50% of the peak performance
Using 1 GB stripe size and 160 Lustre OSTs
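A quick back-of-the-envelope check puts these figures in perspective. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and that the 52 GB/s rate was sustained over the whole file; the slide states neither, so the derived numbers are rough estimates only.

```python
# Figures quoted on the slide; decimal units are an assumption.
TB = 10**12
GB = 10**9

file_bytes = 291 * TB          # single HDF5 file
rate_bytes_per_s = 52 * GB     # achieved write rate
cores = 298_048
particles = 10 * 10**12        # "~10 trillion"

write_seconds = file_bytes / rate_bytes_per_s    # time to write the file
bytes_per_core = file_bytes / cores              # data contributed per core
bytes_per_particle = file_bytes / particles      # storage per particle
implied_peak_gb_s = 52 / 0.50                    # if 52 GB/s is ~50% of peak

print(f"write time: {write_seconds / 60:.0f} minutes")
print(f"per core: {bytes_per_core / 1e6:.0f} MB")
print(f"per particle: {bytes_per_particle:.1f} bytes")
print(f"implied peak: {implied_peak_gb_s:.0f} GB/s")
```

Under those assumptions the file takes roughly an hour and a half to write, each core contributes a little under 1 GB, and the implied peak is about 104 GB/s.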
HDF5 in Oil & Gas
RESQML: Standard for reservoir data
(Energistics)
http://www.energistics.org/reservoir/resqml-standards/current-standards
H5EM-TS: Exchange standard for field EM data
(EMGS, Statoil, Interaction)
ftp://fileformats.emgs.com/H5EM-TS_1.0/documentation/H5EM-TS_information_sheet.pdf
HDF5 in Oil & Gas
TEMHDF: Exchange standard for
MetalMapper and other EMI data
ftp://geom.geometrics.com/pub/Data/TEM2H5_Deliverables/TEM2HDF_RefManual.pdf
PH5: Archival format for active source seismic
data (moving away from SEG-Y, to HDF5)
http://www.passcal.nmt.edu/content/ph5-what-it
 
Petrel: E&P Workflow and Visualization
http://www.software.slb.com/products/platform/Pages/petrel.aspx
HDF5 in Oil & Gas
Globe Claritas: HDF5 is the format for their seismic
processing software
SEG-Y vs. HDF5 Whitepaper:
http://www.globeclaritas.com/content/download/10303/55223/file/HDF5%20For%20Seismic%20Reflection%20Datasets.pdf
News release:
http://www.globeclaritas.com/Claritas/Overview/Latest-Release
PDF data sheet:
http://www.globeclaritas.com/content/download/8839/47774/file/Claritas%20HDF5.pdf
Powerpoint:
http://www.slideshare.net/guy_maslen/a-quick-start-guide-to-using-hdf5-in-globe-claritas
Where We’ll Be Soon: HDF5 1.10
Beta release: Fall 2015
Major Features:
Single-Writer/Multiple-Reader (SWMR)
Virtual Datasets
Improved scalability of chunked datasets
Parallel I/O performance and capabilities
Other Items of Interest
We’re not planning to change current
multi-threaded concurrency behavior
HDF5 Excel Add-in: HEXAD
REST-based service for HDF5 data
HDF Compass visualization package
Thank You!
Questions & Comments?
 
The HDF Group Services
Helpdesk and Mailing Lists
Available to all users as a first level of support:
help@hdfgroup.org, hdf-forum@lists.hdfgroup.org
Priority Support
Rapid issue resolution and advice
Consulting
Needs assessment, troubleshooting, design reviews, etc.
Training
Tutorials and hands-on practical experience
Enterprise Support
Coordinate HDF activities across departments
Special Projects
Adapting customer applications to HDF
New features and tools
Research and Development
HDF5 1.10 Planned Features: SWMR
Improves HDF5 for Data Acquisition:
Allows simultaneous data gathering and
monitoring/analysis
Focused on storing data sequences for
high-speed data sources
Supports ‘Ordered Updates’ to file:
Crash-proofs accessing HDF5 file
Possibly uses small amount of extra space
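The idea behind ordered updates can be illustrated with a plain binary file: write the new data first, and only then publish the metadata that tells readers it exists. The sketch below is a generic illustration of that ordering, not HDF5's file format or its SWMR implementation; the file layout and the names `create`, `append`, and `read_committed` are hypothetical.

```python
import os
import struct
import tempfile

HEADER = struct.Struct("<Q")   # committed record count, stored at offset 0
RECORD = struct.Struct("<d")   # one float64 sample per record


def create(path):
    """Create an empty log whose header says zero records are committed."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(0))


def append(path, values):
    """Single writer: write the data first, then publish the new count."""
    with open(path, "r+b") as f:
        f.seek(0)
        (count,) = HEADER.unpack(f.read(HEADER.size))
        # Step 1: place new records just past the committed region.
        f.seek(HEADER.size + count * RECORD.size)
        for value in values:
            f.write(RECORD.pack(value))
        f.flush()
        os.fsync(f.fileno())
        # Step 2: only now update the count that readers rely on.
        f.seek(0)
        f.write(HEADER.pack(count + len(values)))
        f.flush()
        os.fsync(f.fileno())


def read_committed(path):
    """Reader: trust only what the header says is committed."""
    with open(path, "rb") as f:
        (count,) = HEADER.unpack(f.read(HEADER.size))
        data = f.read(count * RECORD.size)
    return [RECORD.unpack_from(data, i * RECORD.size)[0] for i in range(count)]


# Hypothetical usage: a writer appends while a reader polls the same file.
log_path = os.path.join(tempfile.mkdtemp(), "samples.bin")
create(log_path)
append(log_path, [1.0, 2.0])
print(read_committed(log_path))
```

If the writer stops between the two steps, a reader still sees a consistent file containing the previously committed records. The cost is that an interrupted append can leave a few unreferenced bytes behind, which matches the slide's note about a small amount of extra space.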
HDF5 1.10 Planned Features
Virtual Object Layer (VOL)
Provides the HDF5 data model and API, but
allows different underlying storage
mechanisms
Intercepts all HDF5 API calls that can touch
the data on disk and routes them to a VOL
plugin
Possibly SEG-Y VOL plugin?
HDF5 1.10 Planned Features
‘Virtual’ Datasets
Can “stitch together” multiple ‘source’
datasets into a single ‘virtual’ dataset
Supports unlimited dimensions in both source
and virtual datasets
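The "stitching" idea can be sketched as an index mapping: a virtual view holds no data of its own and forwards each lookup to the source that owns that position. The class below is a one-dimensional toy over Python sequences, not the HDF5 virtual dataset API; `VirtualView` and its behaviour are assumptions made for illustration.

```python
from bisect import bisect_right


class VirtualView:
    """Read-only 1-D view that concatenates several source sequences."""

    def __init__(self, sources):
        self.sources = list(sources)
        self.offsets = []          # starting position of each source in the view
        total = 0
        for source in self.sources:
            self.offsets.append(total)
            total += len(source)
        self.length = total

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if index < 0:
            index += self.length
        if not 0 <= index < self.length:
            raise IndexError(index)
        # Find the last source whose starting offset is <= index.
        k = bisect_right(self.offsets, index) - 1
        return self.sources[k][index - self.offsets[k]]


# Hypothetical usage: three per-sensor arrays presented as one dataset.
view = VirtualView([[1, 2, 3], [], [4, 5]])
print([view[i] for i in range(len(view))])
```

This toy fixes the offsets at construction time, so it would not track sources that grow later; supporting unlimited dimensions, as the slide describes, requires recomputing the mapping as the sources are extended.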
HDF5 1.10 Planned Features: Chunk Imp.
HDF5 1.10 Planned Features: HPC
Continue to improve our use of MPI and
parallel file system features
Remove ‘truncate’ operation on file close, etc.
Reduce # of I/O accesses for metadata access
Collective Read/Write of metadata
Multi-dataset Collective I/O
Support for compression in parallel
Collective access mode only
Possibly support Single-Writer/Multiple-Reader
(SWMR) access in parallel
HDF5 Roadmap
Concurrency
Single-Writer/Multiple-Reader (SWMR)
Internal threading
Virtual Object Layer (VOL)
Data Analysis
Query / View / Index APIs
Native HDF5 client/server
Performance
Scalable chunk indices
Metadata aggregation and Page buffering
Asynchronous I/O
Variable-length records
Fault tolerance
Parallel I/O
I/O Autotuning
The best way to predict the
future is to invent it.
– Alan Kay
Where We’re Not Going
We’re not changing multi-threaded
concurrency support
Keep “global lock” on library
Will focus on asynchronous I/O instead
Will be using threads internally though
Codename “HEXAD”
HDF5 Excel Add-in: HEXAD
Lets you do the usual things including:
Display content (file structure, detailed object info)
Create/read/write datasets
Create/read/update attributes
Plenty of ideas for bells & whistles
HDF5 Image & PyTables support, etc.
Send in your Must Have/Nice To Have list!*
Stay tuned for the beta program
* help@hdfgroup.org
HDF Server
REST-based service for HDF5 data
Reference Implementation for REST API
Developed in Python using Tornado Framework
Supports Read/Write operations
Clients can be Python/C/Fortran or Web Page
Let us know what specific features you’d like to
see.
HDF Compass
“Simple” Python HDF5 Viewer application
Cross platform (Windows/Mac/Linux)
Native look and feel
Can display extremely large HDF5 files
View HDF5 files and OpenDAP resources
Plugin model enables different file
formats/remote resources to be supported
Community-based development model
Brief History of HDF
1987: NCSA (University of Illinois) forms a task force to create an architecture-independent file format and library, which becomes HDF
Early 1990’s: NASA adopts HDF for Earth Observing System project
1996: DOE collaborates with the HDF group (at NCSA) to create “Big HDF”, which becomes HDF5
1998: HDF5 released, with support from DOE, NASA & NCSA
2006: The HDF Group spins out of University of Illinois as non-profit corporation
The HDF Group
Established in 1988
18 years at University of Illinois’ National Center
for Supercomputing Applications
8 years as independent non-profit company:
“The HDF Group”
The HDF Group owns HDF4 and HDF5
HDF4 & HDF5 formats, libraries, and tools are
open source and freely available with BSD-style
license

HDF5 combines a flexible data model, open source software, and a portable file format, and is designed for large or complex data with fast access requirements. The data model is built from groups, datasets, and attributes, with flexible, efficient storage and I/O. Tools such as h5dump, h5ls, and HDFView, along with the published examples, help new users work with HDF5 files.

  • HDF5
  • Hierarchical Data Format
  • Data Model
  • Data Organization
  • Large Data

Uploaded on Oct 02, 2024



Presentation Transcript


  1. The HDF Group A Brief Introduction to HDF5 Quincey Koziol Director of Core Software and HPC The HDF Group koziol@hdfgroup.org http://bit.ly/HDF5-HPCOGW-2015 March 5, 2015 HPC Oil & Gas Workshop 1 www.hdfgroup.org

  2. Why use HDF5? Challenging Data: Application data that pushes the limits of traditional solutions. Software Solutions: For very large and/or complex data. With very fast access requirements. Easily share data across platforms. Use different programming languages and OSs. Take advantage of the tools that understand HDF5. Enable long-term preservation of data.

  3. HDF5 is like …

  4. What is HDF5? HDF5 == Hierarchical Data Format, v5. A flexible data model: Structures for data organization and specification. Open source software: Implements the data model. Portable file format: Designed for high volume or complex data.

  5. HDF5 Data Model. Groups: provide structure among objects. Datasets: where the primary data goes (data arrays; rich set of datatype options; flexible, efficient storage and I/O). Attributes: for metadata. Everything else is built essentially from these parts.

  6. HDF5 Software. HDF5 home page: http://hdfgroup.org/HDF5/

  7. Useful Tools For New Users. h5dump, h5ls: Tools to dump or list contents of HDF5 file. HDFView: Java browser for HDF5 files ( http://www.hdfgroup.org/hdf-java-html/hdfview/ ). HDF5 Examples in C, Fortran, Java, Python, Matlab ( http://www.hdfgroup.org/ftp/HDF5/examples/ ). h5cc, h5c++, h5fc: Scripts to compile applications.

  8. Recent HPC Success Story. Performance results on Blue Waters @ NCSA. I/O Kernel of a DOE Plasma Physics application. Running on 298,048 cores. ~10 Trillion particles. Single 291 TB HDF5 file. Achieved 52 GB/s, ~50% of the peak performance. Using 1 GB stripe size and 160 Lustre OSTs.

  9. HDF5 in Oil & Gas. RESQML: Standard for reservoir data (Energistics) http://www.energistics.org/reservoir/resqml-standards/current-standards H5EM-TS: Exchange standard for field EM data (EMGS, Statoil, Interaction) ftp://fileformats.emgs.com/H5EM-TS_1.0/documentation/H5EM-TS_information_sheet.pdf

  10. HDF5 in Oil & Gas. TEMHDF: Exchange standard for MetalMapper and other EMI data ftp://geom.geometrics.com/pub/Data/TEM2H5_Deliverables/TEM2HDF_RefManual.pdf PH5: Archival format for active source seismic data (moving away from SEG-Y, to HDF5) http://www.passcal.nmt.edu/content/ph5-what-it Petrel: E&P Workflow and Visualization http://www.software.slb.com/products/platform/Pages/petrel.aspx

  11. HDF5 in Oil & Gas. Globe Claritas: HDF5 is the format for their seismic processing software. SEG-Y vs. HDF5 Whitepaper: http://www.globeclaritas.com/content/download/10303/55223/file/HDF5%20For%20Seismic%20Reflection%20Datasets.pdf News release: http://www.globeclaritas.com/Claritas/Overview/Latest-Release PDF data sheet: http://www.globeclaritas.com/content/download/8839/47774/file/Claritas%20HDF5.pdf Powerpoint: http://www.slideshare.net/guy_maslen/a-quick-start-guide-to-using-hdf5-in-globe-claritas

  12. Where We’ll Be Soon: HDF5 1.10. Beta release: Fall 2015. Major Features: Single-Writer/Multiple-Reader (SWMR); Virtual Datasets; Improved scalability of chunked datasets; Parallel I/O performance and capabilities.

  13. Other Items of Interest. We’re not planning to change current multi-threaded concurrency behavior. HDF5 Excel Add-in: HEXAD. REST-based service for HDF5 data. HDF Compass visualization package.

  14. The HDF Group. Thank You! Questions & Comments?

  15. The HDF Group Services. Helpdesk and Mailing Lists: Available to all users as a first level of support: help@hdfgroup.org, hdf-forum@lists.hdfgroup.org. Priority Support: Rapid issue resolution and advice. Consulting: Needs assessment, troubleshooting, design reviews, etc. Training: Tutorials and hands-on practical experience. Enterprise Support: Coordinate HDF activities across departments. Special Projects: Adapting customer applications to HDF; New features and tools; Research and Development.

  16. HDF5 1.10 Planned Features: SWMR. Improves HDF5 for Data Acquisition: Allows simultaneous data gathering and monitoring/analysis. Focused on storing data sequences for high-speed data sources. Supports ‘Ordered Updates’ to file: Crash-proofs accessing HDF5 file. Possibly uses small amount of extra space.

  17. HDF5 1.10 Planned Features: Virtual Object Layer (VOL). Provides the HDF5 data model and API, but allows different underlying storage mechanisms. Intercepts all HDF5 API calls that can touch the data on disk and routes them to a VOL plugin. Possibly SEG-Y VOL plugin?

  18. HDF5 1.10 Planned Features: ‘Virtual’ Datasets. Can “stitch together” multiple ‘source’ datasets into a single ‘virtual’ dataset. Supports unlimited dimensions in both source and virtual datasets.

  19. HDF5 1.10 Planned Features: Chunk Imp. Table columns: Dataset type | Index type | Space improvements | Speed improvements. Row 1: no unlimited dimensions, no I/O filters, no missing chunks | implicit (no actual chunk index) | Same storage space as contiguous dataset storage (no index) | Constant time lookups, Faster parallel I/O. Row 2: no unlimited dimensions | fixed sized, smaller chunk index | Smaller index overhead | Constant time lookups. Row 3: 1 unlimited dimension | extensible array | Smaller index overhead | Constant time lookups and appends. Row 4: 2+ unlimited dimension | Improved B-tree* | Smaller index overhead | Faster.

  20. HDF5 1.10 Planned Features: HPC. Continue to improve our use of MPI and parallel file system features: Remove ‘truncate’ operation on file close, etc. Reduce # of I/O accesses for metadata access: Collective Read/Write of metadata. Multi-dataset Collective I/O. Support for compression in parallel: Collective access mode only. Possibly support Single-Writer/Multiple-Reader (SWMR) access in parallel.

  21. HDF5 Roadmap. Concurrency; Single-Writer/Multiple-Reader (SWMR); Internal threading; Virtual Object Layer (VOL); Data Analysis; Query / View / Index APIs; Native HDF5 client/server; Performance; Scalable chunk indices; Metadata aggregation and Page buffering; Asynchronous I/O; Variable-length records; Fault tolerance; Parallel I/O; I/O Autotuning. “The best way to predict the future is to invent it.” (Alan Kay)

  22. Where We’re Not Going. We’re not changing multi-threaded concurrency support. Keep “global lock” on library. Will focus on asynchronous I/O instead. Will be using threads internally though.

  23. Codename “HEXAD”. HDF5 Excel Add-in: HEXAD. Lets you do the usual things including: Display content (file structure, detailed object info); Create/read/write datasets; Create/read/update attributes. Plenty of ideas for bells & whistles: HDF5 Image & PyTables support, etc. Send in your Must Have/Nice To Have list!* Stay tuned for the beta program. * help@hdfgroup.org

  24. HDF Server. REST-based service for HDF5 data. Reference Implementation for REST API. Developed in Python using Tornado Framework. Supports Read/Write operations. Clients can be Python/C/Fortran or Web Page. Let us know what specific features you’d like to see.

  25. HDF Compass. “Simple” Python HDF5 Viewer application. Cross platform (Windows/Mac/Linux). Native look and feel. Can display extremely large HDF5 files. View HDF5 files and OpenDAP resources. Plugin model enables different file formats/remote resources to be supported. Community-based development model.

  26. Brief History of HDF. 1987: NCSA (University of Illinois) forms a task force to create an architecture-independent file format and library, which becomes HDF. Early 1990’s: NASA adopts HDF for Earth Observing System project. 1996: DOE collaborates with the HDF group (at NCSA) to create “Big HDF”, which becomes HDF5. 1998: HDF5 released, with support from DOE, NASA & NCSA. 2006: The HDF Group spins out of University of Illinois as non-profit corporation.

  27. The HDF Group. Established in 1988. 18 years at University of Illinois’ National Center for Supercomputing Applications. 8 years as independent non-profit company: “The HDF Group”. The HDF Group owns HDF4 and HDF5. HDF4 & HDF5 formats, libraries, and tools are open source and freely available with BSD-style license.
