Enhancing Data Archiving and Stewardship Efforts

Data archiving and stewardship
Presentation of initial draft by Martine De Mazière and Tove Svendby
Alberto Redondas and Victoria Sofieva, Rapporteurs
Data centers
Recommendations from ORM-11
Continue to encourage data providers to submit or link to established databases to avoid
a proliferation of databases, and especially to avoid the loss of data after the end of a
measurement or (inter)calibration campaign or project, and to enable possible
reprocessing of the data.
Further target enhanced linkage among data centres. This requires that data centres
coordinate more and make further progress with the exchange of metadata and
interoperability.
The ORM-11 recommendation remains up to date:
“The creation of central data portals (e.g., at the World Data Centres) that
provide visibility and linkage to the ensemble of existing data centres for ozone
research-related data would enhance the possibility of making synergistic use
of all the data and, as such, increase the effectiveness and valorization of data
acquisition efforts”
Enhanced coordination and collaboration between data centers on data formats,
especially on metadata and on data availability and discovery, would be useful.
ORM-12 recommendation:
WMO World Data Centers to urgently advance data interoperability.
Traceability and metadata
The delegates encourage instrument-based central processing, with
storage of the raw data and metadata, which allows reproducibility and
reprocessing and improves uncertainty evaluation and data harmonization.
The delegates emphasized the importance of including all metadata
that are required for data use and reprocessing. This is especially
important when the data have been converted (for example, from a pressure
grid to an altitude grid) to meet the standards of the data centres.
The format and content of the metadata (dataset- or community-specific)
should be negotiated. GCOS recommendations on metadata should be considered.
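The point about documenting grid conversions can be illustrated with a minimal sketch. It assumes a simple isothermal log-pressure relation with a 7 km scale height and a 1013.25 hPa reference pressure, chosen only for illustration; a data centre would use its own documented method. The function names and metadata keys are hypothetical, not part of any data centre standard. What matters is that the original grid, the method and its parameters travel with the converted profile, so the conversion can be undone or repeated.

```python
import math

# Assumed constants for an isothermal atmosphere (illustrative only).
SCALE_HEIGHT_KM = 7.0
SURFACE_PRESSURE_HPA = 1013.25


def pressure_to_altitude_km(pressure_hpa,
                            scale_height_km=SCALE_HEIGHT_KM,
                            surface_pressure_hpa=SURFACE_PRESSURE_HPA):
    """Log-pressure altitude: z = -H * ln(p / p0)."""
    if pressure_hpa <= 0:
        raise ValueError("pressure must be positive")
    return -scale_height_km * math.log(pressure_hpa / surface_pressure_hpa)


def convert_profile(pressures_hpa, values, units):
    """Convert a profile to an altitude grid and record how it was done."""
    altitudes = [pressure_to_altitude_km(p) for p in pressures_hpa]
    return {
        "altitude_km": altitudes,
        "values": list(values),
        # Hypothetical metadata keys: everything needed to reverse or
        # repeat the conversion is stored next to the converted data.
        "metadata": {
            "units": units,
            "original_vertical_coordinate": "pressure_hPa",
            "original_grid": list(pressures_hpa),
            "conversion_method": "isothermal log-pressure altitude",
            "conversion_parameters": {
                "scale_height_km": SCALE_HEIGHT_KM,
                "surface_pressure_hPa": SURFACE_PRESSURE_HPA,
            },
        },
    }
```

With the parameters stored in the record, a later user can recover the original pressure levels from the altitudes instead of guessing which assumptions were applied.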
Encouraging full curation of data
Digitize and curate historical data
This is important
It is a big effort; resources should be allocated
Long-term data preservation – ground-based and satellite data
Curation of metadata
Curation of data processing software
Experience for the future
Save enough information for future data reprocessing
Strongly encourage the full curation of data, including historical data. In particular, the curated data should
include all metadata and ancillary data.
Address the need to allocate resources for digitizing and curating historical data for ozone and related species,
as well as for ancillary data (e.g., laboratory spectroscopic data, station information), where available and
before the information and knowledge get lost, in order to include the data in modern database systems.
Data availability and traceability
Data availability must be implemented according to FAIR data
principles. This is supported by the assignment of a DOI and data
license to the data sets. Data publishing with an associated DOI
should be encouraged to provide data to the scientific community
and to give recognition to scientists and the funding agencies for
providing the data. This may also offer a good solution for the
archiving (including traceability) of model output or individual data sets,
and for the versioning of data processing codes.
An open data policy is recommended, but with the requirement to
give appropriate credit to the data originator. A way must be found
to ensure that these credits are given, as they are often taken as a
key performance indicator by funding agencies.
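As a rough illustration of how a DOI and a license support both findability and credit, the sketch below models a minimal dataset record and builds a citation string from it. The class name, the field names and the example values (including the DOI, which is a placeholder and does not resolve) are assumptions made for this example; they do not follow any particular data centre schema.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical minimal record for a published dataset."""
    title: str
    creators: tuple   # data originators who must be credited
    publisher: str    # e.g. the hosting data centre
    year: int
    version: str      # a new version should get its own record
    doi: str          # persistent identifier
    license: str      # e.g. an SPDX identifier such as "CC-BY-4.0"

    def citation(self):
        """Build a citation string so that credit is easy to give."""
        authors = "; ".join(self.creators)
        return (f"{authors} ({self.year}). {self.title}, "
                f"version {self.version}. {self.publisher}. "
                f"https://doi.org/{self.doi}")


def missing_fields(record):
    """Return the names of fields left empty, e.g. a missing DOI or license."""
    return [name for name, value in asdict(record).items() if not value]


# Placeholder values only; the DOI below is not a real identifier.
example = DatasetRecord(
    title="Example total ozone column series",
    creators=("Doe, J.",),
    publisher="Example Data Centre",
    year=2024,
    version="1.0",
    doi="10.0000/example",
    license="CC-BY-4.0",
)
```

A check such as `missing_fields` could run at submission time, so that a dataset without a DOI or license is flagged before it is published.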
Data format
A user-friendly data format is recommended.
A decision about a common data format and metadata standard would
facilitate the exploitation of data retrieved from different data centres.
Several common data standards, such as netCDF-CF or GEOMS HDF, are used by
several Earth observation communities (e.g., satellite data providers and the
climate modelling community) and are supported by a number of tools for
extracting and visualizing the data.
It is most important that the formats enable a good structuring of the data
and metadata; the ‘packaging’ of data and metadata, whether netCDF,
HDF or something else, is less important, as there are many tools available to
convert from one to another. Data centres should make the data available in
different standard formats or provide the appropriate conversion tools.
More and sustainable resources should be allocated to the data centres.
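The remark that structure matters more than packaging can be shown with a small sketch using only the Python standard library. It writes one dataset (global attributes plus variables with units) to two different packagings, JSON and a CSV file with a commented header, and reads both back to the same structure. The layout, the `# key=value` header convention and the function names are assumptions invented for this example; they stand in for real formats such as netCDF or HDF and do not reproduce them.

```python
import csv
import io
import json


def to_json(dataset):
    """Package the dataset as JSON text."""
    return json.dumps(dataset, sort_keys=True)


def from_json(text):
    return json.loads(text)


def to_csv(dataset):
    """Package the same dataset as CSV with metadata in '#' header lines."""
    buf = io.StringIO()
    for key, value in dataset["attributes"].items():
        buf.write(f"# {key}={value}\n")
    names = list(dataset["variables"])
    for name in names:
        buf.write(f"# units:{name}={dataset['variables'][name]['units']}\n")
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(names)
    for row in zip(*(dataset["variables"][n]["data"] for n in names)):
        writer.writerow(row)
    return buf.getvalue()


def from_csv(text):
    """Rebuild the structured dataset from the CSV packaging."""
    attributes, units, body = {}, {}, []
    for line in text.splitlines():
        if line.startswith("# units:"):
            name, unit = line[len("# units:"):].split("=", 1)
            units[name] = unit
        elif line.startswith("# "):
            key, value = line[2:].split("=", 1)
            attributes[key] = value
        else:
            body.append(line)
    rows = list(csv.reader(body))
    names, columns = rows[0], list(zip(*rows[1:]))
    variables = {
        name: {"units": units[name], "data": [float(x) for x in columns[i]]}
        for i, name in enumerate(names)
    }
    return {"attributes": attributes, "variables": variables}


# Illustrative dataset; station name and values are made up.
example = {
    "attributes": {"station": "Example Station", "data_version": "1.0"},
    "variables": {
        "altitude": {"units": "km", "data": [10.0, 20.0, 30.0]},
        "ozone": {"units": "ppmv", "data": [0.5, 5.2, 7.9]},
    },
}
```

Because both packagings carry the same attributes, variable names and units, `from_csv(to_csv(example))` and `from_json(to_json(example))` return the same structure, which is what makes conversion tools between formats straightforward.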
Other recommendations
Satellite overpass data coincident with ground-based network stations
must be readily available.
There is progress in addressing this (e.g., tools at EVDC).
Potentially, the same overpass dataset can be created with
models/reanalyses (e.g., NDACC stores MERRA-2 ‘overpass’ data).
A service through Copernicus?
Campaign data should be stored together with metadata and
potentially also made publicly available.
Calibration data must be preserved.
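For readers unfamiliar with overpass data, the sketch below shows one common way to define it: keep the satellite pixels that fall within a distance radius of a station and within a time window of the ground-based measurement. The 100 km radius, the 3 hour window, the dictionary keys and the function names are assumptions for illustration; they are not the criteria used by EVDC, NDACC or any other service named above.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def select_overpasses(station, pixels,
                      max_distance_km=100.0,
                      max_time_diff=timedelta(hours=3)):
    """Return the pixels coincident with the station measurement.

    `station` and each pixel are dicts with 'lat', 'lon' and 'time' keys
    (a hypothetical layout chosen for this example).
    """
    selected = []
    for pixel in pixels:
        distance = haversine_km(station["lat"], station["lon"],
                                pixel["lat"], pixel["lon"])
        time_diff = abs(pixel["time"] - station["time"])
        if distance <= max_distance_km and time_diff <= max_time_diff:
            # Keep the distance so the coincidence criteria stay traceable.
            selected.append({**pixel, "distance_km": distance})
    return selected


# Made-up station and pixels, used only to exercise the selection.
t0 = datetime(2024, 1, 1, 12, 0)
station = {"lat": 0.0, "lon": 0.0, "time": t0}
pixels = [
    {"lat": 0.0, "lon": 0.5, "time": t0 + timedelta(hours=1)},  # close in space and time
    {"lat": 0.0, "lon": 2.0, "time": t0},                       # too far away
    {"lat": 0.0, "lon": 0.1, "time": t0 + timedelta(hours=5)},  # too late
]
overpasses = select_overpasses(station, pixels)
```

Storing the distance (and, if needed, the time difference) with each selected pixel keeps the selection reproducible when the criteria are later changed.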
Achievements
Enhanced and more timely availability of ground-based, satellite and
modelling data through several data centers (NDACC, WOUDC, ESA,
NASA, ACTRIS, EUBREWNET, Copernicus CDS, CAMS and others).
Central data processing systems have been taken further in several monitoring
networks, such as EUBREWNET, for selected NDACC-type data, Pandora
data in the framework of ACTRIS, and the FRM4DOAS programme.
Progress has been made on enhanced linkage among data centres.
Progress has been made on data publishing with an associated digital
object identifier (DOI).
Data centres have made progress in providing data in several accepted
standard formats and in providing different data versions.

Encouraging full curation of historical data and promoting data availability through traceability and metadata enhancement. Recommendations for data centers to improve coordination and interoperability for efficient data utilization.


Uploaded on Feb 25, 2025 | 0 Views



