GEDE Focus Area Repositories
The GEDE PID focus area aggregated key documents and transformed them into essential assertions, which cleared up uncertainties across communities and revealed agreement on core messages. The GEDE Focus Group plays a crucial role in overcoming barriers and confusion and in driving action. Repositories and registries store and provide access to data and metadata via standard protocols; trustworthy practices are essential for an effective data infrastructure.
Presentation Transcript
GEDE Focus Area Repositories - motivation - Peter Wittenburg
PID Focus Area - successful. There were many differing documents about PIDs, which was confusing for communities. All communities suggested key documents (about 30) on this topic; all documents were studied and transformed into 61 essential assertions; these assertions were aggregated, grouped and compared; documents from expert organisations and initiatives were also included. Results: clarification of many uncertainties and of terminology across communities; agreement on core messages can be detected if wording differences are ignored; some discussions on usage aspects (granularity, versioning, etc.) are ongoing. The GEDE Focus Group is very important for overcoming barriers and confusion, and it helped to turn to action.
Repository Focus Area - What are repositories? Repositories store and give access to data and metadata via standard protocols; may offer services on top of the stored data (products); have a team that does data management and stewardship on the stored data; are guided in their activities by openly described policies and procedures; take care of long-term preservation of data and metadata; take care that the schemas and concepts being used are registered; register themselves in repository registries such as re3data. Trustworthy repositories are key pillars of our emerging data infrastructure. They are trustworthy if they participate in regular quality assessments according to widely accepted standards (DSA/WDS). It is the task of the certification standards to include the essentials (long-term funding, etc.).
Repository Focus Area - What are registries? Registries also store and give access to data via standard protocols. Is there a difference? From an IT perspective the difference is in functions and roles. In general, registries do not store the resources (data) but aggregate metadata about data. There is a wide variety of metadata types and thus of registries. Registries have a team that takes care of proper management of metadata; their activities are guided by openly described policies and procedures; they take care that the schemas and concepts being used are registered. Registries should be findable and thus be registered themselves. Trustworthy registries are key pillars of our emerging data infrastructure. They are trustworthy if they participate in regular quality assessments.
Repository Focus Area - How are repositories organised? Repositories can be organised as discipline-specific entities with deep knowledge about the discipline or domain; according to organisational boundaries (institutes, research organisations, countries, etc.); or according to commercial interests. Metadata should be open and free to use. Repositories may give access to data on condition that the user signs a license, describes the intended use, adheres to ethical and rights norms, pays a certain fee, etc. Increasingly often, repositories are part of a number of trust federations.
Repository Focus Area - What are the open questions? What are the requirements, tasks and roles in the different communities? Are there special functional needs? What are the typical policies and procedures being applied? Can we specify ONE generic API to repositories? Do all repositories need to participate in quality assessment procedures? What is the status of data in non-certified repositories? Is there typical software that can be recommended? Which standards are out there describing the characteristics of repositories?
Repository Focus Area - Why a topic in GEDE? There are a few RDA groups working on matters related to repositories. Do they have a comprehensive picture of how repositories are viewed in the different communities? If not, can GEDE contribute to improving their results? There are already good recommendations and best practices to be disseminated between research infrastructures. There could be certification needs that go beyond the DSA/WDS rules. There could be an exchange of policies and procedures between the communities and also with the RDA Practical Policy group. Specific training sessions on this topic can be organised. Standard software installations could be discussed (such as OAI-PMH, etc.).
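To make "access via standard protocols" concrete, here is a minimal sketch of how a client could build OAI-PMH requests, since OAI-PMH is the one protocol the slide names. The verbs and the `metadataPrefix`/`from` arguments come from OAI-PMH 2.0; the base URL and the helper name `oai_request_url` are assumptions made for illustration, and no network call is made.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}

def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH GET request URL (hypothetical helper, no network I/O)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for name, value in arguments.items():
        # 'from' is a Python keyword, so callers pass it as 'from_'.
        params[name.rstrip("_")] = value
    return f"{base_url}?{urlencode(params)}"

# Hypothetical repository endpoint, used only for illustration.
BASE = "https://repository.example.org/oai"

# Ask the repository to describe itself.
print(oai_request_url(BASE, "Identify"))
# Harvest Dublin Core records changed since a given date.
print(oai_request_url(BASE, "ListRecords", metadataPrefix="oai_dc", from_="2017-01-01"))
```

A generic harvester only needs the base URL of each repository, which is one reason a shared protocol matters for the "ONE generic API" question raised earlier.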
The data citation landscape needs clarification - a potential GEDE topic - Carlo Maria Zwölf, VAMDC e-infrastructure
Persistent Identifiers and data citation/publication. PIDs are a key element for stable data referencing. The initial GEDE activity focused on the Persistent Identifiers (PIDs) bundle, and we converged on a set of statements with wide consensus. PIDs are a fundamental building block for data citation mechanisms. The road linking PIDs with data citation/publication is not direct: the data citation landscape is highly fragmented (several standards and platforms exist) and changes rapidly (new standards appear, some platforms disappear, new ones arrive). Orientation may be difficult for data producers, providers and consumers.
The data citation/publication landscape. One cites published (in the sense of reachable) material. Data citation and publication are two faces of the same coin: since some material is published, it can be referred to.
The data citation/publication landscape - existing recommendations and best practices. RDA: Data Citation Working Group (https://www.rd-alliance.org/groups/data-citation-wg.html). RDA and ICSU/WDS: RDA/WDS Publishing Data Services WG (https://www.rd-alliance.org/groups/rdawds-publishing-data-services-wg.html). Force11: Joint Declaration of Data Citation Principles (WDS/Force11) (http://www.force11.org/datacitation) and the Force11 Manifesto (https://www.force11.org/about/manifesto). CODATA: CODATA-ICSTI Data Citation Standards and Practices (http://www.codata.org/task-groups/data-citation-standards-and-practices).
The data citation/publication landscape - existing platforms and services for data citation/publication. DataCite (http://datacite.org) and Zenodo (http://zenodo.org), which assign DOIs; EUDAT (http://eudat.eu); OpenAIRE (http://openaire.eu); the Dataverse project (http://dataverse.org, formerly thedata.org); Scholix (http://scholix.org). These platforms and services may be based on the recommendations recalled earlier.
A disoriented community. [Diagram: the user and data provider communities face data citation/publication recommendations on one side and data citation/publication services and platforms on the other, with question marks in between.] A true (very recent) story, an exchange between a very well-known editor and an EU e-infrastructure: "We would like to cite your data. What could you propose?" "We have implemented the RDA recommendation on citation of dynamic data." "What about using Scholix instead?"
A suggestion for the next GEDE activity. In the PIDs context, GEDE succeeded in clearing up the confusion by summarizing and by clustering existing solutions and best practices. The wild data citation context is similar to the PIDs one. A similar rationalization may be done for the fragmented data citation landscape: it is a logical continuation of the PIDs work, and it will answer a real, urgent need.
Versioning of data - in support of FAIRness and trustability - Maggie Hellström, ICOS Carbon Portal
Why version stuff? The W3C document "Data on the Web Best Practices" (https://www.w3.org/TR/dwbp/, from 31 January 2017) says: data sets may change over time: new data is appended (in time or space), some old data is no longer relevant, error(s) are discovered and fixed. There is no consensus on when changes constitute a new version (major or minor) of an existing data set, on when a new data set should be defined, or on who is responsible. Maintaining trust and reusability requires a consistent approach to versioning, clear information, and metadata pointers ("is-deprecated", "is-replaced-by", "replaces", ...).
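The metadata pointers named on this slide can be illustrated with a small sketch. It assumes a hypothetical in-memory store that maps each PID to a record carrying "is-deprecated", "is-replaced-by" and "replaces" entries; the PIDs, the store layout and the function names are invented for illustration and do not follow any particular vocabulary or repository API.

```python
# Hypothetical metadata store: PID -> version metadata using the slide's pointer names.
records = {
    "pid:example/v1": {"is-deprecated": True, "is-replaced-by": "pid:example/v2"},
    "pid:example/v2": {"is-deprecated": True, "replaces": "pid:example/v1",
                       "is-replaced-by": "pid:example/v3"},
    "pid:example/v3": {"is-deprecated": False, "replaces": "pid:example/v2"},
}

def latest_version(pid, records):
    """Follow "is-replaced-by" pointers until a record has no successor."""
    seen = set()
    while pid not in seen:
        seen.add(pid)
        successor = records[pid].get("is-replaced-by")
        if successor is None:
            return pid
        pid = successor
    raise ValueError(f"cycle in is-replaced-by pointers at {pid}")

def version_history(pid, records):
    """Follow "replaces" pointers back to the first version; return oldest first."""
    history = [pid]
    while "replaces" in records[history[-1]]:
        predecessor = records[history[-1]]["replaces"]
        if predecessor in history:
            raise ValueError(f"cycle in replaces pointers at {predecessor}")
        history.append(predecessor)
    return history[::-1]

print(latest_version("pid:example/v1", records))
print(version_history("pid:example/v3", records))
```

With pointers in both directions, a citation of an old version stays resolvable while a reader can still be led to the current one.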
Recent RDA discussions. Proposed interest group (IG) on data versioning: discussions started at P8 (Denver) and continued at P9 (Barcelona); the current effort is led by Lesley Wyborn and Jens Klump. Example use cases: Australia: large remote sensing and geospatial datasets, which are composites of numerous (tiled) sources. If a (small) subset is updated, how best to describe this? USA: NASA's Socioeconomic Data and Applications Center (SEDAC).
Related initiatives. RDA Data Citation WG (dynamic data): a query-centered approach, which stores the actual queries together with timestamp information and assigns PIDs to them. RDA Data Collection WG: collection objects can be used to point to dynamic datasets that belong together; collections can themselves be versionable. EUDAT/RDA/COOPEUS: identification and citation of open-ended data and subsets.
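The query-centered approach can be sketched roughly as follows. This is only an illustration of "store the query with a timestamp and give it a PID", not the RDA working group's reference design: the class name `QueryStore`, the PID prefix and the hash-based PID scheme are all assumptions, and re-executing a stored query against the data as of its timestamp would additionally need a versioned data store, which is out of scope here.

```python
import hashlib
from datetime import datetime, timezone

class QueryStore:
    """Hypothetical store that keeps queries with timestamps and assigns PIDs."""

    def __init__(self, pid_prefix="pid:example/query/"):
        self.pid_prefix = pid_prefix  # invented prefix, not a real PID namespace
        self.entries = {}

    def register(self, query, executed_at=None):
        """Store a query with its execution time and return the assigned PID."""
        if executed_at is None:
            executed_at = datetime.now(timezone.utc)
        # Simplification: collapse all whitespace so layout changes do not matter.
        normalised = " ".join(query.split())
        stamp = executed_at.isoformat()
        digest = hashlib.sha256(f"{normalised}|{stamp}".encode("utf-8")).hexdigest()
        pid = self.pid_prefix + digest[:12]
        self.entries[pid] = {"query": normalised, "executed_at": stamp}
        return pid

    def resolve(self, pid):
        """Return the stored query and timestamp for a PID."""
        return self.entries[pid]

store = QueryStore()
when = datetime(2017, 4, 5, tzinfo=timezone.utc)
pid = store.register("SELECT *  FROM observations\nWHERE station = 'X'", when)
print(pid)
print(store.resolve(pid))
```

Citing the PID instead of a copied subset keeps the citation small and lets the same subset be regenerated from the stored query and timestamp.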
What can GEDE do? Survey European RIs: current needs and requirements, applied practices, available support from e-services. Investigate global efforts (search white and gray literature). Gather concrete use cases: those already defined via RDA groups, and new ones.