Enhancing Data Reusability: Challenges and Strategies
The WDS/RDA Assessment of Data Fitness for Use Working Group addresses common challenges faced by researchers in utilizing data from repositories, emphasizing the importance of comprehensive assessment and reliability. The focus is on improving the reusability of datasets by ensuring they meet quality standards and are fit for diverse user needs.
Uploaded on Dec 12, 2024
Presentation Transcript
The WDS/RDA Assessment of Data Fitness for Use Working Group
Jonathan Petters (Virginia Tech), Marina Soares e Silva (Elsevier), Claire Austin (Department of the Environment, Government of Canada), Michael Diepenbroek (PANGAEA)
RDA 12 - November 2018
Shared Google Doc notes: https://tinyurl.com/DataFitnessForUse
Problem: I have the data but can't use it
I have found data in a domain or generic repository that I can access, but:
- I can't be sure it's complete
- The metadata contains conflicting information
- I am having issues with the format
And I just wasted 6 hours of my time figuring out I can't use it!
Problem: I have the data but can't use it
A provider gives access to a dataset that was FAIRly deposited by its creator, BUT the same dataset might not be fit for the data user!
Challenge: How to make research data fit for the widest possible use?
Our working group's approach: data fitness for use
Assessment of the fitness for use of individual datasets should consolidate current efforts and be:
- thorough and comprehensive
- reliable and efficient to apply
- high in impact and visibility
Our working group's approach: data fitness for use
RDA/WDS Working Group: specify criteria of reusability, expanding on FAIR, to support providers in assessing data quality
- A checklist + rating system!
- Our target group: data repositories
- For use by repository managers or external evaluators such as CoreTrustSeal
Criteria to assess data fitness for use
Categories (expanding on the reusability of FAIR):
- Metadata completeness (R)
- Accessibility (A)
- Data completeness and correctness (R)
- Findability & interoperability (F, I)
- Curation (leading to FAIRness)
Assessing data fitness for use (data correctness)
- Repository hosts weather observation data in a spreadsheet
- Spreadsheet is findable, accessible
- But is it fit for use?
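As an illustration of what a data correctness check on such a spreadsheet might look like, here is a minimal Python sketch. The column names (`station`, `date`, `temp_c`), the plausibility limits, and the sample rows are all assumptions made for this example; they do not come from the working group's checklist.

```python
import csv
import io

# Hypothetical plausibility limits for air temperature in degrees Celsius.
TEMP_MIN_C, TEMP_MAX_C = -90.0, 60.0


def check_observations(csv_text):
    """Return a list of (row_number, problem) tuples for a CSV of observations."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        raw = (row.get("temp_c") or "").strip()
        if not raw:
            problems.append((row_number, "missing temperature"))
            continue
        try:
            value = float(raw)
        except ValueError:
            problems.append((row_number, f"non-numeric temperature: {raw!r}"))
            continue
        if not TEMP_MIN_C <= value <= TEMP_MAX_C:
            problems.append((row_number, f"implausible temperature: {value}"))
    return problems


# Invented sample data: one clean row and three kinds of problem.
SAMPLE = """station,date,temp_c
A1,2018-11-01,4.2
A1,2018-11-02,
A1,2018-11-03,412
A1,2018-11-04,n/a
"""

for row_number, problem in check_observations(SAMPLE):
    print(f"row {row_number}: {problem}")
```

A file like this can be perfectly findable and accessible while three of its four rows are unusable, which is the gap the checklist is meant to surface.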
Initial Feedback on Checklist - ICPSR
- How to evaluate the level of curation for a dataset? Through the repository's standard curation procedures
- Some questions are general, making them hard to evaluate
- Might envision multiple reviewers, à la CoreTrustSeal certification
- Evaluation of dataset properties (and the time needed to evaluate them) will vary with the heterogeneity of the dataset. How to address this?
Challenges
- Volunteer effort
- Inherent to our approach
- Level of expertise of repository manager matters
- How do repository managers currently evaluate data fitness?
- Sample size might influence result of assessment
- Manual labor
Challenges - Rating system
- How to weigh criteria to determine
- How to implement: potential automation
- Resources to implement
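One way to make the weighting question concrete is a weighted mean over per-category checklist scores. The sketch below is only an illustration: the category keys mirror the checklist categories, but the weights, the 0-to-1 score scale, and the function name `fitness_rating` are assumptions, not something the working group has defined.

```python
# Hypothetical weights per checklist category (assumed; must sum to 1).
WEIGHTS = {
    "metadata_completeness": 0.25,
    "accessibility": 0.15,
    "data_completeness_correctness": 0.30,
    "findability_interoperability": 0.15,
    "curation": 0.15,
}


def fitness_rating(scores, weights=WEIGHTS):
    """Weighted mean of per-category scores, each expected in [0, 1]."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same categories")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {name} out of range: {value}")
    return sum(weights[name] * scores[name] for name in weights)


# Invented scores, e.g. the fraction of checklist items satisfied per category.
example_scores = {
    "metadata_completeness": 0.8,
    "accessibility": 1.0,
    "data_completeness_correctness": 0.5,
    "findability_interoperability": 1.0,
    "curation": 0.6,
}
print(round(fitness_rating(example_scores), 2))
```

Changing the weights changes the ranking of datasets, which is why agreeing on them is listed as a challenge rather than an implementation detail.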
Outlook
- Implementation of rating system
- Possibly (semi-)automation of assessment; see https://fairshake.cloud/ as an example of something that could work for semi-automated assessment (users evaluate datasets)
- Draft article for peer-reviewed journal
Outlook
Roll work into new RDA working groups?
- Proposed RDA WG on a FAIR Data Maturity Model
- Propose new RDA WG for automating data quality verification (coordinating role?)
- Spin off new WG from Domain Repositories IG on data/metadata standards in communities
- Collaboration with other groups such as GOFAIR?
Get involved! https://www.rd-alliance.org/groups/assessment-data-fitness-use
Next in this session
- Luiz Bonino (GOFAIR): automation approach
- Wade Bishop (Univ. Tennessee, USA): data provider perspective
- Discussion time
The WDS/RDA Assessment of Data Fitness for Use Working Group
Jonathan Petters - jpetters@vt.edu
Marina Soares e Silva - m.soaresesilva@elsevier.com
data-fitness@rda-groups.org
Working group wrapping up
- Where do we go from here?
- Do we have the interest and resources to continue checklist development?
Challenges - Efforts on a volunteer basis
Here was the plan:
- 2017-08: Terminology and definition of terms
- 2017-12: Pilot assessment of criteria
- 2018-02: Development/design of badge system and integration with current certification schemes
- 2018-08: Concept for integration of data repository service components; piloting integration of badge system
Challenges - What has been accomplished
- Terminology for data fitness
- Creation and comparisons of data fitness criteria (spreadsheet)
- Data fitness for use checklist (Google Form), with minimal testing
Presented at Domain Repositories IG
- Heterogeneity of datasets leads to difficulty in evaluating datasets with domain expertise (not just a time sink)
- Sampling 6 to 12 datasets is not representative for a repository with 40,000 datasets
- Should we expect the same level of curation for all datasets? Not all have the same perceived value
- For some repositories, usage analytics for datasets are available and should be used
- A need for agreement on data/metadata standards within communities could roll out of this work
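To give a rough sense of why 6 to 12 datasets out of 40,000 is too few, the following Python sketch estimates the margin of error for a sampled pass rate. It assumes simple random sampling, a binary "fit / not fit" outcome per dataset, and the usual normal approximation with a finite population correction; none of these assumptions come from the working group, and the approximation itself is crude at such small sample sizes.

```python
import math


def margin_of_error(sample_size, population_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a sampled proportion.

    Uses the normal approximation with finite population correction and
    assumes simple random sampling (an assumption made for this sketch).
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    if population_size > 1:
        fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    else:
        fpc = 0.0
    return z * standard_error * fpc


# Compare the sample sizes mentioned above with a larger, hypothetical one.
for n in (6, 12, 400):
    print(n, round(margin_of_error(n, 40_000), 3))
```

Under these assumptions a sample of 12 leaves an uncertainty of roughly 28 percentage points either way, so a repository-level rating based on it says little about the other datasets.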
Annex
Intensive work on data quality
- There are many initiatives to define data standards
  - Data publishing and repository certification
  - Principles of data FAIRness
- There are various approaches to assess quality
  - Ratings vs. good practices
- Data can be assessed by different entities
  - Independent certification bodies
  - Data curators (e.g. in repositories)
  - Users (e.g. through social media)
[Figure: F A I R rating with 2 User Reviews, 1 Archivist Assessment, 24 Downloads]
Implementing FAIR principles
1. Requirements to create new data
2. Assessing existing data
3. Transformation tools to make data FAIR (Go-FAIR initiative)
Who assesses:
- Certification: Reviewer, Data center/repository
- Curation: Data center/repository
- Downloads, social tagging: Users
[Figure: F A I R rating with 2 User Reviews, 1 Archivist Assessment, 24 Downloads]
1. Before getting started: who is who
2. Initiatives
3. Data reusability: why?
4. Data fitness for use
5. Defining fitness for use
6. Assessing data fitness for use
7. Outlook
Many approaches to assess data quality
- Open Data rating by Tim Berners-Lee: https://5stardata.info/en/
- Also: Open Data Institute (UK), Center for Open Science (US)
BUT these do not define good practice. They certify that a particular practice was followed.
Defining fitness for use
Glossaries:
- Data Quality Vocabulary, W3C Working Group Note
- Science Europe Data Glossary
- RDA Term Definition Tool (TeD-T)
- Standard Glossary for Research Data Management (IRiDiuM)
Data quality: degree to which a set of characteristics of data fulfills requirements (ISO 9000)
- Any data are usable as long as they fit the purpose
- Assessment of usability implies definition of requirements
Initiatives
- RDA/WDS Data Publishing Workflows WG
- Certification of data centers/repositories
- GEO label facets (DMP)
- FAIR principles
Data Reusability: why?
https://www.nature.com/articles/sdata201618
Data Fitness for Use and FAIR
- Challenge: data provider and user are not necessarily aligned
- Define fitness for use: how, and by whom?
[Diagram: Provider and User]