FAIR Re-use: Implications for AI-Readiness

 
 
Lydia Fletcher
Texas Advanced Computing Center
April 16, 2025
 
A bit about me
 
Joined TACC in 2022 after over 10 years in academic science librarianship
I focus on whole-lifecycle data management strategies and tools, and am particularly interested in provenance
This might have to do with my “fun fact”: my original course of study was medieval manuscript palaeography and codicology, including a master's degree at the University of Oxford
My first project at TACC was creating a disaster information system for the state of Texas, funded by Hurricane Harvey relief money
I've only worked with iHARP for a couple of months
This presentation draws on my experience working with state agency staff on a project and my mandate to help iHARP negotiate Open Science issues
 
Garbage In, Garbage Out
 
The quality of a model's output depends directly on the quality of its input
High-quality, well-curated datasets are critical for AI-readiness, even if the datasets aren’t yet “AI-ready”
 
Image credit: https://thedailyomnivore.net/2015/12/02/garbage-in-garbage-out/
 
How can FAIR help with “Garbage In, Garbage Out”?
 
Findable: High-quality datasets are easy to locate using GO FAIR US, fairsharing.org, or other resources.
Accessible: Datasets are available to download for training, benchmarking, and validation of models.
Interoperable: Provenance is well described in the form of metadata and documentation, and data are available in AI-ready formats.
Reusable: Following FAIR principles makes digital objects reusable, and the principles also act as guidelines for reusing data.
 
A closer look at Reusability
 
To achieve FAIR Reusability, these requirements must be met:
R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (Meta)data are released with a clear and accessible data usage license
R1.2. (Meta)data are associated with detailed provenance
R1.3. (Meta)data meet domain-relevant community standards
 
Taken from https://www.go-fair.org/fair-principles/
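
As a rough illustration of how the R1 requirements might be screened before a dataset is reused, the sketch below checks a metadata record for the kinds of fields each sub-principle calls for. The field names (`description`, `keywords`, `license`, `provenance`, `conforms_to`) and the helper `check_reusability` are hypothetical and not part of any FAIR specification or tool. A check like this only shows that something is present; whether the values are accurate and relevant still needs human judgment.

```python
# Hypothetical mapping from FAIR R1 sub-principles to metadata fields.
# The field names are assumptions made for this illustration only.
R1_FIELDS = {
    "R1 (rich description)": ["description", "keywords"],
    "R1.1 (usage license)": ["license"],
    "R1.2 (provenance)": ["provenance"],
    "R1.3 (community standard)": ["conforms_to"],
}


def check_reusability(metadata: dict) -> dict:
    """Return, per R1 sub-principle, the fields that are missing or empty."""
    gaps = {}
    for principle, fields in R1_FIELDS.items():
        missing = [name for name in fields if not metadata.get(name)]
        if missing:
            gaps[principle] = missing
    return gaps


if __name__ == "__main__":
    # An invented record with an empty provenance field.
    record = {
        "description": "Hourly air temperature at two stations",
        "keywords": ["temperature", "hourly"],
        "license": "CC-BY-4.0",
        "provenance": "",
        "conforms_to": "community metadata profile (placeholder)",
    }
    for principle, missing in check_reusability(record).items():
        print(f"{principle}: missing {', '.join(missing)}")
```

A record that returns an empty dictionary has every field filled in, which is a starting point for review rather than proof of reusability.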
 
What does Reuse really mean?
 
Understanding Provenance
Where did it come from? How has it been cleaned or manipulated?
Developing Search Skills
Becoming Familiar With Disciplinary Repositories
 
How FAIR principles can be used as guidelines
 
Understand Data Context
Document Your Usage
Engage in Transparent Research
Contribute to the Open Data Ecosystem
 
Improving Traceability Throughout the Data Lifecycle
https://doi.org/10.26153/tsw/49483
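
One way to put "Document Your Usage" into practice is to keep a small machine-readable record each time a dataset is reused. The sketch below is a minimal example under stated assumptions: the record layout, the function names `sha256_of` and `usage_record`, and the example URL are all hypothetical, and the checksums simply fingerprint the bytes before and after cleaning so that later readers can verify which version was used.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Fingerprint a dataset's bytes so the exact version can be verified."""
    return hashlib.sha256(data).hexdigest()


def usage_record(source_url, license_name, raw, steps, cleaned):
    """Build a provenance record for one reuse of a dataset.

    The layout is an assumption for illustration, not a standard schema.
    """
    return {
        "source": source_url,
        "license": license_name,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "input_sha256": sha256_of(raw),
        "steps": list(steps),  # what was cleaned or changed, in order
        "output_sha256": sha256_of(cleaned),
    }


if __name__ == "__main__":
    # Invented data: one row with a missing value is dropped during cleaning.
    raw = b"station,temp\nA,21.5\nB,\n"
    cleaned = b"station,temp\nA,21.5\n"
    record = usage_record(
        "https://example.org/dataset.csv",  # placeholder URL
        "CC-BY-4.0",
        raw,
        ["dropped rows with missing temp"],
        cleaned,
    )
    print(json.dumps(record, indent=2))
```

Storing a record like this alongside a trained model answers the two provenance questions directly: where the data came from, and how it was cleaned or manipulated.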
 
A key component of Reuse is licensing
 
Acknowledge Sources
Respect Terms of Use and/or Restrictions
A lot of data is in the public domain, but a lot isn’t
When using data from a public-private partnership, it’s essential to understand restrictions on publication
Respect Privacy and Confidentiality
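
The licensing points above could be applied as a simple gate before data enters a training set. The sketch below is one possible approach, not an established tool: the `Dataset` class, the `triage` and `acknowledgement` helpers, and the allow-list are hypothetical. The license strings follow SPDX-style identifiers, and which licenses are acceptable is an assumption that a real project would settle with institutional or legal guidance.

```python
from dataclasses import dataclass

# Assumed allow-list of SPDX-style license identifiers (illustrative only).
ALLOWED_LICENSES = {"CC0-1.0", "CC-BY-4.0"}


@dataclass
class Dataset:
    title: str
    creator: str
    license_id: str  # empty string if the license is unknown
    publication_restricted: bool = False  # e.g. partnership terms


def triage(datasets):
    """Split datasets into usable ones and ones needing review before reuse."""
    usable, review = [], []
    for ds in datasets:
        ok = ds.license_id in ALLOWED_LICENSES and not ds.publication_restricted
        (usable if ok else review).append(ds)
    return usable, review


def acknowledgement(ds: Dataset) -> str:
    """Format a source acknowledgement for a dataset."""
    return f"{ds.title} by {ds.creator}, licensed under {ds.license_id}"


if __name__ == "__main__":
    # Invented datasets for demonstration.
    candidates = [
        Dataset("Coastal Gauges", "Agency A", "CC0-1.0"),
        Dataset("Partner Survey", "Consortium B", "CC-BY-4.0", True),
        Dataset("Legacy Archive", "Unknown", ""),
    ]
    usable, review = triage(candidates)
    for ds in usable:
        print("Use:", acknowledgement(ds))
    for ds in review:
        print("Review terms first:", ds.title)
```

Datasets with an unknown license land in the review list by design, since the absence of a license does not mean the data is in the public domain.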
 
Challenges

Time
Data Literacy Education
Assessing Data Quality and Consistency
Ensuring Data Integration and Interoperability

Opportunities

Enhanced Collaboration and Innovation
Improved Model Performance and Replicability
Increased Transparency and Accountability
Extending FAIR Principles
 
Contact info:
 
lfletcher@tacc.utexas.edu




