FAIR Re-use: Implications for AI-Readiness

Lydia Fletcher

Texas Advanced Computing Center

April 16, 2025

A bit about me

● Joined TACC in 2022 after over 10 years in academic science librarianship
● I focus on whole-lifecycle data management strategies and tools, and am particularly interested in provenance
  ○ This might have to do with my “fun fact”: my original course of study was medieval manuscripts (palaeography and codicology), including a master's degree at the University of Oxford
● My first project at TACC was creating a disaster information system for the state of Texas, funded by Hurricane Harvey relief money
● I've only worked with iHARP for a couple of months
● This presentation draws on my experience working with state agency staff for a project and my mandate to help iHARP negotiate Open Science issues

Garbage In, Garbage Out

● The quality of output from a model is directly dependent on the quality of input
● High-quality, well-curated datasets are critical for AI-readiness even if the datasets aren’t “AI-ready”

Image credit: https://thedailyomnivore.net/2015/12/02/garbage-in-garbage-out/

How can FAIR help with “Garbage In, Garbage Out”?

● Findable: High-quality datasets are easy to locate using GOFAIRUS, fairsharing.org, or other resources.
● Accessible: Datasets are available to download for training, benchmarking, and validation of models.
● Interoperable: Provenance is well described in the form of metadata and documentation, and data are available in AI-ready formats.
● Reusable: Following FAIR principles makes digital objects reusable, and the principles also act as guidelines for reusing data.

A closer look at Reusability

To achieve FAIR Reusability, these requirements must be met:

R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (Meta)data are released with a clear and accessible data usage license
R1.2. (Meta)data are associated with detailed provenance
R1.3. (Meta)data meet domain-relevant community standards

Taken from https://www.go-fair.org/fair-principles/
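As a rough illustration of how the R1 sub-principles could be checked in practice, the sketch below scans a metadata record for the fields each one implies. The field names (`license`, `source`, `processing_steps`, `conforms_to`) and the example record are assumptions made up for this sketch; real metadata schemas name these differently.

```python
# Hypothetical field names chosen for illustration; real schemas differ.
R1_CHECKS = {
    "R1.1 license": ("license",),
    "R1.2 provenance": ("source", "processing_steps"),
    "R1.3 community standard": ("conforms_to",),
}


def missing_r1_fields(metadata: dict) -> dict:
    """Return, per R1 sub-principle, the fields that are absent or empty."""
    gaps = {}
    for principle, fields in R1_CHECKS.items():
        missing = [name for name in fields if not metadata.get(name)]
        if missing:
            gaps[principle] = missing
    return gaps


# Made-up record: licensed, but provenance and standards are incomplete.
record = {
    "title": "Example gridded dataset",
    "license": "CC-BY-4.0",
    "source": "airborne survey",
    "processing_steps": [],  # empty list counts as missing provenance detail
}

for principle, fields in missing_r1_fields(record).items():
    print(f"{principle}: missing {', '.join(fields)}")
```

A check like this only confirms that the fields exist; whether the provenance is actually detailed enough still takes human judgment.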

What does Reuse really mean?

● Understanding Provenance
  ○ Where did it come from? How has it been cleaned or manipulated?
● Developing Search Skills
● Becoming Familiar With Disciplinary Repositories

How FAIR principles can be used as guidelines

● Understand Data Context
● Document Your Usage
● Engage in Transparent Research
● Contribute to the Open Data Ecosystem

Improving Traceability Throughout the Data Lifecycle: https://doi.org/10.26153/tsw/49483
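One possible way to "document your usage" is to write a small provenance record each time a dataset is reused. The sketch below is an assumption about what such a record might hold, not a method from the presentation: the function name `usage_record`, the record fields, the placeholder URL, and the sample bytes are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def usage_record(data: bytes, source_url: str, steps: list[str]) -> dict:
    """Build a small provenance record for one reused file."""
    return {
        "source_url": source_url,
        # The checksum pins the exact bytes that were used.
        "sha256": hashlib.sha256(data).hexdigest(),
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "processing_steps": list(steps),
    }


raw = b"station,temp_c\nA,1.5\nB,\n"  # stand-in for a downloaded file
record = usage_record(
    raw,
    "https://example.org/dataset.csv",  # placeholder URL, not a real dataset
    ["dropped rows with missing temp_c"],
)
print(json.dumps(record, indent=2))
```

Storing the record next to the model outputs lets a later reader answer "where did it come from?" and "how has it been cleaned or manipulated?" without asking the original analyst.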

A key component of Reuse is licensing

● Acknowledge Sources
● Respect Terms of Use and/or Restrictions
  ○ Lots of data is public domain, but a lot isn’t
  ○ When using data from a public-private partnership it’s essential to understand restrictions on publication
● Respect Privacy and Confidentiality
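The licensing points above could be enforced with a simple gate before a dataset enters a training pipeline. This is a minimal sketch under stated assumptions: the allowlist, the `license` field name, and the status messages are invented for illustration, and a real project would set its own policy, ideally with legal guidance.

```python
# Assumed allowlist of SPDX-style identifiers, for illustration only.
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "PDDL-1.0"}


def reuse_status(metadata: dict) -> str:
    """Classify a dataset by its declared license before reuse."""
    license_id = metadata.get("license")
    if not license_id:
        return "blocked: no license stated"
    if license_id in OPEN_LICENSES:
        return "ok: acknowledge the source"
    return "review: check terms of use before publishing"


# Made-up examples covering the three outcomes.
datasets = [
    {"name": "open_grid", "license": "CC-BY-4.0"},
    {"name": "partner_feed", "license": "Custom-Partner-Agreement"},
    {"name": "mystery_dump"},
]

for item in datasets:
    print(f"{item['name']}: {reuse_status(item)}")
```

A license check does not cover privacy or confidentiality; those need a separate review of what the data contains.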

Challenges

● Time
● Data Literacy Education
● Assessing Data Quality and Consistency
● Ensuring Data Integration and Interoperability

Opportunities

● Enhanced Collaboration and Innovation
● Improved Model Performance and Replicability
● Increased Transparency and Accountability
● Extending FAIR Principles

Contact info: lfletcher@tacc.utexas.edu
