PCC Wikidata Pilot at Texas A&M University Libraries Progress Report

 
The PCC Wikidata Pilot at Texas A&M University Libraries: A Progress Report
 
Jeannette Ho
March 24, 2021
 
Our Goals for Getting Involved with the Pilot:
 
To gain familiarity with Wikidata and its tools
To compare the process of creating Wikidata entries with our traditional process
of authority control
To expose data about the contents of our collections on the wider web and make
it more discoverable to the public
To demonstrate the value of linked data by experimenting with Wikidata's
SPARQL Query service to explore relationships among the entities we plan to
create
To create items for persons and organizations at our university and affiliated
agencies that may not yet have established identities, which may help
disambiguate authors in our institutional repository
 
Our project:
 
We will focus on faculty advisors, graduate students, and their doctoral
dissertations from the Texas A&M Mechanical Engineering Department
Dissertations are deposited in our institutional repository (OAKTrust)
Creation and/or editing of Wikidata items for persons (faculty and thesis authors
affiliated with A&M) and dissertations. Other possibilities: associated
organizations, subject areas and disciplines.
Will do this both manually and by experimenting with various tools for automating it:
matching entities in Wikidata, reconciling entities, and batch uploading data
Will design SPARQL queries in Wikidata Query Service.
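As a rough illustration of the kind of query this could lead to, the sketch below builds a SPARQL query that lists one advisor's doctoral students and their theses, plus the URL that would submit it to the Wikidata Query Service. The property IDs (P185 for "doctoral student", P1026 for "academic thesis") and the function names are assumptions made for illustration and should be verified on Wikidata; Q59297625 is the advisor item listed among the examples in this report. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_students_query(advisor_qid: str) -> str:
    """Return a SPARQL query listing an advisor's doctoral students and theses.

    Assumes P185 = "doctoral student" and P1026 = "academic thesis";
    check both property IDs on Wikidata before relying on them.
    """
    return f"""
SELECT ?student ?studentLabel ?thesis ?thesisLabel WHERE {{
  wd:{advisor_qid} wdt:P185 ?student .
  OPTIONAL {{ ?student wdt:P1026 ?thesis . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


def build_request_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = build_students_query("Q59297625")
url = build_request_url(query)
print(query)
```

The printed query can also be pasted directly into the Query Service web interface, which may be the simpler route while the sample of items is still small.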
 
What We Have Done Until Now:
 
Created drafts of application profiles for faculty advisors, graduate students
(authors of dissertations) and the dissertations themselves
Began to manually create and enhance items on Wikidata for these entities
Began to experiment with methods of automating this process
QuickStatements
OpenRefine Wikidata extension
Began to document these processes and workflows
This is an ongoing effort as we refine them
Documentation is on our team’s internal Microsoft Teams web channel.
Will share publicly when finalized
 
Considerations when drafting application
profiles:
 
What entities to create items for?
Authors of dissertations, Faculty advisors, dissertations themselves
But what about members of doctoral thesis committees?
What properties to include?
Gender—to include or not to include? Wikidata policy says we should, but we can’t always
assume we know what it is.
We reviewed:
Existing application profiles from Stanford and University of Washington
Guidelines for WikiProject Books on properties recommended for work vs. edition items
What properties should be “core”?
Consulted with our colleagues in the Office of Scholarly Communications
What questions were they interested in?  What statistics would they like to gather?
Looked at properties we tended to include in COMMON when creating Wikidata items
 
Properties of interest to Scholarly
Communications:
 
Year that someone earns a degree
 
Gender and other demographic info
 
Academic subject or discipline
 
If student published with a professor
 
Professional activity of students after they graduate
 
Name changes for organizations
 
Academic advisors
 
Drafting application profiles: core elements
 
Drafting application profiles: optional
elements
 
Uploading existing metadata about dissertations
from our repository into Wikidata:
 
Screenshot of Wikidata schema in
OpenRefine for faculty advisor
 
Screenshot of Wikidata schema in
OpenRefine for author of a dissertation
 
Screenshot of Wikidata schema in
OpenRefine for a dissertation
 
Example of Wikidata items batch created or
enhanced through OpenRefine
 
Jorge Alvarado (Q59297625)
https://www.wikidata.org/wiki/Q59297625
 
Qibo Li (Q105752010)
https://www.wikidata.org/wiki/Q105752010
 
Numerical Fluid Dynamics and Combustion Study of Emulsified
Canola Oil Droplets in a Swirl Promoted Combustion Chamber
https://www.wikidata.org/wiki/Q105752011
 
 
What we learned:
 
Our institutional repository already has plenty of metadata that we can use to
automate this process.  If we can automate the batch creation of items on
Wikidata, it could save us from having to create each one manually
Lots of dissertation titles are embargoed.  Is it OK to put them on Wikidata?
We had to batch create items first for entities before we could add statements to
the items to create reciprocal links between them
For example:
Wikidata items for the author and the dissertation had to be created first.  Then we needed to
create a separate schema to add linking statements between them and run it separately
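The two-pass pattern described above can be sketched with QuickStatements V1 commands rather than an OpenRefine schema. This is only an illustration under stated assumptions: the helper names and the placeholder labels are hypothetical, and the IDs used (Q5 "human", Q187685 "doctoral thesis", P31 "instance of", P50 "author", P1026 "academic thesis") should be verified on Wikidata. Pairing the two example QIDs from this report as author and dissertation is likewise assumed.

```python
def creation_commands(label: str, instance_of: str) -> list[str]:
    """Pass 1: QuickStatements V1 lines that create one new item.

    Labels containing double quotes are not handled in this sketch.
    """
    return [
        "CREATE",
        f'LAST\tLen\t"{label}"',       # English label of the new item
        f"LAST\tP31\t{instance_of}",   # assumed: P31 = "instance of"
    ]


def linking_commands(author_qid: str, thesis_qid: str) -> list[str]:
    """Pass 2: reciprocal statements between two items that already exist.

    Assumes P50 = "author" and P1026 = "academic thesis".
    """
    return [
        f"{thesis_qid}\tP50\t{author_qid}",
        f"{author_qid}\tP1026\t{thesis_qid}",
    ]


# Pass 1: create the items (placeholder labels, not real records).
pass_one = (
    creation_commands("Example Author", "Q5")
    + creation_commands("An Example Dissertation", "Q187685")
)

# Pass 2: runs only after pass 1 has completed and the new QIDs are known.
pass_two = linking_commands("Q105752010", "Q105752011")

print("\n".join(pass_one))
print("\n".join(pass_two))
```

The second pass is kept separate because the QIDs it refers to do not exist until the first batch has been run.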
 
We can potentially automate the creation of most of these entities where the
same faculty member advised multiple students
 
Next steps:
 
Test how this can work in practice:
Split it up among team members so everyone gets a chance to automatically
create items on Wikidata
Start small: work with dissertations downloaded from our repository that
contain ORCID IDs for the authors
Each team member will upload data for a single advisor and 2-3 associated
ETDs and authors at a time
Do the rest of the entities manually for advisors in this sample with only one
student.  See what “extra” information gets added that we can’t automate
Pull data from our local VIVO instance of faculty profiles and batch upload
additional data as a separate step
Last name, first name, positions, where they were educated, degrees received, etc.
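One way to plan the split described in these steps is sketched below: keep only records that carry an ORCID iD, group them by advisor, cut each multi-student advisor's records into batches of up to three for automated upload, and set aside advisors with a single student for manual work. The record fields, names, and ORCID values are hypothetical placeholders, not the actual export format of our repository.

```python
from collections import defaultdict

# Hypothetical sample records; all values are placeholders.
records = [
    {"title": f"Thesis {n}", "author": f"Author {n}",
     "orcid": f"0000-0000-0000-000{n}", "advisor": "Advisor X"}
    for n in range(1, 5)
] + [
    {"title": "Thesis 5", "author": "Author 5",
     "orcid": "0000-0000-0000-0005", "advisor": "Advisor Y"},
    {"title": "Thesis 6", "author": "Author 6",
     "orcid": "", "advisor": "Advisor Z"},  # no ORCID iD: left out of the sample
]


def plan_uploads(records, batch_size=3):
    """Split records into automated batches and a manual list.

    Returns (automated, manual): automated is a list of
    (advisor, records) batches; manual is a flat list of records
    whose advisor has only one student in the ORCID-bearing sample.
    """
    by_advisor = defaultdict(list)
    for rec in records:
        if rec.get("orcid"):
            by_advisor[rec["advisor"]].append(rec)

    automated, manual = [], []
    for advisor, recs in by_advisor.items():
        if len(recs) == 1:
            manual.extend(recs)
            continue
        for start in range(0, len(recs), batch_size):
            automated.append((advisor, recs[start:start + batch_size]))
    return automated, manual


automated, manual = plan_uploads(records)
for advisor, batch in automated:
    print(advisor, [rec["title"] for rec in batch])
print("manual:", [rec["title"] for rec in manual])
```

Each automated batch could then be handed to one team member, which keeps individual uploads small enough to review by hand afterwards.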
 
 
 
Other steps to take in the future:
 
Explore other tools:
Mix’n’Match
Author Disambiguator
Cradle
Others?
Design and carry out queries for the Wikidata Query Service
Once we have a large enough sample of items on Wikidata
 
 
 
 
Questions?
 
 
 
Contact:  jaho@library.tamu.edu




