Digital Preservation in Academic Research Data Reuse
This presentation, given at the June 2013 meeting of the Society for the Preservation of Natural History Collections, reports on an IMLS-funded project studying data reuse and digital preservation in three academic disciplines. The research team explored which significant properties of quantitative social science, archaeological, and zoological data facilitate reuse, and how those properties can be expressed as representation information. Methods included interviews with staff and data consumers, a survey, web analytics, and observations. The slides share findings on how 27 zoological data reusers discover, select, and trust data.
Presentation Transcript
The Society for the Preservation of Natural History Collections, June 17-22, 2013, Rapid City, South Dakota. Inside Zoological Collections: Perspectives of the Academic (Re)user. Ixchel M. Faniel, Ph.D., OCLC Research, fanieli@oclc.org. The world's libraries. Connected.
Institute of Museum and Library Services (IMLS) funded project led by Drs. Ixchel Faniel (PI) and Elizabeth Yakel (co-PI). The project studies the intersection between data reuse and digital preservation in three academic disciplines to identify how contextual information about the data that supports reuse can best be created and preserved. It focuses on research data produced and used by quantitative social scientists, archaeologists, and zoologists. The intended audiences are researchers who use secondary data and the digital curators, digital repository managers, data center staff, and others who collect, manage, and store digital information. For more information, please visit http://www.dipir.org
Research Motivations & Questions
1. What are the significant properties of quantitative social science, archaeological, and zoological data that facilitate reuse?
2. How can these significant properties be expressed as representation information to ensure the preservation of meaning and enable data reuse?
(Faniel & Yakel 2011)
The Research Team (DIPIR Project)
Ixchel Faniel, OCLC Research (PI)
Elizabeth Yakel, University of Michigan (Co-PI)
Nancy McGovern, ICPSR/MIT
William Fink, UM Museum of Zoology
Eric Kansa, Open Context
Methods Overview
Sites: ICPSR, Open Context, UMMZ
Phase 1: Project start-up
  Staff interviews: ICPSR 10 (Winter 2011); Open Context 4 (Winter 2011); UMMZ 10 (Spring 2011)
Phase 2: Collecting and analyzing user data
  Interviews with data consumers: ICPSR 44 (Winter 2012); Open Context 22 (Winter 2012); UMMZ 27 (Fall 2012)
  Survey of data consumers: over 1,600 (Summer 2012)
  Web analytics of data consumers: server logs (ongoing)
  Observations of data consumers: 10 (ongoing)
Phase 3: Mapping significant properties as representation information
A Snapshot of the 27 Data Reusers
63% are systematists; 37% study ecological trends
Reuse data from other repositories and websites: 96%
Reuse data from museums and archives: 93%
Reuse data from colleagues: 26%
Reuse data from journal articles: 26%
The Data Discovery Process
"I'm at the point now where people know that this is kind of one of those things that I do. And so, people say 'Oh, I have this dataset' or 'I know someone who has this dataset...'" (CAU11)
"we started from that [author] paper and then added to it from other people's work ... So mostly from reading other people's papers" (CAU22)
"I knew from prior experience which museums had large collections of material from the part of the world I was interested in" (CAU19)
"that [aggregator repository] targets so many different collections that once you have access you know pretty much ... You can identify very quickly what you need" (CAU13)
Data Selection Criteria: condition of specimen; data coverage; results of pre-analysis; identification or location errors; geographic precision; matches another dataset; availability of voucher specimen; relevant taxonomically; sequence has been published; time period specimen collected.
Data selection based on research objectives
"that's the first filter, looking for specific species. And then for me, yeah, it's been mostly about the geographic precision of the data, to say whether or not I can use that record for something." (CAU26)
"For things like measurement, you want a well-preserved specimen that's relatively straight and intact." (CAU01)
"often when it doesn't meet my needs the most obvious reasons would be there's just not enough data or it doesn't cover ... Like geographically it doesn't cover the area I'm interested in well enough" (CAU03)
Data selection based on other datasets
"we decide, okay, these georeferences have an error that is probably higher than, let's say, five kilometers but our climate data is the resolution, the pixel size, is maybe 4.5 kilometers. So, anything that is above that size of pixel that we have, we actually cannot use." (CAU14)
"I include it [the sequence] in my dataset, do the analyses I'm going to do and then based on the results of those analysis look to see how those data match with the data that I've collected." (CAU05)
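The resolution filter CAU14 describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the participant's actual workflow: the record fields, identifiers, and sample error values are hypothetical. Only the 4.5 km pixel size and the rule that records with a georeference error above the pixel size cannot be used come from the quote.

```python
from dataclasses import dataclass


@dataclass
class OccurrenceRecord:
    """Hypothetical specimen record; field names are assumptions."""
    record_id: str
    georeference_error_km: float  # reported uncertainty of the locality


def usable_records(records, pixel_size_km):
    """Keep records whose georeference error does not exceed the pixel size."""
    return [r for r in records if r.georeference_error_km <= pixel_size_km]


# Made-up sample values for illustration only.
records = [
    OccurrenceRecord("rec-1", 0.5),
    OccurrenceRecord("rec-2", 4.5),
    OccurrenceRecord("rec-3", 5.0),
]

kept = usable_records(records, pixel_size_km=4.5)
print([r.record_id for r in kept])  # ['rec-1', 'rec-2']
```

A record whose error equals the pixel size is kept here because the quote excludes only errors "above" the pixel size. A stricter reuser might prefer a strict inequality.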
Trusting the data
"I can sort of qualitatively assess what the quality of taxonomic data might be just by it being, having some mention of the museum record. I know [a] museum worker who is often... I don't know about an expert in say, my group, but at least has access to the relevant literature to make good taxonomic decisions about those fishes from which they took the tissue." (CAU02)
"I would go back to the literature to look at the paper it came from. I guess there is also to some degree the particular researchers that actually produced that sequence; I might actually know their reputations or what they kind of work on and trust it more or less." (CAU12)
Trusting the data
"A lot of times, it's just a matter of looking at what the Latin name is that they supply because I can't really make a decision based on the information that I'm given. If I had a picture, I could use that when I'm taking into account their ability to identify something. But the main way that I do it is by looking at the geography of where they claim a specimen is located." (CAU17)
"Well, if there's a voucher specimen available then I can request that specimen from the museum where it's housed, re-examine it, confirm or deny that it is that particular species. If the voucher's there and it's the right species, then I have to go with it. If the voucher is not there, and I really question the identification ... Because it's unreliable in my mind." (CAU20)
Acknowledgements
Institute of Museum and Library Services
Elizabeth Yakel, Co-PI
Partners: Nancy McGovern, Ph.D. (MIT), Eric Kansa, Ph.D. (Open Context), William Fink, Ph.D. (University of Michigan Museum of Zoology)
OCLC Fellow: Julianna Barrera-Gomez
Students: Adam Kriesberg, Morgan Daniels, Rebecca Frank, Jessica Schaengold, Gavin Strassel, Michele DeLia, Kathleen Fear, Mallory Hood, Molly Haig, Annelise Doll, Monique Lowe
Questions? Ixchel Faniel, fanieli@oclc.org