Managing Discrepancies in Periodical Holdings Data

 
Periodical Holdings Audit: Correcting Discrepancies and Improbabilities in Catalogs and Periodical A-Z Lists
The Problem(s)
 
Two systems contain periodical holdings information:
The library catalog (III)
The A-Z list (EBSCO Holdings Management/Full Text Finder)
The two systems don’t always agree
Historically maintained by two different departments
Bad data has been copied from one system to another
Data corrupted by EBSCO on original ingest
 
The Problem(s)
 
 
Kinds of bad information:
Disagreement between systems
Disagreement between fields within a system
Impossible data (e.g. Holdings: 2010-1940)
Improbable data (e.g. 1869-Present)
Especially improbable for niche publications
Just plain incorrect (hard to detect)
The Solution(s)
 
1.
Identify known problems
Record problems found in real life
Model the problem in a way detectable by algorithm
“Algorithm”, sadly, does not imply the lack of grunt work
Fix ’em all
2.
Imagine problems you don’t know you have
You probably have those problems too
Fix them too
Ken, you might be projecting…
Photo: Max Halberstadt, Public Domain
 
What does “fix” mean
 
RESEARCH!
Where data imported poorly into EBSCO, sometimes the catalog alone is
enough to clarify the correct holdings statement
Often have to check with print holdings in person
 
Methods
 
A lot of Excel:
Filter
Copy filter results to new table
Filter again
 
PHP & MySQL
If you are or have access to a programmer, almost any scripting language
would do: Perl, PHP, Python, etc.
 
Catalog vs. EBSCO data structure
 
EBSCO’s format does not allow for volume/issue information in a structured way
Only in the free-text, optional CoverageStatement field
Example 1: Records exist in FTF, not in catalog
 
Scenario: Record was deleted from catalog after export to FTF, not
deleted from FTF
 
Process improvement: When records are removed or suppressed from
the catalog, change them in FTF too.
 
Disagreement
Example 1: Records exist in FTF, not in catalog
 
Detection:
Export “serlist” records from catalog
Export records from FTF
Trim URLs / bib record numbers down to "b" plus 7 digits, e.g. b1262517
Compare using:
“Compare two lists” from MIT Bioinformatics & Research Computing
http://jura.wi.mit.edu/bioc/tools/compare.php
Remove FTF-only titles from FTF
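
If a scripting language is available, the trim-and-compare steps above can be sketched in a few lines of Python. This is only an illustration, not the process actually used: the function names and the sample values are hypothetical, and it assumes every catalog record number and FTF URL contains "b" followed by 7 digits.

```python
import re

# Matches a bib record number: "b" followed by 7 digits.
# Anything after that (a trailing digit, "~S0", etc.) is ignored.
BIB_ID = re.compile(r"b\d{7}")


def bib_ids(values):
    """Return the set of trimmed bib IDs found in an iterable of strings."""
    ids = set()
    for value in values:
        match = BIB_ID.search(value)
        if match:
            ids.add(match.group(0))
    return ids


def ftf_only(catalog_values, ftf_values):
    """IDs present in the FTF export but missing from the catalog export."""
    return sorted(bib_ids(ftf_values) - bib_ids(catalog_values))


# Hypothetical sample data standing in for the two exports.
catalog_records = ["b1262517", "b10242582"]
ftf_urls = [
    "http://ezra.wittenberg.edu/record=b1262517~S0",
    "http://ezra.wittenberg.edu/record=b1099999~S0",  # made-up orphan record
]
print(ftf_only(catalog_records, ftf_urls))
```

The set difference plays the same role as the "Compare two lists" tool: whatever is left over exists only in FTF and is a candidate for removal.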
 
Example 2: Coverage “to Present” & End Date
 
EBSCO’s FTF metadata includes 3 columns related to holdings dates:
CustomCoverageBegin (date only)
CustomCoverageEnd (date only)
CoverageStatement (free text, supports volume #, date, etc.)
The use of multiple fields to cover the same information leads to the
potential for discrepancies
 
Disagreement
 
Example 2: Coverage “to Present” & End Date
 
Detection:
Excel filter: CustomCoverageEnd = ‘Present’
Excel find: Coverage statement contains ‘-v’ or ‘- v’
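
The same two-step Excel check could be scripted. A minimal Python sketch, assuming the FTF export is a CSV with the column names CustomCoverageEnd and CoverageStatement as given above; the function name, the Title column, and the sample rows are hypothetical.

```python
import csv
import io


def closed_statement_but_present(rows):
    """Rows whose end date is 'Present' while the statement names a closing volume."""
    flagged = []
    for row in rows:
        end = row.get("CustomCoverageEnd", "").strip()
        statement = row.get("CoverageStatement", "")
        # Same test as the Excel find: '-v' or '- v' suggests an ending volume.
        if end == "Present" and ("-v" in statement or "- v" in statement):
            flagged.append(row)
    return flagged


# Hypothetical sample standing in for the FTF export file.
sample = io.StringIO(
    "Title,CustomCoverageEnd,CoverageStatement\n"
    "Journal A,Present,v.1(1960)-v.4:1(1971)\n"
    "Journal B,Present,v.49(2012)-\n"
    "Journal C,12/31/1971,v.1(1960)-v.4:1(1971)\n"
)
for row in closed_statement_but_present(csv.DictReader(sample)):
    print(row["Title"], "|", row["CoverageStatement"])
```

Only Journal A is flagged here: its end date claims "Present" while its statement closes at v.4.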
Example 3: Complex ≠ Simple holdings
Scenario: FTF shows complex holdings and a simple coverage statement,
or vice versa
 
Disagreement
Example 3: Complex ≠ Simple holdings
 
Detection:
Excel filter: ‘|’ (pipe) in CustomCoverage
Excel filter: does not contain ‘ , ’ (comma) in CoverageStatement
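
A scripted version of these two filters might look like the sketch below. It rests on assumptions not confirmed above: that "CustomCoverage" means the CustomCoverageBegin/CustomCoverageEnd columns, that multiple date ranges are pipe-separated in those columns, and that blank statements should be skipped because the field is optional. All names and sample rows are hypothetical.

```python
def has_complex_dates(row):
    """True if either custom coverage date column holds several pipe-separated values."""
    return "|" in row.get("CustomCoverageBegin", "") or "|" in row.get("CustomCoverageEnd", "")


def has_complex_statement(row):
    """True if the free-text statement lists more than one comma-separated range."""
    return "," in row.get("CoverageStatement", "")


def complexity_mismatches(rows):
    """Rows where one side is complex and the other simple, in either direction."""
    return [
        row
        for row in rows
        if row.get("CoverageStatement", "").strip()  # nothing to compare if blank
        and has_complex_dates(row) != has_complex_statement(row)
    ]


# Hypothetical rows; the pipe-separated date layout is an assumption.
rows = [
    {"Title": "A", "CustomCoverageBegin": "01/01/1896|01/01/1900",
     "CustomCoverageEnd": "12/31/1897|12/31/1929",
     "CoverageStatement": "v.21(1896)-v.86(1929)"},
    {"Title": "B", "CustomCoverageBegin": "01/01/1960",
     "CustomCoverageEnd": "12/31/1971",
     "CoverageStatement": "v.1(1960)-v.4:1(1971)"},
    {"Title": "C", "CustomCoverageBegin": "01/01/1985",
     "CustomCoverageEnd": "Present",
     "CoverageStatement": "n.16(1985),n.22(1987)-"},
]
print([row["Title"] for row in complexity_mismatches(rows)])
```

Comparing the two booleans with != catches the "vice versa" case in the same pass.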
 
 
And then:
Decide what to do about it…
Example 4: Holdings "to present" but not listed as Retains Current
 
In our library, most current subscriptions are held in “Current
Periodicals”
e.g.: “v.49(2012)-;Retains current volume in Current Periodicals.”
Places where that statement is missing are suspect
Some are legit, but many absences for current subscriptions indicate
errors
 
 
IMPROBABLE
Example 4: Holdings "to present" but not listed as Retains Current
 
Detection:
Excel filter: CustomCoverageEnd = ‘Present’
Excel filter: Coverage Statement does not contain ‘Retain’
 
Results:
Some correct
Some withdrawn
Some should have had Current Periodicals statement
Example 4b: Vice Versa
 
“Retains current” but end date does not contain ‘Present’
Found four total erroneous records
Errors in catalog
Errors in EBSCO ingest
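
Examples 4 and 4b are the same test run in opposite directions, so a script could check both at once. A minimal Python sketch, assuming rows exported from FTF with the CustomCoverageEnd and CoverageStatement columns; the function name and sample rows are hypothetical, and the case-insensitive match on "retain" is a choice made here, not something specified above.

```python
def retains_current_mismatches(rows):
    """Split suspect rows into two lists.

    missing_statement: end date says Present, but no 'Retains current' note (Example 4)
    missing_present:   'Retains current' note, but end date is not Present (Example 4b)
    """
    missing_statement, missing_present = [], []
    for row in rows:
        to_present = "present" in row.get("CustomCoverageEnd", "").lower()
        retains = "retain" in row.get("CoverageStatement", "").lower()
        if to_present and not retains:
            missing_statement.append(row)
        elif retains and not to_present:
            missing_present.append(row)
    return missing_statement, missing_present


# Hypothetical sample rows.
rows = [
    {"Title": "A", "CustomCoverageEnd": "Present",
     "CoverageStatement": "v.49(2012)-;Retains current volume in Current Periodicals."},
    {"Title": "B", "CustomCoverageEnd": "Present", "CoverageStatement": "v.12(1990)-"},
    {"Title": "C", "CustomCoverageEnd": "12/31/2005",
     "CoverageStatement": "v.3(1999)-;Retains current volume in Current Periodicals."},
]
example_4, example_4b = retains_current_mismatches(rows)
print([r["Title"] for r in example_4], [r["Title"] for r in example_4b])
```

As the results above show, a flagged row is only a suspect: some will turn out to be correct, so each still needs a manual check.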
Example 5: Special Collections to ‘Present’
 
We have very few titles in Storage or Special Collections with current
subscriptions.
There were 55 questionable titles
Most: Catalog record was out of date
Some: sloppy data ingest (e.g. a single volume or issue was recorded as the
beginning of a series: n.10(1938) → n.10(1938)-)
 
IMPROBABLE
 
Example 5: Special Collections to ‘Present’
 
Detection:
Limit holdings to PackageName = “THOMAS RARE”
Or one of several other special collections locations
Limit to CustomCoverageEnd contains “Present”
Example 6: Impossible Date Ranges
 
Items with non-sequential holdings
Lib. Has: n.16(1985),n.22(1987),n.24(1988),n.56(1964)-;
Lib. Has: v.21(1896)-v.22(1987),v.28(1900)-v.86(1929)
 
Detection:
Did not find a good way to do this!
Fixed them as we found them
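
One possible approach, offered only as an untested idea and not something that was tried in this project: pull the four-digit years out of the parentheses in each Lib. Has statement and flag any statement where a year is earlier than the one before it. The Python sketch below assumes years always appear as "(YYYY" and uses hypothetical function names; it would miss problems in statements without parenthesized years, and it may flag legitimate cases such as a new series that restarts numbering.

```python
import re

# A four-digit year immediately after an opening parenthesis, e.g. "(1985".
YEAR_IN_PARENS = re.compile(r"\((\d{4})")


def years_in_order(statement):
    """Years in the order they appear in a holdings statement."""
    return [int(year) for year in YEAR_IN_PARENS.findall(statement)]


def out_of_sequence(statement):
    """(earlier, later) pairs of adjacent years where the sequence runs backwards."""
    years = years_in_order(statement)
    return [(a, b) for a, b in zip(years, years[1:]) if b < a]


for lib_has in [
    "n.16(1985),n.22(1987),n.24(1988),n.56(1964)-;",
    "v.21(1896)-v.22(1987),v.28(1900)-v.86(1929)",
    "v.1(1960)-v.4:1(1971)",
]:
    print(lib_has, out_of_sequence(lib_has))
```

The first two statements are the examples from this slide; each yields one backwards pair, while the third, well-formed statement yields none.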
 
IMPOSSIBLE
Example 7: Old News
 
We don’t have a science library anymore, but:
 
IMPOSSIBLE
 
Solution:
Create List of Bib Records WHERE CHECKIN has ‘sci’
Example 8: LibHas vs. CoverageStmt
What if we just look for basic textual disagreement?
LibHas statement (catalog) is textually different from the Coverage
Statement in EBSCO
 
Disagreement
 
Example 8: LibHas vs. CoverageStmt
 
For each record, compare the catalog record number with EBSCO’s URL for the
item
Catalog: b10242582
URL: http://ezra.wittenberg.edu/record=b1024258~S0
(the URL carries only the first seven digits of the record number)
Create List in catalog, export Record #, Title, LibHas.
Export titles from EBSCO, including URL
Example 8: LibHas vs. CoverageStmt
 
Match based on record # / URL, compare holdings statements
I did this with a PHP script & MySQL database, comparing strings
 
 
 
 
You could try Excel, something like:
=INDEX('ebsco'!$V:$V,MATCH(B2,'ebsco'!$E:$E,0))
but I had trouble getting this to work
The trailing 0 asks MATCH for an exact match; without it, MATCH assumes the URL column is sorted
In the standard EBSCO export format $E:$E is the URL column, $V:$V is the
Coverage Statement
In this example, B2 contains a catalog URL to match on
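
For anyone scripting this instead, here is a minimal Python sketch of the match-and-compare idea. It is not the PHP/MySQL script used here. The function names are hypothetical, the join key is assumed to be "b" plus the first 7 digits, and the normalization (straighten curly quotes, drop all whitespace) is one guess at what controlling for spacing and quotation marks might involve.

```python
import re

BIB_ID = re.compile(r"b\d{7}")
# Map curly quotes to their straight equivalents.
STRAIGHT_QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})


def record_key(value):
    """Trimmed bib ID from a record number or URL, or None if absent."""
    match = BIB_ID.search(value)
    return match.group(0) if match else None


def normalize(statement):
    """Straighten quotes and remove whitespace so cosmetic differences are ignored."""
    return "".join(statement.translate(STRAIGHT_QUOTES).split())


def disagreements(catalog_rows, ebsco_rows):
    """catalog_rows: (record number, LibHas); ebsco_rows: (URL, CoverageStatement)."""
    ebsco = {record_key(url): stmt for url, stmt in ebsco_rows if record_key(url)}
    found = []
    for record, lib_has in catalog_rows:
        key = record_key(record)
        statement = ebsco.get(key)
        if not statement or not statement.strip():
            continue  # unmatched record or blank CoverageStatement: nothing to compare
        if normalize(lib_has) != normalize(statement):
            found.append((key, lib_has, statement))
    return found


# Hypothetical holdings text; only the first record number and URL come from the slides.
catalog = [("b10242582", "v.1(1960)-v.4:1(1971)"), ("b1262517", "v.49(2012)-")]
ebsco = [
    ("http://ezra.wittenberg.edu/record=b1024258~S0", "v.1 (1960)-v.4:1 (1971)"),
    ("http://ezra.wittenberg.edu/record=b1262517~S0", "v.48(2011)-"),
]
print(disagreements(catalog, ebsco))
```

Building a dictionary keyed on the trimmed record number does the job of the INDEX/MATCH lookup, and skipping blank statements mirrors the limitation that the comparison only works where CoverageStatement was defined.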
Example 8: LibHas vs. CoverageStmt
 
Results of this approach
All records in main periodical collection (n = 2029)
LibHas != CoverageStatement (498)
Eliminate blank CoverageStatement (276)
Control for varied spacing and quotation marks (219)
Newly introduced by weeding project (156)
Other errors (63)
Limitation:
Only works where CoverageStatement was defined
 
Future directions
 
Improve staff workflows
Periodic checks for data consistency
Exploring further mechanisms for comparisons/tests
 

Correcting discrepancies and improbabilities in catalog and A-Z lists containing periodical holdings information, including identifying, modeling, and fixing known and unknown issues through research and various methods like Excel, scripting languages, and manual checks with print holdings.


Uploaded on Aug 04, 2024


Presentation Transcript


  1. Periodical Holdings Audit: Correcting Discrepancies and Improbabilities in Catalogs and Periodical A-Z Lists Ken Irwin Wittenberg University kirwin@wittenberg.edu


  7. Catalog vs. EBSCO data structure
     Catalog: Lib. Has (text), e.g. v.1(1960)-v.4:1(1971). No real date fields.
     EBSCO: CustomCoverageBegin (date), e.g. 01/01/1960; CustomCoverageEnd (date), e.g. 12/31/1971; plus CoverageStatement (text, optional), e.g. v.1(1960)-v.4:1(1971).
     EBSCO’s format does not allow for volume/issue information in a structured way, only in the free-text, optional CoverageStatement field.
