Exploring the impact of automated indexing on completeness of MeSH terms

Slide Note

This study delves into the effects of automated indexing on the thoroughness of MeSH terms. It addresses the novelty of automated indexing, its implications for teaching, questions raised by students, observed missing index terms, and the significance of MeSH in practice. The explanation of how automated indexing works sheds light on TF-IDF methodology.


Uploaded on Apr 19, 2024 | 4 Views


Presentation Transcript

Exploring the impact of automated indexing on completeness of MeSH terms. Alexandre Amar-Zifkin, MLIS, Université de Montréal; Virginie Paquet, MIS, Université de Montréal; Taline Ekmekjian, MLIS, Public Health Agency of Canada - Agence de la santé publique du Canada; Tara Landry, MLIS, Université de Montréal. This work is licensed under CC BY-SA 4.0.

Exploring the impact of automated indexing on completeness of MeSH terms Why? - The novelty of Automated Indexing, and its impact on our teaching - Questions raised by students and residents about indexing - Our own incidental observations of fairly obvious missing index terms

Why? What do we mean by "fairly obvious"?

MeSH in our practice. In our teaching and searching practices, MeSH is positioned, understood and used as a reliable indicator of the important concepts (including implicit ones) within a publication: its "aboutness". NLM: "Limiting your searches to MeSH provides precision that keyword searching cannot. MeSH searches allow you to search the set of indexed records with some assurance that the results are really about the topic you are searching, limiting the false hits. This is particularly true if you apply the Major Topic [majr] designation."

How does Automated Indexing work?
- Think automated term mapping.
- TF-IDF: less frequent terms across the entirety of Medline count for more; more frequent terms count for less.
- Titles count double.
- It looks at neighbors (papers with similar titles and abstracts) and pulls their MeSH into the list.
- It does not look at the full text or keywords; human indexers did.
- It does not look at the journal title; we assume that human indexers could.
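The description above can be sketched roughly as follows. This is a toy illustration of the ideas on the slide (TF-IDF weighting, titles counted double, MeSH borrowed from the nearest neighbors), not NLM's actual implementation. The function names, the tokenizer, the cosine similarity measure and the three toy records are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: a naive lowercase word tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def term_counts(title, abstract):
    # "Titles count double": each title term adds 2, each abstract term adds 1.
    counts = Counter(tokenize(abstract))
    for term in tokenize(title):
        counts[term] += 2
    return counts


def tfidf_vectors(records):
    counts = [term_counts(r["title"], r["abstract"]) for r in records]
    n = len(records)
    doc_freq = Counter(term for c in counts for term in c)
    # Rare terms get a large weight; terms found in every record get zero.
    return [{t: tf * math.log(n / doc_freq[t]) for t, tf in c.items()}
            for c in counts]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_mesh(records, target_index, k=1):
    # Pool the MeSH of the k records most similar to the target record.
    vectors = tfidf_vectors(records)
    target = vectors[target_index]
    scored = sorted(((cosine(target, v), i) for i, v in enumerate(vectors)
                     if i != target_index), reverse=True)
    suggestions = []
    for _, i in scored[:k]:
        for heading in records[i]["mesh"]:
            if heading not in suggestions:
                suggestions.append(heading)
    return suggestions


# Hypothetical toy records; record 0 is the one to be indexed.
toy_records = [
    {"title": "EEG monitoring in epilepsy",
     "abstract": "Continuous EEG monitoring detects seizures in epilepsy",
     "mesh": []},
    {"title": "EEG in epilepsy surgery",
     "abstract": "Seizures, EEG findings and epilepsy outcomes",
     "mesh": ["Epilepsy", "Electroencephalography"]},
    {"title": "Diet and stoma care",
     "abstract": "Patient education on diet after stoma formation",
     "mesh": ["Diet", "Surgical Stomas"]},
]

print(suggest_mesh(toy_records, 0, k=1))
```

The sketch also hints at how a "neighbor" problem could arise: any heading attached to a similar-looking record is a candidate, whether or not the target record is actually about it.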

How do we look into this? Unfiltered Medline is a lot. Didn't want to cherry-pick known issues.

What did we end up testing? How often do automated-indexed Medline records receive MeSH which accurately (or inaccurately) represent their essential concepts? Within that, are there trends? What does this mean for how we teach new learners, and for how we search in a rush?

Methods: gathering a sample. A 20-day range of Medline records (skipping weekends and holidays) from after the switch to automated indexing. For each day, 50 randomly selected records: automated-indexed, Medline, English. Sample queries: 20230206.ed and automated.ig and medline.st and english.la (3420); from 1 keep (50 unique random numbers between 1-3420)
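The "50 unique random numbers between 1-3420" step could be generated along these lines. This is only a sketch of one way to do it; the slides do not say which tool was used, and the function name and the seed parameter are assumptions added for illustration.

```python
import random


def pick_record_positions(total_records, sample_size=50, seed=None):
    """Return sorted, unique positions between 1 and total_records inclusive."""
    rng = random.Random(seed)  # pass a seed to make the draw repeatable
    # random.sample draws without replacement, so the positions are unique.
    return sorted(rng.sample(range(1, total_records + 1), sample_size))


# 3420 is the result count from the sample query for one day.
positions = pick_record_positions(3420, sample_size=50, seed=20230206)
print(positions)
```

The resulting positions can then be pasted into the "keep" step of the search for that day.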

Methods: screening. Each record (source, title, abstract) is assessed by 2 librarians, who indicate: Is the article "health" as opposed to "life sciences"? (if so, keep) Is there enough information to ID concepts? (if so, keep) If yes, ID the essential concepts; generally not type of study or location. Expected this to map to P and I, but sometimes there had to be something else contextual. Then, the same librarians look at the actual MeSH applied to the record, and indicate agreement between their essential concepts and the MeSH. Photo by davisuko on Unsplash

Methods: tool. One row per record, with these columns (example row shown):
- PMID: 36731271
- Screener 1: Alex
- Journal: Epilepsy Research. 190:107088, 2023 02.
- Title: Indications for continuous electroencephalographic (cEEG) monitoring: What do they tell us?
- Abstract: (omitted for space reasons)
- OK?: y
- Concepts: EEG; epilepsy; length of monitoring?
- MeSH: *Epilepsy; Monitoring, Physiologic; Electroencephalography; *Lupus Erythematosus, Systemic
- Agreement b/w Concepts & MeSH?: Yes, but Lupus?
- Comments: Also, Neurophysiological Monitoring exists.
Twice, for each record.

Discussion: Issues in indexing
1. There's a MeSH here that shouldn't be
2. There are one or more MeSH where better ones exist (or, the subheadings got confused)
3. Something obvious is missing
Photo by Nikon Unsplash

#1: There's a MeSH here that shouldn't be. Generally caused by: Acronyms and rhetoric, but these are incidental and easy to skip? Neighbors: these are more complicated. If everything else is ok, this isn't catastrophic unless you're searching for the irrelevant term. Then it's noisy.

#1: There's a MeSH here that shouldn't be: Acronyms https://pubmed.ncbi.nlm.nih.gov/36731271/

#1: There's a MeSH here that shouldn't be: Rhetoric https://pubmed.ncbi.nlm.nih.gov/34121610/

#1: There's a MeSH here that shouldn't be: Neighbours https://pubmed.ncbi.nlm.nih.gov/36440479/

#2: There are one or more MeSH where better ones exist / confused subheadings https://pubmed.ncbi.nlm.nih.gov/36744473/

#2: There are one or more MeSH where better ones exist / confused subheadings https://www.ncbi.nlm.nih.gov/mesh/?term=pregnancy+in+adolescence

#2: There are one or more MeSH where better ones exist / confused subheadings https://pubmed.ncbi.nlm.nih.gov/36820744/

#2: There are one or more MeSH where better ones exist / confused subheadings https://pubmed.ncbi.nlm.nih.gov/33617292/

#3: Something obvious is missing https://pubmed.ncbi.nlm.nih.gov/35075912/

#3: Something obvious is missing https://pubmed.ncbi.nlm.nih.gov/35428357/

#3: Something obvious is missing https://pubmed.ncbi.nlm.nih.gov/36730510/

Results. From 998 records (1000, minus 2 duplicates), 287 were excluded on the basis of not being Health, or not having enough information to go on. From the remaining 711 records: 61% (438) agreement (both screeners say the MeSH is accurate, or both say it is inaccurate); 39% (273) disagreement (one screener says the MeSH is accurate; the other disagrees).
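As a quick check, the counts on this slide can be re-derived with a few lines of arithmetic. The variable names are ours; the numbers are the ones reported above. The computed shares come out at roughly 61.6% and 38.4%, which the slide reports as 61% and 39%.

```python
def share(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


screened = 998            # 1000 sampled records, minus 2 duplicates
excluded = 287            # not Health, or not enough information
assessed = screened - excluded

agreement = 438           # both screeners gave the same verdict on the MeSH
disagreement = 273        # the screeners gave different verdicts

print(assessed)
print(round(share(agreement, assessed), 1))
print(round(share(disagreement, assessed), 1))
```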

Results, after reconciliation From 711 records: 53% (376): MeSH and our concepts were in accordance 47% (334): one or more inadequacies in the MeSH

Weaknesses and gaps
- WE ARE NOT INDEXERS
- Records with no abstracts were not handled consistently; might've inflated either side of the results?
- Subheadings were challenging to assess
- When both screeners said the indexing was fine, or missing, we didn't second-guess them; might've been different qualms
- No comparison to human-indexed records
- No sense of impact on retrieval in a given topic area

Conclusions. MeSH is no longer a consistent indicator of aboutness. Foundations that we took for granted are not here anymore. Mechanisms of control and precision are no longer reliable for searchers. Work of indexing and QA has been reallocated to users.

Conclusions, continued. Absolutely can't rely on pure-MeSH filters, ex. not (child/ not (child/ and adult/)). Textwords then (catch the unindexed) and now (catch the poorly indexed). Blind spot for MeSH derived from full text. One example of this: racialized groups in https://pubmed.ncbi.nlm.nih.gov/34573423/
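To see why a pure-MeSH filter becomes fragile, the example filter above can be modelled as plain boolean logic over each record's set of headings. This is a sketch under assumptions: the records and their headings are hypothetical, and real Ovid searching works on the database rather than on Python sets.

```python
def passes_filter(mesh_headings):
    """Model of: not (child/ not (child/ and adult/)).

    A record is dropped when it carries Child without also carrying Adult.
    """
    has_child = "Child" in mesh_headings
    has_adult = "Adult" in mesh_headings
    child_only = has_child and not (has_child and has_adult)
    return not child_only


# Hypothetical records: what the study covers vs. the headings it received.
hypothetical_records = {
    "adults only, indexed Adult": {"Adult", "Epilepsy"},
    "children only, indexed Child": {"Child", "Epilepsy"},
    "adults and children, indexed with both": {"Adult", "Child", "Epilepsy"},
    "adults and children, Adult heading missing": {"Child", "Epilepsy"},
}

for label, headings in hypothetical_records.items():
    print(label, "->", "kept" if passes_filter(headings) else "dropped")
```

The last record is the problem case: the study includes adults, but because the Adult heading was never applied, the filter treats it as child-only and silently drops it.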

Conclusions, continued (2). Can no longer honestly say that if the abstract says the study excluded diabetics, it won't have Diabetes in the MeSH. Guidance to use the most specific MeSH is now actively harmful.

Discussion / ways forward Write lifelessly Be wary of acronyms and neologisms and rhetorical flourishes https://pubmed.ncbi.nlm.nih.gov/36928038/

Discussion / ways forward
- NLM suggests trying out MeSH On Demand, so we tried, and it's actually better than PubMed in the most egregious cases we tested: diet, stoma, pt edu as topic. In all three of those cases it found what was missing from the Medline indexing.
- Just wait for the NLM to change the algorithm!
- Changes we'd recommend testing: keywords, name-of-journal, abandoning Rule Of Three if the algorithm is using it.
- Concerned about algorithmically indexed records serving as models for the algorithm in the future, repeating its own mistakes over and over.
- Look at your own publications, and complain.

(we're sorry) Questions, comments?
