Exploring the impact of automated indexing on completeness of MeSH terms

Alexandre Amar-Zifkin, MLIS – Université de Montréal
Virginie Paquet, MIS – Université de Montréal
Taline Ekmekjian, MLIS – Public Health Agency of Canada – Agence de la santé publique du Canada
Tara Landry, MLIS – Université de Montréal

This work is licensed under CC BY-SA 4.0
Why?
- The novelty of Automated Indexing, and its impact on our teaching
- Questions raised by students and residents about indexing
- Our own incidental observations of fairly obvious missing index terms
Why?
What do we mean by ‘fairly obvious’?
MeSH in our practice
In our teaching and searching practices, MeSH is positioned, understood and used as a reliable indicator of important (including when implicit) concepts within a publication – ‘aboutness’.
NLM:
Limiting your searches to MeSH provides precision that “keyword” searching cannot.
MeSH searches allow you to search the set of indexed records with some assurance that the results are really about the topic you are searching, limiting the “false hits.”
This is particularly true if you apply the Major Topic [majr] designation.
How does Automated Indexing work?
Think automated term mapping.
TF-IDF (less frequent terms from the entirety of Medline count for more / more frequent terms count for less)
Titles count double
Looks at ‘neighbors’ – papers with similar titles-and-abstracts – and pulls their MeSH into the list.
It does not look at the full text or keywords; humans did.
It does not look at journal title; we assume that humans could.
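The TF-IDF weighting described above can be sketched in Python. This is the textbook formulation only, not NLM's actual algorithm; the corpus, the tokens, and the interpretation of "titles count double" as repeating the title tokens are all illustrative assumptions.

```python
import math

def tf_idf(term, doc_terms, corpus):
    """Textbook TF-IDF: terms that are rare across the corpus weigh more,
    terms that are common across the corpus weigh less."""
    tf = doc_terms.count(term) / len(doc_terms)
    docs_with_term = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / (1 + docs_with_term))
    return tf * idf

# 'Titles count double': repeat the title tokens before scoring
# (our guess at what the slide's doubling means, purely illustrative).
title = ["epilepsy", "monitoring"]
abstract = ["patients", "monitoring", "study"]
doc = title * 2 + abstract

corpus = [doc, ["patients", "study"], ["patients", "outcomes"], ["patients", "trial"]]

# A Medline-rare term ("epilepsy") outscores a Medline-common one ("patients").
print(tf_idf("epilepsy", doc, corpus) > tf_idf("patients", doc, corpus))  # True
```

Terms appearing in nearly every record (like "patients") can even get a negative weight here, which captures the slide's point: ubiquitous vocabulary contributes little to what a record is "about."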
How do we look into this?
Unfiltered Medline is a lot.
We didn’t want to cherry-pick known issues.
What did we end up testing?
How often do automated-indexed Medline records receive MeSH
which accurately (or inaccurately) represent their essential concepts?
Within that, are there trends?
What does this mean for
how we teach new learners?
how we search in a rush?
Methods – gathering a sample
20-day range of Medline records (skipping weekends and holidays)
from after the switch to automated indexing.
For each day, 50 randomly selected records:
automated-indexed, medline, English.
Sample queries:
20230206.ed and automated.ig and medline.st and english.la (3420)
from 1 keep (50 unique random numbers between 1-3420)
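The `from 1 keep (…)` step above is Ovid syntax for keeping 50 unique random record positions out of that day's 3,420 hits. The same draw can be sketched in Python; the seed is hypothetical, used only to make the sketch reproducible, and the day size varies day to day in practice.

```python
import random

random.seed(20230206)  # hypothetical seed, for reproducibility of the sketch only
day_size = 3420        # hit count for the 2023-02-06 example day

# 50 unique random positions between 1 and day_size, inclusive.
picks = sorted(random.sample(range(1, day_size + 1), 50))

print(len(picks), min(picks) >= 1, max(picks) <= day_size)  # 50 True True
```

`random.sample` draws without replacement, which matches the "50 unique random numbers" requirement; repeating this for each of the 20 days yields the 1,000-record sample.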
Methods – screening
Each record (source-title-abstract) assessed by 2 librarians, who indicate:
Is the article ‘health’ as opposed to ‘life sciences’? (if so, keep)
Is there enough information to ID concepts? (if so, keep)
If yes, ID the essential concepts – generally not ‘type of study’ or ‘location’. Expected this to map to P and I, but sometimes there had to be something else contextual.
Then, the same librarians look at the actual MeSH applied to the record, and indicate agreement between their essential concepts and the MeSH.
Photo by davisuko on Unsplash
Methods – tool
Twice, for each record. Example row from the screening grid:
PMID: 36731271 | Screener: Alex
Journal: Epilepsy Research. 190:107088, 2023 02.
Title: Indications for continuous electroencephalographic (cEEG) monitoring: What do they tell us?
Abstract: (omitted for space reasons)
OK?: y
Concepts: EEG; epilepsy; length of monitoring?
MeSH: *Epilepsy; Monitoring, Physiologic; Electroencephalography; *Lupus Erythematosus, Systemic
Agreement b/w Concepts & MeSH?: Yes, but Lupus?
Comments: Also, Neurophysiological Monitoring exists.
Discussion – Issues in indexing
1. There’s a MeSH here that shouldn’t be
2. There are one or more MeSH where better ones exist (or, the subheadings got confused)
3. Something obvious is missing
Photo by Nik on Unsplash
#1: There’s a MeSH here that shouldn’t be
Generally caused by:
- Acronyms and rhetoric – but these are incidental and easy to skip?
- Neighbors – these are more complicated.
If everything else is OK, this isn’t catastrophic … unless you’re searching for the irrelevant term. Then it’s noisy.
#1: There’s a MeSH here that shouldn’t be – Acronyms
https://pubmed.ncbi.nlm.nih.gov/36731271/
#1: There’s a MeSH here that shouldn’t be – Rhetoric
https://pubmed.ncbi.nlm.nih.gov/34121610/
#1: There’s a MeSH here that shouldn’t be – Neighbours
https://pubmed.ncbi.nlm.nih.gov/36440479/
#2: There are one or more MeSH where better ones exist / confused subheadings
https://pubmed.ncbi.nlm.nih.gov/36744473/
https://www.ncbi.nlm.nih.gov/mesh/?term=pregnancy+in+adolescence
#2: There are one or more MeSH where better ones exist / confused subheadings
https://pubmed.ncbi.nlm.nih.gov/36820744/
#2: There are one or more MeSH where better ones exist / confused subheadings
https://pubmed.ncbi.nlm.nih.gov/33617292/
#3: Something obvious is missing
https://pubmed.ncbi.nlm.nih.gov/35075912/
#3: Something obvious is missing
https://pubmed.ncbi.nlm.nih.gov/35428357/
#3: Something obvious is missing
https://pubmed.ncbi.nlm.nih.gov/36730510/
Results
From 998 records (1000, minus 2 duplicates)
287 excluded on the basis of not being Health, or not having enough
information to go on.
From 711 records:
61% (438) agreement (either MeSH is accurate or inaccurate)
39% (273) disagreement (one screener says MeSH is accurate; one disagrees)
Results, after reconciliation
From 998 records (1000, minus 2 duplicates)
287 excluded on the basis of not being Health, or not having enough
information to go on.
From 711 records:
53% (376): MeSH and our concepts were in accordance
47% (334): one or more inadequacies in the MeSH
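As a quick arithmetic check, the reconciled percentages above follow directly from the counts reported on the slide:

```python
total = 711                    # records kept after exclusions
accord, inadequate = 376, 334  # reconciled counts from the slide

print(round(accord / total * 100))      # 53
print(round(inadequate / total * 100))  # 47
```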
Weaknesses and gaps
WE ARE NOT INDEXERS
Records with no abstracts were not handled consistently – might’ve inflated either side of the results?
Subheadings were challenging to assess
When both screeners said the indexing was fine, or missing, we didn’t second-guess them – might’ve been different qualms
No comparison to human-indexed records
No sense of impact on retrieval in a given topic area
Conclusions
MeSH no longer a consistent indicator of ‘aboutness’
Foundations that we took for granted are … not here anymore.
Mechanisms of control and precision are no longer reliable for searchers
Work of indexing and QA has been reallocated to users
Conclusions, continued
Absolutely can’t rely on pure-MeSH filters
ex. not (child/ not (child/ and adult/))
Textwords then (catch the unindexed) and now (catch the poorly indexed)
Blind spot for MeSH derived from full-text.
One example of this: racialized groups in https://pubmed.ncbi.nlm.nih.gov/34573423/
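The pediatric-exclusion filter quoted above, `not (child/ not (child/ and adult/))`, drops any record indexed with Child unless Adult is also present. A minimal sketch (hypothetical records, with a record's MeSH modeled as a set of strings) shows how one missing term causes a wrongful exclusion:

```python
def passes_pediatric_exclusion(mesh):
    """Mirror of 'not (child/ not (child/ and adult/))':
    drop a record indexed Child unless it is also indexed Adult."""
    return not ("Child" in mesh and "Adult" not in mesh)

# Correctly indexed mixed-age study: Child co-occurs with Adult, so it is kept.
well_indexed = {"Child", "Adult", "Hypertension"}

# The same study, but automated indexing missed the Adult term:
# the filter now silently discards a relevant record.
under_indexed = {"Child", "Hypertension"}

print(passes_pediatric_exclusion(well_indexed))   # True
print(passes_pediatric_exclusion(under_indexed))  # False
```

This is why the slide pairs the filter warning with textwords: a term like "adults" in the title or abstract can still retrieve the under-indexed record when the MeSH cannot.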
Conclusions, continued (2)
Can no longer honestly say that ‘if the abstract says the study excluded diabetics, it won’t have Diabetes in the MeSH’.
Guidance to use the most specific MeSH is now actively harmful.
Discussion / ways forward
Write lifelessly.
Be wary of acronyms, neologisms, and rhetorical flourishes.
https://pubmed.ncbi.nlm.nih.gov/36928038/
Discussion / ways forward
NLM suggests trying out MeSH On Demand …
… so we tried, and it’s actually better than PubMed in the most egregious cases we tested: diet, stoma, pt edu as topic … in all three of those cases it found what was missing from the Medline indexing.
Just wait for the NLM to change the algorithm!
Changes we’d recommend testing: keywords, name-of-journal, abandoning the ‘Rule of Three’ if the algorithm is using it.
Concerned about algorithmically indexed records serving as models for the algorithm in the future – repeating its own mistakes over and over …
Look at your own publications, and complain.
(we’re sorry)
Questions, comments?
Uploaded on Apr 19, 2024

