MetaMap and Medical Text Indexer for NLP: Advancements in Biomedical Concept Identification

MetaMap and the Medical Text Indexer (MTI) are natural language processing tools from the NLM Indexing Initiative. MetaMap is a named-entity recognition program that identifies UMLS Metathesaurus concepts in text, with an emphasis on linguistic rigor and thoroughness rather than speed. MTI uses MetaMap and the MeSH vocabulary to summarize text. The MetaMap algorithm parses text with the SPECIALIST minimal commitment parser, the SPECIALIST lexicon, and the MedPost part-of-speech tagger, then generates variants, retrieves candidates from the Metathesaurus, evaluates them, and constructs mappings. Its evaluation function is a weighted average of centrality, variation, coverage, and cohesiveness. The presentation also covers recent word sense disambiguation work, MTI as a first-line indexer, machine learning for CheckTags, and gene indexing.


Uploaded on Oct 08, 2024


Presentation Transcript


  1. NLM Indexing Initiative Tools for NLP: MetaMap and the Medical Text Indexer
     Natural Language Processing: State of the Art, Future Directions
     April 23, 2012
     Alan R. Aronson, U. S. National Library of Medicine

  2. Outline
     - Introduction
     - MetaMap
       - Overview
       - Linguistic roots
       - Recent Word Sense Disambiguation (WSD) efforts
     - The NLM Medical Text Indexer (MTI)
       - Overview
       - MTI as First-line Indexer (MTIFL)
       - Recent improvements
       - Gene indexing

  3. MetaMap/MTI Example
     MetaMap identifies biomedical concepts in text:
       "Cigarette smoking increases the mean platelet volume in elderly patients with risk factors for atherosclerosis."
     Medical Text Indexer (MTI) summarizes text using MetaMap and the Medical Subject Headings (MeSH) vocabulary:
       Cigarette Smoking; Tobacco; Blood Platelets; Aged; Humans; Risk Factors; Arteriosclerosis; Atherosclerosis

  5. MetaMap Overview
     - Named-entity recognition program
     - Identify UMLS Metathesaurus concepts in text
     - Linguistic rigor
     - Flexible partial matching
     - Emphasis on thoroughness rather than speed

  6. The MetaMap Algorithm
     - Parsing: using the SPECIALIST minimal commitment parser, SPECIALIST lexicon, and MedPost part-of-speech tagger
     - Variant generation: using the SPECIALIST lexicon and Lexical Variant Generation (LVG)
     - Candidate retrieval: from the Metathesaurus
     - Candidate evaluation
     - Mapping construction

  7. MetaMap Evaluation Function
     Weighted average of:
     - centrality (is the head involved?)
     - variation (average of all variation)
     - coverage (how much of the text is matched?)
     - cohesiveness (in how many pieces?)
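
A minimal sketch of such a weighted average, for illustration only. The slide does not give the weights or the scale, so the default weights below (coverage and cohesiveness counted twice) and the 0 to 1000 range are assumptions, and the function name is hypothetical rather than part of MetaMap.

```python
def metamap_style_score(centrality, variation, coverage, cohesiveness,
                        weights=(1.0, 1.0, 2.0, 2.0)):
    """Weighted average of four components, each in [0, 1], scaled to 0..1000.

    The default weights are an assumption made for this sketch; the slide
    only states that the score is a weighted average of the four components.
    """
    components = (centrality, variation, coverage, cohesiveness)
    for value in components:
        if not 0.0 <= value <= 1.0:
            raise ValueError("each component must lie in [0, 1]")
    weighted_sum = sum(w * v for w, v in zip(weights, components))
    return round(1000 * weighted_sum / sum(weights))


# A candidate that involves the head with no variation, but covers only
# half of the text in a moderately fragmented match.
example_score = metamap_style_score(1.0, 1.0, 0.5, 0.5)
```

Keeping the weights as a parameter makes it easy to see how strongly each component moves the final score.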

  8. MetaMap Processing Example
     Input: Inferior vena caval stent filter (PMID 3490760)
     Each candidate line shows the MetaMap score (≤ 1000), the Metathesaurus Concept Unique Identifier (CUI), the Metathesaurus string, and the UMLS semantic type in brackets.
     Candidate Concepts:
       909 C0080306: Inferior Vena Cava Filter [medd]
       804 C0180860: Filter [mnob]
       804 C0581406: Filter [medd]
       804 C1522664: Filter [inpr]
       804 C1704449: Filter [cnce]
       804 C1704684: Filter [medd]
       804 C1875155: FILTER [medd]
       717 C0521360: Inferior vena caval [blor]
       673 C0042460: Vena caval [bpoc]
       637 C0038257: Stent [medd]
       637 C1705817: Stent [medd]
       637 C0447122: Vena [bpoc]
     Concept names shown for the "Filter" and "Stent" candidates:
       C0180860: Filters [mnob]
       C0581406: Optical filter [medd]
       C1522664: filter information process [inpr]
       C1704449: Filter (function) [cnce]
       C1704684: Filter Device Component [medd]
       C1875155: Filter - medical device [medd]
       C0038257: Stent, device [medd]
       C1705817: Stent Device Component [medd]
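
As an illustration of how the candidate lines on this slide are structured, here is a small parsing sketch. It assumes the "score CUI: string [type]" layout as displayed on the slide, which is not necessarily MetaMap's actual output format; `Candidate` and `parse_candidate` are hypothetical names.

```python
import re
from typing import NamedTuple


class Candidate(NamedTuple):
    score: int          # MetaMap score
    cui: str            # Metathesaurus Concept Unique Identifier
    name: str           # Metathesaurus string
    semantic_type: str  # abbreviated UMLS semantic type, e.g. "medd"


# Layout assumed from the slide: "<score> <CUI>: <string> [<semantic type>]"
CANDIDATE_RE = re.compile(r"^\s*(\d+)\s+(C\d{7}):\s*(.+?)\s*\[(\w+)\]\s*$")


def parse_candidate(line: str) -> Candidate:
    match = CANDIDATE_RE.match(line)
    if match is None:
        raise ValueError(f"not a candidate line: {line!r}")
    score, cui, name, semantic_type = match.groups()
    return Candidate(int(score), cui, name, semantic_type)


lines = [
    "637 C0038257: Stent [medd]",
    "804 C0180860: Filter [mnob]",
    "909 C0080306: Inferior Vena Cava Filter [medd]",
    "717 C0521360: Inferior vena caval [blor]",
]
# Highest-scoring candidate first.
candidates = sorted((parse_candidate(l) for l in lines), key=lambda c: -c.score)
best = candidates[0]
```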

  9. MetaMap Final Mappings
     Input: Inferior vena caval stent filter
     Final Mappings (subsets of candidate sets):
     Meta Mapping (911):
       909 C0080306: Inferior Vena Cava Filter [medd]
       637 C1705817: Stent [medd]
     Meta Mapping (911):
       909 C0080306: Inferior Vena Cava Filter [medd]
       637 C0038257: Stent [medd]

  10. Word Sense Disambiguation (WSD)
      "Kids with colds may also have a sore throat, cough, headache, mild fever, fatigue, muscle aches, and loss of appetite."
      Candidate MetaMap mappings for "cold":
      - C0234192: Cold (Cold sensation)
      - C0009264: Cold (Cold temperature)
      - C0009443: Cold (Common cold)

  11. Knowledge-based WSD
      - Compare UMLS candidate concept profile vectors to context of ambiguous word
      - Concept profile vectors: words from definition, synonyms and related concepts
      - Candidate concept with highest similarity is predicted

      Common cold              Cold temperature
      Word      Weight         Word          Weight
      infect       265         temperature      258
      disease      126         hypothermia       86
      fever         41         effect            72
      cough         40         hot               48

  12. Knowledge-based WSD
      "Kids with colds may also have a sore throat, cough, headache, mild fever, fatigue, muscle aches, and loss of appetite."

      Common cold              Cold temperature
      Word      Weight         Word          Weight
      infect       265         temperature      258
      disease      126         hypothermia       86
      fever         41         effect            72
      cough         40         hot               48
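
The comparison on this slide can be sketched as follows. The slides say only that the candidate with the highest similarity is predicted; using cosine similarity, plain lowercase tokens without stemming, and only the four profile words shown per concept are assumptions made for this sketch.

```python
import math
import re

# Profile words and weights as shown on the slide (only four per concept).
PROFILES = {
    "C0009443 (Common cold)": {
        "infect": 265, "disease": 126, "fever": 41, "cough": 40},
    "C0009264 (Cold temperature)": {
        "temperature": 258, "hypothermia": 86, "effect": 72, "hot": 48},
}


def context_vector(text):
    """Bag-of-words counts over lowercase alphabetic tokens (no stemming)."""
    counts = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        counts[token] = counts.get(token, 0) + 1
    return counts


def cosine(a, b):
    dot = sum(weight * b.get(word, 0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(text, profiles=PROFILES):
    """Return the candidate concept whose profile is most similar to the text."""
    context = context_vector(text)
    return max(profiles, key=lambda concept: cosine(profiles[concept], context))


sentence = ("Kids with colds may also have a sore throat, cough, headache, "
            "mild fever, fatigue, muscle aches, and loss of appetite.")
prediction = disambiguate(sentence)
```

In this sentence only "fever" and "cough" overlap with a profile, and both belong to the common cold profile, so that sense is selected.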

  13. Automatically Extracted Corpus WSD
      MEDLINE contains numerous examples of ambiguous words in context, though not disambiguated
      For the ambiguous word "cold", a PubMed query is built from unambiguous synonyms of each candidate concept:

      Candidate concept                 Query
      CUI:C0009443 (common cold)        "common cold"[tiab] OR "acute nasopharyngitis"[tiab]
      CUI:C0009264 (cold temperature)   "cold temperature"[tiab] OR "low temperature"[tiab]
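
The query strings on this slide follow a simple pattern that can be sketched as below. The helper name `tiab_query` is hypothetical; the synonym lists are the ones visible in the slide's queries. The retrieval step and the later use of the retrieved citations are not shown on the slide and are not sketched here.

```python
def tiab_query(synonyms):
    """Join phrases into a PubMed title/abstract query: "a"[tiab] OR "b"[tiab]."""
    return " OR ".join(f'"{phrase}"[tiab]' for phrase in synonyms)


# Unambiguous synonyms per candidate concept, taken from the slide.
UNAMBIGUOUS_SYNONYMS = {
    "C0009443": ["common cold", "acute nasopharyngitis"],
    "C0009264": ["cold temperature", "low temperature"],
}

queries = {cui: tiab_query(syns) for cui, syns in UNAMBIGUOUS_SYNONYMS.items()}
```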

  14. WSD Method Results
      Corpus method has better accuracy than UMLS method
      Accuracy values shown for the UMLS and Corpus methods on the NLM WSD and MSH WSD data sets: 0.69, 0.84, 0.65, 0.81
      MSH WSD data set created using MeSH indexing:
      - 203 ambiguous words
      - 81 semantic types
      - 37,888 ambiguity cases
      Indirect evaluation with summarization and MTI correlates with direct evaluation

  16. MEDLINE Citation Example

  17. MTI
      Input: Title + Abstract, processed along two paths:
      - MetaMap Indexing (actually found in text) -> UMLS concepts -> Restrict to MeSH (maps UMLS concepts to MeSH) -> MeSH Main Headings
      - PubMed Related Citations (not necessarily found in text) -> Related Citations -> Extract MeSH Descriptors -> MeSH Main Headings
      Both paths feed Clustering & Ranking -> Ordered list of MeSH Main Headings
      Post-processing: Apply Indexing Rules, CheckTag Expansion, Subheading Attachment -> Final ordered list of MeSH Headings
      Indexer feedback (March 20, 2012): received 2,330 Indexer Feedbacks; incorporated 40% into MTI. Examples:
      - Hibernation should only be indexed for animals, not for "stem cell hibernation"
      - Clove (spice) should not be mapped to the verb "cleave"
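
To make the two-path design concrete, here is a toy sketch of merging MeSH Main Headings from the MetaMap path and the PubMed Related Citations path into one ordered list. The slide names a "Clustering & Ranking" step but does not describe it, so the weighted sum, the weights, and all scores below are hypothetical and do not represent MTI's actual method.

```python
from collections import defaultdict


def rank_headings(metamap_path, related_path, w_metamap=0.7, w_related=0.3):
    """Combine per-path scores with a weighted sum and sort, best first.

    The weights are illustrative assumptions, not MTI parameters.
    """
    combined = defaultdict(float)
    for heading, score in metamap_path.items():
        combined[heading] += w_metamap * score
    for heading, score in related_path.items():
        combined[heading] += w_related * score
    # Sort by descending combined score, breaking ties alphabetically.
    return sorted(combined, key=lambda h: (-combined[h], h))


# Hypothetical scores for headings from the earlier MetaMap/MTI example slide.
metamap_path = {"Tobacco": 0.9, "Blood Platelets": 0.8, "Atherosclerosis": 0.6}
related_path = {"Blood Platelets": 0.7, "Risk Factors": 0.5, "Humans": 0.4}

ordered = rank_headings(metamap_path, related_path)
```

A heading supported by both paths ("Blood Platelets" here) rises above one found by a single path, which is the intuition behind combining evidence found in the text with evidence from related citations.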

  18. MTI Uses
      - Assisted indexing of MEDLINE by Index Section
      - Assisted indexing of Cataloging and History of Medicine Division records
      - Automatic indexing of NLM Gateway meeting abstracts
      - First-line indexing (MTIFL) since February 2011

  19. MTI as First-Line Indexer (MTIFL)
      Normal MTI Processing:
      - MTI Processes/Recommends MeSH
      - Indexer Reviews, Selects
      - Reviser Reviews, Selects, Adjusts, Approves
      - Indexing Displays in PubMed as Usual

  20. MTI as First-Line Indexer (MTIFL)
      MTIFL MTI Processing (23 MEDLINE Journals; 45 MEDLINE Journals):
      - MTI Processes/Indexes MeSH
      - Reviser Reviews, Selects, Adjusts, Approves
      - Indexing Displays in PubMed as Usual
      - Index Section Compares MTI and Reviser Indexing
      Also shown: Indexer Reviews, Selects

  21. CheckTags Machine Learning Results
      200k citations for training and 100k citations for testing

      CheckTag            F1 before ML   F1 with ML   Improvement
      Middle Aged                1.01%       59.50%        +58.49
      Aged                      11.72%       54.67%        +42.95
      Child, Preschool           6.11%       45.40%        +39.29
      Adult                     19.49%       56.84%        +37.35
      Male                      38.47%       71.14%        +32.67
      Aged, 80 and over          1.50%       30.89%        +29.39
      Young Adult                2.83%       31.63%        +28.80
      Female                    46.06%       73.84%        +27.78
      Adolescent                24.75%       42.36%        +17.61
      Humans                    79.98%       91.33%        +11.35
      Infant                    34.39%       44.69%        +10.30
      Swine                     71.04%       74.75%         +3.71
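
The figures on this slide are consistent with reading each Improvement value as "F1 with ML" minus "F1 before ML", in percentage points. The short check below recomputes the differences from the slide's values; the pairing of values with CheckTags follows the order in which they appear on the slide.

```python
# CheckTag: (F1 before ML, F1 with ML), in percent, as read from the slide.
F1_PERCENT = {
    "Middle Aged": (1.01, 59.50),
    "Aged": (11.72, 54.67),
    "Child, Preschool": (6.11, 45.40),
    "Adult": (19.49, 56.84),
    "Male": (38.47, 71.14),
    "Aged, 80 and over": (1.50, 30.89),
    "Young Adult": (2.83, 31.63),
    "Female": (46.06, 73.84),
    "Adolescent": (24.75, 42.36),
    "Humans": (79.98, 91.33),
    "Infant": (34.39, 44.69),
    "Swine": (71.04, 74.75),
}


def improvement(before, after):
    """Difference in percentage points, rounded to two decimals."""
    return round(after - before, 2)


improvements = {tag: improvement(*pair) for tag, pair in F1_PERCENT.items()}
largest_gain = max(improvements, key=improvements.get)
```

The largest gains fall on CheckTags whose F1 before ML was lowest, while tags that already scored well, such as Humans and Swine, gain the least.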

  24. MTI - How are we doing?
      [Chart: Medical Text Indexer (MTI): Precision, Recall, and F1 Stats (2008 - Present). Horizontal axis: 2008 to 2012. Vertical axis: 0.2500 to 0.7500. Series: Precision, Recall, F1, MTIFL F1. Annotations: "Focus on Precision versus Recall" and "Fruition of 2011 Changes".]
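
For reference, precision, recall, and F1 for a set of recommended headings can be computed as sketched below. The slide does not say how its statistics were produced; comparing MTI's recommendations against the headings chosen by a human indexer is an assumption here, and both heading lists are made up for illustration.

```python
def precision_recall_f1(recommended, reference):
    """Set-based precision, recall, and F1 of recommended vs. reference headings."""
    recommended, reference = set(recommended), set(reference)
    true_positives = len(recommended & reference)
    precision = true_positives / len(recommended) if recommended else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: five recommended headings, four reference headings.
mti_headings = ["Tobacco", "Blood Platelets", "Aged", "Humans", "Arteriosclerosis"]
indexer_headings = ["Blood Platelets", "Aged", "Humans", "Risk Factors"]

p, r, f1 = precision_recall_f1(mti_headings, indexer_headings)
```

Because F1 is the harmonic mean of precision and recall, a shift in focus toward precision can raise F1 even if recall drops somewhat.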

  26. The Gene Indexing Assistant (GIA)
      An automated tool to assist the indexer in identifying and creating GeneRIFs
      - Evaluate the article
      - Identify genes
      - Make links to Entrez Gene
      - Suggest GeneRIF annotation
      Anticipated benefits:
      - Increase in speed
      - Increase in comprehensiveness

  27. The NLM Indexing Initiative Team
      - Alan R. Aronson (Project Leader)
      - James G. Mork (Staff)
      - François-Michel Lang (Staff)
      - Willie J. Rogers (Staff)
      - Antonio J. Jimeno-Yepes (Postdoctoral Fellow)
      - J. Caitlin Sticco (Library Associate Fellow)
      http://metamap.nlm.nih.gov
