EDirect for PubMed

 
The Insider’s Guide to Accessing NLM Data
 
Part 1: Getting PubMed Data
 
 
National Library of Medicine
National Institutes of Health
U.S. Department of Health and Human Services
 
Peter Seibert
 
EDirect for PubMed Agenda
 
Part 1: Getting PubMed Data
Part 2: Extracting Data from XML
Part 3: Formatting Results and Unix tools
Part 4: xtract Conditional Arguments
Part 5: Developing and Building Scripts
 
Today's Agenda
 
Brief "Welcome to E-utilities" recap
Intro to Unix
Searching PubMed with esearch
Downloading records with efetch
Basic scripts: creating a data pipeline
 
Keep this theme in mind…
 
Get exactly the data you need
…and only the data you need
…in the format you need.
 
What is an API?
 
API: Application Programming Interface
A set of tools, routines, and protocols for
building software applications.
 
The E-utilities API
 
A set of tools, routines, and protocols that
allows you to interact directly with the data in
20+ NCBI databases, including PubMed, the
MeSH database, and PubMed Central (PMC).
 
The E-utilities API is just a series of
rules for querying a database.
 
URLs as Database Queries
 
Each query is a URL.
The response depends on how you build the URL.
Choose a utility to specify a type of query
Select parameters to provide the details
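
As a rough illustration of how a utility and its parameters combine into a query URL, the shell sketch below assembles an esearch URL by hand. The base address and the db and term parameter names follow the public E-utilities conventions; the variable names are made up for this example, and nothing is sent over the network.

```shell
#!/bin/sh
# Assemble an E-utilities query URL by hand (sketch only; no request is sent).
base="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"   # E-utilities base address
utility="esearch.fcgi"                                  # the utility picks the type of query
db="pubmed"                                             # parameter: which database to search
term="seasonal+affective+disorder"                      # parameter: search terms, spaces written as +

# Utility first, then the parameters after "?" joined by "&".
url="${base}/${utility}?db=${db}&term=${term}"
echo "$url"
```

Changing the utility or any parameter changes the query, which is the bookkeeping that EDirect takes over.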
 
E-utilities in a Programming Environment

Don't have to create each URL by hand
Lets you combine multiple queries in sequence
More options for manipulating output
Faster, easier, more powerful!
 
 
EDirect
 
Developed by NCBI
Set of tools with the E-utilities URL creation rules built in
Can extract the specific data you need from the PubMed XML
Works in a Unix environment
 
What is Unix?
 
An operating system that allows you to interact directly with your computer
Interact via a command-line interface
a.k.a. "shell" or "terminal"
Developed in the 1970s
Looks old-fashioned, but still around for a reason
 
 
Some Unix Philosophy
 
Built to work with files
Modular design
Each program does one thing well
May be many different ways to do the same thing
Combine multiple programs together in scripts
 
Some Unix terms

Commands are instructions given by a user telling a computer to do something.
Arguments provide data to be used as input, or modify the behavior of a command.
Example:
esearch -db pubmed
einfo -dbs
 
Combining Commands Together
 
Introducing "|"
Shift-backslash (above the Enter key)
"Pipes" the output of one command into the next
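
A minimal sketch of a pipe using only standard Unix tools (no EDirect needed): printf writes three numbers, sort orders them, and head keeps the first. The numbers are arbitrary dummy data.

```shell
#!/bin/sh
# Each command reads the previous command's output through "|".
# printf emits one number per line, sort -n orders them numerically,
# and head -n 1 keeps only the first (smallest) line.
smallest=$(printf '%s\n' 30 10 20 | sort -n | head -n 1)
echo "$smallest"   # prints 10
```

The same pattern is what lets the output of one EDirect command feed the next.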
 
Why didn't it work?
 
With Unix, the details matter
Unix can be "uncommunicative"
Have patience, be willing to experiment
 
Some Unix/EDirect Tips
 
Test early and often.
Try each command separately.
Use small sets of dummy data.
Know when to ask for help!
 
Tips for Cygwin users
 
Copy: Ctrl + Insert
Not Ctrl + C!
Paste: Shift + Insert
Not Ctrl + V!
Adjustable in Cygwin options.
 
Tips for all users
 
Ctrl + C = Cancel
Quick way out of a mistake
Up and Down arrows cycle through history
Helpful to edit or re-run recent commands
"clear" clears your screen
Doesn't clear your history!
 
esearch
 
Searches a database and returns the unique identifiers of every record that meets your search criteria
For PubMed, that would be the PMIDs of every PubMed record that matches our query.
 
Basic esearch
 
esearch -db pubmed -query "seasonal affective disorder"
 
View Search Details
 
esearch -db pubmed -query "seasonal affective disorder" -log
 
Search like you do in PubMed
 
Boolean AND/OR/NOT
Field Tags
 
esearch -db pubmed -query "malaria AND jama[journal]"
 
Restricting by Date
 
Use -datetype to specify date field
Use -mindate and -maxdate to specify range
 
esearch -db pubmed -query "malaria AND jama[journal]" \
-datetype PDAT -mindate 2015 -maxdate 2017

Be careful with quotes!
 
Example: cancer AND "science"[journal]
 
Our -query is enclosed in quotes.
If you have to use quotes within your search string, put "\" before them.
Correct: esearch -db pubmed -query "cancer AND \"science\"[journal]"
Incorrect: esearch -db pubmed -query "cancer AND "science"[journal]"
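
To see why the backslashes matter, the sketch below uses printf in place of esearch to show the exact string the shell would hand to a command in each case. The variable names are made up for this example.

```shell
#!/bin/sh
# printf '%s' echoes back exactly what the shell passes as the argument.
escaped=$(printf '%s' "cancer AND \"science\"[journal]")
unescaped=$(printf '%s' "cancer AND "science"[journal]")

echo "$escaped"     # cancer AND "science"[journal]  (inner quotes survive)
echo "$unescaped"   # cancer AND science[journal]    (inner quotes consumed by the shell)
```

Without the backslashes, the shell treats the inner quotes as the end and restart of the quoted string, so they never reach the query.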
 
Exercise 1: esearch
 
How many Spanish-language articles about diabetes are in PubMed?
Hint: consider using the [lang] tag.

Exercise 1 Solution

esearch -db pubmed -query "diabetes AND spanish[lang]"
 
Exercise 2: more esearch
 
How many articles were written by BH Smith between 2012 and 2017, inclusive?

Exercise 2 Solutions

esearch -db pubmed -query "smith bh[Author]" \
-datetype PDAT -mindate 2012 -maxdate 2017

esearch -db pubmed -query "smith bh[Author] \
AND (2012/01/01[pdat] : 2017/12/31[pdat])"

efetch
 
Retrieves full records from PMIDs
Variety of formats
efetch Example
 
efetch -db pubmed -id 25359968 -format abstract
 
efetch Formats
 
MEDLINE: -format medline
XML: -format xml
PMID list: -format uid
Other options on your handout
 
efetch Multiple Records
 
Comma-separate multiple PMIDs
Note: no spaces between PMIDs!
 
efetch -db pubmed -id 24102982,21171099,17150207 \
-format abstract
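
If the PMIDs start out one per line (for example in a text file), one way to build the comma-separated -id value is with the standard paste tool. This is only a sketch: the variable name ids is an assumption, and the final efetch command is printed rather than run.

```shell
#!/bin/sh
# Join PMIDs (one per line) into a single comma-separated string with no spaces.
ids=$(printf '%s\n' 24102982 21171099 17150207 | paste -sd, -)
echo "$ids"

# Print (rather than run) the efetch command that would use the list.
echo "efetch -db pubmed -id $ids -format abstract"
```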
 
Exercise 3: efetch
 
Who is the first author listed on the PubMed record 26287646?

Exercise 3 Solution

efetch -db pubmed -id 26287646 -format abstract

Brennan PF

Creating a data pipeline
 
Pipes the PMIDs retrieved with esearch, and uses them as the -id argument for efetch.
Also pipes the -db, so efetch does not need its own -db argument.

esearch -db pubmed -query "asthenopia[mh] AND \
nursing[sh]" | efetch -format uid
 
Tips for scripting
 
First, check the search to make sure it’s not too big.
Then combine the commands
Small steps, building blocks
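
One way to check the size of a search before fetching is to read the Count value from the summary that esearch prints. The sketch below assumes that summary is a small XML block containing a Count element; the sample text and its numbers are made-up dummy data standing in for real esearch output, and the limit of 1000 is an arbitrary choice.

```shell
#!/bin/sh
# Dummy stand-in for what `esearch -db pubmed -query "..."` might print (assumed shape).
sample='<ENTREZ_DIRECT>
  <Db>pubmed</Db>
  <Count>42</Count>
</ENTREZ_DIRECT>'

# Pull the number out of the <Count> element.
count=$(printf '%s\n' "$sample" | sed -n 's:.*<Count>\([0-9][0-9]*\)</Count>.*:\1:p')

limit=1000   # arbitrary threshold for "too big"
if [ "$count" -le "$limit" ]; then
  verdict="ok to fetch"
else
  verdict="too big, narrow the search"
fi
echo "$count records: $verdict"
```

In practice the sample text would be replaced by the output of the esearch command being tested, before adding the pipe to efetch.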
 
Exercise 4: Combining Commands
 
How do we get a list of PMIDs for all of the articles written by BH Smith between 2012 and 2017?
Hint: You can use the Up Arrow to access your previous commands.
Another Hint: Remember "-format uid"

Exercise 4 Solution

esearch -db pubmed -query "smith bh[Author] AND \
(2012/01/01[pdat] : 2017/12/31[pdat])" | \
efetch -format uid
 
Coming next class…
 
Using xtract to create tables from XML!
 
In the meantime…
 
NCBI Now: Introduction to Linux
https://youtu.be/XgaE4VIaJqI
Insider’s Guide online
https://dataguide.nlm.nih.gov
Sign up for "utilities-announce" mailing list!
Questions?
https://dataguide.nlm.nih.gov/contact
 
Homework
 
Questions?
 

EDirect for PubMed, part of The Insider's Guide to Accessing NLM Data, explains how to use EDirect to access PubMed data efficiently. The series covers getting PubMed data, extracting data from XML, formatting results and Unix tools, xtract conditional arguments, and developing and building scripts.


Uploaded on Feb 15, 2025


