Insider's Guide to Accessing NLM Data EDirect for PubMed

The Insider’s Guide to Accessing NLM Data
EDirect for PubMed
Part 1: Getting PubMed Data
Peter Seibert
National Library of Medicine
National Institutes of Health
U.S. Department of Health and Human Services
EDirect for PubMed Agenda
Part 1: Getting PubMed Data
Part 2: Extracting Data from XML
Part 3: Formatting Results and Unix tools
Part 4: xtract Conditional Arguments
Part 5: Developing and Building Scripts
2
Today's Agenda
Brief "Welcome to E-utilities" recap
Intro to Unix
Searching PubMed with esearch
Downloading records with efetch
Basic scripts: creating a data pipeline
3
Keep this theme in mind…
Get exactly the data you need
…and only the data you need
…in the format you need.
4
What is an API?
API: Application Programming Interface
A set of tools, routines, and protocols for
building software applications.
5
The E-utilities API
A set of tools, routines, and protocols that
allows you to interact directly with the data in
20+ NCBI databases, including PubMed, the
MeSH database, and PubMed Central (PMC).
6
The E-utilities API is just a series of
rules for querying a database.
7
URLs as Database Queries
Each query is a URL.
The response depends on how you build the
URL.
Choose a utility to specify a type of query
Select parameters to provide the details
8
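As a rough sketch of the idea on this slide, an E-utilities query URL can be assembled from a utility name plus parameters. The base address is NCBI's public E-utilities endpoint, which the slide does not show; the variable names are illustrative only, and the search term is borrowed from a later slide.

```shell
# Base address of the E-utilities API (NCBI's public endpoint; not shown on the slide).
base="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# The utility selects the type of query; esearch runs a search.
utility="esearch.fcgi"

# Parameters supply the details: which database, and what to search for.
# Spaces in the term are written as "+" so the URL stays valid.
db="pubmed"
term="seasonal+affective+disorder"

# Put the pieces together and print the finished URL.
url="${base}/${utility}?db=${db}&term=${term}"
printf '%s\n' "$url"
```

This only prints the URL; nothing is sent to NCBI. Swapping the utility or a parameter changes what the query would return.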
E-utilities in a Programming
Environment
Don't have to create each URL by hand
Lets you combine multiple queries in
sequence
More options for manipulating output
Faster, easier, more powerful!
9
EDirect
Developed by NCBI
Set of tools with the E-utilities URL creation
rules built in
Can extract the specific data you need from
the PubMed XML
Works in a Unix environment
10
What is Unix?
An operating system that allows you to
interact directly with your computer
Interact via a command-line interface
a.k.a. "shell", "terminal"
Developed in the 1970s
Looks old-fashioned, but still around for a reason
11
Some Unix Philosophy
Built to work with files
Modular design
Each program does one thing well
May be many different ways to do the same thing
Combine multiple programs together in scripts
12
Some Unix terms
Commands are instructions given by a user
telling a computer to do something.
Arguments provide data to be used as input,
or modify the behavior of a command.
Example:
esearch -db pubmed
einfo -dbs
13
Combining Commands Together
Introducing "|"
Shift-backslash (above the Enter key)
"Pipes" the output of one command into the
next
14
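A small illustration of piping, using only standard Unix tools and made-up numbers rather than EDirect, so it can be tried without touching PubMed. Each "|" hands the output of the command on its left to the command on its right.

```shell
# Three lines of dummy data, one number per line (invented values).
ids=$(printf '%s\n' 333 111 222)

# printf writes the lines, sort -n orders them numerically,
# and head -n 1 keeps only the first line of the sorted output.
first=$(printf '%s\n' "$ids" | sort -n | head -n 1)
printf '%s\n' "$first"
```

Running each stage on its own first (just the printf, then printf piped to sort) is an easy way to see what each command contributes.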
Why didn't it work?
With Unix, the details matter
Unix can be "uncommunicative"
Have patience, be willing to experiment
15
Some Unix/EDirect Tips
Test early and often.
Try each command separately.
Use small sets of dummy data.
Know when to ask for help!
16
Tips for Cygwin users
Copy: Ctrl + Insert
Not Ctrl + C!
Paste: Shift + Insert
Not Ctrl + V!
Adjustable in Cygwin options.
17
Tips for all users
Ctrl + C = Cancel
Quick way out of a mistake
Up and Down arrows cycle through history
Helpful to edit or re-run recent commands
"clear" clears your screen
Doesn't clear your history!
18
esearch
Searches a database and returns the unique
identifiers of every record that meets your
search criteria
For PubMed, that would be the PMIDs of
every PubMed record that matches our query.
19
Basic esearch
esearch -db pubmed -query "seasonal affective disorder"
20
View Search Details
esearch -db pubmed -query "seasonal affective disorder" -log
21
Search like you do in PubMed
Boolean AND/OR/NOT
Field Tags
esearch -db pubmed -query "malaria AND jama[journal]"
22
Restricting by Date
Use -datetype to specify date field
Use -mindate and -maxdate to specify range
esearch -db pubmed -query "malaria AND jama[journal]" \
-datetype PDAT -mindate 2014 -maxdate 2016
23
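The same kind of date restriction can also be written inside the query string itself, as one of the Exercise 2 solutions later in this deck does. The sketch below only builds and prints such a string; the variable names are illustrative, and it assumes the year range should run from January 1 of the first year through December 31 of the last.

```shell
# Year range from the slide.
min=2014
max=2016

# Write the restriction inside the query, using [pdat] field tags
# with full start and end dates.
query="malaria AND jama[journal] AND (${min}/01/01[pdat] : ${max}/12/31[pdat])"
printf '%s\n' "$query"
```

The printed string could then be passed to esearch as the -query value in place of the -datetype, -mindate and -maxdate arguments.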
Be careful with quotes!
Example: cancer AND "science"[journal]
Our -query is enclosed in quotes.
If you have to use quotes within your search
string, put "\" before them.
esearch -db pubmed -query "cancer AND \"science\"[journal]"
24
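To see why the backslashes matter, the following sketch stores the query both ways and prints what the shell actually keeps. It uses only variables and printf, so no search is run; the variable names are illustrative.

```shell
# Escaped inner quotes: the shell keeps them as part of the string.
escaped="cancer AND \"science\"[journal]"
printf '%s\n' "$escaped"

# Unescaped inner quotes: the shell ends the string at the second quote
# and restarts it at the third, so the quote characters are silently dropped.
unescaped="cancer AND "science"[journal]"
printf '%s\n' "$unescaped"
```

The first line printed still contains the quotes around science; the second does not, which means PubMed would never see them.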
Exercise 1: esearch
 
How many Spanish-language articles about
diabetes are in PubMed?
Hint: consider using the [lang] tag.
25
Exercise 1 Solution
esearch -db pubmed -query "diabetes AND spanish[lang]"
26
Exercise 2: more esearch
How many articles were written by BH Smith
between 2012 and 2017, inclusive?
27
Exercise 2 Solutions
esearch -db pubmed -query "smith bh[Author]" \
-datetype PDAT -mindate 2012 -maxdate 2017
esearch -db pubmed -query "smith bh[Author] \
AND (2012/01/01[pdat] : 2017/12/31[pdat])"
28
efetch
Retrieves full records from PMIDs
Variety of formats
29
efetch Example
efetch -db pubmed -id 25359968 -format abstract
30
efetch Formats
MEDLINE: -format medline
XML: -format xml
PMID list: -format uid
Other options on your handout
31
efetch Multiple Records
efetch -db pubmed -id 24102982,21171099,17150207 \
-format abstract
Comma-separate multiple PMIDs
Note: no spaces between PMIDs!
32
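If the PMIDs start out one per line (for example, in a text file), a standard Unix tool can join them into the comma-separated form that -id expects. This is a minimal sketch using paste, which is not part of EDirect; the PMIDs are the ones from the slide and the variable names are illustrative.

```shell
# PMIDs from the slide, one per line, as they might sit in a text file.
pmids=$(printf '%s\n' 24102982 21171099 17150207)

# paste -s joins all lines into one; -d, puts a comma (and no space) between them.
id_list=$(printf '%s\n' "$pmids" | paste -sd, -)
printf '%s\n' "$id_list"
```

The resulting value has no spaces, so it can be used directly after -id.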
Exercise 3: efetch
Who is the first author listed on the PubMed
record 26287646?
33
Exercise 3 Solution
efetch -db pubmed -id 26287646 -format abstract
Brennan PF
34
Creating a data pipeline
esearch -db pubmed -query "asthenopia[mh] AND \
nursing[sh]" | efetch -format uid
Pipes the PMIDs retrieved with esearch, and
uses them as the -id argument for efetch.
Also pipes the -db value, so efetch does not
need its own -db argument.
35
Tips for scripting
First, check the search to make sure it’s not
too big.
Then combine the commands
Small steps, building blocks
36
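One way to "check the search" before fetching is to look at the record count that esearch reports. The sketch below does not call esearch; it works on a trimmed, made-up stand-in for the kind of XML message esearch prints, and both the Count value and the limit are invented for illustration.

```shell
# Stand-in for esearch output: a trimmed, made-up message for illustration.
# Real output has more fields; the Count value here is invented.
sample='<ENTREZ_DIRECT>
  <Db>pubmed</Db>
  <Count>42</Count>
</ENTREZ_DIRECT>'

# Pull the number out of the <Count> element.
count=$(printf '%s\n' "$sample" | sed -n 's:.*<Count>\([0-9]*\)</Count>.*:\1:p')

# Hypothetical limit; choose whatever size is comfortable to download.
limit=1000
if [ "$count" -le "$limit" ]; then
  verdict="ok to fetch"
else
  verdict="too big, narrow the search"
fi
printf '%s records: %s\n' "$count" "$verdict"
```

In practice the sample text would be replaced by the output of a real esearch command, checked on its own before efetch is added to the pipeline.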
Exercise 4: Combining Commands
How do we get a list of PMIDs for all of the
articles written by BH Smith between 2012
and 2017?
Hint: You can use the Up Arrow to access your
previous commands.
Another Hint: Remember "-format uid"
37
Exercise 4 Solution
esearch -db pubmed -query "smith bh[Author] AND \
(2012/01/01[pdat] : 2017/12/31[pdat])" | \
efetch -format uid
38
Coming next class…
Using xtract to create tables from XML!
39
In the meantime…
NCBI Now: Introduction to Linux
https://youtu.be/XgaE4VIaJqI
Insider’s Guide online
https://dataguide.nlm.nih.gov
Sign up for "utilities-announce" mailing list!
Questions?
https://dataguide.nlm.nih.gov/contact
40
Homework
41
Questions?
42

Learn how to access PubMed data efficiently using EDirect, delve into extracting and formatting data, understand the power of APIs like E-utilities, and discover the ease of using URLs as database queries. Maximize your research capabilities with tips and insights shared in this comprehensive guide.


Uploaded on Oct 07, 2024



