Ultrafast and Memory-Efficient DNA Sequence Alignment Study

 
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
 
Ben Langmead, Cole Trapnell, Mihai Pop, Steven L. Salzberg
 
 
林恩羽  宋曉亞  陳翰平
 
Rationale
 
Existing methods (Maq, SOAP, …): the computational cost of aligning many short reads is large
Time to align 140 billion bases:
Maq: 5 CPU-months
SOAP: 3 CPU-years
Clear need for new tools that consume less time and fewer computational resources
 
Bowtie
 
Ultrafast, memory-efficient
Burrows-Wheeler indexing
25 million reads per CPU hour with a memory footprint of approximately 1.3 GB
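 
To make the idea of Burrows-Wheeler indexing concrete, here is a minimal sketch of exact-match counting by backward search over a Burrows-Wheeler transform. It is a toy illustration, not Bowtie's implementation: the function names `bwt_index` and `count_exact` are hypothetical, the index is built by sorting all rotations (practical only for tiny strings), and it handles exact matches only.

```python
def bwt_index(text):
    """Build a toy index: the BWT string, first-column offsets, and occurrence counts."""
    text += "$"  # sentinel; sorts before A, C, G, T in ASCII order
    # Sorting every rotation is fine for a toy example; real tools avoid this.
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    bwt = "".join(rot[-1] for rot in rotations)
    # first[c] = number of characters in the text that sort strictly before c
    first, total = {}, 0
    for c in sorted(set(bwt)):
        first[c] = total
        total += bwt.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in first}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return bwt, first, occ


def count_exact(pattern, index):
    """Count exact occurrences of pattern by matching it right to left."""
    bwt, first, occ = index
    lo, hi = 0, len(bwt)  # half-open range of matching rows in the sorted rotations
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = bwt_index("GATTACAGATTACA")
print(count_exact("ATTA", index), count_exact("TTT", index))
```

Each pattern character narrows the range of matching rows with two table lookups, so the query time depends on the read length rather than on the size of the reference.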
 
Bowtie
 
Aligns 35-base-pair reads at 25 million reads per hour (35 times faster than Maq, 300 times faster than SOAP)
Small footprint allows Bowtie to run on a PC with 2 GB of RAM
Can use multiple processor cores to achieve greater alignment speed
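 
As a back-of-the-envelope check on what this throughput means for the 140 billion bases mentioned under Rationale, the sketch below converts bases to reads and reads to CPU hours. It assumes, purely for illustration, that all 140 billion bases come as 35-bp reads and that the quoted 25 million reads per CPU hour holds throughout; the function name `estimate_cpu_hours` is hypothetical.

```python
TOTAL_BASES = 140e9         # bases to align (Rationale slide)
READ_LENGTH = 35            # base pairs per read (assumed uniform)
READS_PER_CPU_HOUR = 25e6   # Bowtie throughput quoted on the slide


def estimate_cpu_hours(total_bases, read_length, reads_per_hour):
    """Return (number of reads, CPU hours) for a simple constant-rate model."""
    reads = total_bases / read_length
    return reads, reads / reads_per_hour


reads, cpu_hours = estimate_cpu_hours(TOTAL_BASES, READ_LENGTH, READS_PER_CPU_HOUR)
print(f"{reads:.1e} reads, about {cpu_hours:.0f} CPU hours")
```

Under these assumptions the workload is 4 billion reads and roughly 160 CPU hours, that is, under a week on a single core.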
 
 
Compromise
 
May fail to align a small number of reads that have valid alignments, if those reads have multiple mismatches
Options are available that increase accuracy at some cost to performance
 
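Environment
 
PC: Intel Core 2 2.4 GHz, 2 GB RAM
Server: AMD Opteron (4-core), 32 GB RAM
Operating system: Red Hat Enterprise Linux AS Release 4
 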
Evaluation
 
Wall-clock time and CPU time are measured
Time to build the index is excluded
Pre-computed indexes can be reused:
Human
Chimp
Mouse
Dog
Rat
Arabidopsis thaliana
 
Comparison to SOAP and Maq
 
Reads from the 1,000 Genomes project (NCBI Short Read Archive: SRR001115)
8.84 million reads
Trimmed to 35 bp
Aligned to the human reference genome (build 36.3)
 
Bowtie vs. SOAP
99.7% were aligned by both
0.2% were aligned only by Bowtie
0.1% were aligned only by SOAP
 
Bowtie vs. Maq
96.0% were aligned by both
0.1% were aligned only by Bowtie
3.9% were aligned only by Maq
 
Maq is more flexible: it allows up to 3 mismatches
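 
The percentages above can be reproduced from two sets of aligned read IDs. The sketch below is a hedged illustration: it assumes the denominator is the set of reads aligned by at least one of the two tools, and the function name `overlap_percentages` and the read IDs are made up for the example.

```python
def overlap_percentages(aligned_a, aligned_b):
    """Percentages of reads aligned by both tools, only by A, and only by B.

    Assumption: percentages are taken over reads aligned by at least one tool.
    """
    a, b = set(aligned_a), set(aligned_b)
    union = a | b
    if not union:
        raise ValueError("neither tool aligned any read")

    def pct(subset):
        return 100.0 * len(subset) / len(union)

    return {"both": pct(a & b), "only_a": pct(a - b), "only_b": pct(b - a)}


# Hypothetical read IDs, for illustration only.
tool_a = ["r1", "r2", "r3"]
tool_b = ["r2", "r3", "r4"]
print(overlap_percentages(tool_a, tool_b))
```

The three values always sum to 100, which matches the way each comparison on the slide adds up (99.7 + 0.2 + 0.1 and 96.0 + 0.1 + 3.9).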
 
Maq with filtered read set
 
‘poly-A’ artifacts
‘catfilter’ → 438,145 reads removed
 
Read length and performance
 
Maximum read length supported:
Bowtie – 1024 bp
Maq – 127 bp (v0.7.0)
SOAP – 60 bp
Align the 36-bp, 50-bp, and 76-bp sets to the human genome
36-bp set (SRR003084)
50-bp set (SRR003092)
76-bp set (SRR003196)
 
Parameters
 
Bowtie
‘-v 2’ → allows 2 mismatches [SOAP]
‘--maxns 5’ → filters out reads with 5 or more no-confidence bases [SOAP]
‘-z’ → only the forward index is resident in memory at one time [Maq]
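 
What the first two options mean for a single read can be sketched in a few lines. This is an illustration of the stated policies, not Bowtie's code: the function names are hypothetical, a no-confidence base is assumed to be written as `N`, and the read is compared against a reference window of the same length with no insertions or deletions.

```python
def too_many_ns(read, max_ns=5):
    """True if the read has max_ns or more 'N' bases and would be filtered out."""
    return read.upper().count("N") >= max_ns


def mismatches(read, ref_window):
    """Count positions where the read differs from an equal-length reference window."""
    if len(read) != len(ref_window):
        raise ValueError("read and reference window must have the same length")
    return sum(1 for r, g in zip(read, ref_window) if r != g)


def accept(read, ref_window, max_mismatches=2, max_ns=5):
    """Apply the N filter first, then the mismatch limit."""
    if too_many_ns(read, max_ns):
        return False
    return mismatches(read, ref_window) <= max_mismatches


print(accept("ACGTACGA", "ACGTACGT"))  # one mismatch
print(accept("TCGTTCGA", "ACGTACGT"))  # three mismatches
```

With the defaults shown, a read with up to two mismatches is accepted and a read with three is rejected, mirroring the 2-mismatch limit described above.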
 
Parallel performance
 
Bowtie allows the user to specify a desired number of threads
 
The memory image of the index is shared by all threads
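 
The sharing pattern can be sketched as several worker threads reading one index object that is built once and never copied. This is only an illustration of the design: the k-mer dictionary stands in for the real Burrows-Wheeler index, the names `build_index` and `align_all` are hypothetical, and Python threads are used here to show the shared-memory layout rather than to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def build_index(reference, k):
    """Toy stand-in index: map every k-mer to its start positions."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def align_all(reads, index, threads=2):
    """Look up each read using a pool of threads that share one read-only index."""
    def lookup(read):
        # All threads read the same dict; nothing is copied or modified.
        return read, index.get(read, [])

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lookup, reads))


index = build_index("GATTACAGATTACA", k=4)
print(align_all(["ATTA", "TTTT"], index, threads=2))
```

Because the index is only read during alignment, adding threads does not multiply the memory needed for it.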
 
Parallel performance
 
Index building
 
Bowtie uses a flexible indexing algorithm that can be configured to trade off between memory usage and running time
 
Discussion
 
Unlike SOAP, Bowtie's 1.3 GB memory footprint allows it to run on a typical PC with 2 GB of RAM
 
Unlike many other short-read aligners, Bowtie creates a permanent index of the reference that may be reused across alignment runs
 
Discussion
 
Bowtie's speed and small memory footprint are due chiefly to its use of the Burrows-Wheeler index
 
Does not yet support paired-end alignment or alignments with insertions or deletions

"This study discusses the development of Bowtie, a tool for aligning short DNA sequences to the human genome. Bowtie offers ultrafast and memory-efficient alignment, outperforming previous methods like Maq and SOAP. The tool achieves high alignment speeds with a small memory footprint, making it suitable for various computational environments."

  • DNA sequencing
  • Genome alignment
  • Bioinformatics
  • Computational efficiency
  • Bowtie

Uploaded on Sep 07, 2024 | 0 Views




