GenPIP: In-Memory Acceleration of Genome Analysis

Haiyu Mao, Mohammed Alser, Mohammad Sadrosadati, Can Firtina, Akanksha Baranwal, Damla Senol Cali, Aditya Manglik, Nour Almadhoun Alserr, Onur Mutlu
GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping
 
Overview: Genome Analysis
Genome analysis: Enables us to determine the order of the DNA sequence in an organism's genome
o Plays an important role in personalized medicine, outbreak tracing, and understanding of evolution
Modern genome sequencing machines extract smaller randomized fragments of the original DNA sequence, known as reads
o Oxford Nanopore Technologies (ONT): A widely-used sequencing technology with portable sequencing devices, high throughput, and low cost
[Figure: ONT sequencing device, forbes.com]
Overview: Two Limitations
Multiple steps in genome analysis
o Large data movement between multiple steps
o A lot of wasted computation done on data that is later discovered to be useless
Overview: GenPIP
GenPIP: A fast and energy-efficient in-memory acceleration system for the Genome analysis PIPeline via tight integration of genome analysis steps
GenPIP has two key techniques
o Chunk-based pipeline (CP): Provides fine-grained collaboration of genome analysis steps
o Early rejection (ER): Stops the execution on useless data in a timely manner by predicting which reads will not be useful
GenPIP outperforms state-of-the-art software & hardware solutions using CPU, GPU, and optimistic PIM by 41.6×, 8.4×, and 1.4×, respectively.
Outline
Background and Motivation
GenPIP: Tight Integration of Genome Analysis Steps
o Chunk-based Pipeline (CP)
o Early Rejection (ER)
GenPIP Implementation
Evaluation
Conclusion
 
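Genome Analysis Pipeline
[Figure: genome analysis pipeline. 1. Basecalling produces reads; 2. Read Quality Control separates high-quality from low-quality reads; 3. Read Mapping separates mapped from unmapped reads.]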
Limitation 1: Large Data Movement
Using a human dataset in [NC'19] as an example:
Raw signals (3913 GB) → Basecalling → Reads (546 GB) → Read quality control → High-quality reads (437 GB) → Read mapping → Mapped reads (382 GB)
Large data movement between genome analysis steps
[NC'19] Rory Bowden, Robert W Davies, Andreas Heger, Alistair T Pagnamenta, Mariateresa de Cesare, Laura E Oikkonen, Duncan Parkes, Colin Freeman, Fatima Dhalla, Smita Y Patel, et al. Sequencing of human genomes with nanopore technology. Nature Communications, 2019.
Limitation 2: Wasted Computation
Using a human dataset in [NC'19] as an example:
Reads after basecalling: 100% → High-quality reads after read quality control: 79.5% (low-quality reads: 20.5%) → Mapped reads after read mapping: 69.5% (unmapped reads: 10%)
A considerable amount of computation on useless data due to
o Low-quality reads
o Unmapped reads
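The figures on the two limitation slides can be combined into a quick back-of-the-envelope calculation. The following is a minimal sketch that uses only the numbers quoted on the slides; the variable names are illustrative, and the assignment of the four data sizes to pipeline stages (raw signals, reads, high-quality reads, mapped reads, in that order) is an assumption based on the order in which they appear.

```python
# Figures quoted on the slides for the human dataset in [NC'19].
# Assumption: the four sizes correspond, in pipeline order, to raw signals,
# basecalled reads, high-quality reads, and mapped reads.
sizes_gb = {"raw_signals": 3913, "reads": 546, "high_quality": 437, "mapped": 382}

low_quality_pct = 20.5  # reads discarded by read quality control
unmapped_pct = 10.0     # reads that pass quality control but do not map

# Share of basecalled reads whose computation is later found to be useless.
useless_pct = low_quality_pct + unmapped_pct
# Share of basecalled reads that end up in the final mapping result.
useful_pct = 100.0 - useless_pct
# Total volume of data produced or consumed along the pipeline.
total_gb = sum(sizes_gb.values())

print(f"useless reads: {useless_pct:.1f}%")
print(f"useful reads:  {useful_pct:.1f}%")
print(f"total data handled across stages: {total_gb} GB")
```

Under these assumptions, roughly three in ten basecalled reads never contribute to the final result, which is the waste that early rejection targets.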
State-of-the-art Works
NVM-based PIM is an efficient technique to reduce data movement by processing data using or near memory
Prior NVM-based PIM accelerators (Helix [PACT'20], PARC [ASP-DAC'20]):
o Reduce the data movement in a single genome analysis step
o Exacerbate the data movement overhead between analysis steps
No prior work tackles data movement between analysis steps and reduces useless computation
Goal and Opportunities
Goal: Efficiently accelerate the entire genome analysis pipeline while minimizing data movement and useless computation
We perform a study to quantify potential performance benefits
o Results are normalized to the performance of GPU
GenPIP
First holistic in-memory accelerator for the genome analysis pipeline, including basecalling, read quality control, and read mapping steps
GenPIP has two key techniques
o Chunk-based Pipeline (CP): Enables fine-grained pipelining of genome analysis steps; processes reads at chunk granularity (i.e., a subsequence of a read; e.g., 300 bases)
o Early Rejection (ER)
Chunk-based Pipeline (CP)
CP increases parallelism by overlapping the execution of different steps at chunk granularity
CP reduces intermediate data by computing on data as soon as it is generated
CP provides opportunities for ER by analyzing a read at chunk granularity
[Figure: timeline for a read that consists of four chunks (C1, C2, C3, C4). Conventional pipeline: basecalling of C1 to C4, read assembly, quality control (QC) of the read, then read mapping (RM) of the read. Chunk-based pipeline: QC and RM of each chunk overlap with the basecalling of later chunks, which saves cycles.]
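The overlap that CP exploits can be illustrated with a toy timing model. This is a sketch under stated assumptions, not the GenPIP design: the per-chunk latencies are hypothetical, every step is assumed to cost the same for each chunk, and whole-read work in read mapping (such as final alignment) is ignored.

```python
def conventional_time(n_chunks, t_basecall, t_qc, t_rm):
    """Each step runs on the whole read and starts only when the previous one ends."""
    return n_chunks * t_basecall + n_chunks * t_qc + n_chunks * t_rm


def chunk_pipeline_time(n_chunks, t_basecall, t_qc, t_rm):
    """QC and RM of a chunk start as soon as that chunk leaves the previous step."""
    basecall_done = qc_done = rm_done = 0
    for _ in range(n_chunks):
        basecall_done += t_basecall
        # A step handles a chunk once the chunk is ready and the step is free.
        qc_done = max(qc_done, basecall_done) + t_qc
        rm_done = max(rm_done, qc_done) + t_rm
    return rm_done


# Hypothetical latencies (arbitrary time units) for a read of four chunks.
baseline = conventional_time(4, t_basecall=4, t_qc=1, t_rm=2)
pipelined = chunk_pipeline_time(4, t_basecall=4, t_qc=1, t_rm=2)
print(f"conventional: {baseline} units, chunk-based: {pipelined} units")
```

In this model the pipelined finish time approaches the time of the slowest step alone, so the benefit grows with the number of chunks per read.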
GenPIP
First holistic in-memory accelerator for the genome analysis pipeline, including basecalling, read quality control, and read mapping steps
GenPIP has two key techniques
o Chunk-based Pipeline (CP): Enables fine-grained collaboration of genome analysis steps by processing reads at chunk granularity (i.e., a subsequence of a read, e.g., 300 bases)
o Early Rejection (ER): Stops the execution on useless reads as early as possible by using a small number of chunks to predict the usefulness of a read
 
 
Early Rejection (ER)
Predict and eliminate low-quality and unmapped reads from the genome analysis pipeline as early as possible
[Figure: ER flow. Basecall a small number of chunks and check the average quality of these chunks; on fail, stop the analysis. On pass, basecall more chunks, map the chunks basecalled so far, and check the mapping score; on fail, stop the analysis. On pass, basecall the remaining chunks and execute the remaining computation in read mapping.]
Early rejection based on chunk quality scores (ER-QSR)
o Predict low-quality reads using chunk quality scores
Early rejection based on chunk mapping scores (ER-CMR)
o Predict unmapped reads using chunk mapping scores
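The two-stage decision described on this slide can be sketched as a small control-flow function. Everything here is illustrative: `analyze_read`, its parameters, and the returned strings are hypothetical names, the two callbacks stand in for the basecaller and the read mapper, and the default threshold of 7 is taken from the backup slide on chunk quality scores.

```python
from statistics import mean


def analyze_read(chunk_quality, chunks_map, quality_sample, mapping_sample,
                 quality_threshold=7.0):
    """Decide whether to keep analyzing a read, following the ER flow.

    chunk_quality(i)    -> quality score of chunk i (hypothetical callback)
    chunks_map(indices) -> True if the merged chunks map (hypothetical callback)
    """
    # ER-QSR: basecall a few chunks and check their average quality.
    average_quality = mean(chunk_quality(i) for i in quality_sample)
    if average_quality < quality_threshold:
        return "stopped: predicted low-quality"
    # ER-CMR: basecall more chunks and check whether they map.
    if not chunks_map(mapping_sample):
        return "stopped: predicted unmapped"
    # Both checks passed: the rest of the read is worth processing.
    return "continue: basecall and map the remaining chunks"


qualities = [9.0, 5.0, 4.0, 10.0]
print(analyze_read(qualities.__getitem__, lambda idx: True, [1, 2], [0, 1]))
```

The quality check comes first because it needs only basecalling output, so a rejected read never reaches the more expensive mapping check.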
 
 
 
Implementation of CP and ER
CP and ER can be applied on different systems, e.g., CPU, GPU, and PIM
We implement CP and ER using PIM since PIM reduces the data movement between genome analysis steps more efficiently
We also apply CP and ER on CPU and GPU baselines and observe speedup and energy savings
GenPIP Implementation
[Figure: GenPIP block diagram, showing eDRAM, the in-memory basecaller [Helix, PACT'20], in-memory read mapping ([PARC, ASP-DAC'20] + our design), and PIM-CQS.]
PIM-CQS: PIM chunk quality score calculation
https://arxiv.org/pdf/2209.08600.pdf
GenPIP Implementation
[Figure: GenPIP block diagram, showing eDRAM, in-memory read mapping (our design + [PARC, ASP-DAC'20]), the in-memory basecaller [Helix, PACT'20], and PIM-CQS (PIM chunk quality score calculation).]
Tightly integrating the genome analysis steps
o Reduces data movement
o Eliminates useless computation
Evaluation Methodology
Performance, Area and Power Analysis:
o Simulation via Verilog HDL, NVSim [TCAD'12], and CACTI 6.5 [MICRO'07]
o See the methodology in the paper for more details
Baselines:
o CPU (Intel Xeon Gold 5118 CPU)
o GPU (NVIDIA GeForce RTX 2080 Ti GPU)
o Optimistic integration of two PIM accelerators (Helix [PACT'20] and PARC [ASP-DAC'20]): assumes no data movement between steps and assumes intermediate data causes no overhead
Datasets:
o E. coli (http://lab.loman.net/2016/07/30/nanopore-r9-data-release/)
o Human (https://www.ebi.ac.uk/ena/browser/view/PRJEB30620)
Key Results: Performance
GenPIP provides 41.6×, 8.4×, and 1.4× speedup over CPU, GPU, and optimistic PIM, respectively
Both CP and ER are critical to the speedup
Key Results: Energy Efficiency
GenPIP provides 32.8×, 20.8×, and 1.37× energy savings over CPU, GPU, and optimistic PIM, respectively
ER is especially critical to the energy efficiency
More in the Paper
Details of CP and ER
Detailed GenPIP architecture and implementations
o GenPIP controller
o Timely early rejection implementation
o In-memory seeding accelerator
More comparison points
Sensitivity analysis of the chunks used for ER
Area and power analysis
https://arxiv.org/pdf/2209.08600.pdf
More in the Paper
Details of CP and ER
Detailed GenPIP implementation
o GenPIP controller
o Early rejection implementation
o In-memory seeding accelerator
Results of applying CP and ER in CPU and GPU
Sensitivity analysis on the number of sampled chunks used for ER
Area and power analysis
Conclusion
Problem: The genome analysis pipeline has large data movement between genome analysis steps and a significant amount of wasted computation on useless data
Goal: Tightly integrate genome analysis steps to reduce the data movement between steps and eliminate computation on useless data
GenPIP: The first in-memory genome analysis accelerator that tightly integrates genome analysis steps
GenPIP has two key techniques
o A chunk-based pipeline
o A new early-rejection technique
GenPIP outperforms state-of-the-art software & hardware solutions using CPU, GPU, and optimistic PIM by 41.6×, 8.4×, and 1.4×, respectively.
 
Backup Slides
Genome Sequencing
o Sequencing Technologies
o Current State of Sequencing
Genome Analysis Pipeline
o Basecalling
o Read Quality Control
o Read Mapping
Early Rejection
o ER based on Quality Scores
o ER based on Chunk Mapping
GenPIP Architecture
o In-memory seeding
Number of chunks for ER
Genome Sequencing
Sample Collection → Preparation → Sequencing → Genome Sequence Analysis
Large DNA molecule → Chopped DNA fragments → Sequenced reads
Sequencing Technologies
Short reads (e.g., Illumina): a few hundred base pairs and error rate of 0.1%
Long reads (e.g., Oxford Nanopore (ONT), PacBio): thousands to millions of base pairs and error rate of 5–10%
Current State of Sequencing
*From NIH (https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data)
Computation is a bottleneck!
Basecalling
Input: Raw signal chunks
Process: Translate raw signals to bases (i.e., A, C, G, T) and calculate each base quality
Output: Assemble chunks into a long read
[Figure: raw signals are split into input raw signal chunks; for each position the basecaller outputs base probabilities (e.g., A 0.1, C 0.6, G 0.2, T 0.1); the chunks are assembled into long reads.]
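The per-base probabilities shown in the figure (A 0.1, C 0.6, G 0.2, T 0.1) suggest how a base and its quality can be derived at one position. The sketch below is a simplification and an assumption, not the basecaller used in GenPIP: it picks the most probable base and converts the remaining probability mass into a Phred-scaled quality, Q = -10 · log10(P_error). The function name and the cap of 60 are hypothetical.

```python
import math

BASES = "ACGT"


def decode_position(probabilities, max_quality=60.0):
    """Return (base, quality) for one position from four base probabilities."""
    best = max(range(len(BASES)), key=lambda i: probabilities[i])
    # Assumption: the error probability is the mass not assigned to the best base.
    p_error = 1.0 - probabilities[best]
    if p_error <= 0.0:
        quality = max_quality  # avoid log10(0) for a fully confident call
    else:
        quality = min(max_quality, -10.0 * math.log10(p_error))
    return BASES[best], quality


base, quality = decode_position([0.1, 0.6, 0.2, 0.1])
print(base, round(quality, 2))
```

Averaging such per-base qualities over a chunk or a read gives the chunk and read quality scores that the later steps rely on.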
Read Quality Control
Input: Base quality scores of a read from the basecalling step
Process: Calculate the average of all base quality scores in a read as the read quality score, and compare the read quality score to the threshold to decide whether this read is low-quality or high-quality
Output: High-quality reads (discard low-quality reads)
[Figure: calculate the average, then compare the read quality score with the threshold; higher: high-quality; lower: low-quality (discard).]
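The quality-control rule above amounts to an average and a comparison. A minimal sketch follows, with hypothetical function names; the threshold of 7 is the value shown on the backup slide on chunk quality scores, and treating a score exactly equal to the threshold as high-quality is an assumption.

```python
from statistics import mean


def read_quality_score(base_qualities):
    """Average of all base quality scores in a read."""
    return mean(base_qualities)


def filter_high_quality(reads, threshold=7.0):
    """Keep reads whose quality score reaches the threshold; discard the rest.

    `reads` maps a read name to its list of base quality scores.
    """
    return {
        name: qualities
        for name, qualities in reads.items()
        if read_quality_score(qualities) >= threshold
    }


reads = {"read_a": [8, 9, 7, 10], "read_b": [5, 6, 7, 4]}
print(sorted(filter_high_quality(reads)))
```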
Read Mapping
Input: High-quality reads that pass the read quality control step
Process:
o Use subsequences of a read to query the hash table to get possible match locations
o Identify the candidate regions and output the chaining score
o Execute the alignment step if there is a chain
Output: Mapping information
[Figure: high-quality long reads; possible substrings of a read; (c) chaining; (d) alignment using dynamic programming (DP).]
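The first step of read mapping, querying a hash table with subsequences of a read, can be sketched with a plain k-mer index. This is an illustrative simplification: real read mappers typically index a subset of k-mers rather than all of them, and the function names, the toy reference, and the value of k are all hypothetical.

```python
from collections import defaultdict


def build_index(reference, k):
    """Hash table from every k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for position in range(len(reference) - k + 1):
        index[reference[position:position + k]].append(position)
    return index


def seed_hits(read, index, k):
    """Possible match locations as (offset in read, position in reference) pairs."""
    hits = []
    for offset in range(len(read) - k + 1):
        for ref_position in index.get(read[offset:offset + k], []):
            hits.append((offset, ref_position))
    return hits


index = build_index("ACGTACGTGA", k=4)
print(seed_hits("TACGT", index, k=4))
```

Hits that line up on the same diagonal (similar difference between reference position and read offset) are the ones a chaining step would group into candidate regions.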
ER based on Chunk Quality Scores
Goal: Accurately estimate the quality of the entire read by checking the quality of a small number of sampled chunks
Observation 1: The range of quality scores for the chunks extracted from high-quality reads is much higher than that from low-quality reads
Observation 2: A single chunk's quality score is not enough to predict the read quality score because there are many chunks whose quality scores are larger than 7
Observation 3: Consecutive chunks' quality scores are usually close to each other, indicating that sampling consecutive chunks may not be representative enough to estimate the quality of an entire read
[Figure: chunk quality scores of a high-quality read and a low-quality read; threshold: 7.]
Sample a small number of non-consecutive chunks evenly in a read to predict the read quality
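Sampling chunks evenly across a read can be sketched as picking equally spaced chunk indices and averaging their scores. The helper names are hypothetical, the spacing rule is one possible choice rather than the one used in GenPIP, and the threshold of 7 is the value shown on this slide.

```python
def evenly_spaced_chunks(n_chunks, n_samples):
    """Indices of n_samples chunks spread as evenly as possible over the read."""
    if n_samples >= n_chunks:
        return list(range(n_chunks))
    if n_samples == 1:
        return [n_chunks // 2]
    # Integer arithmetic keeps the first and last chunk and spaces the rest.
    return [(i * (n_chunks - 1)) // (n_samples - 1) for i in range(n_samples)]


def predict_high_quality(chunk_scores, n_samples, threshold=7.0):
    """Predict read quality from the average score of the sampled chunks."""
    sampled = evenly_spaced_chunks(len(chunk_scores), n_samples)
    average = sum(chunk_scores[i] for i in sampled) / len(sampled)
    return average >= threshold


print(evenly_spaced_chunks(20, 5))
```

Spreading the samples out follows Observation 3: neighbouring chunks tend to have similar scores, so distant chunks carry more independent information about the read.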
ER based on Chunk Mapping
Key insight of ER based on chunk mapping: A read probably cannot be mapped to the reference genome if enough consecutive chunks in this read cannot be mapped to the reference genome
1. Sample a small number of consecutive chunks in a read
2. Merge these small consecutive chunks into a big chunk
3. Map this big chunk to the reference genome to predict whether the read can be mapped or not
Mapping a small chunk provides too many possible mapping locations
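The three steps above can be sketched as follows. This is a toy stand-in, not the GenPIP mapping score: it merges consecutive chunks and counts how many k-mers of the merged chunk occur in the reference, whereas the slides describe a chunk mapping score produced by the read mapper. The function names, the toy reference, k, and `min_hits` are all hypothetical.

```python
def kmers(sequence, k):
    """All k-mers of a sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def predict_mappable(chunks, start, count, reference_kmers, k, min_hits):
    """Merge `count` consecutive chunks and test whether the big chunk maps."""
    # Steps 1 and 2: take consecutive chunks and merge them into one big chunk.
    big_chunk = "".join(chunks[start:start + count])
    # Step 3 (simplified): count seed hits of the big chunk in the reference.
    hits = sum(1 for kmer in kmers(big_chunk, k) if kmer in reference_kmers)
    return hits >= min_hits


K = 5
reference_kmers = set(kmers("ACGTACGTGACCTTAG", K))
print(predict_mappable(["ACGTA", "CGTGA", "CCTTA"], 0, 2, reference_kmers, K, 3))
```

Merging matters because seeds that span chunk boundaries only exist in the big chunk, and a longer query narrows down the set of plausible locations.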
Architecture of GenPIP
[Figure: GenPIP architecture, showing the GenPIP controller, global buffer, Basecalling Module, Read Mapping Module, PIM basecaller, quality score calculation for each base, PIM-CQS, average quality score calculator, chunk buffer, read mapping controller, in-memory seeding, DP (dynamic programming) units, read queue, and buffer.]
PIM-CQS: PIM chunk quality score calculation
In-memory Seeding
[Figure: in-memory seeding, which sends a list of possible locations to the read mapping controller.]
 
Key Results: Sensitivity Analysis
Early rejection based on chunk quality scores uses two and five sampled chunks for the E. coli and human datasets, respectively.
Early rejection based on chunk mapping uses five and three sampled chunks for the E. coli and human datasets, respectively.
[Figure panels: early rejection based on the chunk quality scores; early rejection based on the chunk mapping.]
Slide Note

Good Morning, everyone, I am Haiyu Mao, a postdoc researcher in SAFARI research group at ETH. Today, I am gonna present our work, GenPIP.


GenPIP is an in-memory system that accelerates genome analysis through tight integration of basecalling and read mapping. Using a chunk-based pipeline and early rejection, GenPIP reduces wasted computation and data movement. It outperforms state-of-the-art solutions that use CPU, GPU, and optimistic PIM by 41.6×, 8.4×, and 1.4×, respectively, and is also more energy-efficient. Faster genome analysis benefits applications such as personalized medicine, outbreak tracing, and evolutionary studies.

  • GenPIP
  • In-Memory Acceleration
  • Genome Analysis
  • Basecalling
  • Read Mapping

Uploaded on Sep 26, 2024 | 0 Views



Presentation Transcript


  1. GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping. Haiyu Mao, Mohammed Alser, Mohammad Sadrosadati, Can Firtina, Akanksha Baranwal, Damla Senol Cali, Aditya Manglik, Nour Almadhoun Alserr, Onur Mutlu

  2. Overview: Genome Analysis. Genome analysis: Enables us to determine the order of the DNA sequence in an organism's genome o Plays an important role in personalized medicine, outbreak tracing, and understanding of evolution. Modern genome sequencing machines extract smaller randomized fragments of the original DNA sequence, known as reads o Oxford Nanopore Technologies (ONT): A widely-used sequencing technology with portable sequencing devices, high throughput, and low cost. [Figure: ONT sequencing device, forbes.com]

  3. Overview: Two Limitations. Multiple steps in genome analysis o Large data movement between multiple steps o A lot of wasted computation done on data that is later discovered to be useless

  4. Overview: GenPIP. GenPIP: A fast and energy-efficient in-memory acceleration system for the Genome analysis PIPeline via tight integration of genome analysis steps. GenPIP has two key techniques o Chunk-based pipeline (CP): Provides fine-grained collaboration of genome analysis steps o Early rejection (ER): Stops the execution on useless data in a timely manner by predicting which reads will not be useful. GenPIP outperforms state-of-the-art software & hardware solutions using CPU, GPU, and optimistic PIM by 41.6×, 8.4×, and 1.4×, respectively.

  5. Outline. Background and Motivation. GenPIP: Tight Integration of Genome Analysis Steps o Chunk-based Pipeline (CP) o Early Rejection (ER). GenPIP Implementation. Evaluation. Conclusion

  6. Genome Analysis Pipeline. [Figure: genome sequencing produces raw signals, which go through 1. Basecalling (producing reads), 2. Read Quality Control (separating low-quality from high-quality reads), and 3. Read Mapping against the reference genome (separating mapped from unmapped reads); data is written to and read from storage between the compute steps, and the mapping results are stored.]

  7. Limitation 1: Large Data Movement. Using a human dataset in [NC'19] as an example: Raw signals (3913 GB) → Basecalling → Reads (546 GB) → Read quality control → High-quality reads (437 GB) → Read mapping → Mapped reads (382 GB). Large data movement between genome analysis steps. [NC'19] Rory Bowden, Robert W Davies, Andreas Heger, Alistair T Pagnamenta, Mariateresa de Cesare, Laura E Oikkonen, Duncan Parkes, Colin Freeman, Fatima Dhalla, Smita Y Patel, et al. Sequencing of human genomes with nanopore technology. Nature Communications, 2019.

  8. Limitation 2: Wasted Computation. Using a human dataset in [NC'19] as an example: Reads after basecalling: 100% → High-quality reads after read quality control: 79.5% (low-quality reads: 20.5%) → Mapped reads after read mapping: 69.5% (unmapped reads: 10%). A considerable amount of computation on useless data due to o Low-quality reads o Unmapped reads. [NC'19] Rory Bowden, Robert W Davies, Andreas Heger, Alistair T Pagnamenta, Mariateresa de Cesare, Laura E Oikkonen, Duncan Parkes, Colin Freeman, Fatima Dhalla, Smita Y Patel, et al. Sequencing of human genomes with nanopore technology. Nature Communications, 2019.

  9. State-of-the-art Works. NVM-based PIM is an efficient technique to reduce data movement by processing data using or near memory. NVM-based PIM for dot-product operation [Helix, PACT'20]; NVM-based PIM for search and addition [PARC, ASP-DAC'20] o Reduce the data movement in a single genome analysis step o Exacerbate the data movement overhead between analysis steps. No prior work tackles data movement between analysis steps and reduces useless computation

  10. Goal and Opportunities. Goal: Efficiently accelerate the entire genome analysis pipeline while minimizing data movement and useless computation. We perform a study to quantify potential performance benefits o Results are normalized to the performance of GPU. [Figure: normalized speedup of 2.7x for NVM-based PIM accelerators for separate basecalling and read mapping, 6.1x with no data movement between the accelerators of analysis steps, and 9x with no data movement and no useless reads (ideal case).]

  12. GenPIP. First holistic in-memory accelerator for the genome analysis pipeline, including basecalling, read quality control, and read mapping steps. GenPIP has two key techniques o Chunk-based Pipeline (CP): Enables fine-grained pipelining of genome analysis steps; processes reads at chunk granularity (i.e., a subsequence of a read; e.g., 300 bases) o Early Rejection (ER)

  13. Chunk-based Pipeline (CP). CP increases parallelism by overlapping the execution of different steps at chunk granularity. CP reduces intermediate data by computing on data as soon as it is generated. CP provides opportunities for ER by analyzing a read at chunk granularity. [Figure: timeline for a read that consists of four chunks (C1, C2, C3, C4). Conventional pipeline: basecalling of C1 to C4, read assembly, quality control (QC) of the read, then read mapping (RM) of the read. Chunk-based pipeline: QC and RM of each chunk overlap with the basecalling of later chunks, which saves cycles.]

  14. GenPIP. First holistic in-memory accelerator for the genome analysis pipeline, including basecalling, read quality control, and read mapping steps. GenPIP has two key techniques o Chunk-based Pipeline (CP): Enables fine-grained collaboration of genome analysis steps by processing reads at chunk granularity (i.e., a subsequence of a read, e.g., 300 bases) o Early Rejection (ER): Stops the execution on useless reads as early as possible by using a small number of chunks to predict the usefulness of a read

  15. Early Rejection (ER). Predict and eliminate low-quality and unmapped reads from the genome analysis pipeline as early as possible. [Figure: ER flow. Basecall a small number of chunks and check the average quality of these chunks; on fail, stop the analysis. On pass, basecall more chunks, map the chunks basecalled so far, and check the mapping score; on fail, stop the analysis. On pass, basecall the remaining chunks and execute the remaining computation in read mapping.] Early rejection based on chunk quality scores (ER-QSR) o Predict low-quality reads using chunk quality scores. Early rejection based on chunk mapping scores (ER-CMR) o Predict unmapped reads using chunk mapping scores

  16. Implementation of CP and ER. CP and ER can be applied on different systems, e.g., CPU, GPU, and PIM. We implement CP and ER using PIM since PIM reduces the data movement between genome analysis steps more efficiently. We also apply CP and ER on CPU and GPU baselines and observe speedup and energy savings

  18. GenPIP Implementation. [Figure: raw signals from the sequencing machine enter as signal chunks. The Basecalling Module (in-memory basecaller [Helix, PACT'20] and PIM-CQS) produces basecalled chunks, base quality scores, and chunk quality scores. The Read Mapping Module (in-memory read mapping, [PARC, ASP-DAC'20] + our design, with a read mapping controller) produces chunk mapping scores and the read mapping result, which goes to storage. The figure also shows the GenPIP controller, the ER controller, an average quality score calculator, and eDRAM.] PIM-CQS: PIM chunk quality score calculation. https://arxiv.org/pdf/2209.08600.pdf

  19. GenPIP Implementation. Same figure as the previous slide, annotated with: Tightly integrating the genome analysis steps o Reduces data movement o Eliminates useless computation

  21. Evaluation Methodology. Performance, Area and Power Analysis: o Simulation via Verilog HDL, NVSim [TCAD'12], and CACTI 6.5 [MICRO'07] o See the methodology in the paper for more details. Baselines: o CPU (Intel Xeon Gold 5118 CPU) o GPU (NVIDIA GeForce RTX 2080 Ti GPU) o Optimistic integration of two PIM accelerators (Helix [PACT'20] and PARC [ASP-DAC'20]): assumes no data movement between steps and assumes intermediate data causes no overhead. Datasets: o E. coli (http://lab.loman.net/2016/07/30/nanopore-r9-data-release/) o Human (https://www.ebi.ac.uk/ena/browser/view/PRJEB30620)

  22. Key Results: Performance. [Figure: normalized speedup of CPU, GPU, optimistic PIM, and GenPIP for the E. coli and human datasets and their geometric mean (GMEAN).] GenPIP provides 41.6×, 8.4×, and 1.4× speedup over CPU, GPU, and optimistic PIM, respectively. Both CP and ER are critical to the speedup

  23. Key Results: Energy Efficiency. [Figure: normalized energy reduction of CPU, GPU, optimistic PIM, and GenPIP for the E. coli and human datasets and their geometric mean (GMEAN).] GenPIP provides 32.8×, 20.8×, and 1.37× energy savings over CPU, GPU, and optimistic PIM, respectively. ER is especially critical to the energy efficiency

  24. More in the Paper. Details of CP and ER. Detailed GenPIP architecture and implementations o GenPIP controller o Timely early rejection implementation o In-memory seeding accelerator. More comparison points. Sensitivity analysis of the chunks used for ER. Area and power analysis. https://arxiv.org/pdf/2209.08600.pdf

  25. More in the Paper. Details of CP and ER. Detailed GenPIP implementation o GenPIP controller o Early rejection implementation o In-memory seeding accelerator. Results of applying CP and ER in CPU and GPU. Sensitivity analysis on the number of sampled chunks used for ER. Area and power analysis

  27. Conclusion. Problem: The genome analysis pipeline has large data movement between genome analysis steps and a significant amount of wasted computation on useless data. Goal: Tightly integrate genome analysis steps to reduce the data movement between steps and eliminate computation on useless data. GenPIP: The first in-memory genome analysis accelerator that tightly integrates genome analysis steps. GenPIP has two key techniques o A chunk-based pipeline o A new early-rejection technique. GenPIP outperforms state-of-the-art software & hardware solutions using CPU, GPU, and optimistic PIM by 41.6×, 8.4×, and 1.4×, respectively.

  28. GenPIP In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping Haiyu Mao, Mohammed Alser, Mohammad Sadrosadati, Can Firtina, Akanksha Baranwal, Damla Senol Cali, Aditya Manglik, Nour Almadhoun Alserr, Onur Mutlu
