GenPIP: In-Memory Acceleration of Genome Analysis
GenPIP is an in-memory system that accelerates genome analysis by tightly integrating basecalling and read mapping. Its chunk-based pipeline and early-rejection technique reduce wasted computation and data movement between analysis steps. GenPIP outperforms state-of-the-art CPU, GPU, and optimistic PIM baselines by 41.6x, 8.4x, and 1.4x, respectively. It offers a fast and energy-efficient approach to genome analysis, which supports applications in personalized medicine, outbreak tracing, and evolutionary studies.

GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping
Haiyu Mao, Mohammed Alser, Mohammad Sadrosadati, Can Firtina, Akanksha Baranwal, Damla Senol Cali, Aditya Manglik, Nour Almadhoun Alserr, Onur Mutlu

Overview: Genome Analysis
Genome analysis enables us to determine the order of the DNA sequence in an organism's genome.
  o It plays an important role in personalized medicine, outbreak tracing, and understanding of evolution.
Modern genome sequencing machines extract smaller randomized fragments of the original DNA sequence, known as reads.
  o Oxford Nanopore Technologies (ONT): a widely used sequencing technology with portable, high-throughput, and cheap sequencing devices.
[Figure: ONT sequencing device (image source: forbes.com)]

Overview: Two Limitations
Genome analysis consists of multiple steps, which leads to two limitations:
  o A lot of wasted computation is done on data that is later discovered to be useless.
  o Large amounts of data move between the steps.

Overview: GenPIP
GenPIP: a fast and energy-efficient in-memory acceleration system for the Genome analysis PIPeline via tight integration of genome analysis steps.
GenPIP has two key techniques:
  o Chunk-based pipeline (CP): provides fine-grained collaboration of genome analysis steps.
  o Early rejection (ER): stops execution on useless data in a timely manner by predicting which reads will not be useful.
GenPIP outperforms state-of-the-art software and hardware solutions using CPU, GPU, and optimistic PIM by 41.6x, 8.4x, and 1.4x, respectively.

Outline
Background and Motivation
GenPIP: Tight Integration of Genome Analysis Steps
  o Chunk-based Pipeline (CP)
  o Early Rejection (ER)
GenPIP Implementation
Evaluation
Conclusion

Genome Analysis Pipeline
[Figure: the genome analysis pipeline. Genome sequencing produces raw signals that are stored and then processed by three compute steps, with intermediate results held in storage: 1. Basecalling, which produces reads (e.g., GGACAT, GCGTTC, TCAAAG); 2. Read Quality Control, which separates high-quality from low-quality reads; 3. Read Mapping, which compares high-quality reads against the reference genome and labels them as mapped or unmapped before the mapping results are stored.]

Limitation 1: Large Data Movement
Using a human dataset from [NC'19] as an example, the data volume at each point of the pipeline is:
Raw signals (3913 GB) -> Basecalling -> Reads (546 GB) -> Read quality control -> High-quality reads (437 GB) -> Read mapping -> Mapped reads (382 GB)
There is large data movement between genome analysis steps.
[NC'19] Rory Bowden, Robert W. Davies, Andreas Heger, Alistair T. Pagnamenta, Mariateresa de Cesare, Laura E. Oikkonen, Duncan Parkes, Colin Freeman, Fatima Dhalla, Smita Y. Patel, et al. Sequencing of human genomes with nanopore technology. Nature Communications, 2019.

Limitation 2: Wasted Computation
Using a human dataset from [NC'19] as an example, the share of reads that survives each step is:
Raw signals -> Basecalling -> Reads (100%) -> Read quality control -> High-quality reads (79.5%) -> Read mapping -> Mapped reads (69.5%)
  o Low-quality reads: 20.5%
  o Unmapped reads: 10%
A considerable amount of computation is spent on useless data due to low-quality reads and unmapped reads.
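
The percentages on this slide can be rearranged into a quick estimate of how much work is wasted. The Python sketch below only recombines the slide's numbers. The variable names are illustrative, and the reading of these shares as "wasted work" assumes that cost is roughly proportional to the number of reads, which the slides do not state.

```python
# Fractions of basecalled reads, as given on the slide for the [NC'19] human dataset.
low_quality = 0.205  # discarded by read quality control
unmapped = 0.10      # pass quality control but fail to map

high_quality = 1.0 - low_quality   # reads that reach read mapping
mapped = high_quality - unmapped   # reads that are useful in the end
useless = low_quality + unmapped   # reads whose basecalling was wasted

# Share of the reads entering read mapping that never map
# (assumption: mapping cost is proportional to the number of reads).
wasted_mapping_share = unmapped / high_quality

print(f"high-quality reads:            {high_quality:.1%}")
print(f"mapped reads:                  {mapped:.1%}")
print(f"reads basecalled for nothing:  {useless:.1%}")
print(f"mapping input that never maps: {wasted_mapping_share:.1%}")
```

Under that proportionality assumption, close to a third of the basecalled reads are discarded later, which is the waste that early rejection targets.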

State-of-the-art Works
NVM-based PIM is an efficient technique to reduce data movement by processing data using or near memory.
  o Basecalling: NVM-based PIM for dot-product operations [Helix, PACT'20]
  o Read mapping: NVM-based PIM for search and addition [PARC, ASP-DAC'20]
These accelerators:
  o Reduce the data movement within a single genome analysis step
  o Exacerbate the data movement overhead between analysis steps
No prior work tackles data movement between analysis steps and reduces useless computation.

Goal and Opportunities
Goal: Efficiently accelerate the entire genome analysis pipeline while minimizing data movement and useless computation.
We perform a study to quantify the potential performance benefits. Results are normalized to the performance of the GPU.
[Figure: bar chart of normalized speedup over the GPU]
  o NVM-based PIM accelerators for separate basecalling and read mapping: 2.7x
  o No data movement between the accelerators of analysis steps: 6.1x
  o No data movement and no useless reads (ideal case): 9x

GenPIP
The first holistic in-memory accelerator for the genome analysis pipeline, including the basecalling, read quality control, and read mapping steps.
GenPIP has two key techniques:
  o Chunk-based Pipeline (CP): enables fine-grained pipelining of genome analysis steps and processes reads at chunk granularity (i.e., a subsequence of a read; 300 bases)
  o Early Rejection (ER)

Chunk-based Pipeline (CP)
CP increases parallelism by overlapping the execution of different steps at chunk granularity.
CP reduces intermediate data by computing on data as soon as it is generated.
CP provides opportunities for ER by analyzing a read at chunk granularity.
[Figure: timeline for a read that consists of four chunks, C1 to C4 (QC: quality control; RM: read mapping). In the conventional pipeline, chunks C1 to C4 are basecalled and assembled into a read before QC and read mapping run on the whole read. In the chunk-based pipeline, QC and RM of each chunk overlap with the basecalling of later chunks, which saves cycles.]
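
The saved cycles in the timeline can be illustrated with a small timing model. The Python sketch below is not GenPIP's implementation. It assumes three stages (basecalling, QC, RM), hypothetical per-chunk latencies in arbitrary time units, that each stage handles one chunk at a time, and that the cost of a whole-read step equals the sum of its per-chunk costs. Assembly overhead is ignored.

```python
from typing import Sequence


def conventional_time(stage_latencies: Sequence[float], n_chunks: int) -> float:
    """Each step finishes all chunks of the read before the next step starts."""
    return n_chunks * sum(stage_latencies)


def chunk_pipeline_time(stage_latencies: Sequence[float], n_chunks: int) -> float:
    """Each chunk enters the next stage as soon as that stage is free."""
    if not stage_latencies:
        return 0.0
    # finish[s] is the time at which stage s finished its most recent chunk.
    finish = [0.0] * len(stage_latencies)
    for _ in range(n_chunks):
        ready = 0.0  # time at which this chunk leaves the previous stage
        for s, latency in enumerate(stage_latencies):
            start = max(ready, finish[s])  # wait for the chunk and for the stage
            finish[s] = start + latency
            ready = finish[s]
    return finish[-1]


# Hypothetical per-chunk latencies for basecalling, QC, and RM.
latencies = (3.0, 1.0, 2.0)
conventional = conventional_time(latencies, n_chunks=4)
pipelined = chunk_pipeline_time(latencies, n_chunks=4)
print(f"conventional: {conventional}, chunk-based: {pipelined}")
```

In this model the pipelined time is dominated by the slowest stage, so the benefit grows with the number of chunks per read.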

GenPIP
The first holistic in-memory accelerator for the genome analysis pipeline, including the basecalling, read quality control, and read mapping steps.
GenPIP has two key techniques:
  o Chunk-based Pipeline (CP): enables fine-grained collaboration of genome analysis steps by processing reads at chunk granularity (i.e., a subsequence of a read, e.g., 300 bases)
  o Early Rejection (ER): stops the execution on useless reads as early as possible by using a small number of chunks to predict the usefulness of a read

Early Rejection (ER)
Predict and eliminate low-quality and unmapped reads from the genome analysis pipeline as early as possible.
[Figure: ER flowchart. Basecall a small number of chunks and check the average quality of these chunks. On fail, stop the analysis. On pass, basecall more chunks, map the chunks basecalled so far, and check the mapping score. On fail, stop the analysis. On pass, basecall the remaining chunks and execute the remaining computation in read mapping.]
  o Early rejection based on chunk quality scores (ER-QSR): predicts low-quality reads using chunk quality scores
  o Early rejection based on chunk mapping scores (ER-CMR): predicts unmapped reads using chunk mapping scores
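
The flowchart can be read as the control flow sketched below in Python. This is a minimal sketch under stated assumptions, not the paper's design. The function `analyze_read`, its parameters, the chunk counts, and both thresholds are hypothetical, and the basecaller, quality function, and mapping-score function are toy stand-ins supplied by the caller.

```python
from typing import Callable, List, Sequence, Tuple


def analyze_read(
    signal_chunks: Sequence[object],
    basecall: Callable[[object], object],
    chunk_quality: Callable[[object], float],
    mapping_score: Callable[[List[object]], float],
    n_quality_chunks: int = 2,        # hypothetical: chunks sampled for the quality check
    n_mapping_chunks: int = 4,        # hypothetical: chunks basecalled before the mapping check
    min_quality: float = 7.0,         # hypothetical threshold
    min_mapping_score: float = 0.5,   # hypothetical threshold
) -> Tuple[str, int]:
    """Return the outcome and the number of chunks that were basecalled."""
    if not signal_chunks or n_quality_chunks < 1:
        raise ValueError("need at least one chunk and one sampled chunk")

    # Quality check: basecall a small number of chunks, check their average quality.
    called = [basecall(c) for c in signal_chunks[:n_quality_chunks]]
    if sum(chunk_quality(c) for c in called) / len(called) < min_quality:
        return "rejected: low quality", len(called)

    # Mapping check: basecall more chunks, map the chunks basecalled so far.
    called += [basecall(c) for c in signal_chunks[n_quality_chunks:n_mapping_chunks]]
    if mapping_score(called) < min_mapping_score:
        return "rejected: unmapped", len(called)

    # Both checks passed: basecall the remaining chunks for full read mapping.
    called += [basecall(c) for c in signal_chunks[n_mapping_chunks:]]
    return "accepted", len(called)


# Toy stand-ins: a chunk is already a (bases, quality) pair.
REFERENCE = "ACGTACGTGGCCAATT"
toy_basecall = lambda chunk: chunk
toy_quality = lambda chunk: chunk[1]
toy_mapping = lambda chunks: sum(c[0] in REFERENCE for c in chunks) / len(chunks)

good = analyze_read([("ACGT", 10.0)] * 6, toy_basecall, toy_quality, toy_mapping)
low = analyze_read([("ACGT", 3.0)] * 6, toy_basecall, toy_quality, toy_mapping)
lost = analyze_read([("TTTT", 10.0)] * 6, toy_basecall, toy_quality, toy_mapping)
print(good, low, lost)
```

The second element of each result shows the saving: a rejected read stops after only the sampled chunks have been basecalled.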

Implementation of CP and ER
CP and ER can be applied on different systems, e.g., CPU, GPU, and PIM.
We implement CP and ER using PIM, since PIM reduces the data movement between genome analysis steps more efficiently.
We also apply CP and ER to the CPU and GPU baselines and observe speedups and energy savings.

GenPIP Implementation
[Figure: GenPIP architecture. Labeled components: Basecalling Module (in-memory basecaller [Helix, PACT'20]; PIM-CQS, PIM chunk quality score calculation), Read Mapping Module (in-memory read mapping, [PARC, ASP-DAC'20] plus our design), read mapping controller, eDRAM, and GenPIP controller with an ER controller. Labeled data: raw signals from the sequencing machine, signal chunks, basecalled chunks, base quality scores, chunk quality scores, chunk mapping scores, and the read mapping result sent to storage.]
https://arxiv.org/pdf/2209.08600.pdf

GenPIP Implementation
Tightly integrating the genome analysis steps:
  o Reduces data movement
  o Eliminates useless computation

Evaluation Methodology
Performance, area, and power analysis:
  o Simulation via Verilog HDL, NVSim [TCAD'12], and CACTI 6.5 [MICRO'07]
  o See the methodology in the paper for more details
Baselines:
  o CPU (Intel Xeon Gold 5118)
  o GPU (NVIDIA GeForce RTX 2080 Ti)
  o Optimistic integration of two PIM accelerators (Helix [PACT'20] and PARC [ASP-DAC'20]), which assumes no data movement between steps and no overhead from intermediate data
Datasets:
  o E. coli (http://lab.loman.net/2016/07/30/nanopore-r9-data-release/)
  o Human (https://www.ebi.ac.uk/ena/browser/view/PRJEB30620)

Key Results: Performance
[Figure: bar chart of normalized speedup for CPU, GPU, optimistic PIM, and GenPIP on the E. coli and human datasets and their geometric mean (GMEAN)]
GenPIP provides 41.6x, 8.4x, and 1.4x speedup over CPU, GPU, and optimistic PIM, respectively.
Both CP and ER are critical to the speedup.

Key Results: Energy Efficiency
[Figure: bar chart of normalized energy reduction for CPU, GPU, optimistic PIM, and GenPIP on the E. coli and human datasets and their geometric mean (GMEAN)]
GenPIP provides 32.8x, 20.8x, and 1.37x energy savings over CPU, GPU, and optimistic PIM, respectively.
ER is especially critical to the energy efficiency.

More in the Paper
Details of CP and ER
Detailed GenPIP architecture and implementation
  o GenPIP controller
  o Timely early rejection implementation
  o In-memory seeding accelerator
More comparison points
Results of applying CP and ER on CPU and GPU
Sensitivity analysis on the number of sampled chunks used for ER
Area and power analysis
https://arxiv.org/pdf/2209.08600.pdf

Conclusion
Problem: The genome analysis pipeline has large data movement between genome analysis steps and a significant amount of wasted computation on useless data.
Goal: Tightly integrate genome analysis steps to reduce the data movement between steps and eliminate computation on useless data.
GenPIP: The first in-memory genome analysis accelerator that tightly integrates genome analysis steps.
GenPIP has two key techniques:
  o A chunk-based pipeline
  o A new early-rejection technique
GenPIP outperforms state-of-the-art software and hardware solutions using CPU, GPU, and optimistic PIM by 41.6x, 8.4x, and 1.4x, respectively.