Comprehensive Analysis of Somatic Mutations in eMERGE Datasets
This update describes an investigation of somatic mutations in eMERGE sequencing datasets, motivated by a 2019 report of macroscopic somatic clonal expansion across normal tissues. Candidate somatic mutations were identified in 9,142 PGX and 16,170 eMERGE 3 set A samples and characterized by variant type, functional prediction, and gene. A subset is being validated by Sanger sequencing, real-time PCR, and digital droplet PCR.
Presentation Transcript
Update on an investigation of somatic mutations in eMERGE datasets
Ken Kaufman, CCHMC, 6-20-19
Somatic Mutations
"RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues." Keren Yizhak, François Aguet, Jaegil Kim, et al. Science, 07 Jun 2019: Vol. 364, Issue 6444, eaaw0726.
Mutations
[Figure: cell genotypes (+/+, +/-) for a germline versus a somatic mutation; alternate allele fraction 50% (germline) and 25% (somatic)]
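The 50% and 25% figures follow from allele counting. A minimal Python sketch, assuming diploid cells that contribute equally to the extracted DNA and a heterozygous mutation (the function name `expected_alt_fraction` is illustrative, not from the slides): a germline heterozygous variant is in every cell, so half of all alleles carry it, while a somatic variant carried by a fraction f of cells yields f/2.

```python
def expected_alt_fraction(cell_fraction: float, alt_copies: int = 1, ploidy: int = 2) -> float:
    """Expected fraction of reads carrying the alternate allele.

    Assumes every cell contributes equally to the sequenced DNA and that
    mutant cells carry `alt_copies` alternate alleles out of `ploidy`.
    """
    if not 0.0 <= cell_fraction <= 1.0:
        raise ValueError("cell_fraction must be between 0 and 1")
    return cell_fraction * alt_copies / ploidy


# Germline heterozygous variant: present in every cell.
print(expected_alt_fraction(1.0))  # 0.5
# Somatic heterozygous variant present in half of the cells.
print(expected_alt_fraction(0.5))  # 0.25
```

Under the same assumptions, the pipeline's 1% to 30% alt allele window corresponds to a heterozygous mutation carried by roughly 2% to 60% of sampled cells.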
Somatic Mutation Pipeline
Samtools pileup: reference and alternate alleles detected at each base sequenced.
Calculate the ratio of alternate allele reads to depth.
Filters: ratio 1% to 30%; amino-acid altering; depth of at least 10; not in a duplicated region.
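The filtering step can be expressed as a single predicate over per-position counts. A minimal sketch, assuming the amino-acid and duplicated-region flags come from separate annotation steps not shown on the slide, and reading "Depth of 10" as a minimum depth; `PileupSite` and `is_candidate` are hypothetical names, not the presenter's code.

```python
from dataclasses import dataclass


@dataclass
class PileupSite:
    """Hypothetical per-position summary built from pileup counts."""
    chrom: str
    pos: int
    depth: int                  # total reads covering the position
    alt_count: int              # reads supporting the alternate allele
    amino_acid_altering: bool   # assumed to come from a variant annotation step
    in_duplicated_region: bool  # assumed to come from a duplicated-region mask

    @property
    def alt_ratio(self) -> float:
        """Alt/depth ratio; 0.0 for uncovered positions."""
        return self.alt_count / self.depth if self.depth else 0.0


def is_candidate(site: PileupSite,
                 min_ratio: float = 0.01,
                 max_ratio: float = 0.30,
                 min_depth: int = 10) -> bool:
    """True if the site passes all four filters listed on the slide."""
    return (site.depth >= min_depth
            and min_ratio <= site.alt_ratio <= max_ratio
            and site.amino_acid_altering
            and not site.in_duplicated_region)


# Hypothetical sites: a 15% alt ratio passes; a ~50% (germline-like) ratio does not.
print(is_candidate(PileupSite("chr1", 1000, 200, 30, True, False)))  # True
print(is_candidate(PileupSite("chr1", 2000, 100, 50, True, False)))  # False
```

Keeping the thresholds as keyword parameters makes it straightforward to retune them for a different data set.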
Samples processed
eMERGE 3 set A: 16,170 samples screened; 801 candidate somatic mutations in 773 samples.
PGX: ~10,000 samples processed. Initially 4,403 variations in 2,798 samples; after filtering, 555 candidate somatic mutations in 541 samples.
66 samples with 2 or more candidates (5 highest).
58 candidates found in 2 or more samples (4 highest).
419 candidates found in 1 sample.
252 of the 555 candidate somatic mutations are found in the ExAC database (MAF 0.009 to 8.2 x 10^-6).
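The recurrence counts on this slide (candidates per sample, samples per candidate) can be tallied from a list of (sample, variant) calls. A minimal sketch with hypothetical identifiers; `recurrence_summary` and its output keys are illustrative, not the presenter's code.

```python
from collections import Counter


def recurrence_summary(calls):
    """Summarize candidate calls given as (sample_id, variant_id) pairs."""
    unique_calls = set(calls)  # count each sample/variant pair once
    per_sample = Counter(sample for sample, _ in unique_calls)
    per_variant = Counter(variant for _, variant in unique_calls)
    return {
        "calls": len(unique_calls),
        "samples": len(per_sample),
        "samples_with_2plus_candidates": sum(n >= 2 for n in per_sample.values()),
        "candidates_in_2plus_samples": sum(n >= 2 for n in per_variant.values()),
        "candidates_in_1_sample": sum(n == 1 for n in per_variant.values()),
        "max_candidates_per_sample": max(per_sample.values(), default=0),
        "max_samples_per_candidate": max(per_variant.values(), default=0),
    }


# Hypothetical calls: sample S1 carries two candidates, and v1 recurs in two samples.
calls = [("S1", "v1"), ("S1", "v2"), ("S2", "v1"), ("S3", "v3")]
print(recurrence_summary(calls))
```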
Characteristics of Somatic Mutations
178 genes
Type A variants: 488 (483 non-synonymous, 2 insertions, 3 deletions)
Type B (loss of function) variants: 61 (24 stopgain, 1 stoploss, 21 frameshift, 3 initiation codon, 12 splicing)
By variant class: 522 SNVs, 2 insertions, 24 deletions
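The type A / type B split amounts to a lookup from each variant's predicted consequence. A minimal sketch; the consequence labels are hypothetical strings mirroring the slide's categories, and type A insertions and deletions are assumed to be in-frame, since frameshifts are counted under type B.

```python
from collections import Counter

# Hypothetical consequence labels mirroring the slide's categories. Type A
# insertions/deletions are assumed to be in-frame, since frameshifts are
# counted under type B.
TYPE_A = {"nonsynonymous", "inframe_insertion", "inframe_deletion"}
TYPE_B = {"stopgain", "stoploss", "frameshift", "init_codon", "splicing"}


def variant_type(consequence: str) -> str:
    """Map a consequence label to 'A', 'B' (loss of function) or 'other'."""
    if consequence in TYPE_B:
        return "B"
    if consequence in TYPE_A:
        return "A"
    return "other"


def type_counts(consequences):
    """Count variants per type for an iterable of consequence labels."""
    return Counter(variant_type(c) for c in consequences)


print(type_counts(["nonsynonymous", "stopgain", "frameshift", "synonymous"]))
```

The slide's subtotals are internally consistent: 483 + 2 + 3 = 488 type A and 24 + 1 + 21 + 3 + 12 = 61 type B variants.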
Functional Predictions (N of 5 predicted damaging: N)
0 of 5: 65
1 of 5: 114
2 of 5: 82
3 of 5: 80
4 of 5: 103
5 of 5: 79
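Each variant's score on this slide is the number of prediction tools, out of five, that call it damaging. A minimal sketch of that tally; the slide does not name the five tools, so the keys `tool_1` to `tool_5` are placeholders, and `damaging_votes` and `vote_distribution` are illustrative names.

```python
from collections import Counter


def damaging_votes(predictions: dict) -> int:
    """Number of tools calling the variant damaging (True = damaging)."""
    return sum(bool(v) for v in predictions.values())


def vote_distribution(variants, n_tools: int = 5) -> dict:
    """Map each score 0..n_tools to the number of variants with that score."""
    counts = Counter(damaging_votes(p) for p in variants)
    return {k: counts.get(k, 0) for k in range(n_tools + 1)}


# Hypothetical predictions for three variants from five unnamed tools.
variants = [
    {"tool_1": True, "tool_2": True, "tool_3": False, "tool_4": True, "tool_5": False},
    {"tool_1": False, "tool_2": False, "tool_3": False, "tool_4": False, "tool_5": False},
    {"tool_1": True, "tool_2": True, "tool_3": True, "tool_4": True, "tool_5": True},
]
print(vote_distribution(variants))  # {0: 1, 1: 0, 2: 0, 3: 1, 4: 0, 5: 1}
```

On the slide's data, the six counts sum to 523 scored variants.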
Genes with Most Somatic Mutations
Gene     Total  Type A  Type B  Constrained (pLI)
RYR2        53      43      10  1
RYR1        25      23       2  2.72E-06
SCN5A       15      11       4  0.999188
CACNA1S     14      13       1  1.76E-06
ABCC2       12      10       2  1.19E-22
EGFR        11      11       0  0.999785
NTRK2       10       9       1  0.999744
ARID5B       9       9       0  0.999771
DPYD         9       6       3  5.28E-06
ABCB1        8       6       2  0.777029
CYP2C19      8       7       1  1.18E-06
CYP4F12      8       8       0  2.81E-08
SLCO1B3      8       8       0  2.30E-17
HTR2A        7       6       1  0.729321
ABCA1        7       6       1  0.086307
DRD2         6       6       0  0.659832
CYP2C9       6       6       0  1.19E-08
SLC22A8      6       6       0  6.51E-09
ABCB11       6       6       0  6.09E-11
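The pLI column (the ExAC-derived probability of loss-of-function intolerance) can be used to flag constrained genes. A minimal sketch that transcribes the gene table from this slide and applies a pLI of at least 0.9, a widely used convention that the slide itself does not state; `constrained_genes` is an illustrative name.

```python
# (gene, total, type A, type B, pLI) transcribed from the gene table on this slide.
GENES = [
    ("RYR2", 53, 43, 10, 1.0),
    ("RYR1", 25, 23, 2, 2.72e-06),
    ("SCN5A", 15, 11, 4, 0.999188),
    ("CACNA1S", 14, 13, 1, 1.76e-06),
    ("ABCC2", 12, 10, 2, 1.19e-22),
    ("EGFR", 11, 11, 0, 0.999785),
    ("NTRK2", 10, 9, 1, 0.999744),
    ("ARID5B", 9, 9, 0, 0.999771),
    ("DPYD", 9, 6, 3, 5.28e-06),
    ("ABCB1", 8, 6, 2, 0.777029),
    ("CYP2C19", 8, 7, 1, 1.18e-06),
    ("CYP4F12", 8, 8, 0, 2.81e-08),
    ("SLCO1B3", 8, 8, 0, 2.30e-17),
    ("HTR2A", 7, 6, 1, 0.729321),
    ("ABCA1", 7, 6, 1, 0.086307),
    ("DRD2", 6, 6, 0, 0.659832),
    ("CYP2C9", 6, 6, 0, 1.19e-08),
    ("SLC22A8", 6, 6, 0, 6.51e-09),
    ("ABCB11", 6, 6, 0, 6.09e-11),
]

# Conventional cutoff for loss-of-function intolerance; an assumption, not stated on the slide.
PLI_THRESHOLD = 0.9


def constrained_genes(rows, threshold=PLI_THRESHOLD):
    """Genes whose pLI meets the threshold."""
    return [gene for gene, *_, pli in rows if pli >= threshold]


# Sanity check: every row's total equals type A + type B.
assert all(total == a + b for _, total, a, b, _ in GENES)
print(constrained_genes(GENES))  # ['RYR2', 'SCN5A', 'EGFR', 'NTRK2', 'ARID5B']
```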
[Figure labeled GATK, showing the values 150, 555 and 75]
[Figure: Alt allele ratio; labels: Number of Variants, Average Ratio]
Validation
Obtained DNA for 11 samples from Vanderbilt and Northwestern (thank you very much!).
Methods: Sanger sequencing of PCR-amplified product (with controls); real-time PCR; digital droplet PCR.
Validation Results (Vand = Vanderbilt, NW = Northwestern)
Chr  Position   Source  eMERGE ID  Ref reads  Alt reads  NGS % alt  Validation % alt (Sanger / digital PCR)
7    1.51E+08   Vand    27200175   70         23         24.5       29.1
9    1.25E+08   Vand    27219105   48         19         28.4       41.2
19   15789125   Vand    27222305   534        90         14.4       20
7    87080965   NW      52001018   756        86         10.2       1.8 / 8.4
1    2.38E+08   NW      52001078   852        284        25         27.8
15   75042302   NW      52001101   491        108        18         18.2
7    87148711   NW      52001122   783        111        12.4       18.2 / 12.2
16   75512732   NW      52001157   42         19         29.9       42.9
7    55238099   NW      52001483   683        82         11         13.6
19   16034732   NW      52108536   651        63         8.8        12
7    55224337   NW      52108761   861        97         10.1       10.5
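For most rows, the NGS % alt allele equals alt reads divided by total (ref + alt) reads. A minimal check of that arithmetic on a few rows; a few rows differ slightly (for example eMERGE IDs 27200175 and 52001157), so the reported percentages may not all be computed exactly this way. `percent_alt` is an illustrative name.

```python
def percent_alt(ref_reads: int, alt_reads: int) -> float:
    """Percentage of reads supporting the alternate allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads at this position")
    return 100.0 * alt_reads / total


# Rows from the table: (eMERGE ID, ref reads, alt reads, reported NGS % alt)
rows = [("27219105", 48, 19, 28.4), ("27222305", 534, 90, 14.4), ("52001078", 852, 284, 25.0)]
for emerge_id, ref, alt, reported in rows:
    print(emerge_id, round(percent_alt(ref, alt), 1), reported)
```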
Validation
PGX: 9,142 samples screened (6.1%); 555 candidate somatic mutations in 541 samples.
eMERGE 3 set A: 16,170 samples screened (5.0%); 801 candidate somatic mutations in 773 samples.
eMERGE 3 set B: in process (downloading BAM files from DNAnexus).
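The percentages in parentheses match the number of candidate somatic mutations divided by the number of samples screened (dividing samples carrying candidates instead would give 5.9% and 4.8%). This reading is inferred from the arithmetic; the slide does not define the percentage, and `percent_of_screened` is an illustrative name.

```python
def percent_of_screened(candidates: int, screened: int) -> float:
    """Candidate somatic mutations as a percentage of samples screened."""
    return 100.0 * candidates / screened


print(round(percent_of_screened(555, 9142), 1))   # 6.1
print(round(percent_of_screened(801, 16170), 1))  # 5.0
```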
Validation Strategy (iGenomX Riptide)
[Figure: library preparation schematic. Labels: SNP, adapter sequence, biotinylated ddNTP; strand-displacing extension; adapter, barcode, random sequence; low-cycle PCR]
Sequencing
Samples: from 96 to 960
Sequencers:
MiSeq: 25M reads, 96 samples, 96 targets, ~270X coverage
HiSeq: 900M reads (3 lanes), 960 samples, 960 targets, ~100X coverage
Sample Status (institution: # samples, contact PI)
CHOP: 47, John Connolly
Columbia: 65, Ali Gharavi
Harvard: 109, Scott Weiss
Mayo: 96, Iftikhar Kullo
Northwestern: 118, Rex Chisholm
University of Washington/KP: 124, Eric Larson
Vanderbilt: 115, Dan Roden/Sarah Bland
Total: 674
MTA status and processing entries, in slide order: Routing for Signature; Routing for Signature; External Review; Processing; External Review; Processing; Processing; Activated; Processing; Samples received 5.8.2019; Activated; Activated; Processing
Current Status
Process the remaining eMERGE 3 data set.
Obtain samples for validation.
Finalize the validation strategy.
Contact Ken Kaufman (Kenneth.Kaufman@cchmc.org) or Paul Gecaine (Paul.Gecaine@cchmc.org) to participate.
Acknowledgements
eMERGE consortium; Vanderbilt; Northwestern; University of Washington; Baylor (Richard Gibbs group)
CCHMC: John Harley, Scott Richards, Paul Gecaine, Beth Cobb, Cindy Prows, Bahram Namjou-Khales
DNAnexus: Andrew Carroll, John Didion
Strategy
VCF files only have data where a variant was called, whereas BAM files have data at every position sequenced.
The ratio of reference to alternate allele is skewed.
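Because pileups are generated from BAM files, every covered position reports all of its reads, so low-fraction alternate alleles can be counted even where no variant was called. A minimal sketch that counts alleles from the read-bases column (column 5) of `samtools mpileup` output, the current form of the pileup command. Base- and mapping-quality filtering, which a real pipeline would need, is omitted, and `count_bases` and `alt_ratio` are illustrative names, not the presenter's code.

```python
import re
from collections import Counter

# An indel marker in the read-bases column: '+' or '-', a length, then that many bases.
_INDEL = re.compile(r"[+-](\d+)")


def count_bases(read_bases: str) -> Counter:
    """Count reference matches ('ref') and mismatching bases in an mpileup read-bases string."""
    counts = Counter()
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":            # read start; the next character is its mapping quality
            i += 2
        elif c == "$":          # read end
            i += 1
        elif c in "+-":         # indel after this position; skip its length and sequence
            m = _INDEL.match(read_bases, i)
            i = m.end() + int(m.group(1))
        else:
            if c in ".,":       # match on forward / reverse strand
                counts["ref"] += 1
            elif c.upper() in "ACGT":
                counts[c.upper()] += 1
            # 'N', '*', '#', '>' and '<' (ambiguous bases, deletions, skips) are not counted
            i += 1
    return counts


def alt_ratio(read_bases: str) -> float:
    """Fraction of counted bases that differ from the reference."""
    counts = count_bases(read_bases)
    total = sum(counts.values())
    return (total - counts["ref"]) / total if total else 0.0


# Five reference matches and two 'A' mismatches (one read start '^I', one read end '$').
print(count_bases("..,aA.^I,$"))
print(round(alt_ratio("..,aA.^I,$"), 3))
```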
Comparison with Other Programs
MosaicHunter: tested against candidate samples; of 56 somatic candidates, 3 overlapped.
SomaticSniper: designed for normal vs. tumor comparison; failed in our application.
Most somatic mutation detection approaches require optimization for each data set.
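Overlap figures such as 3 of 56 depend on how calls are matched between tools. A minimal sketch, assuming calls are matched on exact (chromosome, position, ref, alt) after normalizing chromosome names and allele case; this matching rule is an assumption, not the method stated on the slide, and the function names are illustrative.

```python
def normalize(call):
    """Normalize a (chrom, pos, ref, alt) call: drop a 'chr' prefix, cast pos, uppercase alleles."""
    chrom, pos, ref, alt = call
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return (chrom, int(pos), ref.upper(), alt.upper())


def shared_calls(calls_a, calls_b):
    """Variants reported by both call sets."""
    return {normalize(c) for c in calls_a} & {normalize(c) for c in calls_b}


# Hypothetical call sets using different chromosome naming conventions.
pipeline_calls = [("chr1", 1000, "A", "G"), ("chr2", 2000, "C", "T")]
other_tool_calls = [("1", "1000", "a", "g"), ("3", 3000, "G", "A")]
print(shared_calls(pipeline_calls, other_tool_calls))  # {('1', 1000, 'A', 'G')}
```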