Scaling Up and Modeling Integrated MPRAs
Chromosomally integrated MPRAs are used for the functional characterization of candidate enhancers. This presentation by Vikram Agarwal (Shendure laboratory) of the Ahituv-Shendure Functional Characterization Center was prepared for ENCODE 4 and is dated 3/2/18.
Presentation Transcript
Scaling Up and Modeling Chromosomally Integrated MPRAs
Vikram Agarwal (Shendure laboratory)
Ahituv-Shendure Functional Characterization Center
ENCODE 4, 3/2/18
Overarching questions
Can we improve MPRA to be more robust and test more sequences in parallel?
Can we develop models that better predict sequences that function as enhancers and their tissue-specific activity?
Can we better predict the effect of nucleotide changes on enhancer activity?
Massively parallel reporter assays (MPRAs) & STARR-seq
Link each designed regulatory sequence to a reporter and integrate it into the genome using lentivirus.
Measure reporter expression via the barcode, relative to the amount integrated into the genome (mRNA:DNA ratio).
[Figure credits: STARR-seq, Arnold et al., 2013; Inoue et al., 2017]
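The mRNA:DNA ratio described on this slide can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the input dictionaries, the pooling of all barcodes belonging to one enhancer, the library-size normalization, and the log2 transform are all assumptions made for the example.

```python
import math
from collections import defaultdict

def rna_dna_ratios(rna_counts, dna_counts, barcode_to_enhancer):
    """Return log2(RNA/DNA) per enhancer from per-barcode read counts.

    Hypothetical inputs: rna_counts and dna_counts map barcode -> read count;
    barcode_to_enhancer maps barcode -> enhancer ID.
    """
    # Library sizes, used to put RNA and DNA counts on a comparable scale (assumption).
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())

    # Pool the counts of all barcodes linked to the same enhancer.
    rna_sum = defaultdict(int)
    dna_sum = defaultdict(int)
    for barcode, enhancer in barcode_to_enhancer.items():
        rna_sum[enhancer] += rna_counts.get(barcode, 0)
        dna_sum[enhancer] += dna_counts.get(barcode, 0)

    ratios = {}
    for enhancer, dna in dna_sum.items():
        rna = rna_sum[enhancer]
        if rna == 0 or dna == 0:
            continue  # ratio (or its log) is undefined; skip this enhancer
        ratios[enhancer] = math.log2((rna / rna_total) / (dna / dna_total))
    return ratios
```

Pooling barcodes before taking the ratio is only one option; averaging per-barcode ratios is another, and the two can differ when barcode coverage is uneven.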
Expanding the number of sequences tested via lentiMPRA
Goal: test a larger repertoire of sequences for enhancer activity.
Instead of synthesizing the barcode together with the enhancer, add barcodes by PCR.
This leaves room on the oligo for a larger, 200 nt enhancer.
Design ~19K sequences in total (9.5K x 2) as 200 nt oligos on an Agilent array to evaluate sequences with putative enhancer activity, including positive and negative control sequences.
Choosing candidate enhancers
Located all HepG2 DNase peaks in each cell line using UW ENCODE peak calls.
Retained the subset that does not overlap promoters (not within the 1500 nt window centered at each gene's TSS).
Counted the number of TF binding sites overlapping each DHS.
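The promoter filter above can be illustrated with a small sketch. It assumes half-open peak intervals and a symmetric 1500 nt window around each TSS; the function names and the tuple layout are hypothetical, and the linear scan is only suitable for small inputs (a genome-scale run would use an interval index or a dedicated tool).

```python
from collections import defaultdict

PROMOTER_WINDOW = 1500  # nt, centered on the TSS (value taken from the slide)

def overlaps_promoter(chrom, start, end, tss_by_chrom, window=PROMOTER_WINDOW):
    """True if the half-open interval [start, end) overlaps any TSS-centered window."""
    half = window // 2
    for tss in tss_by_chrom.get(chrom, ()):
        # Window is [tss - half, tss + half); standard half-open overlap test.
        if start < tss + half and end > tss - half:
            return True
    return False

def non_promoter_peaks(peaks, tss_list):
    """Keep peaks (chrom, start, end) that do not overlap any promoter window.

    tss_list holds (chrom, position) pairs; both layouts are assumptions.
    """
    tss_by_chrom = defaultdict(list)
    for chrom, pos in tss_list:
        tss_by_chrom[chrom].append(pos)
    return [p for p in peaks if not overlaps_promoter(p[0], p[1], p[2], tss_by_chrom)]
```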
Choosing candidate enhancers
Counted # of overlapping TF binding sites, as ascertained by ENCODE ChIP-seq data from HepG2 cells.
Selected enhancers randomly across 5 bins of # of binding sites.
[Figure: bins 1 to 5; x-axis: # of TF sites overlapping 200nt DNase peak]
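Selecting enhancers randomly across bins of TF-site counts amounts to stratified sampling, which might look like the sketch below. The slide does not give the bin boundaries or the number drawn per bin, so `edges` and `per_bin` are hypothetical parameters, as are the function names.

```python
import random
from collections import defaultdict

def assign_bin(n_sites, edges):
    """Return a 1-based bin index; edges are ascending inclusive upper bounds,
    and anything above the last edge falls into a final open-ended bin."""
    for i, upper in enumerate(edges, start=1):
        if n_sites <= upper:
            return i
    return len(edges) + 1

def sample_across_bins(site_counts, edges, per_bin, seed=0):
    """Randomly pick up to per_bin peaks from each bin of TF-site counts.

    site_counts maps peak ID -> number of overlapping TF binding sites.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_bin = defaultdict(list)
    for peak_id, n_sites in site_counts.items():
        by_bin[assign_bin(n_sites, edges)].append(peak_id)

    chosen = {}
    for bin_index in sorted(by_bin):
        members = sorted(by_bin[bin_index])  # sort for a stable sampling order
        chosen[bin_index] = rng.sample(members, min(per_bin, len(members)))
    return chosen
```

Four edges produce five bins. Sampling the same number from each bin deliberately over-represents peaks with many TF sites relative to their genomic frequency, which is what allows activity to be compared across the full range of site counts.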
Choosing candidate enhancers
HepG2 DNase sites not overlapping promoters: 66,017
Selected enhancers with a range of TF binding sites (chosen randomly across 5 bins of # of binding sites, counted from ENCODE ChIP-seq data from HepG2 cells): 9,172
Positive and negative HepG2 controls (from Inoue et al., Genome Research 2017): 100
Synthetic positive and negative HepG2 controls (from Smith et al., Nature Genetics 2013): 100
All regions synthesized twice on the Agilent array.
Total probes designed: 18,744, i.e. (9,172 + 100 + 100) x 2
Oligo design & library prep for lenti-MPRA
Designed Agilent oligos: 5' tag sequence (15 nt), enhancer sequence (200 nt), 3' tag sequence (15 nt).
1st round PCR and 2nd round PCR (elements labeled in the figure: minimal promoter, restriction sites, barcode of 15 nt).
Clone into the lentiviral plasmid (figure labels: LTR, ARE, GFP, WPRE).
Amplify & sequence to link each barcode to its enhancer (MiSeq).
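The barcode-to-enhancer linking step could be sketched as below. The slide only states that the library is sequenced to link barcodes to enhancers; the input format (already-parsed barcode and enhancer pairs), the function name, and both filtering thresholds are assumptions for illustration.

```python
from collections import Counter, defaultdict

def link_barcodes(read_pairs, min_reads=2, min_fraction=0.9):
    """Map each barcode to a single enhancer.

    read_pairs yields (barcode, enhancer_id) tuples, one per sequenced read.
    A barcode is kept only if it has at least min_reads reads and at least
    min_fraction of them agree on one enhancer (hypothetical thresholds).
    """
    seen = defaultdict(Counter)
    for barcode, enhancer in read_pairs:
        seen[barcode][enhancer] += 1

    linked = {}
    for barcode, counts in seen.items():
        enhancer, top = counts.most_common(1)[0]
        total = sum(counts.values())
        if total >= min_reads and top / total >= min_fraction:
            linked[barcode] = enhancer
    return linked
```

Dropping barcodes that map to more than one enhancer matters because a shared barcode would mix the RNA and DNA counts of different sequences downstream.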
Sequence RNA & DNA after transfection (in triplicate experiments)
Viral integration into the human genome (figure labels: LTR, ARE, GFP, WPRE; polyadenylated transcripts).
RNA: sequence barcodes (NextSeq).
DNA: sequence barcodes (NextSeq).
Relationship between # TF binding sites and RNA/DNA ratio
[Figure, two panels. Panel x-axes: # of TF binding sites (binned according to # of overlapping TF sites); # of TF binding sites (binned in equally-sized bins according to Agilent design).]
Differential motif analysis
Which motifs are enriched in the top 1000 most effective enhancers?
Lasso regression model to explain enhancer activity
Features considered in the model:
LS-GKM: builds a gapped k-mer SVM model for each transcription factor, trained on ChIP-seq data.
ENCODE: ChromHMM, DHS sites, Segway call sets.
Epigenomics Roadmap: RNA-seq, methylation, CAGE, histone marks.
CADD: GC and CpG content, conservation, motif overlap counts, CADD score.
10-fold cross-validation was used to determine the optimal lambda value.
For full technical details, see the methods of Inoue et al., 2017.
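Choosing lambda by 10-fold cross-validation can be illustrated with a small, dependency-free sketch. It is not the implementation used in the talk (see Inoue et al., 2017 for that): the coordinate-descent solver, the objective scaling (1/2n)*||y - b0 - Xb||^2 + lam*||b||_1, the candidate lambda grid, and all function names are assumptions, and a real analysis would normally rely on an established library.

```python
import random

def soft_threshold(value, lam):
    """Shrink value toward zero by lam (the lasso proximal step)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def fit_lasso(X, y, lam, n_iter=200, tol=1e-8):
    """Coordinate descent on centered data; returns (coefficients, intercept)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[row[j] - x_mean[j] for row in X] for j in range(p)]  # centered columns
    z = [sum(v * v for v in col) / n for col in cols]             # per-column scale
    resid = [yi - y_mean for yi in y]  # residual while all coefficients are zero
    coef = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant feature carries no information
            rho = sum(v * (r + v * coef[j]) for v, r in zip(cols[j], resid)) / n
            new = soft_threshold(rho, lam) / z[j]
            delta = new - coef[j]
            if delta != 0.0:
                resid = [r - v * delta for v, r in zip(cols[j], resid)]
                coef[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # converged
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_mean))
    return coef, intercept

def predict(X, coef, intercept):
    return [intercept + sum(c * v for c, v in zip(coef, row)) for row in X]

def cv_choose_lambda(X, y, lambdas, k=10, seed=0):
    """Return the lambda with the lowest mean squared error over k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    best_lam, best_mse = None, float("inf")
    for lam in lambdas:
        sq_err = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            coef, b0 = fit_lasso([X[i] for i in train], [y[i] for i in train], lam)
            preds = predict([X[i] for i in fold], coef, b0)
            sq_err += sum((y[i] - pr) ** 2 for i, pr in zip(fold, preds))
        mse = sq_err / len(idx)
        if mse < best_mse:
            best_lam, best_mse = lam, mse
    return best_lam
```

Each fold is held out exactly once, so the error for a given lambda is measured only on data the model did not see during fitting. The features driven to exactly zero at the chosen lambda are the ones the model discards.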
Selected features from lasso regression model
Greater enhancer activity: P300 binding, SP1/GABPA, FOSL2/JUND, ELF1 binding.
Reduced enhancer activity: ZBTB33, REST/RCOR1, & RXRA binding (known to recruit histone deacetylases); EZH2 (component of the PRC2 repressive complex, induces H3K27me3).
Cross-validated lasso model using LS-GKM+ChIP+CADD features
r^2 = 0.4
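As a rough illustration of how such a score might be computed, the sketch below takes r^2 to mean the squared Pearson correlation between observed and cross-validated predicted activity. That reading is an assumption; the slide does not define the metric, and the function name is hypothetical.

```python
def pearson_r2(observed, predicted):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return (cov * cov) / (var_o * var_p)
```

Under this definition, a value of 0.4 corresponds to a correlation of about 0.63 between predicted and measured activity.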
Conclusions
The optimized method successfully tests enhancer activity for ~9.5K enhancers in a reproducible manner.
Hypothesis confirmed: regions with a higher density of TF binding sites tend to exhibit stronger enhancing activity.
The analysis pulls out JunD/FOS (AP1 complex), HNF4A, and ELF5 as before, a portrait consistent with previous results.
There is evidence for epigenetic modifying complexes in regulating enhancer activity.
The model performs at an r^2 of 0.4; alternative models are being tried to improve performance.
Future plans: extend the method to K562 and further expand the number of enhancers tested beyond 9.5K.
Acknowledgements
Nadav Ahituv (University of California, San Francisco)
Jay Shendure (University of Washington)
Beth Martin (University of Washington): experimental guidance
Martin Kircher (Berlin Institute of Health): computational guidance
Ajuni Sohota (UCSF): experimental help
Fumitaka Inoue (UCSF): experimental help