Scaling Up and Modeling Chromosomally Integrated MPRAs
Vikram Agarwal (Shendure laboratory)
Ahituv-Shendure Functional Characterization Center
ENCODE 4
3/2/18
1
Overarching questions:
Can we improve MPRA to be more robust and test more sequences in parallel?
Can we develop models that better predict sequences that function as enhancers and their tissue-specific activity?
Can we better predict the effect of nucleotide changes on enhancer activity?
2
Massively parallel reporter assays (MPRAs) & STARR-seq
Inoue et al., 2017
Link designed regulatory sequence to reporter and integrate into genome using lentivirus
Measure expression of reporter using barcode relative to amount integrated into genome (mRNA:DNA ratio)
STARR-seq, Arnold et al., 2013
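The mRNA:DNA ratio described on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the talk: the function name `insert_activity`, the input dictionaries, the pseudocount, and the counts-per-million normalization are all hypothetical choices.

```python
import math
from collections import defaultdict

def insert_activity(rna_counts, dna_counts, barcode_to_insert, pseudocount=1.0):
    """Return log2(RNA/DNA) per insert, summing counts over its barcodes.

    All names and the normalization scheme are illustrative assumptions.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    rna_by_insert = defaultdict(float)
    dna_by_insert = defaultdict(float)
    for barcode, insert in barcode_to_insert.items():
        rna_by_insert[insert] += rna_counts.get(barcode, 0)
        dna_by_insert[insert] += dna_counts.get(barcode, 0)
    activity = {}
    for insert in dna_by_insert:
        if dna_by_insert[insert] == 0:
            continue  # insert never seen in DNA: ratio is undefined
        # Scale each library to counts per million before taking the ratio.
        rna_cpm = (rna_by_insert[insert] + pseudocount) / rna_total * 1e6
        dna_cpm = (dna_by_insert[insert] + pseudocount) / dna_total * 1e6
        activity[insert] = math.log2(rna_cpm / dna_cpm)
    return activity

# Toy barcode counts (made up for illustration).
rna = {"AAAC": 30, "AAAG": 10, "CCCT": 5}
dna = {"AAAC": 10, "AAAG": 10, "CCCT": 20}
links = {"AAAC": "enh1", "AAAG": "enh1", "CCCT": "enh2"}
activity = insert_activity(rna, dna, links)
```

Summing barcodes per insert before taking the ratio is one option; averaging per-barcode ratios is another, and the slides do not say which was used.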
3
Expanding the number of sequences tested via lentiMPRA
Test a larger repertoire of sequences for enhancer activity
Instead of designing the barcode together with the enhancer, add barcodes by PCR
This allows a larger, 200nt enhancer to be tested
Design ~19K sequences in total (9.5K x 2) on Agilent array, 200nt oligos, to evaluate sequences with putative enhancer activity, including positive and negative control sequences
4
Choosing candidate enhancers
Located all HepG2 DNase peaks in each cell line using UW ENCODE peak calls
Retained the subset that does not overlap promoters (not within ±1500nt centered at each gene’s TSS)
Counted the number of TF binding sites overlapping each DHS
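The filtering and counting steps above could look roughly like the sketch below. It assumes half-open (chrom, start, end) intervals and a brute-force scan; the function names and the toy coordinates are hypothetical. Whether a "TF binding site" means one ChIP-seq peak or one distinct factor is not stated on the slide, so the code simply counts overlapping intervals.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end

def select_distal_dhs(dhs_peaks, tss_positions, tf_sites, flank=1500):
    """Drop DHS peaks within +/- flank nt of any TSS, then count TF sites per peak.

    dhs_peaks and tf_sites hold (chrom, start, end) tuples; tss_positions holds
    (chrom, position) tuples. Brute force, for illustration only.
    """
    results = []
    for chrom, start, end in dhs_peaks:
        near_promoter = any(
            c == chrom and overlaps(start, end, pos - flank, pos + flank + 1)
            for c, pos in tss_positions
        )
        if near_promoter:
            continue
        n_tf = sum(
            1 for c, s, e in tf_sites
            if c == chrom and overlaps(start, end, s, e)
        )
        results.append(((chrom, start, end), n_tf))
    return results

# Toy coordinates (made up for illustration).
dhs = [("chr1", 1000, 1200), ("chr1", 10000, 10200), ("chr2", 500, 700)]
tss = [("chr1", 2000)]
tf = [("chr1", 10050, 10100), ("chr1", 10150, 10300), ("chr2", 900, 950)]
distal = select_distal_dhs(dhs, tss, tf)
```

For genome-scale inputs an interval index or a dedicated interval tool would replace the nested scans; the quadratic loop is only meant to make the logic explicit.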
5
Choosing candidate enhancers
Counted # of overlapping TF binding sites, as ascertained by ENCODE ChIP-seq data from HepG2 cells
Selected enhancers randomly across 5 bins of # of binding sites
[Figure: # of TF sites overlapping 200nt DNase peak, with bins labeled 1 to 5]
6
Choosing candidate enhancers
Counted # of overlapping TF binding sites, as ascertained by ENCODE ChIP-seq data from HepG2 cells
Selected enhancers randomly across 5 bins of # of binding sites
All regions synthesized twice on Agilent array
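Random selection across bins of binding-site counts can be sketched as below. The bin boundaries, the number drawn per bin, and the function name `sample_across_bins` are placeholders chosen for illustration; the actual bins used in the talk are not given in the text.

```python
import random

def sample_across_bins(peak_to_n_sites, bin_edges, per_bin, seed=0):
    """Randomly pick up to per_bin peaks from each bin of TF-site counts.

    bin_edges lists the inclusive lower bound of each bin in ascending order.
    The values used below are placeholders, not the bins from the talk.
    """
    rng = random.Random(seed)
    bins = [[] for _ in bin_edges]
    for peak, n_sites in sorted(peak_to_n_sites.items()):
        # Assign the peak to the last bin whose lower bound it reaches.
        idx = max((i for i, lo in enumerate(bin_edges) if n_sites >= lo), default=None)
        if idx is not None:
            bins[idx].append(peak)
    return [rng.sample(members, min(per_bin, len(members))) for members in bins]

# Toy input: 200 peaks with made-up site counts between 0 and 39.
counts = {f"peak{i}": i % 40 for i in range(200)}
edges = [0, 5, 10, 20, 30]
chosen = sample_across_bins(counts, edges, per_bin=3)
```

Sampling the same number of peaks from every bin spreads the library across the whole range of binding-site counts instead of letting the most common bin dominate.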
7
Oligo design & library prep for lenti-MPRA
[Figure: workflow. Designed Agilent oligos (5’ tag sequence, 15 nt; enhancer sequence, 200nt; 3’ tag sequence, 15 nt) -> 1st round PCR -> 2nd round PCR -> clone into lentiviral plasmid -> amplify & sequence to link barcode to enhancer (MiSeq). Labeled elements: minimal promoter, restriction sites, barcode (15nt), LTR, ARE, GFP, WPRE]
8
Sequence RNA & DNA after transfection (in triplicate experiments)
Viral integration into human genome
RNA: sequence barcodes (NextSeq)
DNA: sequence barcodes (NextSeq)
[Figure: integrated reporter construct with LTR, ARE, GFP, WPRE]
9
Replicates show general reproducibility
10
Relationship between # TF binding sites and RNA/DNA ratio
[Figure, two panels. Panel 1 axis: # of TF binding sites (binned in equally-sized bins according to Agilent design). Panel 2 axis: # of TF binding sites (binned according to # of overlapping TF sites)]
11
Differential motif analysis (which motifs are enriched among the top 1000 most active enhancers?)
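One common way to ask whether a motif is enriched among the top-ranked enhancers is a one-sided hypergeometric test. The slide does not say which test was used, so the sketch below is only an assumed stand-in, and the function name and example numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_total, n_with_motif, n_top, n_top_with_motif):
    """P(X >= n_top_with_motif) when n_top sequences are drawn without
    replacement from n_total, of which n_with_motif carry the motif."""
    denom = comb(n_total, n_top)
    upper = min(n_top, n_with_motif)
    tail = sum(
        comb(n_with_motif, k) * comb(n_total - n_with_motif, n_top - k)
        for k in range(n_top_with_motif, upper + 1)
    )
    return tail / denom

# Toy numbers: 20 sequences, 5 carry the motif, 3 of the top 5 carry it.
p_value = hypergeom_enrichment_p(20, 5, 5, 3)
```

In practice the test would be repeated per motif, with the top 1000 enhancers as the drawn set and all tested enhancers as the background, followed by a multiple-testing correction.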
12
Lasso regression model to explain enhancer activity
Features considered in the model:
LS-GKM: builds a gapped k-mer SVM model for each transcription factor, trained on ChIP-seq data
ENCODE: ChromHMM, DHS sites, Segway call sets
Roadmap Epigenomics: RNA-seq, methylation, CAGE, histone marks
CADD: GC, CpG content, conservation, motif overlap counts, CADD score
10-fold cross-validation to determine the optimal lambda value
For full technical details, see the methods of Inoue et al., 2017
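Choosing lambda by 10-fold cross-validation can be illustrated with a small, dependency-free lasso. This is a sketch under assumptions, not the model from the talk: it uses plain coordinate descent with no intercept, picks the lambda with the lowest held-out error, and runs on synthetic data. The function names and the lambda grid are hypothetical.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1, no intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlate feature j with the residual that excludes its own fit.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w

def cv_mse(X, y, lam, k=10, seed=0):
    """Mean squared held-out error of the lasso over k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for fold in (idx[f::k] for f in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        w = fit_lasso([X[i] for i in train], [y[i] for i in train], lam)
        for i in fold:
            pred = sum(wj * xj for wj, xj in zip(w, X[i]))
            total += (y[i] - pred) ** 2
    return total / len(X)

# Synthetic data: only the first of five features carries signal.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in X]
lambdas = [1.0, 0.3, 0.1, 0.03, 0.01]  # hypothetical grid
best_lam = min(lambdas, key=lambda lam: cv_mse(X, y, lam))
weights = fit_lasso(X, y, best_lam)
```

Features whose weights are driven exactly to zero drop out of the model, which is what makes the lasso useful for picking a short list of informative features from a large annotation set.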
13
Selected features from lasso regression model
Reduced enhancer activity: ZBTB33, REST/RCOR1, & RXRA binding (known to recruit histone deacetylases); EZH2 (component of PRC2 repressive complex, induces H3K27me3)
Greater enhancer activity: P300 binding, SP1/GABPA, FOSL2/JUND, ELF1 binding
14
Cross-validated lasso model using LS-GKM+ChIP+CADD features
r² = 0.4
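An r² between observed and predicted activity is often reported as the squared Pearson correlation. The slide does not state which definition was used (squared correlation or coefficient of determination), so the version below is an assumption, with a hypothetical function name and toy values.

```python
def pearson_r_squared(observed, predicted):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

# Toy values (made up): held-out observations and model predictions.
r2 = pearson_r_squared([1, 2, 3, 4], [1, 1, 2, 2])
```

For a cross-validated model, the predictions would be the held-out predictions pooled across folds, so that no sequence is scored by a model that saw it during training.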
15
Conclusions
Optimized method successfully tests enhancer activity for ~9.5K enhancers in a reproducible manner
Confirmed the hypothesis that regions with a higher density of TF binding sites tend to exhibit stronger enhancer activity
Pulling out JunD/FOS (AP1 complex), HNF4A, and ELF5 as before, a portrait consistent with previous results
Evidence for a role of epigenetic modifying complexes in regulating enhancer activity
Model performs at r² of 0.4; trying alternative models to improve performance
Future plans: extend method to K562 and further expand the number of enhancers tested beyond 9.5K
16
Acknowledgements
Martin Kircher, Berlin Institute of Health: computational guidance
Beth Martin, University of Washington: experimental guidance
Nadav Ahituv, University of California, San Francisco
Fumitaka Inoue, UCSF: experimental help
Jay Shendure, University of Washington
Ajuni Sohota, UCSF: experimental help
Reproducibility among positive and negative controls (combining barcodes
18
Barcodes supporting each insert
[Figure, two panels:]
Insert linked to barcode with at least 3 reads and >90% of the barcode mapping to the same insert
Insert linked to barcode with at least 3 reads and >50% of the barcode mapping to the same insert
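The two association thresholds above can be sketched as a filter over (barcode, insert) read pairs. This is an assumed reading of the slide: "at least 3 reads" is taken to mean reads supporting the chosen barcode-insert link, and the function names and toy reads are hypothetical.

```python
from collections import Counter, defaultdict

def assign_barcodes(read_pairs, min_reads=3, min_fraction=0.9):
    """Map each barcode to one insert when the link has at least min_reads
    reads and more than min_fraction of the barcode's reads support it."""
    per_barcode = defaultdict(Counter)
    for barcode, insert in read_pairs:
        per_barcode[barcode][insert] += 1
    assigned = {}
    for barcode, inserts in per_barcode.items():
        total = sum(inserts.values())
        top_insert, top_count = inserts.most_common(1)[0]
        if top_count >= min_reads and top_count / total > min_fraction:
            assigned[barcode] = top_insert
    return assigned

def barcodes_per_insert(assigned):
    """Count how many accepted barcodes support each insert."""
    return Counter(assigned.values())

# Toy association reads (made up for illustration).
reads = (
    [("BC1", "enhA")] * 10
    + [("BC2", "enhA")] * 2
    + [("BC3", "enhB")] * 3
    + [("BC3", "enhA")] * 2
    + [("BC4", "enhB")] * 4
)
strict = assign_barcodes(reads, min_fraction=0.9)
lenient = assign_barcodes(reads, min_fraction=0.5)
support = barcodes_per_insert(lenient)
```

Lowering the fraction threshold from 0.9 to 0.5 keeps barcodes with more ambiguous assignments, trading specificity for more barcodes per insert.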
19
Presentation Transcript


  1. Scaling Up and Modeling Chromosomally Integrated MPRAs. Vikram Agarwal (Shendure laboratory), Ahituv-Shendure Functional Characterization Center, ENCODE 4, 3/2/18

  2. Overarching questions: Can we improve MPRA to be more robust and test more sequences in parallel? Can we develop models that better predict sequences that function as enhancers and their tissue-specific activity? Can we better predict the effect of nucleotide changes on enhancer activity?

  3. Massively parallel reporter assays (MPRAs) & STARR-seq. Link designed regulatory sequence to reporter and integrate into genome using lentivirus. Measure expression of reporter using barcode relative to amount integrated into genome (mRNA:DNA ratio). (Inoue et al., 2017; STARR-seq, Arnold et al., 2013)

  4. Expanding the number of sequences tested via lentiMPRA. Test a larger repertoire of sequences for enhancer activity. Instead of designing the barcode together with the enhancer, add barcodes by PCR. This allows a larger, 200nt enhancer to be tested. Design ~19K sequences in total (9.5K x 2) on Agilent array, 200nt oligos, to evaluate sequences with putative enhancer activity, including positive and negative control sequences.

  5. Choosing candidate enhancers. Located all HepG2 DNase peaks in each cell line using UW ENCODE peak calls. Retained the subset that does not overlap promoters (not within ±1500nt centered at each gene’s TSS). Counted the number of TF binding sites overlapping each DHS.

  6. Choosing candidate enhancers. Counted # of overlapping TF binding sites, as ascertained by ENCODE ChIP-seq data from HepG2 cells. Selected enhancers randomly across 5 bins of # of binding sites. [Figure: # of TF sites overlapping 200nt DNase peak, with bins labeled 1 to 5]

  7. Choosing candidate enhancers. Counted # of overlapping TF binding sites, as ascertained by ENCODE ChIP-seq data from HepG2 cells. Selected enhancers randomly across 5 bins of # of binding sites. All regions synthesized twice on Agilent array.
     HepG2 DNase sites not overlapping promoters: 66,017
     Selected enhancers with range of TF binding sites: 9,172
     Positive and negative HepG2 controls (from Inoue et al., Genome Research 2017): 100
     Synthetic positive and negative HepG2 controls (from Smith et al., Nature Genetics 2013): 100
     Total probes designed, (9,172 + 100 + 100) x 2: 18,744

  8. Oligo design & library prep for lenti-MPRA. [Figure: workflow. Designed Agilent oligos (5’ tag sequence, 15 nt; enhancer sequence, 200nt; 3’ tag sequence, 15 nt) -> 1st round PCR -> 2nd round PCR -> clone into lentiviral plasmid -> amplify & sequence to link barcode to enhancer (MiSeq). Labeled elements: minimal promoter, restriction sites, barcode (15nt), LTR, ARE, GFP, WPRE]

  9. Sequence RNA & DNA after transfection (in triplicate experiments). Viral integration into human genome. RNA: sequence barcodes (NextSeq). DNA: sequence barcodes (NextSeq). [Figure: integrated reporter construct with LTR, ARE, GFP, WPRE and polyadenylated (AAAA) transcripts]

  10. Replicates show general reproducibility

  11. Relationship between # TF binding sites and RNA/DNA ratio. [Figure, two panels. Panel axes: # of TF binding sites (binned according to # of overlapping TF sites); # of TF binding sites (binned in equally-sized bins according to Agilent design)]

  12. Differential motif analysis (which motifs are enriched among the top 1000 most active enhancers?)

  13. Lasso regression model to explain enhancer activity. Features considered in the model: LS-GKM (builds a gapped k-mer SVM model for each transcription factor, trained on ChIP-seq data); ENCODE (ChromHMM, DHS sites, Segway call sets); Roadmap Epigenomics (RNA-seq, methylation, CAGE, histone marks); CADD (GC, CpG content, conservation, motif overlap counts, CADD score). 10-fold cross-validation to determine the optimal lambda value. For full technical details, see the methods of Inoue et al., 2017.

  14. Selected features from lasso regression model. Greater enhancer activity: P300 binding, SP1/GABPA, FOSL2/JUND, ELF1 binding. Reduced enhancer activity: ZBTB33, REST/RCOR1, & RXRA binding (known to recruit histone deacetylases); EZH2 (component of PRC2 repressive complex, induces H3K27me3).

  15. Cross-validated lasso model using LS-GKM+ChIP+CADD features: r² = 0.4

  16. Conclusions. Optimized method successfully tests enhancer activity for ~9.5K enhancers in a reproducible manner. Confirmed the hypothesis that regions with a higher density of TF binding sites tend to exhibit stronger enhancer activity. Pulling out JunD/FOS (AP1 complex), HNF4A, and ELF5 as before, a portrait consistent with previous results. Evidence for a role of epigenetic modifying complexes in regulating enhancer activity. Model performs at r² of 0.4; trying alternative models to improve performance. Future plans: extend method to K562 and further expand the number of enhancers tested beyond 9.5K.

  17. Acknowledgements. Martin Kircher (Berlin Institute of Health): computational guidance. Beth Martin (University of Washington): experimental guidance. Nadav Ahituv (University of California, San Francisco). Fumitaka Inoue (UCSF): experimental help. Jay Shendure (University of Washington). Ajuni Sohota (UCSF): experimental help.
