Insights into Gene Regulation through Genomic Data Analysis

 
Understanding Gene Regulation Through Integrated Analysis of Genomic Data
 
 
Guo-Cheng Yuan
Department of Biostatistics and Computational Biology
Dana-Farber Cancer Institute
Harvard School of Public Health
 
Faculty Workshop, July 23rd, 2014
 
Biology used to be about memorizing terms and facts
 
Genome sequencing has digitized biology
 
aggcctttgttgttggcagattgctagggtctgaatgtttatgcccctg
tgaaatttctttgttgaaatcttcacccctaaggtaatgctattagaagg
tgggaaccttagaataattaggtgatggggacagagccctcatgaagggg
atcagtgcccttataaaagaaatctgagagagaccctttgccacttctgc
catgtgggttagagtgagaagaaggttatttacgagaaagtagcccttac
tagacgctgaatcttctggtgccttgatcttagactcaccagctttcaga
actgtaagaaataaatttctagtgtttacaagccacccagcctatggtat
tttgttatagcatctggaatggactaagacacagaacaagataatgggtg
gatatgctaaactttgtatatacacatgtccatttatatttccatatgtc
tccatctgttatctatatcaagctaaacatgagttcatattgatgtttcc
aattccaattgttacaaaatggatcatcaccttgtttttctgtaatcctc
tattcagtgaaaaaccttgctcccatactatgacatccatttatttaatt
gttcaatttcattatatatgtacagcaatatccaaattaataacatgtac
ccctgtggacatgattatgtgaactagagtatagggcttatAAATTAAAA
AAATTTAtttttattttggaaaatgcatataacaaaatgtggcattttaa
tgatttttaagggtaaaatttagtgacattaattatattactaacgttgt
acagctatcattactatctactttgaaaatacttttaagaacccaaacag
aaaatccatacccactaagcaataaccctattgccccctcctttcagccc
ttggcaatgaccattgtacttttagtctgtatgagtttgccttttctgga
tatttcattttagtgaaatcatagaatatttgctcttttgtgtgtggatt
atttcacttatttttaaagtttattcatttgtaacatgtattaaaacttt
attcctttttttggttgaataatattctattatgtgtatataacacattt
tgtttattcattcatttgttggtgaatacttgggttatttccaccttcta
gaaattgtgagtcatgctgcagtggacataggcatacaattatctgagtt
tctactttctattgttttggatatataatcagaattttaattgctggtgc
atatggtaattttatgtatactaatttgaggagaatccatactgtttttc
tcaatggctacaccattttacattcccaccagcaatgcattatggggcaa
tttatccacaccaacagcaacacttattattttctaggtttttttatctt
tttattttattaatgtttatcctaacagatatgaaataatatttcattgt
gattttgatttacatgctaatgattagtgatgttgaacagtatttcatgt
gcttatgggctatcttgtatcttttttagataaatgtctatttaaatcct
ttgtttatttttgagctgaaatgtttagtttttgtggagttgtgggaatt
Variation of genetic information may predict disease risk
Wikipedia
 
Most DNA is not transcribed
Most transcripts are noncoding
Most proteins have unknown functions
Courtesy of National Health Museum
2007
 
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has … These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes … The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation…
 
Courtesy of Broad Institute

Quantifying cross cell-type plasticity
H3K27me3
 
Highly Plastic Regions (HPR): the top 1% with the highest plasticity score.
Lowly Plastic Regions (LPR): the bottom 1% with the lowest plasticity score.
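
The slide defines the plasticity score as the variance of the H3K27me3 signal divided by its mean, and HPRs and LPRs as the top and bottom 1% of regions by that score. A minimal sketch of that ranking follows, under several assumptions not stated on the slide: the signal is already normalized, one value per cell type is available for each region, population variance is used, and regions with zero mean are skipped. The function names and the toy data are hypothetical.

```python
from statistics import mean, pvariance


def plasticity_scores(signal_by_region):
    """Variance-to-mean ratio of the signal across cell types, per region.

    signal_by_region maps a region name to a list of (already normalized)
    signal values, one per cell type. Regions whose mean signal is zero
    are skipped to avoid division by zero (an assumption of this sketch).
    """
    scores = {}
    for region, values in signal_by_region.items():
        m = mean(values)
        if m > 0:
            scores[region] = pvariance(values) / m
    return scores


def top_and_bottom(scores, fraction=0.01):
    """Return (HPRs, LPRs): the top and bottom `fraction` of regions by score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


# Toy data: 4 regions x 3 cell types (made-up numbers, not real signal).
toy = {
    "region_a": [1.0, 1.0, 1.0],  # constant across cell types
    "region_b": [0.0, 0.0, 9.0],  # marked in one cell type only
    "region_c": [2.0, 3.0, 4.0],
    "region_d": [5.0, 5.0, 6.0],
}
scores = plasticity_scores(toy)
# With only 4 regions, use 25% instead of 1% so each set has one member.
hprs, lprs = top_and_bottom(scores, fraction=0.25)
print(hprs, lprs)
```

In this toy example the region marked in a single cell type ranks as the most plastic and the constant region as the least, which is the behavior a variance-to-mean ratio is meant to capture.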
HPRs are associated with regulatory regions

Chromatin plasticity is related to DNA sequence
 
A pipeline to identify regulatory TFs
 
Pinello, PNAS. 2014 Jan 21;111(3):E344-53
Example: PAX5 in GM12878
 
ChIP-seq confirms colocalization between PAX5 and H3K27me3 in GM12878
 
Haystack is (almost) available!
 
INPUT: Aligned reads from ChIP-seq (.bam files)
 
ONE COMMAND ONLY:
haystack_pipeline my_bam_folder hg19
 
OUTPUTS:
Highly plastic regions
Tracks normalized for IGV or Genome Browser
List of candidate regulatory TFs.
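
For scripted use, the one-line command above could be wrapped as sketched below. This is an illustration only: it assumes `haystack_pipeline` is installed and on the PATH, it passes exactly the two arguments shown on the slide (a folder of .bam files and a genome name), and the helper names `build_haystack_command` and `run_haystack` are hypothetical, not part of Haystack.

```python
import subprocess
import tempfile
from pathlib import Path


def build_haystack_command(bam_folder, genome="hg19"):
    """Return the argument list for the command shown on the slide.

    Raises ValueError if the folder holds no .bam files, since the slide
    lists aligned reads in .bam format as the required input.
    """
    folder = Path(bam_folder)
    if not sorted(folder.glob("*.bam")):
        raise ValueError(f"no .bam files found in {folder}")
    return ["haystack_pipeline", str(folder), genome]


def run_haystack(bam_folder, genome="hg19"):
    """Run the pipeline; assumes haystack_pipeline is installed and on PATH."""
    subprocess.run(build_haystack_command(bam_folder, genome), check=True)


# Demo: build (but do not run) the command for a throwaway folder
# that holds one empty placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sample.bam").touch()
    cmd = build_haystack_command(tmp)
    print(" ".join(cmd))
```

Building the argument list separately from running it lets the input check be exercised without the tool installed.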
 
 
 
Take home message
 
We shouldn't just focus on a snapshot of the histone patterns and try to interpret what they all mean. Dynamic change is the key to understanding biological function.

Conclusions
 
Biology has entered a data-rich era.
“All models are wrong; but some are useful.”
---- George E. P. Box
 
Acknowledgement
 
Our group
 
Luca Pinello
Kimberly Glass
Eugenio Marco
Jialiang Huang
 
 
NIH, Barr Award, Milton Foundation, HSPH CIF
 
Stuart Orkin
 
Jian Xu
Zhen Shao
Dan Bauer

Explore the intricate world of gene regulation and genomic data analysis through a comprehensive study presented at a faculty workshop in 2014. Delve into the evolution and classification of biology, the impact of genome sequencing on digitizing biology, and the potential of genetic information variations in predicting disease risks. Gain knowledge about the uncharted territories of the human genome and the significant findings by the ENCODE project in unraveling the mysteries of DNA elements and regulatory elements.

  • Gene regulation
  • Genomic data analysis
  • Biology evolution
  • Genome sequencing
  • Disease risk prediction

Uploaded on Sep 21, 2024



Presentation Transcript


  1. Understanding Gene Regulation Through Integrated Analysis of Genomic Data. Guo-Cheng Yuan, Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health. Faculty Workshop, July 23rd, 2014

  2. Biology used to be about memorizing terms and facts

     Category  Human       Zebrafish
     Domain    Eukarya     Eukarya
     Kingdom   Animalia    Animalia
     Phylum    Chordata    Chordata
     Class     Mammalia    Actinopterygii
     Order     Primates    Cypriniformes
     Family    Hominidae   Cyprinidae
     Genus     Homo        Danio
     Species   H. sapiens  D. rerio

  3. Genome sequencing has digitized biology aggcctttgttgttggcagattgctagggtctgaatgtttatgcccctg tgaaatttctttgttgaaatcttcacccctaaggtaatgctattagaagg tgggaaccttagaataattaggtgatggggacagagccctcatgaagggg atcagtgcccttataaaagaaatctgagagagaccctttgccacttctgc catgtgggttagagtgagaagaaggttatttacgagaaagtagcccttac tagacgctgaatcttctggtgccttgatcttagactcaccagctttcaga actgtaagaaataaatttctagtgtttacaagccacccagcctatggtat tttgttatagcatctggaatggactaagacacagaacaagataatgggtg gatatgctaaactttgtatatacacatgtccatttatatttccatatgtc tccatctgttatctatatcaagctaaacatgagttcatattgatgtttcc aattccaattgttacaaaatggatcatcaccttgtttttctgtaatcctc tattcagtgaaaaaccttgctcccatactatgacatccatttatttaatt gttcaatttcattatatatgtacagcaatatccaaattaataacatgtac ccctgtggacatgattatgtgaactagagtatagggcttatAAATTAAAA AAATTTAtttttattttggaaaatgcatataacaaaatgtggcattttaa

  4. Variation of genetic information may predict disease risk wikipedia

  5. Most DNA is not transcribed. Most transcripts are noncoding. Most proteins have unknown functions. Courtesy of National Health Museum

  6. The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has … These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes … The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation… [Figure labels: 2007, 2012, 2012, 2012]

  7. Courtesy of Broad Institute

  8. Quantifying cross cell-type plasticity (H3K27me3). Plasticity Score = Variance / Mean. Highly Plastic Regions (HPR): the top 1% with the highest plasticity score. Lowly Plastic Regions (LPR): the bottom 1% with the lowest plasticity score.

  9. HPRs are associated with regulatory regions

  10. Chromatin plasticity is related to DNA sequence

  11. A pipeline to identify regulatory TFs Pinello, PNAS. 2014 Jan 21;111(3):E344-53

  12. Example: PAX5 in GM12878. PAX5 is one of the most enriched motifs in GM12878-specific MPRs. Figure panels: (1) Motif Enrichment (enrichment score); (2) Coordinated Expression (z-score; labels: PAX5, Targeted HPR Genes); (3) Centralization (-2KB to 2KB around MPR_Center; labels: GM12878, PAX5).

  13. ChIP-seq confirms colocalization between PAX5 and H3K27me3 in GM12878. [Figure: signal profiles around the HPR center; axis ticks at -2KB]

  14. Haystack is (almost) available! INPUT: Aligned reads from ChIP-seq (.bam files). ONE COMMAND ONLY: haystack_pipeline my_bam_folder hg19. OUTPUTS: highly plastic regions; tracks normalized for IGV or Genome Browser; list of candidate regulatory TFs.

  15. Take home message: We shouldn't just focus on a snapshot of the histone patterns and try to interpret what they all mean. Dynamic change is the key to understanding biological function.

  16. Conclusions: Biology has entered a data-rich era. “All models are wrong; but some are useful.” ---- George E. P. Box

  17. Acknowledgement. Our group: Luca Pinello, Kimberly Glass, Eugenio Marco, Jialiang Huang. Stuart Orkin; Jian Xu, Zhen Shao, Dan Bauer. NIH, Barr Award, Milton Foundation, HSPH CIF
