Bioinformatics for Genomics Lecture Series 2022 Overview

 
Genetics and Genome Evolution (GGE)
Bioinformatics for Genomics
Lecture series 2022
 
Sven Bergmann
Department of Computational Biology
(Sven.Bergmann@unil.ch)
 
Bioinformatics for Genomics
Overview / Schedule
 
28 March 2022, 08:00-9:15: Lecture 1: "RNA-seq & DE" (Sven)
28 March 2022, 09:30-10:30: Tutorial 1: "RNA-seq & DE" (Anneke)
28 March 2022, 10:45-12:00: Lecture 2: "Clustering" (Sven)
29 March 2022, 08:00-9:00: Tutorial 2: "Clustering" (Alex)
29 March 2022, 09:15-10:45: Lecture 3: "More seq-data" (Sven)
29 March 2022, 11:00-12:00: Tutorial 3: "More seq-data" (Daniel)
4 April 2022, 14:00-18:00: "Flipped classroom discussing exercises"
 
5 April 2022, 8:00-9:30: Lecture 4: "Biological Networks" (Sven)
5 April 2022, 10:00-11:30: Tutorial 4: "Biological Networks" (Daniel)
6 April 2022, 8:00-9:30: Lecture 5: "Advanced clustering" (Sven)
6 April 2022, 10:00-11:30: Final Session: "Wrap-up & feedback" (all)
 
What will you learn?
 
Analysis of gene expression data
What information do you get from RNA-seq?
How to do a simple differential expression analysis?
How to correct for multiple hypothesis testing?
How to use some standard tools for large-scale data analysis (PCA, SVD, clustering)?
How to perform functional enrichment analysis (given known gene sets)?

Analysis of epigenetic data and integrative analysis
What information do you get from ChIP-seq (and similar techniques)?
What do you learn from Hi-C data on chromatin structure?
How to perform integrative analysis with gene expression data?
 
Module on GWAS

[Diagram labels: GWAS, Genetic Risk Score, PCA, Network]
 
Genetics and Genome Evolution (GGE)
Bioinformatics for Genomics
Lecture 1: RNA-seq & DE
 
Sven Bergmann
Department of Computational Biology
(Sven.Bergmann@unil.ch)
What is gene expression?
 
Where is gene expression?
 
How to measure gene expression?

Northern Blot: Single genes
RT-PCR: Multiple genes
Microarrays: Whole genomes
RNA-seq: Whole genomes+
 
Microarrays
 
Microarrays
 
RNA-seq

RNA quantification using next-generation sequencing (NGS)
 
NGS platforms
 
Oxford Nanopore: direct, electronic analysis of single molecules. The future?

Illumina: sequencing by synthesis; market leader
 
Illumina products:

Illumina procedure:

Three basic steps:
1. Amplify
2. Sequence
3. Analyze
 
Illumina amplification:

1. The process begins with purified DNA (cDNA when analyzing RNA).
2. The DNA gets chopped up into smaller pieces and given adapters, indices, and other kinds of molecular modifications that act as reference points during amplification, sequencing, and analysis.
3. The modified DNA is loaded onto a specialized chip (“flow cell”) where amplification and sequencing will take place.
4. Along the bottom of the chip are hundreds of thousands of oligonucleotides (short, synthetic pieces of DNA).
5. They are anchored to the chip and able to grab DNA fragments that have complementary sequences. Once the fragments have attached, a phase called cluster generation begins. This step makes about a thousand copies of each fragment of DNA.
 
Cluster generation
 
Cluster generation
 
Illumina sequencing:

1. Primers and modified nucleotides enter the chip. These nucleotides carry fluorescent tags and reversible 3' blockers that force the polymerase to add only one nucleotide at a time.
2. After each round of synthesis, a camera takes a picture of the chip. A computer determines which base was added from the wavelength of the fluorescent tag and records it for every spot on the chip.
3. After each round, non-incorporated molecules are washed away. A chemical deblocking step then removes the 3' terminal blocking group and the dye in a single step.
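
The base-calling idea in step 2 can be illustrated with a toy sketch. It assumes one fluorescence channel per base (A, C, G, T) and simply picks the brightest channel for every spot in every cycle. The function name, the data layout, and the intensity values are made up for illustration; real base-calling software also corrects for effects such as signal cross-talk and assigns quality scores.

```python
BASES = "ACGT"  # assumed channel order, for illustration only

def call_bases(intensities):
    """Call one base per cycle for every cluster (spot) on the chip.

    intensities[cycle][cluster] is a list of four signal values,
    one per channel in the order given by BASES.
    Returns one read (string) per cluster.
    """
    n_clusters = len(intensities[0])
    reads = [""] * n_clusters
    for cycle in intensities:
        for cluster, channels in enumerate(cycle):
            # The brightest channel tells us which base was incorporated.
            brightest = max(range(4), key=lambda i: channels[i])
            reads[cluster] += BASES[brightest]
    return reads

# Toy data: 3 sequencing cycles, 2 clusters
signal = [
    [[0.9, 0.1, 0.0, 0.0], [0.0, 0.1, 0.1, 0.8]],  # cycle 1
    [[0.1, 0.7, 0.1, 0.1], [0.0, 0.0, 0.9, 0.1]],  # cycle 2
    [[0.0, 0.0, 0.2, 0.8], [0.8, 0.1, 0.1, 0.0]],  # cycle 3
]
reads = call_bases(signal)
print(reads)  # ['ACT', 'TGA']
```

Each cycle adds exactly one letter to every read, which mirrors the one-nucleotide-per-round chemistry described above.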
 
Illumina sequencing:

Massive image processing to generate the sequence

Illumina short reads analysis:
 
Detecting splice variants:
 
RNA-seq vs microarrays
 
What is RNA-seq good for?
 
RNA-seq uses NGS to reveal the presence and quantity of RNA in a biological sample at a given moment:
quantifies mRNA as well as long non-coding RNA
can quantify de novo transcripts
makes it possible to look at alternatively spliced transcripts
can quantify small RNA, such as miRNA and tRNA
 
Applications
 
Differential gene expression analysis

What is the genome-wide response of the transcriptome when challenged with some “test” as compared to a “control”?

When using microarrays this could be done in a single experiment:
[Figure labels: test, control]
 
Better experimental design: Replicates for both test (T) and control (C) group

[Figure labels: Test, Control]

Which genes are expressed differently in the two groups?
 
 
Simplest approach: t-test

t-statistic: difference between the group means in units of the standard error of that difference
Significance can be translated into a p-value (probability) assuming normal distributions

http://www.physics.csbsju.edu/stats/t-test.html
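
As a minimal sketch of the t-statistic on this slide, the code below divides the difference in means by its standard error. It uses the unequal-variance (Welch) form, which is an assumption since the slides do not say which variant is meant, and the expression values are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(test, control):
    """Difference of group means divided by the standard error of that difference.

    Uses the unequal-variance (Welch) form of the standard error.
    """
    se = sqrt(variance(test) / len(test) + variance(control) / len(control))
    return (mean(test) - mean(control)) / se

# Invented expression values for one gene, four replicates per group
control = [10.0, 11.0, 9.0, 10.0]
test_tight = [12.0, 13.0, 11.0, 12.0]  # mean 12, small spread
test_noisy = [8.0, 16.0, 7.0, 17.0]    # mean 12, large spread

t_tight = t_statistic(test_tight, control)
t_noisy = t_statistic(test_noisy, control)
print(round(t_tight, 3), round(t_noisy, 3))  # 3.464 0.756
```

Both test groups differ from the control by the same amount in the mean, but the noisier group gives a much smaller t and would therefore be less significant. Turning t into a p-value requires the t distribution; if SciPy is available, `scipy.stats.ttest_ind(test, control, equal_var=False)` returns both the statistic and the p-value.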
 
 
Same difference in mean, but different variance

Quantifying Significance
 
T-test limitations
 
1. Assumption of normality is not fulfilled for small sets of tests and controls. (One cannot estimate any distribution well based on a small sample size.)
2. Assumption of normality is usually not fulfilled for lowly expressed genes. (Counts are discrete and follow a Poisson or negative binomial distribution.)

Possible workaround: Estimate p-values using permutation analysis.
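
A minimal sketch of such a permutation analysis is shown below. It assumes the difference in group means as the test statistic and random relabelling of samples. The function name, the counts, and the number of permutations are illustrative choices, not taken from the lecture.

```python
import random
from statistics import mean

def permutation_p_value(test, control, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Group labels are shuffled repeatedly; the p-value is the fraction of
    shuffles whose absolute mean difference is at least the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(test) - mean(control))
    pooled = list(test) + list(control)
    n_test = len(test)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_test]) - mean(pooled[n_test:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Invented read counts for one gene, four replicates per group
test_counts = [25, 30, 28, 35]
control_counts = [10, 12, 9, 11]
p = permutation_p_value(test_counts, control_counts)
print(p)
```

No normality assumption is needed, but the resolution is limited by the number of samples: with four replicates per group there are only 70 distinct ways to split the eight values, so even perfectly separated groups cannot reach a two-sided p-value below 2/70 ≈ 0.029.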
 
Tool of choice for RNA-seq differential expression analysis: edgeR

Let’s try it out!
https://bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/Day3/rnaSeq_DE.pdf




