Bioinformatics for Genomics Lecture Series 2022 Overview

Genetics and Genome Evolution (GGE)

Bioinformatics for Genomics

Lecture series 2022

Sven Bergmann

Department of Computational Biology

(Sven.Bergmann@unil.ch)

Overview / Schedule

• 28 March 2022, 08:00-09:15: Lecture 1: "RNA-seq & DE" (Sven)
• 28 March 2022, 09:30-10:30: Tutorial 1: "RNA-seq & DE" (Anneke)
• 28 March 2022, 10:45-12:00: Lecture 2: "Clustering" (Sven)
• 29 March 2022, 08:00-09:00: Tutorial 2: "Clustering" (Alex)
• 29 March 2022, 09:15-10:45: Lecture 3: "More seq-data" (Sven)
• 29 March 2022, 11:00-12:00: Tutorial 3: "More seq-data" (Daniel)
• 4 April 2022, 14:00-18:00: "Flipped classroom discussing exercises"
• 5 April 2022, 08:00-09:30: Lecture 4: "Biological Networks" (Sven)
• 5 April 2022, 10:00-11:30: Tutorial 4: "Biological Networks" (Daniel)
• 6 April 2022, 08:00-09:30: Lecture 5: "Advanced clustering" (Sven)
• 6 April 2022, 10:00-11:30: Final Session: "Wrap-up & feedback" (all)

What will you learn?

Analysis of gene expression data

• What information do you get from RNA-seq?
• How to do a simple differential expression analysis?
• How to correct for multiple hypothesis testing?
• How to use some standard tools for large-scale data analysis (PCA, SVD, clustering)?
• How to perform functional enrichment analysis (given known gene sets)?

Analysis of epigenetic data and integrative analysis

• What information do you get from ChIP-seq (and similar techniques)?
• What do you learn from Hi-C data on chromatin structure?
• How to perform integrative analysis with gene expression data?

Module on GWAS (figure: GWAS, Genetic Risk Score, PCA, Network)

Lecture 1: RNA-seq & DE

What is gene expression?

Where is gene expression?

How to measure gene expression?

• Northern blot: single genes
• RT-PCR: multiple genes
• Microarrays: whole genomes
• RNA-seq: whole genomes+

Microarrays

RNA-seq: RNA quantification using next-generation sequencing (NGS)

NGS platforms

Oxford Nanopore
• direct, electronic analysis of single molecules
• The future?

Illumina
• sequence by synthesis
• market leader

Illumina products:

Illumina procedure: three basic steps

1. Amplify
2. Sequence
3. Analyze

Illumina amplification:

1. The process begins with purified DNA (cDNA when analyzing RNA).
2. The DNA gets chopped up into smaller pieces and given adapters, indices, and other kinds of molecular modifications that act as reference points during amplification, sequencing, and analysis.
3. The modified DNA is loaded onto a specialized chip (“flow cell”) where amplification and sequencing will take place.
4. Along the bottom of the chip are hundreds of thousands of oligonucleotides (short, synthetic pieces of DNA).
5. They are anchored to the chip and able to grab DNA fragments that have complementary sequences. Once the fragments have attached, a phase called cluster generation begins. This step makes about a thousand copies of each fragment of DNA.

Cluster generation

Illumina sequencing:

1. Primers and modified nucleotides enter the chip. These nucleotides carry fluorescent tags and reversible 3' blockers that force the polymerase to add only one nucleotide at a time.
2. After each round of synthesis, a camera takes a picture of the chip. A computer determines which base was added from the wavelength of the fluorescent tag and records it for every spot on the chip.
3. After each round, non-incorporated molecules are washed away. A chemical deblocking step then removes the 3' terminal blocking group and the dye in a single step.
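The per-cycle readout described in the steps above can be illustrated with a toy sketch. It assumes one fluorescence intensity per base channel and per cycle for a single cluster, and simply picks the brightest channel. The channel order, the intensity values, and the function names `call_base` and `call_read` are invented for illustration; real base callers also correct for channel crosstalk and phasing and report quality scores.

```python
# Toy base calling: one cluster ("spot") observed over several sequencing cycles.
# Each cycle yields one fluorescence intensity per channel; the channel-to-base
# mapping and the intensity values below are invented for illustration.
CHANNEL_TO_BASE = ("A", "C", "G", "T")


def call_base(intensities):
    """Return the base whose channel has the highest intensity in this cycle."""
    if len(intensities) != len(CHANNEL_TO_BASE):
        raise ValueError("expected one intensity per channel")
    brightest = max(range(len(intensities)), key=lambda i: intensities[i])
    return CHANNEL_TO_BASE[brightest]


def call_read(cycles):
    """Concatenate the per-cycle base calls of one cluster into a read."""
    return "".join(call_base(cycle) for cycle in cycles)


# Hypothetical intensities for one cluster over four cycles (A, C, G, T channels).
example_cycles = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.0, 0.8, 0.2),
    (0.0, 0.1, 0.1, 0.7),
    (0.2, 0.9, 0.1, 0.0),
]
read = call_read(example_cycles)
print(read)
```

Each cycle contributes exactly one base because the 3' blocker stops the polymerase after a single incorporation, which is why the read length equals the number of cycles.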

Illumina sequencing: massive image processing to generate the sequence

Illumina short reads analysis:

Detecting splice variants:

RNA-seq vs microarrays

What is RNA-seq good for?

RNA-seq uses NGS to reveal the presence and quantity of RNA in a biological sample at a given moment:

• quantifies mRNA as well as long non-coding RNA
• can quantify de novo transcripts
• makes it possible to look at alternatively spliced transcripts
• can quantify small RNA, such as miRNA and tRNA

Applications

Differential gene expression analysis

What is the genome-wide response of the transcriptome when challenged with some “test” condition as compared to a “control”?

When using microarrays this could be done in a single experiment comparing test and control.

Better experimental design: replicates for both the test (T) and the control (C) group

Which genes are expressed differently in the two groups?

Simplest approach: t-test

t-statistic: difference between means in units of average error

Significance can be translated into a p-value (probability), assuming normal distributions.

http://www.physics.csbsju.edu/stats/t-test.html

Same difference in mean, but different variance

Quantifying significance
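To make "same difference in mean, but different variance" concrete, here is a minimal sketch of the t-statistic. The slides do not say which variant of the t-test is meant, so this assumes the unequal-variance (Welch) form, t = (mean_T - mean_C) / sqrt(s_T²/n_T + s_C²/n_C). The expression values and the function name `t_statistic` are invented for illustration.

```python
import math
import statistics


def t_statistic(test, control):
    """Welch-style t: difference of means divided by its standard error."""
    mean_diff = statistics.mean(test) - statistics.mean(control)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    standard_error = math.sqrt(
        statistics.variance(test) / len(test)
        + statistics.variance(control) / len(control)
    )
    return mean_diff / standard_error


# Invented expression values: both genes differ by 2 between the group means,
# but gene B is much noisier than gene A.
gene_a_test, gene_a_control = [9.9, 10.0, 10.1], [7.9, 8.0, 8.1]
gene_b_test, gene_b_control = [8.0, 10.0, 12.0], [6.0, 8.0, 10.0]

t_a = t_statistic(gene_a_test, gene_a_control)
t_b = t_statistic(gene_b_test, gene_b_control)
print(f"gene A: t = {t_a:.2f}")
print(f"gene B: t = {t_b:.2f}")
```

Gene A gets a far larger t than gene B although the difference in means is identical, because its within-group spread is much smaller. Turning t into a p-value additionally requires the t-distribution with the appropriate degrees of freedom, which is left out of this sketch.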

T-test limitations

1. The assumption of normality is not fulfilled for small sets of tests and controls. (One cannot estimate any distribution well from a small sample.)
2. The assumption of normality is usually not fulfilled for lowly expressed genes. (Counts are discrete and follow a Poisson or negative binomial distribution.)

Possible workaround: estimate p-values using permutation analysis.
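The permutation idea can be sketched as follows, assuming the test statistic is the absolute difference in group means and that the groups are small enough to enumerate every relabelling exactly. The read counts and the function name `permutation_p_value` are invented for illustration, and this is one possible formulation rather than the procedure used in the lecture.

```python
from itertools import combinations
import statistics


def permutation_p_value(test, control):
    """Two-sided exact permutation p-value for the difference in group means."""
    pooled = list(test) + list(control)
    n_test = len(test)
    observed = abs(statistics.mean(test) - statistics.mean(control))
    n_extreme = 0
    n_total = 0
    # Enumerate every way of relabelling n_test of the pooled samples as "test".
    for test_idx in combinations(range(len(pooled)), n_test):
        chosen = set(test_idx)
        perm_test = [pooled[i] for i in chosen]
        perm_control = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(statistics.mean(perm_test) - statistics.mean(perm_control))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            n_extreme += 1
        n_total += 1
    return n_extreme / n_total


# Invented read counts for one gene, four replicates per group.
test_counts = [12, 15, 14, 13]
control_counts = [3, 5, 4, 6]
p = permutation_p_value(test_counts, control_counts)
print(p)
```

With four replicates per group there are only 70 possible relabellings, so the smallest two-sided p-value this approach can return is 2/70. With few replicates the resolution of permutation p-values is therefore limited, no matter how strong the effect is.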

Tool of choice for RNA-seq differential expression analysis: edgeR

https://bioinformatics-core-shared-training.github.io/cruk-bioinf-sschool/Day3/rnaSeq_DE.pdf

Let’s try it out!
