RNA Secondary Structure Prediction in Bioinformatics

RNA Secondary Structure
Prediction
BMI/CS 776
www.biostat.wisc.edu/bmi776/
Spring 2018
Anthony Gitter
gitter@biostat.wisc.edu
These slides, excluding third-party material, are licensed 
under 
 by Mark 
Craven, Colin Dewey, and Anthony Gitter
CC BY-NC 4.0
Goals for Lecture
Key concepts
RNA secondary structure
Secondary structure features: stems, loops, bulges
Pseudoknots
Nussinov algorithm
Adapting Nussinov to take free energy into account
2
Why RNA is Interesting
Messenger RNA (mRNA) isn’t the only important class of RNA
ribosomal RNA (rRNA)
ribosomes are complexes that incorporate several RNA
subunits in addition to numerous protein units
transfer RNA (tRNA)
transport amino acids to the ribosome during translation
the spliceosome, which performs intron splicing, is a
complex with several RNA units
microRNAs and others that play regulatory roles
many viruses (e.g. HIV) have RNA genomes
guide RNA
sequence complementary determines whether to cleave
DNA
Folding of an mRNA can be involved in regulating the gene’s
expression
3
RNA Secondary Structure
RNA is typically single stranded
Folding, in large part is determined by base-pairing
A
-
U
 and 
C
-
G
 are the canonical base pairs
other bases will sometimes pair, especially 
G
-
U
Base-paired structure is referred to as the 
secondary
structure
 of RNA
Related RNAs often have homologous secondary
structure without significant sequence similarity
4
tRNA Secondary Structure
tertiary
structure
Scitable
5
Small Subunit Ribosomal RNA
Secondary Structure
6
6S RNA Secondary Structure
7
Secondary Structure Features
bulge
internal loop
stem
hairpin loop
8
Four Key Problems
 
Predicting RNA secondary structure
G
i
v
e
n
:
 
R
N
A
 
s
e
q
u
e
n
c
e
D
o
:
 
p
r
e
d
i
c
t
 
s
e
c
o
n
d
a
r
y
 
s
t
r
u
c
t
u
r
e
 
t
h
a
t
 
s
e
q
u
e
n
c
e
 
w
i
l
l
 
f
o
l
d
 
i
n
t
o
 
Searching for instances of a given structure
G
i
v
e
n
:
 
a
n
 
R
N
A
 
s
e
q
u
e
n
c
e
 
o
r
 
i
t
s
 
s
e
c
o
n
d
a
r
y
 
s
t
r
u
c
t
u
r
e
D
o
:
 
f
i
n
d
 
s
e
q
u
e
n
c
e
s
 
t
h
a
t
 
w
i
l
l
 
f
o
l
d
 
i
n
t
o
 
a
 
s
i
m
i
l
a
r
 
s
t
r
u
c
t
u
r
e
 
Modeling a family of RNAs
G
i
v
e
n
:
 
a
 
s
e
t
 
o
f
 
R
N
A
 
s
e
q
u
e
n
c
e
s
 
w
i
t
h
 
s
i
m
i
l
a
r
 
s
e
c
o
n
d
a
r
y
 
s
t
r
u
c
t
u
r
e
D
o
:
 
c
o
n
s
t
r
u
c
t
 
a
 
m
o
d
e
l
 
t
h
a
t
 
c
a
p
t
u
r
e
s
 
t
h
e
 
s
e
c
o
n
d
a
r
y
 
s
t
r
u
c
t
u
r
e
r
e
g
u
l
a
r
i
t
i
e
s
 
o
f
 
t
h
e
 
s
e
t
 
Identifying novel RNA genes
G
i
v
e
n
:
 
a
 
p
a
i
r
 
o
f
 
h
o
m
o
l
o
g
o
u
s
 
D
N
A
 
s
e
q
u
e
n
c
e
s
D
o
:
 
i
d
e
n
t
i
f
y
 
s
u
b
s
e
q
u
e
n
c
e
s
 
t
h
a
t
 
a
p
p
e
a
r
 
t
o
 
h
a
v
e
 
h
i
g
h
l
y
 
c
o
n
s
e
r
v
e
d
R
N
A
 
s
e
c
o
n
d
a
r
y
 
s
t
r
u
c
t
u
r
e
 
(
p
u
t
a
t
i
v
e
 
R
N
A
 
g
e
n
e
s
)
 
Focus for today
9
RNA Folding Assumption
Algorithms we’ll consider assume that base
pairings do not cross
For base-paired positions 
i, i’
 and 
j, j’,
 with 
i < i’
and 
j < j’
, we must have either
 
i < i’ < j < j’
  or  
j < j’ < i < i’
 (not nested)
 
i < j < j’ < i’ 
 or  
j < i < i’ < j’
 (nested)
Can’t have 
i < j < i’ < j’
 or 
j < i < j’ < i’
i
i’
j
j’
i
i’
j’
j
10
Figure from Seliverstov et al. 
BMC Microbiology
, 2005
pseudoknot
Pseudoknots
These crossings are called
pseudoknots
Dynamic programming breaks down
if pseudoknots are allowed
Fortunately, they are not very
frequent
Modern software does support them
Akiyama et al. 2018
11
Simplest RNA Secondary Structure
Task
Given:
An RNA sequence
The constraint that pseudoknots are not allowed
Do:
Find a secondary structure for the RNA that
maximizes the number of base pairing positions
12
Predicting RNA Secondary Structure:
the Nussinov Algorithm
[Nussinov et al., 
SIAM Journal of Applied Mathematics
 1978]
Key idea:
Do this using dynamic programming
start with small subsequences
progressively work to larger ones
13
DP in the Nussinov Algorithm
14
j
i
Figure 10.8 from textbook
max # of
paired bases in
subsequence [
i
, 
j
]
DP in the Nussinov Algorithm
Let
Initialization:
Recursion
max # of
paired bases in
subsequence [
i
, 
j
]
15
Nussinov Algorithm Traceback
16
Predicting RNA Secondary Structure
by Energy Minimization
It’s naïve to predict folding just by maximizing the
number of base pairs
However, we can generalize the key recurrence
relation so that we’re 
minimizing
 free energy instead
case that 
i
 and 
j
are base paired
17
Predicting RNA Secondary Structure
by Energy Minimization
A sophisticated program, such as Mfold [Zuker et al.],
can take into account free energy of  the “local
environment” of [
i, j
]
18
19
Predicting RNA Secondary Structure
by Energy Minimization
20
Mfold example
GGGAAAUCC
http://unafold.rna.albany.edu/
Δ
G = -0.80 kcal/mol
Δ
G = 0.20 kcal/mol
Slide Note
Embed
Share

RNA secondary structure prediction is a key concept in bioinformatics, encompassing features like stems, loops, and bulges. This presentation delves into the importance of RNA beyond mRNA, highlighting rRNA, tRNA, and regulatory RNA roles. The canonical base pairs A-U and C-G shape the single-stranded RNA folding, with base-paired structures defining its secondary structure. Visual aids detail the secondary structures of tRNA, ribosomal RNA, and 6S RNA. The lecture also covers common secondary structure features such as bulges, hairpins, and internal loops in RNA molecules.

  • RNA secondary structure prediction
  • bioinformatics
  • RNA roles
  • base pairing
  • secondary structure features

Uploaded on Sep 27, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. RNA Secondary Structure Prediction BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2018 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Mark Craven, Colin Dewey, and Anthony Gitter

  2. Goals for Lecture Key concepts RNA secondary structure Secondary structure features: stems, loops, bulges Pseudoknots Nussinov algorithm Adapting Nussinov to take free energy into account 2

  3. Why RNA is Interesting Messenger RNA (mRNA) isn t the only important class of RNA ribosomal RNA (rRNA) ribosomes are complexes that incorporate several RNA subunits in addition to numerous protein units transfer RNA (tRNA) transport amino acids to the ribosome during translation the spliceosome, which performs intron splicing, is a complex with several RNA units microRNAs and others that play regulatory roles many viruses (e.g. HIV) have RNA genomes guide RNA sequence complementary determines whether to cleave DNA Folding of an mRNA can be involved in regulating the gene s expression 3

  4. RNA Secondary Structure RNA is typically single stranded Folding, in large part is determined by base-pairing A-U and C-G are the canonical base pairs other bases will sometimes pair, especially G-U Base-paired structure is referred to as the secondary structure of RNA Related RNAs often have homologous secondary structure without significant sequence similarity 4

  5. tRNA Secondary Structure Scitable tertiary structure 5

  6. Small Subunit Ribosomal RNA Secondary Structure 6

  7. 6S RNA Secondary Structure 7

  8. Secondary Structure Features bulge hairpin loop stem internal loop 8

  9. Four Key Problems Predicting RNA secondary structure Given: RNA sequence Do: predict secondary structure that sequence will fold into Focus for today Searching for instances of a given structure Given: an RNA sequence or its secondary structure Do: find sequences that will fold into a similar structure Modeling a family of RNAs Given: a set of RNA sequences with similar secondary structure Do: construct a model that captures the secondary structure regularities of the set Identifying novel RNA genes Given: a pair of homologous DNA sequences Do: identify subsequences that appear to have highly conserved RNA secondary structure (putative RNA genes) 9

  10. RNA Folding Assumption Algorithms we ll consider assume that base pairings do not cross For base-paired positions i, i and j, j , with i < i and j < j , we must have either i < i < j < j or j < j < i < i (not nested) i i j j i < j < j < i or j < i < i < j (nested) i j j i Can t have i < j < i < j or j < i < j < i 10

  11. Pseudoknots pseudoknot These crossings are called pseudoknots Dynamic programming breaks down if pseudoknots are allowed Fortunately, they are not very frequent Modern software does support them Akiyama et al. 2018 Figure from Seliverstov et al. BMC Microbiology, 2005 11

  12. Simplest RNA Secondary Structure Task Given: An RNA sequence The constraint that pseudoknots are not allowed Do: Find a secondary structure for the RNA that maximizes the number of base pairing positions 12

  13. Predicting RNA Secondary Structure: the Nussinov Algorithm [Nussinov et al., SIAM Journal of Applied Mathematics 1978] Key idea: Do this using dynamic programming start with small subsequences progressively work to larger ones 13

  14. DP in the Nussinov Algorithm j G G G A A A U C C i G G G A A A U C C max # of paired bases in subsequence [i, j] Figure 10.8 from textbook 14

  15. DP in the Nussinov Algorithm 1 if and are complementary i j x x = Let ( , ) i j 0 otherwise Initialization: , ( = = ) 1 = 0 for to 2 i i i L , ( = ) 0 for to 1 i i i L Recursion + ( , 1 ) i j , ( i ) 1 - j i , ( i = ) max j + + ( , 1 ) 1 - k , ( k ) j i j + + max k i , ( i ) ( , 1 ) j max # of paired bases in subsequence [i, j] j 15

  16. Nussinov Algorithm Traceback push 1 ( onto ) stack ,L repeat until i,j stack empty is pop ( ) if continue i + i j = + else if ( , 1 j + ) , ( i push ) ( , 1 j ) j j i j = ) 1 else if , ( i ) 1 j , ( i + push ) j , ( i i j ) 1 , ( i = , ( i else if ( , 1 i, j + ) ) j record i = base j + pair ) 1 push ( , 1 i + , ( i + , 1 + = , ( i else for to 1 , j if : 1 j- ) ( ) ) k k k j j push ( 1 ) k push ( ) i,k break 16

  17. Predicting RNA Secondary Structure by Energy Minimization It s na ve to predict folding just by maximizing the number of base pairs However, we can generalize the key recurrence relation so that we re minimizing free energy instead + ( 1 j- ) E i j , ( ) 1 E E i, = , ( i E ) min j + + min k i ( ) ( 1 j , ) i, k E k i j , ( P ) j case that i and j are base paired 17

  18. Predicting RNA Secondary Structure by Energy Minimization A sophisticated program, such as Mfold [Zuker et al.], can take into account free energy of the local environment of [i, j] + , ( , ( i, ) LoopEnergy ( ) 1 j i j j i + ( i, + + + ) 1 ( ) StackingEn j ergy , 1 P ) 1 k ( 1 j , ) 1 j i i j , P i + + + + min k ) BulgeEnerg y( ) ( 1 k i j , 1 = , ( i P ) min j + + + min k ( ) BulgeEnerg y( ) ( 1 j , ) 1 i, j k P i k 1 + + + + + min , l k ( ) LoopEnergy ( ) ( 1 j , ) 1 i, j k l P i k l 1 + + + min k j ( ) ( 1 k , + ) ( , 1 ) 1 i, j E i E k j i 18

  19. Predicting RNA Secondary Structure by Energy Minimization ) 1 + ) 1 + + + + + , ( i ) LoopEnergy ( j j i min , l k ( ) LoopEnergy ( ) ( 1 j , i, j k l P i k l 1 c a g u c i+k+1 j-l-1 c u u u c g i j c c a u c c a g u c j i ) 1 + + + + min k ( ) BulgeEnerg y( ) ( 1 i, j k P i k j , 1 a g u c i+k+1 + + ) 1 + + ) 1 j-1 ( ) StackingEn ergy , ( i , 1 j , ( 1 j , i, j j i P i u g c c c g i+1 j-1 c a g u c j i c g i j a u 19

  20. Mfold example GGGAAAUCC G = -0.80 kcal/mol G = 0.20 kcal/mol http://unafold.rna.albany.edu/ 20

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#