Sequence Alignment in Bioinformatics


Sequence alignment in bioinformatics involves arranging DNA, RNA, or protein sequences to identify similarities for functional, structural, and evolutionary insights. It helps in comparing genes, proteins, and discovering conserved regions, highlighting the importance of global and local alignment methods.

  • Bioinformatics
  • Sequence alignment
  • DNA
  • Protein
  • Evolutionary insights

Uploaded on Sep 07, 2024 | 0 Views



Presentation Transcript


  1. COMPUTATIONAL METHODS OF SEQUENCE ALIGNMENT

  2. OUTLINE
     - Bioinformatics
     - Sequence alignment
     - Types of sequence alignment
     - Methods of sequence alignment

  3. Definition of sequence alignment
     Sequence alignment is a way of arranging DNA, RNA or protein sequences to identify regions of similarity. Such similarity may indicate functional, structural or evolutionary relationships.
     An alignment is made between a known sequence and an unknown sequence, or between two unknown sequences. The known sequence is called the reference sequence; the unknown sequence is called the query sequence.

  4. Interpretation of sequence alignment
     Sequence alignment is useful for discovering structural, functional and evolutionary information.
     Sequences that are highly alike may have similar secondary and 3D structure, similar function and, most likely, a common ancestral sequence. It is extremely unlikely that such sequences acquired their similarity by chance.
     Large-scale genome studies have revealed horizontal transfer of genes and other sequences between species, which may cause similarity between some sequences in very distant species.

  5. Types of Sequence Alignment
     Sequence alignment is of two types:
     - Global alignment: matches the residues of two sequences across their entire length. It is best suited to sequences that are similar overall and of roughly equal length.
     - Local alignment: matches only the regions of two sequences that are most similar to each other.

  6. Types of Sequence Alignment: global alignment
     Input: treat the two sequences as potentially equivalent.
     Goal: identify conserved regions and differences.
     Applications:
     - Comparing two genes with the same function (in human vs. mouse).
     - Comparing two proteins with similar function.

  7. Types of Sequence Alignment: local alignment
     Input: the two sequences may or may not be related.
     Goal: see whether a substring in one sequence aligns well with a substring in the other.
     Note: for local matching, overhangs at the ends are not treated as gaps.
     Applications:
     - Searching for local similarities in large sequences (e.g., newly sequenced genomes).
     - Looking for conserved domains or motifs in two proteins.

  8. Types of Sequence Alignment: example
     Global alignment:
       L G P S S K Q T G K G S - S R I W D N
       L N - I T K S A G K G A I M R L G D A
     Local alignment:
       - - - - - - - T G K G - - - - - - - -
       - - - - - - - A G K G - - - - - - - -
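
The two example alignments on slide 8 can be compared by counting their columns. The sketch below is illustrative only: the function name `alignment_stats` is hypothetical, and "identity" is taken here as matches divided by the number of alignment columns, which is one of several conventions in use.

```python
def alignment_stats(row_a: str, row_b: str) -> dict:
    """Count matches, mismatches and gap columns in two aligned rows.

    Rows are space-separated residues with '-' for a gap, as on the slide.
    """
    a = row_a.split()
    b = row_b.split()
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same number of columns")
    # A column is a match when both residues are equal and neither is a gap.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    # A column is a gap column when either row has '-'.
    gaps = sum(1 for x, y in zip(a, b) if x == "-" or y == "-")
    mismatches = len(a) - matches - gaps
    return {
        "columns": len(a),
        "matches": matches,
        "mismatches": mismatches,
        "gaps": gaps,
        "identity": matches / len(a),  # assumed convention: matches / columns
    }


# Global alignment rows copied from the slide.
global_stats = alignment_stats(
    "L G P S S K Q T G K G S - S R I W D N",
    "L N - I T K S A G K G A I M R L G D A",
)
# Only the aligned core of the local alignment (the flanking '-' are padding).
local_stats = alignment_stats("T G K G", "A G K G")

print(global_stats)
print(local_stats)
```

The global alignment spans all 19 columns but only 7 of them match, while the local alignment keeps just the 4-column core in which 3 residues match. This is the trade-off between the two types: coverage of the whole sequence versus a short region of high similarity.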

  9. Methods of sequence alignment
     - Dot matrix method
     - Dynamic programming method
     - Word or k-tuple methods

  10. Dot matrix analysis
     A dot matrix, also called a dot plot, is a grid in which the matching residues of two sequences are marked as dots. It is a graphical method of pairwise sequence comparison carried out on a computer.
     The residues of one sequence are written from left to right along the top row, and those of the other sequence from top to bottom down the side column. Every cell starts out as a colourless (blank) dot; wherever the residue in the row and the residue in the column are the same, the dot at their intersection becomes a dark dot. Each dark dot in the plot therefore represents one matching nucleotide or amino acid.
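
The construction described on slide 10 can be sketched in a few lines. This is a minimal illustration, not a production dot plot tool: the function names are hypothetical, matching is by exact identity of single residues, and no window or threshold filtering (which real tools usually apply to reduce noise) is used.

```python
def dot_matrix(seq_top: str, seq_side: str) -> list[list[bool]]:
    """Return a grid where cell [i][j] is True when seq_side[i] == seq_top[j]."""
    return [[s == t for t in seq_top] for s in seq_side]


def render(seq_top: str, seq_side: str) -> str:
    """Draw the grid as text: '*' for a dark dot, '.' for a blank cell."""
    grid = dot_matrix(seq_top, seq_side)
    lines = ["  " + " ".join(seq_top)]  # top row: first sequence, left to right
    for residue, row in zip(seq_side, grid):
        # side column: second sequence, top to bottom
        lines.append(residue + " " + " ".join("*" if cell else "." for cell in row))
    return "\n".join(lines)


# Two identical sequences: the main diagonal is filled without a break.
print(render("GATTACA", "GATTACA"))
```

Because every cell is computed, the work grows with the product of the two sequence lengths, which is why the method becomes slow for large sequences.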

  11. Dot matrix analysis
     The dot matrix method is a qualitative and simple way to analyze sequences. However, it takes a long time to analyze large sequences.
     The dot matrix method is useful for studying:
     - Sequence similarity between two nucleotide sequences or two amino acid sequences.
     - Insertion of short stretches into a DNA or amino acid sequence.
     - Deletion of short stretches from a DNA or amino acid sequence.
     - Repeats or inverted repeats in a DNA or amino acid sequence.

  12. Dot matrix analysis: two identical sequences
     [Figure: nucleic acid dot plot]

  13. Dot matrix analysis: two very different sequences
     [Figure: nucleic acid dot plot of genes]

  14. Dot matrix analysis: two similar sequences
     [Figure: nucleic acid dot plot of genes]

  15. Dynamic Programming Method
     Dynamic programming is a way of solving problems in which the best decisions have to be found one after another. It was introduced by Richard Bellman in the 1950s. The word "programming" here denotes finding an acceptable plan of action, not computer programming.
     It is useful for aligning the nucleotide sequences of DNA and the amino acid sequences of the proteins coded by that DNA.
     Dynamic programming is a three-step process:
     1) Breaking the problem into small subproblems.
     2) Solving the subproblems using recursive methods.
     3) Constructing the optimal solution to the original problem from the optimal solutions to the subproblems.

  16. Dynamic programming algorithm for sequence alignment
     The method compares every pair of characters in the two sequences and generates an alignment that is the best, or optimal, one under the chosen scoring scheme.
     This is a highly computationally demanding method. However, the latest algorithmic improvements and ever-increasing computer capacity make it possible to align a query sequence against a large database in a few minutes.
     Each alignment has a score. Several different alignments can have identical scores, so the method can produce more than one optimal alignment. Adjusting the scoring parameters can discriminate between alignments with similar scores.
     Global alignment is based on the Needleman-Wunsch algorithm and local alignment on the Smith-Waterman algorithm. Smith-Waterman underpins tools that align sequencing data to reference genomes, e.g. BWA. Both algorithms are derived from the basic dynamic programming algorithm.
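
To make the relationship between the two algorithms concrete, the sketch below fills a single score table and switches between Needleman-Wunsch (global) and Smith-Waterman (local) behaviour. It is a simplified illustration under assumed parameters: match +1, mismatch -1 and a linear gap penalty of -1 are arbitrary choices, the function name `align_score` is hypothetical, and only the optimal score is returned (no traceback of the alignment itself).

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -1, local: bool = False) -> int:
    """Optimal alignment score by dynamic programming.

    local=False: Needleman-Wunsch style (end-to-end, gaps at the ends cost).
    local=True:  Smith-Waterman style (best-scoring pair of substrings).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global: aligning a prefix against nothing costs one gap per residue.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Each cell reuses the already-solved neighbouring subproblems.
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)         # local: a bad prefix can be dropped
                best = max(best, cell)      # local: the alignment may end anywhere
            score[i][j] = cell

    return best if local else score[-1][-1]


x = "TTTGATTACATTT"
y = "CCCGATTACACCC"
print("global:", align_score(x, y))
print("local: ", align_score(x, y, local=True))
```

With these two sequences the shared core GATTACA gives a local score of 7, while the global score drops to 1 because the mismatching flanks must be included. Changing `match`, `mismatch` or `gap` changes which of several similar-scoring alignments wins, which is the parameter manipulation mentioned on the slide.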

  17. Word Method or K-tuple method
     The word method is a heuristic: it does not guarantee an optimal alignment, but it is much faster than dynamic programming.
     This makes it useful in large-scale database searches to find out whether there is a significant match with the query sequence.
     The word method is used in database search tools such as FASTA and the BLAST family. They identify a series of short, non-overlapping subsequences (words) of the query sequence, which are then matched against candidate database sequences to produce a result.

  18. Word Method or K-tuple method
     In the FASTA method, the user defines a value k to use as the word length with which to search the database. Lower values of k make the search slower but more sensitive, and are also preferred for searches involving a very short query sequence.
     BLAST provides a number of algorithms optimized for particular types of queries, e.g. for distantly related sequence matches. It is a good alternative to FASTA. Like FASTA, BLAST uses a word search of length k, but it evaluates only the most significant word matches rather than every word match.
     Later we will study BLAST in greater depth and try out BLAST alignments!
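
The word lookup at the heart of these tools can be sketched as follows. This is not the FASTA or BLAST implementation: the function names are hypothetical, only exact word matches are found, and for simplicity every overlapping word of the query is looked up. Grouping hits by diagonal (subject position minus query position) is shown because many hits on one diagonal suggest a region worth aligning in full.

```python
from collections import defaultdict


def index_words(subject: str, k: int) -> dict:
    """Map every length-k word in the subject to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(subject) - k + 1):
        index[subject[pos:pos + k]].append(pos)
    return index


def word_hits(query: str, subject: str, k: int) -> list:
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    index = index_words(subject, k)
    hits = []
    for qpos in range(len(query) - k + 1):
        # .get avoids adding empty entries to the index for unseen words.
        for spos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, spos))
    return hits


def hits_per_diagonal(hits: list) -> dict:
    """Count hits on each diagonal, i.e. each fixed offset between the sequences."""
    counts = defaultdict(int)
    for qpos, spos in hits:
        counts[spos - qpos] += 1
    return dict(counts)


hits = word_hits("GATTACA", "CCGATTACACC", k=3)
print(hits)
print(hits_per_diagonal(hits))
```

A smaller k yields more (and more spurious) hits to examine, which is consistent with the slide's remark that low values of k are slower but more sensitive.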

  19. Pairwise v Multiple alignment
     Here we have focused on pairwise alignments. However, there are cases when we wish to compare more than two sequences, i.e. multiple sequence alignments.
     We can cluster groups of sequences according to similarity, and we typically use different tools for this.
     More to come.
