Transition Bias and Substitution Models in Genetics

 
Transition Bias and Substitution Models

Xuhua Xia
xxia@uottawa.ca
http://dambe.bio.uottawa.ca
 
Transitions and Transversions

Transition: the substitution of a purine for a purine (A ↔ G) or a pyrimidine for a pyrimidine (C ↔ T). Symbolized by s.

Transversion: the substitution of a purine for a pyrimidine or vice versa. Symbolized by v.

What is transition bias?

Transition bias refers to the degree to which the s/v ratio deviates from the expected 1/2. The expectation is 1/2 because each nucleotide can change by one transition but by two transversions. The observed s/v ratio is almost always much larger than 1/2.
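
The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names `classify` and `count_s_v` are hypothetical, and RNA U is treated as DNA T. The last lines enumerate all 12 possible changes between different nucleotides to show where the expected s/v of 1/2 comes from.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(x, y):
    """Return 's' (transition), 'v' (transversion), or None if x == y."""
    x = x.upper().replace("U", "T")  # treat RNA U as DNA T
    y = y.upper().replace("U", "T")
    valid = PURINES | PYRIMIDINES
    if x not in valid or y not in valid:
        raise ValueError(f"unexpected nucleotide pair: {x!r}, {y!r}")
    if x == y:
        return None
    # Same chemical class (both purines or both pyrimidines) means transition.
    same_class = (x in PURINES) == (y in PURINES)
    return "s" if same_class else "v"


def count_s_v(seq1, seq2):
    """Count transitional and transversional differences in two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    kinds = [classify(a, b) for a, b in zip(seq1, seq2)]
    return kinds.count("s"), kinds.count("v")


# Of the 12 possible changes between different nucleotides, 4 are transitions
# and 8 are transversions, hence the expected s/v of 1/2.
all_changes = [classify(a, b) for a, b in permutations("ACGT", 2)]
expected_ratio = all_changes.count("s") / all_changes.count("v")
```

For example, `count_s_v("GGAAAU", "GGGAAC")` counts the A/G and U/C differences as two transitions and no transversions.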
 
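Transition Bias is Ubiquitous. Why?

For both invertebrate and vertebrate genes:

$$\frac{s_{obs}}{v_{obs}} \gg \frac{1}{2}$$

What causes transition bias?
Mutation bias
Selection bias

$$\frac{s_{obs}}{v_{obs}} = \frac{s}{v} \cdot \frac{P_s}{P_v}$$

Mutation bias: $s/v$
Selection bias in fixation probability: $P_s/P_v$
  Protein-coding genes
  RNA genes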
 
Mitochondrial Genetic Code

Synonymous and nonsynonymous

Degeneracy:
Non-degenerate
Two-fold degenerate
Four-fold degenerate

Transitions are synonymous and transversions are nonsynonymous at two-fold degenerate sites.
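
The statement about two-fold degenerate sites can be checked mechanically. The sketch below assumes the vertebrate mitochondrial code (UGA = Trp, AUA = Met, AGA/AGG = Stop), written as a 64-letter string in UCAG order. The names `CODE`, `degeneracy`, and `two_fold_ok` are illustrative, and only third codon positions are examined.

```python
BASES = "UCAG"
# Amino acids for the 64 codons in UCAG order (first, second, third base);
# '*' marks a stop codon. Assumed here: the vertebrate mitochondrial code.
AMINO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def degeneracy(codon, pos):
    """Number of bases at position pos (0, 1, 2) that keep the encoded amino acid."""
    return sum(
        CODE[codon[:pos] + b + codon[pos + 1:]] == CODE[codon] for b in BASES
    )


def is_transition(x, y):
    return {x, y} in ({"A", "G"}, {"C", "U"})


# Degeneracy of the third position of every codon.
third_fold = {codon: degeneracy(codon, 2) for codon in CODE}

# At every two-fold degenerate third position, the only synonymous
# alternative should differ from the original base by a transition.
two_fold_ok = all(
    is_transition(codon[2], b)
    for codon, fold in third_fold.items()
    if fold == 2
    for b in BASES
    if b != codon[2] and CODE[codon[:2] + b] == CODE[codon]
)
```

Under this code every third position comes out as either two-fold or four-fold degenerate, which is what makes the mitochondrial code a convenient illustration.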
 
RNA secondary structure

A C → U transition in a stem:

Seq1: CACGA
      |||||
      GUGCU

Seq2: CAUGA
      |||||
      GUGCU

An A → G transition in a stem:

Seq1: CACGA
      |||||
      GUGCU

Seq2: CGCGA
      |||||
      GUGCU

A G/U pair, although not as strong as an A/U or C/G pair, generally does not disrupt RNA secondary structure (and occurs frequently in RNA secondary structure).
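
As a rough check of the stem example, the sketch below treats A/U, G/C, and G/U as the only allowed pairs and asks which single-base changes in one strand leave the stem intact. The function names are hypothetical, and the partner strand is written in the facing order used on the slide.

```python
# Allowed pairs: Watson-Crick pairs plus the G/U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_intact(strand, partner):
    """True if every base of strand can pair with the facing base of partner."""
    return all((a, b) in PAIRS for a, b in zip(strand, partner))


def is_transition(x, y):
    return {x, y} in ({"A", "G"}, {"C", "U"})


def tolerated_changes(strand, partner):
    """Count single-base changes in strand that keep the stem intact, as (s, v)."""
    kept = {"s": 0, "v": 0}
    for i, old in enumerate(strand):
        for new in "ACGU":
            if new == old:
                continue
            mutant = strand[:i] + new + strand[i + 1:]
            if stem_intact(mutant, partner):
                kept["s" if is_transition(old, new) else "v"] += 1
    return kept["s"], kept["v"]
```

For the stem CACGA/GUGCU, the only tolerated single-base changes in the upper strand are transitions; no transversion keeps all five positions paired.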
 
Causes of transition bias

$$\frac{s_{obs}}{v_{obs}} = \frac{s}{v} \cdot \frac{P_s}{P_v}$$

"I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be."
Lord Kelvin: Phys. Letter A, vol. 1, "Electrical Units of Measurement", 1883-05-03
 
At Four-fold Degenerate Sites

At four-fold degenerate sites, all nucleotide substitutions are synonymous and subject to roughly the same selection pressure (similar fixation probabilities).

     Gly Asn Lys Gly Asp Lys Ala Ala Pro Ala Cys ...
Fold   4   2   2       2   2   4   4   4       2
S1   GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...
S2   GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG ...
       s                           s   v
                 Glu                     Gly Trp

$$\frac{s_{obs}}{v_{obs}} = \frac{s}{v} \cdot \frac{P_s}{P_v} \approx \frac{s}{v} \approx 2$$
 
At Nondegenerate Sites

At nondegenerate sites, all nucleotide substitutions are nonsynonymous and subject to roughly the same selection pressure (similar fixation probabilities).

     Gly Asn Lys Gly Asp Lys Ala Ala Pro Ala Cys ...
S1   GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...
S2   GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG ...
                  s                       v
                 Glu                     Gly Trp

$$\frac{s_{obs}}{v_{obs}} = \frac{s}{v} \cdot \frac{P_s}{P_v} \approx \frac{s}{v} \approx 2$$
 
At Two-fold Degenerate Sites

At two-fold degenerate sites, all transitional substitutions are synonymous, and all transversional substitutions are nonsynonymous.

     Gly Asn Lys Gly Asp Lys Ala Ala Pro Ala Cys ...
Fold   4   2   2       2   2   4   4   4       2
S1   GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...
S2   GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG ...
           s           s   s                   v
                 Glu                     Gly Trp

$$\frac{s_{obs}}{v_{obs}} = \frac{s}{v} \cdot \frac{P_s}{P_v} \approx 2 \cdot \frac{P_s}{P_v} \approx 80$$

A transition is therefore about 40 times as likely to become fixed as a transversion.
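
The three slides on degenerate sites can be tied together with a short tally. The sketch below assumes the vertebrate mitochondrial code and uses the S1/S2 codons from the slides; all names are illustrative. It classifies each difference by the degeneracy of the affected codon position and by transition or transversion. The final lines redo the arithmetic behind the factor of about 40, taking the ratios of about 2 and about 80 reported in the slides as given (they are not derived from this toy alignment).

```python
BASES = "UCAG"
# Assumed: vertebrate mitochondrial code in UCAG order, '*' = stop.
AMINO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

S1 = "GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU".split()
S2 = "GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG".split()


def degeneracy(codon, pos):
    """Number of bases at position pos that keep the encoded amino acid."""
    return sum(
        CODE[codon[:pos] + b + codon[pos + 1:]] == CODE[codon] for b in BASES
    )


def kind(x, y):
    """'s' for a transition, 'v' for a transversion (x and y must differ)."""
    return "s" if {x, y} in ({"A", "G"}, {"C", "U"}) else "v"


# counts[fold] holds the numbers of transitions and transversions seen at
# nondegenerate (1), two-fold (2), and four-fold (4) degenerate sites.
counts = {fold: {"s": 0, "v": 0} for fold in (1, 2, 4)}
for c1, c2 in zip(S1, S2):
    for pos in range(3):
        if c1[pos] != c2[pos]:
            counts[degeneracy(c1, pos)][kind(c1[pos], c2[pos])] += 1

# Arithmetic from the slides: s/v is about 2 where selection does not
# discriminate, and s_obs/v_obs is about 80 at two-fold degenerate sites.
mutation_ratio = 2
two_fold_observed_ratio = 80
fixation_ratio = two_fold_observed_ratio / mutation_ratio
```

The tally reproduces the s and v marks on the three slides: two transitions and one transversion at four-fold sites, one of each at nondegenerate sites, and three transitions and one transversion at two-fold sites.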
 
Methylation and deamination

[Figure: a methyltransferase transfers a methyl group (H3C-) from a donor to an acceptor.]

Methylation and DNA Repair in E. coli

DNA alphabet: ACGT
RNA alphabet: ACGU
DNA duplication and Watson-Crick pairing rule: A-T, C-G

3’--CTAG----CTAGGTAT----C-----C--CTAG-----------5’
    ||||    ||||||||    ?     ?  ||||
5’--GATC----GATCCATA----U-----T--GATC-----...   3’

[Figure: H3C marks methyl groups on the A of GATC sites; ? marks mismatched positions; labels mutL, mutH, mutS.]

Spacing of GATC: consequences of being too far apart.
 
Methylation-Modification System

[Figure: Brevibacterium albidum and a dsDNA phage. Labels: bacterial membrane, bacterial genome, transcription and translation, methylase, restriction enzyme.]

Methylated site (C* = methylated C):

TGGC*CA
AC*CGGT

Restriction site cut by the restriction enzyme:

----TGG|CCA---
----ACC|GGT---
 
CpG-Specific DNA Methylation

Mammalian DNA methyltransferase 1 (DNMT1)

Domain                                     Residues
NlsD, NLS-containing domain                1-343
RFDD, replication foci-directing domain    350-609
ZnD, Zn-binding domain                     613-748
PBD, polybromo domain                      746-1110
CatD, the catalytic domain                 1124-1620

[Figure labels: CpG, mCpG, mCpG]

Fatemi, M., A. Hermann, S. Pradhan and A. Jeltsch, 2001 J Mol Biol 309: 1189-99.
 
CpG-Specific DNA Methylation

5’ATGCGA-------CCGA--------ACGGC--TAA 3’
  ||||||       ||||        |||||
3’TACGCT-------GGCT--------TGCCG--ATT 5’

  Fully methylated    Hemi-methylated    Unmethylated

[Figure: H3C marks the methyl groups on the methylated CpG sites.]

Note: 5’CG3’ = CpG
 
Methylation and Gene Regulation

Proteins with a methyl-CpG binding domain (MBD):
  MBD1, MBD2, and MBD3
  MeCP2
Deacetylase: an enzyme that removes an acetyl group.
Histone deacetylases: deacetylate lysyl residues in histones (the half-life of an acetyl group is ~10 min). Acetylation removes a positive charge on the lysine ε-amino group and promotes nucleosome melting (and gene expression). Deacetylation tends to decrease or turn off gene expression.

[Figure: ---mCpG----- with histone deacetylase, leading to condensed DNA with repressed transcription.]

Wade, P. A., and A. P. Wolffe, 2001 Nat Struct Biol 8: 575-7.

Lysine demethylation
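Methylation and Mutation

Cytosine is converted to thymine in two steps: methylation produces methylcytosine, and spontaneous deamination of methylcytosine produces thymine.

Vertebrate mitochondrion

[Figure: vertebrate mitochondrial genome with labels Parental H, Parental L, Daughter H, Daughter L, OH, and OL.]

Spontaneous deamination

Base             Product        Product pairs with
Adenine          Hypoxanthine   C
Guanine          Xanthine       C
Cytosine         Uracil         A
Methylcytosine   Thymine        A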
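
One way to see why spontaneous deamination feeds transition bias is to follow each deaminated base through replication. The sketch below is a simplification under stated assumptions: the deamination products and their pairing partners are taken from the slide transcript (hypoxanthine and xanthine pair with C; uracil and thymine pair with A), repair is ignored, and the names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# base -> (deamination product, base the product pairs with)
DEAMINATION = {
    "A": ("hypoxanthine", "C"),
    "G": ("xanthine", "C"),
    "C": ("uracil", "A"),
    "mC": ("thymine", "A"),  # mC = methylcytosine
}


def change_after_replication(base):
    """Return (original base, base found at that site after the deaminated
    product has been used as a template and its new partner copied in turn)."""
    original = "C" if base == "mC" else base
    _, partner = DEAMINATION[base]
    return original, COMPLEMENT[partner]


def kind(x, y):
    """'s' for a transition, 'v' for a transversion, None if unchanged."""
    if x == y:
        return None
    return "s" if {x, y} in ({"A", "G"}, {"C", "T"}) else "v"


outcomes = {base: kind(*change_after_replication(base)) for base in DEAMINATION}
```

In this simplified picture, deamination of adenine, cytosine, and methylcytosine each ends in a transition, deamination of guanine leaves the site unchanged, and none of the four produces a transversion.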
 
Transversion can erase transitions

Transitions can erase transitions, and transversions can erase transversions. However, a transversion can erase many transitions occurring before it, and subsequent transitions cannot erase the transversion:

AACGCTTGACG
AACGCTTAACG   (G → A, transition)
AACGCTTGACG   (A → G, transition)
AACGCTTCACG   (G → C, transversion)
AACGCTTTACG   (C → T, transition)

Although a transition could also erase 2n transversions occurring before it, this is rare because transversions are in general much rarer than transitions:

AACGCTTGACG
AACGCTTTACG   (G → T, transversion)
AACGCTTAACG   (T → A, transversion)
AACGCTTGACG   (A → G, transition)

Transitions therefore tend to be missed in counting much more frequently than transversions.
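
The two substitution histories above can be replayed in code. The sketch below is illustrative only (the function name `actual_and_observed` is hypothetical): it takes the successive states of a single site, counts the transitions and transversions that actually occurred, and reports what a comparison of the first and last state would show.

```python
def kind(x, y):
    """'s' for a transition, 'v' for a transversion, None if x == y."""
    if x == y:
        return None
    return "s" if {x, y} in ({"A", "G"}, {"C", "T"}) else "v"


def actual_and_observed(states):
    """states: successive nucleotides at one site, e.g. 'GAGCT'.

    Returns ((actual transitions, actual transversions), observed difference),
    where the observed difference compares only the first and last state.
    """
    steps = [kind(a, b) for a, b in zip(states, states[1:])]
    actual = (steps.count("s"), steps.count("v"))
    observed = kind(states[0], states[-1])
    return actual, observed


first_history = actual_and_observed("GAGCT")   # G -> A -> G -> C -> T
second_history = actual_and_observed("GTAG")   # G -> T -> A -> G
```

In the first history three transitions and one transversion occur, yet only a single transversion is observed; in the second, three changes leave no observable difference at all.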
 
Summary

Selection: Transitions are tolerated more than transversions by natural selection because
  they are more likely to be synonymous in protein-coding sequences than transversions;
  they are less likely to disrupt RNA secondary structure than transversions.
Mutation: Transitional mutations occur more frequently than transversional mutations because
  misincorporation during DNA replication occurs more frequently between two purines or between two pyrimidines than between a purine and a pyrimidine;
  a purine is more likely to mutate chemically to another purine than to a pyrimidine (e.g., through spontaneous deamination), and the same holds for a pyrimidine.
Bias in counting: Transitions tend to be missed in counting much more frequently than transversions (which necessitates the substitution models).
 
Nucleotide Substitutions

Ancestral sequence:   ACACTCGGATTAGGCT

Observed sequences:   ATACTCAGGTTAAGCT
                      ACAATCCGGTTAAGCT

[Figure, from WHL: the two daughter sequences diverge from the ancestral sequence through single, multiple, coincidental, parallel, convergent, and back substitutions; intermediate states shown include T, C, C and the sequence AGACTCGGATTAGGCT.]

Actual number of changes during the evolution of the two daughter sequences: 12.
Observed number of differences between the two daughter sequences: 3.
Multiple substitutions must be corrected for in order to estimate the true number of changes, i.e., 12.
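
A minimal sketch of what "correcting for multiple substitutions" means in practice, using the simplest of the substitution models (JC69) as an assumed example. The standard JC69 correction is d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. The function names are hypothetical. With only 16 sites and such a simple model, the corrected value is not expected to recover the 12 changes of the toy figure; the point is only that the corrected distance exceeds the observed one.

```python
import math


def p_distance(seq1, seq2):
    """Observed proportion of sites that differ between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jc69_distance(p):
    """JC69-corrected distance d = -(3/4) ln(1 - 4p/3), defined for 0 <= p < 3/4."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


seq_a = "ATACTCAGGTTAAGCT"
seq_b = "ACAATCCGGTTAAGCT"
p = p_distance(seq_a, seq_b)  # 3 differences over 16 sites
d = jc69_distance(p)          # corrected substitutions per site
```

The corrected distance is always at least as large as the observed proportion, and the gap widens as sequences diverge and more changes are hidden by later ones.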
 
Substitution models and phylogenetics

A substitution model describes the evolutionary process so as to correct for multiple hits.
A phylogenetic reconstruction method implicitly or explicitly assumes a substitution model.
A phylogenetic method that assumes a wrong substitution model will typically produce wrong trees.
An alignment made with an inappropriate substitution score matrix will typically be inaccurate (e.g., strong transition bias among the sequences but a substitution score matrix without a strong penalty against transversions).
 
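Unrestricted model (no equilibrium $\pi_i$), with twelve rate parameters:

$$\begin{array}{c|cccc}
  & A & G & C & T \\ \hline
A & \cdot & a_1 & a_2 & a_3 \\
G & a_7 & \cdot & a_4 & a_5 \\
C & a_8 & a_9 & \cdot & a_6 \\
T & a_{10} & a_{11} & a_{12} & \cdot
\end{array}$$

GTR:

$$\begin{array}{c|cccc}
  & A & G & C & T \\ \hline
A & \cdot & a_1\pi_G & a_2\pi_C & a_3\pi_T \\
G & a_1\pi_A & \cdot & a_4\pi_C & a_5\pi_T \\
C & a_2\pi_A & a_4\pi_G & \cdot & a_6\pi_T \\
T & a_3\pi_A & a_5\pi_G & a_6\pi_C & \cdot
\end{array}$$

The diagonal of a transition probability matrix is subject to the constraint that each row sums up to 1.

JC69
  $\pi_i = 0.25$
  $a_i = c$
F81/TN84
  $\pi_A, \pi_C, \pi_G, \pi_T$
  $a_i = c$
K80
  $\pi_i = 0.25$
  $a_1 = a_6 = a_7 = a_{12} = \alpha$
  $a_2 = a_3 = a_4 = a_5 = a_8 = a_9 = a_{10} = a_{11} = \beta$
HKY85
  $\pi_A, \pi_C, \pi_G, \pi_T$
  $a_1 = a_6 = a_7 = a_{12} = \alpha$
  $a_2 = a_3 = a_4 = a_5 = a_8 = a_9 = a_{10} = a_{11} = \beta$
TN93
  $\pi_A, \pi_C, \pi_G, \pi_T$
  $a_1 = a_7 = \alpha_1$
  $a_6 = a_{12} = \alpha_2$
  $a_2 = a_3 = a_4 = a_5 = a_8 = a_9 = a_{10} = a_{11} = \beta$
GTR
Unrestricted: no equilibrium $\pi_i$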
 
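The TN93 model as an example

With rows and columns in the order T, C, A, G:

$$Q = \begin{pmatrix}
\cdot & \alpha_1\pi_C & \beta\pi_A & \beta\pi_G \\
\alpha_1\pi_T & \cdot & \beta\pi_A & \beta\pi_G \\
\beta\pi_T & \beta\pi_C & \cdot & \alpha_2\pi_G \\
\beta\pi_T & \beta\pi_C & \alpha_2\pi_A & \cdot
\end{pmatrix}$$

$\pi$ - frequency parameters
$\alpha_1, \alpha_2, \beta$ - rate ratio parameters (in this matrix $\alpha_1$ applies to T ↔ C transitions, $\alpha_2$ to A ↔ G transitions, and $\beta$ to transversions)

In addition to the illustrated assumptions, the model also assumes that the frequency and rate ratio parameters do not change over time, i.e., the substitution process is stationary.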
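
A small sketch of how a TN93 rate matrix can be assembled from its two categories of parameters. It assumes the T, C, A, G ordering of this slide, with `alpha1` for T ↔ C transitions, `alpha2` for A ↔ G transitions, and `beta` for transversions. The function name and the example parameter values are hypothetical, and the diagonal is filled so that each row of the rate matrix sums to zero (the usual convention for a rate matrix, as opposed to a transition probability matrix whose rows sum to 1).

```python
ORDER = "TCAG"
PYRIMIDINES = {"T", "C"}


def tn93_rate_matrix(pi, alpha1, alpha2, beta):
    """Build a TN93 rate matrix as a nested dict q[x][y].

    pi: dict of equilibrium frequencies for T, C, A, G.
    alpha1: rate ratio for T <-> C, alpha2: for A <-> G, beta: transversions.
    """
    q = {x: {} for x in ORDER}
    for x in ORDER:
        for y in ORDER:
            if x == y:
                continue
            if x in PYRIMIDINES and y in PYRIMIDINES:
                ratio = alpha1          # pyrimidine transition
            elif x not in PYRIMIDINES and y not in PYRIMIDINES:
                ratio = alpha2          # purine transition
            else:
                ratio = beta            # transversion
            q[x][y] = ratio * pi[y]
        # Diagonal chosen so that the row of the rate matrix sums to zero.
        q[x][x] = -sum(q[x].values())
    return q


# Hypothetical parameter values, for illustration only.
pi = {"T": 0.1, "C": 0.2, "A": 0.3, "G": 0.4}
q = tn93_rate_matrix(pi, alpha1=4.0, alpha2=6.0, beta=1.0)

# Special case: equal frequencies and equal rate ratios give JC69.
jc = tn93_rate_matrix({b: 0.25 for b in ORDER}, 1.0, 1.0, 1.0)
```

Setting `alpha1 == alpha2` gives HKY85, adding equal frequencies gives K80, and making all rate ratios equal gives F81 (or JC69 with equal frequencies), which mirrors the way the models differ only in their assumptions about the two categories of parameters.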
 
Substitution Models

There are three types of substitution models in molecular evolution:
  Nucleotide-based
  Amino acid-based
  Codon-based
Substitution models are characterized by two categories of parameters, the frequency parameters and the rate ratio parameters, and different models differ in their assumptions concerning these two categories of parameters.
Substitution models, substitution score matrices, and sequence alignment.

Transition bias and substitution models, explored by Xuhua Xia, delve into the concepts of transitions and transversions in genetic mutations, the causes of transition bias, the ubiquitous nature of transition bias in invertebrate and vertebrate genes, the mitochondrial genetic code, and RNA secondary structure. Transition bias refers to deviations from the expected s/v ratio and is influenced by mutation bias and selection bias. The genetic code illustrates synonymous and nonsynonymous mutations, while RNA structure is influenced by base pairing preferences. Lord Kelvin's quote reflects on the importance of quantifying knowledge in science.

  • Genetics
  • Transition Bias
  • Substitution Models
  • Mutations
  • Mitochondrial Code

Uploaded on Sep 18, 2024 | 3 Views



Presentation Transcript


  1. Transition Bias and Substitution models Xuhua Xia xxia@uottawa.ca http://dambe.bio.uottawa.ca

  2. Transitions and Transversions. Purines: A, G. Pyrimidines: C, T. Transition: the substitution of a purine for a purine or a pyrimidine for a pyrimidine. Symbolized by s. Transversion: the substitution of a purine for a pyrimidine or vice versa. Symbolized by v. What is transition bias? Transition bias refers to the degree to which the s/v ratio deviates from the expected 1/2. The observed s/v ratio is almost always much larger than 1/2.

  3. Transition Bias is Ubiquitous. Why? For both invertebrate and vertebrate genes: $s_{obs}/v_{obs} \gg 1/2$. What causes transition bias? Mutation bias and selection bias: $\frac{s_{obs}}{v_{obs}} = \frac{s}{v} \cdot \frac{P_s}{P_v}$, where $s/v$ is the mutation bias and $P_s/P_v$ is the selection bias in fixation probability (protein-coding genes, RNA genes).

  4. Mitochondrial Genetic Code. Synonymous and nonsynonymous. Degeneracy: non-degenerate, two-fold degenerate, four-fold degenerate. Transitions are synonymous and transversions are nonsynonymous at two-fold degenerate sites.

     Codon  Amino acid   Codon  Amino acid   Codon  Amino acid   Codon  Amino acid
     UUU    Phe          UCU    Ser          UAU    Tyr          UGU    Cys
     UUC    Phe          UCC    Ser          UAC    Tyr          UGC    Cys
     UUA    Leu          UCA    Ser          UAA    Stop         UGA    Trp
     UUG    Leu          UCG    Ser          UAG    Stop         UGG    Trp
     CUU    Leu          CCU    Pro          CAU    His          CGU    Arg
     CUC    Leu          CCC    Pro          CAC    His          CGC    Arg
     CUA    Leu          CCA    Pro          CAA    Gln          CGA    Arg
     CUG    Leu          CCG    Pro          CAG    Gln          CGG    Arg
     AUU    Ile          ACU    Thr          AAU    Asn          AGU    Ser
     AUC    Ile          ACC    Thr          AAC    Asn          AGC    Ser
     AUA    Met          ACA    Thr          AAA    Lys          AGA    Stop
     AUG    Met          ACG    Thr          AAG    Lys          AGG    Stop
     GUU    Val          GCU    Ala          GAU    Asp          GGU    Gly
     GUC    Val          GCC    Ala          GAC    Asp          GGC    Gly
     GUA    Val          GCA    Ala          GAA    Glu          GGA    Gly
     GUG    Val          GCG    Ala          GAG    Glu          GGG    Gly

  5. RNA secondary structure. [Figure label: CCAAU.] Seq1: CACGA paired with GUGCU; Seq2: CAUGA paired with GUGCU. Seq1: CACGA paired with GUGCU; Seq2: CGCGA paired with GUGCU. A G/U pair, although not as strong as an A/U or C/G pair, generally does not disrupt RNA secondary structure (and occurs frequently in RNA secondary structure).

  6. Causes of transition bias. $\frac{s_{obs}}{v_{obs}} = \frac{s}{v} \cdot \frac{P_s}{P_v}$. "I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be." Lord Kelvin: Phys. Letter A, vol. 1, "Electrical Units of Measurement", 1883-05-03

  7. At Four-fold Degenerate Sites. At four-fold degenerate sites, all nucleotide substitutions are synonymous and subject to roughly the same selection pressure (similar fixation probabilities). Glycine codons: GGA, GGC, GGG, GGT. S1: GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...; S2: GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG .... Differences at four-fold degenerate sites: GGA/GGG (s), GCC/GCU (s), CCU/CCA (v). At a four-fold degenerate site: $\frac{s_{obs}}{v_{obs}} = \frac{s}{v} \cdot \frac{P_s}{P_v} \approx \frac{s}{v} \approx 2$.

  8. At Nondegenerate Sites. At nondegenerate sites, all nucleotide substitutions are nonsynonymous and subject to roughly the same selection pressure (similar fixation probabilities). Glycine codons: GGA, GGC, GGG, GGT. S1: GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...; S2: GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG .... Differences at nondegenerate sites: GGA/GAA (Gly → Glu, s), GCG/GGG (Ala → Gly, v). At a nondegenerate site: $\frac{s_{obs}}{v_{obs}} = \frac{s}{v} \cdot \frac{P_s}{P_v} \approx \frac{s}{v} \approx 2$.

  9. At Two-fold Degenerate Sites. At two-fold degenerate sites, all transitional substitutions are synonymous, and all transversional substitutions are nonsynonymous. Example codons: GAA and GAG (Glu); GAC and GAT (Asp). S1: GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...; S2: GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG .... Differences at two-fold degenerate sites: AAU/AAC (s), GAC/GAU (s), AAA/AAG (s), UGU/UGG (Cys → Trp, v). At a two-fold degenerate site: $\frac{s_{obs}}{v_{obs}} = \frac{s}{v} \cdot \frac{P_s}{P_v} \approx 2 \cdot \frac{P_s}{P_v} \approx 80$. A transition is about 40 times as likely to become fixed as a transversion.

  10. Methylation and deamination. [Figure: a methyltransferase transfers a methyl group (H3C-) from a donor to an acceptor.]

  11. Methylation and DNA Repair in E. coli. DNA alphabet: ACGT. RNA alphabet: ACGU. DNA duplication and Watson-Crick pairing rule: A-T, C-G. Strands: 3’--CTAG----CTAGGTAT----C-----C--CTAG-----------5’ paired with 5’--GATC----GATCCATA----U-----T--GATC-----... 3’; H3C marks methyl groups at GATC sites and ? marks the mismatched positions (C/U and C/T). Labels: mutL, mutH, mutS. Spacing of GATC: consequences of being too far apart.

  12. Methylation-Modification System. [Figure: Brevibacterium albidum and a dsDNA phage. Labels: bacterial membrane, bacterial genome, transcription and translation, methylase, restriction enzyme.] Methylated site (C* = methylated C): TGGC*CA / AC*CGGT. Restriction site: ----TGG|CCA--- / ----ACC|GGT---.

  13. CpG-Specific DNA Methylation. Mammalian DNA methyltransferase 1 (DNMT1) domains: NlsD, NLS-containing domain (residues 1-343); RFDD, replication foci-directing domain (350-609); ZnD, Zn-binding domain (613-748); PBD, polybromo domain (746-1110); CatD, the catalytic domain (1124-1620). Figure labels: CpG, mCpG, mCpG. Fatemi, M., A. Hermann, S. Pradhan and A. Jeltsch, 2001 J Mol Biol 309: 1189-99.

  14. CpG-Specific DNA Methylation. 5’ATGCGA-------CCGA--------ACGGC--TAA 3’ paired with 3’TACGCT-------GGCT--------TGCCG--ATT 5’. The three segments are labelled, from left to right, fully methylated, hemi-methylated, and unmethylated (H3C marks the methyl groups). Note: 5’CG3’ = CpG.

  15. Methylation and Gene Regulation. Proteins with a methyl-CpG binding domain (MBD): MBD1, MBD2, and MBD3; MeCP2. Deacetylase: an enzyme that removes an acetyl group. Histone deacetylases: deacetylate lysyl residues in histones (the half-life of an acetyl group is ~10 min). Acetylation removes a positive charge on the lysine ε-amino group and promotes nucleosome melting (and gene expression). Deacetylation tends to decrease or turn off gene expression. [Figure: MBD on ---mCpG-----, with histone deacetylase, leading to condensed DNA with repressed transcription.] Wade, P. A., and A. P. Wolffe, 2001 Nat Struct Biol 8: 575-7. Lysine demethylation.

  16. Methylation and Mutation. Cytosine is converted to thymine: methylation adds a methyl group (H3C) to cytosine, and spontaneous deamination of the methylcytosine yields thymine.

  17. Vertebrate mitochondrion. [Figure labels: Parental H, Parental L, Daughter H, Daughter L, OH, OL.]

  18. Spontaneous deamination (each reaction takes up H2O and releases NH3): adenine → hypoxanthine (pairs with C); guanine → xanthine (pairs with C); cytosine → uracil (pairs with A); methylcytosine → thymine (pairs with A).

  19. Transversion can erase transitions. Transitions can erase transitions, and transversions can erase transversions. However, a transversion can erase many transitions occurring before it, and subsequent transitions cannot erase the transversion: AACGCTTGACG → AACGCTTAACG → AACGCTTGACG → AACGCTTCACG → AACGCTTTACG. Although a transition could also erase 2n transversions occurring before it (AACGCTTGACG → AACGCTTTACG → AACGCTTAACG → AACGCTTGACG), this is rare because transversions are in general much rarer than transitions. Transitions tend to be missed in counting much more frequently than transversions.

  20. Summary. Selection: Transitions are tolerated more than transversions by natural selection because they are more likely to be synonymous in protein-coding sequences than transversions, and they are less likely to disrupt RNA secondary structure than transversions. Mutation: Transitional mutations occur more frequently than transversional mutations because misincorporation during DNA replication occurs more frequently between two purines or between two pyrimidines than between a purine and a pyrimidine, and because a purine is more likely to mutate chemically to another purine than to a pyrimidine (e.g., through spontaneous deamination); the same holds for a pyrimidine. Bias in counting: Transitions tend to be missed in counting much more frequently than transversions (which necessitates the substitution models).

  21. Nucleotide Substitutions. Ancestral sequence: ACACTCGGATTAGGCT. Observed sequences: ATACTCAGGTTAAGCT and ACAATCCGGTTAAGCT. [Figure, from WHL: single, multiple, coincidental, parallel, convergent, and back substitutions; intermediate states shown include T, C, C and the sequence AGACTCGGATTAGGCT.] Actual number of changes during the evolution of the two daughter sequences: 12. Observed number of differences between the two daughter sequences: 3. Multiple substitutions must be corrected for in order to estimate the true number of changes, i.e., 12.

  22. Substitution models and phylogenetics. A substitution model describes the evolutionary process so as to correct for multiple hits. A phylogenetic reconstruction method implicitly or explicitly assumes a substitution model. A phylogenetic method that assumes a wrong substitution model will typically produce wrong trees. An alignment made with an inappropriate substitution score matrix will typically be inaccurate (e.g., strong transition bias among the sequences but a substitution score matrix without a strong penalty against transversions).

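  23. The diagonal of a transition probability matrix is subject to the constraint that each row sums up to 1. Unrestricted (no equilibrium $\pi_i$), rows and columns A, G, C, T: row A: $a_1, a_2, a_3$; row G: $a_7, a_4, a_5$; row C: $a_8, a_9, a_6$; row T: $a_{10}, a_{11}, a_{12}$. GTR: row A: $a_1\pi_G, a_2\pi_C, a_3\pi_T$; row G: $a_1\pi_A, a_4\pi_C, a_5\pi_T$; row C: $a_2\pi_A, a_4\pi_G, a_6\pi_T$; row T: $a_3\pi_A, a_5\pi_G, a_6\pi_C$. JC69: $\pi_i = 0.25$; $a_i = c$. F81/TN84: $\pi_A, \pi_C, \pi_G, \pi_T$; $a_i = c$. K80: $\pi_i = 0.25$; $a_1 = a_6 = a_7 = a_{12} = \alpha$; $a_2 = a_3 = a_4 = a_5 = a_8 = a_9 = a_{10} = a_{11} = \beta$. HKY85: $\pi_A, \pi_C, \pi_G, \pi_T$; $a_1 = a_6 = a_7 = a_{12} = \alpha$; $a_2 = a_3 = a_4 = a_5 = a_8 = a_9 = a_{10} = a_{11} = \beta$. TN93: $\pi_A, \pi_C, \pi_G, \pi_T$; $a_1 = a_7 = \alpha_1$; $a_6 = a_{12} = \alpha_2$; $a_2 = a_3 = a_4 = a_5 = a_8 = a_9 = a_{10} = a_{11} = \beta$.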

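  24. The TN93 model as an example. Rate matrix $Q$ with rows and columns in the order T, C, A, G: row T: $\cdot, \alpha_1\pi_C, \beta\pi_A, \beta\pi_G$; row C: $\alpha_1\pi_T, \cdot, \beta\pi_A, \beta\pi_G$; row A: $\beta\pi_T, \beta\pi_C, \cdot, \alpha_2\pi_G$; row G: $\beta\pi_T, \beta\pi_C, \alpha_2\pi_A, \cdot$. $\pi$ - frequency parameters; $\alpha_1, \alpha_2, \beta$ - rate ratio parameters. In addition to the illustrated assumptions, the model also assumes that the frequency and rate ratio parameters do not change over time, i.e., the substitution process is stationary.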

  25. Substitution Models There are three types of substitution models in molecular evolution Nucleotide-based Amino acid-based Codon-based Substitution models are characterized by two categories of parameters: the frequency parameters and the rate ratio parameters, and different models differ by their assumptions concerning these two categories of parameters. Substitution models, substitution score matrix and sequence alignment. Xuhua Xia
