Bioinformatics and Molecular Evolution Relationship

Slide Note

Bioinformatics plays a crucial role in understanding the molecular evolution process, from DNA sequences to protein function, highlighting the importance of evolutionary context in predicting protein function. Homology and analogy concepts provide insights into shared ancestry and convergent evolution, shaping our understanding of related proteins and their functions.

rost_a

Uploaded on Feb 18, 2025

Presentation Transcript

MCB3421 - 2024 Class 2

What does Bioinformatics have to do with Molecular Evolution? DNA sequence -> transcription -> translation -> protein folding -> protein function (catalytic and other properties) -> properties of the organism(s) -> properties of the population and community -> ecology (also taking the non-biological environment into account). Most scientists believe in the principle of reductionism (plus new laws and relations emerging on each level); however, at several steps along the way from DNA to function, our understanding of the chemical and physical processes involved is so incomplete that predicting protein function from only a single DNA sequence and first principles is at present impossible for proteins of reasonable size. Solution: use evolutionary context: "Nothing in biology makes sense except in the light of evolution" (Theodosius Dobzhansky). Present-day proteins evolved through substitution and selection from ancestral proteins. Related proteins have similar sequence AND similar structure AND similar function.

Theodosius Dobzhansky "Nothing in biology makes sense except in the light of evolution"

Homology: bird wing, bat wing, human arm (by Bob Friedman). Homology is similarity due to shared ancestry.

Homology vs analogy: A priori, sequences could be similar due to convergent evolution. Homology (shared ancestry) versus analogy (convergent evolution): bird wing, butterfly wing, bat wing, fly wing.

Related proteins: Present-day proteins evolved through substitution and selection from ancestral proteins. Related proteins have similar sequence AND similar structure AND similar function (at least if they did not diverge too much). In the above mantra, "similar function" can refer to: identical function; similar function, e.g. identical reactions catalyzed in different organisms, or the same catalytic mechanism but a different substrate (malic and lactic acid dehydrogenases); similar subunits and domains that are brought together through a (hypothetical) process called domain shuffling, e.g. nucleotide binding domains in hexokinase, myosin, HSP70, and ATP synthases.

Homology: Two sequences are homologous if there existed an ancestral molecule in the past that is ancestral to both of the extant sequences. Homology is a "yes" or "no" character ("don't know" is also possible, as is "very likely"). Either sequences (or characters) share ancestry or they don't (like pregnancy). Molecular biologists often use homology as a synonym for similarity or percent identity. One often reads: sequences A and B are 70% homologous. To an evolutionary biologist this sounds as wrong as 70% pregnant. Important types of homology: Orthology: a bifurcation in the molecular tree reflects speciation. Paralogy: a bifurcation in the molecular tree reflects gene duplication. Other types of homology that are often distinguished are synology (due to genome fusion) and xenology (due to gene transfer).
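The distinction between a measured quantity (percent identity) and a yes/no inference (homology) can be illustrated with a minimal sketch. The function name `percent_identity`, the gap character "-", and the two toy sequences are assumptions made for illustration; they are not from the slides, and real tools differ in how they count gapped columns.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of compared columns in which both aligned sequences carry the same residue.

    Columns where either sequence has a gap ("-") are skipped. This is one of
    several conventions in use, chosen here only to keep the sketch short.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gapped column: not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Two hypothetical, already aligned toy sequences.
seq_a = "MKTAYIAKQR"
seq_b = "MKTAHIGKQR"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.0f}% identical")
```

The number returned is a property of the alignment. Whether the two sequences are homologous is a separate conclusion that is drawn from the similarity, so the correct phrasing is "A and B are 80% identical", not "80% homologous".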

Sequence Similarity vs Homology: The following is based on observation and not on an a priori truth: If two (complex) sequences show significant similarity in their primary sequence, they have shared ancestry (i.e. they are homologs), and probably similar function (although some proteins acquired radically new functional assignments, e.g. lens crystallins).

The Size of Protein Sequence Space (back of the envelope calculation): Consider a protein of 600 amino acids. Assume that for every position there could be any of the twenty possible amino acids. Then the total number of possibilities is 20 choices for the first position times 20 for the second position times 20 for the third ... = 20^600 ≈ 4 × 10^780 different proteins possible with a length of 600 amino acids. For comparison: the universe contains only about 10^89 protons and has an age of about 5 × 10^17 seconds or 5 × 10^29 picoseconds. If every proton in the universe were a super-computer that explored one possible protein sequence per picosecond, we would only have explored 5 × 10^118 sequences, i.e. a negligible fraction of the possible sequences of length 600 (one in about 10^662).
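The back of the envelope numbers can be checked with exact integer arithmetic. This is a minimal sketch: the helper names `sequence_space_size` and `scientific` are made up for illustration, and the proton count and age of the universe are the rough figures quoted on the slide, not independently verified values.

```python
def sequence_space_size(length, alphabet_size=20):
    """Number of distinct sequences of the given length over the alphabet."""
    return alphabet_size ** length


def scientific(n):
    """Return (mantissa, exponent) so that n is roughly mantissa * 10**exponent.

    Works on positive integers far too large to convert to a float.
    """
    digits = str(n)
    exponent = len(digits) - 1
    mantissa = float(digits[0] + "." + digits[1:6])
    return mantissa, exponent


# Figures taken from the slide (order-of-magnitude estimates).
PROTONS_IN_UNIVERSE = 10 ** 89
AGE_IN_PICOSECONDS = 5 * 10 ** 29

total = sequence_space_size(600)
explored = PROTONS_IN_UNIVERSE * AGE_IN_PICOSECONDS  # one sequence per proton per picosecond
one_in = total // explored

for label, value in [("possible sequences", total),
                     ("explored sequences", explored),
                     ("explored fraction is one in", one_in)]:
    mantissa, exponent = scientific(value)
    print(f"{label}: about {mantissa:.1f} x 10^{exponent}")
```

Python integers have arbitrary precision, so 20^600 is computed exactly; only the final display is rounded.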

Ways to construct sequence space: Figure from Eigen et al. 1988 illustrating the construction of a high dimensional sequence space. Each additional sequence position adds another dimension, doubling the diagram for the shorter sequence. Shown is the progression from a single sequence position (line) to a tetramer (hypercube). A four (or twenty) letter code can be accommodated either through allowing four (or twenty) values for each dimension (Rechenberg 1973; Casari et al. 1995), or through additional dimensions (Eigen and Winkler-Oswatitsch 1992).
Eigen, M. and R. Winkler-Oswatitsch (1992). Steps Towards Life: A Perspective on Evolution. Oxford; New York, Oxford University Press.
Eigen, M., R. Winkler-Oswatitsch and A. Dress (1988). "Statistical geometry in sequence space: a method of quantitative comparative sequence analysis." Proc Natl Acad Sci U S A 85(16): 5913-7.
Casari, G., C. Sander and A. Valencia (1995). "A method to predict functional residues in proteins." Nat Struct Biol 2(2): 171-8.
Rechenberg, I. (1973). Evolutionsstrategie; Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Stuttgart-Bad Cannstatt, Frommann-Holzboog.
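The construction can be sketched in a few lines, assuming a binary alphabet for the hypercube case and treating the letters of the alphabet as the allowed values along each dimension. The function names `sequence_space` and `neighbors` are made up for illustration and do not come from the cited papers.

```python
from itertools import product


def sequence_space(length, alphabet="01"):
    """All sequences of the given length: the vertices of the sequence space."""
    return ["".join(letters) for letters in product(alphabet, repeat=length)]


def neighbors(seq, alphabet="01"):
    """All sequences that differ from seq at exactly one position."""
    result = []
    for i, current in enumerate(seq):
        for letter in alphabet:
            if letter != current:
                result.append(seq[:i] + letter + seq[i + 1:])
    return result


# Each added position doubles the number of vertices of the binary space.
for length in range(1, 5):
    print(length, len(sequence_space(length)))

# Tetramer over a binary alphabet: a four-dimensional hypercube.
print(neighbors("0000"))

# Four-letter code with four values per dimension.
print(len(sequence_space(2, "ACGT")), len(neighbors("AC", "ACGT")))
```

With a binary alphabet every vertex of the tetramer space has four neighbors, one per dimension. With four values per dimension a sequence of length 2 already has six neighbors, three per position.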

Size of protein space versus connectivity: While the size of the combinatorial space for proteins is unimaginable (for 600 amino acid long proteins this space has 4 × 10^780 vertices), this space is also highly connected; that is, it takes at most 600 steps (counting a mutation of one amino acid in the sequence as a step) to get from an arbitrary point in this space to any other arbitrary point.
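The connectivity argument amounts to saying that the number of steps between two sequences of equal length is their Hamming distance, which can never exceed the sequence length. A minimal sketch follows; the function names are made up for illustration, and the two random sequences are arbitrary test data with a fixed seed, not real proteins.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the twenty standard amino acids


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def mutation_path(start, target):
    """Walk from start to target by changing one differing position per step."""
    if len(start) != len(target):
        raise ValueError("sequences must have the same length")
    current = list(start)
    path = [start]
    for i, residue in enumerate(target):
        if current[i] != residue:
            current[i] = residue
            path.append("".join(current))
    return path


rng = random.Random(0)  # fixed seed so the run is repeatable
seq_a = "".join(rng.choice(AMINO_ACIDS) for _ in range(600))
seq_b = "".join(rng.choice(AMINO_ACIDS) for _ in range(600))

steps = len(mutation_path(seq_a, seq_b)) - 1
print(f"steps needed: {steps} (upper bound: {len(seq_a)})")
```

This only shows that a short path exists in the space. It says nothing about whether the intermediate sequences would be functional or favored by selection.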

No similarity vs no homology: If two (complex) sequences show significant similarity in their primary sequence, they have shared ancestry, and probably similar function. >>> THE REVERSE IS NOT TRUE !! <<< PROTEINS WITH THE SAME OR SIMILAR FUNCTION DO NOT ALWAYS SHOW SIGNIFICANT SEQUENCE SIMILARITY, for one of two reasons: a) they evolved independently (e.g. different types of nucleotide binding sites), i.e. they are not homologous; or b) they underwent so many substitution events that there is no readily detectable similarity remaining, i.e. they are homologous, but the homology can no longer be inferred from the similarity of the primary sequence (too many substitutions). Corollary: PROTEINS WITH SHARED ANCESTRY DO NOT ALWAYS SHOW SIGNIFICANT SIMILARITY.

The space of possible protein folds: According to the rules of combinatorics, an astronomical number of possible different amino acid sequences exists. Levinthal estimated that the number of possible conformations for a given protein sequence is also astronomical. He calculated that a protein with 100 amino acids has 99 peptide bonds, each with a phi and a psi angle, resulting in a total of 198 angles around which the peptide backbone can rotate. If each of these can exist in three possible energetic minima (the sidechains are staggered), one arrives at 3^198 ≈ 3 × 10^94 possible conformations. Levinthal's paradox says: if a protein were to randomly explore all possible conformations, it would require a time longer than the age of the universe to arrive at its correct native conformation.
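Levinthal's estimate can be reproduced with the same kind of integer arithmetic. In this sketch the sampling rate of one conformation per picosecond is an assumption made only for illustration, and the age of the universe (about 5 × 10^17 seconds) is the rough figure used in the sequence space slide. The function name `conformation_count` is also made up.

```python
def conformation_count(residues, angles_per_bond=2, states_per_angle=3):
    """Levinthal-style count: (residues - 1) peptide bonds, each with a phi and a psi
    angle, and each angle restricted to a small number of energetic minima."""
    peptide_bonds = residues - 1
    return states_per_angle ** (peptide_bonds * angles_per_bond)


# Assumptions for the time estimate (not part of Levinthal's count itself).
CONFORMATIONS_PER_SECOND = 10 ** 12    # one conformation per picosecond
AGE_OF_UNIVERSE_SECONDS = 5 * 10 ** 17

total = conformation_count(100)  # 3**198
seconds_needed = total // CONFORMATIONS_PER_SECOND
universe_ages = seconds_needed // AGE_OF_UNIVERSE_SECONDS

print(f"conformations: a {len(str(total))}-digit number")
print(f"exhaustive search: a {len(str(universe_ages))}-digit number of universe ages")
```

Even with a sampling rate this generous, an exhaustive random search is hopeless, which is the point of the paradox: real proteins cannot be folding by random exploration of all conformations.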

The phi and psi angles associated with each alpha carbon along the peptide chain. Formation of a peptide bond. Illustration of the peptide plane (gray area) and the phi and psi angles. The red line formed by the repeating -Cα-C-N-Cα- is the backbone of the peptide chain. From: https://en.wikipedia.org/wiki/Peptide_bond