Understanding the Basics of Biology - Introduction to DNA, Genes, and Proteins

 
A Zero-Knowledge Based Introduction to Biology
Bo Yoo
January 13, 2021
Announcements
Website: cs273a.stanford.edu
Please sign up for Piazza
CA Office Hours:
o Vote on Piazza by 5PM PST 1/15
o Starting the week of 1/18

Homework 1 will be released next Wednesday (1/20)
o Due 11:59PM 2/1 (via email)
o You have 3 late days (can be used on homework only)
o Read the instructions carefully (what files to submit, etc.)
o Post questions on Piazza instead of emailing us
o Include the question number in the subject line
o 2 problems
o Refer to the tutorials for clarifications/examples
 
Introduction to the Human Genome

Human Genome
3 billion base pairs: A, T, G, C
Complementary bases: A-T and C-G
Full DNA sequence in virtually all cells
DNA is the blueprint for life: a cookbook with many “recipes” (genes) for proteins
Proteins do most of the work in biology
 
Protein coding genes
In humans: a set of 20-25K genes that eventually become translated into proteins
The number of genes differs by species!
Seemingly less complex organisms may have a large number of genes
E.g., human (20-25K genes) vs. rice (51K genes)

How are proteins made from DNA?

Central Dogma of Biology
 
Gene Transcription
DNA -> RNA

DNA (deoxyribonucleic acid) vs. RNA (ribonucleic acid)
Deoxyribose in DNA; ribose in RNA

RNA Nucleobases
Purines: Adenine (A), Guanine (G)
Pyrimidines: Cytosine (C), Uracil (U)

Genes are transcribed from the template strand
 
Gene Transcription (DNA -> RNA)
Coding strand (+):    5’ G A T T A C A . . . 3’
Template strand (-):  3’ C T A A T G T . . . 5’
Strands are separated (DNA helicase)
An RNA copy that matches the coding strand (except T -> U) is made from the template strand:
pre-mRNA:             5’ G A U U A C A . . . 3’
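A minimal Python sketch of the rule above, assuming the template strand is passed in 5'->3' order. The helpers `transcribe` and `coding_to_rna` are illustrative names, not a library API. Because RNA is built antiparallel to the template, each template base is complemented and the result reversed; the outcome equals the coding strand with T replaced by U.

```python
# Pairing rule used when reading the template strand:
# template A -> RNA U, C -> G, G -> C, T -> A.
TEMPLATE_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe(template_5to3: str) -> str:
    """Return the pre-mRNA (5'->3') copied from a template strand given 5'->3'.

    The RNA is antiparallel to the template, so complement every base and
    then reverse the string to report the RNA in 5'->3' order.
    """
    return template_5to3.upper().translate(TEMPLATE_TO_RNA)[::-1]


def coding_to_rna(coding_5to3: str) -> str:
    """Shortcut: the pre-mRNA equals the coding strand with T replaced by U."""
    return coding_5to3.upper().replace("T", "U")


# The slide's template strand, 3'-CTAATGT-5', reads TGTAATC when written 5'->3'.
print(transcribe("TGTAATC"))      # GAUUACA
print(coding_to_rna("GATTACA"))   # GAUUACA
```

Both routes give the same RNA, which is why the slide can say the copy "matches the coding strand".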
Genes can be found on both strands
Coding and template strands are relative to the gene
A gene can be on the minus strand (the reverse complement)
In general, genomic sequences are written in positive-strand coordinates
Plus strand:                      5’ G A T T A C A 3’
Minus strand:                     3’ C T A A T G T 5’
pre-mRNA of a minus-strand gene:  5’ U G U A A U C . . . 3’
Reverse complement
From the positive strand, you can use the reverse complement to get what the gene on the minus strand would be
Reverse complement: reverse the sequence and change each base to its complementary base (i.e., A to T/U, T/U to A, C to G, G to C)
Positive strand:                  5’ G A T T A C A . . . A T G G A A C 3’
pre-mRNA on the positive strand:  5’ G A U U A C A . . . 3’
pre-mRNA on the minus strand:     5’ G U U C C A U . . . 3’
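To make the operation concrete, here is a short Python sketch. The function name `reverse_complement` and its `rna` flag are our own choices, not a library API. It complements each base, reverses the string, and optionally writes U in place of T so the result reads as the pre-mRNA of a gene on the opposite strand.

```python
# Complement table; U is accepted as input and pairs with A like T does.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Reverse-complement a nucleotide sequence written 5'->3'.

    With rna=True the result uses U instead of T, i.e. the pre-mRNA that a
    gene on the opposite strand would produce.
    """
    rc = seq.upper().translate(COMPLEMENT)[::-1]
    return rc.replace("T", "U") if rna else rc


print(reverse_complement("ATGGAAC", rna=True))  # GUUCCAU
print(reverse_complement("GATTACA", rna=True))  # UGUAAUC
```

Applying the function twice returns the original sequence, a handy sanity check when switching between strand coordinates.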
RNA Processing
[Figure: pre-mRNA processed into mRNA; labels: 5’ cap, poly(A) tail, exon, intron, 5’ UTR, 3’ UTR]
 
Gene Translation
RNA -> Protein

From RNA to Protein
Proteins are long strings of amino acids joined by peptide bonds
Translation from RNA sequence to amino acid sequence is performed by ribosomes
20 amino acids: 3 RNA letters (a codon) are required to specify a single amino acid
o 1 letter can code for up to 4
o 2 letters can code for up to 16
o 3 letters can code for up to 64
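The counting argument above is just powers of four. A small Python sketch (helper names are ours) checks which codon length is the shortest that can distinguish 20 amino acids:

```python
def codon_capacity(length: int, alphabet_size: int = 4) -> int:
    """Number of distinct words of a given length over the 4 RNA bases (A, C, G, U)."""
    return alphabet_size ** length


def min_codon_length(n_amino_acids: int = 20, alphabet_size: int = 4) -> int:
    """Smallest word length whose capacity is at least n_amino_acids."""
    length = 1
    while codon_capacity(length, alphabet_size) < n_amino_acids:
        length += 1
    return length


for k in (1, 2, 3):
    print(k, codon_capacity(k))   # 1 4, then 2 16, then 3 64
print(min_codon_length())         # 3
```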
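Open Reading Frame (ORF)
An open reading frame is a reading frame that can be translated (RNA -> Protein)
It contains a continuous run of codons that starts with a start codon (usually AUG) and ends with a stop codon (usually UAA, UAG, or UGA), inclusive
ORF example: 5’ . . . A U U | A U G G C C U G G A C U U G A . . . 3’
o AUU lies in the UTR; AUG is the start codon and UGA is the stop codon
o AUG GCC UGG ACU UGA translates to Met Ala Trp Thr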
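A minimal Python sketch of the definition, scanning a single reading frame. The function `orfs_in_frame` is hypothetical. It treats U as T, reports each ORF from its start codon through the first in-frame stop codon, and ignores a start codon that never reaches a stop. An ATG nested inside an ORF is not reported separately; that is one simplifying choice among several.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def orfs_in_frame(seq: str, frame: int) -> list[str]:
    """Return ORFs (start codon through stop codon, inclusive) in one frame.

    seq may be DNA or RNA; U is converted to T, so results use T.
    frame is the offset (0, 1 or 2) of the first codon.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    start = None  # index of the currently open start codon, if any
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == START:
            start = i
        elif start is not None and codon in STOPS:
            orfs.append(seq[start:i + 3])
            start = None
    return orfs


# Example from the slide: 5'...AUU AUG GCC UGG ACU UGA...3'
print(orfs_in_frame("AUUAUGGCCUGGACUUGA", 0))  # ['ATGGCCTGGACTTGA']
```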
Finding ORFs
6 strand/frame combinations:
o +/- strands
o 3 frames, because codons are triplets
All of them can contain an open reading frame
+ strand: AATTCATGCGTTTTGACCATCAAATGGCATAACG
Reverse complement (change A->T, T->A, C->G, G->C, then reverse the sequence)
- strand: CGTTATGCCATTTGATGGTCAAAACGCATGAATT
Finding ORFs
+ strand:          AATTCATGCGTTTTGACCATCAAATGGCATAACG
+ strand/frame 0:  AAT TCA TGC GTT TTG ACC ATC AAA TGG CAT AAC G
+ strand/frame 1:  A ATT CAT GCG TTT TGA CCA TCA AAT GGC ATA ACG
+ strand/frame 2:  AA TTC ATG CGT TTT GAC CAT CAA ATG GCA TAA CG
+ strand/frame 3:  AAT TCA TGC GTT TTG ACC ATC AAA TGG CAT AAC G (same as frame 0!)
Start codon: ATG; stop codons: TAA, TAG, TGA
ORF: + strand/frame 2, ATG CGT TTT GAC CAT CAA ATG GCA TAA
 
Finding ORFs
- strand:          CGTTATGCCATTTGATGGTCAAAACGCATGAATT
- strand/frame 0:  CGT TAT GCC ATT TGA TGG TCA AAA CGC ATG AAT T
- strand/frame 1:  C GTT ATG CCA TTT GAT GGT CAA AAC GCA TGA ATT
- strand/frame 2:  CG TTA TGC CAT TTG ATG GTC AAA ACG CAT GAA TT
Start codon: ATG; stop codons: TAA, TAG, TGA
ORF: - strand/frame 1, ATG CCA TTT GAT GGT CAA AAC GCA TGA
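Putting the pieces together, a Python sketch of a six-frame search. All function names are hypothetical, and the ORF rule is the simple one used on these slides: a start codon through the next in-frame stop codon. Only three offsets per strand are searched because an offset of 3 gives the same codons as frame 0. On the slide's example sequence it reports one ORF on each strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse (gives the - strand read 5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq: str, frame: int) -> list[str]:
    """ORFs from a start codon through the next in-frame stop codon."""
    orfs, start = [], None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == START:
            start = i
        elif start is not None and codon in STOPS:
            orfs.append(seq[start:i + 3])
            start = None
    return orfs


def six_frame_orfs(seq: str) -> list[tuple[str, int, str]]:
    """Search frames 0-2 on both strands; frame 3 would repeat frame 0."""
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for orf in orfs_in_frame(s, frame):
                hits.append((strand, frame, orf))
    return hits


for hit in six_frame_orfs("AATTCATGCGTTTTGACCATCAAATGGCATAACG"):
    print(hit)
# ('+', 2, 'ATGCGTTTTGACCATCAAATGGCATAA')
# ('-', 1, 'ATGCCATTTGATGGTCAAAACGCATGA')
```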
 
Translation: codons code for different amino acids
 
Translation
 
The ribosome (a complex of protein and RNA) synthesizes a
protein by reading the mRNA in triplets (codons). Each codon is
translated to an amino acid.
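A compact way to see the codon-to-amino-acid mapping is the standard genetic code table. The Python sketch below builds it from the usual 64-letter encoding (bases ordered T, C, A, G; `*` marks stop codons) and translates an mRNA from its first base until a stop codon. `translate` is our own helper, and it assumes the input already begins at the start codon.

```python
BASES = "TCAG"
# Standard genetic code, one-letter amino acids, codons in TCAG order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(mrna: str) -> str:
    """Translate codon by codon from position 0, stopping at the first stop codon."""
    seq = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCCUGGACUUGA"))  # MAWT (Met Ala Trp Thr)
```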
 
Gene Structure
[Figure: gene layout from 5’ to 3’; labels: promoter, 5’ UTR, exons, introns, 3’ UTR; coding vs. non-coding regions]
Alternative splicing
Alternative splicing gives rise to different proteins from the same sequence
Use of different exons may result in a different start codon, stop codon, and even reading frame
This produces different isoforms of the gene (functionally similar proteins that do not have identical amino acid sequences)
Exon 1: CATGA    Exon 2: TGCATGT    Exon 3: CTAAGTAG
Note: Exons don’t have to be in triplets
Alternative splicing
Exons used   Full sequence          Coding sequence              Resulting AA sequence
1, 2, 3      CATGATGCATGTCTAAGTAG   C ATG ATG CAT GTC TAA GTAG   MMHV
1, 3         CATGACTAAGTAG          C ATG ACT AAG TAG            MTK
2, 3         TGCATGTCTAAGTAG        TGC ATG TCT AAG TAG          MSK
1, 2         CATGATGCATGT           C ATG ATG CAT GT             No stop codon found
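The splicing examples can be reproduced with a short Python sketch. It assumes, as the slide's examples do, that translation begins at the first ATG of the spliced sequence and runs to the first in-frame stop codon. `splice_and_translate` and `EXONS` are illustrative names of ours, and the codon table is the standard genetic code.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

EXONS = ["CATGA", "TGCATGT", "CTAAGTAG"]  # Exon 1, Exon 2, Exon 3 from the slide


def splice_and_translate(exons, used):
    """Join the chosen exons (1-based numbers) and translate from the first ATG.

    Returns the amino acid string, or None if there is no ATG or the
    reading frame runs off the end without reaching a stop codon.
    """
    mrna = "".join(exons[n - 1] for n in used)
    start = mrna.find("ATG")
    if start == -1:
        return None
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            return "".join(protein)
        protein.append(amino_acid)
    return None  # no stop codon found


for used in [(1, 2, 3), (1, 3), (2, 3), (1, 2)]:
    print(used, splice_and_translate(EXONS, used))
# (1, 2, 3) MMHV
# (1, 3) MTK
# (2, 3) MSK
# (1, 2) None
```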
 
Most of Our Genome Does Not Code for Proteins!

What does the rest of the genome do?
3 billion base pairs in our genome
o 1-2% coding (codes for proteins)
o 10-20% regulatory
These regulatory elements give rise to differentiation
1 million regulatory elements (switches) enable:
o Precise control for turning genes on/off
o Diverse cell types (lung, heart, skin)
Analogy: making specific recipes (genes) for a full meal from a large cookbook (genome) at a given time
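To put those percentages in absolute terms, a quick Python calculation using only the slide's numbers (3 billion bp; 1-2% coding; 10-20% regulatory). The helper `bp_range` is ours; integer arithmetic keeps the results exact.

```python
GENOME_BP = 3_000_000_000  # approximate human genome size from the slide


def bp_range(low_pct: int, high_pct: int, genome_bp: int = GENOME_BP) -> tuple[int, int]:
    """Base pairs covered by a percentage range of the genome."""
    return genome_bp * low_pct // 100, genome_bp * high_pct // 100


print("coding:", bp_range(1, 2))        # coding: (30000000, 60000000)
print("regulatory:", bp_range(10, 20))  # regulatory: (300000000, 600000000)
```

So roughly 30-60 million base pairs code for proteins, while about ten times as much sequence is regulatory.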
 
Gene Expression Regulation
Determines when each gene should be expressed
Why? Every cell has the same DNA, but each cell expresses different proteins.
 
Different Cell Types
Subsets of the DNA sequence determine the identity and function of different cells
Regulatory Elements
Expression is modulated by regulatory elements: enhancers, promoters, silencers
They regulate transcription (DNA -> RNA) of a gene
CS analogy:
o Genes are like variable assignments (a = 7)
o Regulatory elements are the control flow and complex logic

Transcription factors (TFs):
o Proteins that recognize sequence motifs in enhancers and promoters
o Combinatorial switches that turn genes on/off
o The complex assists or inhibits formation of the RNA polymerase machinery
 
 
 
Transcription Factor Binding Sites
Short, degenerate DNA sequences recognized by particular transcription factors
In complex organisms, cooperative binding of multiple transcription factors is required to initiate transcription
[Figure: binding sequence logo]
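"Degenerate" means several bases are acceptable at some positions. One common way to write such a motif is with IUPAC nucleotide codes (for example, S = C or G, N = any base). The Python sketch below, using a made-up motif and sequence, turns such a code into a regular expression and reports every match, overlapping ones included, on the given strand. It is a simple consensus scan, cruder than the per-position base frequencies a sequence logo displays; the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes: each letter stands for a set of allowed bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def motif_to_regex(motif: str) -> str:
    """Turn a degenerate motif such as 'TGANTCA' into a character-class regex."""
    return "".join(f"[{IUPAC[code]}]" for code in motif.upper())


def find_sites(seq: str, motif: str) -> list[int]:
    """0-based start positions of every match, overlapping matches included."""
    # A zero-width lookahead lets consecutive matches overlap.
    pattern = re.compile(f"(?={motif_to_regex(motif)})")
    return [m.start() for m in pattern.finditer(seq.upper())]


# Hypothetical motif and sequence, for illustration only.
print(find_sites("CCTGACTCAGGTGAGTCAT", "TGANTCA"))  # [2, 11]
```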
 
Repeats
Sequences that repeat many times in the genome
About 50% of the genome

1. Interspersed repeats (transposable elements)
o Use mechanisms that are still not fully understood to multiply themselves and move around in the genome

2. Simple repeats
o Every possible motif of mono-, di-, tri-, and tetranucleotide repeats is vastly overrepresented in the human genome
o These are called microsatellites
o Longer repeating units are called minisatellites
o The really long ones are called satellites
o Examples: AAAAAAAAA, CACACACAC, CAACAACAA
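Simple repeats like these are periodic strings, so their repeating unit can be found by testing each candidate period. A Python sketch with helper names of our own; the classification only encodes what the slide states (1-4 nt units are microsatellites) and does not split longer units further, because the slide gives no length cutoffs for minisatellites and satellites.

```python
def repeat_unit(seq: str) -> str:
    """Shortest unit u such that seq is u repeated (a partial last copy is allowed)."""
    for period in range(1, len(seq) + 1):
        if all(seq[i] == seq[i - period] for i in range(period, len(seq))):
            return seq[:period]
    return seq  # only reached for an empty string


def classify(seq: str) -> str:
    """Label a tandem repeat by the length of its repeating unit."""
    unit = repeat_unit(seq)
    if not seq or len(seq) < 2 * len(unit):
        return "not a tandem repeat"
    if len(unit) <= 4:
        return "microsatellite"
    return "longer repeat unit (minisatellite or satellite)"


for example in ["AAAAAAAAA", "CACACACAC", "CAACAACAA"]:
    print(example, repeat_unit(example), classify(example))
```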
 
Still a lot that we don’t know

Mutation: Errors
Mutations in the Genome
Over our lifetime, our DNA replicates trillions of times with the help of DNA polymerase
But even polymerase is “imperfect”: every now and then (roughly 1 in every 100,000 bp), DNA polymerase makes a mistake in replication, resulting in “mutations”
There are other sources of mutation, including smoking, sunlight, and radiation

Single Nucleotide Changes

Mutation: Structural Abnormalities
 
Evolution = Mutation + Selection

Human Mutation Rate
Recent sequencing analysis suggests ~40-60 new mutations in a child that were not present in either parent
Mutations range from the smallest possible (a single base pair change) to the largest (whole genome duplication)
Selection does not tolerate all of these mutations, but it does tolerate some
[Figure: selection over time; harmful, beneficial, and neutral mutations]
Summary
All hereditary information is encoded in double-stranded DNA
Each cell in an organism has the same DNA
DNA -> RNA -> protein
Proteins have many diverse roles in the cell
Gene regulation diversifies protein products across different cells
Only a very small portion of the genome actually codes for proteins; much of it is repeats and regulatory elements
Mutations and repeats may get passed down through generations
Evolution happens through mutation and selection
