http://cs273a.stanford.edu [Bejerano Fall09/10] 1
This Friday 10am Beckman B-200
Introduction to text processing lingos.
Lecture 3
Genome Content:
Repetitive Sequences
Genes
Our Place in the Tree of Life
[Human Molecular Genetics, 3rd Edition]
you are here
Metazoans (multi-cellular organisms)
[Human Molecular Genetics, 3rd Edition]
you are here
Vertebrates
[Human Molecular Genetics, 3rd Edition]
you are here
Opossum, Lizard, Stickleback
Figure from Ryan Gregory (2005)
Interspecies variation in genome size within various groups of organisms
Meet Your Genome Continues
[Human Molecular Genetics, 3rd Edition]
Repeats / Mobile Elements ("selfish DNA")
Human Genome: 3*10^9 letters
1.5% known function
>50% junk
[Adapted from Lunter]
TE composition and assortment vary among eukaryotic genomes
[Figure: stacked bar chart, 20% to 100%, showing the share of DNA transposons, LTR retrotransposons, and non-LTR retrotransposons in each genome. Species: slime mold, budding yeast, fission yeast, Neurospora, Arabidopsis, rice, nematode, Drosophila, mosquito, Fugu, mouse, human.]
Feschotte & Pritham 2006
Assembly Challenges
Inferring Phylogeny Using Repeats
[Nishihara et al, 2006]
Functional Elements from Mobile Elements
[Yass is a small town in New South Wales, Australia.]
Co-option event, probably due to favorable genomic context
[Bejerano et al., Nature 2006]
The amount of TE DNA correlates positively with genome size
[Figure: bar chart, 0 to 3000 Mb, comparing genomic DNA, TE DNA, and protein-coding DNA per genome. Species: Plasmodium, slime mold, budding yeast, fission yeast, Neurospora, Arabidopsis, Brassica, rice, maize, nematode, Drosophila, mosquito, sea squirt, zebrafish, Fugu, mouse, human.]
Feschotte & Pritham 2006
TEs
Protein-coding genes
The proportion of protein-coding genes decreases with genome size, while the proportion of TEs increases with genome size
Gregory, Nat Rev Genet 2005
Genome Size Variability
1pg = 978 Mb
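The conversion above can be applied directly when genome sizes are reported as DNA mass. A minimal sketch, assuming only the 978 Mb per pg factor from the slide; the function names and the sample value of 3.5 pg are illustrative and not from the lecture.

```python
MB_PER_PG = 978  # conversion factor from the slide: 1 pg = 978 Mb


def pg_to_mb(picograms):
    """Convert a DNA mass in picograms to a genome size in megabases."""
    return picograms * MB_PER_PG


def mb_to_pg(megabases):
    """Convert a genome size in megabases to a DNA mass in picograms."""
    return megabases / MB_PER_PG


# Hypothetical sample value, chosen only to show the call.
print(pg_to_mb(3.5), "Mb")
```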
Simple Repeats
•Every possible motif of mono-, di-, tri-, and tetranucleotide repeats is vastly overrepresented in the human genome.
•These are called microsatellites. Longer repeating units are called minisatellites. The really long ones are called satellites.
•Highly polymorphic in the human population.
•Highly heterozygous in a single individual.
•As a result, microsatellites are used in paternity testing, forensics, and the inference of demographic processes.
•There is no clear definition of how many repetitions make a simple repeat, nor how imperfect the different copies can be.
•Highly variable between genomes: e.g., using the same search criteria, the mouse and rat genomes have 2-3 times more microsatellites than the human genome. They’re also longer in mouse and rat.
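Because the slide notes there is no agreed definition of a simple repeat, any scan has to pick its own thresholds. The sketch below is one such choice, not the lecture's method: it reports only perfect tandem repeats with units of 1 to 4 bases, and the function name, `max_unit`, and `min_repeats` are assumptions made for illustration.

```python
import re


def find_microsatellites(seq, max_unit=4, min_repeats=5):
    """Return (start, unit, copies) for perfect tandem repeats.

    Assumed criteria: unit length 1..max_unit, at least min_repeats
    back-to-back copies, no mismatches allowed.
    """
    # The lazy {1,n}? tries the shortest unit first, so "AAAAAA" is
    # reported as unit "A" rather than unit "AA".
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Toy sequence: a (CA)6 dinucleotide repeat flanked by unique bases.
print(find_microsatellites("TTG" + "CA" * 6 + "GGT"))
```

Loosening `min_repeats` or allowing mismatches changes the counts considerably, which is one reason comparisons between genomes need identical search criteria.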
Restriction enzymes recognize and cut within specific palindromic sequences in the DNA, known as restriction sites. A site is usually 4 or 6 base pairs long.
blunt end
sticky end
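"Palindromic" here means that the site reads the same on both strands, i.e. it equals its own reverse complement. A minimal sketch of that check follows; the function names are illustrative, and GAATTC (the well-known EcoRI site) is used as an example that does not appear on the slide.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic_site(site):
    """True if the site equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


print(is_palindromic_site("GAATTC"))  # EcoRI recognition site
print(is_palindromic_site("GAATTA"))  # one base changed
```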
DNA Fingerprint Basics
DNA fragments of different size will be produced by a restriction enzyme that cuts at the points shown by the arrows.
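The fragment sizes follow from where the recognition site occurs in the sequence. The sketch below is a simplified model under stated assumptions: it treats the DNA as a linear single strand, takes the site and the cut position within it as parameters, and uses a toy sequence. The `digest` function and its arguments are hypothetical names, and the example uses EcoRI, which cuts after the first base of GAATTC.

```python
def digest(seq, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence.

    A cut is made cut_offset bases into every occurrence of site.
    Only the top strand is modeled, so overhangs are ignored.
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy sequence with two EcoRI sites.
toy = "AAAGAATTCCCCCGAATTCTT"
print(digest(toy, "GAATTC", 1))
```

Two individuals whose sequences differ in the number or spacing of sites, or in the length of a repeat between two sites, yield different fragment-length lists from the same enzyme.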
DNA fragments are then separated based on size using gel electrophoresis.
DNA fingerprinting can be used in paternity testing or murder cases.
From an evolutionary point of view transposons and simple repeats are very different.
Different instances of the same transposon share common ancestry (but not necessarily a direct common progenitor).
Different instances of the same simple repeat most often do not.
The Gene-ome makes up < 2% of the human genome (H.G.)
[Human Molecular Genetics, 3rd Edition]
Gene Structure
Signal – a string of DNA recognized by the cellular machinery
Gene Processing
Eukaryotic Gene Structure
Gene Finding – The Practice
Challenge:
“The genes, the whole genes, and nothing but the genes”
Problems:
spliced ESTs legitimate gene isoform?
predicting gene isoforms
tissue/condition-specific genes / gene isoforms
single exon genes
pseudogenes
Practice:
Evolution of Gene Finding Tools
[Figure: timeline of gene-finding tools, 1982 to 2004. Category labels: ab initio, alignment-based, comparative genomics, informant HMM-based, Pair-HMM, Phylo-HMM; intrinsic, extrinsic, hybrid. Evidence labels: DNA, cDNA, protein. Tools named: Procrustes, Genie, GenieEST, GenieESTHOM (2000), ExoFish, Rosetta, Slam, DoubleScan, Twinscan (2001), Genscan (1997), Siepel-Haussler, Jojic-Haussler, etc.]
The Human Gene Set
[HGC, 2001]
[Celera, 2001]
wrong!
Signal Transduction
Ancient Origins of Important Gene Families
Multigene families due to:
Single gene duplication; Segment duplication: tandem duplication or duplication transposition
a b c d e f g
a b c d e f b c d g
Horizontal gene transfer; Genome-wide doubling event
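The segment duplication shown above (a b c d e f g becoming a b c d e f b c d g) can be reproduced as a list operation. This is a toy sketch, with genes as single letters and a hypothetical `duplicate_segment` helper; it models only the rearrangement of gene order, not the underlying mechanism.

```python
def duplicate_segment(genes, start, end, insert_at):
    """Copy genes[start:end] and insert the copy at index insert_at.

    insert_at == end gives a tandem duplication; any other position
    gives a duplication transposition.
    """
    segment = genes[start:end]
    return genes[:insert_at] + segment + genes[insert_at:]


genes = list("abcdefg")

# The slide's example: the copy of b c d lands between f and g.
print(" ".join(duplicate_segment(genes, 1, 4, 6)))

# Tandem duplication of the same segment, for comparison.
print(" ".join(duplicate_segment(genes, 1, 4, 4)))
```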
Horizontal Gene Transfer
Horizontal Gene Transfer in the H.G.
[HGC, 2001]
…
Or is it?
[Kurland et al., 2003]
HGT between fish & their parasites
Retroposed Genes and Pseudogenes