Page 1: This Friday 10am Beckman B-200 Introduction to text processing lingos

http://cs273a.stanford.edu [Bejerano Fall09/10] 1

This Friday 10am Beckman B-200

Introduction to text processing lingos.

Page 2

Lecture 3

Genome Content:

Repetitive Sequences

Genes

Page 3

Our Place in the Tree of Life

[Human Molecular Genetics, 3rd Edition]

you are here

Page 4

Metazoans (multi-cellular organisms)

[Human Molecular Genetics, 3rd Edition]

you are here

Page 5

Vertebrates

[Human Molecular Genetics, 3rd Edition]

you are here

Opossum, Lizard, Stickleback

Page 6

Figure from Ryan Gregory (2005)

INTERSPECIES VARIATION IN GENOME SIZE WITHIN VARIOUS GROUPS OF ORGANISMS

Page 7

Meet Your Genome Continues

[Human Molecular Genetics, 3rd Edition]

Page 9

Repeats / Mobile Elements ("selfish DNA")

Human Genome: 3*10^9 letters

1.5% known function

>50% junk
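As a rough worked example, the percentages on this slide can be turned into base counts. This is only a sketch: the constant and function names are illustrative, and the 50% figure is a lower bound taken directly from the ">50%" on the slide.

```python
# Figures copied from the slide; the names are illustrative, not from the lecture.
GENOME_SIZE_BP = 3 * 10**9      # "3*10^9 letters"
KNOWN_FUNCTION_PCT = 1.5        # "1.5% known function"
JUNK_PCT_LOWER_BOUND = 50.0     # ">50% junk" (a lower bound)


def percent_to_bases(genome_size_bp: int, percent: float) -> int:
    """Convert a percentage of the genome into a number of bases."""
    return round(genome_size_bp * percent / 100)


known_function_bp = percent_to_bases(GENOME_SIZE_BP, KNOWN_FUNCTION_PCT)
junk_bp_at_least = percent_to_bases(GENOME_SIZE_BP, JUNK_PCT_LOWER_BOUND)

print(f"known function: about {known_function_bp / 1e6:.0f} Mb")
print(f"repeats / mobile elements: more than {junk_bp_at_least / 1e6:.0f} Mb")
```

Under these numbers, the part with known function is about 45 Mb, while repeats account for more than 1,500 Mb.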

Page 10

[Adapted from Lunter]

Page 13

TE composition and assortment vary among eukaryotic genomes

[Figure: bar chart with a percentage axis (20% to 100%) showing the share of each TE class (DNA transposons, LTR Retro., Non-LTR Retro.) for Slime mold, Budding yeast, Fission yeast, Neurospora, Arabidopsis, Rice, Nematode, Drosophila, Mosquito, Fugu, Mouse, Human]

Feschotte & Pritham 2006

Page 20

Assembly Challenges

Page 21

Inferring Phylogeny Using Repeats

[Nishihara et al, 2006]

Page 22

Functional elements from Mobile Elements

[Yass is a small town in New South Wales, Australia.]

Co-option event, probably due to favorable genomic context

[Bejerano et al., Nature 2006]

Page 23

The amount of TE correlates positively with genome size

[Figure: bar chart with an axis in Mb (0 to 3000) comparing Genomic DNA, TE DNA, and Protein-coding DNA for Plasmodium, Slime mold, Budding yeast, Fission yeast, Neurospora, Arabidopsis, Brassica, Rice, Maize, Nematode, Drosophila, Mosquito, Sea squirt, Zebrafish, Fugu, Mouse, Human]

Feschotte & Pritham 2006

Page 24

TEs

Protein-coding genes

The proportion of protein-coding genes decreases with genome size, while the proportion of TEs increases with genome size

Gregory, Nat Rev Genet 2005

Page 25

Genome Size Variability

1 pg = 978 Mb
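The conversion factor on this slide is enough to move between DNA mass in picograms and genome size in megabases. A minimal sketch, assuming only the slide's factor of 978 Mb per pg; the function names are made up for illustration.

```python
MB_PER_PG = 978  # from the slide: 1 pg = 978 Mb


def picograms_to_megabases(mass_pg: float) -> float:
    """Convert a DNA mass in picograms to a genome size in megabases."""
    return mass_pg * MB_PER_PG


def megabases_to_picograms(size_mb: float) -> float:
    """Convert a genome size in megabases to a DNA mass in picograms."""
    return size_mb / MB_PER_PG


# A genome of 3*10^9 letters is 3000 Mb.
human_mb = 3000
print(f"{human_mb} Mb is about {megabases_to_picograms(human_mb):.2f} pg")
```

With this factor, a 3000 Mb genome weighs a little over 3 pg.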

Page 26

Simple Repeats

• Every possible motif of mono-, di-, tri- and tetranucleotide repeats is vastly overrepresented in the human genome.

• These are called microsatellites. Longer repeating units are called minisatellites. The really long ones are called satellites.

• Highly polymorphic in the human population.

• Highly heterozygous in a single individual.

• As a result, microsatellites are used in paternity testing, forensics, and the inference of demographic processes.

• There is no clear definition of how many repetitions make a simple repeat, nor how imperfect the different copies can be.

• Highly variable between genomes: e.g., using the same search criteria, the mouse & rat genomes have 2-3 times more microsatellites than the human genome. They’re also longer in mouse & rat.
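A simple way to see what a microsatellite search might look like is a regular expression with a backreference. This is a sketch only: since there is no agreed definition, the thresholds `max_unit=4` and `min_repeats=5` are arbitrary assumptions, the function name is hypothetical, and only perfect (error-free) copies are reported.

```python
import re


def find_microsatellites(seq, max_unit=4, min_repeats=5):
    """Return (start, unit, copies) for perfect tandem repeats.

    A hit is a unit of 1..max_unit bases repeated at least min_repeats
    times in a row. Both thresholds are assumptions, not a standard.
    """
    seq = seq.upper()
    # The lazy quantifier tries the shortest unit first, so a run of A's
    # is reported as a mononucleotide repeat rather than as "AA" repeats.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


example = "GGT" + "CA" * 6 + "TTG" + "A" * 7 + "C"
for start, unit, copies in find_microsatellites(example):
    print(f"position {start}: ({unit}) x {copies}")
```

Changing `min_repeats` or allowing mismatches changes the count of reported repeats, which is one reason comparisons between genomes need to use the same search criteria.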

Page 30

Restriction enzymes recognize and make a cut within specific palindromic sequences, known as restriction sites, in the DNA. This is usually a 4- or 6-base-pair sequence.

blunt end

sticky end
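"Palindromic" here means that the site reads the same on both strands, i.e. it equals its own reverse complement. The sketch below checks that property and locates sites in a sequence. The helper names are hypothetical. GAATTC (cut after the first G) and CCCGGG (cut in the middle) are used as example sites, and the cut offset is something the caller supplies rather than something the code looks up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic_site(site):
    """True if the site equals its own reverse complement."""
    site = site.upper()
    return site == reverse_complement(site)


def find_sites(dna, site):
    """Return 0-based start positions of every occurrence of site."""
    dna, site = dna.upper(), site.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def end_type(site, cut_offset):
    """Classify the ends left by cutting a palindromic site.

    cut_offset is the number of bases before the cut on each strand.
    A cut exactly in the middle leaves blunt ends; otherwise the two
    strands are cut at staggered positions, leaving sticky ends.
    """
    return "blunt" if 2 * cut_offset == len(site) else "sticky"


print(is_palindromic_site("GAATTC"))               # True
print(find_sites("TTGAATTCAAGGAATTCC", "GAATTC"))  # [2, 11]
print(end_type("GAATTC", 1), end_type("CCCGGG", 3))
```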

Page 31

DNA Fingerprint Basics

DNA fragments of different size will be produced by a restriction enzyme that cuts at the points shown by the arrows.
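To make the idea concrete, here is a toy digest of two made-up sequences that differ only in the number of CA repeats between two restriction sites. Everything in it is an assumption for illustration: the sequences, the function names, and the use of GAATTC with a cut one base into the site. Only one strand of a linear molecule is modelled.

```python
def digest(dna, site, cut_offset):
    """Cut a linear sequence at every occurrence of site.

    cut_offset is the number of bases of the site left on the upstream
    fragment. Returns the fragments in their original order.
    """
    dna, site = dna.upper(), site.upper()
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = dna.find(site, start + 1)
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


def fingerprint(dna, site, cut_offset):
    """Fragment lengths, longest first."""
    return sorted((len(f) for f in digest(dna, site, cut_offset)), reverse=True)


# Two hypothetical individuals: same flanking sites, different repeat counts.
person_a = "AT" * 3 + "GAATTC" + "CA" * 4 + "GAATTC" + "GG"
person_b = "AT" * 3 + "GAATTC" + "CA" * 9 + "GAATTC" + "GG"

print(fingerprint(person_a, "GAATTC", 1))  # [14, 7, 7]
print(fingerprint(person_b, "GAATTC", 1))  # [24, 7, 7]
```

The middle fragment is 14 bases in one sample and 24 in the other, which is the kind of length difference that separating fragments by size makes visible.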

Page 32

DNA fragments are then separated based on size using gel electrophoresis.

Page 33

DNA Fingerprinting can be used in paternity testing or murder cases.

Page 35

From an evolutionary point of view transposons and simple repeats are very different.

Different instances of the same transposon share common ancestry (but not necessarily a direct common progenitor).

Different instances of the same simple repeat most often do not.

Page 36

The Gene-ome makes up < 2% of the human genome (H.G.)

[Human Molecular Genetics, 3rd Edition]

Page 37

Gene Structure

Signal – a string of DNA recognized by the cellular machinery
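Treating a signal as a short string suggests the most naive possible detector: plain substring search. The sketch below is an illustration only, not a gene finder. The signal table is an assumption limited to a few well-known motifs (the ATG start codon, the three stop codons, and the GT and AG dinucleotides at the ends of most introns), and the function name is hypothetical.

```python
# A small, assumed table of signals written as DNA strings.
SIGNALS = {
    "start_codon": ("ATG",),
    "stop_codon": ("TAA", "TAG", "TGA"),
    "splice_donor": ("GT",),     # first two bases of most introns
    "splice_acceptor": ("AG",),  # last two bases of most introns
}


def scan_signals(dna):
    """Return {signal name: sorted 0-based start positions} for every match."""
    dna = dna.upper()
    hits = {name: [] for name in SIGNALS}
    for name, motifs in SIGNALS.items():
        for motif in motifs:
            start = dna.find(motif)
            while start != -1:
                hits[name].append(start)
                start = dna.find(motif, start + 1)
        hits[name].sort()
    return hits


for name, positions in scan_signals("CCATGGTAAGTCTAGATAA").items():
    print(name, positions)
```

Even this 19-base toy sequence yields several hits per signal, which illustrates why short signals alone are weak evidence: most matches occur by chance.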

Page 38

Gene Processing

Eukaryotic Gene Structure

Page 39

Gene Finding – The Practice

Challenge:

“The genes, the whole genes, and nothing but the genes”

Problems:

spliced ESTs: legitimate gene isoform?

predicting gene isoforms

tissue/condition-specific genes / gene isoforms

single exon genes

pseudogenes

Practice:

Page 40

Evolution of Gene Finding Tools

[Figure: timeline of gene finding tools. Category labels: Ab-initio, Alignment-based, Comparative Genomics, Informant HMM-based, Pair-HMM, Phylo-HMM; intrinsic, extrinsic, hybrid; DNA, Protein, cDNA. Years shown: 1982, 1996, 2000, 2002, 2004. Tools named: Procrustes, Genie, GenieEST, ExoFish, Rosetta, Slam, DoubleScan, Siepel-Haussler, Jojic-Haussler, Twinscan (2001), Genscan (1997), GenieESTHOM (2000), etc.]

Page 41

The Human Gene Set

[HGC, 2001]

Page 42

[Celera, 2001]

Page 43

wrong!

Page 44

Signal Transduction

Page 45

Ancient Origins of Important Gene Families

Page 46

Multigene families due to:

• Single gene duplication

• Segment duplication: tandem duplication or duplication transposition

a b c d e f g

a b c d e f b c d g

• Horizontal gene transfer

• Genome-wide doubling event
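The lettered example on this slide can be reproduced with a few lines of list slicing. This is a toy model in which a chromosome is just an ordered list of gene labels; the function names are hypothetical. In the slide's example the copy of "b c d" lands after "f", away from the original, whereas a tandem duplication would place the copy directly next to the original.

```python
def duplicate_segment(chromosome, start, end, insert_at):
    """Copy chromosome[start:end] and insert the copy before index insert_at."""
    segment = chromosome[start:end]
    return chromosome[:insert_at] + segment + chromosome[insert_at:]


def tandem_duplicate(chromosome, start, end):
    """Insert the copy immediately after the original segment."""
    return duplicate_segment(chromosome, start, end, end)


genes = "a b c d e f g".split()

# Copy "b c d" and insert it after "f", as in the slide's second line.
print(" ".join(duplicate_segment(genes, 1, 4, 6)))  # a b c d e f b c d g

# The same segment duplicated in tandem.
print(" ".join(tandem_duplicate(genes, 1, 4)))      # a b c d b c d e f g
```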

Page 47

Horizontal Gene Transfer

Page 48

Horizontal Gene Transfer in the H.G.

[HGC, 2001]

Page 49

Or is it?

[Kurland et al., 2003]

Page 50

HGT between fish & their parasites

Page 51

Retroposed Genes and Pseudogenes