The Genome Gamble, Knowledge or Carnage?

Comparative Genomics Leading the Way @ Organon

Tim Hulsen, Oss, November 11, 2003




Summary

• (1) An introduction to orthology and paralogy

• (2) Orthology determination within eukaryotes

• (3) Testing the advantages of our ortholog set

• (4) Using evolutionary conservation of co-expression for function prediction

• (5) Evolutionary conservation of chromosomal distance and orientation


(1) An introduction to orthology and paralogy

• Homologous genes: genes that have a common ancestor

• Orthologous genes: genes that evolved from a common ancestor through a speciation event (i.e. equivalent genes in different species)

• Paralogous genes: genes that evolved from a common ancestor through a duplication event
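The two definitions can be made concrete on a gene tree whose internal nodes are already labeled as speciation ('S') or duplication ('D') events: two genes are orthologs when their last common ancestor is a speciation node and paralogs when it is a duplication node. The sketch below assumes such a labeled tree is given; the `Node` class, the labels and the gene names are hypothetical and only illustrate the definitions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Gene-tree node; internal nodes carry an event label: 'S' (speciation) or 'D' (duplication)."""
    name: str
    event: Optional[str] = None  # None for leaves (genes)
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, *kids: "Node") -> "Node":
        """Attach child nodes and return self, so a tree can be built inline."""
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def path_to_root(node: Optional[Node]) -> List[Node]:
    """The node itself followed by all of its ancestors."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def relationship(a: Node, b: Node) -> str:
    """Classify two genes by the event at their last common ancestor (LCA)."""
    above_a = {id(n) for n in path_to_root(a)}
    lca = next(n for n in path_to_root(b) if id(n) in above_a)
    return "orthologs" if lca.event == "S" else "paralogs"


# Toy tree: an ancestral duplication (D), then a speciation (S) in each copy.
hs_a, mm_a = Node("Hs_geneA"), Node("Mm_geneA")
hs_b, mm_b = Node("Hs_geneB"), Node("Mm_geneB")
root = Node("dup", "D").add(Node("spec1", "S").add(hs_a, mm_a),
                            Node("spec2", "S").add(hs_b, mm_b))
print(relationship(hs_a, mm_a))  # orthologs
print(relationship(hs_a, hs_b))  # paralogs
print(relationship(hs_a, mm_b))  # paralogs
```

Note that Hs_geneA and Mm_geneB come out as paralogs even though they sit in different species, because they diverged at the duplication.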


Orthology and paralogy explained graphically

(from http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html)


The importance of orthology and paralogy

• Orthology relationships are especially important for function prediction: orthologous genes generally have the same function, but in different species

• Paralogy relationships can be used for function prediction too: paralogous genes are often involved in the same process, but have different molecular functions (e.g. globins)


(2) Orthology determination within eukaryotes

• Little eukaryotic orthology data is available at this moment:

  – euKaryotic Orthologous Groups (KOG, NCBI)

  – Inparanoid

  – OrthoMCL

• Existing databases are either too inclusive or too restrictive

• Most methods rely on the best bidirectional hit (E-value), while orthology is an evolutionary principle and should therefore be determined using phylogenetic trees!
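For comparison, the best-bidirectional-hit criterion criticized above can be sketched as follows: a pair (a, b) is accepted when b is a's lowest-E-value hit in genome B and a is b's lowest-E-value hit in genome A. This is a generic sketch of the criterion, not the code of KOG, Inparanoid or OrthoMCL; the gene names and E-values are made up.

```python
from typing import Dict, Set, Tuple

Hits = Dict[Tuple[str, str], float]  # (query, subject) -> E-value


def best_hits(hits: Hits) -> Dict[str, str]:
    """For every query, the subject with the lowest E-value."""
    best: Dict[str, Tuple[float, str]] = {}
    for (query, subject), evalue in hits.items():
        if query not in best or evalue < best[query][0]:
            best[query] = (evalue, subject)
    return {query: subject for query, (_, subject) in best.items()}


def bidirectional_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Toy E-values for two human genes (hsA, hsB) and two mouse genes (mmA, mmB).
human_vs_mouse = {("hsA", "mmA"): 1e-80, ("hsA", "mmB"): 1e-30,
                  ("hsB", "mmA"): 1e-60, ("hsB", "mmB"): 1e-20}
mouse_vs_human = {(s, q): e for (q, s), e in human_vs_mouse.items()}
print(bidirectional_best_hits(human_vs_mouse, mouse_vs_human))  # {('hsA', 'mmA')}
```

Here hsB and mmB receive no ortholog at all, because both have hsA/mmA as their best hit; this is one way a purely hit-based criterion can end up too restrictive, and it never looks at the evolutionary tree.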


Our orthology determination within eukaryotes

• GENOME: Hs

• SELECTION OF HOMOLOGS in the GENOMES At, Ce, Dm, Ec, Gt, Hs, Mm, Sc, Sp (Z>20, RH>0.5*QL)

• ALIGNMENTS AND TREE → PHYLOME: 24,263 groups

• TREE SCANNING → LIST: Hs-Mm: 85,848 pairs; Hs-Dm: 55,934 pairs; etc.
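The selection criteria Z>20 and RH>0.5*QL are not spelled out on the slide. One plausible reading, assumed here and not confirmed by the source, is that Z is an alignment Z-score and RH the length of the region of homology, which must exceed half the query length QL. Under that assumption the filter is a one-liner; the `Hit` record and the hit identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    subject: str
    z_score: float      # assumed meaning of "Z": alignment Z-score
    region_length: int  # assumed meaning of "RH": length of the region of homology


def select_homologs(hits: List[Hit], query_length: int,
                    min_z: float = 20.0, min_coverage: float = 0.5) -> List[str]:
    """Keep hits with Z > min_z and RH > min_coverage * QL (hypothetical reading)."""
    return [h.subject for h in hits
            if h.z_score > min_z and h.region_length > min_coverage * query_length]


hits = [Hit("Mm_1", 85.0, 390), Hit("Dm_1", 25.0, 150), Hit("Sc_1", 12.0, 380)]
print(select_homologs(hits, query_length=400))  # ['Mm_1']
```

Dm_1 fails the coverage test (150 is not above 200) and Sc_1 fails the Z-score test.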


Our orthology determination: using phylogenetic trees

Example: BMP6 (Bone Morphogenetic Protein 6): 5 orthologous relations are defined, all Hs-Mm
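The slides do not say how the trees are scanned. A common way to label internal nodes, used here purely as an illustrative assumption, is the species-overlap rule: a node whose two child subtrees share a species is a duplication, otherwise it is a speciation, and gene pairs split at a speciation node are orthologs. The nested-tuple tree format and the "Species_gene" leaf names are hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple, Union

Tree = Union[str, tuple]  # a leaf "Species_gene" or a (left, right) pair of subtrees


def species(leaf: str) -> str:
    """Species prefix of a leaf name such as 'Hs_BMP6' (hypothetical naming scheme)."""
    return leaf.split("_", 1)[0]


def leaves(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def scan_orthologs(tree: Tree, pair: Tuple[str, str] = ("Hs", "Mm")) -> Set[Tuple[str, ...]]:
    """Ortholog pairs between two species (each pair in alphabetical order)."""
    if isinstance(tree, str):
        return set()
    left, right = tree  # binary tree assumed
    left_leaves, right_leaves = leaves(left), leaves(right)
    found = set()
    # No species shared by the two subtrees -> speciation node -> cross pairs are orthologs.
    if not {species(x) for x in left_leaves} & {species(y) for y in right_leaves}:
        for a, b in product(left_leaves, right_leaves):
            if {species(a), species(b)} == set(pair):
                found.add(tuple(sorted((a, b))))
    return found | scan_orthologs(left, pair) | scan_orthologs(right, pair)


tree = ((("Hs_g1", "Mm_g1"), "Dm_g1"), (("Hs_g2a", "Hs_g2b"), "Mm_g2"))
print(sorted(scan_orthologs(tree)))
# [('Hs_g1', 'Mm_g1'), ('Hs_g2a', 'Mm_g2'), ('Hs_g2b', 'Mm_g2')]
```

The two human in-paralogs Hs_g2a and Hs_g2b are both reported as orthologs of Mm_g2, a many-to-one relation that a single best bidirectional hit cannot express.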


The ortholog database: Eukaryortho

http://t2.teras.sara.nl:4086 (only accessible from Organon, CMBI and SARA)


(3) Testing the advantages of our ortholog set

• The quality of an orthology assignment is difficult to test

• Orthologs should have more or less the same function --> use conservation of function as an orthology benchmark

• Gene Ontology (GO) database: hierarchical system of function and location descriptions

• Orthologs are in the same functional category when they share a 4th-level GO Molecular Function class
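A minimal sketch of the "same 4th-level class" test, assuming the GO hierarchy is available as a term-to-parents map and that a term's level is its shortest distance from the root (level 0). GO is a graph in which a term can have several parents, and the slides do not say how levels were counted, so this level convention and the toy term names are assumptions.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

Parents = Dict[str, List[str]]  # GO term -> its parent terms


def term_levels(parents: Parents, root: str) -> Dict[str, int]:
    """Level of each term = shortest number of steps from the root (assumed convention)."""
    children: Dict[str, List[str]] = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    level = {root: 0}
    queue = deque([root])
    while queue:  # breadth-first search gives shortest distances
        term = queue.popleft()
        for child in children.get(term, []):
            if child not in level:
                level[child] = level[term] + 1
                queue.append(child)
    return level


def classes_at_level(terms: Iterable[str], parents: Parents,
                     level: Dict[str, int], target: int = 4) -> Set[str]:
    """All terms at the target level among the given terms and their ancestors."""
    seen: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, []))
    return {t for t in seen if level.get(t) == target}


def same_category(terms_a, terms_b, parents, level, target=4) -> bool:
    return bool(classes_at_level(terms_a, parents, level, target)
                & classes_at_level(terms_b, parents, level, target))


# Toy hierarchy with hypothetical term names; "MF" is the level-0 root.
parents = {"t1": ["MF"], "t2": ["t1"], "t3": ["t2"], "t4a": ["t3"], "t4b": ["t3"],
           "t5x": ["t4a"], "t5y": ["t4a"], "t5z": ["t4b"]}
level = term_levels(parents, "MF")
print(same_category(["t5x"], ["t5y"], parents, level))  # True
print(same_category(["t5x"], ["t5z"], parents, level))  # False
```

With this rule, a gene annotated only above level 4 can never share a 4th-level class with its partner, so such pairs would presumably have to be left out of the benchmark.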


GO molecular function benchmark

[Figure: GO hierarchy, levels 0 to 4]

• Molecular function: one of the three ‘subroots’ (together with biological process and cellular component)

• ‘True’ orthologs should share a 4th-level molecular function (here: GO:0019912)

• Fraction of pairs sharing a 4th-level molecular function:

  – Our Hs-Mm ortholog set: 67 %

  – KOG Hs-Mm ortholog set: 51 %


Co-expression benchmark

• Second method: comparing expression profiles of each orthologous gene pair

• Using the GeneLogic Expressor data set:

  – Human chips: 3269 samples, 44792 fragments, 115 tissue categories, 15 SNOMED tissue categories

  – Mouse chips: 859 samples, 36701 fragments, 25 tissue categories, 12 SNOMED tissue categories


SNOMED tissue categories used for co-expression calculation

HUMAN                      | MOUSE
1  Blood vessel            | 1  Blood vessel
2  Cardiovascular system   | 2  Cardiovascular system
3  Digestive organs        | 3  Digestive organs
4  Digestive system        | 4  Digestive system
5  Endocrine gland         | -
6  Female genital system   | 5  Female genital system
7  Hematopoietic system    | 6  Hematopoietic system
8  Integumentary system    | 7  Integumentary system
9  Male genital system     | 8  Male genital system
10 Musculoskeletal system  | 9  Musculoskeletal system
11 Nervous system          | 10 Nervous system
12 Product of conception   | -
13 Respiratory system      | 11 Respiratory system
14 Topographic region      | -
15 Urinary tract           | 12 Urinary tract
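Presumably a human-mouse co-expression value can only be computed over the categories that exist in both species; the 12 categories in the correlation example of this deck match the 12 mouse categories in the table above. A small sketch of that alignment step, assuming each gene's profile is a mapping from category name to expression value (a hypothetical representation; the values used here are dummies).

```python
from typing import Dict, List, Tuple

HUMAN_CATEGORIES = [
    "Blood vessel", "Cardiovascular system", "Digestive organs", "Digestive system",
    "Endocrine gland", "Female genital system", "Hematopoietic system",
    "Integumentary system", "Male genital system", "Musculoskeletal system",
    "Nervous system", "Product of conception", "Respiratory system",
    "Topographic region", "Urinary tract",
]
MOUSE_CATEGORIES = [
    "Blood vessel", "Cardiovascular system", "Digestive organs", "Digestive system",
    "Female genital system", "Hematopoietic system", "Integumentary system",
    "Male genital system", "Musculoskeletal system", "Nervous system",
    "Respiratory system", "Urinary tract",
]


def align_profiles(human: Dict[str, float],
                   mouse: Dict[str, float]) -> Tuple[List[str], List[float], List[float]]:
    """Keep only categories present in both species, in the same order for both vectors."""
    shared = [cat for cat in human if cat in mouse]
    return shared, [human[c] for c in shared], [mouse[c] for c in shared]


# Dummy values (the category's number in each species), only to show the alignment.
human_profile = {c: float(i) for i, c in enumerate(HUMAN_CATEGORIES, start=1)}
mouse_profile = {c: float(i) for i, c in enumerate(MOUSE_CATEGORIES, start=1)}
shared, x, y = align_profiles(human_profile, mouse_profile)
print(len(shared))  # 12
```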


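Calculating the correlation

$$ r = \frac{N\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left(N\sum x^{2} - \left(\sum x\right)^{2}\right)\left(N\sum y^{2} - \left(\sum y\right)^{2}\right)}} $$

where x and y are the expression values of the two genes in each of the N tissue categories.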

Tissue category | Human gene 1 (206316_s_at) | Mouse gene 1 (162926_at) | Human gene 2 (205428_s_at) | Mouse gene 2 (97166_at)
1  | 41.04  | 83.56  | 62.95  | 49.11
2  | 30.78  | 61.11  | 67.72  | 45.18
3  | 74.73  | 92.95  | 93.2   | 40.76
4  | 43.9   | 78.85  | 68.48  | 41.2
5  | 39.23  | 88.93  | 54.8   | 41.24
6  | 88.72  | 100.7  | 52.16  | 49.64
7  | 39.71  | 83.15  | 73.56  | 42.84
8  | 135.42 | 169.28 | 46.59  | 49.58
9  | 55.98  | 79.91  | 205.58 | 0
10 | 0      | 59.05  | 142.9  | 34.7
11 | 54.78  | 97.37  | 48.57  | 48.04
12 | 68.11  | 87.85  | 48.97  | 46.26

High correlation (human gene 1 vs. mouse gene 1): 0.914167
Low correlation (human gene 2 vs. mouse gene 2): -0.935731
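The formula can be checked against the table: the sketch below implements the same computational form of Pearson's r and applies it to the two gene pairs. The function name `pearson` is ours; the data are copied from the table.

```python
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation in the computational form shown on the slide."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


# Expression values for tissue categories 1-12, from the table above.
human_1 = [41.04, 30.78, 74.73, 43.9, 39.23, 88.72, 39.71, 135.42, 55.98, 0, 54.78, 68.11]
mouse_1 = [83.56, 61.11, 92.95, 78.85, 88.93, 100.7, 83.15, 169.28, 79.91, 59.05, 97.37, 87.85]
human_2 = [62.95, 67.72, 93.2, 68.48, 54.8, 52.16, 73.56, 46.59, 205.58, 142.9, 48.57, 48.97]
mouse_2 = [49.11, 45.18, 40.76, 41.2, 41.24, 49.64, 42.84, 49.58, 0, 34.7, 48.04, 46.26]

r1 = pearson(human_1, mouse_1)
r2 = pearson(human_2, mouse_2)
print(round(r1, 3), round(r2, 3))
```

The two values come out at roughly 0.914 and -0.936, in line with the high and low correlations reported on the slide.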


Co-expression comparison of our ortholog set to the KOG set

[Figure: x axis: co-expression (correlation), -1 to 1 in steps of 0.1; y axis: fraction of pairs in this correlation range, 0 to 0.016; series: KOG rel., OUR rel.]


(4) Using evolutionary conservation of co-expression for function prediction

• Human: gene A and gene B, co-expression = Cab (-1 <= corr. <= 1)

• Human/Mouse: homologous genes A' and B', co-expression = Ca'b'

• Ca'b' >= Cab increases the probability that A and B are involved in the same process

(Co-expression calculated over 115 tissues in human, 25 in mouse)
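The rule can be sketched as a comparison of two within-species correlations: Cab from the profiles of A and B, and Ca'b' from the profiles of their homologs (human paralogs or mouse orthologs). Because each correlation is computed within one species, the human profiles (115 tissues) and mouse profiles (25 tissues) never need to be the same length. The helper names and the short toy profiles are hypothetical.

```python
from math import sqrt
from typing import Sequence


def corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def coexpression_conserved(a, b, a_hom, b_hom) -> bool:
    """True if the homologous pair is at least as co-expressed: Ca'b' >= Cab."""
    return corr(a_hom, b_hom) >= corr(a, b)


# Toy profiles: human A and B over 4 tissues, their homologs A' and B' over 3 tissues.
gene_a, gene_b = [1, 2, 3, 4], [2, 1, 4, 3]  # Cab = 0.6
gene_a_hom, gene_b_hom = [1, 2, 3], [2, 4, 6]  # Ca'b' = 1.0
print(coexpression_conserved(gene_a, gene_b, gene_a_hom, gene_b_hom))  # True
```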


GO biological process benchmark

[Figure: GO hierarchy, levels 0 to 4]

• Biological process: one of the three ‘subroots’ (together with cellular component and molecular function)

• Both orthologs and paralogs are often involved in the same process/pathway (i.e. they share a 4th-level biological process; here: GO:0007584)


Conservation of co-expression used in function prediction

[Figure: x axis: co-expression (correlation), -0.3 to 0.7; y axis: fraction same GO biological process (4th level), 0 to 0.45; series: Human, Human-Human (paralogous conservation), Human-Mouse (orthologous conservation)]


The importance of (conserved) co-expression for function prediction

• Co-expression without conservation can already be used for function prediction

• Paralogous conservation gives about 2x higher accuracy than co-expression alone

• Orthologous conservation gives about 3x to 4x higher accuracy

• Alternative to GO Biological Process: the KEGG Pathway database gives similar results
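The curves under "Conservation of co-expression used in function prediction" plot, for gene pairs at a given co-expression level, the fraction that share a 4th-level GO biological process. A sketch of that tabulation, assuming each pair has already been reduced to a (correlation, shares_process) tuple; the 0.1-wide bins are an assumption based on the axis ticks, and the same function would be applied separately to the human, paralog-conserved and ortholog-conserved pair sets.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def fraction_same_process(pairs: Iterable[Tuple[float, bool]],
                          bin_width: float = 0.1) -> Dict[float, float]:
    """Map each bin's lower edge to the fraction of its pairs that share a process."""
    shared: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for corr, same in pairs:
        # Small epsilon guards against float artefacts such as 0.3 / 0.1 = 2.999...
        idx = math.floor(corr / bin_width + 1e-9)
        total[idx] += 1
        shared[idx] += int(same)
    return {round(i * bin_width, 10): shared[i] / total[i] for i in sorted(total)}


# Toy pairs: (co-expression, shares a 4th-level GO process?)
pairs = [(0.05, False), (0.08, True), (0.62, True), (0.65, True), (0.61, False), (-0.2, False)]
print(fraction_same_process(pairs))
```

Ratios between such curves at the same co-expression level are one way to express the "2x" and "3x to 4x" improvements mentioned above.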


(5) Evolutionary conservation of chromosomal distance and orientation

• Human: gene A and gene B, distance = Dab (# bp), relative orientation = Oab, co-expression = Cab (-1 <= corr. <= 1)

• Human/Mouse: homologous genes A' and B'

• Da'b' <= Dab, Oa'b' == Oab and Ca'b' >= Cab increase the probability that A and B are involved in the same process

(Co-expression calculated over 115 tissues in human, 25 in mouse)
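The three conditions can be derived from gene coordinates. A minimal sketch, assuming both genes of a pair lie on the same chromosome and are given as (start, end, strand), that distance is the gap in bp between them, and that relative orientation is labeled 'tandem', 'convergent' or 'divergent'. These conventions and all names are illustrative assumptions; the slide does not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    start: int   # start <= end, same chromosome as the partner gene
    end: int
    strand: str  # '+' or '-'


def distance(a: Gene, b: Gene) -> int:
    """Gap in bp between two genes (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def orientation(a: Gene, b: Gene) -> str:
    """Relative orientation of the pair, independent of which gene is called A."""
    first, second = sorted((a, b), key=lambda g: g.start)
    if first.strand == second.strand:
        return "tandem"
    return "convergent" if first.strand == "+" else "divergent"


def neighbourhood_conserved(a: Gene, b: Gene, a_hom: Gene, b_hom: Gene,
                            c_ab: float, c_hom: float) -> bool:
    """Da'b' <= Dab, Oa'b' == Oab and Ca'b' >= Cab."""
    return (distance(a_hom, b_hom) <= distance(a, b)
            and orientation(a_hom, b_hom) == orientation(a, b)
            and c_hom >= c_ab)


human_a, human_b = Gene(1000, 5000, "+"), Gene(20000, 26000, "-")
mouse_a, mouse_b = Gene(500, 4000, "+"), Gene(9000, 15000, "-")
print(distance(human_a, human_b), orientation(human_a, human_b))  # 15000 convergent
print(neighbourhood_conserved(human_a, human_b, mouse_a, mouse_b, 0.4, 0.55))  # True
```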


Function prediction using co-expression and chromosomal distance (without conservation)

[Figure: fraction same GO process (4th level), 0 to 0.5, plotted against co-expression (correlation), -1 to 1, and chromosomal distance, 1,000,000 to 97,000,000 bp]


Conservation of chromosomal distance used in function prediction

[Figure: x axis: co-expression (correlation), -0.3 to 0.8; y axis: fraction same GO process (4th level), 0 to 0.6; series: Human Coexpr.; Human Coexpr. + Dist. < 10 Mbp; Human-Human Paral. Cons. Coexpr.; Human-Human Paral. Cons. Coexpr. + Dist. < 10 Mbp; Human-Mouse Orthol. Cons. Coexpr.; Human-Mouse Orthol. Cons. Coexpr. + Dist. < 10 Mbp]


The importance of chromosomal distance and orientation for function prediction

• Chromosomal distance is less important in eukaryotes than in prokaryotes (due to the absence of operons)

• Only genes less than 1 Mbp apart seem to be co-regulated

• Conservation of relative orientation seems to be important only for very close gene pairs

• Only a limited number of genes can be functionally annotated using the conservation of chromosomal distance and orientation


Conclusions

• Orthologous and paralogous relations can be used to improve function prediction

• Our orthologous pairs of Protein World proteins perform better than KOG in terms of co-expression and involvement in the same process

• Chromosomal distance and relative orientation between genes can be used for function prediction too, in a limited number of cases

• Future plans: find examples where the function of a protein can be predicted using these methods


Credits

• Martijn Huynen

• Peter Groenen

• Others at Comics

• Others at Organon Bioinf.