comparative genomics and structural proteomics:...
TRANSCRIPT
COMPARATIVE GENOMICS AND STRUCTURAL PROTEOMICS:
SEQUENCE AND STRUCTURAL HOMOLOGY TO UNRAVEL BIOLOGICAL
HIDDEN FEATURES
Luca Ambrosino, PhD
Topics
• Comparative genomics: orthologs/paralogs prediction
• Comparative genomics: functional annotation
• Structural proteomics: 3D-modeling of protein structures
Comparative genomics studies the differences and similarities in genomic features of different organisms
Comparative Genomics
Ancestor
A B
A1
B1
A2
B2
Duplication
Speciation
paralogs paralogs
orthologs
The detection of ortholog genes is one of the key approaches
Comparative Genomics
A
A1
A2
A1A
A2B
A2C
A2D
ORIGINAL FUNCTION
NEW FUNCTION
NO FUNCTION
SUBFUNCTIONALIZATION
Gene duplication
GENETICNOVELTY
PROTEIN
AT1G07600
PROTEIN
AT5G56795
5’-UTR Coding region
mRNA
AT5G56795
3’-UTR 5’-UTR Coding region
mRNA
AT1G07600
3’-UTR
X
ORF misassignment of AT5G56795 gene
NO SIMILARITY
PARALOGS
Multilevel analysis
5’-UTR Coding region 3’-UTR
mRNA
5’-UTR 3’-UTRCoding regionIntron Intron
GENE
PROTEIN
Multilevel analysis
23
4 5
B
CD
E
1 AXSpecies Xgene collection
Species Ygene collection
Higher score
Lower score
67 F
G
Orthologs definition
23
4 5
B
CD
E
Species Xgene collection
Species Ygene collection
67 F
G
1 A
Lower e-value
Higher e-value
Paralogs definition
Lower e-value
Higher e-value
23
4 5 BCD
E
Species Xgene collection
Species Ygene collection
67 F
G
1 A
Networks construction
Networks construction
0
1
2
3
4
5
6
7
8
9
0 1 2 3 4 5 6 7 8 9
3-9 gene networks
0
5
10
15
20
25
30
0 5 10 15 20 25 30
10+ gene networks
Spec
ies
y ge
nes
Species x genes
2143
1356
102
0 500 1000 1500 2000 2500
2
3-9
10+
N
Net
wo
rk S
ize
Paralog/ortholog networks
0
1
2
3
4
5
6
7
8
9
0 1 2 3 4 5 6 7 8 9
3-9 gene networks
0
5
10
15
20
25
30
0 5 10 15 20 25 30
10+ gene networks
Spec
ies
y ge
nes
Species x genes
2143
1356
102
0 500 1000 1500 2000 2500
2
3-9
10+
N
Net
wo
rk S
ize
Auxin responsive proteins
Paralog/ortholog networks
0
1
2
3
4
5
6
7
8
9
0 1 2 3 4 5 6 7 8 9
3-9 gene networks
0
5
10
15
20
25
30
0 5 10 15 20 25 30
10+ gene networks
Spec
ies
y ge
nes
Species x genes
2143
1356
102
0 500 1000 1500 2000 2500
2
3-9
10+
N
Net
wo
rk S
ize
Paralog/ortholog networks
0
1
2
3
4
5
6
7
8
9
0 1 2 3 4 5 6 7 8 9
3-9 gene networks
0
5
10
15
20
25
30
0 5 10 15 20 25 30
10+ gene networks
V. v
inif
era
gen
es
S. lycopersicum genes
2143
1356
102
0 500 1000 1500 2000 2500
2
3-9
10+
N
Net
wo
rk S
ize
Glutaredoxin
Acetolactate synthase
Paralog/ortholog networks
16454 15631
11093 11477
4786 1828
PARALOGS E-50
BBHs
PARALOGS E-3
SPECIES-SPECIFIC
2394 1035
SINGLETONS
PARALOGS
SINGLETONS
PARALOGS
545
1849 928
107
Multilevel comparison
Species x Species y
• Orthologs• Paralogs• Networks of orthologs and paralogs• Multilevel approach
COMparative for PARalog and Ortolog genes
COMPARO
Input files
Sequence comparisons
Ortholgs preprocessing
BBHs detection
Networks construction
Alignments reconstruction
Paralogs preprocessingExtraction of .fasta files
Threshold assessment
COMPARO
Input files
Extraction of .fasta files
Sequence comparisons
Ortholgs preprocessing
BBHs detection
Networks construction
Alignments reconstruction
Paralogs preprocessing
Threshold assessmentGenomic.fasta
Genomic.gff
COMPARO
Topics
• Comparative genomics: orthologs/paralogs prediction
• Comparative genomics: functional annotation
• Structural proteomics: 3D-modeling of protein structures
Functional annotation
Functional annotation
Functional annotation
Functional annotation
Functional annotation
Functional annotation
Functional annotation
X
The GENOMA platform
Species x Species y
Functional annotation
16454 15631
11093 11477
4786 1828
PARALOGS E-50
BBHs
PARALOGS E-3
SPECIES-SPECIFIC
2394 1035
SINGLETONS
PARALOGS
SINGLETONS
PARALOGS
545
1849 928
107
Ribosomal protein L5
metallocarboxypeptidase inhibitor (fruit-specific protein)
Transcriptional regulator LysR family
Nuclear pore complex protein (different types)
ATP-dependent DNA helicase (different types)
E3 ubiquitin-protein ligase (different types)
Ethylene-responsive transcription factor (different types)Zinc finger domains (different
types)peptidylprolyl isomerase
EC: 5.2.1.8
Agglutinin
Fanconi anaemia protein FANCD2
GPI ethanolamine phosphate transferase 2
Immediate early response 3-interacting protein 1
DNA repair protein RAD50
Metallothionein-like protein type 3
Rubredoxin-type fold domain
Functional annotation
0
300
600
900
1200
1500
0 300 600 900 1200 1500
P-loop containing nucleoside triphosphate hydrolase
Protein kinase-like domain
Protein kinase domain
Inte
rPro
Do
mai
ns
occ
urr
ence
s(S
pe
cies
2)
InterPro Domains occurrences (Species 1)
0
300
600
900
1200
1500
0 300 600 900 1200 1500
0
50
100
150
200
250
300
0 50 100 150 200 250 300
Functional annotation
Chalcone/stilbene synthase
Germin, manganese binding site
Transposase, MuDR, plant
Ribonuclease III domain
Protein family Ycf2, chloroplast
InterPro Domains occurrences (Species 1)
Inte
rPro
Do
mai
ns
occ
urr
ence
s(S
pec
ies
2)
SPECIES-SPECIFIC DOMAINS
Functional annotation
Species 1 Species 2
21731 17
Carbapenem biosynthesis
Ethylbenzene degradation
Nicotinate and nicotinamide metabolism
Aflatoxin biosynthesis
Monoterpenoid biosynthesis
Glycosaminoglycan biosynthesis
(heparan sulfate / heparin)
Functional annotation
Species 1 Species 2
Topics
• Comparative genomics: orthologs/paralogs prediction
• Comparative genomics: functional annotation
• Structural proteomics: 3D-modeling of protein structures
Homology modeling
Homology modeling
Homology modeling
TemplateTarget sequence
Sequence alignment
Homology modeling
Target sequenceTemplate sequence
Homology modeling
Homology modeling
Conclusions
Orthologs, paralogs and species-specific genes
Functional annotations
3-D modeling
• COMPARO: COMparative for PARalog and Ortolog genes
• Molecular dynamics simulations
Already available services
Soon available