uniview

Web-based application to survey properties of homologous proteins. Candidate: Diego Poggioli. Supervisor: Prof. Rita Casadio. Co-supervisor: Dr. Brigitte Boeckmann

Upload: poggio84

Post on 10-May-2015


Page 1: UniView

Web-based application to survey properties of homologous proteins.

Candidate:

Diego Poggioli

Supervisor:

Prof. Rita Casadio

Co-supervisor:

Dr. Brigitte Boeckmann

Page 2: UniView

• Bio-problem: visualizing and interacting with biological data, and performing a comparative protein analysis

• Info-solution: Web application – CGI

The portal gives access to four web pages: 1) function-related annotation derived from UniProtKB/Swiss-Prot; 2) features of the protein group; 3) conservation score; 4) tree.

Page 3: UniView

Members of a protein family normally perform a general biochemical function in common, but one or more subgroups may evolve a slightly different function, such as a different substrate specificity.

Page 4: UniView

By comparing groups and subgroups of proteins it is possible to identify or estimate:

• similarities and differences between the protein sequences, as well as the information available for the given protein group;

• the ranges within which functional information on proteins can be transferred from experimentally characterized proteins to their homologs from poorly studied organisms;

• errors in the annotations of proteins.

Page 5: UniView

Visualization of and interaction with biological data

Page 6: UniView

HTML, JavaScript, PHP, Perl, Python, Ajax, ASP, Ruby…

CGI, PHP

System and browser independent

Dynamic pages

Available from any PC

Page 7: UniView

P02701

P56732

P56734

O13153

P56733

P56735

P56736

AVID_CHICK

AVR2_CHICK

AVR4_CHICK

AVR1_CHICK

AVR3_CHICK

AVR6_CHICK

AVR7_CHICK

ID AVID_CHICK Reviewed; 152 AA.

AC P02701; Q91958; Q98SH4;

DT 21-JUL-1986, integrated into

DT 11-SEP-2007, sequence version 3.

DT 10-JUN-2008, entry version 87.

DE Avidin precursor.

GN Name=AVD;

OS Gallus gallus (Chicken).

OC Eukaryota; Metazoa; Chordata

OC Archosauria; Dinosauria

OC Neognathae; Galliformes

OX NCBI_TaxID=9031;

RN [1]

RP NUCLEOTIDE SEQUENCE [MRNA].

RX MEDLINE=87203384; PubMed

RA Gope M.L., Keinaenen R.A.,

RA Zarucki-Schulz T., O'Malley B.W.,

RT "Molecular cloning of the chicken

RL Nucleic Acids Res. 15:3595

RN [2]

RP NUCLEOTIDE SEQUENCE [MRNA].

RX MEDLINE=90355928; PubMed

RA Chandra G., Gray J.G.;

RT "Cloning and expression of

RL Methods Enzymol. 184:70

Form filling and data type
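The flat-file entry above is organized by two-letter line codes (ID, AC, DE, OS, ...). As a minimal sketch of how a script might read such an entry, the Python below groups lines by their code. The function name `parse_flat_entry` and the shortened sample entry are illustrative assumptions, not the code used by UniView.

```python
from collections import defaultdict

def parse_flat_entry(text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        # Skip blank lines and the '//' entry terminator.
        if len(line) < 2 or line.startswith("//"):
            continue
        code, value = line[:2], line[2:].strip()
        fields[code].append(value)
    return dict(fields)

# Shortened sample built from the lines shown on the slide.
ENTRY = """\
ID   AVID_CHICK              Reviewed;         152 AA.
AC   P02701; Q91958; Q98SH4;
DE   Avidin precursor.
GN   Name=AVD;
OS   Gallus gallus (Chicken).
OX   NCBI_TaxID=9031;
//
"""

fields = parse_flat_entry(ENTRY)
entry_name = fields["ID"][0].split()[0]
accessions = [a.strip() for a in " ".join(fields["AC"]).split(";") if a.strip()]
print(entry_name, accessions)
```

Keeping every line code as a list preserves multi-line fields such as OC or CC, which can then be joined as needed.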

Page 8: UniView
Page 9: UniView

BioView

• overview of biological information

• taxonomic descriptive statistics

A compact summary view of the biological information on a protein group is important, especially for a large dataset. It makes it possible to observe, compare and count all common and dissimilar characteristics, and to examine in detail the entries that share the same annotation.

- gene name, functional (catalytic activity, enzyme regulation, pathway…) and general descriptive information;

- organism classification (OC) and organism species (OS);

- non-experimental qualifiers (by similarity, putative or probable).

Page 10: UniView

ID, AC, DE, CC: 'FUNCTION', 'PATHWAY', 'CATALYTIC ACTIVITY', 'ENZYME REGULATION', 'SUBUNIT', 'SIMILARITY', 'COFACTOR', 'DEVELOPMENTAL STAGE', 'INDUCTION', 'PTM', 'SUBCELLULAR LOCALIZATION', 'TISSUE SPECIFICITY'

OS, OC

Eukaryota -

Viridiplantae Eukaryota

Streptophyta Viridiplantae

Embryophyta Streptophyta

Tracheophyta Embryophyta

... ...

Pipeline BioView page
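The taxon/parent pairs listed above can be derived from an OC lineage by pairing each taxon with the one before it. The sketch below is one possible way to do this and to count entries per taxon; the function names and the two sample lineages are assumptions made for illustration.

```python
from collections import Counter

def lineage_to_parents(lineage):
    """Return (taxon, parent) pairs for one OC lineage; the root gets parent '-'."""
    taxa = [t.strip() for t in lineage.rstrip(".").split(";") if t.strip()]
    return [(taxon, taxa[i - 1] if i else "-") for i, taxon in enumerate(taxa)]

def count_entries_per_taxon(lineages):
    """Count how many entries fall under each taxon of the hierarchy."""
    counts = Counter()
    for lineage in lineages:
        for taxon, _parent in lineage_to_parents(lineage):
            counts[taxon] += 1
    return counts

# Illustrative lineages (one per entry).
lineages = [
    "Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta.",
    "Eukaryota; Metazoa; Chordata.",
]
pairs = lineage_to_parents(lineages[0])
counts = count_entries_per_taxon(lineages)
for taxon, parent in pairs:
    print(taxon, parent)
```

The per-taxon counts are what an expandable hierarchy view would display next to each node.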

Page 11: UniView

Number of entries

Non-redundant annotation

Number of entries with non-experimental qualifier

Number of entries with annotated experimental qualifier
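The four figures named on this slide can be computed per comment topic. The following is a rough sketch only: it assumes the non-experimental qualifiers appear in parentheses inside the annotation text, and it treats any annotation without such a qualifier as experimental. The entry names and texts are hypothetical.

```python
import re
from collections import Counter

# Assumed spelling of the non-experimental qualifiers inside annotation text.
QUALIFIER = re.compile(r"\((by similarity|probable|potential|putative)\)", re.IGNORECASE)

def summarize_topic(annotations):
    """Summarize one comment topic; `annotations` maps entry name -> annotation text."""
    distinct = Counter()
    qualified, unqualified = [], []
    for entry, text in annotations.items():
        # Strip the qualifier so identical statements collapse into one.
        bare = " ".join(QUALIFIER.sub("", text).split()).rstrip(" .")
        distinct[bare] += 1
        (qualified if QUALIFIER.search(text) else unqualified).append(entry)
    return {
        "entries": len(annotations),
        "non_redundant": dict(distinct),
        "non_experimental": qualified,
        "experimental": unqualified,
    }

# Hypothetical entries and texts, for illustration only.
example = {
    "ENTRY_A": "Catalyzes reaction X.",
    "ENTRY_B": "Catalyzes reaction X (By similarity).",
    "ENTRY_C": "Binds ligand Y (Probable).",
}
summary = summarize_topic(example)
print(summary)
```

Returning the entry names rather than bare counts makes it straightforward to list the relevant entries when the user clicks a number.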

Page 12: UniView

Expand the whole hierarchy

On mouse click, the relevant entry names are listed

Page 13: UniView
Page 14: UniView

FeatureView

• Interactive interface for visualizing function-related features on the protein sequence and 3D structure

• This page should let the user analyze combined sequence and structure data across a broad dataset, showing as much of the available information as possible in a clear and intuitive way.

Page 15: UniView

Function-related features derived from the FT lines of UniProtKB (active sites, binding sites, domains, transmembrane regions, DNA-binding domains…) are mapped on the alignment and highlighted to give a clear and compact presentation of the relevant information. The features are mapped on the structure in the same way, making it possible to identify conserved regions and sites.

Sequence → FT → Structure

Page 16: UniView

FeatureView

• Choose the best structure

• Alignment

• Mapping the feature on the alignment and on the structure

Page 17: UniView

F.P.A. David and Y.L. Yip. SSMap*: a new UniProt-PDB mapping resource for the curation of structural-related information in the UniProt/Swiss-Prot Knowledgebase. Submitted

...

'91 ' => '91',
'25 ' => '25',
'92 ' => '92',
'81 ' => '82',
'71 ' => '71',
'21 ' => '23',
'-' => 'x',
'61 ' => '61',
'37 ' => '37',
'68 ' => '68',
'50 ' => '50',
'18 ' => '15',

...

Choose the best structure

*
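The slide does not say what the keys and values of the mapping fragment above stand for. Assuming the keys are UniProt sequence positions and the values the corresponding residue numbers in the chosen PDB structure (with 'x' marking a position that has no counterpart), a feature position could be translated roughly as follows. The function name and the small mapping are illustrative; the pairs are copied from the fragment with the trailing spaces stripped.

```python
def to_structure_positions(seq_positions, seq_to_struct):
    """Translate sequence positions to structure residue numbers, skipping unmapped ones."""
    mapped = {}
    for pos in seq_positions:
        target = seq_to_struct.get(str(pos))
        # 'x' is assumed to mean "no corresponding residue in the structure".
        if target is not None and target != "x":
            mapped[pos] = int(target)
    return mapped

# Pairs taken from the fragment above; keys normalized by stripping whitespace.
raw = {"81 ": "82", "21 ": "23", "18 ": "15", "91 ": "91"}
seq_to_struct = {key.strip(): value for key, value in raw.items()}

result = to_structure_positions([18, 21, 99], seq_to_struct)
print(result)
```

Position 99 is absent from the sample mapping and is therefore dropped rather than guessed.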

Page 18: UniView

Jmol: an open-source Java viewer for chemical structures in 3D. http://www.jmol.org/

Page 19: UniView
Page 20: UniView
Page 21: UniView
Page 22: UniView

FeatureView

• Choose the best structure

• Alignment

• Mapping the feature on the alignment and on the structure

Page 23: UniView

Edgar, Robert C. (2004), MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Research 32(5), 1792-97.

Input file

Alignment

Page 24: UniView

FeatureView

• Choose the best structure

• Alignment

• Mapping the feature on the alignment and on the structure

Page 25: UniView

I group: ('CA_BIND', 'NP_BIND', 'MOTIF', 'ACT_SITE', 'METAL',

'BINDING', 'SITE', 'NON_STD', 'MOD_RES', 'LIPID', 'CARBOHYD',

'DISULFID', 'CROSSLINK');

II group: ('PEPTIDE', 'TOPO_DOM', 'TRANSMEM', 'DOMAIN',

'REPEAT', 'ZN_FING', 'DNA_BIND', 'REGION', 'COILED');

Input file

Alignment

FT (Feature Table) lines

Page 26: UniView

I group: ('CA_BIND', 'NP_BIND', 'MOTIF', 'ACT_SITE', 'METAL', 'BINDING', 'SITE', 'NON_STD', 'MOD_RES', 'LIPID', 'CARBOHYD', 'DISULFID', 'CROSSLINK'): distinct font color, with a toolbox containing the description of the feature (entry name, feature key, sequence position, description).

II group: ('PEPTIDE', 'TOPO_DOM', 'TRANSMEM', 'DOMAIN', 'REPEAT', 'ZN_FING', 'DNA_BIND', 'REGION', 'COILED'): different background color, with a toolbox with the content as described above.

- Overlapping features within the first group → represented in the toolbox.

- Overlapping features within the second group → different background color.

FT (Feature Table) lines
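Mapping a feature onto the alignment requires converting its sequence positions into alignment columns, since gaps shift the coordinates. The sketch below shows one way this could be done, together with the lookup of the feature group; the function names and the short aligned sequence are assumptions, while the two key sets are the groups listed on the slide.

```python
GROUP_I = {"CA_BIND", "NP_BIND", "MOTIF", "ACT_SITE", "METAL", "BINDING", "SITE",
           "NON_STD", "MOD_RES", "LIPID", "CARBOHYD", "DISULFID", "CROSSLINK"}
GROUP_II = {"PEPTIDE", "TOPO_DOM", "TRANSMEM", "DOMAIN", "REPEAT", "ZN_FING",
            "DNA_BIND", "REGION", "COILED"}

def column_of_position(aligned_seq):
    """Map 1-based sequence positions to 0-based alignment columns."""
    mapping = {}
    pos = 0
    for col, symbol in enumerate(aligned_seq):
        if symbol != "-":  # gaps do not consume a sequence position
            pos += 1
            mapping[pos] = col
    return mapping

def map_feature(aligned_seq, key, start, end):
    """Return (group, alignment columns) for a feature spanning start..end."""
    cols = column_of_position(aligned_seq)
    group = 1 if key in GROUP_I else 2 if key in GROUP_II else None
    return group, [cols[p] for p in range(start, end + 1) if p in cols]

aligned = "--MK-LV"  # illustrative aligned sequence
site = map_feature(aligned, "ACT_SITE", 2, 2)
region = map_feature(aligned, "DOMAIN", 2, 4)
print(site, region)
```

The group number can then drive the rendering choice described on the slide: font color for the first group, background color for the second.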

Page 27: UniView

ATOM 1817 N MET B 3 -31.380 87.126 39.296 1.0 100.00

ATOM 1818 CA MET B 3 -30.684 88.400 39.176 1.0 100.00

ATOM 1819 C MET B 3 -30.858 88.967 37.771 1.0 100.00

ATOM 1820 O MET B 3 -30.195 88.514 36.832 1.0 100.00

ATOM 1821 CB MET B 3 -29.190 88.285 39.498 1.0 100.00

ATOM 1822 CG MET B 3 -28.465 89.628 39.501 1.0 100.00

ATOM 1823 SD MET B 3 -26.671 89.415 39.661 1.0 100.00

ATOM 1824 CE MET B 3 -26.312 90.705 40.863 1.0 100.00

ATOM 1825 N GLU B 4 -31.750 89.938 37.638 1.0 50.00

ATOM 1826 CA GLU B 4 -31.927 90.498 36.300 1.0 50.00

… … … … … … … … … … …

50.00

100.00

00.00

Alignment position
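The ATOM records above carry values such as 100.00 and 50.00 in their last column, which is consistent with per-residue values being stored in the temperature-factor field so that a viewer such as Jmol can color by them. That reading is an assumption. Under it, a minimal sketch of rewriting the field (columns 61-66 of a standard PDB ATOM record) might look like this; `atom_line` is only a helper for building a correctly aligned test record.

```python
def atom_line(serial, name, resname, chain, resnum, x, y, z):
    """Build a fixed-width PDB ATOM record with occupancy 1.00 and B-factor 0.00."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resnum:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}")

def set_bfactor(pdb_lines, values, chain):
    """Write values[residue number] into the B-factor field of one chain's atoms."""
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 26 and line[21] == chain:
            resnum = int(line[22:26])
            if resnum in values:
                line = line.ljust(66)
                line = line[:60] + f"{values[resnum]:6.2f}" + line[66:]
        out.append(line)
    return out

records = [
    atom_line(1817, " N", "MET", "B", 3, -31.380, 87.126, 39.296),
    atom_line(1825, " N", "GLU", "B", 4, -31.750, 89.938, 37.638),
]
updated = set_bfactor(records, {3: 100.0, 4: 50.0}, chain="B")
print("\n".join(updated))
```

Records on other chains, or residues without a value, are passed through unchanged.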

Page 28: UniView

On mouse click, run blastp on the UniProt web page

Page 29: UniView

On mouse click, start the Jalview applet

Page 30: UniView

Conservation

• Interactive interface for visualizing the structural conservation of protein groups on the protein sequence and 3D structure

• Highlight positions and regions conserved in the group of proteins

• Conservation scores are mapped onto the multiple sequence alignment (MSA) and onto the 3D structure

Page 31: UniView

Input file

Scoring residue conservation

Page 32: UniView

0.000 # ---S--------

0.000 # ---T--------

0.000 # ---S--------

0.000 # ---T--------

0.000 # ---S--------

0.024 # ---TM-M-----

0.320 # MMMSV-VVMM--

0.278 # VVVDHMHHGGG-

0.500 # LLLYLLWWLLL-

0.603 # SSSSTTTSSSS-

0.391 # PAAAPAAEDDD-

0.424 # AAAAEEEVGGQT

0.809 # DDDDEEEEEEEE

Scoring methods

Method name     Type of score                               Description
basicmdm        Sum-of-Pairs (SP), matrix score             Simplest SP score possible
entropynorm7    Entropic                                    Normalized Shannon entropy with 7 symbol types
entropynorm21   Entropic                                    Normalized Shannon entropy with 21 symbol types
trident         Entropic, matrix score, sequence weighted   Mixed model score
valdar01        SP, matrix score, sequence weighted         Score used in Valdar & Thornton 2001
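To make the entropic scores more concrete, here is a simplified sketch of a normalized Shannon entropy conservation score for one alignment column. It is not the scoring program used in the thesis: the normalization by log(min(N, K)), the treatment of the gap as one of the 21 symbols, and the function name are all assumptions, so it is not expected to reproduce the values shown on the previous slide.

```python
import math
from collections import Counter

def entropy_conservation(column, n_symbols=21):
    """Return 1 - normalized Shannon entropy of a column (1.0 = fully conserved)."""
    counts = Counter(column)
    if len(counts) <= 1:
        return 1.0  # a single symbol type has zero entropy
    total = len(column)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Normalize by the largest entropy reachable with N sequences and K symbols.
    return 1.0 - entropy / math.log(min(total, n_symbols))

for column in ("DDDD", "ACDE", "DDDDEEEEEEEE"):
    print(column, round(entropy_conservation(column), 3))
```

A variant with 7 symbol types would first replace each residue by its class before counting, then call the same function with `n_symbols=7`.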

Page 33: UniView
Page 34: UniView
Page 35: UniView

• develop a method to compare two or more protein subgroups

• profile

At the moment it is an integrated framework for developing the visualization of information such as annotation, and of sites that differ in conservation between protein subgroups.

Input file

Page 36: UniView

Tree

The phylogenetic tree of the protein group is shown on this page.

Page 37: UniView

Software for phylogenetic tree visualization and manipulations

http://bioinfo.unice.fr/biodiv/Tree_editors.html

- Treedyn: works on a local machine but not on the server side (graphical applet needed)

- Phylodendron: trouble with the CGI script

- phyfi: private program; it cannot be installed on one's own server, possibly usable via URL request

- nexplorer: NEXUS format needed, and it cannot be installed on one's own server

- dnd2svg.pl: strict sequence number; output only in SVG format

- TreeFam: private program only

→ ATV 1.92

Page 38: UniView

http://www.phylosoft.org/atv/

Zmasek C.M. and Eddy S.R. (2001) ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics, 17, 383-384.

Gascuel O. (1997) BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution, 14:685-695.

Input file

Tree in Newick format

((((ACADM_HUMAN:0.000925,ACADM_PANTR:0.003941):0.014922,ACADM_MACFA:0.021579):0.041621,((ACADM_MOUSE:0.015113,ACADM_RAT:0.029420):0.051559,(ACADM_DROME:0.187088,((ACAD8_MOUSE:0.049728,ACAD8_HUMAN:0.052753):0.013706,ACAD8_BOVIN:0.104627):1.146493):0.149078):0.010918):0.015504,ACADM_PIG:0.057735,ACADM_BOVIN:0.023577);
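As a small illustration of working with the Newick string above, the sketch below pulls out the leaf labels with a regular expression and counts them by entry-name prefix. This is a lightweight shortcut under the assumption that every leaf carries a branch length, not a full Newick parser; the function name is illustrative.

```python
import re
from collections import Counter

def newick_leaves(newick):
    """Return leaf labels: names that follow '(' or ',' and are followed by ':'."""
    return re.findall(r"[(,]\s*([^():,;\s]+)\s*:", newick)

# The tree from the slide, split across lines only for readability.
TREE = (
    "((((ACADM_HUMAN:0.000925,ACADM_PANTR:0.003941):0.014922,ACADM_MACFA:0.021579)"
    ":0.041621,((ACADM_MOUSE:0.015113,ACADM_RAT:0.029420):0.051559,(ACADM_DROME:0.187088,"
    "((ACAD8_MOUSE:0.049728,ACAD8_HUMAN:0.052753):0.013706,ACAD8_BOVIN:0.104627)"
    ":1.146493):0.149078):0.010918):0.015504,ACADM_PIG:0.057735,ACADM_BOVIN:0.023577);"
)

leaves = newick_leaves(TREE)
by_protein = Counter(name.split("_")[0] for name in leaves)
print(len(leaves), dict(by_protein))
```

Internal nodes are skipped because their branch lengths follow a closing parenthesis rather than a label.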

http://www.jalview.org/

Clamp, M., Cuff, J., Searle, S. M. and Barton, G. J. (2004). The Jalview Java Alignment Editor. Bioinformatics, 20, 426-7

Page 39: UniView
Page 40: UniView

Future plans

• Normalize HTML pages according to the W3C standard

• Improve the use of CSS

• Test the application on different web browsers

• Write the application in a server-side language

• Integrate the application with other databases

• Ensure multi-user access to the application and keep an analysis history

• Develop a phylogenetic tree view that shows, and allows interaction with, additional information

• Hierarchical phylogeny-based classification in UniProtKB

Page 41: UniView

Following the hierarchical phylogeny-based classification in UniProtKB

Page 42: UniView
Page 43: UniView

Acknowledgements

• Brigitte Boeckmann & Rita Casadio

• Swiss-Prot lab, Biocomputing group

• Fabrice David & Marco Vassura

• All my friends and Fra

• Dolores and Davide

And now?

Page 44: UniView

- identify similarities and differences between the protein sequences, as well as the information available for the given protein group;

- estimate the ranges within which functional information on proteins can be transferred from experimentally characterized proteins to their homologs from poorly studied organisms;

- identify errors in the annotations of proteins.

Practical examples

Page 45: UniView

A compact summary view of the biological information on a protein group is important, especially for a large dataset. It makes it possible to observe, compare and count all common and dissimilar characteristics, and to examine in detail the entries that share the same annotation.

Acetylglutamate kinase family

Page 46: UniView

Acyl-CoA dehydrogenase family

Page 47: UniView
Page 48: UniView
Page 49: UniView
Page 50: UniView
Page 51: UniView

gatB/gatE family

Page 52: UniView

IPP transferase family

Page 53: UniView