[Figure: Structure Search Results page, with example entries HS00222 and HS00945. Labeled fields: sequence identifier; query sequence length; structure method; start, end & range of model sequence; Psi-blast score & link; 3D model scores; SeqFold score; model sequence % identity & similarity; total number of hits for each sequence; PDB template origin; PDB functional annotation; PDB active site annotation; SCOP & PDB public database links.]
Acknowledgements
We would like to thank Sue Andarmani and Ling Jiang (web interface), Savita Jayaram (Structure Plus), Kiran Mukhyala (structure analysis tools), Ami Gavali (SQL database), and Ivan Labat for their excellent work and contributions.
Disclaimer: Sequence and structure data are only representations of the real data
Abstract
We have created a high-throughput, integrated pipeline of structure analysis protocols for over 10,000 of Hyseq’s proprietary protein sequences. The pipeline combines 3D structure prediction and functional annotation (GeneAtlas™, Accelrys Inc., San Diego), parsing and data-mining programs, an SQL structure database, and several structural analysis programs, all accessible through an in-house web interface. It has yielded over 100,000 significant structure hits and 3D models for many of our novel protein sequences. After storing the hit information in our database, we mine the hits by keyword and analyze template-model structure pairs for individual novel proteins. Together, the results are used to aid the functional annotation of our sequences by structural homology, including active site residues; to interpret and verify sequence-based annotation; and to rapidly target novel genes to appropriate assays.
High-throughput 3D structure determination from novel gene sequences has created new opportunities for us to discover biopharmaceuticals acting through novel mechanisms.
PIPELINE FOR FUNCTIONAL ANNOTATION OF NOVEL PROTEINS BY STRUCTURAL HOMOLOGY
Dana Haley-Vicente* and Nancy Mize
Hyseq Pharmaceuticals Inc., 675 Almanor Ave., Sunnyvale, CA 94086
* Currently at Accelrys, 9685 Scranton Rd., San Diego, CA 92121
[Figure: 3D Protein Structure Search page. Labeled features: searching database by keyword(s) or sequence ID; filtering for best hits; search & print fields options.]
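The two search features named for this page, lookup by keyword or sequence ID and filtering for best hits, could be sketched roughly as follows. This is a minimal illustration in Python, not the in-house implementation. The record fields (`seq_id`, `method`, `score`, `annotation`), the method labels, the made-up example records, and the rule that a higher score is better are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    seq_id: str      # query sequence identifier (hypothetical IDs below)
    method: str      # assumed method label, e.g. "psi-blast" or "seqfold"
    score: float     # assumption: higher means a better hit
    annotation: str  # functional annotation text attached to the template


def search(hits, keyword=None, seq_id=None):
    """Return hits matching a keyword (case-insensitive) and/or a sequence ID."""
    result = []
    for hit in hits:
        if seq_id is not None and hit.seq_id != seq_id:
            continue
        if keyword is not None and keyword.lower() not in hit.annotation.lower():
            continue
        result.append(hit)
    return result


def best_hits(hits):
    """Keep only the top-scoring hit for each (sequence, method) pair."""
    best = {}
    for hit in hits:
        key = (hit.seq_id, hit.method)
        if key not in best or hit.score > best[key].score:
            best[key] = hit
    return sorted(best.values(), key=lambda h: (h.seq_id, h.method))


# Made-up records for illustration only; not real pipeline output.
EXAMPLE_HITS = [
    Hit("SEQ001", "psi-blast", 120.0, "serine protease"),
    Hit("SEQ001", "psi-blast", 85.0, "trypsin-like protease"),
    Hit("SEQ001", "seqfold", 40.0, "serine protease"),
    Hit("SEQ002", "psi-blast", 60.0, "protein kinase"),
]
```

Calling `best_hits(search(EXAMPLE_HITS, keyword="protease"))` would first narrow the records to protease annotations and then keep one record per sequence and method. A real filter would likely also apply score cutoffs, which are not specified here.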
[Figure: 3D Viewer Links. Labeled structures: query sequence model; PDB template structure.]
[Figure: Structure Analysis. Labeled feature: alignment analysis.]
USER INTERFACE
PIPELINE & DATABASE
[Figure: report pages for the HS01 Project. Panels: Individual GeneAtlas™ 3D Model Report; Individual GeneAtlas™ SeqFold Report. Labeled features: query sequence & template alignment; secondary structure annotation; statistical analysis.]
[Figure: The Protein Structure Pipeline. Flowchart boxes: Cloning; Sequencing; Protein Sequences (Projects); GeneAtlas™; Template Search (Psi-Blast); Sequence / Template Alignment (Psi-Blast, PDB95); Model Generation (MODELER); Model Evaluation (Profiles-3D/Verify & PMF); Model Annotation; Threading (SeqFold); Structure Plus (Parser & Filter Program); Structure Database; Datamining; 3D Active Site Annotation; 3D Alignment & Statistical Analysis Tools.]
[Figure: Hyseq’s Relational Structure Database. Labeled tables: project data; individual sequence data; individual hit data; method type; active site data for query sequence models; active site & template PDB structure data. Additional label: join to other databases.]
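To make the database labels more concrete, here is one way such a relational layout might look, using Python's built-in `sqlite3` module. It is only a sketch under stated assumptions: the poster names the kinds of data stored (project, sequence, hit, method type, active site) but not the actual schema, so every table name, column name, key, and example row below is hypothetical.

```python
import sqlite3

# Hypothetical schema: one table per kind of data named in the figure labels.
SCHEMA = """
CREATE TABLE project  (project_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sequence (seq_id TEXT PRIMARY KEY,
                       project_id INTEGER NOT NULL REFERENCES project(project_id),
                       length INTEGER);
CREATE TABLE method   (method_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE hit      (hit_id INTEGER PRIMARY KEY,
                       seq_id TEXT NOT NULL REFERENCES sequence(seq_id),
                       method_id INTEGER NOT NULL REFERENCES method(method_id),
                       template_pdb TEXT,
                       score REAL,
                       annotation TEXT);
CREATE TABLE active_site (site_id INTEGER PRIMARY KEY,
                          hit_id INTEGER NOT NULL REFERENCES hit(hit_id),
                          template_residue TEXT,
                          model_residue TEXT);
"""


def build_demo_db():
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO project VALUES (1, 'DEMO')")
    conn.execute("INSERT INTO sequence VALUES ('SEQ001', 1, 250)")
    conn.executemany("INSERT INTO method VALUES (?, ?)",
                     [(1, "psi-blast"), (2, "seqfold")])
    # 'XXXX' and 'YYYY' stand in for PDB codes; all values are illustrative.
    conn.executemany("INSERT INTO hit VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "SEQ001", 1, "XXXX", 120.0, "serine protease"),
        (2, "SEQ001", 2, "YYYY", 40.0, "hydrolase fold"),
    ])
    conn.execute("INSERT INTO active_site VALUES (1, 1, 'SER 12', 'SER 10')")
    conn.commit()
    return conn


def hits_by_keyword(conn, keyword):
    """Join hit, sequence, and method tables; match the keyword in the annotation."""
    return conn.execute(
        """
        SELECT s.seq_id, m.name, h.template_pdb, h.score
        FROM hit AS h
        JOIN sequence AS s ON s.seq_id = h.seq_id
        JOIN method AS m ON m.method_id = h.method_id
        WHERE h.annotation LIKE '%' || ? || '%'
        ORDER BY h.score DESC
        """,
        (keyword,),
    ).fetchall()
```

With this layout, `hits_by_keyword(build_demo_db(), "protease")` returns the matching hit rows as tuples. Keeping the method type in its own table lets hits from different methods share one hit table, which is one plausible reading of the "method type" label.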