pchm: a bioinformatic resource for high-throughput human mitochondrial proteome searching and...



Computers in Biology and Medicine 39 (2009) 689 -- 696

Contents lists available at ScienceDirect

Computers in Biology and Medicine

journal homepage: www.elsevier.com/locate/cbm

PCHM: A bioinformatic resource for high-throughput human mitochondrial proteome searching and comparison

Taeho Kim a,b, Euiyong Kim a, Seok-Ju Park c, Hyun Joo a,∗

a Department of Physiology and Integrated Biosystems, College of Medicine, Inje University, Busan 614-735, Republic of Korea
b Systems Immunology Laboratory, World Premier International Immunology Frontier Research Center, Osaka University, 3-2 Yamadaoka, Osaka 565-0871, Japan
c Department of Internal Medicine, College of Medicine, Inje University, Busan 614-735, Republic of Korea

ARTICLE INFO

Article history: Received 24 March 2009; Accepted 13 May 2009

Keywords: PCHM; Human mitochondrial proteome; Comparative proteomics tool; Virtual PMF plot; Protein function classification; Proteomics; Biomedical tool; Database

ABSTRACT

Mitochondrial proteins are associated with a wide spectrum of human diseases, and large amounts of tissue- or organ-specific human mitochondrial proteome datasets are currently being generated. However, high-throughput comparative proteomic methods have yet to be applied to extract subtle differences among mitochondria from different tissues or muscle types. The aim of this work was to provide an integrated way to identify and compare large mitochondrial protein or peptide mass spectral datasets acquired from the expert mitochondrial proteome or biomarker discovery community. Proteome comparison of human mitochondria (PCHM) is a web-based analysis environment for manual or automatic analysis of individual peptide mass fingerprints alongside a database of human mitochondrial proteins and peptides identified in various organs. PCHM provides a suite of graphical tools that allow virtual plotting of peptide mass fingerprinting (PMF) spectra and fully automatic protein function classification based on the gene ontology (GO) annotation system. The new virtual PMF plot is useful for validating fragment ion losses of identical proteins and for removing unwanted foreign ion peaks. The fully automatic protein function classifier provides an easier way to compare subtle differences in compositionally biased mitochondrial protein functions. PCHM also provides a variety of query algorithms that aid in browsing, searching, and accessing complete annotations of data relevant to each mitochondrial protein of interest, and that link users to external databases. PCHM will be a useful tool for the systematic and functional characterization of mitochondrial proteins in relation to human diseases or biological research applications. PCHM can be accessed freely via a web interface at http://pchm.inje.ac.kr.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

Almost 1000 mitochondrial proteins are synthesized as cytosolic precursors and are imported from the cytosol into the mitochondria [1]. These mitochondrial proteins are implicated in human diseases associated with a wide spectrum of clinical phenotypes [2]. The mitochondrion is a well-studied small organelle in proteomic analyses, and more than 50 different tissues are being targeted for mitochondrial proteome research. Consequently, integrative data analysis tools and infrastructure are required to support increased throughput and to manage large sets of these mitochondrial proteomes efficiently. Recent studies have suggested that comparisons of the distribution of mitochondrial protein enrichment can be used to infer different cellular functions [3,4]. These results indicate the existence of important molecular-level differences between human organs or tissues. Since large numbers of mitochondrial proteome datasets are continually being updated, data handling and analysis have become a significant problem [5].

Existing mitochondria-related databases, including MITOMAP, mtDB, hmtDB, MitoP2, MigDB [6–10], and MitoProteome [11], are organized primarily around mitochondrial gene or protein sequences and related functions, which are publicly available elsewhere. These databases typically store a simple list of accession numbers with minimal biological annotations, and some of the databases are no longer working. The most common problems are the lack of high-throughput data handling and data interpretation tools. In practice, the real issue is how to enable experimental biologists to perform comparative mitochondrial proteome studies based on real or further processed datasets [12]. The lack of reliable reference data associated with the desired proteome mass spectral analysis is a general problem experienced in numerous proteome studies. It is essential for researchers to calibrate and validate the spectral data against whole-batch proteomic spectra: many factors during this experimental stage can easily introduce bias, and determining whether the peptide fragmentation mass truly represents the identical proteome is more difficult.

∗ Corresponding author. Tel.: +82 51 890 6452; fax: +82 51 894 5714.
E-mail addresses: [email protected] (T. Kim), [email protected] (E. Kim), [email protected] (S.-J. Park), [email protected] (H. Joo).

0010-4825/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.compbiomed.2009.05.006

In relation to joint quality assessments of MS/MS identifications, present tools and initiatives include the Trans-Proteomic Pipeline (TPP) [13], the LabKey analysis pipeline built on top of TPP [14], the PRIDE mass spectrometry data storage project [15], and the Ensembl mitochondrial proteome [16]. These proteomics research platforms enable uniform analysis and facilitate the exchange of MS/MS data generated from a variety of instruments, with peptides assigned by a variety of database search programs.

However, the practical problem is that the sparsely distributed local (or public) databases do not store the multiple-schema information that database end users need to interpret. Automated, object-specific database customization would be a useful solution for the collaborative construction and multi-stage analysis of mitochondrial proteome datasets of interest, since it is laborious and difficult to access all the required interrelated databases within a limited time. Therefore, the first aim of this work was to provide an integrated way to identify and compare large mitochondrial protein or peptide mass spectral datasets acquired from expert tandem (MS/MS) instrument analyses. For some proteomics applications, especially those involving mass spectrometry data images, large numbers of raw spectrum images are needed in a uniform computer-readable format. To cope with this problem, we first designed a module for plotting the virtual peptide mass fingerprinting (PMF) spectrum, named VirtPMFS_PLOT, to mimic a real peptide mass spectrum. A large spectrum (several thousand kilobytes) is compressed into a 1–10 KB scalable vector graphics (SVG) format. These new functions are especially advantageous for handling massive proteome MS/MS data (e.g., calibration or sample handling) and for validating peak abundance in the virtual MS spectra (similar to high sequence coverage analysis in PMF). In a collisionally activated dissociation (CAD) fragmentation process, fragment ion losses that seriously affect the overall accuracy of data coverage are often detrimental to peptide identification based on searching protein sequence databases. In addition, this is intended to alert experimentalists to the need for caution in every step of an experiment, from sample collection to laboratory analysis and data interpretation.

Proteome comparison of human mitochondria (PCHM) also provides a variety of query algorithms, including a `tiny BLAST' for peptide motif search. The robust query algorithms aid in browsing, searching and accessing complete annotations of data relevant to each mitochondrial protein of interest, and link users to external databases. This is the second major update since the release of the initial version 1.0. The earliest version was presented in part at GIW2005 (Poster Abstract P017, Yokohama, Japan), and the interface has been entirely redesigned for this complete release. An interesting feature of PCHM is that it contains robust virtual MS/MS plot algorithms that support all types of mass spectra. An automatic protein function classifier engine was also embedded in the latest version (version 1.2). Earlier versions of the database may not be compatible with the current user interface: only the most recent version for each tissue or organ type can be accessed through the database archives.

2. Methods

2.1. Database implementation

Currently, the database contains a total of 11,515 human mitochondrial peptide sequences, which were identified by SEQUEST (9673 peptides) and the Sonar MS/MS algorithm (1842 peptides). They were derived from public sequence databases and peptide mass analyses of highly purified human mitochondria [17–20]. Mitochondrial proteome datasets from hair shaft cells were recently updated. Each sequence is annotated in an automated process with data excerpted from external databases, including gene information from NCBI GenBank [21]; protein structural data and mass properties from UniProtKB [22] and ExPASy [23]; gene locus information in human chromosomes and genetic disorders from Ensembl; and human mitochondrial inheritance information from Online Mendelian Inheritance in Man (OMIM) [24] and the UCSC genome browser [25]. For proteins with unknown functions, PCHM assigns a primary functional category to each protein, based on the high-level terms in the gene ontology (GO) [26]. In the latest version, 1.2, a new schema design was added to support fully annotated functional and multifunctional classification of the mitochondrial proteins, since many proteins are multifunctional and consequently need more than one functional assignment. PCHM also provides a variety of proteomics-related query methods, such as GenBank accession number, UniProtKB/Swiss-Prot accession number, fragments of protein or peptide sequences, definitions of genes, mass-to-charge (m/z) ratios and gene loci in human chromosomes.

2.2. PCHM architecture and graphical user interface

PCHM was developed and annotated in the Perl programming language. All data are stored in a MySQL relational database. A user-friendly interface was developed in hypertext markup language (HTML) to permit general and advanced queries. Fig. 1 shows a schematic diagram of the integrated database design. The query results, along with the protein and peptide annotations, are presented as HTML pages. Flash MX ActionScript scripts are used to create the VirtPMFS_PLOT module for automated plotting of the virtual peptide mass fingerprinting spectrum, based on the individual set of MS/MS fragment ion mass values (the graphical resolution of the m/z value was set to 0.0001 a.u.). The advantage of Flash ActionScript is that it provides scalable vector graphics, which once created can easily be resized to produce high-resolution mass spectra. A Perl script was used as a linkage to create a generalized parsing file {*.pl}, to calculate each peptide mass and to excerpt gene map locus information from the Ensembl database along with information about related proteins from SwissProt. The small parser script inserts the matched peptide mass values into the Flash ActionScript. A Perl script was written to save a {*.pchm} file containing the molecular weight (m/z), peptide length, fragment position, protein length, and a count of the total peptides from the database. Subsequently, a Flash ActionScript receives the data from the {*.pchm} file to create a peak graph of the mass spectrum.
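As a rough illustration of the plotting step described above, the following Python sketch turns a list of peptide m/z values into an SVG stick spectrum. It is not the authors' VirtPMFS_PLOT code, which was written in Perl and Flash ActionScript; the function name `render_stick_spectrum`, the tuple layout of each peak and the uniform peak height are assumptions made for illustration only. The peak colours follow the convention reported for the PCHM result pages (SEQUEST blue, Sonar MS/MS green, both red).

```python
from xml.sax.saxutils import escape

# Colour convention taken from the PCHM result pages; the dictionary keys are hypothetical.
ENGINE_COLOUR = {"sequest": "blue", "sonar": "green", "both": "red"}


def render_stick_spectrum(peaks, width=600, height=200, margin=20):
    """Return an SVG document with one vertical stick per (m/z, engine, sequence) peak.

    All sticks have the same height because this sketch assumes that only
    mass values, not intensities, are available for a virtual spectrum.
    """
    if not peaks:
        raise ValueError("at least one peak is required")
    mz_values = [mz for mz, _, _ in peaks]
    mz_min, mz_max = min(mz_values), max(mz_values)
    mz_span = (mz_max - mz_min) or 1.0  # avoid division by zero for a single peak
    baseline = height - margin
    plot_width = width - 2 * margin
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width} {height}">',
        f'<line x1="{margin}" y1="{baseline}" x2="{width - margin}" '
        f'y2="{baseline}" stroke="black"/>',
    ]
    for mz, engine, sequence in peaks:
        x = margin + (mz - mz_min) / mz_span * plot_width
        colour = ENGINE_COLOUR[engine]
        # The <title> child gives a hover tooltip in most SVG viewers.
        parts.append(
            f'<line x1="{x:.2f}" y1="{baseline}" x2="{x:.2f}" y2="{margin}" '
            f'stroke="{colour}"><title>{escape(sequence)} m/z {mz:.4f}</title></line>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

Because the output is plain vector markup, a spectrum with a handful of peaks occupies only a few hundred bytes and can be rescaled without loss, which is the property the text attributes to the SVG representation.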

2.3. Query methods used to access

PCHM supports Boolean-type query operators and performs combined searches, for example, by selecting multiple queries from the search menu (Fig. 2). The `general search' accepts typical search terms, such as GenBank ID, SwissProt ID, PDB ID, and keywords. In the advanced search mode, the `fragmented peptide search' allows the user to identify anonymous proteins in PCHM using a partial protein or peptide sequence as input. This option is very useful for identifying meaningful patterns (i.e., motifs) in the peptide sequence pool. For example, the defining sequence of the pore-forming unit of a potassium ion channel [27], TXGXG, may be entered as T%G%G or T_G_G. The input symbols, percent `%' and underscore `_', both stand for any amino acid, but the search results are completely different. We designed the percent operator, `%', to perform a broader search between two neighboring residues, so the resulting pattern includes all occurrences of natural amino acid gaps at the % linkage, such as insertions or deletions. Therefore, a motif search using the `%' operator returns `T-(X)n-G-(X)n-G' sequences as the most characteristic pattern, and sequences such as the following will be found in the search: TVGLG, TWATGGYG, TIGHVDHG, and TIGTG. The fundamental idea behind the `%' function is to search for and discover any evolutionarily conserved `protein motif patterns' in a set of mitochondrial proteins, regardless of variation in the length (n). In contrast, each underscore, `_', signifies either a single amino acid or a gap. The `molecular weight search' option works in a standalone manner and can also be combined with general or fragmented peptide queries. The query result presents a sorted list of proteins matching the query criteria, along with appropriate summary information.

Fig. 1. PCHM schema. The PCHM relational structure includes 11 subcategories, storing a complete set of human mitochondrial proteome data linked to each other.

Fig. 2. PCHM query for mitochondrial proteome searching and mining. The search algorithm has six levels of major choice-points (24 queries in total).
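The difference between the two wildcard symbols can be sketched in Python by translating a motif query into a regular expression. This is an illustration under stated assumptions, not the PCHM implementation: `%` is taken to mean a run of zero or more residues and `_` exactly one residue (the behaviour of the SQL `LIKE` wildcards of the same name), the gap case mentioned for `_` is not modelled, and a query must match a whole peptide. The function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a motif query using '%' and '_' into a compiled regex.

    Assumed semantics: '%' = any run of residues (including none),
    '_' = exactly one residue, anything else = a literal residue.
    """
    parts = []
    for symbol in pattern.upper():
        if symbol == "%":
            parts.append("[A-Z]*")
        elif symbol == "_":
            parts.append("[A-Z]")
        else:
            parts.append(re.escape(symbol))
    return re.compile("".join(parts))


def find_matching_peptides(pattern, peptides):
    """Return the peptides that match the motif query over their full length."""
    regex = wildcard_to_regex(pattern)
    return [peptide for peptide in peptides if regex.fullmatch(peptide.upper())]


# The example peptides are those quoted in the text; "AAAA" is a negative control.
peptides = ["TVGLG", "TWATGGYG", "TIGHVDHG", "TIGTG", "AAAA"]
broad = find_matching_peptides("T%G%G", peptides)
strict = find_matching_peptides("T_G_G", peptides)
```

Under these assumptions `T%G%G` retrieves all four example peptides, whereas `T_G_G` retrieves only the two five-residue ones. Searching inside longer protein sequences rather than whole peptides would use `regex.search` instead of `regex.fullmatch`.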

The search interface of PCHM has been significantly improved in comparison with our earlier version. We also incorporated charge state parameters for each peptide as a supplemental query criterion for low-resolution tandem mass spectra. This is a useful option for the rapid and robust identification of the peptide samples being characterized. In some cases, the main focus of a mitochondrial proteomics study is to characterize posttranslational modifications, which may have biological significance [28]. It is important to consider powerful error-tolerant peptide mass searching environments, as each type of protein modification is accompanied by a definite change in mass [29]. Phosphorylation (+80 mol wt change) and acetylation (+42 mol wt change) on a specific peptide are good examples of the most frequent posttranslational modifications. We believe that the newly added peptide modification query offers a very useful way to eliminate unwanted artifacts acquired during the sample preparation stages or mass spectrometry analyses. PCHM provides a list of modifications below the advanced search menu. Alternatively, two different types of direct numerical input (% coverage or Dalton) are also provided for unsuspected chemical and posttranslational modifications, but this requires careful interpretation. The error-tolerant search also looks for sequence variants, such as single nucleotide polymorphisms (SNPs), and for non-specific cleavage products in the mitochondrial proteome [30,31].
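A minimal Python sketch of a modification-aware mass search is given below. It is an illustration, not the PCHM search code: the function names are hypothetical, the percent option is assumed to be a relative tolerance on the theoretical mass, and the mass shifts are the standard monoisotopic values (79.96633 Da for phosphorylation and 42.01057 Da for acetylation) rather than the nominal +80 and +42 quoted in the text.

```python
# Standard monoisotopic residue masses in Da (not taken from the PCHM paper).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.010565
MODIFICATION_SHIFT = {"phosphorylation": 79.96633, "acetylation": 42.01057}


def peptide_mass(sequence, modifications=()):
    """Monoisotopic mass of a neutral peptide plus any listed modification shifts."""
    mass = WATER_MASS + sum(RESIDUE_MASS[residue] for residue in sequence.upper())
    return mass + sum(MODIFICATION_SHIFT[name] for name in modifications)


def within_tolerance(observed, theoretical, tolerance, unit="percent"):
    """Check a mass match with either a relative (percent) or absolute (dalton) window."""
    if unit == "percent":
        allowed = theoretical * tolerance / 100.0
    elif unit == "dalton":
        allowed = tolerance
    else:
        raise ValueError("unit must be 'percent' or 'dalton'")
    return abs(observed - theoretical) <= allowed


def explain_observed_mass(observed, sequence, tolerance=0.01, unit="percent"):
    """List which single modification states of a peptide are consistent with a mass."""
    hits = []
    for name in ("unmodified", *MODIFICATION_SHIFT):
        modifications = () if name == "unmodified" else (name,)
        theoretical = peptide_mass(sequence, modifications)
        if within_tolerance(observed, theoretical, tolerance, unit):
            hits.append(name)
    return hits
```

For example, `explain_observed_mass(370.1253, "SGK")` tests the unmodified, phosphorylated and acetylated forms of the tripeptide against a ±0.01% window, the tolerance level shown in the Fig. 4 query. A wider window admits more candidate explanations, which is one reason free numerical input needs careful interpretation.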

3. Results and discussion

3.1. Result summary

Fig. 3A shows the GUI query forms developed for browsing and searching mass spectrometric data with two different query methods based on the annotations defined in the PCHM schema. When the search is complete, the results are retrieved automatically from the PCHM main server and displayed in HTML and Flash format (Fig. 3B). The result summary presents a sorted list of peptides matching the query criteria, along with their corresponding numbers and database accession numbers, including PCHM ACCESS ID, tissue/organ type, GenBank ID, name (keyword), OMIM ID, SwissProt ID, PDB ID, protein function, chromosomal gene map locus (mitochondrial protein origin), protein sequence information (total amino acid number, mol wt, FASTA format), virtual peptide mass fingerprinting spectrum, and a tabular view of the related MS/MS values (Sonar and SEQUEST, or overlapping). Clicking on the hyperlinks offers more detailed information about the individual reports. From the results page, a separate mass spectrum can be viewed directly and users can easily identify the individual peak attributes; the peaks represent the original search results of SEQUEST [32] (blue bar), Sonar MS/MS [33] (green bar) or both (red bar). Fig. 4 shows the resulting information for a specific protein (ATP synthase, beta chain) queried by peptide mass. Each peak on the virtual peptide mass fingerprinting spectrum is linked to the subcategory databases containing a list of each peptide MS/MS value. Each peak is also labeled with the peptide sequence and the peptide match scores: Xcorr and deltaCn (for SEQUEST) and Epep (for Sonar MS/MS). The zoom function enlarges the peak display and provides more detailed peak information. When a peak is chosen on the screen, the individual peak assignments are displayed by a Flash ActionScript that presents peak attributes, such as protein size, individual peptide length, location in the protein, and mass-to-charge ratio (m/z), simultaneously (Fig. 4).

Fig. 3. (A) Parts of the query input pages from PCHM. Users can query mitochondrial proteomes by tissue/organ, function, protein (peptide) mass or name. The fragmented peptide search enables users to search for proteomes by sequence input. The mass values for posttranslational modifications were taken from Delta Mass version 2.1 (http://www.abrf.org/index.cfm/dm.home). (B) A sorted list of query results for the OXPHOS function.

3.2. Automatic protein function classification

A fully automatic comparator has been designed to compare and classify mitochondrial protein functions across different human tissue or organ types. A Perl script was used to extract protein functions from the entire set of mitochondrial proteomes and to perform statistical calculations for filtering the individual functions. PHP was used to create the interactive web interfaces and to generate graph images in PCHM. The proteins in the entire database can be rapidly classified based on their functions: the standard definitions and term relationships used are the same as those in the molecular function ontology of GO. A click on the menu below the resulting chart produces more detailed and subcategorized GO functional annotations. The primary function name and number in the query menu (e.g., GO:0005198 for structural molecule activity) represent information about the molecular GO class term. The comparison results between the chosen sets for molecular function are shown in Fig. 5. The resulting mitochondrial proteome datasets are highlighted with different colors according to their human tissue origins.

Fig. 4. A graphical representation of a PCHM search result based on a peptide molecular weight input query (mol wt: 899.03; tolerance level: ±0.01%). The user can retrieve protein sequences in FASTA format. The detailed information for each peptide mass peak is also displayed as a table (bottom).
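The comparison performed by the automatic classifier can be pictured as a per-tissue tally of proteins in each functional category. The Python sketch below is only an illustration of that idea: the data layout (a mapping from tissue to protein identifier to a set of category names), the function names and the toy identifiers are assumptions, and the statistical filtering done by the PCHM Perl script is not reproduced.

```python
from collections import Counter


def category_counts(proteome):
    """Count proteins per functional category for one tissue.

    `proteome` maps a protein identifier to the set of categories assigned to it.
    A multifunctional protein is counted once in each of its categories.
    """
    counts = Counter()
    for categories in proteome.values():
        counts.update(set(categories))
    return counts


def compare_tissues(proteomes):
    """Build {tissue: {category: (count, percent of that tissue's proteins)}}."""
    all_categories = sorted(
        {category
         for proteome in proteomes.values()
         for categories in proteome.values()
         for category in categories}
    )
    table = {}
    for tissue, proteome in proteomes.items():
        counts = category_counts(proteome)
        total = len(proteome) or 1  # guard against an empty proteome
        table[tissue] = {
            category: (counts[category], 100.0 * counts[category] / total)
            for category in all_categories
        }
    return table


# Toy input with hypothetical identifiers; real input would come from the database.
example = {
    "heart": {"P1": {"OXPHOS"}, "P2": {"OXPHOS", "signaling"}},
    "t_leukemia": {"P3": {"protein destination"}},
}
comparison = compare_tissues(example)
```

Reporting percentages alongside raw counts keeps tissues with different proteome sizes comparable, which matters when the datasets being compared contain different numbers of identified proteins.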

3.3. Case study

The key feature of this automatic proteome functional classifier is that it enables rapid prediction of several well-known functional differences between mitochondria, indicating the tissue- or organ-dependent resolution of mitochondrial proteins. Here, we present a specific case to demonstrate the power of the PCHM automatic comparator. Some statistically significant mitochondrial protein matches were identified between two (or three) proteome datasets from human heart, T-leukemia, and hair shaft cells. Query results for the individual functions showed that, in total, 35% of the proteins in human mitochondrial proteomes from different tissue sources are nearly identical (Fig. 5). A previous quantitative proteomic comparison of rat mitochondria from muscle, heart and liver showed very similar results [34]. The best matching proteins are mostly related to essential functions such as oxidative phosphorylation (OXPHOS), signaling, protein destination, and synthesis. This is not a surprising result, as these proteins are essential components of basic biological functions [35]. Of the 650 mitochondrial proteins, however, 280 differed significantly in their contents (43% of the total proteins) between heart and T-leukemia samples. This suggests that functional and biochemical associations of mitochondrial proteins with other cellular compartments are not unique, indicating tissue-specific regulation of the mitochondrial proteins related to their different functions. More specifically, about 71 of the 680 T-leukemia cell mitochondrial proteins identified here are related to `protein destination', whereas heart mitochondria contained only half the number of proteins with this function (32 of 614 proteins); only 14 proteins showed identical matches. The protein destination function involves protein complex formation, modification, targeting and stabilization. Most of the non-identical proteins, showing very specific characteristics, were related to amino acid and lipid metabolic pathway proteins, cell death and defense, and structural proteins. Interestingly, none of the three mitochondrial proteomes from different tissues contains any protein classified as a chaperone regulator or translation regulator, although they contain many molecular chaperones such as Hsp70, as well as mitochondrial ribosomal subunits and BAG family proteins that act as chaperone regulators. Mitochondrial BAG-1 is known to be a potent regulator of the Hsp70 chaperone [36].
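The proportions quoted in this case study can be checked with a few lines of Python. Only the counts come from the text; the helper name `percent` and the variable names are illustrative.

```python
def percent(part, total):
    """Share of `part` in `total`, expressed as a percentage."""
    return 100.0 * part / total


# Counts quoted in the case study above.
differing = percent(280, 650)             # proteins differing between heart and T-leukemia
destination_leukemia = percent(71, 680)   # 'protein destination' proteins, T-leukemia
destination_heart = percent(32, 614)      # 'protein destination' proteins, heart
heart_to_leukemia = destination_heart / destination_leukemia
```

The first value rounds to the 43% given in the text. The heart figure of 32 proteins is slightly less than half of the 71 T-leukemia proteins in absolute terms, and expressed as a fraction of each proteome (about 5.2% versus 10.4%) the ratio is very close to one half.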


Fig. 5. Query and display of automatic comparator for the inspection of cellular heterogeneity of the human mitochondrial proteins (for example, heart, T-leukemia andhair shaft cell): (A) the total integrity of mitochondrial proteome localization or regulation may vary considerably in different organellar compartments and (B) when thedesired GO function is chosen, the summary chart is generated.


4. Conclusions

PCHM integrates data from high-throughput mitochondrial proteome MS/MS analysis with fully annotated genomic and protein sequence links. This database system provides two powerful tools: one for generating virtual PMF spectra and one for automatic protein function classification. In addition, the robust querying tools help users to carry out high-throughput mitochondrial proteomic research. The graphical generation of virtual PMF spectra is especially powerful for rapid identification and comparison of de novo peptide candidates. In this case, the peptide can be identified more easily from the reference spectrum, avoiding false positives. PCHM classifies proteins based on gene ontology for data integration and query processing. The advanced search option narrows the search criteria or adds flags to any combinatorial queries to limit the results. Peptide and protein identification, posttranslational modification analyses and charge error corrections are all possible. The query provides a tool for the identification and characterization of peptides with unexpected modifications (e.g., posttranslational modifications or mutations) by tandem mass spectrometry. A comprehensive list of chemical and posttranslational peptide modifications is provided together with a residue modification matrix table, to avoid the loss of discrimination that would occur if all the permutations of large numbers of modifications in combination were possible. This new mode was coded as an extension to the old version. A new schema design was also added to support fully annotated functional and multifunctional classification of the mitochondrial proteins, since many proteins are multifunctional and consequently need more than one functional assignment.

The PCHM database is still at the developmental stage and several updates are anticipated in the near future. A large part of the bulk MS/MS spectra from the public data repositories or from the mitochondrial proteome research communities will be added. Although PCHM currently aims to build an integrated database for human mitochondrial proteome sets, it is possible to connect PCHM to other cellular (or tissue-specific) proteome databases that can be used to further study the biological pathways associated with human diseases or molecular functions. Regarding tandem mass (MS/MS) data handling, PCHM will adopt open XML file formats to facilitate the exchange of MS/MS data generated by a variety of instruments and by other proteomics communities [15,37,38].

Overall, PCHM will be a useful tool for the inspection of cellular heterogeneity of human mitochondrial proteomes, allowing us to better understand how tissue-specific heterogeneities exist and which factors may affect physiological and pathological conditions. Moreover, it is much easier to make a judgment about the quality of the matched peaks from similar spectrum patterns than from simple text reports.

Conflict of interest statement

None declared.

Acknowledgment

This project was funded in part by the Inje University Core Research Program (Project no. 2008-12-29).

References

[1] D.C. Chan, Mitochondrial dynamics in disease, N. Engl. J. Med. 356 (17) (2007) 1707–1709.

[2] J.A. MacKenzie, R.M. Payne, Mitochondrial protein import and human health and disease, Biochim. Biophys. Acta 1772 (5) (2007) 509–523.

[3] A. Federico, L. Manneschi, E. Paolini, Biochemical difference between intermyofibrillar and subsarcolemmal mitochondria from human muscle, J. Inherit. Metab. Dis. 10 (Suppl. 2) (1987) 242–246.

[4] B. Venugopal, K.T. Wong, Y.I. Goto, M.B. Bhattacharjee, Mitochondrial disorder, diabetes mellitus, and findings in three muscles, including the heart, Ultrastruct. Pathol. 30 (3) (2006) 135–141.

[5] Y.M. Park, J.S. Yoo, K.H. Kwon, Proteomics and HUPO: a great future ahead, J. Proteome Res. 6 (10) (2007) 3869.

[6] M.C. Brandon, M.T. Lott, K.C. Nguyen, S. Spolim, S.B. Navathe, P. Baldi, et al., MITOMAP: a human mitochondrial genome database—2004 update, Nucleic Acids Res. 33 (Database issue) (2005) D611–D613.

[7] M. Ingman, U. Gyllensten, mtDB: human mitochondrial genome database, a resource for population genetics and medical sciences, Nucleic Acids Res. 34 (Database issue) (2006) D749–D751.

[8] M. Attimonelli, M. Accetturo, M. Santamaria, D. Lascaro, G. Scioscia, G. Pappada, et al., HmtDB a human mitochondrial genomic resource based on variability studies supporting population genetics and biomedical research, BMC Bioinformatics 6 (Suppl. 4) (2005) S4.

[9] H. Prokisch, C. Andreoli, U. Ahting, K. Heiss, A. Ruepp, C. Scharfe, et al., MitoP2: the mitochondrial proteome database—now including mouse data, Nucleic Acids Res. 34 (Database issue) (2006) D705–D711.

[10] MigDB web site 〈http://www-lecb.ncifcrf.gov/~zullo/migDB/〉.

[11] D. Cotter, P. Guda, E. Fahy, S. Subramaniam, MitoProteome: mitochondrial protein sequence database and annotation system, Nucleic Acids Res. 32 (Database issue) (2004) D463–D467.

[12] J. Hu, K.R. Coombes, J.S. Morris, K.A. Baggerly, The importance of experimental design in proteomic mass spectrometry experiments: some cautionary tales, Brief. Funct. Genomic Proteomics 3 (4) (2005) 322–331.

[13] The Seattle Proteome Center (SPC), Proteomics work pipeline web site 〈http://tools.proteomecenter.org/wiki/〉.

[14] LabKey Software Foundation web site 〈https://www.labkey.org〉.

[15] P. Jones, R.G. Cote, L. Martens, A.F. Quinn, C.F. Taylor, W. Derache, et al., PRIDE: a public repository of protein and peptide identifications for the proteomics community, Nucleic Acids Res. 34 (Database issue) (2006) D659–D663.

[16] P. Flicek, B.L. Aken, K. Beal, B. Ballester, M. Caccamo, Y. Chen, et al., Ensembl 2008, Nucleic Acids Res. 36 (Database issue) (2008) D707–D714.

[17] S.W. Taylor, E. Fahy, B. Zhang, G.M. Glenn, D.E. Warnock, S. Wiley, et al., Characterization of the human heart mitochondrial proteome, Nat. Biotechnol. 21 (3) (2003) 281–286.

[18] K. Rezaul, L. Wu, V. Mayya, S.I. Hwang, D. Han, A systematic characterization of mitochondrial proteome from human T leukemia cells, Mol. Cell. Proteomics 4 (2) (2005) 169–181.

[19] Y.J. Lee, R.H. Rice, Y.M. Lee, Proteome analysis of human hair shaft: from protein identification to posttranslational modification, Mol. Cell. Proteomics 5 (5) (2006) 789–800.

[20] S.P. Gaucher, S.W. Taylor, E. Fahy, B. Zhang, D.E. Warnock, S.S. Ghosh, et al., Expanded coverage of the human heart mitochondrial proteome using multidimensional liquid chromatography coupled with tandem mass spectrometry, J. Proteome Res. 3 (3) (2004) 495–505.

[21] National Center for Biotechnology Information web site 〈http://www.ncbi.nlm.nih.gov〉.

[22] UniProt Consortium, The universal protein resource (UniProt), Nucleic Acids Res. 36 (Database issue) (2008) D190–D195.

[23] ExPASy server web site 〈http://www.expasy.org/〉.

[24] V.A. McKusick, Mendelian Inheritance in Man. A Catalog of Human Genes and Genetic Disorders, 12th ed., Johns Hopkins University Press, Baltimore, 1998.

[25] A.S. Hinrichs, D. Karolchik, R. Baertsch, G.P. Barber, G. Bejerano, H. Clawson, et al., The UCSC Genome Browser Database: update 2006, Nucleic Acids Res. 34 (Database issue) (2006) D590–D598.

[26] M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, et al., Gene ontology: tool for the unification of biology. The Gene Ontology Consortium, Nat. Genet. 25 (1) (2000) 25–29.

[27] C.E. Capener, I.H. Shrivastava, K.M. Ranatunga, L.R. Forrest, G.R. Smith, M.S. Sansom, Homology modeling and molecular dynamics simulation studies of an inward rectifier potassium channel, Biophys. J. 78 (6) (2000) 2929–2942.

[28] R.G. Krishna, F. Wold, Proteins—Analysis and Design, first ed., Academic Press, San Diego, 1998.

[29] S. Purvine, N. Kolker, E. Kolker, Spectral quality assessment for high-throughput tandem mass spectrometry proteomics, OMICS 8 (3) (2004) 255–265.

[30] J. Salmi, R. Moulder, J.J. Filen, O.S. Nevalainen, T.A. Nyman, R. Lahesmaa, et al., Quality classification of tandem mass spectrometry data, Bioinformatics 22 (4) (2006) 400–406.

[31] P. Wang, M. Dai, W. Xuan, R.C. McEachin, A.U. Jackson, L.J. Scott, et al., SNP Function Portal: a web database for exploring the function implication of SNP alleles, Bioinformatics 22 (14) (2006) e523–e529.

[32] M.J. MacCoss, C.C. Wu, J.R. Yates 3rd, Probability-based validation of protein identifications using a modified SEQUEST algorithm, Anal. Chem. 74 (21) (2002) 5593–5599.

[33] A. Keller, S. Purvine, A.I. Nesvizhskii, S. Stolyar, D.R. Goodlett, E. Kolker, Experimental protein mixture for validating tandem mass spectral analysis, OMICS 6 (2) (2002) 207–212.

[34] F. Forner, L.J. Foster, S. Campanaro, G. Valle, M. Mann, Quantitative proteomic comparison of rat mitochondria from muscle, heart, and liver, Mol. Cell. Proteomics 5 (4) (2006) 608–619.

[35] J.A. Smeitink, M. Zeviani, D.M. Turnbull, H.T. Jacobs, Mitochondrial medicine: a metabolic perspective on the pathology of oxidative phosphorylation disorders, Cell Metab. 3 (1) (2006) 9–13.

[36] J. Hohfeld, S. Jentsch, GrpE-like regulation of the hsc70 chaperone by the anti-apoptotic protein BAG-1, EMBO J. 16 (20) (1997) 6209–6216.

[37] R. Craig, J.P. Cortens, R.C. Beavis, Open source system for analyzing, validating and storing protein identification data, J. Proteome Res. 3 (6) (2004) 1234–1242.

[38] A. Keller, J. Eng, N. Zhang, X.J. Li, R. Aebersold, A uniform proteomics MS/MS analysis platform utilizing open XML file formats, Mol. Syst. Biol. 1 (2005) msb4100024-E4100021–msb4100024-E4100028.

Taeho Kim received the Ph.D. degree in physiology and bioinformatics tool design from Inje University College of Medicine. His major research interests include Biomedical Engineering and Systems Biology. His current work is to design an efficient tool for massive biodata mining at World Premier International Immunology Frontier Research Center, Osaka University, Japan.

Euiyong Kim is a professor in the Department of Physiology and Integrated Biosystems at Inje University, Busan, South Korea. His current research interests include Physiome, Protein Bioinformatics, and Computer-Aided Electro-patch analysis tools.

Seok-Ju Park is a professor in the Department of Internal Medicine at Inje University of South Korea. His major research interests include the Construction of Computer-Aided Scientific Reasoning and Decision making Tools for general health care and diagnostics.

Hyun Joo is a professor in the Department of Physiology and Integrated Biosystems at Inje University of South Korea. He received his Ph.D. from Seoul National University, Seoul, Korea. He spent 3 years as a National Research Foundation postdoctoral research fellow at Caltech's Chemistry and Chemical Engineering Division in California. His research primarily focuses on the Gene and Protein Database Mining, Biosystems and Biomedical Informatics, In Silico Molecular Evolution, Protein Bioinformatics Standardization, and Mitochondrial Proteome Searching.