
    PATIKAmad: Putting microarray data into pathway context

    Ozgun Babur1, Recep Colak2, Emek Demir3 and Ugur Dogrusoz1, 4

    1 Center for Bioinformatics, Bilkent University, Ankara, Turkey
    2 Computing Science Department, Simon Fraser University, BC, Canada
    3 Computational Biology Center, MSKCC, New York, NY, USA
    4 Tom Sawyer Software, Research Division, Oakland, CA, USA

    High-throughput experiments, most notably DNA microarrays, provide us with system-scale profiles. Connecting these data with existing biological networks to uncover facts about a cell's proteome poses a formidable challenge. Studies and tools with this purpose are either limited to networks with simple structure, such as protein-protein interaction graphs, or do not go much beyond simply displaying values on the network. We have built a microarray data analysis tool, named PATIKAmad, which can be used to associate microarray data with pathway models in mechanistic detail, and which provides facilities for visualization, clustering, querying, and navigation of biological graphs related to loaded microarray experiments. PATIKAmad is freely available to noncommercial users as a new module of PATIKAweb.

    Received: August 7, 2007
    Revised: December 19, 2007
    Accepted: February 10, 2008

    Keywords: Bioinformatics / Gene expression / Molecular interaction / Pathways

    Proteomics 2008, 8, 2196-2198

    Pathway databases contain information about possible interactions and reactions between molecules in a cell. Usually, these data are created by manually curating the biological literature and can span multiple experiments from different tissues, organisms, and contexts. When taken as an interconnected network, these interactions and reactions offer a causal model of a cell's response to stimuli. For instance, in a typical microarray experiment, relatively small portions of this network are differentially active between the control and the sample, and determining these parts can be extremely useful for finding causal explanations for the correlations observed in the data.

    There are many microarray-specific statistical tools that normalize and cluster the data, and provide a variety of visualization options using tables and plots. Similarly, many pathway databases and tools for creating, storing, querying, and analyzing biological networks exist [1]. However, only a few tools bring both worlds together. One such tool is GenMAPP [2], which provides static pathway diagrams and the ability to map color-coded expression values on top of entities in the diagram. MAPPFinder is a tool for finding overrepresented gene ontology (GO) terms in a microarray experiment, and for searching GenMAPP pathways for those that contain genes related to these overrepresented GO terms. However, GenMAPP lacks an integrated database, and thus cannot produce dynamic pathways related to experiments. Cytoscape [3] has a plugin that loads tab-delimited array data and performs several statistical analyses. These values can be visualized on Cytoscape pathways via color coding. The Reactome [4] database shows an overview map of the reactions in the database, laid out according to the module to which each reaction belongs. It supports loading of microarray values and shows them on the overview graph by color coding, so that users get an idea of the affected modules. None of these tools, however, is capable of connecting microarray data with graph-theoretic queries or any other advanced graph analysis operations.

    Correspondence: Dr. Ugur Dogrusoz, Center for Bioinformatics, Bilkent University, Ankara 06800, Turkey
    E-mail:

    DOI 10.1002/pmic.200700769

    2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


    We have built a microarray data integration component, called PATIKAmad, within PATIKAweb [5], which is a Web interface to the PATIKA database for querying, visualizing, and analyzing biological networks. The PATIKA ontology supports pathway graphs at two levels: the bioentity level and the mechanistic level. Bioentity-level graphs contain less detailed information, such as protein-protein interactions or transcriptional regulations between biological entities. Mechanistic-level graphs include state information (e.g., different phosphorylated states) and the compartments of molecules. This level models reactions together with their inputs, outputs, and effectors.

    For graphs at the bioentity level, or at other levels of similar detail, there is a small body of literature on microarray data integration and coanalysis [6]. The common goal of almost all these works is to detect regions or pathways that are dense in significant microarray data. This approach makes sense when the mechanism of interactions is not clear in the graph. However, in the case of mechanistic graphs, interesting paths do not necessarily have to be rich in microarray annotation. Many reactions are post-translational events and can be part of a differentially active network without any change of expression in their actors. Expression changes may be linked through paths whose activity change is independent of expression. In PATIKAmad, we supply a facility to query for paths between significant nodes (according to the user's significance criteria) in an integrated pathway knowledgebase, in order to compile a graph of interest.

    PATIKAmad accepts tab-delimited microarray data files containing data values and external database references. Such files are available from well-known public microarray databases such as Gene Expression Omnibus, Stanford Microarray Database, and ArrayExpress. Supported external references are GenBank, Unigene, Entrez Gene, HUGO Gene Symbol, Swiss-Prot, OMIM, Entrez RefSeq Protein ID, and Entrez RefSeq Transcript ID. During the processing of tab-delimited files, rows of the array are matched to the objects in the PATIKA database, and a .pmad (PATIKA microarray data format) file is created for later use in PATIKAmad. Alternatively, users may load their own local model, for instance in BioPAX format, containing external references. Microarray data with compatible external references may then be loaded and mapped to this model, allowing users to work on their proprietary data independently of the PATIKA database.
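    The row-matching step described above can be pictured roughly as follows. This is a minimal Python sketch under stated assumptions, not PATIKAmad's actual implementation: the column names (`GENE_SYMBOL`, `VALUE`), the `reference_index` dictionary standing in for the database lookup, the object identifiers, and the function `match_rows` are all hypothetical.

```python
import csv
import io

def match_rows(tsv_text, reference_index,
               ref_column="GENE_SYMBOL", value_column="VALUE"):
    """Match rows of a tab-delimited table to known objects by external reference.

    reference_index maps an external reference (here, an upper-case gene
    symbol) to an object identifier. It is an illustrative stand-in for a
    pathway database lookup.
    Returns (matched, unmatched): matched maps object id -> list of values,
    unmatched lists the references that had no corresponding object.
    """
    matched = {}
    unmatched = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        ref = (row.get(ref_column) or "").strip()
        try:
            value = float(row.get(value_column))
        except (TypeError, ValueError):
            continue  # skip rows without a numeric value
        obj_id = reference_index.get(ref.upper())
        if obj_id is None:
            unmatched.append(ref)
        else:
            # several array rows (probes) may map to the same object
            matched.setdefault(obj_id, []).append(value)
    return matched, unmatched

# Hypothetical input: four rows, one of which has no matching object.
sample = (
    "ID_REF\tGENE_SYMBOL\tVALUE\n"
    "1\tTP53\t1.8\n"
    "2\tMDM2\t-0.6\n"
    "3\tUNKNOWN1\t0.2\n"
    "4\tTP53\t2.2\n"
)
index = {"TP53": "bioentity:1", "MDM2": "bioentity:2"}
matched, unmatched = match_rows(sample, index)
```

    Keeping the unmatched references separate makes it easy to report how much of an array could not be placed on the pathway model.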

    After loading a set of experiments specified in a .pmad file, the user may select a single experiment of interest, average a group of experiments, or compare two groups through their log2 ratio. These settings are managed using the Data Management dialog. This selection determines the value used for each row, directly affecting visualization and querying.
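    The three ways of deriving a row value can be illustrated with a short Python sketch. The source does not spell out the exact formula, so this sketch assumes raw positive intensities and takes the log2 of the ratio of group means; the experiment names and the helper `log2_ratio` are hypothetical.

```python
import math
from statistics import mean

def log2_ratio(group_a, group_b):
    """log2(mean(group_a) / mean(group_b)), assuming positive intensities."""
    return math.log2(mean(group_a) / mean(group_b))

# One array row with four hypothetical experiments (two samples, two controls).
row = {"sample1": 200.0, "sample2": 600.0, "control1": 100.0, "control2": 100.0}

# Option 1: a single experiment of interest.
single = row["sample1"]

# Option 2: the average of a group of experiments.
averaged = mean([row["sample1"], row["sample2"]])

# Option 3: the log2 ratio of two groups.
ratio = log2_ratio([row["sample1"], row["sample2"]],
                   [row["control1"], row["control2"]])
```

    If the loaded values were already log-transformed, the comparison would instead be a difference of group means; which case applies depends on the input data.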

    Expression values, calculated from the current experiments of interest, are visualized on the graph through node coloring and labeling. Visualization options can be modified using the Visual Settings dialog. Besides the default red/green coloring, the user may customize coloring by assigning colors to specific values. Values that fall between two assigned values are shown with intermediate colors.
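    One simple way to obtain intermediate colors is linear interpolation between the two nearest user-assigned values. The Python sketch below assumes this scheme; the source does not state how PATIKAmad interpolates, and the particular color stops (green for low, black for zero, red for high) are an illustrative choice.

```python
def interpolate_color(value, stops):
    """Map a value to an (r, g, b) color by linear interpolation.

    stops is a list of (value, (r, g, b)) pairs with distinct values.
    Values outside the covered range are clamped to the nearest end color.
    """
    stops = sorted(stops, key=lambda s: s[0])
    if value <= stops[0][0]:
        return stops[0][1]
    if value >= stops[-1][0]:
        return stops[-1][1]
    for (v0, c0), (v1, c1) in zip(stops, stops[1:]):
        if v0 <= value <= v1:
            t = (value - v0) / (v1 - v0)  # position between the two stops
            return tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))

# Hypothetical color assignment for log-ratio values.
stops = [(-2.0, (0, 255, 0)), (0.0, (0, 0, 0)), (2.0, (255, 0, 0))]
mild_up = interpolate_color(0.4, stops)    # one fifth of the way to red
unchanged = interpolate_color(0.0, stops)
strong_up = interpolate_color(9.0, stops)  # clamped to the top color
```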

    Rows of the loaded experiment may be viewed in the Values Table, which also provides an interface for querying the PATIKA database using the selected rows (Fig. 1). The displayed rows may be filtered by keywords, which are matched as substrings of the external references. Selected rows may be used for retrieving related PATIKA objects from the database, or for running neighborhood or graph-of-interest queries in the database using the related nodes as seeds. A graph-of-interest query aims to fill in the missing links, and the molecules on these links, among a set of molecules of interest, using paths no longer than a specified limit. These queries may run at either the bioentity or the mechanistic level.
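    The keyword filter can be sketched in a few lines of Python. The row layout (a list of external references plus a value) and the function `filter_rows` are hypothetical, and case-insensitive substring matching is an assumption about what partial matching means here.

```python
def filter_rows(rows, keyword):
    """Keep rows whose external references contain the keyword.

    Matching is a case-insensitive substring test; the result is sorted
    in ascending order of the row value.
    """
    needle = keyword.lower()
    hits = [r for r in rows
            if any(needle in ref.lower() for ref in r["refs"])]
    return sorted(hits, key=lambda r: r["value"])

# Hypothetical rows with gene-symbol references and log-ratio values.
rows = [
    {"refs": ["TNFRSF10B"], "value": 1.4},
    {"refs": ["TP53"], "value": 2.0},
    {"refs": ["TNFRSF10A"], "value": -0.7},
    {"refs": ["TNFRSF1A"], "value": 0.3},
]
selected = filter_rows(rows, "tnfrsf10")
symbols = [r["refs"][0] for r in selected]
```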

    An experiment-scale graph-of-interest query is also supported through the Graph of Interest dialog. This dialog presents the user's significance criteria for the rows, the search path length, and the type of graph on which to execute the query. The query maps significant rows to significant nodes and searches for paths between these nodes. All paths not longer than the search length are included in the resulting graph of interest.
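    The idea behind this query can be sketched in Python as a depth-limited path search between significant nodes. This is an illustration under simplifying assumptions, not the algorithm used by PATIKA: the graph is treated as undirected and given as a plain adjacency dictionary, significance is a hypothetical absolute log-ratio threshold, and paths are enumerated exhaustively, which only suits small graphs.

```python
def graph_of_interest(adjacency, seeds, limit):
    """Collect nodes and edges on simple paths of at most `limit` edges
    that connect two distinct seed nodes.

    The search stops at the first seed it reaches: a longer path passing
    through a seed is covered by the searches started from that seed.
    """
    seeds = set(seeds)
    edges = set()

    def extend(path):
        node = path[-1]
        if len(path) > 1 and node in seeds:
            edges.update(frozenset(pair) for pair in zip(path, path[1:]))
            return
        if len(path) - 1 == limit:
            return  # search length exhausted
        for nxt in adjacency.get(node, ()):
            if nxt not in path:  # keep the path simple
                extend(path + [nxt])

    for seed in seeds:
        extend([seed])
    nodes = set().union(*edges) if edges else set()
    return nodes, edges

# Hypothetical network: A and B are linked by a short and a long route.
adjacency = {
    "A": ["x", "y"], "x": ["A", "B"], "B": ["x", "w"],
    "y": ["A", "z"], "z": ["y", "w"], "w": ["z", "B"],
}
# Hypothetical significance criterion: |log ratio| >= 1.
log_ratios = {"A": 2.1, "B": -1.6, "x": 0.1, "y": 0.0, "z": 0.2, "w": -0.3}
seeds = {n for n, v in log_ratios.items() if abs(v) >= 1.0}

short_nodes, _ = graph_of_interest(adjacency, seeds, limit=2)
long_nodes, _ = graph_of_interest(adjacency, seeds, limit=4)
```

    Note that the connecting nodes x, y, z, and w are not significant themselves; they enter the result only because they lie on a path between significant nodes, which is the point made earlier about mechanistic graphs.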

    Figure 1. Part of the Values Table, where experiment rows are filtered with the string "tnfrsf10" and sorted in ascending order of log-ratio values. Any number of rows may be selected and used for executing neighborhood or graph-of-interest queries.



    Figure 2. Part of a MAP kinase pathway where two clusters are shown using compound nodes. Loaded microarray values are shown with labels and colors on nodes.

    Clustering is one of the most popular microarray data analysis methods. The aim is to group similarly behaving genes, and thereby to gain insight into modules and into genes whose function is not clear. PATIKAmad supports k-means and hierarchical clustering of the loaded experiments. Users have the options of scale normalization, standard normalization, and filtering out a certain percentage of genes that show low variance. Clustering results can be saved in a .pcaf (PATIKA cluster analysis file) file for later use. Clusters in loaded clustering results are visualized on pathways using compound graphs or by highlighting nodes (Fig. 2).
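    The preprocessing and clustering steps can be illustrated with a small, dependency-free Python sketch. It is not PATIKAmad's code: standard normalization is assumed here to mean per-gene z-scores, the k-means loop uses a deterministic initialization (the first k profiles) purely to keep the example reproducible, and all function and gene names are hypothetical.

```python
from statistics import mean, pstdev, pvariance

def drop_low_variance(profiles, fraction):
    """Remove the given fraction of genes with the lowest variance."""
    ranked = sorted(profiles, key=lambda g: pvariance(profiles[g]))
    dropped = set(ranked[:int(len(ranked) * fraction)])
    return {g: v for g, v in profiles.items() if g not in dropped}

def standardize(values):
    """Per-gene z-scores (assumed meaning of standard normalization)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, k, iterations=20):
    """Plain Lloyd iterations; returns a gene -> cluster index mapping."""
    genes = list(profiles)
    centers = [list(profiles[g]) for g in genes[:k]]  # deterministic init
    assignment = {}
    for _ in range(iterations):
        # assign each gene to its nearest center
        assignment = {
            g: min(range(k),
                   key=lambda c: squared_distance(profiles[g], centers[c]))
            for g in genes
        }
        # move each center to the mean of its members
        for c in range(k):
            members = [profiles[g] for g in genes if assignment[g] == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return assignment

# Hypothetical expression profiles over three experiments.
raw = {
    "geneA": [1.0, 2.0, 3.0],   # rising
    "geneB": [3.0, 2.0, 1.0],   # falling
    "geneC": [2.0, 4.0, 6.0],   # rising, larger scale
    "geneD": [6.0, 4.0, 2.0],   # falling, larger scale
    "geneE": [5.0, 5.0, 5.1],   # nearly flat, lowest variance
}
kept = drop_low_variance(raw, fraction=0.2)
normalized = {g: standardize(v) for g, v in kept.items()}
clusters = kmeans(normalized, k=2)
```

    After normalization, the rising genes share one profile shape and the falling genes another, so they group together regardless of their original scale.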

    The authors have declared no conflict of interest.


    [1] Bader, G., Cary, M., Sander, C., Pathguide: A pathway resource list. Nucleic Acids Res. 2006, 34, D504-D506.

    [2] Salomonis, N., Hanspers, K., Zambon, A. C., Vranizan, K. et al., GenMAPP 2: New features and resources for pathway analysis. BMC Bioinformatics 2007, 8, 217.

    [3] Shannon, P., Markiel, A., Ozier, O., Baliga, N. et al., Cytoscape: A software environment for integrated models of biomolecular interactio