geWorkbench caGrid TeraGrid Integration


  • geWorkbench caGrid TeraGrid Integration

    Scott Oster, Ohio State University, Dept. of Biomedical Informatics

    Christine Hung, Columbia University, JCSB/C2B2

    caBIG Architecture Face-to-Face, Salt Lake City, UT, January 2008

  • Agenda

    Overview (5 min): Introduction on TeraGrid Workgroup; background on geWorkbench and the geWorkbench/caGrid/TeraGrid project.

    Technology (10 min): Steps to establishing the geWorkbench/caGrid/TeraGrid interface; use of caGrid security (GTS, Grid Grouper, Dorian, CDS); workflow and communications between services.

    Demo (5 min)

    Discussion (5 min)

  • Team Members

    geWorkbench (Columbia University): Christine Hung, Kiran Keshav.

    caGrid (Ohio State University): Scott Oster, Stephen Langella.

    caGrid/TeraGrid (Argonne National Laboratory): Ravi Madduri.

    TeraGrid (Argonne National Laboratory): Stuart Martin.

    Management: Aris Floratos (Columbia University), Krishnakant Shanbhag (Argonne National Laboratory), Michael Keller (Booz Allen Hamilton), Patrick McConnell (Duke University), Nancy Wilkins-Diehr (San Diego Supercomputer Center).

  • Overview

    Primary problem to address: Lack of infrastructure and operating procedures to support the high-performance computing needs of caBIG.

    Overarching goals: Regular caGrid services will run as caGrid/TeraGrid gateway services; virtualize TeraGrid resources (both compute and storage).

    Approach (labor divided between domain and technical tasks): Use cases will be drafted to identify the needs of the community; existing TeraGrid Gateway projects will be surveyed to identify lessons learned and potential technology for reuse; the approach will be demonstrated through a working prototype; best practices will be documented and a cookbook developed.

  • TeraGrid Overview

    Characteristics: More than 250 teraflops of computing capability; more than 30 petabytes of online and archival data storage; high-performance networks.

    Mechanics: Prospective users request an allocation of HPC resources from a review committee; allocations are granted and credentials are issued; jobs are run with those credentials, and resource usage is billed to the allocation.

    TeraGrid is an open scientific discovery infrastructure combining leadership-class resources at nine partner sites to create an integrated, persistent computational resource.
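
    The allocation mechanics on this slide can be illustrated with a minimal sketch. It is not TeraGrid's accounting system: the class name, the credential string, and the notion of abstract "units" are assumptions made only for illustration.

```java
/** Hypothetical model of an allocation: a granted budget tied to an issued credential. */
final class AllocationSketch {
    private final String credential;   // assumed opaque token issued with the grant
    private double remainingUnits;     // assumed abstract billing units

    AllocationSketch(String credential, double grantedUnits) {
        this.credential = credential;
        this.remainingUnits = grantedUnits;
    }

    /** Bills a job to the allocation; refuses a wrong credential or an overdraft. */
    boolean charge(String presentedCredential, double units) {
        if (!credential.equals(presentedCredential) || units < 0 || units > remainingUnits) {
            return false;
        }
        remainingUnits -= units;
        return true;
    }

    double remaining() {
        return remainingUnits;
    }

    public static void main(String[] args) {
        AllocationSketch allocation = new AllocationSketch("issued-credential", 100.0);
        System.out.println(allocation.charge("issued-credential", 40.0)); // true
        System.out.println(allocation.remaining());                       // 60.0
    }
}
```

    The point of the sketch is only that usage is tied to the credential that runs the job, so whoever holds the credential spends the allocation.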

  • caGrid Gateway Service Overview

    A caGrid service running in the caBIG environment that acts as a bridge or proxy to TeraGrid resources for a subset of caBIG users; it should meet Gold compatibility requirements.

    Created for a specific scientific scenario: Abstracts away the details of leveraging TeraGrid for performance-intensive operations; uses domain-specific operations and data types; has access to a TeraGrid allocation.

    Alleviates the need for caBIG users to: Understand the complexities of TeraGrid (or HPC systems); obtain TeraGrid accounts/allocations.
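
    The proxy role described on this slide can be sketched as follows. This is a hedged illustration, not the project's code: the class, the `submitClustering` operation, the identity strings, and the account label are all hypothetical, and a plain in-memory set stands in for whatever group-membership check the real service performs.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical gateway front end; identities and the account label are made up. */
final class CommunityGatewaySketch {
    private final Set<String> authorizedUsers; // stand-in for a group membership lookup
    private final String communityAccount;     // stand-in for the shared TeraGrid credential

    CommunityGatewaySketch(Set<String> authorizedUsers, String communityAccount) {
        this.authorizedUsers = new HashSet<>(authorizedUsers);
        this.communityAccount = communityAccount;
    }

    /** Domain-specific operation: the caller never handles TeraGrid accounts or job details. */
    String submitClustering(String callerIdentity, String datasetName) {
        if (!authorizedUsers.contains(callerIdentity)) {
            throw new SecurityException("caller is not in the authorized group: " + callerIdentity);
        }
        // The job runs under the shared community account, not an account owned by the caller.
        return "clustering(" + datasetName + ") submitted as " + communityAccount;
    }

    public static void main(String[] args) {
        Set<String> group = new HashSet<>();
        group.add("/O=caBIG/CN=alice");
        CommunityGatewaySketch gateway = new CommunityGatewaySketch(group, "cabig-community");
        System.out.println(gateway.submitClustering("/O=caBIG/CN=alice", "example-arrays"));
    }
}
```

    Keeping the credential inside the gateway is what lets users call a domain operation without obtaining a TeraGrid account of their own.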

  • geWorkbench: a Platform for Integrated Genomics

    Integrated genomics analysis application: Support for gene expression data, sequences, pathways, and structure; 50+ visualization and analysis modules; access to local and remote data sources and analytical services; integration with biological annotation sources.

    Development platform: Open source; Java based; component architecture; facilitates customization.

  • geWorkbench: a Platform for Integrated Genomics

    Large collection of components:

    Data parsers: Affy MAS/GCOS (txt and CEL), Genepix, RMA, FASTA, caArray, PDB.

    Data Management: Project folders, marker/sequence/array groups.

    Visualization: Dendrograms, color mosaics, scatter plots, SOM clusters, BLAST results, dot matrices.

    Analyses: Hierarchical clustering, t-test, SVM, ARACNE, MEDUSA, MatrixREDUCE.

    3rd Party components: Cytoscape, GoMiner, GeneWays, GenePattern, MEV.

    Complete list at www.geworkbench.org.

  • geWorkbench: a Platform for Integrated Genomics

    http://www.geworkbench.org/

  • geWorkbench Graphical User Interface

  • Clustering

  • caGrid Service

  • TeraGrid Aware caGrid Service

  • Creating the Gateway Service

    Manually stage the binary (jar file) on TeraGrid: It takes in .ser files as input and produces its results in a .ser file as well.

    Used the RAVi plugin for Introduce to create the gateway service: http://www-unix.mcs.anl.gov/~neillm/ravi/

    Gateway GridFTPs input data and parameters from geWorkbench to TeraGrid: geWorkbench passes input to the gateway in geWorkbench's native format (caDSR compliant); the gateway serializes the input before GridFTPing it to TeraGrid.

    Gateway invokes the staged binary.

    Gateway GridFTPs results back to geWorkbench: The gateway deserializes the result file and returns the results to geWorkbench in its native format.

    The gateway service is a secured caGrid service, which in turn invokes TeraGrid with a caBIG community account.
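
    The round trip on this slide (serialize, transfer, invoke, transfer back, deserialize) can be sketched with standard Java object serialization, which is an assumption about what the .ser files contain. Everything else is a stand-in: `RemoteStore` is an in-memory map in place of GridFTP, `rowMeansBinary` is a made-up analysis in place of the staged jar, and the file names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of the gateway round trip; all names are illustrative. */
final class GatewayRoundTripSketch {

    /** In-memory stand-in for the remote file system that GridFTP would reach. */
    static final class RemoteStore {
        private final Map<String, byte[]> files = new HashMap<>();
        void put(String path, byte[] data) { files.put(path, data.clone()); }
        byte[] get(String path) {
            byte[] data = files.get(path);
            if (data == null) throw new IllegalStateException("no such remote file: " + path);
            return data.clone();
        }
    }

    /** Turns an object into the bytes of a .ser file. */
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads an object back from the bytes of a .ser file. */
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Made-up staged binary: reads a double[][] from .ser bytes, writes row means as .ser bytes. */
    static byte[] rowMeansBinary(byte[] inputSer) {
        double[][] matrix = (double[][]) deserialize(inputSer);
        double[] means = new double[matrix.length];
        for (int i = 0; i < matrix.length; i++) {
            double sum = 0;
            for (double v : matrix[i]) sum += v;
            means[i] = matrix[i].length == 0 ? 0 : sum / matrix[i].length;
        }
        return serialize(means);
    }

    /** Gateway side: serialize and stage the input, run the binary, fetch and deserialize the result. */
    static Object invoke(Serializable input, RemoteStore remote, UnaryOperator<byte[]> stagedBinary) {
        remote.put("input.ser", serialize(input));
        remote.put("result.ser", stagedBinary.apply(remote.get("input.ser")));
        return deserialize(remote.get("result.ser"));
    }

    public static void main(String[] args) {
        double[][] expression = {{1.0, 3.0}, {2.0, 6.0}};
        double[] means = (double[]) invoke(expression, new RemoteStore(),
                GatewayRoundTripSketch::rowMeansBinary);
        System.out.println(Arrays.toString(means)); // prints [2.0, 4.0]
    }
}
```

    Because the staged binary only sees .ser bytes in and .ser bytes out, the client-facing data format and the remote execution format stay decoupled, which matches the division of work the slide describes.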

  • Steps to establishing geWorkbench/caGrid/TeraGrid Interface

  • caGrid Security (GTS, Grid Grouper, Dorian, CDS)

    http://www.cagrid.org/mwiki/index.php?title=GAARDS:Main

  • Workflow and Communications Between Services

  • Special Thanks

    caGrid (Security Services): Scott Oster, Stephen Langella.

    caGrid (RAVi Plugin, Gateway Service): Ravi Madduri.

  • Demo and Discussions
