geWorkbench Remote Access to caArray Data Fan Lin Ph.D. Molecular Analysis Tools Knowledge Center Columbia University and The Broad Institute of MIT and Harvard


Post on 24-Dec-2015



geWorkbench Remote Access to caArray Data

Fan Lin, Ph.D.

Molecular Analysis Tools Knowledge Center

Columbia University and The Broad Institute of MIT and Harvard


What is geWorkbench?

geWorkbench is a desktop application. It is:

• A component-based integrative analysis platform with 50+ components.

• An integration point for a large number of data domains (sequence, expression, literature, network, structure).


geWorkbench Installation

For the production version (with full support):

• Users can visit the NCICB GForge site at http://gforge.nci.nih.gov/frs/?group_id=78 to download, install, or upgrade geWorkbench.

For the pre-release developer version:

• Follow the steps in NCI’s knowledge base entry titled “Installing the latest development version of geWorkbench” at https://wiki.nci.nih.gov/x/E5GNAg.


Start geWorkbench

There are two ways to start geWorkbench once it is installed:

• From DOS (Development Version Only)

At the DOS prompt, go to the geWorkbench directory and run:

c:> ant clean (clean out the previous build)

c:> ant run (create a new build and start geWorkbench)

• GUI (Production Version)

Start Menu -> geWorkbench 1.x


geWorkbench Anatomy

geWorkbench (http://www.geworkbench.org), desktop app

• Component-based integrative analysis platform with 70+ components.

• Integration of a large number of data domains (sequence, expression, literature, network, structure).

GenePattern (http://www.genepattern.org), web app

• Modular analysis platform for genomic data with 90+ components.

• Powerful scientific workflow framework, web-based.

caArray (http://caarray.nci.nih.gov/), web app

• Web and programmatically accessible array data management system.

• MIAME compliant.

caIntegrator (http://caintegrator-info.nci.nih.gov/), web app

• Novel informatics platform supporting translational informatics.

• Integration and aggregation of clinical, genomic, and analysis data.


geWorkbench Anatomy


geWorkbench File Structure

• All work is done in geWorkbench’s “Workspace”.

• Data files and analysis results are organized by “Project”.

• Different projects can be stored under the same “Workspace”.

• Only one “Workspace” can be opened per geWorkbench instance.
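
The relationship between a Workspace and its Projects can be pictured with a small data-structure sketch. This is only an illustration of the containment described above, not geWorkbench's internal model; the class names `Workspace` and `Project` and their fields are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Project:
    """Hypothetical stand-in for a geWorkbench project."""
    name: str = "Project"  # default name given to a newly created project
    data_files: List[str] = field(default_factory=list)
    analysis_results: List[str] = field(default_factory=list)


@dataclass
class Workspace:
    """Hypothetical stand-in for the single workspace open in one instance."""
    projects: List[Project] = field(default_factory=list)

    def new_project(self) -> Project:
        # A new project starts with the default name and can be renamed later.
        project = Project()
        self.projects.append(project)
        return project


workspace = Workspace()
first = workspace.new_project()
first.name = "Renamed project"   # mirrors renaming a project after creation
second = workspace.new_project() # several projects can share one workspace
```

The sketch keeps one `Workspace` object holding any number of `Project` objects, each of which groups its own data files and analysis results.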


Create New Project

• Right-click on Workspace and click “New Project”.

• A new project is created with the default name “Project”.

• Right-click on “Project” and choose “Rename Project”.

• Enter a new, meaningful project name.

• Click “OK”.


geWorkbench Data Access

There are two ways for geWorkbench to access data:

1. Local files: loads data from the local machine or network drives.

2. Remote data access: retrieves data from a database.

This presentation focuses on geWorkbench’s remote data access.


Remote Data Access in geWorkbench

• geWorkbench can retrieve data from certain remote data sources.

• Currently, only instances of NCICB's caArray application are supported.

• Access to NCI's caArray production and staging servers is pre-configured in geWorkbench.

• Users can connect to any other caArray server by adding a new remote access entry.


Create Remote Access to a local caArray Server

To add a new remote access, follow the steps below:

1. Select “Open” -> “File” from the File menu.

2. Select “Remote” (circled in red).

3. Click on “Add New Resource” (indicated by the arrow).

4. Enter the remote server URL and port. The port should be the JNDI port number. To make sure the port is accessible, type: telnet <caArray-server-name> xxxxx (port #).

5. To view only public experiments, leave the user name and password blank.

6. To view public AND private experiments, enter your user name and password.

7. Click on “OK” to save the access information.

8. To open the new remote server, select it from the list in the lower left corner and click on “Go” (shown in red).
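
Where telnet is not installed, the port check in step 4 can be approximated with a short script. The following is a minimal sketch using only the Python standard library; the function name `port_is_open` and the host and port in the usage comment are placeholders, not values from this presentation. It only confirms that a TCP connection can be established and says nothing about whether a caArray service is actually listening on that port.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage (placeholder server name and port, substitute your own):
#   port_is_open("caarray-server.example.org", 12345)
```

A result of False usually points to a firewall, a wrong port number, or a server that is not running, which are the same causes a failed telnet attempt would suggest.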


Data Types Accessed by geWorkbench

• Currently, only data from Affymetrix .CHP files that have been parsed into caArray tables can be accessed remotely by geWorkbench.

• A .CHP file contains the results of an Affymetrix experiment. It includes the average signal measure for each probe set as determined by the Affymetrix software, information about which probe sets are called present, absent, or marginal, and the p-values for these calls.
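
The per-probe-set content described above can be pictured as one record per probe set. The sketch below is illustrative only: it does not parse real .CHP files, the class and field names are assumptions, and the sample values are invented for the example. The single-letter codes "P", "M", and "A" stand for the present, marginal, and absent calls.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProbeSetResult:
    """Hypothetical record for one probe set; field names are assumptions."""
    probe_set_id: str
    signal: float             # average signal measure for the probe set
    detection: str            # "P" (present), "M" (marginal), or "A" (absent)
    detection_p_value: float  # p-value associated with the detection call


def filter_by_call(results: List[ProbeSetResult], call: str) -> List[ProbeSetResult]:
    """Keep only the probe sets whose detection call matches `call`."""
    return [r for r in results if r.detection == call]


# Invented sample values, for illustration only.
SAMPLE = [
    ProbeSetResult("probe_set_1", 812.4, "P", 0.002),
    ProbeSetResult("probe_set_2", 95.1, "M", 0.055),
    ProbeSetResult("probe_set_3", 12.7, "A", 0.410),
]

present_ids = [r.probe_set_id for r in filter_by_call(SAMPLE, "P")]
```

Filtering on the detection call, as `filter_by_call` does here, is one common way such records are used before downstream analysis.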


Remote Access Affy .CHP Data

1. After the remote server is opened, a list of experiments is shown.

2. Highlight the name of the experiment of interest.

3. Click on “Show Array” to display the array data files associated with this experiment.

4. Highlight the array file names you wish to open.

5. Click on “Open”.

6. Find and select the quantitation type relevant to your experiment.

7. Click on “OK”


Annotation File for Affymetrix Array

After the array data is loaded, geWorkbench proceeds to load the related annotation file.

• Affymetrix’s .CHP files contain expression values for probe sets.

• Affymetrix’s annotation CDF files contain annotation for probe sets.

• Loading an annotation file in geWorkbench is not mandatory, but it ensures that geWorkbench is fully functional.

• Affymetrix’s annotation CDF files must be downloaded manually from the Affymetrix website because of license requirements.
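
The reason the annotation file matters can be shown with a small join on probe set ID. This is a sketch under stated assumptions, not how geWorkbench combines the two files internally: the function name `annotate` is hypothetical, both inputs are plain dictionaries rather than parsed files, and the expression values are invented for the example.

```python
from typing import Dict, List, Optional, Tuple


def annotate(
    expression: Dict[str, float],
    annotation: Dict[str, str],
) -> List[Tuple[str, Optional[str], float]]:
    """Pair each probe set's expression value with its annotation, if any."""
    # Probe sets missing from the annotation keep None as their gene symbol.
    return [(pid, annotation.get(pid), value) for pid, value in expression.items()]


# Illustrative inputs: expression values keyed by probe set ID, and a
# probe set ID to gene symbol lookup. The last ID is deliberately unannotated.
expression = {"1007_s_at": 1520.3, "1053_at": 210.8, "hypothetical_at": 33.0}
annotation = {"1007_s_at": "DDR1", "1053_at": "RFC2"}

rows = annotate(expression, annotation)
```

Expression values are still available for a probe set with no annotation, but anything that depends on gene-level information has nothing to work with for that row.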


Uploading Annotation File

• The screen on the left first explains why an annotation file is needed for Affymetrix array data and where to obtain it.

• The screen on the right indicates where to select the annotation file.

• You may download the annotation file to a different folder and browse to that directory at this step.

• Click on “Open”.


Display Array Data in geWorkbench

After the remote data is opened and the annotation file is selected, the microarray viewer starts automatically. The array data is displayed in the Visualization Tool Area at the upper right.


Need More Information?

NCI is developing an extensive knowledge base to support various NCI molecular analysis tools. Visit us at NCI’s Molecular Analysis Tools Knowledge Center at https://wiki.nci.nih.gov/x/R5GNAg

• For more information on how to use geWorkbench, visit the geWorkbench wiki at https://wiki.nci.nih.gov/x/Hob3Ag

• Have a caArray-related question? Find the answers in the caArray FAQ at https://wiki.nci.nih.gov/x/b5GNAg

• Need more help? Post in the geWorkbench forum at https://cabig-kc.nci.nih.gov/Molecular/forums/viewforum.php?f=3