Post on 20-Jan-2016




The Cancer Biomedical Informatics Grid™ (caBIG™)
2006 CODATA Conference, Beijing, China
Mary Jo Deering, Ph.D., Director, Informatics Dissemination, NCI Center for Bioinformatics


caBIG™ Industry Partners Meeting
2006 CODATA Conference
Common, widely distributed
Collection of interoperable applications developed to common standards
Raw published cancer research data is available for mining and integration
This is the heart of what caBIG is setting out to do. Essentially, it is an effort to bring different communities together through IT. A widely distributed infrastructure means the community can share common infrastructure, so each individual institution can focus on its priority innovations. Picture a common infrastructure that speaks a common dialect, with applications that sit on top so that people can consume the information. The ultimate goal is ‘hot and cold running information’: clinical data, expression array data, SNP array data, proteomics data, and more, integrated for cross-cutting analysis.
National Cancer Institute
caBIG™ infrastructure
The core principles of caBIG
Open Source: Standards and software developed within the caBIG™ initiative are licensed as open source (including source code) and are thus available for implementation or review without a licensing fee. The caBIG™ open-source license allows vendors to incorporate caBIG™ standards or code into commercial products.
Open Access: The standards and software developed within caBIG™ are freely available for use by health care organizations, biomedical researchers and vendors who support this community. Indeed, anyone can access the tools and data available through caBIG™.
Open Development: caBIG™ is committed to open communication and broad collaboration. Planning for caBIG™ standards and software development is carried out in open meetings, and comments are solicited from all interested participants. Development projects are assigned to particular participants but are carried out iteratively, with multiple opportunities for review and comment by the caBIG™ community at large.
Federation: caBIG™ software and standards enable local sites, such as cancer centers, to use resources contributed by others and to share computing or data resources with the cancer community at large. Federation implies that these individual resources remain under the control of the local sites but are aggregated for use by all participants as an integrated research tool.
Because of this interoperable informatics core, all these research communities can be linked together.
It is absolutely crucial that everyone understands that this informatics core is NOT just for cancer. The foundational informatics infrastructure that caBIG is developing is widely applicable to biomedical purposes. The core components can be used to build systems, tools, and data sets in any area. Just as pipes can carry anything, and just as anything can plug into electrical outlets, the standards-based caBIG infrastructure will serve a multitude of purposes.
caBIG™ Operational Structure
disconnected, often paper-based, and slow to get organized, enroll patients, and report out results.
caBIG™ Tools:
Cancer Central Clinical Patient Registry (C3PR)
caBIG™ Adverse Events Reporting (caAERS)
caMatch – PHR/eligibility matching
Consistent format, language and structure throughout the trial to enable reliability and comparability of results
Greater access to a larger population of patients for faster enrollment
Better molecular sub-grouping of patients to improve clinical outcomes
Real-time monitoring of clinical impact to enable revised entry criteria for additional patients
More efficient and systematized administrative process to reduce costs
caAERS is the caBIG Adverse Event Reporting System. It will enable cancer centers to record adverse events as they occur, report serious adverse events expeditiously, and communicate each event to all regulatory bodies, sponsors and healthcare providers as required by law.
The Patient Study Calendar will provide cancer centers the ability to chart the course of care for a patient enrolled in a clinical trial, complete with lab visits, radiology, billing notifications, doctor visits, infusion/injections, etc.
The Lab Data Hub will provide the ability for multiple lab systems, all speaking different languages, to send data to a Clinical Trials Management System (CTMS) and a study calendar for continuity of care and on-time data to guide care.
Population Sciences and Cancer Control
These are some of the Integrated Cancer Research or ICR Special Interest Groups.
ICR is the most diverse workspace because so many different scientific areas are supported.
Challenge: A growing volume of increasingly complex data, but no system in place to collect, aggregate, analyze and distribute it.
Sample caBIG™ Tools:
Genome data presentation: webGenome
Access to and integrated analysis of data from divergent sources
Increased efficiency in analyzing and visualizing results
Accelerated discovery of molecular signatures
Cancer Array Informatics (caArray)
The NCICB's cancer array informatics project, caArray, consists of a microarray database and microarray data analysis and visualization tools. caArray is an open source project, and the source code and APIs are available on the caArray Informatics page. The goals of the project are to make microarray data publicly available and to develop and bring together open source tools to analyze these data.
Context: A CMAP user may wish to see data that is directly relevant to a particular kind of cancer.
Target: A target is a molecule that holds special diagnostic or therapeutic interest for cancer research.
Anomaly: An anomaly is a deviation in the structure or expression of a target.
Profile: The profile of a kind of cancer is a set of anomalies that characterize that cancer, distinguishing it from other kinds of cancer and from normal states.
Agent: An agent is a drug or other intervention that is effective in the presence of one or more specific targets.
Clinical trial: A clinical trial is linked to a context and to one or more agents. A trial is not linked directly to any target.
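The relationships among these CMAP terms can be sketched as a small data model. This is only an illustration of the definitions above, not the actual CMAP schema: the class names, field names, and the helper `targets_for_trial` are assumptions made for this example, and the sample values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Target:
    """A molecule of diagnostic or therapeutic interest."""
    name: str


@dataclass(frozen=True)
class Anomaly:
    """A deviation in the structure or expression of a target."""
    target: Target
    description: str


@dataclass
class Context:
    """A kind of cancer; its profile is the set of anomalies that characterize it."""
    name: str
    profile: List[Anomaly] = field(default_factory=list)


@dataclass
class Agent:
    """A drug or other intervention effective in the presence of specific targets."""
    name: str
    targets: List[Target] = field(default_factory=list)


@dataclass
class ClinicalTrial:
    """Linked to one context and one or more agents, never directly to a target."""
    title: str
    context: Context
    agents: List[Agent]


def targets_for_trial(trial: ClinicalTrial) -> List[Target]:
    """Reach targets only indirectly, through the trial's agents."""
    found: List[Target] = []
    for agent in trial.agents:
        for target in agent.targets:
            if target not in found:
                found.append(target)
    return found


# Placeholder values, for illustration only.
target_a = Target("target-A")
context = Context("example cancer type", [Anomaly(target_a, "overexpression")])
trial = ClinicalTrial("example trial", context, [Agent("example agent", [target_a])])
print([t.name for t in targets_for_trial(trial)])
```

Because a trial holds no direct reference to a target, any question such as "which targets does this trial address?" has to be answered by walking through its agents, which is what the helper does.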
The data portal provides access to data from 1177 prostate cancer subjects and 1105 controls. This dataset, including the genotype information (~720 million records) from these samples, is a very exciting data collection for whole genome analysis of cancer samples.
caTISSUE Core (WU) – Core specimen handling and tracking functions
caTISSUE Clinical Annotation Engine (UPMC) - Annotation of specimens with clinical data
caTIES (UPMC) - Text extraction and de-identification of surgical pathology reports
Biospecimen and pathology tools to enable specimen and information sharing
A basic system that can be rapidly deployed to institutions with little or no specimen banking informatics
Modular design so that additional functionality (biological annotations, billing and financial modules, etc.) can be added without architectural redesign
Clinical annotation - key information on specimens that is not generated by the tissue bank
Source systems : Anatomical Pathology, Clinical Pathology, Cancer Registries, Clinical Trials Management Systems
Extraction of coded information from free text surgical pathology reports
Coded information includes Histologic Type, Stage and Size, Prognostic Factors (LN, ALI, PNI, Margins), Immunophenotype, Molecular Markers
Office of Biorepositories and Biospecimen Research (OBBR), established 2005, Dr. Carolyn Compton
caIMAGE allows researchers to submit and retrieve images and annotations.
Images are streamed for efficient access.
Researchers can search images based on tissue and diagnosis and experiment information.
Use of common terminology originating from the NCI Enterprise Vocabulary Server (EVS).
The In Vivo Imaging Workspace was just launched in October 2005.
Its purpose is to advance the field of imaging and, by extension, all clinical trials and research, by identifying new ways to extract and share meaning from in vivo imaging data, thereby improving outcomes for patients with cancer and enhancing efforts in early diagnosis and prevention.
Existing systems provide no way to share or archive images to validate or facilitate diagnostics or prognostics.
Imaging Archives
molecular and clinical data. Improved clinical decision support – more accurate, objective and reproducible
We have built a repository for cancer images. This allows us to communicate histopathology and radiographic images that are part of cancer models.
Key is to create tools for sharing information
Extensible infrastructure
Expandable and modular software to plug into existing systems so current development efforts are not wasted
Ensures partnerships
Compatibility Guidelines at
But it is very important that caBIG compatibility is understood NOT as something NCICB just created itself. Compatibility rests on international standards, vocabularies that are firmly established across much of the scientific community, and IT industry “best practices” for technology development.
caBIG brings them together.
the parts or equipment of another system
Interoperability is a relatively new concept, centered on the idea that one system can use parts of another.
If you focus on computer interoperability, the IEEE definition includes two concepts.
Functional interoperability – the ability to exchange information – and semantic interoperability – the ability to use the information that has been exchanged.
As an aside, standards are frequently thought of as focusing primarily on functional interoperability, whereas the critical sine qua non for health care is semantic interoperability - if you don’t know exactly what the information you received means, having received it is useless, or worse, maybe even dangerous!
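The distinction between the two concepts can be made concrete with a minimal sketch. It assumes a hypothetical shared vocabulary keyed by made-up concept codes (`EX-001`), with JSON standing in for whatever wire format two systems agree on; none of these names come from caBIG itself.

```python
import json

# Hypothetical shared vocabulary: concept code -> agreed meaning.
SHARED_VOCABULARY = {
    "EX-001": "tumor size in millimeters",
}


def exchange(record):
    """Functional interoperability: the message is sent and parsed successfully."""
    wire_message = json.dumps(record)
    return json.loads(wire_message)


def interpret(record, vocabulary):
    """Semantic interoperability: a field is usable only if its concept is known."""
    understood = {}
    unknown = []
    for field_name, entry in record.items():
        meaning = vocabulary.get(entry.get("concept"))
        if meaning is None:
            unknown.append(field_name)
        else:
            understood[meaning] = entry["value"]
    return understood, unknown


received = exchange({
    "size": {"value": 12, "concept": "EX-001"},  # bound to a shared concept
    "sz2": {"value": 12},                        # received, but meaning unknown
})
understood, unknown = interpret(received, SHARED_VOCABULARY)
print(understood)
print(unknown)
```

Both fields arrive intact, so the exchange is functionally interoperable, but only the field bound to a shared concept can be safely used by the receiver.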
UML Modeling Tool (any with XMI export)
Semantic Connector (concept binding utility)
UML Loader (model registration in caDSR)
Codegen (middleware code generator)
caCORE SDK generates a caBIG-Silver compliant system
Grid Technology in caBIG™
What is a ‘Grid’?
“A Grid is a system that coordinates resources that are not subject to centralized control using standard, open, general-purpose protocols and interfaces to deliver nontrivial qualities of service.” - Ian Foster, Grid Today, July 20, 2002
Grid Technology supplies two useful components to a network of computers:
Advertising: Inform the network about the capabilities of new systems
Discovery: Allow users to find resources that meet their needs.
The caGrid project is the ‘Grid in caBIG™’: the actual infrastructure that data and analytical services will use to interoperate.
The current caGrid is version 0.5; caGrid 1.0 is expected in December.
The combination of data and analytical service nodes in caBIG™ produced a design that uses a variety of standard Grid technologies, including the Globus Toolkit, OGSA-DAI, DQP, GRAM, etc.
Test bed Infrastructure
NCICB-Rockville, MD
UPMC-Pittsburgh, PA
NCI Core Genotyping Facility-Gaithersburg, MD
caMOD II: Cancer Model Organism Database
NCI Mouse Models of Human Cancer Consortium (MMHCC)
Analytical Service
Duke-Durham, NC
caBIG™ infrastructure
National Cancer Institute
6116 Executive Blvd. - #403
Current caBIG™ community
Stand-alone (community leaders)
~900 active participants
But from the early days, we had many ‘volunteers’ from other cancer organizations and standards development organizations. We are working in a “FEDERATION” of academic, public, and private participants.
Participating NCI Divisions include CCR, DCP, DCCPS, and others. Participating NECTAR grantees include the University of Minnesota, University of Chicago, and Duke University. At NIH, NINDS, NHLBI and NIDCD are actively working to build infrastructure and apps that support their missions as well as ours.
FDA is a very close collaborator on several projects related to regulatory reporting.
Industry: big pharma (Pfizer, AstraZeneca); IBM is playing actively; Oracle, SAIC, BAH; smaller biotech, e.g. Velos and Percipenz. A broad range of people are engaged.
A strategic partnership with NCRI (the NCI's counterpart in the UK) recently launched an effort that will mirror caBIG. NCRI will build components to enhance the infrastructure and components.
have been launched
Provides for the integration, development, and implementation of tissue and pathology tools.
Integrative Cancer Research
Provides tools and systems to enable integration and sharing of information.
Clinical Trial Management Systems
Addresses the need for consistent, open and comprehensive tools for clinical trials management.
Provides for the sharing and analysis of in vivo imaging data.
Workspaces = places where communities work on specific collections of problems; specific research interest groups that build tools and share data; how a community comes together to work on specific activities.
Cross-cutting workspaces support the construction of basic infrastructure, knitting together the caBIG™ community at the infrastructure level.
Strategic Level Workspaces
Strategic Planning
Assists in identifying strategic priorities for the development and evolution of the caBIG™ effort.
Developing strategies for providing training in the use of caBIG-developed resources, including on-line tutorials, workshops, and training programs.
Data Sharing and Intellectual Capital
Addresses issues related to the sharing of data, applications and infrastructure both within the consortium and in the larger cancer research community.
Strategic Level workspaces provide a broad framework for the whole enterprise and overlay everything we do in caBIG™.
REMBRANDT: Building a robust translational research framework for brain tumor studies
REpository of Molecular BRAin Neoplasia DaTa
Here is one example of a robust translational research application built on caBIG™ principles and standards.
REMBRANDT is an informatics initiative to create a repository of molecular brain neoplasia data. This public database can be used for a variety of purposes, including the development of novel molecular classification systems. This partnership between the NCI and the National Institute of Neurological Disorders and Stroke (NINDS) moves us toward an era of individualized cancer treatment based on the molecular genetics of each patient’s tumor.
From the research being conducted out of this repository, someday we will be able to provide patients with personalized treatment options based on their type of glioblastoma.
The brain tumor research project is led by Dr. Howard Fine in NCI’s Neuro-Oncology branch.
SNPArray data
Proteomics data
Specifically, REMBRANDT will be designed to house two sets of valuable data. The first set of data will come from the prospective Glioma Molecular Diagnostic Initiative (GMDI) study.
The second type of data housed by REMBRANDT will be a wide array of molecular and genetic data regarding all types of primary brain tumors generated by extramural investigators funded by the NCI to do such analyses.
Most important is that this knowledge base pulls the data together via caIntegrator. REMBRANDT will be the vital link that will not only allow huge amounts of disparate types of data to be housed in a single place, but will also supply the bioinformatics tools critically necessary for the useful analyses of such data.
The diverse data collected via REMBRANDT, and other translational research studies, can be deposited into a data mart.
caIntegrator has caBIG™ analytical tools sitting on top of it. There is a website with a secure login, where the researcher can perform a variety of complex queries (e.g. by gene symbol, differential fold change, clone location) and summarize and integrate data in any number of ways (e.g. survival plot).
A generic framework that allows retrieval and transformation of data from a variety of heterogeneous operational repositories that house microarray, genomic, tissue array, imaging and clinical data.
Clinician-scientists can conduct translational analysis of study-specific data more quickly, more easily, and better than before.
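A query by gene symbol and differential fold change, of the kind mentioned above, might look like the following sketch. It is not caIntegrator's interface: the record layout, the function names, and the use of a log2 ratio of tumor to control expression are assumptions chosen for illustration, and the gene names and values are placeholders.

```python
import math


def log2_fold_change(tumor, control):
    """Log2 ratio of tumor to control expression (one common convention)."""
    return math.log2(tumor / control)


def query(records, gene_symbol=None, min_abs_log2_fc=0.0):
    """Return (gene, log2 fold change) pairs that match the given criteria."""
    hits = []
    for record in records:
        if gene_symbol is not None and record["gene"] != gene_symbol:
            continue
        change = log2_fold_change(record["tumor"], record["control"])
        if abs(change) >= min_abs_log2_fc:
            hits.append((record["gene"], change))
    return hits


# Placeholder expression values, not real data.
records = [
    {"gene": "GENE_A", "tumor": 8.0, "control": 2.0},
    {"gene": "GENE_B", "tumor": 3.0, "control": 3.0},
    {"gene": "GENE_C", "tumor": 1.0, "control": 4.0},
]
print(query(records, min_abs_log2_fc=1.0))
print(query(records, gene_symbol="GENE_B"))
```

Filtering on the absolute value of the log ratio treats over- and under-expression symmetrically, so a fourfold decrease is reported alongside a fourfold increase.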
caBIG™ Compatibility Guidelines
The caBIG™ compatibility guidelines are designed to ensure that systems designed in a federated environment are still interoperable on the caBIG™ Grid, both syntactically and semantically.
Since achieving interoperability is a process, caBIG™ recognizes four levels of compatibility, starting from Legacy (not interoperable) through Bronze, Silver and Gold (fully interoperable).
caBIG™ compatibility is all about interfaces rather than the scientific content of the system.
The analogy is to a city
In cities architects are free to design buildings that perform myriad functions and that take many distinct forms
Nevertheless, all buildings in the city are required to conform to certain specifications in order to receive electricity, water, steam, mail, etc.
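The four compatibility levels form an ordered scale, which can be sketched as an ordered enumeration. The numeric values and the `meets` helper are assumptions for illustration; the guidelines define the criteria for each level, which are not reproduced here.

```python
from enum import IntEnum


class Compatibility(IntEnum):
    """The four levels named in the guidelines, lowest to highest."""
    LEGACY = 0   # not interoperable
    BRONZE = 1
    SILVER = 2
    GOLD = 3     # fully interoperable


def meets(system_level, required_level):
    """True if a system's level is at least the required level."""
    return system_level >= required_level


print(meets(Compatibility.GOLD, Compatibility.SILVER))
print(meets(Compatibility.LEGACY, Compatibility.BRONZE))
```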
Common Data Elements
What do all those data classes and attributes actually mean, anyway?
Data descriptors or “semantic metadata” required
Computable, commonly structured, reusable units of metadata are “Common Data Elements” or CDEs.
NCI uses the ISO/IEC 11179 standard for metadata structure and registration
Semantics all drawn from Enterprise Vocabulary Service resources
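As a rough illustration of what a Common Data Element carries, the sketch below follows the general ISO/IEC 11179 idea that a data element pairs a data element concept (an object class plus a property) with a value domain. It is heavily simplified and is not the caDSR model or API; the identifier, class names, and permissible values are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataElementConcept:
    """What is being described: an object class and one of its properties."""
    object_class: str
    property_name: str


@dataclass(frozen=True)
class ValueDomain:
    """How values are represented; an empty tuple means any value of the datatype."""
    datatype: str
    permissible_values: Tuple[str, ...] = ()


@dataclass(frozen=True)
class CommonDataElement:
    public_id: str  # hypothetical identifier, not a real registry ID
    concept: DataElementConcept
    value_domain: ValueDomain

    def accepts(self, value: str) -> bool:
        """Check a value against the enumerated permissible values, if any."""
        allowed = self.value_domain.permissible_values
        return not allowed or value in allowed


cde = CommonDataElement(
    public_id="EXAMPLE-0001",
    concept=DataElementConcept("Specimen", "Preservation Method"),
    value_domain=ValueDomain("string", ("Frozen", "Fixed", "Unknown")),
)
print(cde.accepts("Frozen"))
print(cde.accepts("frozen, probably"))
```

Two systems that bind a field to the same element can then agree both on what the field means and on which values are valid for it.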
Cancer Data Standards Repository (caDSR)
Basic caDSR unit of metadata information to describe a datum is a Common Data Element or CDE
Enterprise-class system for storing metadata, with APIs that give runtime access to both metadata and semantics
Implements the ISO 11179 standard, a flexible model for describing arbitrary metadata
Used to describe metadata associated with clinical case report forms and UML Models
Enterprise Vocabulary Services
Controlled vocabulary resources for caCORE and the cancer research community
Vocabulary Products and Services
Has excellent coverage of cancer terminology
Expands based on needs for additional terminology
Based on concepts rather than terms
Each concept has a unique identifier, or CUI, with definitions and synonyms
The V/CDE workspace is responsible for facilitating the development and ratification of Data Standards for caBIG™
Data Standards can be Vocabularies or Common Data Elements (CDEs) with their associated controlled terminology
A caBIG™ Data Standard is, in effect, a ‘pre-approved’ mechanism for semantically modeling an attribute or series of attributes in a data object. Ideally, having a standard available shortens development time for other projects that need to present such data
Whenever possible, caBIG™ adopts standards that are derived from other standards bodies (HL7, ISO, USPS, UPU, W3C, etc.) and in general use within our community
In the last year, the V/CDE workspace has developed a consensus driven mechanism for approving Data Standards and applied it to an increasing number of CDEs
Service Provider composes service metadata describing the service and publishes it to the grid.
Researcher (or application developer) specifies search criteria describing a service of interest.
The researcher submits the discovery request to a discovery service, which identifies a list of services matching the criteria and returns the list.
Researcher (or application developer) instantiates the grid service and accesses its resources.
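The advertise-and-discover steps above can be sketched with an in-memory registry. This is a toy model, not the caGrid or Globus Toolkit API: `ServiceMetadata`, `DiscoveryService`, and the example service names are all hypothetical, and the final step (instantiating the service and accessing its resources) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceMetadata:
    """Metadata a provider publishes to describe its service."""
    name: str
    kind: str            # "data" or "analytical"
    concepts: frozenset  # semantic tags describing what the service offers


class DiscoveryService:
    """Toy stand-in for a grid index: providers advertise, researchers discover."""

    def __init__(self):
        self._registry = []

    def advertise(self, metadata):
        """Step 1: the provider publishes service metadata to the grid."""
        self._registry.append(metadata)

    def discover(self, kind=None, concept=None):
        """Steps 2-3: return every registered service matching the criteria."""
        return [
            service for service in self._registry
            if (kind is None or service.kind == kind)
            and (concept is None or concept in service.concepts)
        ]


grid = DiscoveryService()
grid.advertise(ServiceMetadata("example-model-db", "data", frozenset({"animal model"})))
grid.advertise(ServiceMetadata("example-clustering", "analytical", frozenset({"gene expression"})))

matches = grid.discover(kind="data", concept="animal model")
print([service.name for service in matches])
```

The researcher never needs to know in advance where a service is hosted; the search criteria describe what is wanted, and the registry resolves that to the services currently advertised.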
caGrid 0.5 Services
NCICB-Rockville, MD
UPMC-Pittsburgh, PA
NCI Core Genotyping Facility-Gaithersburg, MD
caMOD II: Cancer Model Organism Database
NCI Mouse Models of Human Cancer Consortium (MMHCC)
Analytical Service
Duke-Durham, NC
The NCI provides freely available enabling technology for caBIG™ compatibility.
These technologies are distributed under a ‘non-viral’ open source license.
caCORE Software Development Kit
When the complete process is followed, the outcome is a caBIG ‘Silver’ compliant data system.
How can my research benefit from caBIG™ Tools?
Everything developed by the program is open source and freely available
Training is available at
The latest versions of all the software developed as part of the project can be obtained from the caBIG™ project gforge site:
As I mentioned earlier, all the tools developed by caBIG are freely available.
We have a robust training program. You can learn about it on the Web site, too.
caBIG™: Getting Involved
Track caBIG™ activities on the NCI’s caBIG™ website,
Attend caBIG™ Annual Meeting, February 5-7, 2007, Wardman Park Marriott, Washington, DC
Learn about the existing bioinformatics infrastructure, caCORE, at
Download currently available caBIG™ tools from the caBIG™ website at
Sign up for the caBIG™ mailing list at
Please visit the main caBIG™ website for more information:
We hope many of you will get involved. As I said at the beginning, for this infrastructure, tools, and data resources to be useful, they have to reflect the needs of the user community. YOU are part of the community. You CAN shape the evolution of caBIG and the development of specific tools.
In order to learn more about or get involved with caBIG™, interested viewers can pursue the following options: track caBIG™ activities on NCI’s caBIG™ website; attend caBIG™ meetings, including our Annual Meeting, on February 5-7, 2007, in Washington, DC; learn about caCORE, caBIG™’s bioinformatics infrastructure; download some of the already available data sharing and analysis tools from the caBIG™ website; or sign up for the caBIG™ mailing list.
