Researching Cancer in the Cloud - Using Spring, Neo4j, Mongo and Redis

Post on 09-May-2015

DESCRIPTION

Speakers: Smitha Gudur and Manoj Joshi

Cancer and life science drug research models are rich in relationships, relationship heterogeneity and entity inter-dependencies. Most entity metadata is dynamic and unpredictable, which makes it difficult to fit such models into a traditional relational landscape. Redbasin Networks uses a hybrid NoSQL strategy that supports composite, rich document metadata that is pervasively interconnected. Cancer and life science data is deeply nested. You will find this useful if you are building complex engineering or scientific applications and need insights on how to merge data from many diverse data sets and map it to an intuitive and effective graph database model. Using code examples, we will show how complex metadata can be engineered with Spring, Neo4J and Mongo to create useful drug insights for the drug researcher, and to provide a platform for technologists to build sophisticated life science applications.

TRANSCRIPT

© 2013 SpringOne 2GX. All rights reserved. Do not distribute without permission.

Redbasin Networks: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis

By Smitha Kulkarni Gudur and Manoj Joshi

Friday, September 6, 13

Introduction

Smitha Kulkarni Gudur, CEO

Manoj Joshi, CTO

Allan Grimes, VP Business

Neeta Potdar, HR & Admin

Redbasin Networks Overview

Redbasin Networks provides a cloud based platform for cancer drug researchers in Pharma and Bio-tech.

Redbasin is a scalable technology and platform that allows Life Science researchers to gain insights about viable drug molecules and pathways.

Cancer Ecosystem Today (It’s complex!)

[Diagram: participants in the cancer ecosystem: EPA, CDC, Universities, NIH/NLM, Hospitals and Treatment Centers, Biotech, Labs, Legal, Instrument vendors, Certification and Approval, Lab tests, Patients, Insurance, Pharma, Contract Research Organization, Drug Labs, FDA]

Cancer Market Research

[Figure labels: US cancer spending $108b; 89m deaths, 2005-2015; 10% of top 200 drugs cancer related, generating $1b/yr; 1.5m new cancer deaths; Redbasin Networks]

Spring Data: Redbasin Cancer Research

Spring Data

Protein Gene Disease Drug Antibody Ligand Complex Epigenetics

MongoDB Neo4j Redis HBase Lucene

Cloud

REST API

XML JSON

Typical Drug Life Cycle Costs

Why Not Go Relational?

Oncological meta-data is multi-dimensional

Pervasive joins are a drag on performance

Unpredictable schemas during mining

Temporality is difficult to represent

Redbasin Core Data Technologies

• Mongo
• Neo4J
• Redis
• Lucene
• HBase/Hadoop

Why Mongo?

Lots of XML and JSON documents

Very easy to use

High performance and scalability

Strong Java & REST Support

Why Neo4j?

Neo4j is a modern graph database

Very easy to use

Complex features that are used less often have been dropped

Strong Java & REST Support

How does Redbasin use Neo4J?

We have 225 oncology dimensions

Everything is either a node, a relationship, or a property

We use indexes liberally

Numerous dimensions and sub-dimensions in Redbasin’s big data

Protein Gene Disease Drug Antibody Ligand

Epigenetics Ontology

Aminoacid

Structure PD/PK Physicochemical

Research Experiment

Interaction

Researcher Institute

Pathway

Organism Instrument Method

Enzyme

Time Location FDA Pharma ClinicalTrial

Dimensions have sub-dimensions

Principal Dimension: Pharmacodynamics (What drug does to body?)

Sub-dimensions: Absorption, Distribution, Metabolism, Elimination, Toxicity

Data is Logical. But Big Data is not.

Real-time lookups

Understands human idiosyncrasies

Logical

Impressive computational abilities

Data is more than just data

Asymptotic convergence to human

No enterprise! Just plain cloud...


Perhaps a Nebula(e), but why?

• Contextual correlation
• Ontology driven
• Multi-dimensional
• Hierarchical
• Fractal like
• Clustering
• Dynamic/Evolving
• Stars (facts) are born
• Zoom for details
• Humongous
• Transparency
• Dynamic metadata*
• Interconnected
• Graph like
• Complexity

How does Redbasin use Spring Data?

Redbasin Cloud connects to hundreds of cancer data sources

Redbasin uses contextual mining to create dynamic models

We map nodes, relationships and attributes to the Redbasin Object Model

We separate analytics from queries

Neo4J Node Index Example

    IndexHits<Node> pNodeHits = drugIdIndex.get(DRUG_ID, drugConceptCode);
    if (pNodeHits != null && pNodeHits.size() > 0) { // if node already exists
        drugNode = pNodeHits.getSingle();
        if (drugNode != null) {
            if (!drugNode.hasProperty(DRUG_CONCEPT_CODE)) {
                drugNode.setProperty(DRUG_CONCEPT_CODE, drugConceptCode);
            }
            if (!drugNode.hasProperty(BioEntityTypes.NODE_TYPE)) {
                drugNode.setProperty(BioEntityTypes.NODE_TYPE, BioEntityTypes.RB_DRUG);
            }
        }
    }
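
The lookup-then-fill pattern in this example can be tried without a running Neo4j instance. What follows is a minimal sketch under stated assumptions: a plain HashMap stands in for drugIdIndex, a property map stands in for a node, and the class name DrugNodeIndexSketch, the method getOrCreate and the property key strings are hypothetical. The slide shows only the branch where the node already exists; the creation branch here is an assumption added for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a node index. This is not the Neo4j API.
final class DrugNodeIndexSketch {
    static final String DRUG_CONCEPT_CODE = "drugConceptCode"; // assumed property key
    static final String NODE_TYPE = "nodeType";                // assumed property key
    static final String RB_DRUG = "RB_DRUG";                   // assumed node type value

    // drug concept code -> "node", modeled as a map of properties
    private final Map<String, Map<String, Object>> index = new HashMap<>();

    // Returns the indexed node for the code, creating it if absent (assumption).
    // Missing properties are filled in; existing values are never overwritten.
    Map<String, Object> getOrCreate(String drugConceptCode) {
        Map<String, Object> node =
                index.computeIfAbsent(drugConceptCode, k -> new HashMap<>());
        node.putIfAbsent(DRUG_CONCEPT_CODE, drugConceptCode);
        node.putIfAbsent(NODE_TYPE, RB_DRUG);
        return node;
    }

    int size() {
        return index.size();
    }

    public static void main(String[] args) {
        DrugNodeIndexSketch idx = new DrugNodeIndexSketch();
        Map<String, Object> first = idx.getOrCreate("C1234");  // made-up code
        Map<String, Object> second = idx.getOrCreate("C1234"); // same node again
        System.out.println(first == second);
        System.out.println(idx.size());
    }
}
```

Calling getOrCreate twice with the same code yields the same node object, which is the property that keeps repeated imports from creating duplicate drug nodes.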

Spring Stack: Spring Data with Mongo JSON

    "@molecule_type" : "complex",
    "@id" : "208314",
    "Name" : {
        "@name_type" : "PF",
        "@long_name_type" : "preferred symbol",
        "@value" : "TXA2/TP beta/beta Arrestin3/RAB11/GDP"
    },
    "ComplexComponentList" : [
        { "@molecule_idref" : "202489" },
        { "@molecule_idref" : "202493",
          "PTMExpression" : [
              { "@protein" : "O75228", "@position" : "239", "@aa" : "C", "@modification" : "palmitoylation" }

Spring Data with Mongo Objects

Redbasin data grows and changes over time

Collections are ideal for Redbasin’s unstructured data

Retrieve nested objects with ease

"participantList.experimentalRoleList.experimentalRole.xref.secondaryRef.@db" : "pubmed"

DBObject utilities well suited for mapping to BioEntities
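
The dotted path on this slide reaches a value several levels down in a document. As a rough illustration of what such a path does, the sketch below walks nested java.util.Map objects key by key. It is not the MongoDB driver or Spring Data API; the class NestedPathSketch and the method lookup are hypothetical, and lists along the path are deliberately not handled.

```java
import java.util.Map;

// Hypothetical helper: resolves a dotted path against a document modeled as nested Maps.
final class NestedPathSketch {

    // Returns the value at the path, or null if any step is missing or is not a Map.
    static Object lookup(Map<String, Object> doc, String dottedPath) {
        Object current = doc;
        for (String key : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // the path goes deeper than the document does
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        // A shortened, made-up document using the last three keys of the slide's path.
        Map<String, Object> doc = Map.of(
                "xref", Map.of("secondaryRef", Map.of("@db", "pubmed")));
        System.out.println(lookup(doc, "xref.secondaryRef.@db"));
    }
}
```

A missing intermediate key simply yields null rather than an exception, which suits documents whose shape is not known in advance.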

Spring Data: Redis

Key

Value

Usage: ontologies and taxonomies stored as unique key-value pairs. Also used for auto-completion, as our data is “N” column based.
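
The slide does not say which Redis structure backs auto-completion. One common approach is a prefix scan over lexicographically sorted terms; the sketch below shows that idea with an in-memory TreeSet standing in for Redis. The class AutoCompleteSketch and its methods are hypothetical, and the sample terms are chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a sorted term store used for auto-completion.
final class AutoCompleteSketch {
    private final TreeSet<String> terms = new TreeSet<>();

    void add(String term) {
        terms.add(term.toLowerCase(Locale.ROOT));
    }

    // Returns up to 'limit' terms that start with the prefix, in sorted order.
    List<String> complete(String prefix, int limit) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        // All terms sharing the prefix are contiguous, starting at the first term >= prefix.
        for (String term : terms.tailSet(p, true)) {
            if (!term.startsWith(p) || matches.size() >= limit) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        AutoCompleteSketch ac = new AutoCompleteSketch();
        ac.add("Apoptosis");
        ac.add("Apoptosome");
        ac.add("Autophagy");
        System.out.println(ac.complete("apo", 10));
    }
}
```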

Redis - Ontology Lookups

Ontology Lookups Can Be Very Handy

Redis - Analytics Cache

MineBot and Multi-entity Analytics is Nifty

Redis - Managing Aliases

Gene Aliases for Instance are Numerous
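
Alias handling of this kind usually comes down to mapping every known name to one canonical symbol. The sketch below illustrates that with a HashMap in place of Redis; the class GeneAliasSketch, its methods and the case-insensitive normalization are assumptions, not a description of Redbasin's implementation. The sample uses CEACAM5, which is also commonly known as CEA.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical alias table: every alias (and the symbol itself) points to one canonical symbol.
final class GeneAliasSketch {
    private final Map<String, String> aliasToSymbol = new HashMap<>();

    void register(String symbol, String... aliases) {
        aliasToSymbol.put(normalize(symbol), symbol);
        for (String alias : aliases) {
            aliasToSymbol.put(normalize(alias), symbol);
        }
    }

    // Resolves a name to its canonical symbol, or empty if the name is unknown.
    Optional<String> resolve(String name) {
        return Optional.ofNullable(aliasToSymbol.get(normalize(name)));
    }

    // Assumption: lookups ignore case and surrounding whitespace.
    private static String normalize(String s) {
        return s.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        GeneAliasSketch aliases = new GeneAliasSketch();
        aliases.register("CEACAM5", "CEA", "CD66e");
        System.out.println(aliases.resolve("cea").orElse("unknown"));
    }
}
```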

Redis - Key Value Pairs

Large Number of Key Value Pairs

Key Value

ATP Adenosine Tri-phosphate

Redis - Slaves

Redis Slaves Simply Work

Redis - Monitor

https://github.com/nkrode/RedisLive

Redis - Subgraph Caching

• Subgraph Similarity Analytics
• Pathway Rules Cache

Redis - Spring data

• Using the Jedis connection package
• Spring’s data access exceptions for the Redis driver
• Built abstraction: RedisTemplate
• Not using pub/sub support
• Using our own JSON/XML mapping serializers
• Atomic counters for Redis are useful
• Sorting (using) and pipelining (not using)
• Not using the Spring 3.1 cache abstraction

Spring Data: Redis Usage

Key | Value
NCBI_TAXONOMY_ID, Key: 9606 | Homo sapiens
DISEASE_CODE, Key: x46859 | Metastases from colorectal carcinoma
HGNC_ID (Human Gene Identifier), Key: 1817 | CEACAM5

Redbasin vs Other BioModels

Redbasin | Other BioModels
Focused on Oncology | No focus on any specific Disease
Commercial/public domain correlations | Focused on academic knowledge
Information density is “infinite” | Information size is “infinite”
Temporality/pathway dependent | No time element
Hybrid vendor strategy | No co-existence scenario
One cloud for all Oncology | Typically downloadable software

Neo4J Node Validation

Beclin 1 Gene

Bcl-2 Protein

Apoptosis

binds-to

inhibits

Biologically aware nodes and relationships
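
One way to read "biologically aware" is that a relationship is accepted only when its type makes sense between the given kinds of nodes. The sketch below illustrates that reading with a small rule table. Everything in it is an assumption: the class BioRelationValidatorSketch, the enums, and the two sample rules are invented for illustration and do not reproduce the edges of the slide's diagram or Redbasin's actual rules.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Hypothetical validator: a relationship is valid only if (start type, relation, end type) is allowed.
final class BioRelationValidatorSketch {
    enum BioType { GENE, PROTEIN, PROCESS }   // assumed node categories
    enum RelType { BINDS_TO, INHIBITS }       // assumed relationship types

    // relation type -> start node type -> allowed end node types
    private final Map<RelType, Map<BioType, EnumSet<BioType>>> allowed =
            new EnumMap<>(RelType.class);

    void allow(BioType start, RelType rel, BioType end) {
        allowed.computeIfAbsent(rel, r -> new EnumMap<>(BioType.class))
               .computeIfAbsent(start, s -> EnumSet.noneOf(BioType.class))
               .add(end);
    }

    boolean isValid(BioType start, RelType rel, BioType end) {
        Map<BioType, EnumSet<BioType>> byStart = allowed.get(rel);
        if (byStart == null) {
            return false;
        }
        EnumSet<BioType> ends = byStart.get(start);
        return ends != null && ends.contains(end);
    }

    public static void main(String[] args) {
        BioRelationValidatorSketch v = new BioRelationValidatorSketch();
        v.allow(BioType.PROTEIN, RelType.BINDS_TO, BioType.PROTEIN); // illustrative rule
        v.allow(BioType.PROTEIN, RelType.INHIBITS, BioType.PROCESS); // illustrative rule
        System.out.println(v.isValid(BioType.PROTEIN, RelType.INHIBITS, BioType.PROCESS));
        System.out.println(v.isValid(BioType.PROCESS, RelType.BINDS_TO, BioType.GENE));
    }
}
```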

Spring Data Relationship Entity

@RelationshipEntity
public class BioRelation { }

Annotation for @RelationshipEntity

Metadata for recognition of a relationship class

Convenient relationship abstraction

Relationships always have start/end nodes

@RelationshipEntity
public class BioRelation {
    @EndNode
    private Object endNode;

    @StartNode
    private Object startNode;
}

• Exactly one field must be marked as @EndNode
• Exactly one field must be marked as @StartNode
• The field can have any variable name
• Flexibility for the programmer
• Must be a @BioEntity class

Spring Data Relationship Entity

@RelationshipEntity
public class BioRelation {
    .....
    @GraphId
    private Long id;
    .....
}

• Id of the relationship
• This is an unreliable field
• But we have it here for reference

Spring Data Relationship Entity

@RelationshipEntity
public class BioRelation {
    .....
    @RelProperty
    private String name;
    ....
}

• @RelProperty marks a field as a relationship property
• There can be non-property fields
• The property here is “name”
• It’s always a String

Spring Data Relationship Entity

@RelationshipEntity
public class BioRelation {
    ....
    @RelType
    private String relType;

    @RelProperty
    private String message;
}

• @RelType is the actual relation
• message is a default @RelProperty

Spring Data Relationship Entity

@RelationshipEntity
public class BioRelation {
    @EndNode
    private Object endNode;

    @StartNode
    private Object startNode;

    @GraphId
    private Long id;

    @RelProperty
    private String name;

    @RelType
    private String relType;

    @RelProperty
    private String message;
}
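
Runtime-retained field annotations like these are typically consumed through reflection. The following is a minimal, self-contained sketch of how a mapper might locate the single field marked as the start or end node. The annotations are re-declared locally so the example compiles on its own; the class RelationReflectionSketch and the method annotatedValue are hypothetical and are not taken from Redbasin's code.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

final class RelationReflectionSketch {
    // Local stand-ins for the slide's annotations (assumed shape).
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface StartNode {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface EndNode {}

    static class BioRelation {
        @StartNode private final Object startNode;
        @EndNode private final Object endNode;
        private final String name; // not a node reference

        BioRelation(Object startNode, Object endNode, String name) {
            this.startNode = startNode;
            this.endNode = endNode;
            this.name = name;
        }
    }

    // Returns the value of the one field carrying the annotation.
    // Throws if no field, or more than one field, carries it.
    static Object annotatedValue(Object target, Class<? extends Annotation> marker) {
        Field found = null;
        for (Field field : target.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(marker)) {
                if (found != null) {
                    throw new IllegalStateException("more than one @" + marker.getSimpleName());
                }
                found = field;
            }
        }
        if (found == null) {
            throw new IllegalStateException("no field marked @" + marker.getSimpleName());
        }
        found.setAccessible(true);
        try {
            return found.get(target);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        BioRelation rel = new BioRelation("startValue", "endValue", "relationName");
        System.out.println(annotatedValue(rel, StartNode.class));
        System.out.println(annotatedValue(rel, EndNode.class));
    }
}
```

Because the lookup goes by annotation rather than by field name, the fields can be called anything, which matches the flexibility the slides describe.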

Spring Data-isms

@Retention(RetentionPolicy.RUNTIME)
public @interface BioEntity {
    public BioTypes bioType();
}

@Retention(RetentionPolicy.RUNTIME)
public @interface RelationshipEntity { }

Spring Data-isms Neo4j

@Retention(RetentionPolicy.RUNTIME)
public @interface RelatedTo {

    public Direction direction() default Direction.BOTH;

    BioRelTypes relType() default BioRelTypes.DEFAULT_RELATION;

    public Class<?> elementClass() default Object.class;

    public BioTypes endNodeBioType() default BioTypes.UNKNOWN;

    public BioTypes startNodeBioType() default BioTypes.UNKNOWN;
}

End Node annotation

package com.redbasin.bio.meta;

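import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Meta-annotation: may be placed on other annotation types and on fields.
@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.ANNOTATION_TYPE, ElementType.FIELD })
public @interface Reference {}

// Marks the end node of a relationship; itself tagged with @Reference.
@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.FIELD, ElementType.METHOD })
@Reference
public @interface EndNode {}

• There is no concept of start and end nodes in Neo4J
• This is a Redbasin abstraction
• The @Reference annotation can be used on annotation types and fields only
• The @EndNode annotation can be used on methods and fields only
• It cannot be used on classes or other elements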
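
Because @EndNode is itself annotated with @Reference, code that cares about "any reference field" can check for the meta-annotation instead of listing every concrete annotation. The sketch below shows that check with standard reflection. The annotations are re-declared as nested types so the example is self-contained; the class MetaAnnotationSketch, the Holder class and the method countReferenceFields are hypothetical.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

final class MetaAnnotationSketch {
    // Local copies of the slide's annotations, nested here for a single-file example.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ ElementType.ANNOTATION_TYPE, ElementType.FIELD })
    @interface Reference {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target({ ElementType.FIELD, ElementType.METHOD })
    @Reference
    @interface EndNode {}

    // Hypothetical class with one reference field and one ordinary field.
    static class Holder {
        @EndNode Object end;
        Object other;
    }

    // True if the field carries @Reference directly or via an annotation tagged with @Reference.
    static boolean isReferenceField(Field field) {
        if (field.isAnnotationPresent(Reference.class)) {
            return true;
        }
        for (Annotation annotation : field.getAnnotations()) {
            if (annotation.annotationType().isAnnotationPresent(Reference.class)) {
                return true;
            }
        }
        return false;
    }

    static int countReferenceFields(Class<?> type) {
        int count = 0;
        for (Field field : type.getDeclaredFields()) {
            if (isReferenceField(field)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countReferenceFields(Holder.class));
    }
}
```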

Redbasin Open Doc Share

https://github.com/redbasin/redbasin-org

• It’s our “social taxonomy” for scientific documents
• GitHub community project
• Scientists can collaborate over zillions of documents and media
• Downloadable code, can run in cloud mode
• Can be modified to support any data access
• Redbasin.org uses it for collaboration in schools
• A Spring champion cause, underprivileged schools

What can developers do?

• Help us with development of our public domain API
• We support jQuery, D3.js, JSON/XML, REST and more
• We support Android and iOS on mobiles/tablets
• Spring Data integration - developer plugins

Redbasin Cloud Projects

OpenStack Project

Cloud Foundry Integration

AWS Project

Why have Java developers chosen Spring?

DI

AOP

TX

Core Model

J(2)EE usability

Testable, lightweight model for programming

Application Portability

Powerful Service Abstractions

Deployment Flexibility

Spring

Deploy to Cloud or on premise

Big, Fast, Flexible Data

Web, Integration, Batch

Core Model

GemFire

Spring Stack

DI AOP TX JMS JDBC

MVC Testing

ORM OXM Scheduling

JMX REST Caching Profiles Expression

Spring Framework

HATEOAS

JPA 2.0 JSF 2.0 JSR-250 JSR-330 JSR-303 JTA JDBC 4.1

Java EE 1.4+/SE5+

JMX 1.0+

WebSphere 6.1+

WebLogic 9+

GlassFish 2.1+

Tomcat 5+

OpenShift

Google App Eng.

Heroku

AWS Beanstalk

Cloud Foundry

Spring Web Flow Spring Security

Spring Batch Spring Integration

Spring Security OAuth

Spring Social

Twitter LinkedIn Facebook

Spring Web Services

Spring AMQP

Spring Data

Redis HBase

MongoDB JDBC

JPA QueryDSL

Neo4j

GemFire

Solr Splunk

HDFS MapReduce Hive

Pig Cascading

Spring for Apache Hadoop

SI/Batch

Spring XD

Learn More. Stay Connected.

Contact Redbasin: bit.ly/redbasin

Talk to us on Twitter: @springcentral

Find session replays on YouTube: spring.io/video