researching cancer in the cloud - using spring, neo4j, mongo and redis in the cloud

Redbasin Networks: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis

By Smita Kulkarni Gudur and Manoj Joshi

Friday, September 6, 13

Introduction

Smitha Kulkarni Gudur, CEO

Manoj Joshi, CTO

Allan Grimes, VP Business

Neeta Potdar, HR & Admin

Redbasin Networks Overview

Redbasin Networks provides a cloud based platform for cancer drug researchers in Pharma and Bio-tech.

Redbasin is a scalable technology and platform that allows Life Science researchers to gain insights about viable drug molecules and pathways.

Cancer Ecosystem Today (It’s complex!)

CDC

Universities

NIH/NLM

Hospitals, Treatment Centers

Biotech

Instrument vendors

Certification, Approval

Lab tests

Patients

Insurance

Pharma

Contract Research Organization

Drug Labs

Cancer Market Research

US cancer spending $108b

89m deaths, 2005-2015

Redbasin Networks

10% of top 200 drugs are cancer related, generate $1b/yr

1.5m new cancer deaths

Spring Data: Redbasin Cancer Research

Spring Data

Protein Gene Disease Drug Antibody Ligand Complex Epigenetics

MongoDB Neo4j Redis HBase Lucene

REST API

XML JSON

Typical Drug Life Cycle Costs

Why Not Go Relational?

Oncological meta-data is multi-dimensional

Pervasive joins are a drag on performance

Unpredictable schemas during mining

Temporality is difficult to represent

Redbasin Core Data Technologies

• Mongo
• Neo4J
• Redis
• Lucene
• HBase/Hadoop

Why Mongo?

Lots of XML and JSON documents

Very easy to use

High performance and scalability

Strong Java & REST Support

Why Neo4j?

Neo4j is a modern graph database

Very easy to use

Complex features that are used less often have been dropped

Strong Java & REST Support

How does Redbasin use Neo4J?

We have 225 oncology dimensions

Everything is either a node, a relationship, or a property

We use indexes liberally

Numerous dimensions and sub-dimensions in Redbasin’s big data

Protein Gene Disease Drug Antibody Ligand

Epigenetics Ontology

Amino acid

Structure PD/PK Physicochemical

Research Experiment

Interaction

Researcher Institute

Pathway

Organism Instrument Method

Enzyme

Time Location FDA Pharma ClinicalTrial

Dimensions have sub-dimensions

Pharmacodynamics

Absorption Distribution Metabolism Elimination Toxicity

Principal Dimension

Sub-dimensions

(What drug does to body?)

Data is Logical. But Big Data is not.

Real-time lookups

Understands human idiosyncrasies

Logical

Impressive computational abilities

Data is more than just data

Asymptotic convergence to

No enterprise! Just plain cloud...

Perhaps a Nebula(e), but why?

• Contextual correlation
• Ontology driven
• Multi-dimensional
• Hierarchical
• Fractal-like
• Clustering
• Dynamic/Evolving
• Stars (facts) are born
• Zoom for details
• Humongous
• Transparency
• Dynamic metadata*
• Interconnected
• Graph-like
• Complexity

How does Redbasin use Spring Data?

• Redbasin Cloud connects to hundreds of cancer data sources
• Redbasin uses contextual mining to create dynamic models
• We map nodes, relationships, and attributes to the Redbasin Object Model
• We separate analytics from queries

Neo4J Node Index Example

    IndexHits<Node> pNodeHits = drugIdIndex.get(DRUG_ID, drugConceptCode);
    if (pNodeHits != null && pNodeHits.size() > 0) { // if node already exists
        drugNode = pNodeHits.getSingle();
        if (drugNode != null) {
            if (!drugNode.hasProperty(DRUG_CONCEPT_CODE)) {
                drugNode.setProperty(DRUG_CONCEPT_CODE, drugConceptCode);
            }
            if (!drugNode.hasProperty(BioEntityTypes.NODE_TYPE)) {
                drugNode.setProperty(BioEntityTypes.NODE_TYPE, BioEntityTypes.RB_DRUG);
            }
        }
    }
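The lookup-then-fill pattern in the index example can be sketched without a Neo4j dependency. The following is a minimal in-memory stand-in, assuming a plain HashMap plays the role of drugIdIndex. DrugNodeStore, getOrCreate, and the property key strings are hypothetical names for illustration and are not Neo4j API.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for an indexed node store (an assumption, not the Neo4j API).
class DrugNodeStore {
    static final String DRUG_CONCEPT_CODE = "drugConceptCode"; // hypothetical property key
    static final String NODE_TYPE = "nodeType";                // hypothetical property key
    static final String RB_DRUG = "RB_DRUG";                   // hypothetical type label

    // Index from drug concept code to the node's property map.
    private final Map<String, Map<String, Object>> drugIdIndex = new HashMap<>();

    // Returns the node indexed under the code, creating it if absent. In both
    // cases missing properties are filled in without overwriting existing ones.
    Map<String, Object> getOrCreate(String drugConceptCode) {
        Map<String, Object> node = drugIdIndex.get(drugConceptCode);
        if (node == null) {
            node = new HashMap<>();
            drugIdIndex.put(drugConceptCode, node);
        }
        node.putIfAbsent(DRUG_CONCEPT_CODE, drugConceptCode);
        node.putIfAbsent(NODE_TYPE, RB_DRUG);
        return node;
    }

    // Number of distinct indexed nodes.
    int size() {
        return drugIdIndex.size();
    }
}
```

Calling getOrCreate twice with the same code returns the same node object, so repeated mining passes over the same source would not create duplicates in this sketch.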

Spring Stack: Spring Data with Mongo JSON

    "@molecule_type" : "complex",
    "@id" : "208314",
    "Name" : {
        "@name_type" : "PF",
        "@long_name_type" : "preferred symbol",
        "@value" : "TXA2/TP beta/beta Arrestin3/RAB11/GDP"
    },
    "ComplexComponentList" : [
        { "@molecule_idref" : "202489" },
        { "@molecule_idref" : "202493",
          "PTMExpression" : [
              { "@protein" : "O75228", "@position" : "239", "@aa" : "C", "@modification" : "palmitoylation" }

Redbasin data grows and changes over time

Spring Data with Mongo Objects

Collections are ideal for Redbasin’s unstructured data

Retrieve nested objects with ease

"participantList.experimentalRoleList.experimentalRole.xref.secondaryRef.@db" : "pubmed"
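To illustrate what a dotted path such as the one above addresses, here is a minimal sketch that resolves a path against nested java.util.Map objects. It is not the MongoDB driver or Spring Data API. DottedPath, resolve, and nest are hypothetical helpers, and the sketch assumes every intermediate value is a map (arrays are not handled).

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of dotted-path lookup in a nested document.
class DottedPath {
    // Walks nested maps one path segment at a time; returns null if any
    // segment is missing or the value at that point is not a map.
    static Object resolve(Map<String, Object> doc, String path) {
        Object current = doc;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    // Builds a single-entry document, used to nest values for the demo.
    static Map<String, Object> nest(String key, Object value) {
        Map<String, Object> m = new HashMap<>();
        m.put(key, value);
        return m;
    }

    public static void main(String[] args) {
        Map<String, Object> doc =
            nest("xref", nest("secondaryRef", nest("@db", "pubmed")));
        System.out.println(resolve(doc, "xref.secondaryRef.@db"));
    }
}
```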

DBObject utilities well suited for mapping to BioEntities

Spring Data: Redis

Usage: ontology and taxonomy lookups stored as unique key-value pairs, and auto-completion, since our data is “N” column based

Redis - Ontology Lookups

Ontology Lookups Can Be Very Handy

Redis - Analytics Cache

MineBot and Multi-entity Analytics is Nifty

Redis - Managing Aliases

Gene Aliases for Instance are Numerous
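Alias management can be pictured as a map from every known name to one preferred symbol. The sketch below uses a HashMap as a stand-in for the Redis store. GeneAliases, register, and canonical are hypothetical names, and case-insensitive matching is an assumption rather than something the slides state.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// HashMap stand-in for a Redis alias table (alias -> preferred symbol).
class GeneAliases {
    private final Map<String, String> aliasToSymbol = new HashMap<>();

    // Registers the preferred symbol and each of its aliases.
    void register(String symbol, String... aliases) {
        aliasToSymbol.put(normalize(symbol), symbol);
        for (String alias : aliases) {
            aliasToSymbol.put(normalize(alias), symbol);
        }
    }

    // Returns the preferred symbol, or null when the name is unknown.
    String canonical(String name) {
        return aliasToSymbol.get(normalize(name));
    }

    // Trims and upper-cases so lookups ignore case and stray whitespace.
    private static String normalize(String s) {
        return s.trim().toUpperCase(Locale.ROOT);
    }
}
```

For example, after register("CEACAM5", "CEA", "CD66e"), a lookup of "cea" returns "CEACAM5". The alias list here is illustrative only.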

Redis - Key Value Pairs

Large Number of Key Value Pairs

Key | Value
ATP | Adenosine triphosphate

Redis - Slaves

Redis Slaves Simply Work

Redis - Monitor

https://github.com/nkrode/RedisLive

Redis - Subgraph Caching

• Subgraph Similarity Analytics
• Pathway Rules Cache

Redis - Spring data

• Using connection package Jedis
• Spring’s data access exception for Redis driver
• Built abstraction - Redis template
• Not using pubsub support
• Using our own JSON/XML mapping serializers
• Atomic counter for Redis - useful
• Sorting (using) and pipelining (not using)
• Not using Spring 3.1 cache abstraction

Spring Data: Redis Usage

Key | Value
NCBI_TAXONOMY_ID, Key: 9606 | Homo sapiens
DISEASE_CODE, Key: x46859 | Metastases from colorectal carcinoma
HGNC_ID (Human Gene Identifier), Key: 1817 | CEACAM5
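The key-value usage above, together with the auto-completion use mentioned earlier, can be sketched with a sorted map standing in for Redis. OntologyLookup and the "NAMESPACE:id" key convention are assumptions made for illustration. The slides do not say how Redbasin composes its keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sorted-map stand-in for a key-value store with namespaced keys.
class OntologyLookup {
    private final TreeMap<String, String> store = new TreeMap<>();

    // Keys are stored as "NAMESPACE:id" (a hypothetical convention).
    void put(String namespace, String id, String value) {
        store.put(namespace + ":" + id, value);
    }

    // Exact lookup by namespace and id; null when absent.
    String get(String namespace, String id) {
        return store.get(namespace + ":" + id);
    }

    // Returns every key that starts with the prefix, in sorted order.
    List<String> complete(String prefix) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> e : store.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // keys are sorted, so no later key can match
            }
            matches.add(e.getKey());
        }
        return matches;
    }
}
```

Prefix completion works here because keys sharing a prefix are contiguous in sorted order. A real deployment would need its own strategy for completing over values rather than keys.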

Redbasin vs Other BioModels

Redbasin | Other BioModels
Focused on Oncology | No focus on any specific disease
Commercial/public domain correlations | Focused on academic knowledge
Information density is “infinite” | Information size is “infinite”
Temporality/pathway dependent | No time element
Hybrid vendor strategy | No co-existence scenario
One cloud for all Oncology | Typically downloadable software

Neo4J Node Validation

Beclin 1 Gene

Bcl-2 Protein

Apoptosis

binds-to

inhibits

Biologically aware nodes and relationships

Spring Data Relationship Entity

    @RelationshipEntity
    public class BioRelation { }

Annotation for @RelationshipEntity

Metadata for recognition of a relationship class

Convenient relationship abstraction

Relationships always have start/end nodes

    @RelationshipEntity
    public class BioRelation {
        @EndNode private Object endNode;
        @StartNode private Object startNode;
    }

• A unique field must be marked as @EndNode
• A unique field must be marked as @StartNode
• Field can be any variable name
• Flexibility for the programmer
• Must be @BioEntity class

    @RelationshipEntity
    public class BioRelation {
        .....
        @GraphId private Long id;
        .....
    }

• Id of the relationship
• This is an unreliable field
• But we have it here for reference

    @RelationshipEntity
    public class BioRelation {
        .....
        @RelProperty private String name;
        ....
    }

• @RelProperty marks a field as a relationship property
• There could be non-property fields
• The property here is “name”
• It’s always a String

    @RelationshipEntity
    public class BioRelation {
        ....
        @RelType private String relType;
        @RelProperty private String message;
    }

• @RelType is the actual relation
• message is a default @RelProperty

    @RelationshipEntity
    public class BioRelation {
        @EndNode private Object endNode;
        @StartNode private Object startNode;
        @GraphId private Long id;

        @RelProperty private String name;
        @RelType private String relType;
        @RelProperty private String message;
    }
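The rule that exactly one field carries @StartNode and exactly one carries @EndNode can be checked reflectively. The sketch below is one possible way to do that and is not Redbasin's code. The annotations are local stand-ins declared in the block, and RelationValidator and BrokenRelation are hypothetical names.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;

// Local stand-ins for the slide's annotations (assumed shapes).
@Retention(RetentionPolicy.RUNTIME) @interface StartNode {}
@Retention(RetentionPolicy.RUNTIME) @interface EndNode {}

// Mirrors the shape of the relationship class shown on the slides.
class BioRelation {
    @StartNode private Object startNode;
    @EndNode private Object endNode;
    private String name;
}

// Deliberately invalid: two fields claim to be the end node.
class BrokenRelation {
    @StartNode private Object startNode;
    @EndNode private Object endNode;
    @EndNode private Object otherEnd;
}

class RelationValidator {
    // Counts declared fields that carry the given annotation.
    static int count(Class<?> type, Class<? extends Annotation> annotation) {
        int n = 0;
        for (Field f : type.getDeclaredFields()) {
            if (f.isAnnotationPresent(annotation)) {
                n++;
            }
        }
        return n;
    }

    // Valid when there is exactly one start field and one end field.
    static boolean isValid(Class<?> type) {
        return count(type, StartNode.class) == 1
            && count(type, EndNode.class) == 1;
    }
}
```

A check like this only works because the annotations use RUNTIME retention. With CLASS or SOURCE retention the counts would always be zero.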

Spring Data-isms

    @Retention(RetentionPolicy.RUNTIME)
    public @interface BioEntity {
        public BioTypes bioType();
    }

    @Retention(RetentionPolicy.RUNTIME)
    public @interface RelationshipEntity { }

Spring Data-isms Neo4j

    @Retention(RetentionPolicy.RUNTIME)
    public @interface RelatedTo {
        public Direction direction() default Direction.BOTH;
        BioRelTypes relType() default BioRelTypes.DEFAULT_RELATION;
        public Class<?> elementClass() default Object.class;
        public BioTypes endNodeBioType() default BioTypes.UNKNOWN;
        public BioTypes startNodeBioType() default BioTypes.UNKNOWN;
    }

Bio Entity

    @Retention(RetentionPolicy.RUNTIME)
    public @interface BioEntity {
        public BioTypes bioType();
    }

• This is usually a node in Neo4J
• @Retention - How long to retain annotations?
  • CLASS - Annotations are to be recorded in the class file by the compiler but need not be retained by the VM at run time.
  • RUNTIME - Annotations are to be recorded in the class file by the compiler and retained by the VM at run time, so they may be read reflectively.
  • SOURCE - Annotations are to be discarded by the compiler.
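The practical difference between the three retention policies shows up when reading annotations reflectively. This is a small sketch with made-up annotation names (KeptAtRuntime, KeptInClassFile, SourceOnly) that are not part of the Redbasin code.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Three marker annotations, one per retention policy (hypothetical names).
@Retention(RetentionPolicy.RUNTIME) @interface KeptAtRuntime {}
@Retention(RetentionPolicy.CLASS)   @interface KeptInClassFile {}
@Retention(RetentionPolicy.SOURCE)  @interface SourceOnly {}

@KeptAtRuntime @KeptInClassFile @SourceOnly
class Annotated {}

class RetentionDemo {
    public static void main(String[] args) {
        // Only the RUNTIME annotation is visible through reflection.
        System.out.println(Annotated.class.isAnnotationPresent(KeptAtRuntime.class));   // true
        System.out.println(Annotated.class.isAnnotationPresent(KeptInClassFile.class)); // false
        System.out.println(Annotated.class.isAnnotationPresent(SourceOnly.class));      // false
    }
}
```

This is why an annotation like @BioEntity, which is meant to be discovered at run time, needs RUNTIME retention.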

End Node annotation

    package com.redbasin.bio.meta;

    @Retention(RetentionPolicy.RUNTIME)
    @Target({ ElementType.ANNOTATION_TYPE, ElementType.FIELD })
    public @interface Reference {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target({ ElementType.FIELD, ElementType.METHOD })
    @Reference
    public @interface EndNode {}

• There is no concept of start and end nodes in Neo4J
• This is a Redbasin abstraction
• The @Reference can be used by annotation types and fields only
• The annotation @EndNode can be used by methods and fields only
• It cannot be used by classes or other elements
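Because @Reference is itself applied to @EndNode, code can treat any annotation marked with @Reference as a reference marker without naming each one. The following sketch shows one way to scan for such fields. The two annotations repeat the declarations above without the package and public modifiers so the block stands alone. Sample and ReferenceScanner are hypothetical.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.ANNOTATION_TYPE, ElementType.FIELD })
@interface Reference {}

@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.FIELD, ElementType.METHOD })
@Reference
@interface EndNode {}

// Hypothetical class with one reference field and one plain field.
class Sample {
    @EndNode private Object target;
    private String name;
}

class ReferenceScanner {
    // Lists fields whose annotations are themselves marked with @Reference.
    static List<String> referenceFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            for (Annotation a : f.getAnnotations()) {
                if (a.annotationType().isAnnotationPresent(Reference.class)) {
                    names.add(f.getName());
                    break; // one match is enough for this field
                }
            }
        }
        return names;
    }
}
```

Placing @EndNode on a class instead of a field or method would be rejected by the compiler, which is the effect of the @Target restriction described in the bullets.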

Redbasin Open Doc Share

https://github.com/redbasin/redbasin-org

• It’s our “social taxonomy” for scientific documents
• GitHub community project
• Scientists can collaborate over zillions of documents and media
• Downloadable code, can run in cloud mode
• Can be modified to support any data access
• Redbasin.org uses it for collaboration in schools
• A Spring champion cause, underprivileged schools

What can developers do?

• Help us with development of our public domain API
• We support jQuery, D3.js, JSON/XML, REST and more
• We support Android, iOS on mobiles/tablets
• Spring Data integration - developer plugins

Redbasin Cloud Projects

OpenStack Project

Cloud Foundry Integration

AWS Project

Why have Java developers chosen Spring?

Spring

Core Model

J(2)EE usability

Testable, lightweight model for programming

Application Portability

Powerful Service Abstractions

Deployment Flexibility: Deploy to Cloud or on premise

Big, Fast, Flexible Data

Web, Integration, Batch

GemFire

Spring Stack

Spring Framework: DI, AOP, TX, JMS, JDBC, MVC, Testing, ORM, OXM, Scheduling, JMX, REST, Caching, Profiles, Expression

HATEOAS

JPA 2.0, JSF 2.0, JSR-250, JSR-330, JSR-303, JTA, JDBC 4.1, JMX 1.0+

Java EE 1.4+/SE5+

WebSphere 6.1+, WebLogic 9+, GlassFish 2.1+, Tomcat 5+

OpenShift, Google App Engine, Heroku, AWS Beanstalk, Cloud Foundry

Spring Web Flow, Spring Security, Spring Security OAuth, Spring Batch, Spring Integration, Spring Web Services, Spring AMQP

Spring Social: Twitter, LinkedIn, Facebook

Spring Data: Redis, HBase, MongoDB, JDBC, JPA, QueryDSL, GemFire, Solr, Splunk

Spring for Apache Hadoop: HDFS, MapReduce, Hive, Pig, Cascading, SI/Batch

Spring XD

Learn More. Stay Connected.

Contact Redbasin: bit.ly/redbasin

Talk to us on Twitter: @springcentral

Find session replays on YouTube: spring.io/video
