Case Study: Polyglot Persistence in Pharmaceutical Industry


Posted on 10 May 2015


Page 1

Copyrights: Reach1to1 Technologies Pvt. Ltd.

Big Data Innovation Conference

Case Study: Polyglot Persistence in Pharmaceutical Industry

Ashutosh Bijoor, Director, Reach1to1 Technologies Pvt. Ltd.

Page 2

Contents

● Customer Requirements

● Existing Architecture & Limitations

● Approach - Polyglot Persistence

● Challenges & Addressing Them

● Proposed Architecture

● Performance Results

● Similar Cases from Different Industries

Page 3

Customer Requirements

Diagram: Information Sources (Web Content, Data Files, Documents, Databases) feed an I.P. Research Repository, which serves User Applications (Intranet, Customer Portals, Analytical Dashboards, Admin Control).

Page 4

Customer Requirements

● Information Sources

– Integrate a wide range of IPR-related information sources

– Different document formats, sizes and update frequencies

– Both structured and unstructured information

– A single repository to handle a wide variety and large volume of data

● User Applications

– A unified API to access and manipulate all data sources

– High performance for search and analytics as well as batch operations

– Flexibility to add new data sources with minimal or no code change

– An extensible, high-performance data processing architecture

Page 5

Existing Architecture

Diagram: Information Sources (Web Content, Data Files, Documents, Databases) are brought in through Loading Scripts and Parsing Scripts and stored in an RDBMS and a Files Archive; User Applications (Dashboards, Intranet, Customer Portals, Admin Control) access the stored data through SQL and a File API.

Page 6

Existing Architecture Limitations

● Information Sources

– Structured data in an RDBMS with a fixed schema

– Unstructured data in a file archive with no analytics

– Database unable to handle large volumes of data

– Limits on the volume and variety of data sources

● User Applications

– Search and analytics slowing down to the point of being unusable

– Inability to add new search and analytics features

– Batch ingestion of new data very cumbersome

– Stagnating performance and capabilities

Page 7

Existing Architecture Performance

Performance Benchmarks

– Batch: 4 secs / 100 docs

– Search + Batch: 15 secs

– Search: 5 secs

Estimated time to add a new data source: 3 months

Page 8

Approach

A single repository to handle a wide variety and large volume of data

An extensible, high-performance data processing architecture

+

Which database do we choose?

Page 9

Which database do we choose?

Currently about 150 NoSQL Databases Listed!

Page 10

Factors affecting database choice

● Data Models

– What types of data sources do we want to integrate?

– How do we want to manipulate and analyze the data?

– What are the volume, variety and velocity of the data?

● Consistency, Availability, Partition Tolerance (CAP)

– Consistency: each client sees only one value of an object (Atomicity)

– Availability: all objects are always available (Low Latency)

– Partition Tolerance: the system keeps working when the network splits it into partitions (Clustering)

– CAP Theorem: choose any two; which two should we choose?
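To make the trade-off concrete, here is a toy sketch (not from the talk) of two replicas of one key during a network partition. In a CP-style mode the replica that cannot reach its peer rejects the write, staying consistent but unavailable; in an AP-style mode it accepts the write and the replicas diverge, staying available but inconsistent. The `Replica` class, the `write` function and the key name are hypothetical illustrations.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a key-value store (hypothetical toy model)."""
    name: str
    data: dict = field(default_factory=dict)


def write(local: Replica, peer: Replica, key: str, value: str,
          partitioned: bool, mode: str) -> bool:
    """Write through `local`, replicating to `peer` when the link is up.

    mode="CP": refuse the write during a partition (consistency over availability).
    mode="AP": accept the write locally during a partition (availability over consistency).
    Returns True if the write was accepted.
    """
    if not partitioned:
        # No partition: replicate synchronously, so both copies agree.
        local.data[key] = value
        peer.data[key] = value
        return True
    if mode == "CP":
        # Peer unreachable: reject rather than risk two different values.
        return False
    if mode == "AP":
        # Stay available: accept locally; the peer keeps its stale value.
        local.data[key] = value
        return True
    raise ValueError(f"unknown mode: {mode}")


a, b = Replica("a"), Replica("b")
write(a, b, "drug-42", "v1", partitioned=False, mode="CP")

# CP during a partition: write rejected, replicas still agree.
print(write(a, b, "drug-42", "v2", partitioned=True, mode="CP"), a.data == b.data)  # False True
# AP during a partition: write accepted, replicas now disagree.
print(write(a, b, "drug-42", "v2", partitioned=True, mode="AP"), a.data == b.data)  # True False
```

The sketch only models what happens once a partition occurs, which is where the choice between consistency and availability actually has to be made.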

Page 11

Databases - Models and CAPability

● Data Models

– Relational

– Key-Value

– Column-Oriented

– Document-Oriented

– Graph

● CAPability

– Consistency

– Availability

– Partition Tolerance

– Pick any two!

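Figure: CAP triangle (C, A, P) labelled "Pick Two!", with example systems placed on its CA, AP and CP sides. Systems shown, in the figure's groupings: RDBMSs, Aster Data, Greenplum, Vertica; Cassandra, SimpleDB, CouchDB, Riak, Dynamo, Voldemort; BigTable, Hypertable, HBase; MongoDB, Terrastore, Scalaris; MemcacheDB, Redis, Neo4j. Source: Visual Guide to NoSQL Systems by Nathan Hurst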

Over 10 different models!

Page 12

Polyglot Persistence

No single database fits all needs!

● Documents: MongoDB

– Document-oriented

– Flexible schema

– Replication & high availability

– Auto-sharding

– Rich, document-based queries

– Fast in-place updates

– GridFS

– Aggregation Framework

● Search: Apache Solr

– Advanced text search

– Flexible schema

– Support for highlighting, pivoted faceting, spell check, clustering

– Support for replication & sharding

● Relationships: Neo4j

– High-performance graph database

– Nodes and edges can have indexed metadata

– Graphs of several billion nodes on a single machine

– Powerful traversal framework

● Analytics: RDBMS

– Legacy data and apps

– Structured data

– Support for legacy applications

Solution: Polyglot Persistence – use more than one database!
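A minimal sketch of the routing idea behind polyglot persistence, assuming each kind of workload on this slide is served by exactly one engine. The `Workload` enum, the `ENGINE_FOR` table and the `route` / `engines_for_write` functions are hypothetical illustrations, not part of any real driver API; a real system would call each engine's own client library.

```python
from __future__ import annotations

from enum import Enum


class Workload(Enum):
    DOCUMENTS = "documents"
    SEARCH = "search"
    RELATIONSHIPS = "relationships"
    ANALYTICS = "analytics"


# Mapping taken from the slide: one engine per kind of workload.
ENGINE_FOR = {
    Workload.DOCUMENTS: "MongoDB",
    Workload.SEARCH: "Apache Solr",
    Workload.RELATIONSHIPS: "Neo4j",
    Workload.ANALYTICS: "RDBMS",
}


def route(workload: Workload) -> str:
    """Return the engine responsible for a workload (hypothetical router)."""
    return ENGINE_FOR[workload]


def engines_for_write(record_kinds: set[Workload]) -> list[str]:
    """One logical write may touch several engines, e.g. a document that must
    be stored, indexed for search and linked in the graph (illustrative)."""
    return sorted(route(kind) for kind in record_kinds)


print(route(Workload.SEARCH))  # Apache Solr
print(engines_for_write({Workload.DOCUMENTS, Workload.SEARCH, Workload.RELATIONSHIPS}))
# ['Apache Solr', 'MongoDB', 'Neo4j']
```

Because one logical write can fan out to several engines, those engines then have to be kept in sync with each other.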

Page 13

Challenges

● Synchronization

– How do we manage consistency across multiple engines?

– How do we maintain low latency for CRUD operations?

● Scalability

– How do we ensure high throughput for batch operations?

– How do we handle a large number of concurrent operations?

● Extensibility

– How do we allow new engines to be added with minimal change to the architecture?

Page 14

Challenges – Addressing them

● High-Performance Synchronization Engine

– Logical Locking – flexible synchronization models

– Event-driven – distributed control logic

– Kanban Queues – balanced resource utilization

● Horizontally Scalable

– Distributed processing – automatic

– Asynchronous I/O – high concurrency

● Component-based extensions

– Application-specific Controller modules

– Reusable Synchronization patterns

– Reusable plugins for various databases
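One possible reading of "logical locking" and "Kanban queues" is sketched below with Python's `asyncio`, under two assumptions: a logical lock is one `asyncio.Lock` per document ID held in the application (not a database lock), so updates to the same document are serialized while different documents proceed in parallel; and a Kanban queue is a bounded `asyncio.Queue` whose `maxsize` acts as a work-in-progress limit, so a fast producer waits instead of flooding slower consumers. The names `LogicalLocks`, `producer`, `worker`, `main` and `WIP_LIMIT` are hypothetical, and this is not necessarily how the platform described here implements them.

```python
from __future__ import annotations

import asyncio
from collections import defaultdict

WIP_LIMIT = 4  # hypothetical Kanban work-in-progress limit for the queue


class LogicalLocks:
    """One asyncio.Lock per document ID: a lock held by the application,
    not a lock inside any database engine."""

    def __init__(self) -> None:
        self._locks = defaultdict(asyncio.Lock)

    def for_doc(self, doc_id: str) -> asyncio.Lock:
        return self._locks[doc_id]


async def producer(queue: asyncio.Queue, doc_ids: list[str]) -> None:
    for doc_id in doc_ids:
        # put() waits while the queue already holds WIP_LIMIT items (back-pressure).
        await queue.put(doc_id)


async def worker(queue: asyncio.Queue, locks: LogicalLocks, log: list[str]) -> None:
    while True:
        doc_id = await queue.get()
        try:
            # Serialize work on the same document; other documents run concurrently.
            async with locks.for_doc(doc_id):
                await asyncio.sleep(0.01)  # stand-in for asynchronous database I/O
                log.append(doc_id)
        finally:
            queue.task_done()


async def main(doc_ids: list[str], n_workers: int = 3) -> list[str]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=WIP_LIMIT)
    locks, log = LogicalLocks(), []
    workers = [asyncio.create_task(worker(queue, locks, log)) for _ in range(n_workers)]
    await producer(queue, doc_ids)
    await queue.join()  # wait until every queued document has been processed
    for w in workers:
        w.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return log


processed = asyncio.run(main(["d1", "d2", "d1", "d3", "d2"]))
print(sorted(processed))  # ['d1', 'd1', 'd2', 'd2', 'd3']
```

The bounded queue is what keeps resource use balanced: the producer slows to the pace of the consumers instead of building an unbounded backlog.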

Page 15

Polyglot Persistence Platform

● Reusable, customizable platform

– Open source license

– Modular, extensible architecture

– Commercial plugins for various databases and indexing engines

● Proven performance

– Based on NodeJS

– High performance under high load

– Developed and supported by a strongly invested team

http://oodebe.org
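The idea of a modular architecture with plugins for various databases can be sketched as a small handler registry: each engine is wrapped in a class with a common interface, and a new engine is added by registering one more class rather than changing the controller. Everything here (`DBHandler`, `register`, `HANDLERS`, `build_handlers` and the two in-memory handlers) is a hypothetical illustration under that assumption, not the oodebe plugin API.

```python
from __future__ import annotations

from abc import ABC, abstractmethod

# Hypothetical plugin registry: engine name -> handler class.
HANDLERS: dict[str, type[DBHandler]] = {}


def register(name: str):
    """Class decorator that adds a handler plugin to the registry."""
    def wrap(cls: type[DBHandler]) -> type[DBHandler]:
        HANDLERS[name] = cls
        return cls
    return wrap


class DBHandler(ABC):
    """Common interface every database plugin implements (hypothetical)."""

    @abstractmethod
    def save(self, doc_id: str, doc: dict) -> None: ...

    @abstractmethod
    def load(self, doc_id: str) -> dict | None: ...


@register("memory")
class InMemoryHandler(DBHandler):
    """Stand-in for a real engine; a real plugin would wrap that engine's client."""

    def __init__(self) -> None:
        self._store: dict[str, dict] = {}

    def save(self, doc_id: str, doc: dict) -> None:
        self._store[doc_id] = dict(doc)

    def load(self, doc_id: str) -> dict | None:
        return self._store.get(doc_id)


@register("upper-memory")
class UpperCaseTitleHandler(InMemoryHandler):
    """A second toy plugin that transforms documents before storing them."""

    def save(self, doc_id: str, doc: dict) -> None:
        super().save(doc_id, {**doc, "title": doc.get("title", "").upper()})


def build_handlers(config: list[str]) -> list[DBHandler]:
    """Instantiate the engines named in configuration; controller code is unchanged."""
    return [HANDLERS[name]() for name in config]


handlers = build_handlers(["memory", "upper-memory"])
for h in handlers:
    h.save("p1", {"title": "Aspirin patent"})
print([h.load("p1")["title"] for h in handlers])  # ['Aspirin patent', 'ASPIRIN PATENT']
```

Adding an engine then means writing one new handler class and naming it in configuration, which is the kind of "minimal architecture change" the challenges slide asks for.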

Page 16

Proposed Architecture

Diagram: Information Sources (Web Content, Data Files, Documents, Databases) enter through Loading APIs into a Synchronization Engine, which keeps MongoDB, Apache Solr, an RDBMS and Neo4j in sync; User Applications (Intranet, Customer Portals, Dashboards, Admin Control) reach the data through Custom-built Web Services and DB-specific APIs.

Page 17

Sample Operation

Diagram: User Applications call a Batch API (REST API) on a Controller, which coordinates a Source Processor reading from the Data Source, Doc Processors, and DB Handlers writing to DB Engines 1, 2 and 3. The components are connected by a Kanban Queue, Asynchronous I/O, Messages / Events and Locks.
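A rough sketch of the sample operation, assuming a batch request moves from the controller through a source processor and document processors, then fans out to one DB handler per engine using asynchronous I/O. The function names (`source_processor`, `doc_processor`, `db_handler`, `controller`), the engine names and the simulated delays are hypothetical; real handlers would call each engine's client asynchronously.

```python
from __future__ import annotations

import asyncio


async def source_processor(raw_records: list[str]) -> list[dict]:
    """Read records from a data source (here: plain strings) into documents."""
    return [{"id": f"doc-{i}", "text": rec} for i, rec in enumerate(raw_records)]


async def doc_processor(doc: dict) -> dict:
    """Enrich a document before storage (toy step: count its words)."""
    return {**doc, "words": len(doc["text"].split())}


async def db_handler(engine: str, doc: dict, store: dict) -> None:
    """Write one document to one engine; the sleep stands in for network I/O."""
    await asyncio.sleep(0.01)
    store.setdefault(engine, {})[doc["id"]] = doc


async def controller(raw_records: list[str], engines: list[str]) -> dict:
    store: dict = {}
    docs = await source_processor(raw_records)
    docs = await asyncio.gather(*(doc_processor(d) for d in docs))
    # Fan out: every document goes to every engine, with all writes in flight at once.
    await asyncio.gather(*(db_handler(e, d, store) for d in docs for e in engines))
    return store


result = asyncio.run(controller(["aspirin patent", "ibuprofen study text"],
                                ["engine1", "engine2", "engine3"]))
print(sorted(result), result["engine2"]["doc-1"]["words"])
# ['engine1', 'engine2', 'engine3'] 3
```

Because the writes to the three engines overlap in time, the batch takes roughly as long as the slowest engine rather than the sum of all three.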

Page 18

Deployment Architecture

Controllers Cluster

Database Cluster

Data Processing Cluster

Page 19

Customer Requirements (recap of Page 4)

Page 20

New Architecture Performance

Performance Benchmarks (before → after)

– Batch: 4 secs / 100 docs → 1.5 secs / 100 docs

– Search + Batch: 15 secs → <1 sec

– Search: 5 secs → <1 sec

Time to add a new data source: 3 months → 1 day
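The before/after benchmark figures can be turned into throughput and speedup numbers with simple arithmetic. The sketch uses only the values on the slide; because the new search times are reported as "<1 sec", their speedups are lower bounds, computed here at exactly 1 second.

```python
# Values from the benchmark slide, in seconds.
old_batch, new_batch = 4.0, 1.5                        # per 100 documents
old_search_batch, new_search_batch_bound = 15.0, 1.0  # new value reported as "<1 sec"
old_search, new_search_bound = 5.0, 1.0               # new value reported as "<1 sec"

old_throughput = 100 / old_batch  # documents per second
new_throughput = 100 / new_batch

print(f"batch throughput: {old_throughput:.1f} -> {new_throughput:.1f} docs/s")  # 25.0 -> 66.7
print(f"batch speedup: {old_batch / new_batch:.2f}x")                            # 2.67x
print(f"search + batch speedup: more than {old_search_batch / new_search_batch_bound:.0f}x")  # 15x
print(f"search speedup: more than {old_search / new_search_bound:.0f}x")          # 5x
```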

Page 21

Similar Cases from Other Industries

Airlines: Customer Loyalty

Integration of flight schedules, ancillary services, bookings and payments into a single interface for customers

Insurance: Claims Analysis

Integration of claims, feedback forms, customer information and call center logs into a central repository for search and analytics

Telecom: CRM Analytics

Call center logs, IVR logs, email and social media feeds archived for analysis and preventive fault alerts

BFSI: Investment Advisor

Integration of social media feeds, analyst opinions, web content and trading data with search and sentiment analysis

Publishing: Content Repository

Aggregated and original content processed with text mining, automatic and assisted classification, and annotation

Media: Online TV

Broadcast schedules, ratings, social media feeds and user recordings for a TV Anywhere platform

Page 22

About Reach1to1

● Over 10 years of experience with NoSQL and Big Data

– Implemented solutions in various industries

● Wide skill sets spanning emerging technologies

– Big data, cloud and mobile applications

● Variety of engagement models

– Projects, consulting, extended delivery centers

● Strong investor backing

– Basil Partners, Singapore

● Low operating costs and high reach

– Sales team in the US, delivery teams in Mumbai and Bangalore

Page 23

Ashutosh Bijoor

http://bijoor.me

Big Data Innovation Conference (c)

Thank you!