big data everywhere chicago: leading a healthcare company to the big data promised land -- a case...

Mohammad Quraishi (IT Senior Principal - Cigna) [email protected] Leading a Healthcare Company to the Big Data Promised Land: A Case Study of Hadoop in Healthcare


Post on 13-Jul-2015


Page 1: Big Data Everywhere Chicago: Leading a Healthcare Company to the Big Data Promised Land -- A Case Study of Hadoop in Healthcare

Mohammad Quraishi (IT Senior Principal - Cigna) [email protected]

Leading a Healthcare Company to the Big Data Promised Land:

A Case Study of Hadoop in Healthcare

Page 2

About me

•BS in Computer Science and Engineering from University of Connecticut

•In the healthcare industry for over 19 years

•Programmer most of my career - Architect, Designer

•Worked in the SOA space for a number of years

•Lead engineer in the mobile application space

•Now lead engineer in the Big Data Analytics space - Hadoop

In my spare time

•Love to travel with the family

•Video games, music, movies

•Community relations work

•Fan of college basketball

Page 3

Breakdown of the Hadoop Journey

1. Making the case, Vision, Architecture

2. The blowback, What we accomplished

3. Roadmap to the future, Lessons Learned, Questions?

Page 4

The Elephant in the room

Image Credit: Guian Bolisay/Flickr

Page 5

What’s the problem?

We already have a mature data analysis infrastructure

Page 6

What we already do

And it looks something like this…

•We have independent data marts

•We have the hub-and-spoke architecture, the centralized warehouse

Page 7

What is the vision?

The ability to perform:

•Descriptive, Predictive and Prescriptive Analytics

Remove the traditional IT barriers separating the business users from insights

Page 8

Benefits of Big Data

•Hadoop has the lowest cost per TB of any data technology available

•Getting started with Hadoop is fairly inexpensive

•“Entry-level” clusters are relatively inexpensive

•Grow in small steps

Page 9

Benefits of Big Data

You don’t have to throw away data anymore!

Page 10

Vision - Reference Architecture

[Diagram: reference architecture with numbered callouts 1-8. Labels recovered from the figure; their grouping here is approximate.]

•Data sources: Logs, Web, IVR, Portal, Mobile; CED/Claims; Clinical Data; RDBMS; Teradata; Live Data Streams

•Ingestion: Flume (realtime feed, log files to HDFS); SQOOP (data from tables to HDFS); Chronos (jobs); Edge Node for Hadoop Client

•Hadoop Cluster running HDFS and MapReduce (includes Management, Monitoring and Security): HDFS; MapReduce Distributed Programming Framework; MapReduce jobs (batch)

•Programming interfaces: Hive/Impala (SQL), Cascading (Java), Scalding (Scala), Python

•Realtime Data Store or event processing: NoSQL (storing weblogs); Event detection (Storm); Web Analytics; *Use Spark Streaming and append to Hadoop output - realtime events

•External Hadoop Output or in HDFS: Teradata, filestack, RDBMS

•Analysis/Modeling Tools: SAS, Pentaho, R?

•Data Science Tools, Visualization, Analysis: Tableau, Spotfire, Platfora, Cognos, Microstrategy, ?

•Hadoop Cluster #2, Hadoop Cluster #3: back up or copy data from HDFS to a redundant cluster for quick recovery

* For future implementation, TBD

Page 11

The Initial Evaluation

•Vendor Evaluation: Which relationship best fits our needs without lock-in?

•Selection of use cases for demonstration

•Visualization of those use cases

Page 12

Use Case 1

Page 13

Use Case 2

Page 14

Success!

•Ready to tackle tougher, more complicated problems

•Went out looking for more use cases

Page 15

Ran into misconceptions

“Let’s use Hadoop as ETL!”

“Help us move data.”

“Can we back up data for archiving?”

Page 16

… & Challenges

Page 17

But Why?

•Overuse of the words “Big” & “Data”

•There was an overlap with other tools and platforms

•Hadoop looked like a Swiss Army knife

•Will it take over the world and replace other platforms?

Page 18

Broader impact - Business Benefits

•Building a Customer Persona

•Service Ops efficiency

•Being Customer Centric

•Product Efficiency

•Brand Impact

Page 19

Broader impact - IT Benefits

•Predictive threat modeling

•Data Archival

•Network Efficiency

Page 20

Hadoop and Big Data

•Big Data = Hadoop + Relational + other suitable task-related technologies

•Hadoop is complementary

Page 21

Hadoop is Complementary

•Hadoop excels at processing and analyzing large volumes of distributed unstructured, structured and semi-structured data, in batch or near-real-time fashion

•NoSQL databases are adept at storing and serving up multi-structured data in near real time for web-based applications

•Massively parallel OLAP databases are best at providing analysis of large volumes of mainly structured data - Teradata

•SAS/R - Modeling and Business Intelligence

•Tableau - Visualization
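The batch processing role described in the first bullet can be illustrated with a minimal, single-process sketch of the MapReduce pattern. This is not Hadoop code and not the team's implementation: the log format, the sample lines, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are all assumptions made for illustration. It counts events per consumer channel, the kind of log data the reference architecture feeds into HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical log lines in the assumed form "<timestamp> <channel> <event>".
SAMPLE_LOGS = [
    "2015-07-01T10:00:00 web login",
    "2015-07-01T10:00:05 mobile login",
    "2015-07-01T10:01:00 web claim_view",
    "2015-07-01T10:02:00 ivr call_start",
    "2015-07-01T10:03:00 web logout",
]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (channel, 1) for each well-formed line; skip malformed ones."""
    parts = line.split()
    if len(parts) == 3:
        yield parts[1], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one channel."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(SAMPLE_LOGS))
```

On a real cluster the map and reduce functions run in parallel across many nodes and the shuffle moves data over the network; the sketch only shows the shape of the computation.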

Page 22

Embrace the Most Important Change: Culture

Democratize your data and reap the benefits!

Page 23

Why is Hadoop Complementary?

Page 24

What we accomplished?

•Evangelized Hadoop

•Linked Hadoop to BI Tools

•R on Hadoop

•A fail-fast, iterative analytics approach

Page 25

Lambda Architecture as the foundation

The master dataset is the only part of the Lambda Architecture that absolutely must be safeguarded from corruption. For this reason, the fault tolerance and redundancy of HDFS add tremendous support. The master dataset is also referred to as "Raw Data" or "Bronze Data" in our reference architecture and wiki in general. Information created in the serving and speed layers is also referred to as "Silver Data".

Reference Implementation

The implementation of the reference architecture is shown below. Over time, Hadoop will interact with various BI tools and other technologies and assist in multiple domains that cover ETL (for data cleansing and data movement), analytics, archival, log parsing, realtime convergence of data, assistance in the BI space and much more. Below is one representation of this interaction.

Credit: Nathan Marz - Big Data

λ
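As a rough illustration of the Lambda Architecture idea, the sketch below keeps an append-only master dataset, derives a batch view from it, keeps a speed view for events not yet absorbed by the batch run, and answers a query by merging the two. It is a toy, in-memory sketch under stated assumptions: the event shape (`member_id`), the sample data, and the function names are hypothetical and do not come from the talk.

```python
from collections import Counter
from typing import Dict, Iterable

Event = Dict[str, str]

# Hypothetical master dataset ("Bronze Data"): immutable, append-only events.
MASTER_DATASET = (
    {"member_id": "m1", "event": "login"},
    {"member_id": "m2", "event": "login"},
    {"member_id": "m1", "event": "claim_view"},
)

# Hypothetical events that arrived after the last batch run.
RECENT_EVENTS = (
    {"member_id": "m1", "event": "logout"},
)


def batch_view(master: Iterable[Event]) -> Counter:
    """Batch layer: recompute the view from the full master dataset."""
    return Counter(event["member_id"] for event in master)


def speed_view(recent: Iterable[Event]) -> Counter:
    """Speed layer: cover only the events the batch view has not seen yet."""
    return Counter(event["member_id"] for event in recent)


def query(batch: Counter, speed: Counter, member_id: str) -> int:
    """Serving side: merge both views; unknown members count as zero."""
    return batch[member_id] + speed[member_id]


if __name__ == "__main__":
    batch = batch_view(MASTER_DATASET)
    speed = speed_view(RECENT_EVENTS)
    print(query(batch, speed, "m1"))
```

Because the batch view is always recomputed from the master dataset, a bug in either derived view can be repaired by rerunning the batch job, which is why only the master dataset must be protected from corruption.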

Page 26

What we accomplished?

•ETL - Ingest, Transform and Move patterns

•Logs generated from consumer channels were ingested with Flume

•Standardized on Parquet (storage format) and Snappy (compression)

•Lifecycle and organization of data on HDFS

•LUKS (dm-crypt) for data-at-rest encryption

•Sentry and LDAP for Role-Based Access Control

Page 27

A Custom NLP Framework

Page 28

A Roadmap to the Future

Page 29

A Roadmap to the Future

Data Driven Solutions + FP

“Functional Programming: I came for the concurrency, but I stayed for the Data Science”

Dean Wampler

Page 30

The Hadoop Stack – Advanced View

There’s also Workflow Management with Oozie.

Page 31

Lessons Learned

•Overuse of the words “Big” & “Data”

•The overlap

•Everyone found a use for Hadoop

•Big Change/Baby Steps

•Agility + Process = Cognitive Dissonance

Page 32

Healthcare company needs

•Security

•Vendors

•Vendor Partnerships

Page 33

WWYS

“Difficult to see. Always in motion is the future…”

Yoda

“Many of the truths that we cling to depend on our point of view.”

Yoda

The journey of a thousand miles begins with one cluster…

Page 34

Questions?

Mohammad Quraishi (IT Senior Principal - Cigna) [email protected]