HBaseCon 2013: Evolving a First-Generation Apache HBase Deployment to Second Generation and Beyond

|

Evolving a 1st Generation HBase Deployment to 2nd and Beyond

Doug Meil

Chief Software Architect, HBase Committer

HBaseCon 2013

|

Company Background

|

The Explorys Value Based Care Big Network

• Comprehensive view of care, including all venues of delivery, representative of all major diseases, treatments, and demographics

• 14 integrated delivery networks with over 200 hospitals and 100,000 providers

• $46 billion in care delivered annually by our network members

• 24 million truly unique patients

|

The Explorys Platform

• Clinical: EMRs, claims, labs, registries, reported outcomes

• Operational: Provider org charts, practices, locations, departments, physical assets, and care workflow

• Financial: Private / payer claims, billing, patient accounting systems

PCP | Specialist | Hospital | Post acute | Long term | Home | Mobile

Full view of the continuum of care & cost

Secure | Cost Effective | Ready Now

Start with Data Completeness: Aggregation, Patient matching, Curation & attribution, Data governance

Engines: Profiling, Risk analytics, Prediction

Insight

|

Why HBase?

|

HBase at Explorys

Transactional Store

• HBase is our transactional data store, e.g., for Clinical and Administrative Data

• Why? Flexible data model, operational scalability

General Store

• Clinical indexes for searching

• Generated results like Measures and Registries

• Why? Operational scalability, fast lookups

|

High Level HBase Usage Overview

[Diagram: Source 1 through Source 4; Extract & Load; M/R jobs for "Late Binding" Transformation & Standardization; Generated Results / Indexes; Queries; Explorys Apps: Power Search, Patient Chart, Explore, Measure, Registry, Engage]

Numbered access paths in the diagram: 1 Loads (Puts), used by Extract & Load; 2 Read (Scan); 3 Bulk-Load; 4 Multi-Get; 5 Impala, used by Queries

|

Functional Examples

|

Measure Calculations

NQF 0575 Example (Simple Example, Condensed)

Initial Population

• Patients aged >= 17 and <= 74 before the start of the measurement period

Denominator

• 2 encounters (non-acute and outpatient) and an active diagnosis of diabetes

• Or active meds indicative of diabetes

• All within 2 years or during the measurement end-date

Exclusions

• Things like an active diagnosis of gestational diabetes will exclude the patient from the denominator

Numerator

• Most recent HbA1c test < 8%

Measures are generated in MapReduce
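To make the condensed NQF 0575 logic above concrete, here is a minimal Python sketch of how one patient might be classified. It is an illustration only: the `Patient` structure and all of its field names are hypothetical, the date-window rules are assumed to have been applied when those fields were computed, and the talk itself says the real calculation runs in MapReduce.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Patient:
    # Hypothetical, pre-aggregated facts for one patient. Field names are
    # illustrative and do not come from the talk.
    age_at_period_start: int
    qualifying_encounters: int = 0  # non-acute / outpatient encounters in window
    has_active_diabetes_dx: bool = False
    has_diabetes_meds: bool = False
    has_gestational_diabetes_dx: bool = False
    # (ISO date string, HbA1c percent); ISO dates sort chronologically.
    hba1c_results: List[Tuple[str, float]] = field(default_factory=list)


def in_initial_population(p: Patient) -> bool:
    return 17 <= p.age_at_period_start <= 74


def in_denominator(p: Patient) -> bool:
    if not in_initial_population(p):
        return False
    by_diagnosis = p.qualifying_encounters >= 2 and p.has_active_diabetes_dx
    if not (by_diagnosis or p.has_diabetes_meds):
        return False
    # Exclusion: gestational diabetes removes the patient from the denominator.
    return not p.has_gestational_diabetes_dx


def in_numerator(p: Patient) -> bool:
    if not in_denominator(p) or not p.hba1c_results:
        return False
    _, latest_value = max(p.hba1c_results, key=lambda r: r[0])
    return latest_value < 8.0
```

In a MapReduce setting, a function like `in_numerator` would plausibly run once per patient inside a mapper or reducer, emitting one result per measure and patient.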

|

Measure Generated Data

Measure Results Generated to HBase

• Results by Measure, Attributed Provider, Patient, and Reporting Window … generated to HBase

Lots of Generated Data

• Hundreds of measures generate hundreds of millions of measure results per day
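The talk does not show how the result rows are keyed. As a hedged illustration, the sketch below assumes a composite row key built from the four dimensions listed above, and uses a sorted Python list to imitate HBase's lexicographic row ordering. The separator, the component order, and the function names are all assumptions.

```python
import bisect

SEP = b"\x00"  # hypothetical separator; assumes it never occurs inside an ID


def result_key(measure: str, provider: str, patient: str, window: str) -> bytes:
    """Build a composite key: measure, attributed provider, patient, window."""
    parts = (measure, provider, patient, window)
    return SEP.join(s.encode("utf-8") for s in parts)


def prefix_scan(sorted_keys, *leading_parts):
    """Return the keys whose leading components equal leading_parts.

    Mimics a range scan with a start row and an exclusive stop row.
    """
    prefix = SEP.join(s.encode("utf-8") for s in leading_parts) + SEP
    # Stop row: the prefix with its trailing separator byte incremented.
    stop_row = prefix[:-1] + b"\x01"
    start = bisect.bisect_left(sorted_keys, prefix)
    stop = bisect.bisect_left(sorted_keys, stop_row)
    return sorted_keys[start:stop]
```

With keys ordered this way, all results for one measure, or for one measure and provider, are contiguous, so they can be read with a single range scan rather than many point lookups.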

|

PowerSearch

Heart Failure Functional Example

• No evidence of Myocardial Infarction

• THEN a prescription for Angiotensin-converting enzyme (ACE) inhibitor agent

• THEN Myocardial Infarction within one year

C. Diff. Infection Functional Example

• Ambulatory Encounter

• THEN an Inpatient Encounter

• THEN evidence of C. Diff. infection within 10 days

• THEN an Ambulatory Encounter within 30 days

Summary

• NoSQL works well as the backend implementation for these kinds of “queries” because producing these results takes complex logic.
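The "THEN ... within N days" pattern in the examples above can be sketched as an ordered search over one patient's events. This is a minimal illustration, not the PowerSearch implementation: the event type names are made up, "within N days" is assumed to be measured from the previous matched step, and negative criteria such as "no evidence of" are not handled.

```python
from datetime import date
from typing import List, Optional, Tuple

Event = Tuple[date, str]           # (when, event type)
Step = Tuple[str, Optional[int]]   # (event type, max days since previous step)


def matches(events: List[Event], steps: List[Step]) -> bool:
    """True if the events contain the steps in order, within the day limits."""
    ordered = sorted(events)

    def search(step_idx: int, start_idx: int, prev_date: Optional[date]) -> bool:
        if step_idx == len(steps):
            return True
        kind, max_days = steps[step_idx]
        for i in range(start_idx, len(ordered)):
            when, what = ordered[i]
            if (prev_date is not None and max_days is not None
                    and (when - prev_date).days > max_days):
                break  # events are sorted, so later ones are even further away
            if what != kind:
                continue
            if search(step_idx + 1, i + 1, when):
                return True
        return False

    return search(0, 0, None)


# The C. Diff. example expressed as steps (type names are hypothetical).
CDIFF_STEPS: List[Step] = [
    ("ambulatory", None),
    ("inpatient", None),
    ("cdiff", 10),
    ("ambulatory", 30),
]
```

The backtracking matters: an early ambulatory encounter may fail the later time limits while a later one succeeds, which is the kind of per-patient logic that is awkward to express as a single SQL query.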

|

Technical Details

|

Cluster Information

Distro

• CDH 4.2.1

Hadoop Knobs

• HDFS: local read shortcut on

• HDFS: drop-behind reads, read-ahead on

• Snappy for MR temp files

• Read-ahead for MR temp files

• MR heartbeat on task finish

|

Cluster Information

HBase Knobs

• We pre-split our tables

• We use KeyPrefixRegionSplitPolicy

• Snappy CF compression

• HLog compression on

• Region size still 2-3 GB (we’ve tested bigger, but staying here for now)

HBase Knobs Under Consideration

• HBase checksumming: currently off, but will probably turn on

• FAST_DIFF encoding: currently not in use, but will probably use for lookup tables
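The two region-layout knobs above, pre-splitting and a key-prefix split policy, can be illustrated with a small Python sketch. It models the idea only and is not HBase code: the function names, the prefix length, and the sample row keys are hypothetical. The assumption is that a prefix-based policy cuts a candidate split point down to a fixed-length key prefix so that rows sharing that prefix stay in one region.

```python
import bisect
from typing import List


def presplit_points(num_regions: int, prefix_len: int = 2) -> List[bytes]:
    """Evenly spaced split keys over a fixed-width binary prefix space."""
    space = 256 ** prefix_len
    return [
        (space * i // num_regions).to_bytes(prefix_len, "big")
        for i in range(1, num_regions)
    ]


def prefix_aligned_split(candidate: bytes, prefix_len: int) -> bytes:
    """Truncate a candidate split point to the key prefix."""
    return candidate[:prefix_len]


def region_index(split_points: List[bytes], row: bytes) -> int:
    """Index of the region holding row; a split key starts the next region."""
    return bisect.bisect_right(split_points, row)
```

For example, if a region's midpoint falls on a row such as `PAT00042|enc|...`, truncating the split point to the 8-byte prefix `PAT00042` keeps every row for that prefix on the same side of the split.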

|

What Have We Changed?

Compression (HDFS and HBase)

• LZO → Snappy

HBase Key Redesign

• Our initial HBase RowKeys were too beefy and too Stringy. Refactored to be tighter.

• Column names were a bit too descriptive initially

• Changes related to the new KeyPrefixRegionSplitPolicy

HBase Table Management

• We have a layer of metadata around our MR jobs and apps and re-create our tables from time to time, which makes schema changes easier.
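To illustrate why "beefy and Stringy" row keys are worth tightening, the sketch below compares a descriptive string key with a fixed-width binary one. The key components (`org_id`, `patient_id`, `event_ts_ms`) and both layouts are hypothetical, since the talk does not show the actual keys. The saving compounds because HBase stores the row key, and the column name, with every cell.

```python
import struct


def stringy_key(org_id: int, patient_id: int, event_ts_ms: int) -> bytes:
    """A descriptive, human-readable key of the kind that grows large."""
    text = f"organization={org_id}/patient={patient_id}/timestamp={event_ts_ms}"
    return text.encode("utf-8")


# Big-endian unsigned fields: 2-byte org, 8-byte patient, 8-byte timestamp.
# Big-endian keeps byte-wise ordering consistent with numeric ordering.
_TIGHT = struct.Struct(">HQQ")


def tight_key(org_id: int, patient_id: int, event_ts_ms: int) -> bytes:
    """The same information packed into a fixed 18 bytes."""
    return _TIGHT.pack(org_id, patient_id, event_ts_ms)


def parse_tight_key(key: bytes):
    """Recover (org_id, patient_id, event_ts_ms) from a packed key."""
    return _TIGHT.unpack(key)
```

The trade-off is readability: packed keys are not human-readable in the shell, which is one reason a purpose-built data browser becomes useful.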

|

What Have We Changed?

HBase Loading

• Index tables are loaded with bulk-loading

• Experimented with WAL off and deferred log flushing, but bulk-loading is better.

HBase Gets

• When we started, multi-Get didn’t even exist in HBase!

• This feature was very much appreciated; our DAO layer was modified to accept batch requests.

• Minimizing RPCs makes a difference.

SQL?

• Impala against HBase for internal data investigation
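The point about multi-Get and minimizing RPCs can be sketched with a toy DAO. Everything here is a stand-in: `FakeTable` is an in-memory object that merely counts calls, and `PatientDao` and its method names are hypothetical, not the Explorys DAO layer. A real HBase client also groups a batch by region server, so one round trip per batch is a simplification.

```python
from typing import Dict, List, Optional


class FakeTable:
    """In-memory stand-in for a remote table that counts round trips."""

    def __init__(self, rows: Dict[bytes, bytes]):
        self._rows = dict(rows)
        self.rpc_count = 0

    def get(self, key: bytes) -> Optional[bytes]:
        self.rpc_count += 1  # one round trip per single get
        return self._rows.get(key)

    def multi_get(self, keys: List[bytes]) -> List[Optional[bytes]]:
        self.rpc_count += 1  # one round trip for the whole batch (simplified)
        return [self._rows.get(k) for k in keys]


class PatientDao:
    """Hypothetical DAO that accepts batch requests."""

    def __init__(self, table: FakeTable, batch_size: int = 100):
        self._table = table
        self._batch_size = batch_size

    def fetch_one_by_one(self, keys: List[bytes]) -> List[Optional[bytes]]:
        return [self._table.get(k) for k in keys]

    def fetch_batched(self, keys: List[bytes]) -> List[Optional[bytes]]:
        results: List[Optional[bytes]] = []
        for i in range(0, len(keys), self._batch_size):
            results.extend(self._table.multi_get(keys[i:i + self._batch_size]))
        return results
```

With 250 keys and a batch size of 100, the batched path in this toy model makes 3 calls where the one-by-one path makes 250, and both return the same results in the same order.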

|

Things We’ve Built

Data Browsers

• We’ve built our own data browser for data inspection, and continue to add to it.

• This isn’t going away any time soon and is highly used.

• Also kind of necessary if you store complex objects in HBase

HBase Filters

• We have some. Didn’t initially, but they have proven quite useful.

|

Questions?

Doug Meil, Chief Software Architect

www.explorys.com

Thank You!
