healthcare payer - big data integration


Upload: rajasekaran-kandhasamy

Post on 15-Jul-2015


Page 1: Healthcare payer - Big data integration

Table Of Contents

Abstract Title: MMIS - Big Data Integration
Author : Rajasekaran Kandhasamy
Overview of Paper:
  Introduction:
    Note:
Functional Specification:
  Primary Use Cases:
    1) Unified File Management (UFM) - Heterogeneous Storage Solutions (HSS) [or] Cloud Data Center (CDC):
    2) Unified Claims Archiving (UAC)
    3) Extract, Transform and Load (ETL) Integration
    4) Process Large Audit or Log Files
  Secondary Use Cases:
    5) Near - Continuous data protection (CDP) or Backup & Recovery
Value to Payers:
Technical Specification:
  High Level Architecture:
  1) Unified File Management (UFM) - Heterogeneous Storage Solutions (HSS) [or] Cloud Data Center (CDC) Flow:
  2) Unified Claims Archiving (UAC):
    HBase Claim Archival Sample Data Model:
  3) Extract, Transform and Load (ETL) Integration:
    Proposed system:
    Design Option 1:
    Design Option 2:
  4) Process Large Audit or Log Files:
    Security Audit
    Application Audit
  5) Near - Continuous data protection (CDP) Backup & Recovery:
    Proposed system:
    Backup:
    Recovery:
    Hive Claim Backup Sample Data Model:

Abstract Title: MMIS - Big Data Integration

Author : Rajasekaran Kandhasamy ([email protected])

Overview of Paper:

Introduction: MMIS/HealthCare Payer applications depend on traditional database models and structured data analytics to fulfill their needs. These approaches, while adequate in the past, will not suffice to address future requirements. They lack the processing capability to load and query multi-terabyte datasets in a timely fashion and the flexibility to effectively manage unstructured and semi-structured data. Adapting a Big Data platform to MMIS applications will resolve the above issues.

This technical paper provides details about integrating MMIS/HealthCare Payer applications with a Hadoop-based Big Data platform.

Note: MMIS is a large application, so this paper covers only use cases related to claims.

The proposal is not a replacement for the OLTP database approach; it is an outline of the benefits we can get by integrating with Hadoop technologies. The other, non-MMIS application covered here is MMIS BI/BIRT/JASPERSOFT, which is business intelligence and analytical tooling with chart and report capabilities. Put simply, it is an open source BI or reporting tool for managing big data activities.

Functional Specification:

Primary Use Cases:

1) Unified File Management (UFM) - Heterogeneous Storage Solutions (HSS) [or] Cloud Data Center (CDC): a multi-source data collection and management platform that delivers backup, archive, search, and analytics capabilities to Medicaid or HealthCare Payer applications. It introduces cloud-based options for data that must be kept for extremely long periods of time. Put simply, it consolidates MMIS file transfer processes under one managed solution.

CDC provides connectivity for the flow of data, in the form of files, between providers, state agencies, switch vendors and the Enterprise System.

Within the context of MMIS, a CDC/HSS describes an interaction between an external entity, system or service agency (e.g. Switch Vendor, Provider Agency) and the MMIS/Payer applications. This interaction can involve transferring and storing data. Most of these external interfaces are file-based inbound and outbound interfaces that arrive in batches. External systems exchange data with the MMIS in different formats, and each data file may contain one or more records. The possible file formats are:

a. X12 files

b. XML files

c. Flat files (Delimited and Fixed width data, comma separated values)

d. Binary data
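
As a minimal sketch of how a landing zone could sort inbound files into the four formats listed above, the Python below inspects the first bytes of a file. The function name and the heuristics (an ISA prefix for X12, a leading "<" for XML, NUL or undecodable bytes for binary) are illustrative assumptions and not part of the proposal; a production system would validate each format properly.

```python
import codecs


def detect_format(payload: bytes) -> str:
    """Classify an inbound file as 'x12', 'xml', 'flat' or 'binary' (heuristic)."""
    head = payload[:512]  # a small prefix is enough for these heuristics
    # X12 interchanges begin with the ISA interchange header segment.
    if head.startswith(b"ISA"):
        return "x12"
    # XML documents start with a declaration or an opening tag.
    if head.lstrip().startswith(b"<"):
        return "xml"
    # NUL bytes are a common sign of binary content.
    if b"\x00" in head:
        return "binary"
    try:
        # final=False tolerates a multi-byte character cut off at the prefix end.
        codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
    except UnicodeDecodeError:
        return "binary"
    # Anything else that decodes as text is treated as a delimited or fixed-width file.
    return "flat"
```

A flat file whose first field happens to start with "ISA" would be misclassified by this prefix check, which is why it is only a first-pass routing hint.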

Advantages:

While the majority of files in most payer applications are stored on IT-managed file servers, those servers are not always under the direct control of IT. Here everything is maintained under one umbrella.

Reduce licensing, maintenance and support costs of file servers. Develop a no vendor lock-in framework.

A scalable storage (Hadoop) environment with CDC is a solution that doesn't require a change in platforms or the retraining of IT employees and administrators. CDC delivers comprehensive capabilities more efficiently than ad hoc data/file management systems do, allowing an enterprise to dedicate fewer resources to supporting infrastructure and more to innovating, so it can bring its innovations to market quickly.

Reduce MMIS operational costs with respect to data storage, backup, archive and maintenance.

Minimize analytical latency in a big-data-like environment.

External systems can connect to UFM/CDC/HSS using any FTP/SFTP client or a REST-based Resource Oriented Architecture (ROA).

2) Unified Claims Archiving (UAC): Preserve information for compliance, legal, business reference, or system optimization purposes. These are archiving capabilities that Medicaid programs may not believe they need now but that, given current archive market trends, will be extremely useful to them in the near future. Manual and machine-generated data keeps increasing, and file, message and database sizes keep growing. For a variety of reasons, a good portion of this data needs to be archived. Once a claim case is closed, it becomes the responsibility of the MMIS/HealthCare Payer to archive and manage the closed file in compliance with regulations.

Advantages:

Data associated with claims processing is a good candidate for data archival. If a production table grows too large, there is a distinct impact on retrieval time. Most of the core system screens limit the number of records a screen will retrieve and display. When tables are large, screens will not display all of the applicable data and some screens will not function.

Moving old data from the MMIS OLTP database to Hadoop-based HBase can increase MMIS OLTP performance. A large number of old, unused records in the claims table degrades performance.

Historical information for comparative and competitive analysis.

Enhanced data quality and completeness.

Supplementing disaster recovery plans with another data backup source.

3) Extract, Transform and Load (ETL) Integration: Most ETL software packages require

their own servers, processing, databases, and licenses. They also require setup,

configuration, and development by experts in that particular tool, and those skills are not

always transferable.

Advantages:

For instance, the MMIS reference sub-module periodically receives different sets of procedure, diagnosis and other codes as files from CMS. These files are stored in CDC, and Hadoop-based Pig/Hive is used to convert the file data into a format MMIS understands, at lower cost.

Note: Most existing MMIS implementations use PL/SQL-based ETL for loading data into the MMIS DB. If the MMIS team does not want to break the existing flow, a CDC adapter can fetch the files from Hadoop instead of from a traditional file server.

UFM/CDC data can be stored inexpensively in the cloud and processed by Hive for ETL. It is a cost-effective complement to data warehouse solutions, and it reduces risk and cost and/or improves accessibility compared with in-house solutions. Once data is processed and stored in Hive, it makes sense to consider the various file formats available.

4) Process Large Audit or Log Files: Audits are historical and immutable. MMIS audits can be segregated into two categories.

a) Security Audit: The MMIS application logs user actions to a file, and this file is moved to the Hadoop-based HBase NoSQL DB. Through the MMIS BI application, a user can view who logged in, what actions they performed, and so on.

b) Application Audit: Existing MMIS implementations have the following application audit options:

1) DB triggers,

2) Module-specific code that inserts an audit record for each user operation, e.g. error codes view history.

The new proposal uses JMS or a queue. This is a scalable approach, if you really need it, and one that is completely in line with the J2EE specification: publish the audit log messages to a message queue, and let another, separate process (Flume) take them off the queue and log them in the Cloud Data Center (CDC) based HBase NoSQL database.
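
The decoupling described above can be sketched in a few lines of Python. This is only an in-process stand-in: the standard-library queue plays the role of the JMS queue, the worker thread plays the role of the separate Flume process, and a plain list plays the role of the HBase audit table. All function names are hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the consumer to finish


def publish_audit(audit_queue, user, action):
    """Producer side: the application only enqueues the audit event."""
    audit_queue.put({"user": user, "action": action})


def drain_audit(audit_queue, sink):
    """Consumer side: a separate worker moves events from the queue to the store."""
    while True:
        event = audit_queue.get()
        if event is _STOP:
            break
        sink.append(event)


def stop(audit_queue):
    """Ask the consumer to exit once earlier events have been drained."""
    audit_queue.put(_STOP)
```

The point of the design is that the user-facing request only pays for the enqueue; writing to the audit store happens on the consumer's schedule.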

Secondary Use Cases:

5) Near - Continuous data protection (CDP) or Backup & Recovery: Near-continuous data protection (near CDP, or N-CDP) is a general term for backup and recovery products that take backup snapshots at set intervals. CDP technology protects data on a nearly continuous basis. Rather than running a large monolithic backup overnight, CDP products back up data every few minutes, 24 hours a day.

Advantages:

N-CDP is a Hadoop-based backup solution that efficiently and cost-effectively protects business-critical healthcare data such as databases and files.

By default, Hadoop provides replication features that enable near-instant recovery from disasters.

By providing continuous and periodic protection, N-CDP allows organizations to enhance or eliminate their tape-backup infrastructures, minimizing software license and maintenance fees as well as hardware and tape costs.

Recovery Point Objective (RPO) refers to the point in time in the past to which you will recover.

Recovery Time Objective (RTO) refers to the point in time in the future at which you will be up and running again.

Difference between CDP and N-CDP:

CDP backs up the data on every change to the data, whereas N-CDP takes a backup at a user-defined regular interval.
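
To make the recovery point concrete for an interval-based (N-CDP) backup, here is a small worked example in Python. The hourly schedule, the dates and the function names are assumptions chosen for illustration: given the snapshot times and the moment of failure, the recovery point is the latest snapshot before the failure, and everything written after it is lost.

```python
from datetime import datetime, timedelta


def recovery_point(snapshots, failure):
    """Latest snapshot taken at or before the failure, or None if there is none."""
    earlier = [taken for taken in snapshots if taken <= failure]
    return max(earlier) if earlier else None


def data_loss_window(snapshots, failure):
    """Time between the recovery point and the failure (work that is lost)."""
    point = recovery_point(snapshots, failure)
    return None if point is None else failure - point


start = datetime(2015, 7, 15, 0, 0)
hourly = [start + timedelta(hours=h) for h in range(4)]  # 00:00, 01:00, 02:00, 03:00
failure = datetime(2015, 7, 15, 2, 40)
print(recovery_point(hourly, failure))    # 2015-07-15 02:00:00
print(data_loss_window(hourly, failure))  # 0:40:00
```

With a fixed interval the loss window can never exceed the interval itself, so shortening the interval is the lever for tightening the recovery point.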

Value to Payers:

Payers can proudly say they are in the cloud and big data market.

Other parties, states or payers can subscribe to the cloud-based data centers under agreed SLAs, so no separate data center maintenance is required for each state or payer.

Reduce licensing, maintenance and support costs. Go with a no vendor lock-in framework. Wherever possible, avoid licensed software running alongside MMIS and use the proposed open source tools instead. For example:

a) Informatica - Use CDC based Hadoop ETL tools.

b) FTP Server - Use CDC.

c) COGNOS or other BI tools - Use the MMIS BI/BIRT/JASPERSOFT based open source analytical tool.

d) Archive and backup tool - Use the proposed approach.

Developing more operational and analytical use cases on top of this integration will move Payers into the business intelligence tool market.

The SaaS/multi-tenant enabled MMIS BI application can be used by several customers with low infrastructure maintenance cost.

Many other big data advantages.

Technical Specification:

High Level Architecture:

[Figure: High-level architecture, shown for an example tenant (Maryland MMIS). Labels: Providers, Agencies, Others; SFTP over Hadoop; Inbound Landing Zone; Outbound Landing Zone; Data Mart (Claims/Reference/TPL, Member, Provider, Others); HBase/Hive, Flume, Pig/Sqoop, Oozie, Tools/YARN; EHR/Cognos/Others; MMIS BI/BIRT/JASPERSOFT DB; MMIS/HealthCare Payer DB.]

1) Unified File Management (UFM) - Heterogeneous Storage Solutions (HSS) [or] Cloud Data Center (CDC) Flow:

Cloud Data Center:

i. External clients can upload files to their dedicated inbound directory through FTP/SFTP.

ii. A customized SFTP server based on Apache MINA is used to support the Hadoop file system.

iii. Once the files are placed, the MMIS listening queues pick up each file from HDFS and start claim processing as per the flow diagram.

iv. MMIS BI should also expose a REST-enabled service for uploading files.

MMIS BI Unified File Management:

i. UFM is one of the sub-modules of the MMIS BI application.

ii. Using SaaS MMIS BI, a user can view complete inbound and outbound file details for a particular tenant from a single point of access.

iii. Different kinds of charts and metrics are used to monitor day-to-day file activities in CDC.

Note: The name Software as a Service (SaaS) MMIS BI indicates that the application is tenant-aware, so the same application services can be used by other subscribers.

[Figure: UFM/CDC flow diagram covering three flows: EDI Claim Flow, Paper Claim Flow and Reference File Loading. Labels: Claims/Ref FTP/SFTP Clients; Emdeon/OCR; SFTP over Hadoop; EDI Claims, Paper Claims, Reference Files; CDC - HDFS; Img Archival; HIPAA Validation; HIPAA Translation; Claims Loader; PL/SQL Loading Process; Claim OCR Data; Ref File Read & Load; MMIS DB; SaaS - MMIS BI Unified File Management (File Monitoring).]

2) Unified Claims Archiving (UAC):

MMIS APP:

Through the MMIS application, a user can perform the different types of archival shown in the diagram.

Once archival is initiated, the Sqoop module is triggered. It loads the details from the MMIS DB into the CDC-based HBase DB. The claim HBase data structure is described in the HBase Claim Archival Sample Data Model section.
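
The archival types can be thought of as filters over the claims table. The Python below is a rough sketch of that selection step only, assuming hypothetical claim fields (filed, claim_type, status, provider_type); in the proposal the actual data movement is done by Sqoop, which this sketch does not model.

```python
from datetime import date


def select_for_archival(claims, before=None, claim_type=None,
                        status=None, provider_type=None):
    """Return the claims that match every criterion that was supplied."""
    selected = []
    for claim in claims:
        # Date-wise, quarterly and yearly archival all reduce to a cutoff date.
        if before is not None and claim["filed"] >= before:
            continue
        if claim_type is not None and claim["claim_type"] != claim_type:
            continue
        if status is not None and claim["status"] != status:
            continue
        if provider_type is not None and claim["provider_type"] != provider_type:
            continue
        selected.append(claim)
    return selected
```

For example, a yearly archival of paid claims could be expressed as select_for_archival(claims, before=date(2014, 1, 1), status="PAID").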

MMIS BI:

Through the MMIS BI application, a user can view different types of claims-related charts, which read data from the HBase NoSQL DB.

One good example is "Operational Metrics", where a user can view paid claims over a certain period of time.

The application also supports "Analytical Metrics", where a user can view a claim forecast for a certain period of time.

[Figure: Unified claims archiving. MMIS APP archival options: provider type based, claim type based, date wise, claim status based, quarterly, yearly. Apache Sqoop connects the MMIS DB to the Cloud Data Center (Hadoop), where the HBase NoSQL database holds the Claims Data Mart (Archive Table, Backup Table, Other Tables). MMIS BI shows Claims Archive Operational Metrics (PAST) and Claims Archive Analytical Metrics (FUTURE).]

HBase Claim Archival Sample Data Model:

Around 50 - 80 tables are involved in the claim adjudication process. This section describes how the OLTP claim data model maps to an HBase-based NoSQL data model.

HBase currently does not do well with anything above two or three column families, so this design has one column family for all header-related tables and one for claim-line-related tables.

In the diagram below, all header-related table entries go into "HEADER FAMILY" and line-item-related table entries go into "LINE FAMILY".

<ChildTableName_RecordNumber_ColumnName> is the generic format for inserting values. It maps OLTP one-to-many relationships onto a single NoSQL table. E.g. OLTP claim header cutback table entries are stored as CUTBACK_1_QLFR = CUTBACK (table name), 1 (record number), QLFR (column name).

Row - Key: <claimfiledate_claimtype_providername>

HBASE DATABASE

Table: CLM_ARC_TB

HEADER FAMILY: TCN, CUTBACK_1_QLFR, TPL_1_AMT

LINE FAMILY: ATTACHMENT_1_NAME, PRVDR_1_LCTN, PROCEDURE_1_CODE
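
The mapping above can be sketched in Python as follows. This is a plain dictionary model of one HBase row, not HBase client code: the family names "HEADER" and "LINE", the function names and the sample values are assumptions for illustration, while the qualifier format and the row key format follow the text.

```python
from typing import Dict, List

ChildTables = Dict[str, List[Dict[str, str]]]


def row_key(claim_file_date: str, claim_type: str, provider_name: str) -> str:
    """Build the row key <claimfiledate_claimtype_providername>."""
    return f"{claim_file_date}_{claim_type}_{provider_name}"


def flatten_children(children: ChildTables) -> Dict[str, str]:
    """Map one-to-many child rows to <Table>_<RecordNumber>_<Column> qualifiers."""
    cells = {}
    for table, rows in children.items():
        for number, row in enumerate(rows, start=1):  # record numbers start at 1
            for column, value in row.items():
                cells[f"{table}_{number}_{column}"] = value
    return cells


def to_hbase_row(header_children: ChildTables, line_children: ChildTables):
    """One column family for header tables and one for claim line tables."""
    return {
        "HEADER": flatten_children(header_children),
        "LINE": flatten_children(line_children),
    }
```

A claim with two cutback records therefore produces the qualifiers CUTBACK_1_QLFR and CUTBACK_2_QLFR in the header family, so the number of columns in a row grows with the number of child records.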

3) Extract, Transform and Load (ETL) Integration: Taxonomy codes, HCPCS, Correct Coding Initiative (CCI), Diagnosis Related Group (DRG) codes, Medicare Physician Fee Schedule (MPFS), ICD-10 and Clinical Lab Fee Schedule codes are a few of the interface reference files that a payer receives from CMS/State/Others. All are claim reference codes used to adjudicate claims, and they need to be updated periodically in the MMIS DB.

Proposed system:

Design Option 1:

CMS/State/Others place the reference files in CDC.

MMIS DB procedures pick up the files from CDC and load the file content into the MMIS DB.

Design Option 2:

CMS/State/Others place the reference files in CDC.

An Apache Pig application implements the ETL model: it extracts data from CDC, transforms it according to a rule set and then loads it into Apache Hive.

Apache Sqoop loads the details from Apache Hive into the MMIS DB.
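
The three stages of Design Option 2 can be illustrated with a small Python stand-in. This is not Pig or Hive code: it only mirrors the extract, transform and load steps on an in-memory pipe-delimited file, with a dictionary in place of the Hive reference table. The file layout (code, description), the rule set and the sample codes are made up for the example.

```python
import csv
import io
from typing import Dict, Iterator, List


def extract(text: str, delimiter: str = "|") -> Iterator[List[str]]:
    """Extract: read delimited records from the reference file content."""
    return csv.reader(io.StringIO(text), delimiter=delimiter)


def transform(records) -> List[Dict[str, str]]:
    """Transform: drop incomplete records, normalise code case and whitespace."""
    rows = []
    for record in records:
        if len(record) < 2 or not record[0].strip():
            continue  # skip blank lines and rows without a code
        rows.append({"code": record[0].strip().upper(),
                     "description": record[1].strip()})
    return rows


def load(rows: List[Dict[str, str]], table: Dict[str, str]) -> int:
    """Load: upsert rows into the target table and return the number loaded."""
    for row in rows:
        table[row["code"]] = row["description"]
    return len(rows)
```

Keeping the three stages as separate functions mirrors the diagram and makes the rule set in the transform step easy to change without touching the extract or load steps.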

[Figure: ETL integration. CMS/State/Others place files in the Cloud Data Center (Hadoop). Apache Pig (ETL) performs EXTRACT from HDFS, TRANSFORM, and LOAD into HIVE (Reference Data Mart, Reference Table). Apache Sqoop connects HIVE to the MMIS DB.]

4) Process Large Audit or Log Files:

Whenever a user logs into the system, MMIS starts capturing the user's page actions in a file.

Apache Flume listens to this file, and whenever a row is added the information is moved to the HBase DB.

The user can use the BI tool to view the details in the allowed formats.

[Figure: Audit flows. Security Audit: MMIS APP writes all user actions to a Security Log File; Apache Flume live-streams it to the HBASE SECURITY_AUDIT table in the Cloud Data Center (Hadoop); MMIS BI shows live data on logged-in user actions. Application Audit: MMIS APP sends all user modifications to a JMS Queue; Apache Flume moves them to the HBASE APPLICATION_AUDIT table in the Cloud Data Center (Hadoop); MMIS BI shows live data on logged-in user modifications.]

5) Near - Continuous data protection (CDP) Backup & Recovery:

Proposed system:

Backup:

Design Option 1: One-time full load and subsequent updates based on timestamp.

The MMIS BI integration module triggers the backup service every hour.

The backup service calls the Java Sqoop client with the claim tables as parameters.

One-time activity: Sqoop connects to the MMIS DB and imports the complete table data. It starts with the header table, and the child tables are then loaded iteratively.

Sqoop supports an alternate table update strategy called lastmodified mode. It applies when rows of the source table (the MMIS table) may be updated and each such update sets the value of a last-modified column to the current timestamp. Only those changed records are updated on the Hive side, and new records are inserted in Hive as usual.
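
The effect of the lastmodified strategy can be sketched in plain Python. This models only the merge logic, not Sqoop itself (in Sqoop the behaviour is driven by the --incremental lastmodified, --check-column, --last-value and --merge-key import options). The column names, the dictionary standing in for the Hive table and the sample rows are assumptions; timestamps are ISO-8601 strings so that they compare correctly as text.

```python
def incremental_merge(backup, source_rows, last_value,
                      key="id", check_column="last_modified"):
    """Apply rows changed since last_value to backup; return the new last_value."""
    newest = last_value
    for row in source_rows:
        if row[check_column] > last_value:
            # An existing key is overwritten (update), a new key is added (insert).
            backup[row[key]] = dict(row)
            newest = max(newest, row[check_column])
    return newest


backup = {1: {"id": 1, "status": "PAID", "last_modified": "2015-07-15T09:00:00"}}
source = [
    {"id": 1, "status": "ADJUSTED", "last_modified": "2015-07-15T10:30:00"},
    {"id": 2, "status": "PAID", "last_modified": "2015-07-15T10:45:00"},
    {"id": 3, "status": "DENIED", "last_modified": "2015-07-15T08:00:00"},
]
last_value = incremental_merge(backup, source, "2015-07-15T10:00:00")
```

After the run, claim 1 is updated, claim 2 is inserted, claim 3 is skipped because it has not changed since the previous run, and the returned value becomes the starting point for the next hourly run.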

Design Option 2: Complete MMIS snapshot every time.

The MMIS BI integration module triggers the backup service every hour.

The backup service calls the Java Sqoop client with the claim tables as parameters.

Each time, Sqoop imports the complete data from the MMIS DB, header table first and child tables next.

Each primary key is appended with the job trigger time, which is used at recovery time.

[Figure: Backup and recovery. Backup: at a regular interval, MMIS BI calls BackupService through the ESB, and Sqoop Import copies the MMIS DB into HIVE tables (Claims Header, Claims Line, Claims Other Tables) in the Cloud Data Center (Hadoop). Recovery: the user enters a recovery point in the UI, RecoveryService loads data from the mentioned time, and Sqoop Export writes it back to the MMIS DB.]

Recovery:

If we choose Design Option 1, a recovery time is not required, because the backup is an exact copy of the MMIS DB.

If we choose Design Option 2, the recovery time is chosen from the list of trigger times in the MMIS BI context. The user can choose any one of these times, and the snapshot data obtained at that time is loaded from the cloud into the MMIS DB.

If a disaster happens to the MMIS DB, we can use the Recovery module to load data from the cloud into MMIS.

Sqoop exports the data from the Hive tables to the MMIS tables.

Hive Claim Backup Sample Data Model:

There are no differences between the MMIS data model and the Hive data model.

More or less everything is the same for Design Option 1. For Design Option 2, all primary keys are appended with an additional timestamp surrogate key.
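
A minimal Python sketch of the snapshot approach follows, with a dictionary standing in for the Hive backup table. The key layout <primarykey>_<triggertime>, the function names and the sample rows are assumptions for illustration; the sketch also assumes the trigger time itself contains no underscore, so it can be split off the end of the key.

```python
def snapshot_key(primary_key, trigger_time):
    """Append the job trigger time to the primary key as a surrogate suffix."""
    return f"{primary_key}_{trigger_time}"


def take_snapshot(store, rows, trigger_time, key="id"):
    """Copy every source row into the store under its time-stamped key."""
    for row in rows:
        store[snapshot_key(row[key], trigger_time)] = dict(row)


def recovery_points(store):
    """List the trigger times a user could choose from at recovery time."""
    return sorted({stored_key.rsplit("_", 1)[1] for stored_key in store})


def recover(store, trigger_time):
    """Return the rows captured by the snapshot taken at trigger_time."""
    suffix = "_" + trigger_time
    return [row for stored_key, row in store.items()
            if stored_key.endswith(suffix)]
```

Because every run stores a full copy, storage grows with the number of snapshots kept, which is the price paid for being able to restore to any earlier trigger time.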