anomaly detection in medicare provider data using oaagraph · 2019-01-31 · compute personalized...

Anomaly Detection in Medicare Provider Data using OAAgraph

Sungpack Hong (Oracle Labs)
Mark Hornick (Oracle Advanced Analytics)
Francisco Morales (Oracle Labs)
March 21, 2018

Copyright © 2017 Oracle and/or its affiliates. All rights reserved.


Safe Harbor Statement

The following is intended to outline our research activities and general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Insights


Approach problem from two perspectives

• Graph Analytics to Machine Learning
  – Compute graph metric(s)
  – Add to structured data
  – Build predictive model using graph metric
• Machine Learning to Graph Analytics
  – Build model(s) and score or classify data
  – Add to graph
  – Explore graph or compute new metrics using ML result


OAAgraph

• An R package integrating Parallel Graph AnalytiX with Oracle R Enterprise
• Single, unified interface
  – Work with R data.frame proxy objects (ore.frame) for database data and familiar functions across ML and graph
  – Results available as R data.frame proxy objects allowing further processing
• R users take advantage of powerful, complementary technologies available with Oracle Database
  – Highly scalable PGX engine, part of Oracle Spatial and Graph option
  – Integrated with Oracle R Enterprise, part of Oracle Advanced Analytics option


PGX (Parallel Graph AnalytiX)

• In-memory graph engine
• Fast, parallel, built-in graph algorithms
• 35+ graph algorithms
• Graph query (pattern-matching) via PGQL
• Custom algorithm compilation (advanced use case)
• PGX also available on Hadoop and NoSQL

Detecting Components and Communities: Tarjan’s, Kosaraju’s, Weakly Connected Components, Label Propagation (w/ variants), Soman and Narang’s

Ranking and Walking: Pagerank, Personalized Pagerank, Betweenness Centrality (w/ variants), Closeness Centrality, Degree Centrality, Eigenvector Centrality, HITS, Random walking and sampling (w/ variants)

Evaluating Community Structures: Conductance, Modularity, Clustering Coefficient (Triangle Counting), Adamic-Adar

Path-Finding: Hop-Distance (BFS), Dijkstra’s, Bi-directional Dijkstra’s, Bellman-Ford’s

Link Prediction: SALSA (Twitter’s Who-to-follow)

Other Classics: Vertex Cover, Minimum Spanning-Tree (Prim’s)


Oracle R Enterprise

• Use Oracle Database as a high performance compute environment
• Transparency layer
  – Leverage proxy objects (ore.frames) - data remains in the database
  – Overload R functions that translate functionality to SQL
  – Use standard R syntax to manipulate database data
• Parallel, distributed ML algorithms
  – Scalability and performance
  – Exposes in-database machine learning algorithms from ODM
  – Additional R-based algorithms executing at database server
• Embedded R execution
  – Store and invoke R scripts in Oracle Database
  – Data-parallel, task-parallel, and non-parallel execution
  – Invoke R scripts at Oracle Database server from R or SQL
  – Use open source CRAN packages

[Diagram: R Client with Oracle R Enterprise and SQL interfaces (SQL*Plus, SQL Developer, …) connecting to Oracle Database (user tables, in-db stats) on the database server machine]


OAAgraph with Oracle Database

[Diagram labels: Client (R Client, ORE, OAAgraph); Database Server (Oracle Database, PGX Server)]

# Connect R client to Oracle Database using ORE
R> ore.connect(...)

# Connect to PGX server using OAAgraph
R> oaa.graphConnect(...)


Data Sources

• Graph data represented as two tables
  – Nodes with properties
  – Edges with properties

Node Table

Node ID   Node Prop 1 (name)   Node Prop 2 (age)   …
1238      John                 39                  …
1299      Paul                 41                  …
4818      …                    …                   …

Edge Table

From Node   To Node   Edge Prop 1 (relation)   …
1238        1299      Likes                    …
1299        4818      FriendOf                 …
1299        6637      FriendOf                 …

[Diagram labels: Database Server (Oracle Database, PGX Server); node and edge tables]


Loading Graph

[Diagram labels: Client (R Client, ORE, OAAgraph); Database Server (Oracle Database with node and edge tables, PGX Server)]

# Load graph into PGX:
# Graph load happens at the server side.
# Returns OAAgraph object – a proxy for the graph in PGX
R> mygraph <- oaa.graph(EdgeTable, NodeTable, ...)


Running Graph Algorithm

[Diagram labels: Client (R Client, ORE, OAAgraph); Database Server (Oracle Database, PGX Server)]

# e.g. compute Pagerank for every node in the graph
# Execution occurs in PGX server side
R> result1 <- pagerank(mygraph, ...)


Exporting the result to DB

[Diagram labels: Client (R Client, ORE, OAAgraph); Database Server (Oracle Database with NODES and EDGES tables, PGX Server)]

# Export result to DB as Table(s)
R> oaa.create(mygraph, nodeTableName = "node",
              nodeProperties = c("pagerank", … ),
              … )


Anomaly Detection in Healthcare Billing
Background and Introduction


About the Dataset

• A public dataset from the US Centers for Medicare & Medicaid Services (CMS)
  – Health-care billing data for CY 2012
  – Aggregated medical transactions: 9,153,272 records with 29 variables
  – Transactions between 880,644 medical providers and CMS, with total amounts > $77B for the year
  – Per provider/service aggregate counts, and mean/sd of the submitted, allowed, and payment amounts


Anomalies in this Demo

• Information in the dataset
  – Providers (doctors) and their services (treatment, operation, prescription, …)
  – Specialties of providers (e.g. pediatrics, dermatology, …)
• Observation
  – Doctors of the same specialty provide similar services
  – What if a doctor performs a lot of treatments that typically belong to other specialties?
    • E.g. a cardiologist doing plastic surgery?
• How do we find such cases?

By applying graph analysis on this dataset

[Image: “There is a spy among us” (an internet meme)]


Creating a Graph From the Dataset

• A graph capturing relationships between providers and services
  – Vertex: health-care providers (LHS) and health-care services (RHS)
  – Edge: there is an edge if the provider has given the service
  – Undirected, bipartite graph
• Vertices have associated properties
  – e.g. specialty, name, …

[Diagram: Health Providers (e.g. Dr. Victor Frankenstein, Podiatrist) linked to Health Services (e.g. Prescribe Aspirin)]
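The graph construction described on this slide can be sketched in a few lines. This is a minimal illustration in plain Python, not the OAAgraph R interface used in the talk; the record layout, the identifiers, and the function name `build_bipartite_graph` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical aggregated billing records: (provider_id, specialty, service_code).
records = [
    ("P1", "Podiatry", "S_ASPIRIN"),
    ("P1", "Podiatry", "S_FOOT_EXAM"),
    ("P2", "Podiatry", "S_FOOT_EXAM"),
    ("P3", "Cardiology", "S_ECG"),
    ("P3", "Cardiology", "S_ASPIRIN"),
]


def build_bipartite_graph(rows):
    """Return (adjacency, specialty) for an undirected provider-service graph."""
    adjacency = defaultdict(set)
    specialty = {}
    for provider, spec, service in rows:
        adjacency[provider].add(service)  # provider side of the edge
        adjacency[service].add(provider)  # service side (graph is undirected)
        specialty[provider] = spec        # vertex property kept on providers
    return dict(adjacency), specialty


adjacency, specialty = build_bipartite_graph(records)
print(sorted(adjacency["S_ASPIRIN"]))
```

Providers only ever connect to services and vice versa, which is what makes the graph bipartite; two providers become "close" only through the services they share.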


Graph Approach – Basic Idea

• In the graph view, providers of the same specialty are close to each other
  – They are closely connected by common services that they provide
• We consider it anomalous if a provider vertex is exceptionally close to vertices of another specialty
• But how do we define such closeness? How to find them?

[Diagram labels: Specialty: Internal Medicine; Specialty: Plastic Surgery; Service: Administration of influenza virus vaccine]


Graph Algorithm -- Closeness

• How do we define whether two vertices in the graph are close to each other?
• Shortest path
  – Using edge weights as the ‘distance’ metric
    • Edge weight can be 1 (hop-distance)
  – Classic graph algorithms: Dijkstra, Bellman-Ford, …
• Some considerations
  – What if there are multiple paths?
  – What about high-degree vertices in between?

[Diagram: example graphs over vertices A, B, C, D, compared side by side (“Vs.”)]
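The "multiple paths" consideration can be made concrete with a small sketch. The Python below is illustrative only (the toy graphs and the helper `hop_distance` are hypothetical, not part of PGX or OAAgraph): it computes hop-distance with breadth-first search and shows that the measure cannot tell one connecting path from two.

```python
from collections import deque


def hop_distance(adjacency, source, target):
    """Breadth-first search; returns the number of hops, or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        vertex, dist = queue.popleft()
        if vertex == target:
            return dist
        for nxt in adjacency.get(vertex, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Toy graphs: A and D are linked through one intermediate vertex, or through two.
single_path = {"A": ["B"], "B": ["A", "D"], "D": ["B"]}
two_paths = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}

print(hop_distance(single_path, "A", "D"))  # 2
print(hop_distance(two_paths, "A", "D"))    # 2
```

Both graphs report the same distance even though A and D are intuitively more strongly tied in the second one, which is one reason to look for a closeness measure that accounts for all paths.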


Graph Algorithm: Personalized Pagerank

• Personalized Pagerank (PPR)
  – A variant of the Pagerank algorithm*
  – Given a set of starting vertices
  – Repeat random walks (with restart) from the starting vertices
  – Compute the probability of visiting each vertex in the graph
  – The computed value is a natural relative distance (or closeness) of vertices from the starting set
• Vertices that are ‘close’ would naturally be visited more often
• Shared edges would also make a vertex visited more often

[Diagram label: Starting vertices]

* The algorithm becomes the normal Pagerank algorithm if the starting set equals all the vertices in the graph
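The random-walk-with-restart idea can be sketched as a simple power iteration. This is an illustrative plain-Python version, not the PGX implementation; the function name, the damping factor of 0.85, the fixed iteration count, and the toy graph are all assumptions made for the example.

```python
def personalized_pagerank(adjacency, start, damping=0.85, iterations=50):
    """Power iteration for a random walk that restarts at the vertices in `start`.

    `adjacency` maps every vertex to a list of neighbors; `start` is a set of
    vertices that all appear as keys of `adjacency`.
    """
    vertices = list(adjacency)
    restart = {v: (1.0 / len(start) if v in start else 0.0) for v in vertices}
    rank = dict(restart)
    for _ in range(iterations):
        # With probability (1 - damping) the walker jumps back to the starting set.
        nxt = {v: (1.0 - damping) * restart[v] for v in vertices}
        for v in vertices:
            neighbors = adjacency[v]
            if not neighbors:
                # Dangling vertex: hand its mass back to the starting set.
                for s in start:
                    nxt[s] += damping * rank[v] / len(start)
                continue
            share = damping * rank[v] / len(neighbors)
            for n in neighbors:
                nxt[n] += share
        rank = nxt
    return rank


# Toy chain of providers (P*) and services (S*): P1 - S1 - P2 - S2 - P3.
chain = {
    "P1": ["S1"],
    "S1": ["P1", "P2"],
    "P2": ["S1", "S2"],
    "S2": ["P2", "P3"],
    "P3": ["S2"],
}
scores = personalized_pagerank(chain, start={"P1"})
print(scores["P2"] > scores["P3"])  # True: P2 is closer to the starting set
```

Passing every vertex as `start` makes the restart distribution uniform, which corresponds to the footnote's remark about recovering the normal Pagerank.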


Anomaly Detection Procedure (sketch)

1. Select a specialty
2. Find the set of doctors of that specialty (starting set)
3. Compute Personalized Pagerank from the starting set
4. Find doctors of other specialties that have high values
   – Pick a threshold value (set from the minimum PPR values among the starting vertices)
   – Mark high-valued vertices as anomalous

[Diagram: bipartite graph of 900,000 doctors and 6,000 HCPCS services connected by 9,000,000 edges; doctors of the same specialty form the specialty set, and high-valued doctors of other specialties are marked anomalous]
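Step 4 of the procedure can be sketched as follows. The code is plain Python for illustration, and it makes one explicit assumption: the threshold is taken to be exactly the minimum PPR value among the starting vertices, whereas the slide only says it is "set from" that minimum. The scores, provider identifiers, and the function name are hypothetical.

```python
def find_anomaly_candidates(scores, specialty, selected):
    """Flag providers outside `selected` whose score reaches the in-specialty minimum."""
    starting = [p for p, s in specialty.items() if s == selected]
    # Assumption: threshold = lowest PPR value inside the starting set.
    threshold = min(scores[p] for p in starting)
    flagged = sorted(
        p for p, s in specialty.items()
        if s != selected and scores[p] >= threshold
    )
    return threshold, flagged


# Hypothetical PPR values, as if computed from the "Podiatry" starting set.
specialty = {"P1": "Podiatry", "P2": "Podiatry", "P3": "Cardiology", "P4": "Dermatology"}
scores = {"P1": 0.30, "P2": 0.12, "P3": 0.15, "P4": 0.01}

threshold, flagged = find_anomaly_candidates(scores, specialty, "Podiatry")
print(threshold, flagged)  # 0.12 ['P3']
```

Here the cardiologist P3 scores higher than the weakest podiatrist, so it is marked as a candidate; P4 stays well below the threshold.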


Dealing with False Positives

• There can be a large number of false positives, though
• In the case of Optometry, a lot of providers score higher than the threshold
• Because some specialties are naturally close to each other

[Figure: Distribution of PPR score (from Optometry). Blue: Optometry doctors. Red: other doctors with high PPR values. Annotations: Threshold; 98.5% Ophthalmology]


Dealing with False Positives

• Very simple collaborative filtering
  – Group anomaly candidates by their specialties
  – Focus on the groups with the fewest providers
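The grouping step can be sketched in a few lines of plain Python. The candidate list, the specialties, and the function name below are hypothetical; the idea is only to count candidates per specialty and look at the smallest groups first, since a large group (such as Ophthalmology for an Optometry starting set) more likely reflects naturally related specialties.

```python
from collections import Counter


def rank_candidate_groups(candidates, specialty):
    """Count candidates per specialty and list the smallest groups first."""
    counts = Counter(specialty[p] for p in candidates)
    # Sort by group size, then by name so that ties are ordered deterministically.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Hypothetical candidates flagged from an Optometry starting set.
specialty = {
    "D1": "Ophthalmology",
    "D2": "Ophthalmology",
    "D3": "Ophthalmology",
    "D4": "Plastic Surgery",
}
candidates = ["D1", "D2", "D3", "D4"]

groups = rank_candidate_groups(candidates, specialty)
print(groups)  # [('Plastic Surgery', 1), ('Ophthalmology', 3)]
```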


Browsing the record of the provider


Demo
Using OAAgraph with RStudio


Summary

• OAAgraph provides powerful, scalable graph analytics enabled from R in Oracle Database with Oracle R Enterprise
• Graph analytics is well-positioned for solving large-scale anomaly detection problems with Spatial and Graph PGX


Learn More about Oracle’s R Technologies…

http://oracle.com/goto/R
