d. sobczak april 27, 2017 - semcacfe · 2017-04-29 · ingredients of data visualization with a...

Basic Data Analytics, Visualizations, and Predictive Analytics D. Sobczak April 27, 2017


Page 1: D. Sobczak April 27, 2017 - SEMCACFE · 2017-04-29 · ingredients of data visualization with a purpose. Prompting a decision or action, or inviting an audience to explore the data

Basic Data Analytics, Visualizations, and Predictive Analytics

D. Sobczak
April 27, 2017


Topics

• Tips on how to get started in data mining and basic analytic techniques.

• Storytelling techniques to convert analytical results to easy-to-understand presentations.

• Discussion on predictive modeling.


Median Loss -$975,000


Median Loss -$200,000

Types of Business Fraud

http://www.acfe.com/rttn2016/docs/Staggering-Cost-of-Fraud-infographic.pdf


Median Loss -$125,000

Types of Business Fraud (cont’d)

http://www.acfe.com/rttn2016/docs/Staggering-Cost-of-Fraud-infographic.pdf


Median Losses Valued by Region

http://www.acfe.com/rttn2016/docs/Staggering-Cost-of-Fraud-infographic.pdf


Anti-fraud controls reduce the amount lost to fraud.

http://www.acfe.com/rttn2016/docs/Staggering-Cost-of-Fraud-infographic.pdf


http://www.acfe.com/rttn2016/detection.aspx


Fraud Triangle / Data Scientist Venn Diagram

http://www.internalauditor.me/article/the-fraud-triangle/
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram


CRISP-DM - Cross Industry Standard Process for Data Mining


The Analytics Process – CRISP-DM (Cross Industry Standard Process for Data Mining)

https://itsalocke.com/crisp-dm/

CRISP-DM is an Iterative Process


The Analytics Process – CRISP-DM (Cross Industry Standard Process for Data Mining)

Business Understanding
• Understand the business goal
• Assess the situation
• Translate the business goal into a data mining objective
• Develop a project plan

Data Understanding
• Consider data requirements
• Collect and explore the data
• Determine quality of the data


The Analytics Process – CRISP-DM (cont’d) (Cross Industry Standard Process for Data Mining)

Data Preparation
• Select needed data
• Data acquisition
• Data integration and formatting
• Data cleaning
• Data transformation and enrichment

Modeling
• Selection of appropriate modeling technique
• Splitting of the dataset into training and testing subsets for evaluation purposes
• Development and examination of alternative modeling algorithms and parameter settings
• Fine tuning of the model settings according to an initial assessment of the model’s performance
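The train/test split named in the Modeling step can be sketched in a few lines. This is a minimal illustration only: the claim rows, the 30% test fraction, the seed, and the helper name `train_test_split` are all assumptions made for the example, not part of the presentation.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A fixed seed makes the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical claim rows, invented for illustration.
claims = [{"claim_id": i, "paid": 25 + i} for i in range(10)]
train, test = train_test_split(claims)
print(len(train), "training rows;", len(test), "testing rows")
```

The model is fitted on the training rows only; the testing rows are held back so the evaluation step measures performance on data the model has not seen.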


The Analytics Process – CRISP-DM (cont’d) (Cross Industry Standard Process for Data Mining)

Model Evaluation
• Evaluation of the model in the context of the business success criteria
• Model approval

Deployment
• Create a report of findings
• Planning and development of the deployment procedure
• Deployment of the model
• Distribution of the model results and integration in the organization's operational system
• Development of a maintenance / update plan
• Review of the project
• Planning the next steps


Why Use CRISP-DM? (Cross Industry Standard Process for Data Mining)

http://www.kdnuggets.com/2014/10/crisp-dm-top-methodology-analytics-data-mining-data-science-projects.html

CRISP-DM is the most widely used methodology for analytics projects. It has been around since 1996, when it was conceived by an industry consortium that included SPSS. SPSS is now part of IBM, and CRISP-DM is supported in IBM's SPSS Modeler software.

The best reason to use a methodology is for standardization of approach.


How to Minimize Fraud Using Data

An Example Using Made-Up Data


The Analytics Process – CRISP-DM (Cross Industry Standard Process for Data Mining)

Business Understanding
• Understand the business goal
• Assess the situation
• Translate the business goal into a data mining objective
• Develop a project plan

Steps
• Anomalous Providers
• Chiropractors for 3 months
• Determine which fields will address the goal
• Write down all steps needed to complete task


The Analytics Process – CRISP-DM (Cross Industry Standard Process for Data Mining)

Data Understanding
• Consider data requirements
• Collect and explore the data
• Determine quality of the data

Steps
• Where is the data, what is needed?
• Collect and summarize the data
• Determine quality of the data


The Analytics Process – CRISP-DM (cont’d) (Cross Industry Standard Process for Data Mining)

Data Preparation
• Select needed data
• Data acquisition
• Data integration and formatting
• Data cleaning
• Data transformation and enrichment

Steps
• Chiropractor Claims
• System – SQL Server
• One data set
• Ensure data is complete/accurate
• Pull only variables that are needed


Project Plan – Chiropractor Claims

1. Determine if there are any questionable billings from Providers
2. Get a sample of the data to determine the quality/quantity of the data, as well as field names. (Chiropractor claims for X time frame)
3. Summarize the data based on procedure codes by payment and count
4. Speak with a Chiropractor or do research into procedures they can perform
5. Determine how many procedure codes to use in your analysis
6. Request data from IT or pull it yourself.
7. Summarize data, again, based on procedure codes by payment and count


Criteria Are Very Important

• Do you want:
  • To do the analysis once?
  • To provide bad results?
  • Your work to be valued and used?
  • To explain why it is incorrect?
• Understanding the business problem and data wrangling (cleaning and transforming the data) are 80% of the work
  • Repeating this step is very inefficient
  • Once you have a process, continue to use it. Refine when necessary.


Warranty Information - Criteria

• Global/Specific Region
• Local or US currency
• Dealer/City/State/Country
• Labor codes
• Final payment flag
• Category of Warranty
• Total/labor/part payment
• Vehicle Production information
• Dealer information
• Payment schedules
• Model Year/Calendar Year
• Model
• Engine RPO
• Transmission RPO

With this many variables, it is wise to understand exactly what the requestor wants. This process is iterative, but should not be repeated over and over from the beginning. Get the correct data the first time!!


Chiropractic Medicine – Top 14 Procedure Codes – 3 Months of Data

Code   Claim Count         Paid  Description
98941    2,281,358   78,064,877  Chiropractic manipulative treatment (CMT); spinal, three to four regions
98942      462,381   20,663,399  Chiropractic manipulative treatment (CMT); spinal, five regions
98940      591,984   15,872,048  Chiropractic manipulative treatment (CMT); spinal, one to two regions
97012    1,219,642   21,541,061  Traction, mechanical
72100       64,904    2,579,048  Radiologic examination, spine, lumbosacral; two or three views
72070       53,622    2,071,041  Radiologic examination, spine; thoracic, two views
99203       30,399    2,033,939  Office or other outpatient visit for the evaluation and management of a new patient
72050       33,795    1,821,410  Radiologic examination, spine, cervical; minimum of four views
72040       48,493    1,773,110  Radiologic examination, spine, cervical; two or three views
72010       22,552    1,491,338  Radiologic examination, spine, entire, survey study, anteroposterior and lateral
99202       30,809    1,306,167  Office or other outpatient visit for the evaluation and management of a new patient
72170       33,201    1,032,780  Radiologic examination, pelvis; one or two views
72110       12,319      675,283  Radiologic examination, spine, lumbosacral; minimum four views
99213       17,432      639,408  Office or other outpatient visit for the evaluation and management of an established patient

Total of above procedure codes             151,564,909
Total of all chiropractic procedure codes  153,626,380

Slide callouts: 75%, 14%
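As a quick check on the table, the share of total payments can be computed for any group of codes. The sketch below uses the Paid figures from the slide. Reading the 75% and 14% callouts as the three CMT codes and mechanical traction, respectively, is an assumption based on the arithmetic, not something the slide states.

```python
# Paid amounts copied from the slide table (3 months of made-up data).
paid_by_code = {
    "98941": 78_064_877,
    "98942": 20_663_399,
    "98940": 15_872_048,
    "97012": 21_541_061,
}
total_all_codes = 153_626_380  # "Total of all chiropractic procedure codes"

# Share of all payments going to the three manipulation (CMT) codes.
cmt_paid = sum(paid_by_code[code] for code in ("98940", "98941", "98942"))
cmt_share = cmt_paid / total_all_codes

# Share of all payments going to mechanical traction.
traction_share = paid_by_code["97012"] / total_all_codes

print(f"CMT codes: {cmt_share:.1%} of payments")
print(f"Traction:  {traction_share:.1%} of payments")
```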


Manipulations

There are five regions of the spine:
1. Cervical
2. Thoracic
3. Lumbar
4. Sacral
5. Coccygeal

Procedure codes:
98940 – one to two manipulations
98941 – three to four manipulations
98942 – five manipulations


Providers with High Percentages of 98942

Provider          Total      Total       98942      %       98941      %       98940      %
Code          Prov Pymt      Count   Prov Pymt  98942   Prov Pymt  98941   Prov Pymt  98940
All Claims  114,600,324  3,335,723  20,663,399    18%  78,064,877    68%  15,872,048    14%
J               424,710      9,750     405,937    95%      17,824     5%         949     0%
E               305,118      6,305     300,941    99%       2,345     1%       1,832     0%
F               280,632      6,181     253,826    91%      19,343     9%       7,464     0%
G               255,393      5,268     236,332    93%      14,265     7%       4,796     0%
H               245,337      5,684     229,482    93%      14,611     7%       1,243     0%
C               239,414      5,714     208,748    86%      25,928    14%       4,738     0%
I               213,776      5,212     202,576    94%      11,200     6%           -     0%
A               226,838      5,080     194,337    84%      26,907    16%       5,594     0%
K               216,626      4,803     191,203    88%      20,330    12%       5,093     0%
B               136,710      2,651     135,502    99%         899     1%         309     0%
D               130,255      2,725     130,221   100%          34     0%           -     0%
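One way to surface providers like these is to compare each provider's 98942 share of payments with the share across all claims. The sketch below uses four rows from the table. The 4x multiple is a hypothetical threshold chosen for illustration, and shares computed from the payment columns can differ by a point or so from the rounded percentages printed on the slide.

```python
# (total paid for the three CMT codes, paid for 98942), copied from the table.
providers = {
    "J": (424_710, 405_937),
    "E": (305_118, 300_941),
    "A": (226_838, 194_337),
    "D": (130_255, 130_221),
}
all_claims_total, all_claims_98942 = 114_600_324, 20_663_399
norm_share = all_claims_98942 / all_claims_total

# Hypothetical rule: flag a provider whose 98942 share is at least 4x the norm.
FLAG_MULTIPLE = 4

def share_98942(total_paid, paid_98942):
    """Fraction of a provider's CMT payments billed under code 98942."""
    return paid_98942 / total_paid

flagged = sorted(
    (code for code, amounts in providers.items()
     if share_98942(*amounts) >= FLAG_MULTIPLE * norm_share),
    key=lambda code: share_98942(*providers[code]),
    reverse=True,
)

print(f"Norm across all claims: {norm_share:.0%}")
for code in flagged:
    print(f"Provider {code}: {share_98942(*providers[code]):.1%} billed as 98942")
```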


What Else Can You Do With The Data?

• Summarize by Provider and sort payment in descending order – do the results make sense? Are there outliers?
• How do you determine outliers? Easiest way is Boxplots.
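A boxplot conventionally draws its whiskers at 1.5 times the interquartile range (IQR) beyond the quartiles, and points outside the whiskers are drawn as outliers. The same rule can be applied without a chart. In this sketch the payment totals are invented for illustration, and `boxplot_outliers` is a hypothetical helper name.

```python
import statistics

def boxplot_outliers(values, whisker=1.5):
    """Return the values that fall outside the boxplot whiskers (Tukey's rule)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical total payments per provider for the period.
payments = [118_000, 121_500, 125_000, 130_255, 136_710, 140_200, 144_900, 424_710]
print(boxplot_outliers(payments))
```

An outlier found this way is a starting point for review, not evidence of fraud in itself.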


What Else Can You Do With The Data? (Cont’d)

• Determine how much Providers have worked in the 3 month period.
  • Have any providers worked Sundays/Holidays?
  • Every day? (see next slide)
  • What is the average number of days worked, and have some Providers worked many more?
• Summarize by Provider and procedure code.
  • Do some Providers bill certain procedure codes much more often than others?
• Summarize by Provider and Patient counts per day (see next slide)
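The summaries above can be sketched directly from claim rows. The row layout (provider, service date, patient) and the sample data are assumptions made for this example; real claim extracts will have different field names, and holidays would need a separate calendar.

```python
from collections import defaultdict
from datetime import date

def summarize_workdays(claims):
    """Return {provider: (days worked, Sundays worked, max distinct patients in a day)}."""
    patients_per_day = defaultdict(set)
    for provider, day, patient in claims:
        patients_per_day[(provider, day)].add(patient)

    counts_by_provider = defaultdict(dict)
    for (provider, day), patients in patients_per_day.items():
        counts_by_provider[provider][day] = len(patients)

    summary = {}
    for provider, counts in counts_by_provider.items():
        sundays = sum(1 for day in counts if day.weekday() == 6)  # Monday is 0
        summary[provider] = (len(counts), sundays, max(counts.values()))
    return summary

# Hypothetical claim rows: (provider, service date, patient id).
claims = [
    ("P1", date(2017, 1, 2), "a"),
    ("P1", date(2017, 1, 3), "b"),
    ("P2", date(2017, 1, 1), "c"),  # a Sunday
    ("P2", date(2017, 1, 1), "d"),
    ("P2", date(2017, 1, 2), "c"),
    ("P2", date(2017, 1, 8), "e"),  # a Sunday
]
for provider, (days, sundays, busiest) in sorted(summarize_workdays(claims).items()):
    print(provider, "days:", days, "Sundays:", sundays, "max patients/day:", busiest)
```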


Provider P2


Rule-Based Data Mining


Rule-Based Data Mining

• Allergy Testing and Immunotherapy
  • Billing can be confusing – this procedure code is quantity processed, based on dosages, not on cc’s in a vial and not on number of vials.
  • Percutaneous testing is done first and then, for any inconclusive results, intracutaneous testing is done.
  • Specialty – Allergists generally perform 7 times more percutaneous than intracutaneous tests. Other specialties perform 0 to many percutaneous to intracutaneous tests.
    • Percutaneous tests are always done first. Other specialties are not following procedure.
  • Providers often bill based on cc’s. This leads to huge overpayments. Almost 90% of dosages should be between 1 and 20. Amounts above this level may be worth looking at.
  • Evaluation and Management services should not be reported simultaneously with allergy/immunotherapy injections unless a separate service was provided.
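Two of the rules above translate into simple filters: dosage quantities above 20, and providers who bill intracutaneous tests with no percutaneous tests at all. The sketch below is only an illustration. The claim-line layout, the service labels, and the sample data are invented, and a flagged line is a candidate for review rather than a finding.

```python
DOSAGE_REVIEW_LIMIT = 20  # from the slide: almost 90% of dosages fall between 1 and 20

# Hypothetical claim lines, invented for illustration.
claim_lines = [
    {"provider": "A1", "service": "percutaneous", "quantity": 70},
    {"provider": "A1", "service": "intracutaneous", "quantity": 10},
    {"provider": "A1", "service": "immunotherapy", "quantity": 12},
    {"provider": "B2", "service": "intracutaneous", "quantity": 40},
    {"provider": "B2", "service": "immunotherapy", "quantity": 150},
]

def high_dosage_lines(lines, limit=DOSAGE_REVIEW_LIMIT):
    """Immunotherapy lines whose billed quantity exceeds the review limit."""
    return [line for line in lines
            if line["service"] == "immunotherapy" and line["quantity"] > limit]

def intracutaneous_without_percutaneous(lines):
    """Providers billing intracutaneous tests but no percutaneous tests."""
    totals = {}
    for line in lines:
        per_provider = totals.setdefault(line["provider"], {})
        per_provider[line["service"]] = per_provider.get(line["service"], 0) + line["quantity"]
    return sorted(provider for provider, t in totals.items()
                  if t.get("intracutaneous", 0) > 0 and t.get("percutaneous", 0) == 0)

print([line["provider"] for line in high_dosage_lines(claim_lines)])
print(intracutaneous_without_percutaneous(claim_lines))
```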


Rule-Based Data Mining (cont’d)

• Echocardiography
  • 93307 - Echocardiography, transthoracic, real-time with image documentation (2D), includes M-mode recording, when performed, complete, without spectral or color Doppler echocardiography
  • 93320 - Doppler echocardiography, pulsed wave and/or continuous wave with spectral display (List separately in addition to codes for echocardiographic imaging); complete
  • 93325 - Doppler echocardiography color flow velocity mapping (List separately in addition to codes for echocardiography)
• Medically used to evaluate the possible diseases of the aorta, shunts, septal defects, and to determine the severity of any meaningful heart valve narrowing (stenosis) or regurgitation (leaking backward) or of evaluations of prosthetic valves.

ACC/AHA Guidelines for the Clinical Application of Echocardiography. Circulation. 1997;95:1686-1744


Ideas for Data Mining

• Ask an expert what odd things they have seen and pull data for the expert to review
  • Next steps would be to discuss the data and if further analysis is needed
• Prior audits with bad controls can be good starting points
• Research what types of fraud are occurring in healthcare
• Pharmacy
  • Providers prescribing medications that do not make sense with their specialty (Podiatrist prescribing heart medication)
  • Quantity of medication prescribed in total and by patient
  • Pharmacist with highest fulfillment of controlled substances


Ideas for Financial Fraud

• Vendor Fraud
  • Audit vendor list and remove any vendors that are no longer providing services and verify remaining vendors on the list
  • Review newest vendors – last 5 years
  • Review recurring payments, especially direct deposits
  • Review employees' banking information against vendor banking information
  • Analyze data for any patterns
    • Analyze further for any suspicious transactions
  • Verify currency rates are calculated correctly
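Comparing employee banking information with vendor banking information amounts to a join on the account number. A minimal sketch follows, assuming accounts are stored as strings that may differ in spacing or punctuation. The IDs, the account numbers, and the helper names are all invented for illustration.

```python
def normalize(account):
    """Strip spaces and punctuation so differently formatted accounts compare equal."""
    return "".join(ch for ch in account if ch.isalnum()).upper()

def shared_bank_accounts(employee_accounts, vendor_accounts):
    """Return sorted (vendor_id, employee_id) pairs that share a bank account."""
    employees_by_account = {}
    for employee_id, account in employee_accounts.items():
        employees_by_account.setdefault(normalize(account), []).append(employee_id)

    matches = []
    for vendor_id, account in vendor_accounts.items():
        for employee_id in employees_by_account.get(normalize(account), []):
            matches.append((vendor_id, employee_id))
    return sorted(matches)

# Hypothetical data: routing and account numbers are made up.
employee_accounts = {"E100": "0210-0002 1234567", "E200": "0210-0002 7654321"}
vendor_accounts = {"V1": "021000021234567", "V2": "031100009999999"}
print(shared_bank_accounts(employee_accounts, vendor_accounts))
```

A match is a lead to investigate; there can be legitimate explanations, such as an employee who is also a registered contractor.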


It was three feet deep on average.


Visualization – Make Your Results Shine


10 GOLDEN RULES WILL HELP YOU CREATE THE MOST SUCCESSFUL DATA VISUALIZATIONS

1. Begin with a goal. Starting with a goal provides the foundation to bring together the ingredients of data visualization with a purpose, such as prompting a decision or action, or inviting an audience to explore the data to find new insights.

2. Know your data. This knowledge will also serve to verify that you have the best data to support your goal.

3. Put your audience first. Data visualization is rarely one size fits all, and its message can be lost if it’s not customized for its audience. What does your audience need to know?

http://www.dbta.com/BigDataQuarterly/Articles/10-Golden-Rules-of-Data-Visualization-114796.aspx


10 GOLDEN RULES WILL HELP YOU CREATE THE MOST SUCCESSFUL DATA VISUALIZATIONS

4. Be media sensitive. iPad, Notebook, Cell Phone, etc. Considering how the visualization will be viewed will help you make sure your visualization reaches its audience.

5. Choose the right chart. Know the strengths of each chart type and what key features of data they best visualize.

6. Chart smart. Data visualizations should not distort, mislead, or misrepresent. Avoid cherry picking data and do not force the data to fit a message.

http://www.dbta.com/BigDataQuarterly/Articles/10-Golden-Rules-of-Data-Visualization-114796.aspx

[Figure: 750,000 vs 700,000, charted two ways: distorted and not distorted]
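One common source of this kind of distortion is a value axis that does not start at zero. The slide does not say how its distorted chart was drawn, so the truncated axis and the 690,000 starting point below are assumptions made for illustration. The sketch compares how tall two bars are drawn relative to each other.

```python
def drawn_height_ratio(a, b, axis_start=0):
    """Ratio of the drawn bar heights when the value axis starts at axis_start."""
    if axis_start >= min(a, b):
        raise ValueError("axis_start must be below both values")
    return (a - axis_start) / (b - axis_start)

a, b = 750_000, 700_000
full_axis = drawn_height_ratio(a, b)                  # axis starts at zero
truncated = drawn_height_ratio(a, b, axis_start=690_000)
print(f"Full axis: first bar is {full_axis:.2f}x the second")
print(f"Truncated axis: first bar is {truncated:.2f}x the second")
```

With the full axis the first bar is drawn about 1.07 times as tall as the second, which matches the data. With the axis starting at 690,000 it is drawn 6 times as tall, although the underlying difference is about 7%.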


10 GOLDEN RULES WILL HELP YOU CREATE THE MOST SUCCESSFUL DATA VISUALIZATIONS

7. Use labels wisely. Give your audience context by including a simple and compelling title and easy-to-read labels.

8. Design to the point. The key to designing data visualizations is to be straightforward. Ultimately, make sure everything on the visualization serves a purpose.

9. Let the data speak. Use visual cues strategically to guide the audience and draw their attention, but let the data tell the story, not the design.

10. Feedback is a good thing. Take time to fine-tune visualizations by engaging with stakeholders to gather feedback.

http://www.dbta.com/BigDataQuarterly/Articles/10-Golden-Rules-of-Data-Visualization-114796.aspx


How do you know which type of data viz works best?
Consider the following guidelines:

• Line Charts track changes or trends over time and show the relationship between two or more variables.
• Bar Charts are used to compare quantities of different categories.
• Scatter Plots show joint variation of two data items.
• Bubble Charts show joint variation of three data items.
• Pie Charts are used to compare parts of a whole and should be used carefully.
  • Never compare two pie charts without clearly noting that the size of the pie may have changed as well.

https://www.gooddata.com/blog/5-data-visualization-best-practices


https://img.labnol.org/di/data-chart-type.png?_ga=1.47105312.600219594.1492990363


Do’s and Don’ts of Charts

Do’s
• Use the full axis
• Simplify less important info
• Be creative with legends/labels
• Pass the squint test
• Ask the opinion of others

Don’ts
• Use 3-D or blow apart effects
• Use more than 6 colors
• Change the style of charts that are being compared
• Make users do visual math
• Overload the chart


Some Really Bad Visuals


http://www.businessinsider.com/the-27-worst-charts-of-all-time-2013-6#did-anyone-learn-anything-by-looking-at-this-pseudo-pie-chart-what-do-these-colors-even-mean-why-is-it-divided-into-quadrants-well-never-know-1


http://www.businessinsider.com/the-27-worst-charts-of-all-time-2013-6#i-never-thought-it-was-possible-but-i-actually-understand-soccer-less-after-looking-at-this-chart-3


http://www.businessinsider.com/the-27-worst-charts-of-all-time-2013-6#theres-a-lot-going-on-with-this-bloomberg-chart-that-doesnt-seem-like-an-evenly-cut-lamb-chop-and-while-im-not-a-biologist-i-have-a-strong-feeling-an-onion-is-not-a-melon-4


What Do You Think of This Visual?


Question Everything!!!

• Data can be used to show whatever a person wants to show
  • Make sure it is fair and unbiased
  • The method and assumptions should be stated.
  • Ask to see the data, if necessary


Providers with High Percentages of 98942

Provider          Total      Total       98942      %       98941      %       98940      %
Code          Prov Pymt      Count   Prov Pymt  98942   Prov Pymt  98941   Prov Pymt  98940
All Claims  114,600,324  3,335,723  20,663,399    18%  78,064,877    68%  15,872,048    14%
J               424,710      9,750     405,937    95%      17,824     5%         949     0%
E               305,118      6,305     300,941    99%       2,345     1%       1,832     0%
F               280,632      6,181     253,826    91%      19,343     9%       7,464     0%
G               255,393      5,268     236,332    93%      14,265     7%       4,796     0%
H               245,337      5,684     229,482    93%      14,611     7%       1,243     0%
C               239,414      5,714     208,748    86%      25,928    14%       4,738     0%
I               213,776      5,212     202,576    94%      11,200     6%           -     0%
A               226,838      5,080     194,337    84%      26,907    16%       5,594     0%
K               216,626      4,803     191,203    88%      20,330    12%       5,093     0%
B               136,710      2,651     135,502    99%         899     1%         309     0%
D               130,255      2,725     130,221   100%          34     0%           -     0%


Four Providers Against the Norm

[Chart: payment share by procedure code. Labels shown: 68%, 14%, 18%, 100%]


Which one is easier/quicker to understand?


Some Really Good Visuals


[Chart: counts by year]

Pinpoints problem and is easy to understand

Normalizes data for comparison


Many pages of documentation led to the final results. Management does not want to see the documentation, just the results.

This one-pager describes an investigation of 12 officers regarding 10 contracts worth almost $1B.

The costs are broken out by various categories that are informative at a high level.

The highest costs are at the top of the page.

The lowest costs are at the bottom.

The currency is shown in $ and N (naira). Should be consistent.

Currency rate: 1N = $0.0033
N22.5bn = $73.4m

http://www.orodataviz.com/project/arms-procurement-fraud/


https://www.slideshare.net/Linkurious/kick-start-graph-visualization-projects


https://public.tableau.com/en-us/s/gallery/world-golf-rankings


Predictive or Advanced Analytics


The Internet of Things (IoT)

• The Internet of Things (IoT) is the inter-networking of physical devices, vehicles, buildings, and other items that are embedded with electronics, software, sensors, actuators, and network connectivity that enable these objects to collect and exchange data.

• In short, connected devices can talk to each other. This is used in autonomous vehicles, among other things.


Q. Why Has Predictive Analytics Become Popular?
A. The explosion of big data and the lowering cost of storing data

• What is big data?
  • 100 million rows of a relational database?
  • 100,000 rows of 10 connected relational databases?
  • All financial statement information/numbers?
• What specifically makes big data?
  • Words, pictures
• Examples of big data
  • Internet clickstream data
  • Web server logs
  • Social media content
  • Text from customer emails and survey responses
  • Mobile-phone call/text-detail records
  • Machine data captured by sensors connected to the internet of things.


http://www.bangkokbiznews.com/blog/detail/636201

The 4 V’s of Big Data


http://www.abajournal.com/magazine/article/the_dawn_of_big_data

The need for large data storage has grown due to unstructured data. The cost of storage has been coming down. In 1960, hard drive storage cost almost $1 million per gigabyte. Now, it is under a dime.

In November 2015, a petabyte (1 trillion kilobytes) of storage cost $48,800.
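The petabyte figure can be converted to a per-gigabyte cost to confirm the "under a dime" statement. This worked example assumes decimal units, where a petabyte is 10^15 bytes and a gigabyte is 10^9 bytes.

```python
PETABYTE_COST_USD = 48_800  # November 2015 figure quoted on the slide

BYTES_PER_PETABYTE = 10**15
GIGABYTES_PER_PETABYTE = BYTES_PER_PETABYTE // 10**9   # 1,000,000
KILOBYTES_PER_PETABYTE = BYTES_PER_PETABYTE // 10**3   # 1 trillion

cost_per_gb = PETABYTE_COST_USD / GIGABYTES_PER_PETABYTE
print(f"${cost_per_gb:.4f} per gigabyte")
```

At $48,800 per petabyte the cost works out to about 4.9 cents per gigabyte.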

http://www.mkomo.com/cost-per-gigabyte-update


Supervised vs Unsupervised Learning

• Supervised Learning
  • Output answer is provided
    • Fraud / No Fraud
    • Yes / No
    • A value is provided, e.g., house value
  • Classification/Neural Nets/Decision Trees/Linear and Logistic Regression/Others

• Unsupervised Learning
  • No output answer is provided
    • Looks for outliers
    • Marketing profiles
  • Clustering/K-means/Gaussian Mixture Models
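The difference between the two approaches can be sketched in a few lines of plain Python. The supervised half uses a nearest-neighbor classifier and the unsupervised half uses a one-dimensional k-means; both are stand-ins chosen for brevity, and all data values, labels, and function names are made up for illustration.

```python
def nearest_neighbor_predict(train_rows, train_labels, new_row):
    """Supervised: return the label of the closest labeled example."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(train_rows)),
               key=lambda i: squared_distance(train_rows[i], new_row))
    return train_labels[best]


def k_means_1d(values, k=2, iterations=10):
    """Unsupervised: group values into k clusters (k >= 2) without labels."""
    ordered = sorted(values)
    # Start the centers at evenly spaced positions in the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assign each value to its nearest center.
        assignments = [min(range(k), key=lambda c: abs(v - centers[c]))
                       for v in values]
        # Move each center to the mean of the values assigned to it.
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assignments, centers


# Supervised: the answer (Fraud / No Fraud) is provided for each training row.
# Each row is (transaction amount, hours since previous transaction).
train_rows = [(20, 24), (35, 30), (15, 48), (900, 1), (1200, 2)]
train_labels = ["No Fraud", "No Fraud", "No Fraud", "Fraud", "Fraud"]
predicted = nearest_neighbor_predict(train_rows, train_labels, (1000, 1))
print("Supervised prediction:", predicted)

# Unsupervised: only the amounts are given; the algorithm finds the groups.
amounts = [20, 35, 15, 900, 1200, 25]
assignments, centers = k_means_1d(amounts, k=2)
print("Cluster assignments:", assignments)
print("Cluster centers:", centers)
```

In the supervised case the labels drive the prediction; in the unsupervised case the grouping emerges from the data alone and still needs a person to interpret what each cluster means.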


Supervised Learning Algorithms

http://www.dataschool.io/comparing-supervised-learning-algorithms/


Supervised Learning Algorithms (cont’d)

http://www.dataschool.io/comparing-supervised-learning-algorithms/


http://scikit-learn.org/stable/tutorial/machine_learning_map/

[Figure: scikit-learn machine learning map, with regions labeled Supervised Learning and Unsupervised Learning]


Four types of predictive analytic models are being used to detect fraud: rules-based, anomaly, predictive, and social networking.

1. Rules-based models flag certain charges automatically.

2. Anomaly models raise suspicion based on factors that seem improbable.

3. Predictive models compare charges against a fraud profile and raise suspicion.

4. Social networking models raise suspicion based on the associations of a provider.

Examples:

1. A podiatrist prescribing heart medication.

2. A provider who billed more procedures than could possibly be performed in a day.

3. A provider billing in a fashion similar to previously known fraudsters.

4. A provider who worked with previously known fraudulent providers.

http://www.modernhealthcare.com/article/20150225/NEWS/150229947
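The four model types can be sketched as four simple checks over provider records. This is a toy illustration, not how production systems work: every field name, threshold, and record below is hypothetical, and the "predictive" check is a fixed similarity threshold standing in for a trained model.

```python
# Hypothetical provider records; every field name and value is made up.
providers = [
    {"id": "P1", "specialty": "podiatry", "drug_class": "cardiac",
     "procedures_per_day": 12, "avg_charge": 150, "associates": []},
    {"id": "P2", "specialty": "cardiology", "drug_class": "cardiac",
     "procedures_per_day": 95, "avg_charge": 180, "associates": []},
    {"id": "P3", "specialty": "family", "drug_class": "antibiotic",
     "procedures_per_day": 15, "avg_charge": 905, "associates": []},
    {"id": "P4", "specialty": "family", "drug_class": "antibiotic",
     "procedures_per_day": 14, "avg_charge": 140, "associates": ["P9"]},
]
KNOWN_FRAUDSTERS = {"P9"}        # assumed list of previously caught providers
FRAUD_PROFILE_AVG_CHARGE = 900   # assumed billing profile of past fraudsters
MAX_PLAUSIBLE_PROCEDURES = 40    # assumed daily limit


def rules_based(p):
    # 1. Flag a specialty/drug combination that should not occur.
    return p["specialty"] == "podiatry" and p["drug_class"] == "cardiac"


def anomaly(p):
    # 2. Flag volumes that seem improbable.
    return p["procedures_per_day"] > MAX_PLAUSIBLE_PROCEDURES


def predictive(p):
    # 3. Flag billing within 5% of the assumed fraud profile.
    tolerance = 0.05 * FRAUD_PROFILE_AVG_CHARGE
    return abs(p["avg_charge"] - FRAUD_PROFILE_AVG_CHARGE) <= tolerance


def social_networking(p):
    # 4. Flag direct associations with known fraudulent providers.
    return any(a in KNOWN_FRAUDSTERS for a in p["associates"])


CHECKS = [("rules-based", rules_based), ("anomaly", anomaly),
          ("predictive", predictive), ("social networking", social_networking)]

# Map each provider id to the names of the checks it triggered.
flags = {p["id"]: [name for name, check in CHECKS if check(p)]
         for p in providers}
for provider_id, raised in flags.items():
    print(provider_id, raised)
```

Each toy provider is built to trip exactly one check, mirroring the four examples on the slide.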


Credit Card Fraud - $190B/year

Location:
• Live in one place, but make a purchase in another

What you buy:
• If your card is commonly used to buy your morning cup of coffee and then a tank of gas, and out of the blue is used to buy a pair of expensive designer shoes

Spending amount:
• If you typically spend $500/month, and suddenly rack up $3000 in a week

Spending frequency:
• If your card is used to make a large number of purchases over a short period of time

Large purchase after a smaller one:
• Thieves typically test stolen credit cards with smaller purchases first; if the card works, they will proceed to make another larger purchase, like an expensive camera, or television, or sound system

Digital origins:
• The digital origins of purchases are recorded and analyzed with each purchase; if an IP address had been used to commit fraud in the past, subsequent transactions from the same IP or network may be flagged

https://thinksaveretire.com/2015/09/14/how-credit-card-fraud-detection-works/
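The signals listed above can be sketched as simple checks over a cardholder's purchase history. This is a minimal illustration only: the profile, thresholds, field names, and IP addresses are all assumptions, and real card issuers combine far more data than this.

```python
from datetime import datetime, timedelta

# Hypothetical cardholder profile and thresholds; every value is an assumption.
PROFILE = {"home_city": "Detroit", "usual_categories": {"coffee", "gas"},
           "typical_monthly_spend": 500}
FLAGGED_IPS = {"203.0.113.7"}     # documentation-range address, illustration only
SMALL_TEST_AMOUNT = 5             # a "test" purchase is at most this much
LARGE_FOLLOW_UP_AMOUNT = 500      # a "large" purchase is at least this much
MAX_PURCHASES_PER_HOUR = 3


def fraud_signals(transactions):
    """Return the names of the signals raised by a time-ordered purchase list."""
    signals = set()
    for i, current in enumerate(transactions):
        history = transactions[:i + 1]  # purchases up to and including this one
        if current["city"] != PROFILE["home_city"]:
            signals.add("location")
        if current["category"] not in PROFILE["usual_categories"]:
            signals.add("what you buy")
        if current["ip"] in FLAGGED_IPS:
            signals.add("digital origins")
        last_hour = [t for t in history
                     if current["time"] - t["time"] <= timedelta(hours=1)]
        if len(last_hour) > MAX_PURCHASES_PER_HOUR:
            signals.add("spending frequency")
        last_week = [t for t in history
                     if current["time"] - t["time"] <= timedelta(days=7)]
        if sum(t["amount"] for t in last_week) > PROFILE["typical_monthly_spend"]:
            signals.add("spending amount")
        if (i > 0 and transactions[i - 1]["amount"] <= SMALL_TEST_AMOUNT
                and current["amount"] >= LARGE_FOLLOW_UP_AMOUNT):
            signals.add("large purchase after a smaller one")
    return signals


def purchase(amount, city, category, minutes_after_start, ip="198.51.100.1"):
    """Build one toy transaction record."""
    start = datetime(2017, 4, 1, 8, 0)
    return {"amount": amount, "city": city, "category": category,
            "time": start + timedelta(minutes=minutes_after_start), "ip": ip}


normal = [purchase(4, "Detroit", "coffee", 0),
          purchase(40, "Detroit", "gas", 120)]
suspicious = [purchase(2, "Miami", "coffee", 0, ip="203.0.113.7"),
              purchase(1200, "Miami", "electronics", 10, ip="203.0.113.7")]

print(sorted(fraud_signals(normal)))
print(sorted(fraud_signals(suspicious)))
```

The everyday coffee-and-gas pattern raises nothing, while the small test purchase followed by an expensive out-of-town purchase from a flagged IP raises several signals at once.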


Neural Network – Thinks Like a Human

A computer system modeled on the human brain and nervous system

• Considered the black box of predictive analytics as you cannot explain how the algorithm came up with the result

[Figure: inputs -> Black Box -> outputs]

http://www.ijsce.org/attachments/File/NCAI2011/IJSCE_NCAI2011_025.pdf
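To make the inputs-to-outputs picture concrete, here is a minimal sketch of a single forward pass through a tiny network with one hidden layer. The weights are arbitrary numbers chosen for illustration (a real network learns them from data), and the input names are hypothetical.

```python
import math


def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: inputs -> hidden layer -> single output in (0, 1)."""
    hidden = [sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
              for weights, bias in zip(hidden_weights, hidden_biases)]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden))
                   + output_bias)


# Arbitrary illustrative weights; a trained network would learn these.
hidden_weights = [[0.8, -1.2], [-0.5, 0.9], [1.5, 0.3]]
hidden_biases = [0.1, -0.4, 0.2]
output_weights = [1.1, -0.7, 0.6]
output_bias = -0.3

# Hypothetical scaled inputs, e.g. (amount, time since last purchase).
score = forward([0.9, 0.1], hidden_weights, hidden_biases,
                output_weights, output_bias)
print(f"Output score: {score:.3f}")
```

Even in this three-neuron example, no single weight explains the score on its own, which is the sense in which the slide calls a neural network a black box.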


Cluster Analysis

Unsupervised Learning:

Clusters 1, 2, and 5 are considered outliers and may be fraudulent. Additional research should be performed on these data points.

https://pdfs.semanticscholar.org/c42e/861472fa629c37cf7ba0d329d168f0e2f890.pdf
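One common way to decide which clusters count as outliers is to flag the clusters that contain very few points. The sketch below assumes that criterion; the slide does not state how its outlier clusters were chosen, and the cluster sizes and the 5% threshold are invented for illustration.

```python
from collections import Counter


def small_clusters(cluster_labels, max_share=0.05):
    """Return the ids of clusters holding at most max_share of all points."""
    counts = Counter(cluster_labels)
    total = len(cluster_labels)
    return sorted(c for c, n in counts.items() if n / total <= max_share)


# Made-up cluster assignments: two large clusters and three tiny ones.
labels = [3] * 60 + [4] * 50 + [1] * 2 + [2] * 3 + [5] * 1

outlier_clusters = small_clusters(labels)
print("Clusters to research further:", outlier_clusters)

# Pull out the individual data points (by position) that need follow-up.
points_to_review = [i for i, c in enumerate(labels) if c in outlier_clusters]
print("Number of points to review:", len(points_to_review))
```

A small cluster is only a lead, not a finding; as the slide notes, the flagged points still need additional research.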


Investigations – Text Analytics

How did the FBI analyze 600,000 emails in just a few days?

• More efficient discovery through automated tools

• Search for various keywords and terms related to classified information and flag these emails for further review.

• Remove materials based on keywords in email From: fields - any emails sent from Netflix, Amazon, or eBay

• Remove duplicates using signatures or hash-representations of files

• Manual review of flagged emails
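The steps above can be sketched as a small triage function. This is an illustrative sketch, not the FBI's actual tooling: the keyword list, the email record layout, and the choice of SHA-256 over the message body as the duplicate signature are all assumptions.

```python
import hashlib
import re

KEYWORDS = {"classified", "secret"}                # assumed search terms
EXCLUDED_SENDERS = ("netflix", "amazon", "ebay")   # senders named on the slide


def triage(emails):
    """Return the emails that should go to manual review."""
    seen_hashes = set()
    flagged = []
    for email in emails:
        # Remove material based on the From: field.
        sender = email["from"].lower()
        if any(name in sender for name in EXCLUDED_SENDERS):
            continue
        # Remove duplicates using a hash of the message body.
        digest = hashlib.sha256(email["body"].encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        # Flag emails containing any of the keywords.
        words = set(re.findall(r"[a-z]+", email["body"].lower()))
        if words & KEYWORDS:
            flagged.append(email)
    return flagged


# Made-up messages for illustration.
emails = [
    {"from": "offers@netflix.com", "body": "New shows this week"},
    {"from": "a@example.com", "body": "This document is CLASSIFIED."},
    {"from": "b@example.com", "body": "This document is CLASSIFIED."},
    {"from": "c@example.com", "body": "Lunch on Friday?"},
]

for_review = triage(emails)
print(len(for_review), "email(s) flagged for manual review")
```

Filtering and de-duplicating before the keyword search shrinks the pile early; the order is a design choice, and only the flagged remainder reaches a human reviewer.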


Dee [email protected]

586 514-0135