copyright © 2011 deloitte development llc. all rights reserved. data visualization for data qc...

22
Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

Upload: aubrey-wade

Post on 05-Jan-2016

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

Copyright © 2011 Deloitte Development LLC. All rights reserved.

Data Visualization for Data QC

March, 2012 Steve BermanDeloitte Consulting

Page 2: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

2Copyright © 2011 Deloitte Development LLC. All rights reserved.

Agenda

Overview

Reasons to Use Data Visualization in Data Cleansing

Examples

Tools

Wrap-Up

Page 3: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

3Copyright © 2011 Deloitte Development LLC. All rights reserved.

• Checking data for reasonability is an important part of any modeling process• Can’t be avoided – no such thing as “perfect data”• Can be time intensive – the “preprocess” portion of any project

takes 60-80% of the total project time• QC often relies on individual queries that test for specific conditions,

or filters in Excel

How QC has worked up to now

SELECT policy_no, pol_eff_date, tot_incurred_loss FROM claim_dataWHERE tot_incurred_loss) > 0ORDER BY DESCENDING tot_incurred_loss;

proc summary data=pol_info; by company state; var pol_ct earned_prem written_prem; output out=pol_info_sum (drop=_type_ _freq_) sum=;run;

Page 4: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

4Copyright © 2011 Deloitte Development LLC. All rights reserved.

Data Visualization is the communication of information using

graphical representations

Introducing data visualization

“A picture is worth a thousand words.”

“The greatest value of a picture is when it forces us to notice what we never expected to see.” – John Tukey

• Data visualization is an alternative and a supplement to the more traditional querying methods

• Use is becoming more prevalent in presentation• Not yet widely used by actuaries in QC process – more used

to seeing numbers

Page 5: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

5Copyright © 2011 Deloitte Development LLC. All rights reserved.

Foundations of data visualization

Page 6: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

6Copyright © 2011 Deloitte Development LLC. All rights reserved.

• Amount of data is growing• Information harder to be expressed

numerically in simple spreadsheets• Improved computing power• Improved tools and visualization

methods, which are easier to use• Increased demand from clients /

management

Evolution in data visualization

Page 7: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

7Copyright © 2011 Deloitte Development LLC. All rights reserved.

Benefits of data visualization

Original

ValidationRecalibration

Validation

Best 10% -67% -61%Best 25% -57% -59%25% - 50% -22% -21%50% - 75% 19% 16%Worst 25% 59% 63%Worst 10% 81% 86%

• Which one is easiest to convey the message?

Page 8: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

8Copyright © 2011 Deloitte Development LLC. All rights reserved.

• Data visualization allows users see several different perspectives of the data.

• Data visualization makes it possible to interpret vast amounts of data • Data visualization offers the ability to note exceptions in the data• Data visualization allows the user to analyze visual patterns in the

data• Data visualization equips users with the ability to see influences that

would otherwise be difficult to find• Packages allow drag and drop functionality, limited need for coding• Also allows for easy drill down, filtering, grouping• Data visualization allows a natural progression for QC to insight in

modeling

Benefits of data visualization in QC

Page 9: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

9Copyright © 2011 Deloitte Development LLC. All rights reserved.

• QC, Data Mining, Exploratory Data Analysis (EDA)• Model interpretation• Results / Presentation

Uses of data visualization

Page 10: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

10Copyright © 2011 Deloitte Development LLC. All rights reserved.

Examples - Histogram

• Mass points• Extreme values• Unreasonable values

0.0

0

0.1

5

0.3

0

0.4

5

0.6

0

0.7

5

0.9

0

1.0

5

1.2

0

1.3

5

1.5

0

1.6

5

1.8

0

1.9

5

2.1

0

2.2

5

2.4

0

2.5

5

2.7

0

2.8

5

3.0

0

3.1

5

0

10,000

20,000

30,000

40,000

50,000

60,000

Adjusted to Actual Premium

Histogram after correction – but is it fixed?• Still some outliers• For extremes, higher loss

ratio indicates potential problems with adjusted premium

Page 11: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

11Copyright © 2011 Deloitte Development LLC. All rights reserved.

Examples - Histogram

• Incorrectly calculated exposure is one of the causes

Page 12: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

12Copyright © 2011 Deloitte Development LLC. All rights reserved.

• Descriptive of median, standard deviation, outliers

• Particularly useful in comparing groups within a variable

Examples - Boxplots

Page 13: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

13Copyright © 2011 Deloitte Development LLC. All rights reserved.

• Split into subgroups as well

Examples - Boxplots

Page 14: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

14Copyright © 2011 Deloitte Development LLC. All rights reserved.

Examples – Line Charts

• Unusual spikes by month?

• Seasonal activity consistent with business knowledge?

• Consistent behavior over time?

Page 15: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

15Copyright © 2011 Deloitte Development LLC. All rights reserved.

Examples – Line Charts

• Back to insurance: Total premium average premium, policy counts by month

• Is peak at January renewal (both counts and average premium) consistent with business?

Page 16: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

16Copyright © 2011 Deloitte Development LLC. All rights reserved.

Examples – Scatterplots

Modeled points outside reasonable bounds (1-10)

Page 17: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

17Copyright © 2011 Deloitte Development LLC. All rights reserved.

Examples – Scatterplots

Adding dimension of business age (size of circles) shows that age incorrectly included in model score

Page 18: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

18Copyright © 2011 Deloitte Development LLC. All rights reserved.

Correlations between variables

Look for:• Very high correlations (near +1

or -1• Low correlations for variables

that should be related

Examples - Heatmaps

Page 19: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

19Copyright © 2011 Deloitte Development LLC. All rights reserved.

Examples - Geospatial

Customers in a five state area compared to sales locations• Customers in area that are not

within driving distance• Customers mapped well outside

area

Page 20: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

20Copyright © 2011 Deloitte Development LLC. All rights reserved.

Examples - Geospatial

Sales and gross margin rate by zip code Large mass point in sales, also large negative GMR

Page 21: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

21Copyright © 2011 Deloitte Development LLC. All rights reserved.

Software (not an exhaustive list!)

Excel Third party add-ins available

PowerPivot Microsoft add-in to Excel

R Open source, multitudes of packages

Tableau

SAS SAS/GRAPH module

Spotfire

Microstrategy

Geospatial mapping ESRI, Alteryx, Google API, etc.

Page 22: Copyright © 2011 Deloitte Development LLC. All rights reserved. Data Visualization for Data QC March, 2012 Steve Berman Deloitte Consulting

22Copyright © 2011 Deloitte Development LLC. All rights reserved.

Wrapping Up

• Data Visualization allows analysts to communicate with clarity, precision, and efficiency

• However, there are benefits in utilizing visualizations early in the process, as they may uncover data issues more easily than individual queries

• Standard tools, like histograms, scatterplots, maps, are becoming more available and easier to use