Predictive Tax Compliance
Presentation to the IRS
SPSS
Benjamin Chard, Senior Solution [email protected]
Sarah Mattingly, IRS Account [email protected]
SRA
Ted Fischer, Project [email protected] or [email protected]
Agenda
Introduction to Data Mining
Predictive Tax Compliance
Using Clementine for Audit Selection
What’s New in Clementine Version 11.1
IRS Refund Fraud Detection Project Case Study
Where Does Data Mining Fit?
Operational Setting
• Reporting
• Case Mgt
• Claim Scoring
Build Models
• Data Mining Workbench
Existing Data
• Historical Claims
• Current Claims
‘Data Mining’ vs. ‘Query/Reporting’
Reporting (Tables, Graphics, OLAP)
Provides a very good view of what is happening, but only within a limited view of the data and only in models defined by the user
[Chart: incident Count (axis 0 to 600) by YEAR (1999, 2000, 2001) for the categories A&B, Assault, B&E, carjacking, Larceny, Murder, MV, Rape, Robbery, other]
Incident Count - by day and shift
Day        00-04  04-08  08-12  12-16  16-20  20-24
Sunday        48     15     43     62     73     68
Monday        25     39    101    131    199    100
Tuesday       21     27    106    179    191    102
Wednesday     29     38    101    177    177    103
Thursday      38     50    105    168    197    107
Friday        33     40     88    147    209    107
Saturday      45     21     52     82    116    112
‘Statistics’ vs. ‘Data Mining’
Statistics: Hypothesis Testing
What is Data Mining?
Three classes of data mining algorithms:
• Predict (“Relationships”): Predict who is likely to exhibit specific behavior in the future.
• Cluster (“Differences”): Group cases that exhibit similar characteristics.
• Associate (“Patterns”): What events occur together? Given a series of actions, what action is likely to occur next?
Predictive Tax Compliance
Tax Collection
• Risk Models
Audit Selection
• Audit Models
Non-Filer Discovery
• Soft-Matching
• Prioritization Models
Register, Assess, Collect
DATA WAREHOUSE
DATA MINING & PREDICTIVE ANALYTICS TOOLS
Right work to the right resources at the right time
Predictive Modeling
Build a predictive profile of claims that, after investigation, were flagged as improper payments regardless of amount.
Select positive investigations.
Prioritize the claims with the highest dollar adjustment found per audit hour.
Minimize the number of no-change audits.
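The selection goals above can be read as a ranking problem. The following is a minimal sketch, not the method used in the project: it assumes each candidate case already carries a model score for the probability of a positive adjustment, an expected dollar adjustment, and an estimate of audit hours. All field names, thresholds, and figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Case:
    case_id: str
    p_adjust: float          # hypothetical model score: P(positive adjustment)
    expected_dollars: float  # hypothetical expected adjustment if positive
    audit_hours: float       # hypothetical estimate of hours needed

def yield_per_hour(case: Case) -> float:
    """Expected dollar adjustment per audit hour."""
    return case.p_adjust * case.expected_dollars / case.audit_hours

def select_cases(cases, hours_available, min_p=0.5):
    """Greedy selection by yield per hour within an hours budget.

    Cases scoring below min_p are skipped to limit likely no-change audits.
    """
    ranked = sorted(
        (c for c in cases if c.p_adjust >= min_p),
        key=yield_per_hour,
        reverse=True,
    )
    chosen, used = [], 0.0
    for c in ranked:
        if used + c.audit_hours <= hours_available:
            chosen.append(c)
            used += c.audit_hours
    return chosen

cases = [
    Case("A", 0.9, 10_000, 20),  # about 450 per hour
    Case("B", 0.6, 40_000, 80),  # about 300 per hour
    Case("C", 0.2, 90_000, 30),  # below min_p, likely no-change
    Case("D", 0.7, 5_000, 10),   # about 350 per hour
]
selected = [c.case_id for c in select_cases(cases, hours_available=40)]
print(selected)
```

With a 40-hour budget the sketch picks cases A and D: C is dropped as a likely no-change audit, and B does not fit in the remaining hours.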
Credit ranking (1=default): Bad 52.01% (n=168), Good 47.99% (n=155), Total 100.00% (n=323)
Split on Paid Weekly/Monthly: P-value=0.0000, Chi-square=179.6665, df=1
  Weekly pay: Bad 86.67% (n=143), Good 13.33% (n=22), Total 51.08% (n=165)
  Split on Age Categorical: P-value=0.0000, Chi-square=30.1113, df=1
    Young (< 25); Middle (25-35): Bad 90.51% (n=143), Good 9.49% (n=15), Total 48.92% (n=158)
    Old (> 35): Bad 0.00% (n=0), Good 100.00% (n=7), Total 2.17% (n=7)
  Monthly salary: Bad 15.82% (n=25), Good 84.18% (n=133), Total 48.92% (n=158)
  Split on Age Categorical: P-value=0.0000, Chi-square=58.7255, df=1
    Young (< 25): Bad 48.98% (n=24), Good 51.02% (n=25), Total 15.17% (n=49)
    Split on Social Class: P-value=0.0016, Chi-square=12.0388, df=1
      Management; Clerical: Bad 0.00% (n=0), Good 100.00% (n=8), Total 2.48% (n=8)
      Professional: Bad 58.54% (n=24), Good 41.46% (n=17), Total 12.69% (n=41)
    Middle (25-35); Old (> 35): Bad 0.92% (n=1), Good 99.08% (n=108), Total 33.75% (n=109)
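The chi-square figure on the first split can be checked by hand from the node counts. The sketch below computes both the Pearson and the likelihood-ratio chi-square for the Weekly/Monthly split. The slide does not say which statistic the tool reports, so treating it as the likelihood-ratio form is an assumption; the function and variable names are illustrative.

```python
import math

def chi_square_stats(table):
    """Return (Pearson X^2, likelihood-ratio G^2) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to G^2
                g2 += 2.0 * observed * math.log(observed / expected)
    return pearson, g2

# Counts taken from the tree: columns are Bad, Good.
# One row per child node of the Paid Weekly/Monthly split.
pay_split = [[143, 22], [25, 133]]
pearson, g2 = chi_square_stats(pay_split)
print(f"Pearson X^2 = {pearson:.4f}, likelihood-ratio G^2 = {g2:.4f}")
```

On these counts the likelihood-ratio value comes out near the 179.67 shown on the slide, while the Pearson value is noticeably lower (about 162), which suggests the tree reports the likelihood-ratio form.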
Anomaly Detection
Find emerging trends in claims data. Use data mining to show the emerging patterns in current-year data. Reported results will present specific cases that either:
• Exhibit a common pattern, or
• Exhibit an unusual pattern
Unusual cases are deployed to the field investigators for further analysis.
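As an illustration of flagging "unusual" cases, here is a minimal sketch using a modified z-score based on the median and the median absolute deviation. This is a generic outlier rule swapped in for illustration; the slides do not say which anomaly detection algorithm is used. The claim amounts and the 3.5 threshold are assumptions.

```python
import statistics

def flag_unusual(amounts, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold."""
    median = statistics.median(amounts)
    mad = statistics.median(abs(x - median) for x in amounts)
    if mad == 0:
        return []  # no spread around the median, so nothing can be ranked
    return [
        i for i, x in enumerate(amounts)
        if abs(0.6745 * (x - median) / mad) > threshold
    ]

claims = [1200, 1150, 1300, 1250, 1180, 9800, 1220]  # hypothetical amounts
unusual = flag_unusual(claims)
print(unusual)
```

Only the 9800 claim (index 5) is flagged here; in the workflow described above, such cases would be the ones passed to field investigators.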
Case Study: Audit Selection Goals
Build models to predict different outcomes:
• Positive Adjustment (Y/N)
• DPH group membership
• Actual $$ Adjustment
Historical cases selected for model build:
• Cases with prior audit: prior audit and organizational data
• All cases: organizational data only
Deployment:
• For each outcome, combine predictions for those with and without previous audit data.
• For each outcome, predict using organizational data only.
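The first deployment option can be sketched as a simple routing rule: score a case with the prior-audit model when prior audit data exists, and fall back to the organizational-data model otherwise. This is only an illustration of the idea; the two toy models, the field names, and the scores are hypothetical and stand in for models built in the workbench.

```python
def model_with_audit(case):
    # Hypothetical rule: a prior positive adjustment raises the score.
    return 0.8 if case["prior_audit"]["positive_adjustment"] else 0.3

def model_org_only(case):
    # Hypothetical rule based on organizational data only.
    return 0.6 if case["org"]["sector"] == "cash-intensive" else 0.4

def score_case(case):
    """Use the prior-audit model when prior audit data exists, else fall back."""
    if case.get("prior_audit") is not None:
        return model_with_audit(case)
    return model_org_only(case)

cases = [
    {"id": 1, "org": {"sector": "cash-intensive"},
     "prior_audit": {"positive_adjustment": True}},
    {"id": 2, "org": {"sector": "cash-intensive"}, "prior_audit": None},
    {"id": 3, "org": {"sector": "services"}, "prior_audit": None},
]
scores = {c["id"]: score_case(c) for c in cases}
print(scores)
```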
Clementine Workbench
Case Study: Results
Text Mining and Linguistic Extraction
Text Mining Timeline: Text Extraction
Timeline: 70’s, 80’s, 90’s, Now
Example text: “Mr. Smith aka Mr. Ahmed was seen on the corner of Church St. and Magnolia Ave. on Nov 13th”
• Bag of « Words » extraction: Mr., Smith, aka, was, seen, with, Ahmed, on, the, corner, of, Church, etc.
• Expressions extraction: Mr. Smith; was seen; Mr. Ahmed; corner; Church St.; Magnolia Ave.; Nov 13th
• Named Entities extraction: Mr. Smith -> Person; Mr. Ahmed -> Person; aka -> Alias; was seen -> location; Church St. -> Address; Magnolia Ave. -> Address; Nov 13th -> Date
• Events/Sentiment extraction: Mr. Smith (Person) -> aka (Alias) -> Mr. Ahmed (Person) was seen (location) -> Church and Magnolia (address) -> November 13 (Date)
• Combined with structured data: Mr. Ahmed in database wanted for questioning. Suspect -> send agent to this location
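To make the named-entity stage concrete, here is a minimal sketch that pulls persons, addresses, a date, and the alias link out of the example sentence with regular expressions. Real linguistic extraction relies on dictionaries and grammar rather than a handful of patterns; the patterns and labels below are illustrative assumptions tuned to this one sentence.

```python
import re

TEXT = ("Mr. Smith aka Mr. Ahmed was seen on the corner of "
        "Church St. and Magnolia Ave. on Nov 13th")

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Hypothetical patterns, one per entity type.
PATTERNS = {
    "Person": r"Mr\. [A-Z][a-z]+",
    "Address": r"[A-Z][a-z]+ (?:St|Ave)\.",
    "Date": rf"(?:{MONTHS}) \d{{1,2}}(?:st|nd|rd|th)?",
}

def extract_entities(text):
    """Return every match for each entity type, in order of appearance."""
    return {label: re.findall(pattern, text)
            for label, pattern in PATTERNS.items()}

def extract_alias(text):
    """Return (person, alias) if the text links two persons with 'aka'."""
    match = re.search(r"(Mr\. [A-Z][a-z]+) aka (Mr\. [A-Z][a-z]+)", text)
    return match.groups() if match else None

entities = extract_entities(TEXT)
alias = extract_alias(TEXT)
print(entities)
print(alias)
```

The alias pair is what allows the later stage to join the text to structured data, for example by looking up either name in a database of persons wanted for questioning.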
Text Mining Management
General Dictionaries
Organization, Location, Name, Phone Number, etc
Custom Built Subject Dictionaries
Tax Code, Form Names, Commodity, Business, etc
Interactive Synonym Dictionaries
Exclude Dictionaries
NEW!: Classification algorithms enable you to aggregate concepts from a wide variety of unstructured text data and group them into a small number of categories.
What’s New
Binary Classifier – Automation of Many Models
Sophisticated users: hundreds of models (scripting)
The Binary Classifier Node imitates this, but easily, with a pre-built node
Time Series Algorithm
ARIMA & Exponential Smoothing
Expert Modeler – finds best model automatically
Forecast Multiple Series at once
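For reference, simple exponential smoothing, the most basic member of the exponential smoothing family, can be written in a few lines. This sketch is not the Expert Modeler: it does not choose the smoothing weight automatically and does not cover ARIMA. The series values and the weight `alpha` are hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; returns the smoothed level at each step."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    levels = [series[0]]  # start from the first observation
    for value in series[1:]:
        levels.append(alpha * value + (1 - alpha) * levels[-1])
    return levels

monthly_claims = [100, 110, 105, 120]  # hypothetical counts
levels = exponential_smoothing(monthly_claims, alpha=0.5)
forecast = levels[-1]  # one-step-ahead forecast is the last smoothed level
print(levels, forecast)
```

Forecasting multiple series at once would amount to applying the same function to each series in turn.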
Data Preparation Tools
Optimal Binning
Splitting up numeric data into sub-ranges
New capability to make this optimal for prediction
Existing Capability – Equal bins
New Capability – Optimal bins
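The difference between equal bins and bins chosen for prediction can be shown on a toy example. The sketch below contrasts an equal-width cut with a single supervised cut that minimizes the weighted entropy of a 0/1 target. The slides do not specify the algorithm behind the optimal binning feature, so the entropy criterion is an assumption, and the data are hypothetical.

```python
import math

def equal_width_edges(values, n_bins):
    """Interior cut points for n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * k for k in range(1, n_bins)]

def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def best_cut(values, labels):
    """Single cut point that minimizes the weighted entropy of the target."""
    pairs = sorted(zip(values, labels))
    best_score, best_point = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * entropy(left)
                 + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_score = score
            best_point = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_point

income = [10, 12, 15, 18, 40, 45, 90, 100]  # hypothetical numeric field
flagged = [1, 1, 1, 1, 0, 0, 0, 0]          # hypothetical 0/1 target
equal_cut = equal_width_edges(income, 2)
optimal_cut = best_cut(income, flagged)
print(equal_cut, optimal_cut)
```

Here the equal-width cut at 55 mixes flagged and unflagged cases in the lower bin, while the supervised cut at 29 separates them cleanly.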
SPSS Reporting
SPSS Statistics and Graphs Within Clementine
Configuration Management
Audit Process
Analytical Data Storage
Data Mining
Audit Selection
Predictive Enterprise Services (PES) Top Four
Deployment and Integration
Configuration Management
Exporting Data, Models and Streams
Explore and Describe
1. Improve Collaboration
In a single project there is the potential to create a large number of models and versions of models:
• different out variables
• different algorithms
• different settings
• different training samples
× number of different data sets
× number of different users
× number of different locations
2. Improve Transparency
Provide information on which models are run on which data.
For audit standards, track who has made changes to the model and when.
From their desktops, your analytics team can see which models were most recently run on which data, and can provide this information for internal audits.
3. Automate Process
Combine Clementine, SPSS, SAS & other processes
Scheduling & notification
4. Centralize and Control Access
Contact information
Project personnel:
Ted Fischer – [email protected] or [email protected], 301-731-3534
Anthony Colyandro – [email protected] or [email protected], 301-731-3524
SRA Director of Business Intelligence:
Dave Vennergrund – [email protected], 703-803-1614
How do I get SPSS software?
IRS
Cathy J. Allen
Enterprise System Management, Software Management Section
Idea Branch - MS 5850
(304) 264-7279 - voice
(304) 279-5309 - cell
(304) 260-3033 - [email protected]
SPSS Contacts:
Account Executive – Sarah Mattingly
Email: [email protected] – 703-740-2446
C – 703-389-6485
Account Manager – Matt Madden
W - 312 651 3894