www.bigbang-datascience.com
CAP Certificate
CAP: Certified Analytics Professional from INFORMS
Agenda
1. Business Problem (Question) Framing
2. Analytics Problem Framing
3. Data
4. Methodology (Approach) Selection
5. Model Building
6. Deployment
7. Model Life Cycle Management
Business Problem (Question) Framing (12%–18%)
CAP Domain I
The ability to understand a business problem and determine whether
the problem is amenable to an analytics solution
T-1 Obtain or receive problem statement and usability requirements
T-2 Identify stakeholders
T-3 Determine whether the problem is amenable to an analytics solution
T-4 Refine the problem statement and delineate constraints
T-5 Define an initial set of business benefits
T-6 Obtain stakeholder agreement on the problem statement
Business Understanding: Determine Business Objectives, Assess Situation, Determine Data Mining Goals, Produce Project Plan
Data science is not a panacea for all business problems. Common culprits when an analytics solution fails include:
Inadequate pre-processing of the data
Inadequate model validation
Unjustified extrapolation (applying the model to data that lie in a region the model has never seen)
Over-fitting the model to the existing data
Example: the "Flash Crash" of May 6, 2010, when the stock market rapidly lost more than 600 points
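One of the culprits listed above, unjustified extrapolation, can be partly guarded against with a range check before a model is applied to new data. The sketch below is a minimal illustration under stated assumptions, not a complete validation procedure: the function names `fit_ranges` and `is_extrapolating` and the sample (price, volume) values are hypothetical, and a per-feature range check is only a coarse test (a point can sit inside every range and still be far from any training example).

```python
def fit_ranges(training_rows):
    """Record the minimum and maximum of every feature seen during training."""
    columns = list(zip(*training_rows))
    return [(min(col), max(col)) for col in columns]


def is_extrapolating(row, ranges):
    """Return True if any feature value lies outside its training range."""
    return any(not (lo <= value <= hi) for value, (lo, hi) in zip(row, ranges))


# Hypothetical training data: (price, volume) pairs.
training_rows = [(10.0, 200), (12.5, 180), (11.0, 250)]
ranges = fit_ranges(training_rows)

inside = is_extrapolating((11.5, 210), ranges)   # both values were seen before
outside = is_extrapolating((4.0, 210), ranges)   # price is below anything seen
print(inside, outside)
```

A model would normally refuse to score, or at least flag, any row for which `is_extrapolating` returns True.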
CAP Domain I
Who is your client (Stakeholders)?
What exactly is the client asking you to solve?
CAP Domain I
Decision First: designed to fully "frame" the decision to be made
Free webinar: https://cc.readytalk.com/cc/playback/Playback.do?id=4d7yds
CAP Domain I
Recommended Readings
Strategic Decision Making: Multi-objective Decision Analysis with Spreadsheets
Craig W. Kirkwood
ISBN-13: 978-0534516925
ISBN-10: 0534516920
CAP Domain II
Analytics Problem Framing (14%–20%)
The ability to reformulate a business problem into an analytics
problem with a potential analytics solution
T-1 Reformulate problem statement as an analytics problem
T-2 Develop a proposed set of drivers and relationships to outputs
T-3 State the set of assumptions related to the problem
T-4 Define key metrics of success
T-5 Obtain stakeholder agreement
CAP Domain II
How can you translate their ambiguous request into a concrete,
well-defined problem?
CAP Domain II
Recommended Readings
Keeping Up with the Quants: Your Guide to Understanding and Using Analytics
Stakeholder Analysis worksheet
Data Analysis and Decision Making
Christian Albright
ISBN-13: 978-0538476126
ISBN-10: 0538476125
CAP Domain III
Data (18%–26%)
The ability to work effectively with data to help identify potential
relationships that will lead to refinement of the business and
analytics problem
T-1 Identify and prioritize data needs and sources
T-2 Acquire data
T-3 Harmonize, rescale, clean, and share data
T-4 Identify relationships in the data
T-5 Document and report findings (e.g., insights, results, business performance)
T-6 Refine the business and analytics problem statements
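As an illustration of the "clean" and "rescale" parts of task T-3, the sketch below drops missing entries and then applies two common rescaling schemes. It is a minimal example under stated assumptions: the helper names (`clean`, `min_max_rescale`, `z_score`) and the `raw_revenue` column are hypothetical, and real projects usually need more careful handling of missing values than simply dropping them.

```python
from statistics import mean, pstdev


def clean(values):
    """Drop missing entries (None) before any rescaling."""
    return [v for v in values if v is not None]


def min_max_rescale(values):
    """Map values linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


raw_revenue = [120.0, None, 80.0, 100.0]  # hypothetical column with one gap
revenue = clean(raw_revenue)
scaled = min_max_rescale(revenue)
standardized = z_score(revenue)
print(scaled)
```

Min-max rescaling keeps the original shape of the distribution but is sensitive to outliers; z-scores are often preferred when variables measured in different units are to be compared.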
CAP Domain III
Is this data already available?
If so, what parts of the data are useful?
If not, what more data do you need?
What kind of resources (time, money, infrastructure) would it
take to collect this data in a usable form?
Data Preparation: Select Data, Clean Data, Construct Data, Integrate Data, Format Data
CAP Domain III
• Binary Variables
• Nominal Variables
• Ordinal Variables
• Interval Variables
• Ratio Variables
Structured vs. unstructured data
Primary source data vs. secondary source data
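The variable types above matter because they determine which summary statistics are meaningful. The sketch below is one hedged way to encode that rule of thumb; the mapping `SUMMARY_BY_SCALE`, the function `summarize`, and the sample data are all hypothetical, and ordinal values are assumed to be already coded as ordered integers (1 = low, 2 = medium, 3 = high).

```python
from statistics import mean, median_low, mode

# Rule of thumb: a typical meaningful measure of center for each scale.
SUMMARY_BY_SCALE = {
    "binary": mode,         # most frequent of two categories
    "nominal": mode,        # categories have no order
    "ordinal": median_low,  # ordered, but distances are not meaningful
    "interval": mean,       # differences are meaningful
    "ratio": mean,          # differences and ratios are meaningful
}


def summarize(values, scale):
    """Apply the summary statistic that suits the given measurement scale."""
    return SUMMARY_BY_SCALE[scale](values)


favorite_color = ["red", "blue", "red"]   # nominal
satisfaction = [1, 2, 2, 3, 3]            # ordinal codes: 1=low, 2=medium, 3=high
temperature_c = [20.0, 22.0, 24.0]        # interval

print(summarize(favorite_color, "nominal"))
print(summarize(satisfaction, "ordinal"))
print(summarize(temperature_c, "interval"))
```

`median_low` is used for ordinal data so that the result is always one of the observed categories rather than an average of two codes.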
CAP Domain III
How to Measure Anything: Finding the Value of "Intangibles" in Business
Douglas W. Hubbard
ISBN-13: 978-1118539279
ISBN-10: 1118539273
Introduction to Management Science: A Modeling and Case Studies Approach with Spreadsheets
Frederick S. Hillier
ISBN-13: 978-0534260347
ISBN-10: 0534260349
CAP Domain IV
Methodology (Approach) Selection (12%–18%)
The ability to identify and select potential approaches for solving
the business problem
T-1 Identify available problem solving approaches (methods)
T-2 Select software tools
T-3 Test approaches (methods) [1]
T-4 Select approaches (methods) [1]
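Testing and selecting approaches (T-3 and T-4) can be sketched as fitting each candidate on training data and comparing errors on held-out data. The example below is a minimal illustration under stated assumptions: the two candidates (a mean baseline and a one-variable least-squares line), the helper names, and the data are hypothetical, and a real comparison would typically use more data and repeated splits.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline: always predict the training mean of y."""
    mu = mean(ys)
    return lambda x: mu


def fit_line(xs, ys):
    """Ordinary least squares for a single predictor (closed form)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def mae(model, xs, ys):
    """Mean absolute error of a fitted model on the given data."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


# Hypothetical data, split into training and hold-out sets.
train_x, train_y = [1, 2, 4, 5, 7, 8], [3.1, 4.9, 9.0, 10.8, 15.0, 17.1]
test_x, test_y = [3, 6], [7.2, 13.1]

candidates = {"mean_baseline": fit_mean, "linear_regression": fit_line}
scores = {
    name: mae(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)
print(best, scores)
```

Including a trivial baseline makes it clear whether a more elaborate method is actually adding value.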
CAP Domain IV
Once you have cleaned the data, you need to:
Identify available problem-solving approaches (methods)
Select software tools
Test approaches (methods)
Select approaches (methods)
Modeling: Select Modeling Technique, Generate Test Design, Build Model, Assess Model
CAP Domain IV
Modeling (machine learning, statistical models, algorithms): This step is usually the meat of your project, where you apply all the cutting-edge machinery of data analysis to unearth high-value insights and predictions.
How many variables are to be analyzed?
• One (univariate)
• Two (bivariate)
• Three or more (multivariate)
What kind of model?
• What variables to include?
• Prediction or classification?
• Do we want a descriptive or an inferential question answered?
• Which specific technique?
CAP Domain IV
• Understand the information contained within the data at a high level.
• What kinds of obvious trends or correlations do you see in the
data?
• What are the high-level characteristics and are any of them more
significant than others?
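A first look for "obvious trends or correlations" between two variables is often a correlation coefficient. The sketch below computes Pearson's r from its definition; the function name `pearson` and the `ad_spend` and `sales` figures are hypothetical, and a high correlation on its own says nothing about causation.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical bivariate data.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 8.0, 9.8]

r = pearson(ad_spend, sales)
print(round(r, 3))
```

Values of r near +1 or -1 indicate a strong linear relationship; values near 0 indicate little linear relationship, although a nonlinear one may still exist.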
CAP Domain IV
Applied Linear Regression Models
Michael H. Kutner, Christopher J. Nachtsheim
CAP Domain V
Model Building (13%–19%)
The ability to identify and build effective model structures to help
solve the business problem.
T-1 Identify model structures [1]
T-2 Run and evaluate the models
T-3 Calibrate models and data [1]
T-4 Integrate the models [1]
T-5 Document and communicate findings (including assumptions, limitations, and
constraints)
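For a classification model, "run and evaluate" (T-2) usually starts from a confusion matrix. The sketch below is a minimal illustration under stated assumptions: the function names and the label vectors are hypothetical, labels are assumed to be binary with 1 as the positive class, and which metric matters most depends on the business problem.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(actual, predicted):
    """Return accuracy, precision, and recall as a dictionary."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical hold-out labels and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = evaluate(actual, predicted)
print(metrics)
```

Reporting precision and recall alongside accuracy helps when the classes are unbalanced, since accuracy alone can look good for a model that rarely predicts the positive class.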
CAP Domain V
Introduction to Operations Research
Frederick S. Hillier
ISBN-13: 978-0071238281 ISBN-10: 007123828X
Introductory Statistics
Sheldon M. Ross
ISBN-13: 978-0123743886 ISBN-10: 0123743885
CAP Domain VI
Deployment (7%–11%)
The ability to deploy the selected model to help solve the business problem
T-1 Perform business validation of the model
T-2 Deliver report with findings
T-3 Create model, usability, and system requirements for production
T-4 Deliver production model/system [1]
T-5 Support deployment
Evaluation: Evaluate Results, Review Process, Determine Next Steps
Deployment
All the analysis and technical results that you come up
with are of little value unless you can explain to your
stakeholders what they mean, in a way that’s
comprehensible and compelling.
Data storytelling is a critical and underrated skill that you
will build and use here.
Making Hard Decisions: An Introduction to Decision Analysis
Robert T. Clemen
ISBN-13: 978-0534260347
ISBN-10: 0534260349
Simulation Modeling and Analysis
Averill M. Law
ISBN-13: 978-0071255196
ISBN-10: 0071255192
CAP Domain VII
Model Life Cycle Management (4%–8%)
The ability to manage the model life cycle to evaluate business
benefit of the model over time
T-1 Document initial structure
T-2 Track model quality
T-3 Recalibrate and maintain the model [1]
T-4 Support training activities
T-5 Evaluate the business benefit of the model over time
[1] Tasks that are beyond the scope of the CAP® certification exam and that will not be tested.
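Tracking model quality and deciding when to recalibrate can be sketched as monitoring a rolling error against the error measured at deployment. The example below is a minimal illustration under stated assumptions: the class `QualityTracker`, its parameters (`baseline_error`, `tolerance`, `window`), and the sample predictions are hypothetical, and production monitoring would normally also watch for drift in the input data.

```python
from collections import deque
from statistics import mean


class QualityTracker:
    """Keep a rolling window of absolute errors and flag degradation."""

    def __init__(self, baseline_error, tolerance=1.5, window=5):
        self.baseline_error = baseline_error  # error measured at deployment
        self.tolerance = tolerance            # allowed multiple of the baseline
        self.errors = deque(maxlen=window)    # most recent absolute errors

    def record(self, predicted, actual):
        """Store the absolute error of one prediction once the outcome is known."""
        self.errors.append(abs(predicted - actual))

    def needs_recalibration(self):
        """True once a full window's mean error exceeds tolerance * baseline."""
        if len(self.errors) < self.errors.maxlen:
            return False
        return mean(self.errors) > self.tolerance * self.baseline_error


tracker = QualityTracker(baseline_error=1.0, tolerance=1.5, window=3)
for predicted, actual in [(10, 10.5), (12, 11), (9, 9.8)]:
    tracker.record(predicted, actual)
healthy = tracker.needs_recalibration()

for predicted, actual in [(10, 14), (11, 15)]:  # the model starts to miss badly
    tracker.record(predicted, actual)
degraded = tracker.needs_recalibration()
print(healthy, degraded)
```

Requiring a full window before raising a flag avoids reacting to a single bad prediction.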
CAP Domain VII
Deployment: Plan Deployment, Plan Monitoring & Maintenance, Produce Final Report, Review Project
CAP Domain VII
Business Analytics for Managers: Taking
Business Intelligence Beyond Reporting
Gert H. N. Laursen
ISBN-13: 978-0071255196
ISBN-10: 0071255192
Certifications
Cloudera Certified Professional (CCP) Data Scientist
Cloudera Certified Developer for Apache Hadoop (CCDH)
Cloudera Certified Administrator for Apache Hadoop (CCAH)
Cloudera Certified Specialist in Apache HBase (CCSHB)
CAP: Certified Analytics Professional from INFORMS
Columbia Certification of Professional Achievement in Data Sciences
Digital Analytics Association Web Analyst Certification Program™
EMC Data Scientist Associate (EMCDSA) Certification.
MCSE Business Intelligence Certification
SAS Certified Big Data Professional
SAS Certified Data Scientist
Simplilearn Data Science Certification Training - (R Programming)
Simplilearn Data Science (SAS Advanced Certification Training)
TDWI Certified Business Intelligence Professional (CBIP)
Teradata Aster Certification
Newsletters
• Data Science Central: http://www.datasciencecentral.com
• Analytics Vidhya: https://www.analyticsvidhya.com
• Big Data University: https://bigdatauniversity.com/
• KDnuggets: http://www.kdnuggets.com/
Q & A
BIG BANG DATA SCIENCE SOLUTIONS
LEARN. ACHIEVE. STANDOUT