Building successful data science applications
6 concepts every data science team needs to understand
Niels Kasch
About Niels
• Co-founder of Miner & Kasch
• ML and NLP
• Data enthusiast
• Probably knows your 401(k) balance
• Ph.D. in Computer Science from UMBC
https://www.linkedin.com/in/nielskasch
@nielskasch
Purpose of this talk
Lessons learned from observations of Data Science in the wild
How do you get the most out of your DS team?
What should you watch out for when you start doing data science?
What makes a data science application successful?
Yay
• Uses data
• Uses models and ML algorithms
• Is deployed
• Scales
• Requires no human in the operational loop
• Informs decision-making
• Has a large impact
Nay
• Not actionable
• Not repeatable
• Dies on a PowerPoint
• Tied to the data scientist who made it
• Not scalable
#1 - Position the data science team appropriately within the organization
Central
Pros
• DSaaS - CDS prioritizes projects
• Entire org can utilize DS resources
• Objective analytics
• Sharing of analytics knowledge
Cons
• Trust and relationships with business units
• Domain knowledge depth
[Diagram labels: Chief Data Scientist, Data Science Team, Business Unit, …, Business Unit]
#1 - Position the data science team appropriately within the organization
Within Business Units
Pros
• Business units more likely to embrace DS efforts
• Fast turnaround on business unit requests
• Specialization of data scientists in business unit’s data and processes
Cons
• Potential for siloed analytics and data assets
• Duplication of infrastructure
[Diagram labels: Business Unit with its own Data Science Team, …, Business Unit with its own Data Science Team]
#1 - Position the data science team appropriately within the organization
Mixed
Pros
• Attempts to use the best of both worlds
• Fosters knowledge sharing across DS team and business units
• Diversity for Data Scientists
Cons
• Potential for mixed objectives from two bosses
• Prioritization of DS efforts
[Diagram labels: Chief Data Scientist, Data Science Team, Business Unit]
#1 - Position the data science team appropriately within the organization
Why is organizational structure so important?
What happens if you don’t have the right structure?
#2 - Assemble the right team
• No one person can do it all
Team properties
• Business acumen
• Expertise in statistics
• Data wrangler
• Domain expertise
• People who know the tools
• Expertise in machine learning
#2 - Assemble the right team
Team composition
• Data Scientist
○ Statistician – has a strong methodological background
○ ML expert – develops predictive and explanatory models
○ Data analyst – exhibits strong communication skills (presentation, visualization)
○ Business analyst – is a domain expert and understands business needs
• Data Engineer – versatile on full stack, performs ETL, model operationalization
• Data Architect – streamlines, centralizes, and maintains data assets
• Project manager – manages people and projects, understands tools, methods, relates to the business
#2 - Assemble the right team
Why is the right team important?
What happens if you don’t have the right team?
#3 - Conduct repeatable data science through processes
• CRISP-DM, SEMMA, ASUM-DM, OSEMN, SCRUM
• Iterate fast and often
• Keep aligned with business
#3 - Data Science Process
Data science is an interactive process between SMEs and data scientists
• Stakeholder involvement
○ IT (DBA, Data Architect)
○ Business stakeholders
• Stakeholders are involved in every aspect of the process
○ Define problem according to business need (value & impact)
○ Knowledge transfer from subject matter experts
○ Review progress and provide feedback
Define problem
• Define use case with business stakeholders and SME
• Define dependent variable
Explore data
• Integrate data assets
• Check data completeness
• Develop data dictionaries and data summary statistics
Develop features
• Derive independent variables in support of modeling task
• Impute missing values
Create model
• Develop predictive and explanatory model
• Answer the why and what of the business problem
Deploy analytics
• Transition model from dev to prod environment
• Code review and optimization
Training & Documentation
• Document feature and model details
• Provide training to analytics and IT stakeholders
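The "Explore data" and "Develop features" steps mention checking data completeness, producing summary statistics, and imputing missing values. A minimal sketch of what those three activities can look like is below. The records, the column names (`age`, `monthly_spend`), and the helper functions are all hypothetical, and mean imputation is only one simple strategy among many; the talk itself does not prescribe a method.

```python
from statistics import mean

# Hypothetical records for illustration only; None marks a missing value.
rows = [
    {"customer_id": 1, "age": 34, "monthly_spend": 120.0},
    {"customer_id": 2, "age": None, "monthly_spend": 80.0},
    {"customer_id": 3, "age": 51, "monthly_spend": None},
    {"customer_id": 4, "age": 29, "monthly_spend": 100.0},
]


def completeness(rows, column):
    """Fraction of rows in which `column` is present (not None)."""
    present = sum(1 for r in rows if r.get(column) is not None)
    return present / len(rows)


def summarize(rows, column):
    """Count, min, max, and mean over the non-missing values of `column`."""
    values = [r[column] for r in rows if r.get(column) is not None]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


def impute_mean(rows, column):
    """Return copies of the rows with missing `column` values set to the column mean."""
    fill = mean(r[column] for r in rows if r.get(column) is not None)
    return [
        {**r, column: fill if r.get(column) is None else r[column]}
        for r in rows
    ]


# Explore data: completeness and summary statistics per column.
for col in ("age", "monthly_spend"):
    print(col, completeness(rows, col), summarize(rows, col))

# Develop features: impute missing values (the original rows are left untouched).
clean = impute_mean(impute_mean(rows, "age"), "monthly_spend")
```

Returning new rows instead of modifying the input keeps the raw data available for review with stakeholders, which fits the repeatability goal of this section.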
#3 - Conduct repeatable data science through processes
Why is it important to have a process?
What happens if you don’t have the right process?
#4 - Foster the right atmosphere
For the organization
• Collaboration
○ Partner with stakeholders in every step of the process
○ Avoid the us vs. them mentality
○ Pair analytics/programming
• Analytics-driven enterprise
○ Enable business stakeholders to play with DS output
○ Make the business as a whole smarter
• Sensible Analytics
○ Quality of analytics requires the freedom to fail
○ Relate analytics to key initiatives, KPIs, and drivers
#4 - Foster the right atmosphere
For the team
• Collaboration
○ Encourage team members to learn from each other
○ Encourage team members to learn from and share with the business
• Enable ‘quiet time’
• Establish a no-fear mentality
• Provide diversity on analytics tasks
• Provide time to keep up with tech and academia
#4 - Foster the right atmosphere
Why is the right atmosphere important?
What happens if you don’t have the right atmosphere?
#5 - Ensure access to data
• Break down data silos
• Reduce time to analysis
• Chief Data Officer – governance and utilization of data assets in an org
[Diagram: structured and unstructured data sources, organizational/BU data silos, and a Data Lake]
Internal: Transactions, Sales, Promotions, Inventory, Products, CRM, Demographics, Web/app usage, Video, Call center, Surveys, ...
External: Weather, Social media, Factual, Traffic, Public/Govt., ...
Organizational/BU data silos: CRM, EDW, HR, ...
Data Lake
#5 - Ensure access to data
Why is access to data important?
What happens if you don’t have access to data?
#6 - Provide the right tools
• Flexible stack to let people work with what they know
• Enable rapid exploration
• Volume, Velocity, Variety, Veracity of data
Data Lake
#6 - Provide the right tools
Why are the right tools important?
What happens if you don’t have the right tools?
Wrap up
#1 - Position within the org
#2 - The right team
#3 - Repeatable data science through processes
#4 - The right atmosphere
#5 - Access to data
#6 - The right tools
Successful data science applications
Thanks
Niels Kasch
www.minerkasch.com
https://www.linkedin.com/in/nielskasch
@nielskasch