Big Data LA 2016: Backstage to a Data Driven Culture
TRANSCRIPT
Backstage to Data Driven Culture: Success with an Agile Data Science Stack
Big Data LA Day 2016 Pauline Chow
Desi Medoza @ Unsplash
So, You are the First Data Scientist…?
What my Friends Think I Do
What my Mom Thinks I Do
What Society Thinks I Do
What my Boss Thinks I Do
What I Think I Do
What I Actually Do
Misconceptions about Data Scientists
So, You are the First or Lead Data Scientist…?
Context for Presentation
Case Study: Startup in Digital Media
- Open Source & New Tools
- Profits Steady, Adding Products
- Report to VP Marketing
- Non-Technical Culture
- First Data Scientist
What Are the Business's Core Competencies?
What does the organization do best? How does it relate to data and technology?
What Tools Are Available?
What are the existing tools, processes, and code? Do you have a budget for new tools and resources?
Where Is Your Team?
This is a question about both team members and expectations.
What Is the State of the Organization?
What is the mood of the organization? How are they solving problems? Why are they adding DS/A to the organization?
Who Has Influence on the Roadmap?
Who are the stakeholders? How can data contribute to their goals and expectations?
Effectively Implement Solutions
Set a Blueprint that promotes flexibility, iteration, and scalability. It facilitates agile-oriented mindsets for data practices and is crucial for implementation.
Build a Roadmap from the Blueprint to shape data practices and implement goals from stakeholders and the company, as well as strong DS/A foundations.
Maximize Impact & Communication
Develop key qualitative and quantitative milestones.
Communicate consistently and frequently to the organization.
Influence Expectations
Influence from both angles: your expectations and your stakeholders'. Find explicit and implicit goals and bridge the gaps that you find.
Key Drivers Integrating Data Culture
Create an Agile Data Science Stack
Non-technical focused
Actively Listen
Implement
Explore
Collaborate
Influence
Grow
Guiding Verbs for “First” Data Scientist
In no particular order
ACTIVE LISTENING:
What Are you Trying to Hear?
Explicit Goals & Expectations
- Structured, straightforward, logical, and safe inquiries
- Document, share, and openly discuss with team members and stakeholders
Jungwoo Hong @ Unsplash
Implicit Goals & Expectations
Thom @ Unsplash
IMPLEMENT:
HOW TO APPROACH YOUR BLUEPRINT FOR A DATA DRIVEN-INFORMED CULTURE?
Architecture First
Process First
STACK AGILE APPROACHES
Anthony Delanoix @ Unsplash Jeff Sheldon @ Unsplash
AGILE BY ARCHITECTURE: Blueprint approach from infrastructure perspective
SaaS & PaaS Integration: Customize as the Team Grows
IDENTIFY
BUILD SYS & MODELS
- Select Appropriate Models
- Build Models and Pipelines for Scalability
- Evaluate and Refine Models
ACQUIRE DATA
- Identify the “right” source
- Import data and set up remote / local storage
- Determine tools to work with selected sources
CREATE PROBLEM STATEMENT
- Identify business, data, product objectives
- Brainstorm potential solutions
- Create questions and identify people/stakeholders to help
PARSE & MINE DATA
- Determine distribution of data and necessary transformations
- Format, clean, splice, etc.
- Create new derived data
PRESENT RESULTS
- Summarize Findings
- Add Storytelling aspects
- Identify next questions and additional analysis
- For teams and stakeholders
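The Parse & Mine step above mentions checking how data is distributed and choosing a transformation. A minimal sketch in Python, using only the standard library: the order values, the `skewness` helper, and the choice of a log transform are illustrative assumptions, not something taken from the talk.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


# Hypothetical order values with a long right tail.
order_values = [10.0, 12.0, 15.0, 20.0, 30.0, 50.0, 100.0, 400.0]

raw_skew = skewness(order_values)

# A log transform is one common way to compress a right tail.
log_values = [math.log(v) for v in order_values]
log_skew = skewness(log_values)

print(f"skew before: {raw_skew:.2f}, after log: {log_skew:.2f}")
```

A skew closer to zero after the transform suggests the transformed values are more symmetric, which is often easier to summarize and model.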
AGILE BY PROCESS: Blueprint approach from workflow perspective
ACQUIRE PARSE & MINE PRESENT BUILD DEPLOY
IDENTIFY
BUILD SYS & MODELS + DEPLOY
Leverage platforms that document models, pipelines, and feature iterations. Collaboration is a plus.
- Sklearn pipelines
- DS/ML platforms: Yhat, Domino Data Lab, Anaconda
ACQUIRE DATA
Curate data from existing sources that are cleaned, reliable, and automated, so that ETL can be skipped.
- Segment.io
- Zapier
- CrowdFlower
- Open Data
CREATE PROBLEM STATEMENT
Keep most attributes of this section in-house and within your team
PARSE & MINE DATA
For data that cannot be automated or acquired cleanly, sklearn pipelines or open source workflow tools such as Luigi (Spotify) or Airflow (Airbnb) can ease this process.
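Workflow tools such as Luigi and Airflow run tasks in dependency order. The sketch below does not use either tool's API. It is a standard-library stand-in (`graphlib`, Python 3.9+) with hypothetical task names and toy data, meant only to show the idea of declaring which step depends on which.

```python
from graphlib import TopologicalSorter


# Hypothetical steps; each one reads from and writes to a shared dict.
def acquire(state):
    state["raw"] = [" 3", "5 ", "", "8"]


def clean(state):
    # Drop blank entries and convert the rest to integers.
    state["clean"] = [int(x) for x in state["raw"] if x.strip()]


def derive(state):
    state["total"] = sum(state["clean"])


def report(state):
    state["report"] = f"{len(state['clean'])} rows, total {state['total']}"


tasks = {"acquire": acquire, "clean": clean, "derive": derive, "report": report}

# Map each task to the set of tasks that must finish before it starts.
dependencies = {
    "acquire": set(),
    "clean": {"acquire"},
    "derive": {"clean"},
    "report": {"clean", "derive"},
}


def run_all():
    """Run every task once, in an order that respects the dependencies."""
    state = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name](state)
    return order, state


order, state = run_all()
print(order)
print(state["report"])
```

Real workflow tools add scheduling, retries, and logging on top of this ordering idea, which is where most of their value lies for messy sources.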
PRESENT RESULTS
Adopt platforms that allow iterations and the data mining/parsing process to feed into reports and presentations.
- IPython / Jupyter Notebooks
- Dashboards: Looker, RJMetrics, Tableau
SaaS & PaaS Integration: Customize as the Process Increases in Complexity
COLLABORATE:
What Metrics to Emphasize for Teamwork?
Burn Rate
Most companies do not broadcast it widely, but transparency can put decisions into perspective for the organization. Time and urgency can also be of the essence.
Customer Acquisition Cost (CAC)
Illustrates the market competitiveness of your products and services, as well as market saturation. Social media ad platforms can make up a large portion of these costs.
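As a worked example of burn rate and CAC, the sketch below computes monthly burn, runway, and cost per new customer. The figures and function names are hypothetical, and the definitions used (net burn as spend minus revenue, CAC as acquisition spend divided by new customers) are common conventions rather than something the talk specifies.

```python
def monthly_burn(cash_spent, revenue):
    """Net burn: cash going out minus cash coming in for the month."""
    return cash_spent - revenue


def runway_months(cash_on_hand, burn):
    """Months until cash runs out at the current burn; None if not burning."""
    if burn <= 0:
        return None
    return cash_on_hand / burn


def customer_acquisition_cost(acquisition_spend, new_customers):
    """Sales and marketing spend per newly acquired customer."""
    if new_customers == 0:
        raise ValueError("no new customers in the period")
    return acquisition_spend / new_customers


# Hypothetical monthly figures.
burn = monthly_burn(cash_spent=120_000, revenue=70_000)
runway = runway_months(cash_on_hand=600_000, burn=burn)
cac = customer_acquisition_cost(acquisition_spend=30_000, new_customers=400)

print(f"burn: {burn}, runway: {runway} months, CAC: {cac}")
```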
Gross Profit & Revenue
Actual revenue and profit after expenses, investors, and ongoing costs. If the business model and product are viable, then the company will be able to stand on its own without external capital.
Active Users
Measures the ongoing stickiness of a service or product. Clearly define “active” so that first-time, new, and experimental users are not over-counted. Can the company move beyond early adopters and fans?
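One way to make the definition of “active” explicit is to put it in code. The sketch below assumes a hypothetical rule (a user counts as active if seen on at least two distinct days in the last 30 days) and invented event data. The threshold and window are illustrative choices, not values from the talk.

```python
from datetime import date, timedelta


def active_users(events, as_of, window_days=30, min_days_active=2):
    """Return users seen on at least `min_days_active` distinct days in the window."""
    start = as_of - timedelta(days=window_days)
    days_seen = {}
    for user_id, day in events:
        if start < day <= as_of:
            days_seen.setdefault(user_id, set()).add(day)
    return {u for u, days in days_seen.items() if len(days) >= min_days_active}


# Hypothetical (user, visit date) events.
events = [
    ("ana", date(2016, 6, 1)), ("ana", date(2016, 6, 15)),
    ("ben", date(2016, 6, 20)),                       # a single visit
    ("cy", date(2016, 4, 2)), ("cy", date(2016, 6, 25)),  # first visit is outside the window
    ("dee", date(2016, 6, 10)), ("dee", date(2016, 6, 10)), ("dee", date(2016, 6, 28)),
]

active = active_users(events, as_of=date(2016, 6, 30))
print(sorted(active))
```

Under this rule a one-time visitor is not counted, which keeps trial traffic from inflating the number.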
Churn Rate & Retention
How many people leave or become inactive after a certain period of time? When in the customer’s lifetime is churn more likely to occur? The higher the expected churn rate, the more the company has to spend on acquiring new customers.
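The link between churn and acquisition spend can be shown with a little arithmetic. This is a sketch with hypothetical numbers. It assumes churn is measured as customers lost divided by customers at the start of the period, and that a constant churn rate implies an average lifetime of 1 / churn periods.

```python
def churn_rate(customers_at_start, customers_lost):
    """Share of the starting customers who left during the period."""
    return customers_lost / customers_at_start


def replacement_spend(customers_lost, cac):
    """Acquisition spend needed just to replace the customers who left."""
    return customers_lost * cac


# Hypothetical month: 2,000 customers at the start, 100 lost, CAC of 75.
start, lost = 2_000, 100
churn = churn_rate(start, lost)
retention = 1 - churn

# With a constant churn rate, the average customer lifetime is 1 / churn periods.
expected_lifetime_months = 1 / churn

spend = replacement_spend(lost, cac=75)
print(f"churn: {churn:.1%}, retention: {retention:.1%}")
print(f"expected lifetime: {expected_lifetime_months:.0f} months, replacement spend: {spend}")
```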
Cumulative Growth
Cumulative growth adds a long-term, sustainable perspective to month-over-month growth alone. A focus on short-term growth can take over and cause decision makers to lose sight of an organization’s mission and goals.
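To see how the two views differ, the sketch below compares month-over-month growth with growth over the whole period. The revenue series is invented, and the definitions (cumulative growth from first to last value, plus a compound monthly rate) are one reasonable reading of the metric rather than the talk's own formula.

```python
def month_over_month(values):
    """Growth rate of each month relative to the previous month."""
    return [(curr - prev) / prev for prev, curr in zip(values, values[1:])]


def cumulative_growth(values):
    """Total growth from the first value to the last."""
    return (values[-1] - values[0]) / values[0]


def compound_monthly_growth(values):
    """Constant monthly rate that would produce the same total growth."""
    periods = len(values) - 1
    return (values[-1] / values[0]) ** (1 / periods) - 1


# Hypothetical monthly revenue: one spike, one dip, then steady growth.
revenue = [100.0, 150.0, 120.0, 132.0, 145.2]

mom = month_over_month(revenue)
total = cumulative_growth(revenue)
cmgr = compound_monthly_growth(revenue)

print([f"{g:+.0%}" for g in mom])
print(f"cumulative: {total:.1%}, compound monthly: {cmgr:.1%}")
```

The first month's jump looks dramatic on its own, while the compound rate gives a steadier picture of the same series.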
Response Time
The amount of time teams take to respond to and complete tasks, including bug fixes, technological improvements, product upgrades, and customer service. Responsiveness demonstrates staff and team dedication, effective allocation of resources, operational effectiveness, and an absence of tech debt.
Customer Lifetime Value (CLV)
Total dollars from a customer over the lifetime of the relationship with that customer. It sits at the intersection of purchase frequency, revenue per customer, and acquisition costs. This measure can have predictive qualities.
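A simple way to combine the three ingredients named above is sketched below. The formula (order value times purchase frequency times years retained, minus acquisition cost) is one basic, undiscounted version of CLV, and all numbers and names are hypothetical.

```python
def customer_lifetime_value(avg_order_value, orders_per_year, years_retained, acquisition_cost):
    """Undiscounted revenue over the relationship, net of acquisition cost."""
    lifetime_revenue = avg_order_value * orders_per_year * years_retained
    return lifetime_revenue - acquisition_cost


# Hypothetical customer: 40 per order, 6 orders a year, retained 3 years, CAC of 75.
cac = 75.0
clv = customer_lifetime_value(
    avg_order_value=40.0,
    orders_per_year=6,
    years_retained=3,
    acquisition_cost=cac,
)

# Value returned per unit of acquisition spend.
clv_to_cac = clv / cac

print(f"CLV: {clv}, CLV to CAC ratio: {clv_to_cac:.1f}")
```

Replacing `years_retained` with an estimate derived from churn is one way this measure takes on the predictive quality mentioned above.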
INFLUENCE
How to align and connect goals and expectations?
"Leadership is the art of giving people a platform for spreading ideas that work."
-Seth Godin
Connect the Dots
Communicate, Strategize, Communicate...
Day 1: Good first impressions. Listen and learn!
Day 30: Blueprint for Agile Data Science and Analytics Stack
Day 60: Celebrate improvements to workflow, effectiveness, and access
Day 90: Establish clear measures for success as widespread as possible
Month 6: Democratize data access and streamline measures to external and internal teams
Month 12: Evaluate milestones, iterate and grow
Allocate Time & Resources Effectively
Business as Usual Allocation: 80% Reporting & Urgent Requests, 20% Anything Else
New Data Science Allocation: 80% Data Acquisition & Cleaning, 20% Exploration & Analysis, Reports, & Presentation
GROW YOUR TEAM
When to increase the ability and capabilities of your team?
Technical Project Manager
Data Scientist
Data Engineer
Data Engineer
Analyst Researcher
Team Members
Agile Data Science & Analytics Stack: central to the ability to juggle and balance the responsibilities of being the first/lead data scientist.
Active Listening
Influence
Collaborate with Metrics
Explore
Implement
Grow
Actionable Agile DS/A Stack is Key to Success
@DataThinker WhenThereIsData.com