designing data products

67
Supported by EU projects 29/11/2013 Athens, Greece Open Data for Agriculture Joint offering by Intro to Big Data

Upload: vassilis-protonotarios

Post on 26-Jun-2015

812 views

Category:

Education


2 download

DESCRIPTION

Presentation delivered during the Introductory Course to Big Data in Agriculture. 29/11/2013, NCSR Demokritos, Athens, Greece. The presentation is heavily based on the report titled “Big Data Now: 2012 Edition", by O’Reilly Media, Inc. More info about the event: http://wiki.agroknow.gr/agroknow/index.php/Athens_Green_Hackathon_2013

TRANSCRIPT

Page 1: Designing Data Products

Supported by EU projects

29/11/2013Athens, Greece

Open Data for Agriculture

Joint offering by

Intro to Big Data

Page 2: Designing Data Products

Dr. Vassilis ProtonotariosAgro-Know Technologies, Greece

Designing Data Products

Page 3: Designing Data Products

Slide 3 of 66

Intro

• This presentation provides introductory information about – the (big) data products– the design of (big) data products, – the Drivetrain approach for the design of objective-based

(big) data products.

• The Drivetrain approach will be applied to agricultural case studies in the next session

• The majority of the slides were based on the report “Big Data Now: 2012 Edition. O’Reilly Media, Inc.”

Page 4: Designing Data Products

Slide 4 of 66

Objectives

This presentation aims to:• Provide an introduction to data products• Define the “objective-based data products” concept

– Describe the Drivetrain approach in the design of (big) Data products

– Analyze the design of data products– Provide applications / case studies

In order to provide the methodology for the development of data products

Page 5: Designing Data Products

Slide 5 of 66

Structure of the presentation

1. Intro to designing (great) data products2. Objective-based data products

– The Drivetrain approach

3. Case studies (x4): Application of the Drivetrain approach

4. The future of data products

Page 6: Designing Data Products

INTRO TO DESIGNING GREAT DATA PRODUCTS

Page 7: Designing Data Products

Slide 7 of 66

What is a (big) data product?

• What happens when (big) data becomes a product– specifically, a consumer product

• Produce (big) data based on inputs ?= data producers

• Deliver results based on (big) data ?= data processors

• Uses big data for providing useful outcomes?

Page 8: Designing Data Products

Slide 8 of 66

Facts about (big) data products

• Enable their users to do whatever they want– which most often has little to do with (big) data

• Replace physical products

Page 9: Designing Data Products

Slide 9 of 66

The past: Predictive modeling

• Development of data products based on data predictive modeling– weather forecasting – recommendation engines– email spam filters– services that predict airline flight times

• sometimes more accurately than the airlines themselves.

Page 10: Designing Data Products

Slide 10 of 66

The issue of predictive modeling

• Prediction technology: interesting, useful and mathematically elegant– BUT we need to take the next step because….

• These products just make predictions– instead of asking what action they want someone

to take as a result of a prediction

Page 11: Designing Data Products

Slide 11 of 66

The role of predictive modeling

• Great predictive modeling is still an important part of the solution– but it no longer stands on its own– as products become more sophisticated, it

becomes less useful

Page 12: Designing Data Products

Slide 12 of 66

A new, alternative approach

• the Drivetrain Approach– a four-step approach, already applied in the

industry– inspired by the emerging field of self-driving

vehicles– Objective-based approach

http://www.popularmechanics.com/cars/how-to/repair-questions/1302716

B

A

Page 13: Designing Data Products

Slide 13 of 66

A fine objective-based model

Page 14: Designing Data Products

Slide 14 of 66

Case study

• A user of Google’s self-driving car – is completely unaware of the hundreds (if not

thousands) of models and the petabytes of data that make it work

BUT• It is an increasingly sophisticated product built

by data scientists – they need a systematic design approach

Page 15: Designing Data Products

OBJECTIVE-BASED DATA PRODUCTS

Page 16: Designing Data Products

Slide 16 of 66

The Drivetrain approach

http://cdn.oreilly.com/radar/images/posts/0312-1-drivetrain-approach-lg.png

Page 17: Designing Data Products

Slide 17 of 66

Case study: Search enginesS

ourc

e: h

ttp:

//en

.wik

iped

ia.o

rg/w

iki/

Fil

e:A

ltav

ista

-199

9.pn

g

2013

1997

Why ?

Page 18: Designing Data Products

Slide 18 of 66

The 4 steps in the Drivetrain approach

• The four steps in this transition:– Identify the main objective

• For Google: show the most relevant search results

– Specify the system’s manageable inputs [levers]• For Google: ranking the results

– Consider the data needed for managing the inputs• Information about users’ activities in other web sites

– Building the predictive models• For Google: PageRank algorithm

Page 19: Designing Data Products

Slide 19 of 66

Drivetrain approach goal

NOTuse data not just to generate more data

– especially in the form of predictions

BUTuse data to produce actionable outcomes

Page 20: Designing Data Products

[CASE STUDY 1] THE MODEL ASSEMBLY LINE: A CASE STUDY OF INSURANCE COMPANIES

Page 21: Designing Data Products

Slide 21 of 66

The issue of insurance companies

• Case study: Insurance companies– Their objective: maximizing the profit from each

policy price– An optimal pricing model is to them what the

assembly line is to automobile manufacturing– Despite their long experience in prediction, they

often fail to make optimal business decisions about what price to charge each new customer

Page 22: Designing Data Products

Slide 22 of 66

Transition to Drivetrain approach

• Identifying solutions to this issue– Optimal Decisions Group (ODG) approached this

problem with an early use of the Drivetrain Approach

– Resulted in a practical take on step 4 that can be applied to a wide range of problems.

Model Assembly Line

Page 23: Designing Data Products

Slide 23 of 66

The Drivetrain approach

Set a price for policies, maximizing profit

• price to charge each customer; • types of accidents to cover;• how much to spend on marketing and customer service; • how to react to their competitors’ pricing decisions

• Data collected from real experiments on customers• Randomly changing the prices of hundreds of thousands of policies over many months

• Develop a probability model for optimizing the insurer’s profit

Page 24: Designing Data Products

Slide 24 of 66

Developing the modeler: Model Assembly Line

http

://c

dn.o

reil

ly.c

om/r

adar

/im

ages

/pos

ts/0

312-

2-dr

ivet

rain

-ste

p4-l

g.pn

g

Page 25: Designing Data Products

Slide 25 of 66

The role of the Modeler

• Modeler Component 1• model of price elasticity: the probability that a customer will

accept a given price (for new policies and renewals)

• Modeler Component 2• relates price to the insurance company’s profit, conditional on

the customer accepting this price.

• Multiplying these two curves creates a final curve • Shows price versus expected profit• The final curve has a clearly identifiable local maximum that

represents the best price to charge a customer for the first year.

Page 26: Designing Data Products

Slide 26 of 66

The final curve

http://strata.oreilly.com/2012/03/drivetrain-approach-data-products.html

Page 27: Designing Data Products

Slide 27 of 66

The role of the Simulator

• Lets ODG ask the “what if ” questions• to see how the levers affect the distribution of the final

outcome

• Runs the models over a wide range of inputs• The operator can adjust the input levers to answer specific

questions• “What will happen if our company offers the customer a low

teaser price in year one but then raises the premiums in Y2?

• Explores the distribution of profit as affected the inputs outside of the insurer’s control

• E.g. “What if the economy crashes and the customer loses his job?”

Page 28: Designing Data Products

Slide 28 of 66

The role of the Optimizer

• takes the surface of possible outcomes and identifies the highest point

• finds the best outcomes• identify catastrophic outcomes

– and show how to avoid them

Page 29: Designing Data Products

Slide 29 of 66

Take-home message

Using a Drivetrain Approach combined with a Model Assembly Line bridges the gap

between predictive models and actionable outcomes.

Page 30: Designing Data Products

[CASE STUDY 2] MARKETING INDUSTRY RECOMMENDER SYSTEMS

Page 31: Designing Data Products

Slide 31 of 66

Recommendation engines

• Recommendation engines– data product based on well-built predictive

models that do not achieve an optimal objective.– The current algorithms predict what products a

customer will like, • based on purchase history and the histories of similar

customers.

Page 32: Designing Data Products

Slide 32 of 66

The case of Amazon

• Amazon represents every purchase that has ever been made as a giant sparse matrix– with customers as the rows and products as the

columns. • Once they have the data in this format, data

scientists apply some form of collaborative filtering to “fill in the matrix.”

Page 33: Designing Data Products

Slide 33 of 66

The case of Amazon

• Such models are good at predicting whether a customer will like a given product– but they often suggest products that the customer

already knows about or has already decided not to buy.

http://strata.oreilly.com/2012/03/drivetrain-approach-data-products.html

Page 34: Designing Data Products

Slide 34 of 66

Mixed-up recommendations

Page 35: Designing Data Products

Slide 35 of 66

The Drivetrain approachDrive additional salesby surprising and delighting the customer with books not initially considered without the recommendation

• Ranking of the recommendations

Data to derive from many randomized experiments about a wide range of recommendations for a wide range of customers: To generate recommendations that will cause new sales

• Develop an algorithm providing recommendations which escape a recommendation filter bubble

Page 36: Designing Data Products

Slide 36 of 66

Developing the modeler

http

://c

dn.o

reil

ly.c

om/r

adar

/im

ages

/pos

ts/0

312-

2-dr

ivet

rain

-ste

p4-l

g.pn

g

Page 37: Designing Data Products

Slide 37 of 66

The role of the Modeler

• Modeler Component 1• purchase probabilities, conditional on seeing a recommendation

• Modeler Component 2• purchase probabilities, conditional on not seeing a recommendation

• The difference between these two probabilities is a utility function for a given recommendation to a customer

• Low in cases where the algorithm recommends a familiar book that the customer has already rejected (both components are small) or a book that he/she would have bought even without the recommendation

Page 38: Designing Data Products

Slide 38 of 66

The final curve

http://strata.oreilly.com/2012/03/drivetrain-approach-data-products.html

Page 39: Designing Data Products

Slide 39 of 66

The role of the Simulator

• Test the utility of each of the many possible books in stock

• Alternatively just overall the outputs of a collaborative filtering model of similar customer purchases

Page 40: Designing Data Products

Slide 40 of 66

The role of the Optimizer

• Rank and display the recommended books based on their simulated utility

• Less emphasis on the “function” and more on the “objective.”– What is the objective of the person using our data

product? – What choice are we actually helping him or her

make?

Page 41: Designing Data Products

[CASE STUDY 3] OPTIMIZING LIFETIME CUSTOMER VALUE

Page 42: Designing Data Products

Slide 42 of 66

Customer value

• Includes all interactions between a retailer and his customers outside the actual buy-sell transaction– making a product recommendation– encouraging the customer to check out a new feature of

the online store– sending sales promotions

• Making the wrong choices comes at a cost to the retailer– reduced margins (discounts that do not drive extra

sales)– opportunity costs

Page 43: Designing Data Products

Slide 43 of 66

The Drivetrain approachoptimize the lifetime value from each

customer• Product recommendations • Offer tailored discounts / special offers on products• Make customer-care calls just to see how the user is• Invite them to use the site and ask for their feedback

Zafu approach: not send customers directly to clothes but ask a series of simple questions about the customers’ body type, how well their other jeans fit and their fashion preferences

• Develop an algorithm leading customer to browse a recommended selection of Zafu’s inventory

Page 44: Designing Data Products

Slide 44 of 66

Page 45: Designing Data Products

Slide 45 of 66

Developing the modeler

http

://c

dn.o

reil

ly.c

om/r

adar

/im

ages

/pos

ts/0

312-

2-dr

ivet

rain

-ste

p4-l

g.pn

g

Page 46: Designing Data Products

Slide 46 of 66

The role of the Modeler

• Modeler Component 1• purchase probabilities, conditional on seeing a

recommendation

• Modeler Component 2• purchase probabilities, conditional on not seeing a

recommendation

• Modeler Component 3• price elasticity model to test how offering a discount might

change the probability that the customer will buy the item

• Modeler Component 4• patience model for the customers’ tolerance for poorly

targeted communications

Page 47: Designing Data Products

Slide 47 of 66

The final curve

http://strata.oreilly.com/2012/03/drivetrain-approach-data-products.html

Page 48: Designing Data Products

Slide 48 of 66

The role of the Simulator

• Test the utility of each of the many possible clothes available

• Provide successful matches between questions & recommendations

Page 49: Designing Data Products

Slide 49 of 66

The role of the Optimizer

• Rank and display the recommended clothes based on their simulated utility– driving sales and improving the customer experience

• Less emphasis on the “function” and more on the “objective”– What is the objective of the person using our data

product? – What choice are we actually helping him or her make?

Page 50: Designing Data Products

[CASE STUDY 4] REAL LIFE APPLICATION: SELF-DRIVING CAR

Page 51: Designing Data Products

Slide 51 of 66

Building a car that drives itself (1/2)

• Alternative approach: Instead of being data driven, we can now let the data drive us!

• Models required:– model of distance / speed-limit to predict arrival

time; a ruler and a road map needed– model for traffic congestion– model to forecast weather conditions and their

effect on the safest maximum speed

Page 52: Designing Data Products

Slide 52 of 66

Building a car that drives itself (2/2)

Plenty of cool challenges in building these models but by themselves, they do not take us to our destination

• Simulator: to predict the drive times along various routes

• Optimizer: pick the shortest route subject to constraints like avoiding tolls or maximizing gas mileage

Page 53: Designing Data Products

Slide 53 of 66

It is already implemented

According to Google, about a dozen self-driving cars are on the road at any given time. They've already

logged more than 500,000 miles in beta tests.

Page 54: Designing Data Products

Slide 54 of 66

The Drivetrain approach

Building a car that drives itself

Vehicle controls• Steering wheel, Accelerator, Βrakes

Data from sensors etc.• sensors that gather data about the road • cameras that detect road signs, red or green lights & unexpected

obstacles

• Physics models to predict the effects of steering, braking & acceleration • Pattern recognition algorithms to interpret data from the road signs

Page 55: Designing Data Products

Slide 55 of 66

Developing the Modeler

http://cdn.oreilly.com/radar/images/posts/0312-2-drivetrain-step4-lg.png

Page 56: Designing Data Products

Slide 56 of 66

The role of the Modeler

• Modeler Component 1• Route selection, conditional on following a

recommendation

• Modeler Component 2• Route selection, conditional on not following a

recommendation

Page 57: Designing Data Products

Slide 57 of 66

The role of the Simulator

• examine the results of the possible actions the self-driving car could take– If it turns left now, will it hit that pedestrian? – If it makes a right turn at 55 km/h in these

weather conditions, will it skid off the road?

• Merely predicting what will happen isn’t good enough.

Page 58: Designing Data Products

Slide 58 of 66

The role of the Optimizer

Prediction only tells us that there is going to be an accident.

An optimizer tells us how to avoid accidents.

• optimize the results of the simulation– to pick the best combination of acceleration and

braking, steering and signaling

Page 59: Designing Data Products

THE FUTURE FOR DATA PRODUCTS

Page 60: Designing Data Products

Slide 60 of 66

The present

• Drivetrain Approach: – a framework for designing the next generation of

great data products – heavily relies on optimization

• A need for the data science community to educate others – on how to derive value from their predictive

models– Based on product design process

Page 61: Designing Data Products

Slide 61 of 66

Current status of data products

• Data continuously provided in big data providers– Facebook, Twitter etc.

• Data are transformed -> they do not look like data in the end– Telematics, booking systems etc.

• Example: Music now lives on the cloud– Amazon, Apple, Google, or Spotify

Page 62: Designing Data Products

Slide 62 of 66

The future

• Optimization taught in business schools & statistics departments.

• Data scientists ship products designed to produce desirable business outcomes

Risk: Models using data to create more data, rather than using data to create actions, disrupt industries, and transform lives.

Page 63: Designing Data Products

Slide 63 of 66

To keep in mind for future big data products

• when building a data product, it is critical to integrate designers into the engineering team from the beginning.

• Data products frequently have special challenges around inputting or displaying data.

Page 64: Designing Data Products

Slide 64 of 66

What to expect in the future?

Google needs to move beyond the current search format of you entering a query and getting 10 results.

The ideal would be us knowing what you want before you search for it…

Eric Schmidt Executive Chairman of Google

Page 65: Designing Data Products

Slide 65 of 66

The future is near!

25/11/2013

Page 66: Designing Data Products

Slide 66 of 66

References

• Big Data Now: 2012 Edition. O’Reilly Media, Inc.• O’Reilly Strata: Making Data Work (

http://strata.oreilly.com/tag/big-data)

• Jeremy Howard - The Drivetrain Approach: A four-step process for building data products (http://strata.oreilly.com/2012/03/drivetrain-approach-data-products.html)

• Mike Loukides - The evolution of data products (http://strata.oreilly.com/2011/09/evolution-of-data-products.html)

• Wikipedia: Big data (http://en.wikipedia.org/wiki/Big_data)

Page 67: Designing Data Products

Thank you!

Vassilis Protonotarios

Agro-Know Technologies

[email protected]