Developing Data Products

Posted on 11-May-2015


DESCRIPTION

Examples, techniques, and lessons learned from building data products over the last 3 years at LinkedIn. Pete Skomoroch is a Principal Data Scientist at LinkedIn, where he leads a team focused on building data products that leverage LinkedIn's powerful identity and reputation data. The talk describes techniques and best practices used to develop products such as LinkedIn Skills & Endorsements. This was the inaugural UberData Tech Talk, held at Uber HQ in SF.

TRANSCRIPT

©2012 LinkedIn Corporation. All Rights Reserved.

Developing Data Products
Uber Tech Talk
Pete Skomoroch, @peteskomoroch
December 5, 2012

Developing Data Products: Examples, Techniques, & Lessons Learned

Our Mission: Connect the world’s professionals to make them more productive and successful.

Our Vision: Create economic opportunity for every professional in the world.

Members First!


LinkedIn is the leading professional network site

Worldwide Workforce: 3,300M+

Worldwide Professionals: 640M+

LinkedIn Members: 187M+


LinkedIn profiles represent our professional identity

187M Members, 187M Member Profiles


We have a lot of data. And (like everyone else), we store it in Hadoop. And people build awesome things with that data.

What do we mean by data products?


Building products from data at LinkedIn

A few examples:

People You May Know
Skills and Endorsements
Year in Review
Network Updates Digest
InMaps
Who’s viewed my profile
Collaborative Filtering
Groups You May Like
and more…


Collaborative Filtering: LinkedIn Skill Pages
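Item-to-item collaborative filtering is one common way to produce a “related skills” list: two skills are treated as related when many of the same members list both. The sketch below assumes cosine similarity over the sets of members who list each skill; the toy `profiles` data and the `related_skills` helper are hypothetical and are not LinkedIn's implementation.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: each profile is the set of skills one member lists.
profiles = [
    {"hadoop", "pig", "java"},
    {"hadoop", "hive", "pig"},
    {"python", "statistics", "r"},
    {"python", "hadoop", "pig"},
]

def related_skills(profiles, skill, top_n=3):
    """Rank other skills by cosine similarity of the member sets that list them."""
    members = defaultdict(set)  # skill -> indices of profiles listing it
    for i, skills in enumerate(profiles):
        for s in skills:
            members[s].add(i)
    target = members[skill]
    scores = {}
    for other, m in members.items():
        if other == skill:
            continue
        overlap = len(target & m)
        if overlap:
            scores[other] = overlap / sqrt(len(target) * len(m))
    # Highest similarity first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda s: (-scores[s], s))[:top_n]

print(related_skills(profiles, "hadoop"))  # ['pig', 'hive', 'java']
```

Dividing by the geometric mean of the two member counts keeps a very popular skill from looking related to everything.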


Classification: giving structure to unstructured data
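As an illustration of classification, the sketch below assigns free-text job titles to a coarse job function using a multinomial naive Bayes model with add-one smoothing. The labels, the training examples, and the `train`/`classify` helpers are hypothetical; the talk does not say which classifier LinkedIn used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled examples: free-text job title -> job function.
training = [
    ("senior software engineer", "engineering"),
    ("software developer", "engineering"),
    ("data engineer", "engineering"),
    ("account executive", "sales"),
    ("regional sales manager", "sales"),
    ("sales associate", "sales"),
]

def train(examples):
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the highest smoothed log-posterior."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)  # log prior
        for w in text.split():
            # Add-one smoothing so unseen words do not zero out a label.
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(training)
print(classify("software engineer", model))  # engineering
```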


Clustering & Disambiguation
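For disambiguation, a simple heuristic is to compare an ambiguous string against short descriptions of each possible sense and pick the one that best matches its context, in the spirit of the Lesk algorithm. In this sketch the context is the member's other skills; the `SENSES` table and the `disambiguate` helper are hypothetical, and the clustering half of the slide is not shown.

```python
# Hypothetical sense inventory: each sense of an ambiguous skill string is
# described by "signature" skills that tend to co-occur with it.
SENSES = {
    "python": {
        "python (programming language)": {"django", "numpy", "java", "linux", "pandas"},
        "python (herpetology)": {"reptiles", "zoology", "animal care", "biology"},
    },
}

def disambiguate(term, context):
    """Pick the sense whose signature overlaps most with the context skills
    (a simplified Lesk-style heuristic); return None for unknown terms."""
    senses = SENSES.get(term)
    if not senses:
        return None
    return max(senses, key=lambda sense: len(senses[sense] & context))

print(disambiguate("python", {"django", "linux", "sql"}))  # python (programming language)
```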


De-duplication and Normalization
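Normalization maps the many spellings of one skill onto a canonical form so duplicates can be merged. The sketch below assumes a small hand-written `SYNONYMS` table and a handful of text-cleaning rules; both are hypothetical, and a production system would need a much larger, data-driven mapping.

```python
import re
import unicodedata

# Hypothetical synonym table: variant spelling -> canonical skill name.
SYNONYMS = {
    "ms excel": "microsoft excel",
    "excel": "microsoft excel",
    "ml": "machine learning",
}

def normalize(raw):
    """Strip accents, lowercase, drop punctuation, collapse whitespace,
    then apply the synonym table."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s+#]", " ", text.lower())  # keep '+' and '#' for C++ / C#
    text = " ".join(text.split())
    return SYNONYMS.get(text, text)

def dedupe(raw_skills):
    """Return unique canonical skills, preserving first-seen order."""
    seen, unique = set(), []
    for raw in raw_skills:
        canon = normalize(raw)
        if canon and canon not in seen:
            seen.add(canon)
            unique.append(canon)
    return unique

print(dedupe(["MS Excel", "Excel", "Machine-Learning", "ML", "C++", "Café Management"]))
# ['microsoft excel', 'machine learning', 'c++', 'cafe management']
```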


Network Algorithms: Relevance & Ranking
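A common baseline for ranking connection suggestions is triangle closing: rank second-degree contacts by how many first-degree connections they share with the member. The graph and the `suggest_connections` helper below are hypothetical, and the slide does not say which ranking signals LinkedIn actually combines.

```python
from collections import Counter

# Hypothetical undirected connection graph: member -> set of connections.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}

def suggest_connections(graph, member, top_n=5):
    """Rank second-degree contacts by the number of shared connections."""
    first_degree = graph[member]
    counts = Counter()
    for friend in first_degree:
        for candidate in graph[friend]:
            if candidate != member and candidate not in first_degree:
                counts[candidate] += 1
    # Most shared connections first; alphabetical tie-break for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

print(suggest_connections(graph, "ann"))  # [('dan', 2), ('eve', 1)]
```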


Prediction: Personalized Skill Recommendations
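One simple way to personalize skill suggestions is to let each skill a member already lists vote for the skills that tend to co-occur with it, weighting each vote by the estimated conditional probability P(candidate | owned skill). The data and the `recommend` helper are hypothetical; this is a sketch of the idea, not the model behind LinkedIn's recommendations.

```python
from collections import Counter
from itertools import permutations

# Hypothetical training data: skill sets from other members' profiles.
profiles = [
    {"hadoop", "pig", "java"},
    {"hadoop", "hive", "pig"},
    {"hadoop", "hive"},
    {"python", "statistics", "r"},
    {"python", "hadoop"},
]

# Counts of single skills and of ordered skill pairs on the same profile.
skill_counts = Counter(s for p in profiles for s in p)
pair_counts = Counter(pair for p in profiles for pair in permutations(p, 2))

def recommend(member_skills, top_n=3):
    """Score unowned skills by summing P(candidate | owned) over owned skills."""
    scores = Counter()
    for owned in member_skills:
        if skill_counts[owned] == 0:
            continue  # no evidence for a skill never seen in training
        for candidate in skill_counts:
            if candidate not in member_skills:
                scores[candidate] += pair_counts[(owned, candidate)] / skill_counts[owned]
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [skill for skill, score in ranked if score > 0][:top_n]

print(recommend({"hadoop"}))  # ['hive', 'pig', 'java']
```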


Skill Endorsements


Social Proof and the Skill Endorsement Graph
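Endorsements can be stored as a graph of (endorser, endorsee, skill) edges. The sketch below, built on hypothetical edges and a hypothetical `endorsement_summary` helper, counts distinct endorsers per skill for one member while ignoring duplicate edges and self-endorsements. That is the kind of aggregate a social-proof display could show.

```python
from collections import defaultdict

# Hypothetical endorsement edges: (endorser, endorsee, skill).
endorsements = [
    ("bob", "ann", "hadoop"),
    ("cat", "ann", "hadoop"),
    ("bob", "ann", "hadoop"),   # duplicate edge, counted once
    ("dan", "ann", "python"),
    ("ann", "ann", "python"),   # self-endorsement, ignored
]

def endorsement_summary(edges, member):
    """Return (skill, distinct endorser count) pairs, most-endorsed first."""
    endorsers = defaultdict(set)  # skill -> distinct endorsers
    for endorser, endorsee, skill in edges:
        if endorsee == member and endorser != member:
            endorsers[skill].add(endorser)
    counts = {skill: len(people) for skill, people in endorsers.items()}
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

print(endorsement_summary(endorsements, "ann"))  # [('hadoop', 2), ('python', 1)]
```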


The Economic Graph: Skills, Jobs, People, Locations…

Time, Location

Lessons learned developing data products

Collect the right data at the right time


Large amounts of data can reveal new patterns

[Figure: probability of job title (y-axis) vs. months since graduation (x-axis)]
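A curve like the one in this figure can be estimated directly from counts: group members by time since graduation and compute the fraction holding each title. The records, the yearly bucketing, and the `title_probabilities` helper below are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (months_since_graduation, job_title) observations.
records = [
    (3, "analyst"), (5, "analyst"), (8, "analyst"), (10, "associate"),
    (14, "associate"), (20, "associate"), (26, "manager"), (30, "associate"),
]

def title_probabilities(records, bucket_months=12):
    """Empirical P(title | time-since-graduation bucket); buckets are years by default."""
    by_bucket = defaultdict(Counter)
    for months, title in records:
        by_bucket[months // bucket_months][title] += 1
    result = {}
    for bucket, titles in sorted(by_bucket.items()):
        total = sum(titles.values())
        result[bucket] = {t: n / total for t, n in titles.items()}
    return result

for bucket, dist in title_probabilities(records).items():
    print(bucket, dist)
```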


Be wary of “black-box” approaches


Look at your data


Aggregate statistics can be misleading
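Simpson's paradox is a classic case of a misleading aggregate: a variant can win overall while losing in every segment, because the segments are mixed in different proportions. The counts below are illustrative, borrowed from the textbook kidney-stone example and relabeled as two hypothetical product variants and member segments.

```python
# Hypothetical (successes, trials) per variant, split by member segment.
data = {
    "A": {"new members": (81, 87), "established": (192, 263)},
    "B": {"new members": (234, 270), "established": (55, 80)},
}

def rate(successes, trials):
    return successes / trials

def overall_rate(segments):
    """Pool all segments before dividing, as a single aggregate metric would."""
    successes = sum(s for s, _ in segments.values())
    trials = sum(t for _, t in segments.values())
    return rate(successes, trials)

for variant, segments in data.items():
    per_segment = {name: round(rate(*st), 3) for name, st in segments.items()}
    print(variant, "overall:", round(overall_rate(segments), 3), "by segment:", per_segment)
```

Variant A has the higher rate in both segments, yet B has the higher overall rate because most of B's trials fall in the segment where success is easier.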


Build a viewer app, “micro-listen”
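A viewer app can start as small as a function that pulls a reproducible random sample of raw records and prints them in a readable form, so people inspect individual examples rather than only aggregates. The record fields and both helpers below are hypothetical.

```python
import random

def sample_for_review(records, n=5, seed=0):
    """Draw a reproducible random sample of raw records for manual inspection."""
    rng = random.Random(seed)
    return rng.sample(records, min(n, len(records)))

def render(record):
    """Format one record as aligned 'field : value' lines for quick eyeballing."""
    width = max(len(k) for k in record)
    return "\n".join(f"{k.ljust(width)} : {v!r}" for k, v in record.items())

# Hypothetical raw rows: (member, raw skill string, normalized skill or None).
records = [
    {"member_id": 1, "raw_skill": "MS Excel", "normalized": "microsoft excel"},
    {"member_id": 2, "raw_skill": "Pythn", "normalized": None},
    {"member_id": 3, "raw_skill": "hadoop", "normalized": "hadoop"},
]

for rec in sample_for_review(records, n=2):
    print(render(rec))
    print("-" * 40)
```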


Algorithmic intuition: include data geeks in design


OODA: Think like a jet fighter


OODA: Observe, Orient, Decide, Act


OODA: The speed you can move determines victory


Red teaming: what can go wrong, likely will


Error data is super valuable: analyze it and adapt
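One way to put error data to work is to aggregate user feedback on suggestions and look for the items members reject most often, which can point to bad normalizations or weak signals. The feedback log, the action labels, and the `rejection_rates` helper below are hypothetical.

```python
from collections import Counter

# Hypothetical feedback log: (suggested_skill, member_action).
feedback = [
    ("java", "accepted"), ("java", "rejected"), ("ms office", "rejected"),
    ("ms office", "rejected"), ("hadoop", "accepted"), ("ms office", "accepted"),
    ("hadoop", "accepted"),
]

def rejection_rates(log, min_count=2):
    """Per-suggestion rejection rate, highest first, skipping rarely shown items."""
    shown = Counter(skill for skill, _ in log)
    rejected = Counter(skill for skill, action in log if action == "rejected")
    rates = {s: rejected[s] / n for s, n in shown.items() if n >= min_count}
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))

print(rejection_rates(feedback))
```

The `min_count` threshold keeps items shown only once or twice from dominating the list with noisy 100% rates.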


Conclusion: tips for developing data products

Collect the right data at the right time
Large amounts of data can reveal new patterns
Be wary of “black-box” approaches
Look at your raw data
Aggregate statistics can be misleading
Build and use viewer apps
Include data geeks in the design process
OODA: Think like a jet fighter
Red teaming: anticipate edge cases
Find opportunity in your error data


Questions?

More info: data.linkedin.com, @peteskomoroch
