![Page 1: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/1.jpg)
©2012 LinkedIn Corporation. All Rights Reserved.
How to Win Friends and Influence People (with Hadoop)
Strata Conference New York
Sam Shah and Joseph Adler
October 25, 2012
![Page 2: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/2.jpg)
Sam Shah
Principal Engineer and Engineering Manager
www.linkedin.com/in/shahsam

Joseph Adler
Senior Data Scientist
www.linkedin.com/in/josephadler
![Page 3: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/3.jpg)
LinkedIn is the leading professional network site
– Worldwide Workforce: 3,300M+
– Worldwide Professionals: 640M+
– LinkedIn Members: 175M+
![Page 4: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/4.jpg)
Data rich
– 175M+ Members
– 175M Member Profiles
![Page 5: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/5.jpg)
– 9.3B Page Views per Quarter
– 130M Unique Visitors
![Page 6: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/6.jpg)
![Page 7: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/7.jpg)
![Page 8: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/8.jpg)
We have a lot of data.
We want to leverage this data to build products.
How do you make it easy to build products from data?
![Page 9: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/9.jpg)
Products we have built on Hadoop
![Page 10: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/10.jpg)
Building products from data
Examples of products built with data
– Year in Review Email
– Network Updates
– Skills and Endorsements
– People You May Know
– and more…
![Page 11: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/11.jpg)
Year in Review
One of the most successful email messages ever.
– 20% response rate
– 5 clicks per responder
![Page 12: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/12.jpg)
Network updates
![Page 13: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/13.jpg)
People you may know
![Page 14: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/14.jpg)
Skills and Endorsements
![Page 15: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/15.jpg)
Building products from data
Hadoop is awesome for building products with data
– Lots of cheap storage
– Vast computational resources
– Lots of tools for processing data and learning from data
– Shared infrastructure
– Shared support services
– Runs on commodity hardware (or AWS)
![Page 16: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/16.jpg)
Leverage
The marginal cost of building new products is low
– People You May Know (2 people)
– Skills and Endorsements (2 people)
– Year in Review (1 person, 1 month)
– Network Updates Stream (1 person, 3 months)
Hadoop can empower small teams to build things
![Page 17: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/17.jpg)
![Page 18: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/18.jpg)
How we build products
Turning data into products
![Page 19: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/19.jpg)
Year in Review
Steps to make the email
– Collect job changers
– Figure out who is connected to them
– Rank job changes
![Page 20: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/20.jpg)
Example: Year in Review
memberPosition = LOAD '$latest_positions' USING BinaryJSON;
memberWithPositionsChangedLastYear = FOREACH (
    FILTER memberPosition BY ((start_date >= $start_date_low) AND (start_date <= $start_date_high))
  ) GENERATE member_id, start_date, end_date;

allConnections = LOAD '$latest_bidirectional_connections' USING BinaryJSON;

allConnectionsWithChange_nondistinct = FOREACH (
    JOIN memberWithPositionsChangedLastYear BY member_id, allConnections BY dest
  ) GENERATE allConnections::source AS source, allConnections::dest AS dest;

allConnectionsWithChange = DISTINCT allConnectionsWithChange_nondistinct;

memberinfowpics = LOAD '$latest_memberinfowpics' USING BinaryJSON;
pictures = FOREACH (
    FILTER memberinfowpics BY ((cropped_picture_id is not null) AND
      ((member_picture_privacy == 'N') OR (member_picture_privacy == 'E')))
  ) GENERATE member_id, cropped_picture_id,
    first_name as dest_first_name, last_name as dest_last_name;

resultPic = JOIN allConnectionsWithChange BY dest, pictures BY member_id;
connectionsWithChangeWithPic = FOREACH resultPic GENERATE
  allConnectionsWithChange::source AS source_id,
  allConnectionsWithChange::dest AS member_id,
  pictures::cropped_picture_id AS pic_id,
  pictures::dest_first_name AS dest_first_name,
  pictures::dest_last_name AS dest_last_name;

joinResult = JOIN connectionsWithChangeWithPic BY source_id, memberinfowpics BY member_id;
withName = FOREACH joinResult GENERATE
  connectionsWithChangeWithPic::source_id AS source_id,
  connectionsWithChangeWithPic::member_id AS member_id,
  connectionsWithChangeWithPic::dest_first_name as first_name,
  connectionsWithChangeWithPic::dest_last_name as last_name,
  connectionsWithChangeWithPic::pic_id AS pic_id,
  memberinfowpics::first_name AS firstName,
  memberinfowpics::last_name AS lastName,
  memberinfowpics::gmt_offset as gmt_offset,
  memberinfowpics::email_locale as email_locale,
  memberinfowpics::email_address as email_address;

resultGroup0 = GROUP withName BY (source_id, firstName, lastName, email_address, email_locale, gmt_offset);

-- get the count of results per recipient
resultGroupCount = FOREACH resultGroup0 GENERATE group, withName as toomany, COUNT_STAR(withName) as num_results;
resultGroupPre = filter resultGroupCount by num_results > 2;
resultGroup = FOREACH resultGroupPre {
  withName = LIMIT toomany 64;
  GENERATE group, withName, num_results;
}

x_in_review_pre_out = FOREACH resultGroup GENERATE
  FLATTEN(group) as (source_id, firstName, lastName, email_address, email_locale, gmt_offset),
  withName.(member_id, pic_id, first_name, last_name) as jobChanger,
  '2011' as changeYear:chararray,
  num_results as num_results;

x_in_review = FOREACH x_in_review_pre_out GENERATE
  source_id as recipientID,
  gmt_offset as gmtOffset,
  firstName as first_name,
  lastName as last_name,
  email_address,
  email_locale,
  TOTUPLE(changeYear, source_id, firstName, lastName, num_results, jobChanger) as body;

rmf $xir;
STORE x_in_review INTO '$xir' USING BinaryJSON('recipientID');
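For readers who do not know Pig, the core of the script (filter job changers, join against connections, de-duplicate, group per recipient, keep recipients with more than two results, cap each list at 64) can be sketched in plain Python. This is only an illustration on made-up toy data: the picture and name joins are left out, and the function name `year_in_review` and its parameters are assumptions, not part of the original job.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy inputs; field names follow the Pig script.
positions = [
    {"member_id": 1, "start_date": date(2011, 3, 1)},
    {"member_id": 2, "start_date": date(2011, 6, 15)},
    {"member_id": 3, "start_date": date(2011, 9, 30)},
    {"member_id": 4, "start_date": date(2009, 1, 1)},  # outside the date window
]
# Bidirectional connections as (source, dest) pairs; one pair is duplicated on purpose.
connections = [(10, 1), (10, 2), (10, 3), (10, 4), (11, 1), (11, 2), (10, 1)]


def year_in_review(positions, connections, low, high, min_results=3, cap=64):
    """Map each recipient to the job changers among their connections."""
    # FILTER: members whose position started inside the window.
    changers = {p["member_id"] for p in positions if low <= p["start_date"] <= high}
    # JOIN on dest, then DISTINCT on (source, dest).
    pairs = {(src, dst) for src, dst in connections if dst in changers}
    # GROUP BY source (the email recipient).
    by_recipient = defaultdict(list)
    for src, dst in sorted(pairs):
        by_recipient[src].append(dst)
    # num_results > 2 is the same as at least 3; LIMIT caps the list at 64.
    return {
        src: {"num_results": len(dsts), "jobChanger": dsts[:cap]}
        for src, dsts in by_recipient.items()
        if len(dsts) >= min_results
    }


result = year_in_review(positions, connections, date(2011, 1, 1), date(2011, 12, 31))
print(result)
```

In this toy run only recipient 10 has three job changers, so recipient 11 (with two) gets no email.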
![Page 21: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/21.jpg)
Example: Year in Review
{body={num_results=80, lastName=Adler, changeYear=2011, firstName=Joseph, jobChanger=[{last_name=O'Connor, first_name=Br?on, member_id=12562482, pic_id=/p/3/000/086/1bd/10ee035.jpg}, {last_name=Sundaram, first_name=Vivek, member_id=6590171, pic_id=/p/3/000/0ae/354/36eb54c.jpg}, {last_name=Crane, first_name=Patrick, member_id=8628324, pic_id=/p/1/000/09c/064/10191de.jpg}, {last_name=McLennan, first_name=Dan, member_id=10551114, pic_id=/p/2/000/09d/12f/147def1.jpg}, {last_name=Shaughnessy, first_name=Helen, member_id=2211035, pic_id=/p/3/000/06d/2ba/06a113c.jpg}, {last_name=Chen, first_name=Richard, member_id=12800647, pic_id=/p/2/000/007/1ad/0fb84f9.jpg}, {last_name=Barba, first_name=Troy, member_id=27577, pic_id=/p/2/000/0a2/3e9/3a83a33.jpg}, {last_name=Reed, first_name=Harper, member_id=1865420, pic_id=/p/1/000/001/17b/396a2c3.jpg}, {last_name=Goldstein, first_name=Peter, member_id=205610, pic_id=/p/2/000/01c/2e6/042999f.jpg}, {last_name=Koren, first_name=Yuval, member_id=2289577, pic_id=/p/1/000/02b/3d3/1fc3627.jpg}, {last_name=Kiang, first_name=Andy, member_id=8347, pic_id=/p/1/000/063/115/1256f61.jpg}, {last_name=Greenfield, first_name=Nick, member_id=82814545, pic_id=/p/1/000/068/39f/2080b8f.jpg}, {last_name=Murarka, first_name=Bubba, member_id=174233, pic_id=/p/3/000/011/2c8/33837b8.jpg}, {last_name=Kutter, first_name=Norbert, member_id=310933, pic_id=/p/3/000/005/0e2/02775a0.jpg}, {last_name=Ehrenberg, first_name=Roger, member_id=1662181, pic_id=/p/3/000/038/066/3572baf.jpg}, {last_name=Coderre, CISSP, first_name=Rob, member_id=68521, pic_id=/p/1/000/088/0d5/2438981.jpg}, {last_name=Stephens, first_name=Bradford, member_id=10900447, pic_id=/p/1/000/0ad/0dc/15f9df5.jpg}, {last_name=Shiau, first_name=Peter, member_id=300654, pic_id=/p/2/000/056/2a6/18938e3.jpg}, {last_name=Rajan, first_name=Arvind, member_id=1260, pic_id=/p/3/000/019/3f7/1e6e0f2.jpg}, {last_name=Bellister, first_name=Jesse, member_id=25234604, pic_id=/p/3/000/00a/17d/1e2136b.jpg}, 
{last_name=Mohan, first_name=Viraj, member_id=56817108, pic_id=/p/3/000/0cd/0a4/097527a.jpg}, {last_name=Ragade, first_name=Dhananjay, member_id=325284, pic_id=/p/3/000/000/035/0504fe7.jpg}, {last_name=Richards, first_name=Jeff, member_id=16762, pic_id=/p/2/000/039/14e/081d1c7.jpg}, {last_name=Wittenauer, first_name=Allen, member_id=3328775, pic_id=/p/3/000/08d/2a3/307b112.jpg}, {last_name=Porzak, first_name=Jim, member_id=1708710, pic_id=/p/2/000/00d/109/0e4aa34.jpg}, {last_name=Ruma, first_name=Laurel, member_id=3429732, pic_id=/p/1/000/01e/277/2bb115b.jpg}, {last_name=Higgins, first_name=Josh, member_id=1458792, pic_id=/p/1/000/0c9/38b/1a24457.jpg}, {last_name=Benedict, first_name=Harvey, member_id=641340, pic_id=/p/3/000/0c6/1eb/2eb7119.jpg}, {last_name=Lazarus, first_name=Brett, member_id=49965786, pic_id=/p/2/000/03b/04e/318d080.jpg}, {last_name=Zhang, first_name=Simon, member_id=16323996, pic_id=/p/3/000/03f/0fe/35d4ded.jpg}, {last_name=Aspen, first_name=Matt, member_id=25240804, pic_id=/p/3/000/09b/371/22ec974.jpg}, {last_name=Herz, first_name=Erik, member_id=147604, pic_id=/p/3/000/086/014/0fab4d6.jpg}, {last_name=Sanders, first_name=Geoffrey, member_id=340570, pic_id=/p/1/000/0d1/2d1/37a76e6.jpg}, {last_name=Wright, first_name=Caleb, member_id=12798700, pic_id=/p/2/000/08c/337/2cc951a.jpg}, {last_name=Parab, first_name=Guru, member_id=8915230, pic_id=/p/1/000/08a/257/051926a.jpg}, {last_name=Grossman, first_name=Nick, member_id=12159520, pic_id=/p/2/000/005/2f3/1955f31.jpg}, {last_name=Skomoroch, first_name=Peter, member_id=11642980, pic_id=/p/2/000/0b4/12d/31eadbe.jpg}, {last_name=Singh, first_name=Deepak, member_id=1246166, pic_id=/p/1/000/042/3f5/369f807.jpg}, {last_name=Noakes, first_name=Geoffrey, member_id=3518726, pic_id=/p/3/000/005/3d7/3f67632.jpg}, {last_name=Scudiere, first_name=Robert, member_id=3965286, pic_id=/p/2/000/090/210/009a099.jpg}, {last_name=Skyler, first_name=David, member_id=15377099, pic_id=/p/3/000/005/1bf/080b255.jpg}, 
{last_name=Sharma, first_name=Manu, member_id=19295378, pic_id=/p/3/000/0d4/11e/2176c30.jpg}, {last_name=Huang, first_name=Erica, member_id=1808438, pic_id=/p/1/000/001/3a5/02ddd24.jpg}, {last_name=Ballotta, first_name=Pete, member_id=2011178, pic_id=/p/2/000/0b6/08f/3a92357.jpg}, {last_name=Kast, first_name=Anton, member_id=1092686, pic_id=/p/1/000/054/0e2/1a8efb2.jpg}, {last_name=Redfern, first_name=Joff, member_id=2849241, pic_id=/p/3/000/03d/28d/19f5688.jpg}, {last_name=Smith, first_name=Aaron, member_id=83470876, pic_id=/p/2/000/08c/27c/3cfe37a.jpg}, {last_name=Yadav, first_name=Rishi, member_id=2097381, pic_id=/p/2/000/0c8/08d/3ab9006.jpg}, {last_name=Repass, first_name=Mike, member_id=8633208, pic_id=/p/2/000/071/195/0bfc573.jpg}, {last_name=Dalvi, first_name=Anand, member_id=8388, pic_id=/p/1/000/003/3cd/3127384.jpg}, {last_name=Croll, first_name=Alistair, member_id=511218, pic_id=/p/2/000/029/0e5/1ebc076.jpg}, {last_name=Tolman, first_name=Sarah, member_id=86040596, pic_id=/p/2/000/06f/1c9/1a7870e.jpg}, {last_name=Suvarna, first_name=Sandeep, member_id=10558779, pic_id=/p/1/000/05b/2c7/0ec214a.jpg}, {last_name=Elliott-McCrea, first_name=Kellan, member_id=163959, pic_id=/p/1/000/06b/2e8/2dbd3ae.jpg}, {last_name=Jatkar, first_name=Tarang, member_id=17763609, pic_id=/p/1/000/012/010/2e8ee7f.jpg}, {last_name=Brown, first_name=David, member_id=420737, pic_id=/p/3/000/002/140/0b2dbcc.jpg}, {last_name=Patel, first_name=Jay, member_id=1179857, pic_id=/p/2/000/07c/0b2/0365e91.jpg}, {last_name=Field, first_name=Dylan, member_id=13066037, pic_id=/p/2/000/0a5/3e2/1fb7f06.jpg}, {last_name=Patel, first_name=Sumeet, member_id=23402387, pic_id=/p/2/000/0bf/3ca/2ca5f1f.jpg}, {last_name=Ting, first_name=Moses, member_id=15624915, pic_id=/p/2/000/0ac/117/29e329a.jpg}, {last_name=Hinnach, first_name=Yassine, member_id=1731285, pic_id=/p/3/000/000/035/330cce0.jpg}, {last_name=Das, first_name=Anshu, member_id=38878221, pic_id=/p/3/000/0b2/1ac/15902f4.jpg}, {last_name=Mendelson, 
first_name=Jordan, member_id=8598415, pic_id=/p/3/000/032/22a/1d2eaa6.jpg}, {last_name=Besbeas, first_name=Nick, member_id=12510505, pic_id=/p/3/000/093/167/34f5b6b.jpg}], source_id=256842}, first_name=Joseph, email_locale=en_US, last_name=Adler, gmtOffset=-8, recipientID=256842, [email protected]}
Each message requires a lot of data:
– Header information (10 fields)
– 4 fields per person, 64 people
– That’s over 250 data fields for the final message

How do we turn this raw data into web content or email messages?
![Page 22: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/22.jpg)
People you may know
![Page 23: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/23.jpg)
![Page 24: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/24.jpg)
People you may know
Alice Bob
Carol
> 80% of connections from triangle closing
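Triangle closing suggests that two members who share a connection may know each other. A minimal sketch, assuming (hypothetically) that Alice is connected to Bob and Bob to Carol, counts shared connections for every pair that is not yet connected. The function name `triangle_closing` is illustrative, and this single-machine version is not how the production job runs on Hadoop.

```python
from collections import defaultdict
from itertools import combinations


def triangle_closing(edges):
    """Count common connections for each pair of members not yet connected."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    counts = defaultdict(int)
    for member, friends in neighbors.items():
        # Every pair of this member's friends forms an open or closed triangle.
        for u, v in combinations(sorted(friends), 2):
            if v not in neighbors[u]:  # open triangle: suggest closing it
                counts[(u, v)] += 1
    return dict(counts)


# Assumed toy graph: Alice knows Bob, Bob knows Carol.
edges = [("Alice", "Bob"), ("Bob", "Carol")]
suggestions = triangle_closing(edges)
print(suggestions)
```

A higher count means more shared connections, which is a natural first signal for ranking the suggestions.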
![Page 25: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/25.jpg)
People you may know
[Diagram labels: members Alice, Bob, Carol, Dave, Eve; signals Organizational Overlap, Age, Distance; stages Ranked Matches, User Interactions, Results]
![Page 26: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/26.jpg)
Skills and Endorsements
![Page 27: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/27.jpg)
Tagging Skills
![Page 28: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/28.jpg)
![Page 29: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/29.jpg)
Skills and Endorsements
A combination of:
– Propensity to know member
– Propensity for member to have skill
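The slide does not say how the two propensities are combined. As a rough sketch, assuming both are probabilities in [0, 1] and that a simple product is good enough for illustration, endorsement suggestions could be ranked like this. The names `endorsement_score` and `rank_suggestions` and all the numbers are hypothetical.

```python
def endorsement_score(p_know, p_skill):
    """Combine two propensities in [0, 1]; the product is an illustrative choice."""
    return p_know * p_skill


def rank_suggestions(candidates, top_k=3):
    """Return the top_k (member, skill) pairs by combined score, best first."""
    scored = [
        (endorsement_score(c["p_know"], c["p_skill"]), c["member"], c["skill"])
        for c in candidates
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(member, skill) for _, member, skill in scored[:top_k]]


# Made-up candidates for one viewer.
candidates = [
    {"member": "Bob", "skill": "Hadoop", "p_know": 0.9, "p_skill": 0.8},
    {"member": "Carol", "skill": "Pig", "p_know": 0.5, "p_skill": 0.9},
    {"member": "Dave", "skill": "Juggling", "p_know": 0.95, "p_skill": 0.1},
]
top = rank_suggestions(candidates, top_k=2)
print(top)
```

The product penalizes a suggestion when either propensity is low, so a well-known contact with an unlikely skill (Dave here) still ranks last.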
![Page 30: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/30.jpg)
Productionalization
Take something that runs once…
… and run it multiple times
… and serve it reliably at scale
… and iterate quickly
![Page 31: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/31.jpg)
Data Lifecycle
Moving around data is the key problem
1. Ingress: moving raw data from online systems to offline systems
2. Workflow management: managing offline processes
3. Egress: moving results from offline systems to online systems
![Page 32: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/32.jpg)
Ingress
– Apache Kafka: low-latency publish/subscribe message bus
  – Common data format (Avro)
  – Changelog is the abstraction for integration
  – Schema evolution
    – Programmatic compatibility model
    – Explicit schema reviews
– “O(1)” ETL
K. Goodhope, J. Koshy, J. Kreps, N. Narkhede, R. Park, J. Rao, V.Y. Ye: Building LinkedIn’s Real-time Activity Data Pipeline. In IEEE Data Engineering Bulletin. Vol 35, No. 2, June 2012.
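To make the schema evolution point concrete: under Avro's schema resolution rules, a reader can decode older records only if every field it expects that the writer did not produce carries a default value. The sketch below is a deliberately simplified check of that one rule in plain Python (no Avro library, no type promotion, flat records only). The `PageViewEvent` schemas and the function `can_read` are hypothetical and are not LinkedIn's actual compatibility tooling.

```python
def can_read(reader_schema, writer_schema):
    """Simplified check: can data written with writer_schema be read with reader_schema?"""
    writer_fields = {f["name"]: f for f in writer_schema["fields"]}
    for field in reader_schema["fields"]:
        old = writer_fields.get(field["name"])
        if old is None:
            # Field is new to the reader: it needs a default to fill the gap.
            if "default" not in field:
                return False
        elif old["type"] != field["type"]:
            # Simplification: treat any type change as incompatible.
            return False
    return True


v1 = {"type": "record", "name": "PageViewEvent", "fields": [
    {"name": "member_id", "type": "long"},
    {"name": "page", "type": "string"},
]}
# Adds a field with a default: old data stays readable.
v2 = {"type": "record", "name": "PageViewEvent", "fields": v1["fields"] + [
    {"name": "referrer", "type": "string", "default": ""},
]}
# Adds a field without a default: old data cannot be decoded.
v3_bad = {"type": "record", "name": "PageViewEvent", "fields": v1["fields"] + [
    {"name": "locale", "type": "string"},
]}

print(can_read(v2, v1), can_read(v3_bad, v1))
```

A check like this can run automatically whenever a schema changes, which is one way to read "programmatic compatibility model".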
![Page 33: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/33.jpg)
![Page 34: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/34.jpg)
![Page 35: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/35.jpg)
![Page 36: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/36.jpg)
Workflows
Job A
Job B
Job C
Push to Production
Job X
Push to QA
![Page 37: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/37.jpg)
Real workflows are complicated
![Page 38: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/38.jpg)
Workflow Management: Azkaban
– Dependency management
– Diverse job types (Pig, Hive, Java, . . . )
– Scheduling
– Monitoring
– Configuration
– Retry/restart on failure
– Resource locking
– Log collection
– Historical information
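Two of these features, dependency management and retry on failure, can be illustrated with a small Python sketch. It is not Azkaban's implementation or configuration format. The job names and the chain Job A → Job B → Job C → Push to Production are assumptions for illustration, and `run_flow` is a hypothetical helper built on the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical flow: each job maps to the set of jobs it depends on.
flow = {
    "job_b": {"job_a"},
    "job_c": {"job_b"},
    "push_to_production": {"job_c"},
}


def run_flow(flow, actions, max_attempts=2):
    """Run jobs in dependency order, retrying a failed job up to max_attempts times."""
    completed = []
    for job in TopologicalSorter(flow).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                actions[job]()
                completed.append(job)
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # give up; downstream jobs never run
    return completed


attempts = {"job_b": 0}


def flaky_job_b():
    """Fails on the first attempt only, to exercise the retry path."""
    attempts["job_b"] += 1
    if attempts["job_b"] == 1:
        raise RuntimeError("transient failure")


actions = {job: (lambda: None) for job in ["job_a", "job_c", "push_to_production"]}
actions["job_b"] = flaky_job_b

completed = run_flow(flow, actions)
print(completed)
```

Because a job that exhausts its retries re-raises, nothing downstream of a failure is pushed to production.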
![Page 39: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/39.jpg)
Workflow Management: Azkaban
![Page 40: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/40.jpg)
Workflow Management: Azkaban
![Page 41: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/41.jpg)
Egress: Voldemort
– Distributed key/value store
– Easy to integrate into workflows
  – Off-the-shelf jobs to copy Voldemort stores
  – One-line command in Pig
– Cost of data load
– Data stored per node?
– Response time
– Fail-over
– How to transfer
– Versioning & rollback
R. Sumbaly, J. Kreps, L. Gao, A. Feinberg, C. Soman, & S. Shah. Serving Large-Scale Batch Computed Data With Project Voldemort. In FAST 2012.
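The versioning and rollback idea can be shown with a toy in-memory store: each batch load becomes a complete new version, reads always hit the newest one, and a bad load is undone by dropping back to the previous version. This is only a sketch of the concept. The class `VersionedStore` and its methods are hypothetical and are not Voldemort's API.

```python
class VersionedStore:
    """Toy read-only key/value store that keeps several complete data versions."""

    def __init__(self, keep=2):
        self.keep = keep      # number of versions retained (assumed >= 1)
        self.versions = []    # oldest first; the last entry serves reads

    def swap(self, data):
        """Install a freshly computed data set as the live version."""
        self.versions.append(dict(data))
        del self.versions[:-self.keep]  # discard versions beyond the retention limit

    def rollback(self):
        """Drop the live version and serve the previous one again."""
        if len(self.versions) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.versions.pop()

    def get(self, key):
        return self.versions[-1].get(key) if self.versions else None


store = VersionedStore(keep=2)
store.swap({"member:1": ["member:2"]})   # first batch result
store.swap({"member:1": ["member:3"]})   # second batch result goes live
live = store.get("member:1")
store.rollback()                          # second batch turned out to be bad
restored = store.get("member:1")
print(live, restored)
```

Keeping whole versions side by side makes rollback a pointer change rather than a reload, at the cost of extra storage per node.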
![Page 42: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/42.jpg)
Recap
Why we use Hadoop
– Simple programmatic model
– Rich developer ecosystem
  – Languages: Pig, Hive, Crunch, Cascading, …
  – Libraries: Mahout, DataFu, ElephantBird, …
– Horizontal scalability, fault tolerance, multi-tenancy
  – Reliably process multiple TB of data
– Don’t need hardcore distributed systems engineers
![Page 43: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/43.jpg)
Recap
How we use Hadoop
Open source projects started at LinkedIn:
– Getting data in: Kafka
– Building and running job flows: Azkaban
– Getting data out: Voldemort
This empowers data scientists and engineers to focus on new product ideas, not infrastructure
![Page 44: How to win friends and influence people (with Hadoop)](https://reader037.vdocuments.net/reader037/viewer/2022102722/554f630eb4c9058a148b4922/html5/thumbnails/44.jpg)
Learning More
data.linkedin.com