Understanding Voice of Members via Text Mining – How LinkedIn built a text analytics platform at scale

Chi-Yi Kuan, Weidong Zhang, Tiger Zhang

Upload: yongzheng-tiger-zhang

Post on 07-Feb-2017


Page 1: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

Understanding Voice of Members via Text Mining – How LinkedIn built a text analytics platform at scale

Chi-Yi Kuan, Weidong Zhang, Tiger Zhang

Page 2: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

Who are we?

Chi-Yi Kuan (www.linkedin.com/in/chiyikuan)

• Director, Analytics at LinkedIn
• Big data evangelist and practitioner

Weidong Zhang (www.linkedin.com/in/weidongzhang1)

• Manager, Analytics Platform & Apps at LinkedIn
• Builds big data and analytics products

Tiger Zhang (www.linkedin.com/in/tigerzhang)

• Sr. Staff, Analytics at LinkedIn
• Text mining scientist and big data enthusiast

Strata + Hadoop World, 12/8/2016

Page 3: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale


Page 4: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

Knowledge Schools Skills Jobs Companies Members

467M 7M 6M 3B 27k 200k Endorsements Daily posts


Page 5: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

467M 2B Billions

LinkedIn Big Data


Page 6: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale


467+ million members = a lot of data

Page 7: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

Voices: drive actionable intelligence from member voices…

• What’s trending
• Products: Home Page, Mobile, Inbox
• Sentiments
• Value Props: Hire, Market, Sell

• Relevance filtering: identify content that is relevant to the LinkedIn brand and products/services
• Classification: structure unstructured textual data into well-defined categories
• Topic mining: find the most significant topics and stories in a certain time window


Page 8: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

…creating impact across business metrics

Developed game-changing solutions to drive Voice of Member impact

Improved analytics efficiency with unstructured data by 20X

Drove end-to-end technology integration of big data and embedded NLP solutions

Piloting operational solutions to scale the impact of advanced analytics across the broader organization


Page 9: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

LinkedIn Hadoop Ecosystem

HDFS

Map-Reduce Tez Spark

Pig Hive Scalding

YARN

Azkaban


Page 10: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

Design Principles for Voices Platform

Scalability Availability Easy to Use Process Platform

Data Systems

Application Framework

Kafka, Hadoop

Spark Gobblin

Elasticsearch NoSQL

Phoenix Elasticsearch

Highcharts


Page 11: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

E2E Voices Platform Architecture


Page 12: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

Data Processing at Scale – with Generic ETL


Page 13: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

Smart IDs – for Viral Mentions with Threading


Page 14: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

High Availability – through Heterogeneous Data


Page 15: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

Machine learning based analytic engine to surface insights to everyday business users

Customized Feeds

Central navigation

Trending insights

Social analytics & topic mining

Deep dives

Sentiment solutions


Page 16: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

Text mining is a crowded space


Page 17: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

Our solution targets unique use cases for LinkedIn

•  Member info: Identity, Behavior, Social
•  Social data
•  Customer feedback: Customer service, Group updates, Network updates
•  Survey results

•  What’s trending
•  Products: PYMK, Group, Home Page, Mobile, Inbox, Identity, Network
•  Sentiments
•  Value Propositions: Hire, Market, Sell

•  Relevance solution
•  Topic mining
•  Text classification


Page 18: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

1) Focusing on relevant data

Relevant:

▪ Product insights, launches, and events
▪ Horizontal themes
▪ PR and marketing campaigns
▪ Brand and value
▪ LinkedIn’s strategy, financial performance, international etc.

Non-relevant:

▪ Status updates, e.g. "I posted something on Linkedin"
▪ Social mentions, e.g. "Please connect with me on Linkedin" or "Follow me on Linkedin"
▪ Self-promoting materials, e.g. “share on LinkedIn”
▪ Spam


Page 19: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

Keyword based approach

[Chart: relevance prediction power of keyword rules (Rules, Whitelist, Blacklist); values shown: 56%, 10%, 60%, 6%, 19%, 35%]
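A keyword based relevance filter of the kind this slide names could be sketched as below. The keyword lists and the function name `is_relevant` are hypothetical, chosen only to mirror the relevant and non-relevant examples on the previous slide; the actual rules are not given in the deck.

```python
import re

# Hypothetical keyword lists for illustration only; the real lists are not in the deck.
WHITELIST = {"launch", "feature", "earnings", "acquisition", "campaign"}
BLACKLIST = {"connect with me", "follow me", "share on linkedin", "i posted"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_relevant(text):
    """Blacklist phrases veto a mention; otherwise one whitelist keyword is enough."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in BLACKLIST):
        return False
    return any(token in WHITELIST for token in tokenize(text))


print(is_relevant("Please connect with me on Linkedin"))
print(is_relevant("LinkedIn earnings beat expectations"))
```

Rules like these are easy to read and audit, but every new phrasing needs a new keyword, which is the limitation a learned classifier is meant to address.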


Page 20: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

2) Leveraging text classification engine

Generic text classification framework

▪  Feature generation
▪  Feature selection
▪  Machine learning algorithms:
   –  Naïve Bayes (NB)
   –  Logistic Regression (LR)
   –  Support Vector Machines (SVM) (LibLinear)
▪  Cross-validation and evaluation

Applications

▪  LinkedIn relevance
▪  Sentiment analysis
▪  Product categorization
▪  Value proposition classification
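The framework steps above can be illustrated with a small, self-contained sketch: bag-of-words feature generation, a multinomial Naïve Bayes classifier with add-one smoothing, and k-fold cross-validation. This is a toy written from scratch in standard-library Python, not the platform's code; the class and function names are assumptions, and feature selection is omitted for brevity.

```python
import math
import re
from collections import Counter, defaultdict


def features(text):
    """Feature generation: lowercase unigram bag of words."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(features(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in features(text):
                if word in self.vocab:  # words unseen in training are ignored
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_validate(texts, labels, k=2):
    """k-fold cross-validation (fold = index mod k); returns overall accuracy."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(texts)) if i % k != fold]
        test = [i for i in range(len(texts)) if i % k == fold]
        model = NaiveBayes().fit([texts[i] for i in train],
                                 [labels[i] for i in train])
        correct += sum(model.predict(texts[i]) == labels[i] for i in test)
    return correct / len(texts)
```

The same `fit`/`predict` shape would let Logistic Regression or a linear SVM be swapped in behind one interface, which is one plausible reading of "generic" here.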


Page 21: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

Machine learning approach increases overall relevance by 40%

[Chart: relevance prediction power of Rules, Whitelist, Blacklist, and SVM; values shown: 56%, 6%, 19%, 40%, 100%, 35%]

SVM: great gain in balancing precision and recall
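For readers less familiar with the two metrics being balanced, the sketch below computes precision, recall, and their harmonic mean (F1) for one positive class. The label names and the function name are illustrative assumptions; the deck does not say which evaluation code was used.

```python
def precision_recall_f1(y_true, y_pred, positive="relevant"):
    """Precision, recall and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

A strict blacklist tends to raise precision at the cost of recall, and a broad whitelist does the reverse; F1 gives a single number for the trade-off.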


Page 22: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

3) Enabling topic mining

[Figure labels: HIGH, SPARK]

POS pattern matching

-  Part-of-speech (POS) tagging (Stanford CoreNLP), e.g. "This is great."
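As a rough illustration of POS pattern matching, the sketch below takes tokens that are already tagged with Penn Treebank tags (the tag set used by Stanford CoreNLP's English tagger) and extracts candidate topics. The pattern used, zero or more adjectives followed by one or more nouns, is an assumption for this example; the deck does not state which patterns the platform matches.

```python
def extract_candidates(tagged):
    """Return phrases matching the assumed pattern JJ* NN+ from (token, tag) pairs."""
    candidates, adjectives, nouns = [], [], []

    def flush():
        # Emit the current phrase only if it contains at least one noun.
        if nouns:
            candidates.append(" ".join(adjectives + nouns))
        adjectives.clear()
        nouns.clear()

    for token, tag in tagged:
        if tag.startswith("NN"):
            nouns.append(token)
        elif tag.startswith("JJ") and not nouns:
            adjectives.append(token)
        else:
            flush()
            if tag.startswith("JJ"):  # an adjective after nouns starts a new phrase
                adjectives.append(token)
    flush()
    return candidates


# Tags written by hand here; in practice they would come from a POS tagger.
sentence = [("The", "DT"), ("new", "JJ"), ("mobile", "JJ"),
            ("app", "NN"), ("is", "VBZ"), ("great", "JJ")]
print(extract_candidates(sentence))
```

Under this pattern the slide's own example, "This is great.", yields no candidate because it contains no noun.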

Topic pruning

-  Stemming

-  removing stop words

-  merging synonyms

-  clustering (optional)

[Figure: word forms ending in "-ing" and "-s" are treated as the same term]
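The pruning steps listed above can be sketched in a few lines. The stop-word list, the synonym map, and the suffix-stripping stemmer below are deliberately tiny, hypothetical stand-ins; a production pipeline would more likely rely on an established stemmer and curated lists, and the optional clustering step is not shown.

```python
# Tiny illustrative resources; both are assumptions, not taken from the deck.
STOP_WORDS = {"the", "a", "an", "of", "on", "for", "is", "to"}
SYNONYMS = {"app": "application"}


def stem(word):
    """Naive suffix stripping for "-ing" and "-s"; keeps at least 3 characters."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def prune(topic):
    """Normalize a candidate topic: drop stop words, stem, then merge synonyms."""
    words = [w for w in topic.lower().split() if w not in STOP_WORDS]
    stems = [stem(w) for w in words]
    return " ".join(SYNONYMS.get(s, s) for s in stems)


print(prune("the recruiting tools"))
print(prune("Apps"))
```

Stemming runs before the synonym lookup so that "apps" and "app" both reach the same synonym entry.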

Topic ranking: TF-IDF weighting and DF ranking
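One way to read this ranking step is sketched below: each document contributes a TF-IDF weight to every topic it mentions, and topics are also ranked by document frequency (DF). The smoothed inverse document frequency and the choice to sum weights across documents are assumptions; the deck does not say how the two rankings are computed or combined.

```python
import math
from collections import Counter


def rank_topics(docs):
    """docs: one list of (pruned) candidate topics per document.

    Returns two rankings: by summed TF-IDF weight and by document frequency.
    """
    n = len(docs)
    df = Counter(topic for doc in docs for topic in set(doc))
    scores = Counter()
    for doc in docs:
        if not doc:
            continue  # skip documents with no candidate topics
        for topic, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log((1 + n) / (1 + df[topic])) + 1  # smoothed IDF
            scores[topic] += tf * idf
    by_tfidf = sorted(scores, key=lambda t: (-scores[t], t))
    by_df = sorted(df, key=lambda t: (-df[t], t))
    return by_tfidf, by_df


docs = [["new app", "new app", "profile"], ["profile", "inbox"], ["profile"]]
by_tfidf, by_df = rank_topics(docs)
print(by_tfidf)
print(by_df)
```

DF ranking favors topics that many members mention, while the IDF term damps topics that appear in nearly every document.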


Page 23: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

Trending Insights – identify organic trending topics

Didi and Kuaidi merger

Product release
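The deck does not describe how a topic is judged to be trending. As one hedged possibility, the sketch below compares a topic's document frequency in the current time window with its frequency in a baseline window and ranks topics by the ratio; the function name, the `min_df` threshold, and the add-one smoothing are all assumptions made for illustration.

```python
from collections import Counter


def trending_topics(current_docs, baseline_docs, min_df=2, top_k=5):
    """Rank topics by how much more often they appear now than in the baseline."""
    cur = Counter(t for doc in current_docs for t in set(doc))
    base = Counter(t for doc in baseline_docs for t in set(doc))
    n_cur = max(len(current_docs), 1)
    n_base = max(len(baseline_docs), 1)
    scored = []
    for topic, df in cur.items():
        if df < min_df:
            continue  # ignore topics with too few mentions to be meaningful
        cur_rate = df / n_cur
        base_rate = (base[topic] + 1) / (n_base + 1)  # add-one smoothing
        scored.append((cur_rate / base_rate, topic))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [topic for _, topic in scored[:top_k]]


current = [["merger", "app"], ["merger"], ["merger", "app"]]
baseline = [["app"], ["app"], ["app"]]
print(trending_topics(current, baseline))
```

A topic that is always popular (here "app") scores low because its baseline rate is already high, while a topic new to the window (here "merger") rises to the top.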


Page 24: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

LinkedIn’s customer support has evolved into an intelligence platform…

Scaling to have a broader impact across LinkedIn

Reactive Support

▪  GCO cases
▪  Issue resolution
▪  Support focused

Multi-channel Feedback

▪  Internal data (GCO, surveys, site feedback)
▪  App review
▪  LI.com
▪  Social data

Intelligent Insights

▪  Product insight
▪  Member insight
▪  Launch tracking

Predictive Anticipation

▪  Social sentiment
▪  Brand tracking
▪  Viral mentions


Page 25: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

This is what the future could look like

1. From the first time we pick up an isolated comment…

2. Machine determines if there is significant reach…

3. …and whether it is a trending topic…

4. …breaks down into sentiment and drivers…

5. …geographic locations…

6. (For LI data) deep dive into MLC segmentation…

7. …and audience segmentation…

8. …generates automatic reporting, alerts and escalations…

9. …and closes the feedback loop with support and PR solutions


Page 26: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

The best customer experience starts with understanding the Voices of members!

Thank You!

Page 27: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

Engineering blogs for Voices


Part I. Voices: a Text Analytics Platform for Understanding Member Feedback

Part II. Technical Details for Topic Mining

Page 28: Understanding Voice of Members via Text Mining – How Linkedin Built a Text Analytics Engine at Scale

References

1.  LibLinear: a library for large linear classification, available at https://www.csie.ntu.edu.tw/~cjlin/liblinear/

2.  LingPipe: a Java-based toolkit for processing text using computational linguistics, available at http://alias-i.com/lingpipe/

3.  NLTK: a leading platform for building Python programs to work with human language data, available at http://www.nltk.org/

4.  Stanford CoreNLP: an open source project led by the Stanford NLP group, available at http://nlp.stanford.edu/software/