Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
Daria Sorokina, Senior Data Scientist, LinkedIn

Post on 20-Sep-2014

DESCRIPTION

ECIR 2013 workshop keynote

TRANSCRIPT

Page 1: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Daria Sorokina, Senior Data Scientist, LinkedIn

Page 2: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Part I: Recruiters

“Multiple Objective Optimization in Recommendation Systems”, Mario Rodriguez, Christian Posse, Ethan Zhang. RecSys’12

Page 3: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
Page 4: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

TalentMatch

[Diagram: Job Posting → TalentMatch → Ranked Talent (Member Profiles)]

Page 5: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Job Posting: title, geo, company, industry, description, functional area

Candidate
– General: expertise, specialties, education, headline, geo, experience
– Current Position: title, summary, tenure length, industry, functional area, …

Text similarity features → TalentMatch Model

The model can be trained on user activity signals like job ad clicks or job applications
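As a rough illustration of the text similarity features mentioned on this slide, here is a minimal sketch that computes bag-of-words cosine similarity between job posting fields and candidate fields. The slide does not say which similarity measure TalentMatch uses, so the measure, the function names and the field pairing are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity between two strings (assumed measure)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty field matches nothing
    return dot / (norm_a * norm_b)


def similarity_features(job, candidate, field_pairs):
    """One feature per (job field, candidate field) pair; pairing is hypothetical."""
    return {
        f"{job_field}~{cand_field}": cosine_similarity(
            job.get(job_field, ""), candidate.get(cand_field, "")
        )
        for job_field, cand_field in field_pairs
    }


job = {"title": "Data Scientist", "industry": "Internet"}
candidate = {"title": "Senior Data Scientist", "industry": "Internet"}
features = similarity_features(
    job, candidate, [("title", "title"), ("industry", "industry")]
)
```

Features like these would then be fed to a model trained on the activity signals named above.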

Page 6: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

TalentMatch Utility = fn(email rate, reply rate)

[Diagram: Recruiter, Email Rate, Reply Rate. Problem! Job seeker?]

Page 7: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Job Seeker Intent

Model: time till the job change
o How long will this person stay in this job after this date?
o Trained on past job positions from our users' profiles
o Accelerated failure time (AFT) model

ACTIVE

PASSIVE

NON-JOB-SEEKER
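To make the AFT idea concrete, here is a minimal sketch under strong simplifying assumptions: a log-normal AFT model, log(T) = b0 + b1·x + noise, with one feature and no censoring, in which case the coefficients can be fit by ordinary least squares on log tenure. The feature, the bucket thresholds and the rule that maps predicted tenure to ACTIVE / PASSIVE / NON-JOB-SEEKER are hypothetical; the slides do not specify them.

```python
import math


def fit_lognormal_aft(features, tenures_months):
    """Fit log(T) = b0 + b1 * x by least squares (one feature, no censoring)."""
    logs = [math.log(t) for t in tenures_months]
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, logs))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predicted_median_tenure(b0, b1, x):
    """Median of T under the log-normal AFT model is exp(b0 + b1 * x)."""
    return math.exp(b0 + b1 * x)


def intent_bucket(median_tenure, months_in_job,
                  active_below=6.0, passive_below=18.0):
    """Crude bucketing by expected months left; thresholds are hypothetical."""
    remaining = median_tenure - months_in_job
    if remaining < active_below:
        return "ACTIVE"
    if remaining < passive_below:
        return "PASSIVE"
    return "NON-JOB-SEEKER"


# Toy data: x is a made-up per-industry score, tenures are in months.
b0, b1 = fit_lognormal_aft([0.0, 1.0, 2.0],
                           [math.exp(3.0), math.exp(3.5), math.exp(4.0)])
```

A production model would also need to handle censored observations (people still in their current job), which this sketch ignores.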

Page 8: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Job-Seeker Feature Example: Attrition by Industry

[Plot: Probability vs. Time, by industry]

Page 9: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Job-Seeking Intent: 16x reply rate on career-related mail

TalentMatch Utility = fn(email rate, reply rate)

Page 10: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Talent Match ranking
Rank  Item    Match Score  Intent
1     Item X  0.98         Non-Seeker
2     Item Y  0.91         Non-Seeker
---------------------------------------
3     Item Z  0.89         Active

Improved ranking
Rank  Item    Match Score  Reranking Score  Intent
1     Item X  0.98         0.98             Non-Seeker
2     Item Z  0.89         0.93             Active
--------------------------------------------
3     Item Y  0.91         0.91             Non-Seeker

How: Controlled Re-ranking

Re-ranking function f()

Divergence score (between ranking score distributions)

Objective Score: #Active in top N

Optimize for both
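The re-ranking example on this slide can be sketched as follows. The multiplicative boost for active job seekers is an assumed form of f(), chosen only because a 4.5% boost reproduces the 0.89 → 0.93 change in the example. The mean absolute score shift is a crude stand-in for the divergence score, whose actual definition is given in the RecSys'12 paper cited in Part I, not here.

```python
# Example rows from the slide: (item, match score, intent).
CANDIDATES = [
    ("Item X", 0.98, "Non-Seeker"),
    ("Item Y", 0.91, "Non-Seeker"),
    ("Item Z", 0.89, "Active"),
]


def rerank(candidates, boost):
    """Boost scores of active seekers, then sort by the new score (assumed f)."""
    rescored = []
    for item, score, intent in candidates:
        new_score = score * (1.0 + boost) if intent == "Active" else score
        rescored.append((item, score, new_score, intent))
    return sorted(rescored, key=lambda row: row[2], reverse=True)


def active_in_top_n(ranked, n):
    """Objective score: number of active job seekers in the top n."""
    return sum(1 for row in ranked[:n] if row[3] == "Active")


def mean_score_shift(ranked):
    """Stand-in divergence: average absolute change of the ranking score."""
    return sum(abs(row[2] - row[1]) for row in ranked) / len(ranked)


ranked = rerank(CANDIDATES, boost=0.045)
```

Increasing `boost` raises the objective but also the shift away from the original ranking, which is the trade-off the slide asks to optimize.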

Page 11: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Part II: Job Seekers

Learning to Rank. Fast and personalized.

Page 12: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Job Search. Query “Data Scientist LinkedIn”

Page 13: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Regular approach
– A data point is a pair: {Query, Document}
– Data label: “Is this document relevant for this query?”
– Can be done by crowdsourcing

Job Search reality
– A data point is a triple: {Query, Job position, User}
– Data label: “Is this job relevant for this user who asked this query?”
– Depends on the user’s location, industry, seniority…
– Too much to ask from a random person
– Have to collect labels from user signals

Learning To Rank

Page 14: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

We use a simplified version of FairPairs (Radlinski, Joachims AAAI’06)

– Each pair is flipped with a 50% chance
– Choose pairs where only the lower document is clicked
– Save 1 positive (lower) and 1 negative (upper) result for the labeled data set

[Diagram: result pairs marked “flipped” or “not flipped”; in a pair where only the lower result is clicked, the lower result gets label 1 and the upper result gets label 0]
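A minimal sketch of the simplified FairPairs procedure described on this slide. It assumes adjacent results are paired as (1, 2), (3, 4), and so on, and that clicks arrive as a set of document ids; the original FairPairs algorithm has further details that are omitted here, and the function names are hypothetical.

```python
import random


def present_with_flips(ranked_docs, rng):
    """Pair adjacent results and flip each pair with a 50% chance."""
    shown = []
    for i in range(0, len(ranked_docs) - 1, 2):
        pair = [ranked_docs[i], ranked_docs[i + 1]]
        if rng.random() < 0.5:
            pair.reverse()
        shown.extend(pair)
    if len(ranked_docs) % 2 == 1:
        shown.append(ranked_docs[-1])  # unpaired last result is shown as-is
    return shown


def labels_from_clicks(shown, clicked):
    """Keep pairs where only the lower document was clicked.

    The clicked lower document becomes a positive (label 1) and the skipped
    upper document becomes a negative (label 0).
    """
    labeled = []
    for i in range(0, len(shown) - 1, 2):
        upper, lower = shown[i], shown[i + 1]
        if lower in clicked and upper not in clicked:
            labeled.append((lower, 1))
            labeled.append((upper, 0))
    return labeled


shown = present_with_flips(["a", "b", "c", "d", "e"], random.Random(0))
```

Because each pair is flipped at random, a click on the lower document cannot be explained by position alone, which is what makes the resulting labels usable.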

Page 15: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Fair Pairs data is not enough for training

– The user clicks or skips only whatever is shown
– Bad results are not shown
– So there will be no “really bad” negatives in the training data
– We need to add them!

For queries with many results, add all results from the last page as “easy negatives”

[Diagram: last-page results, each with label 0]
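The “easy negatives” step might look like the sketch below. The page size and the minimum number of pages that counts as “many results” are hypothetical parameters; the slides give no values.

```python
def easy_negatives(result_ids, page_size=25, min_pages=4):
    """Label every result on the last page 0 if the query has enough pages."""
    pages = -(-len(result_ids) // page_size)  # ceiling division
    if pages < min_pages:
        return []  # too few results: the last page may still be relevant
    last_page_start = (pages - 1) * page_size
    return [(doc_id, 0) for doc_id in result_ids[last_page_start:]]


negatives = easy_negatives(list(range(95)))
```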

Page 16: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Learning To Rank – Training a Model

Best models for LTR are complex ensembles of trees
– See results of Yahoo Learning to Rank ’10 competition
– LambdaMART, BagBoo, Additive Groves, MatrixNet …

Complex models come at a cost
– It takes a long time to calculate predictions
– Requires a lot of optimization, often used with multi-level ranking

Can we train a simple model that will resemble a complex one?
– Train a complex model
– Get insights on what it looks like
– Modify a simple model accordingly

Page 17: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Training a Simple Model using a Complex Model

Base simple model – logistic or linear regression
– Does not handle features with non-linear effects well
– Does not handle interactions (e.g., if-then-else rules)

Target complex model – Additive Groves (Sorokina, Caruana, Riedewald ECML’07)
– Comes with interaction detection and effect visualization tools

[Diagram: Additive Groves as a bagged average of N groves, each a sum of trees: (1/N)·(tree +…+ tree) + (1/N)·(tree +…+ tree) +…+ (1/N)·(tree +…+ tree)]

Page 18: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Improving LR – Feature Transformations

Additive Groves can model and visualize non-linear effects

[Plot: average prediction vs. feature values]

Approximate the effect curve with a polynomial transform T(x)
– anything simple will do

Apply T(x) to the original feature values

Now the feature effect is linear. Regression model will love it!

[Plot: average prediction vs. T(x) values]
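A minimal sketch of the effect-curve transform, assuming the effect curve is available as sampled (feature value, average prediction) points. The polynomial is fit by least squares through the normal equations in plain Python; the toy curve, the degree and the function names are illustrative assumptions, not values from the talk.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns coeffs with coeffs[k] for x**k."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def make_transform(coeffs):
    """Build T(x) from fitted coefficients."""
    return lambda x: sum(c * x ** k for k, c in enumerate(coeffs))


# Toy effect curve: average prediction is U-shaped in the raw feature.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0.5 * x * x - x + 0.6 for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
T = make_transform(coeffs)
transformed = [T(x) for x in xs]  # use these in place of the raw feature
```

After the transform, the average prediction is (approximately) a linear function of T(x), which is what a regression model can represent with a single weight.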

Page 19: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Improving LR – Interaction Splits

Additive Groves’ interaction detection tool produces a list of strong interactions and corresponding joint effect plots

[Plot: average prediction vs. values of feature X1, with separate curves for X2=0 and X2=1]

– Effect of X1 is stronger when X2 = 0
– Simple regression will not capture this
– Often such X2 interacts with other features as well

Solution: Build separate models for different values of X2

[Diagram: split on X2=? into branches X2=0 and X2=1]

Page 20: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Improving LR – Tree with LR leaves and transforms

– Both operations (effect transforms and interaction splits) can be applied multiple times in any order
– Resulting model – a simple tree with regression model leaves
– Gives a significant boost to the performance of the basic LR model

[Diagram: tree that splits on X2=? (X2=0 / X2=1) and on X10 < 0.1234 ? (Yes / No), with regression models at the leaves]
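The resulting “tree with LR leaves” can be sketched as a couple of if-statements that select a leaf, followed by a logistic regression over transformed features. Everything numeric here is hypothetical: the weights, the transform T(x), and the placement of the X10 split under the X2=1 branch (the slide shows both splits but the transcript does not preserve how they nest).

```python
import math


def transform_x1(x):
    """Hypothetical effect transform T(x) for feature X1."""
    return 0.5 * x * x - x + 0.6


# One small logistic regression per leaf; all weights are made up.
LEAF_MODELS = {
    "x2=0": {"bias": -1.0, "w_x1": 2.0},
    "x2=1,x10<0.1234": {"bias": -0.5, "w_x1": 0.4},
    "x2=1,x10>=0.1234": {"bias": 0.2, "w_x1": 0.4},
}


def pick_leaf(features):
    """Interaction splits: route the example to a leaf model."""
    if features["x2"] == 0:
        return "x2=0"
    if features["x10"] < 0.1234:
        return "x2=1,x10<0.1234"
    return "x2=1,x10>=0.1234"


def score(features):
    """Logistic regression in the chosen leaf, applied to T(x1)."""
    model = LEAF_MODELS[pick_leaf(features)]
    z = model["bias"] + model["w_x1"] * transform_x1(features["x1"])
    return 1.0 / (1.0 + math.exp(-z))


example_score = score({"x1": 1.0, "x2": 0, "x10": 0.5})
```

Scoring costs a few comparisons and one dot product, which is the speed advantage over evaluating a large tree ensemble at query time.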

Page 21: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

TreeExtra package

A set of machine learning tools
– Additive Groves ensemble
– Interaction detection
– Effect and interaction visualization

http://additivegroves.net
– Created by Daria Sorokina while at Cornell, CMU, Yandex, LinkedIn from 2006 to 2013

Page 22: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Part III: Spammers

Fighting black SEO

Page 23: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Search Spam

Page 24: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Search Spam

Page 25: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Search Spam

Page 26: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Training data for the search spam classifier

Find the queries targeted by spammers.
– 10,000 most common non-name queries.
– Spammers love optimizing for [marketing]
– But not so much for [david smith]

Look at top results for a generic user.
– i.e., show unpersonalized search results.

Label data by crowdsourcing.
– Definition of spam is non-personalized

Train a model
– Spam scores are recalculated offline once in a while
– So the model complexity is not an issue
– Additive Groves works well. (Could use any ensemble of trees)

Page 27: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

ROC curve. Choosing thresholds.

[Plot: ROC curve, both axes from 0 to 1, with two points a and b marking spam score thresholds, 0 < a < b < 1]

Page 28: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

Integrating the Spam Score into Relevance

Spam model yields a probability between 0 and 1.

Convert spam score into a factor
– [0.0 <= score <= a] Not a spammer, factor = 1.0
– [b <= score <= 1.0] Spammer, factor = 0.0
– [a <= score <= b] Suspicious, linearly scale score from [a, b] to [1, 0]

Multiply relevance score by factor
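The conversion from spam score to relevance factor on this slide can be written as a small piecewise-linear function. The threshold values used in the example (a = 0.2, b = 0.6) are placeholders; the talk only states that 0 < a < b < 1.

```python
def spam_factor(score, a, b):
    """Map a spam probability to a multiplicative relevance factor."""
    if not 0.0 < a < b < 1.0:
        raise ValueError("thresholds must satisfy 0 < a < b < 1")
    if score <= a:
        return 1.0  # not a spammer
    if score >= b:
        return 0.0  # spammer
    # Suspicious: linearly scale the score from [a, b] to [1, 0].
    return (b - score) / (b - a)


def adjusted_relevance(relevance, score, a, b):
    """Multiply the relevance score by the spam factor."""
    return relevance * spam_factor(score, a, b)


example = adjusted_relevance(0.8, 0.4, a=0.2, b=0.6)
```

The linear ramp keeps the factor continuous at both thresholds, so a small change in spam score never causes a jump in the final relevance.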

Page 29: Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn

We are hiring!