

Talent Circle Detection in Job Transition Networks

Huang Xu†, Zhiwen Yu†∗, Jingyuan Yang‡, Hui Xiong‡∗, Hengshu Zhu♯

†Northwestern Polytechnical University, Xi’an 710072, PR China
‡Rutgers University
♯Baidu Research-Big Data Lab

[email protected], [email protected],{jingyyan,hxiong}@rutgers.edu, [email protected]

ABSTRACT
With the high mobility of talent, it has become critical for recruitment teams to find the right talent from the right source in an efficient manner. The prevalence of Online Professional Networks (OPNs), such as LinkedIn, enables a new paradigm for talent recruitment and job search. However, the dynamic and complex nature of such talent information makes it challenging to identify prospective talent sources from large-scale professional networks. Therefore, in this paper, we propose to create a job transition network in which vertices stand for organizations and a directed edge represents the talent flow between two organizations over a time period. By analyzing this job transition network, we can extract talent circles such that every circle includes organizations with similar talent exchange patterns. The characteristics of these talent circles can then be used for talent recruitment and job search. To this end, we develop a talent circle detection model and design the corresponding learning method, which maximizes the Normalized Discounted Cumulative Gain (NDCG) of the inferred edge existence probability with respect to the edge weights. The identified circles are then labeled with representative organizations as well as keywords from job descriptions. Moreover, based on the identified circles, we develop a talent exchange prediction method for talent recommendation. Finally, we have performed extensive experiments on real-world data. The results show that our method achieves much higher modularity than the benchmark approaches, as well as high precision and recall for talent exchange prediction.

CCS Concepts
• Information systems → Clustering;

Keywords
People Analytics; Talent Circle Detection.

∗Corresponding authors.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

KDD ’16, August 13-17, 2016, San Francisco, CA, USA
© 2016 ACM. ISBN 978-1-4503-4232-2/16/08...$15.00

DOI: http://dx.doi.org/10.1145/2939672.2939732

1. INTRODUCTION
When skilled talent is scarce, an emerging challenge for human resource management (HRM) is how to identify the right talent from the right source in an efficient manner. Professional recruiters usually invest a lot of resources in talent source acquisition, which focuses on identifying, assessing, and engaging the sources of skilled candidates through proactive recruiting techniques.

As a proactive recruiting strategy, the use of Online Professional Networks (OPNs), such as LinkedIn¹, for talent recruitment has become popular in many firms. Indeed, OPN data contain rich information about talent career trajectories and skills [17], which enables a new paradigm for talent recruitment and job search.

However, the dynamic and complex nature of this talent information makes it challenging to identify prospective talent sources from large-scale OPNs. More specifically, there are three unique challenges. First, there are too many people available in the network. For example, LinkedIn had reached over 400 million members worldwide by October 2015. Thus, it is essential to provide a method for narrowing the candidate scope among many possible talent pools. Second, for different types of positions, hiring specialists usually need to consider different talent sources. Third, since individual job transition trajectories vary widely [16] and lack regularity, it is necessary to investigate job transition trajectories at the organizational level. Only then is it possible to capture the hidden recruitment patterns and identify the right talent sources.

Figure 1: Circles in job transition network. (Circle labels: Computer Engineers, Finance Risk Analysts, Project Managers, Software Engineers; center: Ego Node.)

To address these challenges, in this paper, we propose to use an organization-level job transition network [3], which is generated from people's job transition trajectories. In the network, vertices stand for organizations and edges represent the amount of job transitions among organizations over a time period. The direction of an edge indicates the orientation of the job transitions, and the weight of an edge reflects the quantity and the category of employees transferring between the organizations. Moreover, if a given organization is used as a center node, an ego network can be formed by selecting all of its neighbors and the edges between them from the job transition network, as shown in Figure 1.

¹ https://www.linkedin.com

By analyzing this job transition network, we define talent circles on the ego network to help identify prospective talent sources. A talent circle consists of organizations that have similar talent exchange patterns. With talent circles, the sources of different types of talent are grouped together and can then be used for recruitment and job search. Specifically, recruiters can find candidates from organizations in the most relevant circles, while job seekers can locate target positions in their related circles. For example, in Figure 1, the neighbor organizations of the ego have been separated into several circles, some of which overlap or are nested hierarchically. These circles show different talent exchange patterns for specific types of positions, such as financial risk analysts or software engineers.

Although the problem of ego network detection has been studied for social media [9] and co-authorship networks [2], the job transition network differs from these scenarios in several aspects. First, the ego job transition network is much more densely connected and is weighted. Since most of the organizations in an ego network have strong connections, the weight is important for distinguishing the significance of edges. Second, the job transition network is direction sensitive, because the direction of an edge has different meanings in different recruitment scenarios.

In light of the above, we propose a talent circle detection method for finding talent circles in job transition networks. Specifically, we define node similarity based on talent exchange patterns and infer the edge existence probability from hypothetical circles. Then, we compare the inferred probability with the edge weights and refine the circle segmentation by maximizing the Normalized Discounted Cumulative Gain (NDCG). Next, we label the detected circles with representative words from job descriptions to give them semantic meaning. Moreover, based on the detected circles, we develop a talent exchange prediction method to show the effectiveness of the proposed model. Finally, we have performed extensive experiments on a large amount of real-world data. The results show that our method achieves much higher modularity than the benchmark approaches, as well as high precision and recall for talent exchange prediction.

2. FRAMEWORK OVERVIEW
To address the circle detection problem in the job transition network, we design a framework that covers data crawling, data transformation, modeling, learning, and circle labeling. As shown in Figure 2, the framework consists of three main stages, namely network formation, feature extraction, and circle detection.

Network Formation. In this stage, we obtain the raw data and transform them into a formalized job transition network. The raw data are job experience records, in which each item contains a job title, the corresponding organization, a brief text description of the work, and the start and end dates of the position. Specifically, we first crawl professional profiles from OPNs and convert each resume into a job transition trajectory by joining successive work experience items. Then, we aggregate all the trajectories into a weighted and directed network [21] at the organizational level. Finally, the ego network is defined as the subnetwork that consists of the neighbors of a specific node.
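The step of joining successive experience items can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record fields `organization` and `start`, the function name, and the choice to drop moves within the same organization are all assumptions made for the example.

```python
def transitions_from_profile(experiences):
    """Turn one profile into a list of (source_org, target_org) transitions.

    `experiences` is a list of dicts with hypothetical keys:
      'organization' - organization name
      'start'        - sortable start date, e.g. an ISO string "2012-06"
    """
    # Order the experience items chronologically by start date.
    ordered = sorted(experiences, key=lambda item: item["start"])
    # Join each item with its successor; skip moves inside one organization
    # (an assumption of this sketch, since the network is organization-level).
    return [
        (prev["organization"], nxt["organization"])
        for prev, nxt in zip(ordered, ordered[1:])
        if prev["organization"] != nxt["organization"]
    ]


profile = [
    {"organization": "B", "start": "2012-06"},
    {"organization": "A", "start": "2009-01"},
    {"organization": "B", "start": "2014-01"},
    {"organization": "C", "start": "2015-03"},
]
trajectory = transitions_from_profile(profile)
```

Concatenating the lists produced for all profiles gives the raw transition pairs from which the weighted, directed network is aggregated.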

Figure 2: The framework of talent circle detection. (Stages: Network Formation, Feature Extraction, Circle Detection. Component labels: OPNs, Resume Records, Job Transition Sequence, Transition Network, Aggregated Feature, Ego-centric Feature, Belonging Coefficient, Natural Weight Neighbor Sequence, Probability Inferring, Temporal Circle Segmentation, Inferred Probability Neighbor Sequence, Selector, Normalized Discounted Cumulative Gain, Circle Segmentation, Circle Labeling.)

Feature Extraction. In this stage, we define similarity features according to the job transitions of nodes and compute the similarities between nodes. Specifically, we classify job titles into categories based on the job descriptions posted by the corresponding employees. Intuitively, organizations that exchange more employees and have more similar job type distributions should be more similar to each other. Furthermore, organizations that interact with the ego node in a homogeneous way should be more similar than those that interact with it heterogeneously. Thus, the features used for similarity measurement are derived from two aspects: aggregated and ego-centric personnel exchange. Aggregated personnel exchange features describe the job transition patterns centered on a node, aggregated over all of its neighbors; ego-centric features describe the job transition patterns between a node and the ego node. Together, these features capture the hiring characteristics of an organization. Here, the similarity is defined as the multiplicative inverse of the Euclidean distance between the feature vectors of two nodes.

Circle Detection. In this stage, we run the model learning process to determine the node segmentation. In the model, we first put each node into its own hypothetical circle and calculate the node-circle belonging coefficients based on the node similarity. Then, we remove a node from the circles it belongs to and add it to other circles according to its level of belongingness and the circle size. After that, we infer the probability of edge existence based on the assumption that, if two nodes are highly similar and appear together in tightly connected circles, an edge is more likely to exist between them. Finally, we compare the inferred probability with the edge weights using a customized version of NDCG and determine whether to accept the segmentation. The learning process starts from a set of single-node circles and iterates until the objective function no longer improves. After circle detection, we label the circles with the most related organizations and with keywords from job descriptions. The learned circles can further be used for recruitment-related applications, such as talent exchange prediction.

3. JOB TRANSITION NETWORK
In this section, we describe how to build the weighted job transition network and how to extract features for measuring the similarity between nodes.


3.1 Preliminaries
We use a weighted and directed graph $G = \langle V, E, W \rangle$ to model job transitions between organizations over a time period (e.g., 12 months). Specifically, each node $v_i \in V$ ($i = 1, 2, \ldots, N$) represents an organization, which could be a company, a university, a government department, etc. A directed edge $e_{i,j} \in E$ from $v_i$ to $v_j$ stands for the aggregated job transitions from organization $v_i$ to $v_j$. Moreover, the indegree $\deg^-(v_i)$ (or outdegree $\deg^+(v_i)$) of node $v_i$ is the number of edges ending at (or starting from) $v_i$.

Figure 3: Directed and weighted ego network. (Two panels, each showing the ego $u$, its neighbors $v_i$, $v_j$, $v_k$, and the edge weights $w_{u,i}$, $w_{u,j}$, $w_{u,k}$, $w_{i,j}$, $w_{j,k}$.)

Application-Oriented Node Degree. In recruitment, different application scenarios concern different edge directions. For example, if the task is to discover where to find candidates for a specific position in a company, the historical incoming transitions of that company are the appropriate reference. However, when job seekers want to know where to find their next job, they usually need to check the outgoing transitions from the company they currently work for and identify companies that are recruiting for suitable positions. Therefore, we use different degree definitions in these two scenarios. Specifically, we use $\deg(v_i)$ to represent the appropriate node degree when developing the model. In the application of recruiting candidates, it is more meaningful to locate the sources that employees come from, so we set $\deg(v_i) = \deg^-(v_i)$, and the edge weights and transfer amounts are calculated based on the selected direction of transitions. Accordingly, we set $\deg(v_i) = \deg^+(v_i)$ in the application of locating job positions. As a result, the model is general enough to serve both application scenarios when the corresponding degree definition is used.

Edge Weight Definition. For a node $v_i$ and its neighbors $\{v_j, j = 1, 2, \ldots, \deg(v_i)\}$, the weight $w_{i,j} \in W$ of edge $e_{i,j}$ is defined as the proportion of job transitions between $v_i$ and $v_j$ within a given time period. More precisely, let $n_i$ denote the total number of transitions from (or to, depending on the application scenario) node $v_i$. The edge weight is then calculated as

$$w_{i,j} = \frac{\tau_{i,j}}{n_i}, \tag{1}$$

where $\tau_{i,j}$ is the number of transitions between $v_i$ and $v_j$. Note that, since only one direction is of interest in each scenario, no edge is counted twice in the weight calculation. Furthermore, because the nodes in the job transition network are densely connected, the weight is a critical indicator for distinguishing the significance of edges. Thus, we take the edge weight as an important determinant in both the node similarity measurement and the circle detection process.
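Equation 1 can be illustrated with a short sketch. The function name, the `direction` flag, and the convention of keying weights as `(focal node, neighbor)` are assumptions introduced for this example; the input is a plain list of `(source, target)` transition pairs.

```python
from collections import Counter


def edge_weights(transitions, direction="in"):
    """Compute w[i, j] = tau[i, j] / n[i] for one transition direction.

    transitions: iterable of (source_org, target_org), one pair per job move.
    direction:   "in"  -> weights of incoming transitions (recruiting scenario),
                 "out" -> weights of outgoing transitions (job-seeking scenario).
    Returns a dict keyed by (focal_node, neighbor).
    """
    tau = Counter()  # tau[(i, j)]: transitions between focal node i and neighbor j
    n = Counter()    # n[i]: total transitions of focal node i in this direction
    for src, dst in transitions:
        focal, other = (dst, src) if direction == "in" else (src, dst)
        tau[(focal, other)] += 1
        n[focal] += 1
    return {(i, j): count / n[i] for (i, j), count in tau.items()}


moves = [("A", "B"), ("A", "B"), ("C", "B"), ("B", "A")]
weights_in = edge_weights(moves, "in")    # B receives 2 moves from A, 1 from C
weights_out = edge_weights(moves, "out")  # A sends both of its moves to B
```

By construction, the weights of each focal node sum to 1 within the chosen direction, which is what makes them comparable across organizations of different size.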

Ego Network. The ego network in talent circle detection is a subset of the job transition network. Specifically, given a node $u$ as the center node, the ego network consists of all the neighbors of $u$. For example, as shown in Figure 3, the neighbor nodes (i.e., the black nodes) inside the ellipses form the ego network. Different application scenarios require different edge sets. The left side of Figure 3 shows a network that is suitable for identifying the talent sources of the ego node, and the network on the right side is appropriate for job seekers who want to reduce the search scope when looking for a new job. Note that, because transition patterns should not be limited to the ego network, edge weights and transitions in the ego network are measured on the whole network $G$ rather than on the traffic among the subset of nodes. Moreover, $u$ itself is not included in its ego network. We use the notation $\mathcal{G} \subseteq G$ to denote the ego network in our model.
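The ego network construction described above can be sketched as follows, assuming weights are stored in a dict keyed by `(focal node, neighbor)` for the chosen direction (a convention of this example, not of the paper). The sketch keeps the weights computed on the whole network and does not renormalize them inside the ego network.

```python
def ego_network(weights, ego):
    """Return (neighbors, edges) of the ego network centered on `ego`.

    weights: dict {(i, j): w_ij} computed on the whole network.
    The ego itself is excluded; edge weights are kept as measured globally.
    """
    neighbors = {j for (i, j) in weights if i == ego}
    neighbors.discard(ego)
    edges = {
        (i, j): w
        for (i, j), w in weights.items()
        if i in neighbors and j in neighbors
    }
    return neighbors, edges


global_weights = {
    ("u", "a"): 0.5, ("u", "b"): 0.5,
    ("a", "b"): 0.2, ("a", "x"): 0.8,  # "x" is not a neighbor of the ego
    ("b", "a"): 1.0,
}
neighbors, edges = ego_network(global_weights, "u")
```

In this toy input, the edge from `a` to `b` keeps its global weight 0.2 even though `a`'s other edge leaves the ego network.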

3.2 Node Similarity
Intuitively, the similarity of two organizations could be derived from profile information, such as whether they belong to the same sector, are located in the same city, or have a common business scope.

However, similarity based on static profiles is not suitable for talent circle detection from the perspective of recruitment. Ideally, the similarity should reflect the characteristics of employee transitions between organizations. In other words, organizations that share more job transitions and have more similar job category distributions should be more similar to each other. Furthermore, since the primary goal is to find the circles of the ego, organizations that interact with the ego in a homogeneous way should be more similar than those that interact with it heterogeneously. Thus, in this paper, the similarity is defined based on historical job transitions. Specifically, the features used for similarity measurement are derived from two aspects, namely aggregated and ego-centric personnel exchange.

Aggregated Personnel Exchange. This is defined as the categorical distribution of transitions to a node. In this paper, the job positions in transitions are divided into 10 categories according to the job title and the corresponding job description (the details are discussed in Section 6.1). The total number of job transitions $n_i$ is split into $\{n_{i,c}, c = 1, 2, \ldots, 10\}$, where $n_{i,c}$ is the number of transitions in job category $c$. Then, the proportion of each job category with respect to the total transitions to organization $v_i$ forms a categorical distribution vector

$$\alpha_i = [\alpha_i^1, \alpha_i^2, \ldots, \alpha_i^{10}], \qquad \alpha_i^c = \frac{n_{i,c}}{n_i}. \tag{2}$$

The vector captures the composition of employees attracted (or offered, in the application of recruitment) by the organization. In Figure 4, the solid lines illustrate the sources of the aggregated features for incoming traffic. From the perspective of recruitment, two organizations with similar $\alpha$ may have analogous staffing strategies. Since an edge and its weight stand for the fraction of transitions between two nodes, the aggregated exchange can be treated as an extension over all the edges of a node. Moreover, because the edge weight is used as the objective in our model, it is ignored in the similarity measurement.
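As a small worked example of Equation 2, the sketch below turns per-category transition counts into the distribution vector $\alpha_i$. The counts are made up for illustration, and returning an all-zero vector for a node without transitions is an assumption of this sketch.

```python
def categorical_distribution(counts):
    """Map per-category counts n_{i,c} to proportions alpha_i^c = n_{i,c} / n_i."""
    total = sum(counts)
    if total == 0:
        # Assumption: a node with no transitions gets an all-zero vector.
        return [0.0] * len(counts)
    return [count / total for count in counts]


# Hypothetical counts for the 10 job categories of one organization.
n_i_c = [40, 10, 0, 0, 20, 0, 0, 30, 0, 0]
alpha_i = categorical_distribution(n_i_c)
```

Here 40 of the 100 transitions fall into the first category, so its proportion is 0.4.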

Figure 4: Similarity measurement illustration. (Nodes $v_i$ and $v_j$ with feature vectors $F_i = [\alpha_i, \beta_i]$ and $F_j = [\alpha_j, \beta_j]$, and the ego $u$.)

Ego-centric Personnel Exchange. This is the categorical distribution of the transitions that involve the ego node. From the viewpoint of the ego node, organizations may play similar roles in talent exchange flows even though they differ in staffing strategies. Specifically, two organizations are similar when they offer (or attract) the same type of talent (e.g., hardware engineers) to (or from) the ego. Although the center node is excluded from its ego network, the relationship between each node and the ego is a determining factor in circle detection. In particular, two nodes are more likely to co-appear in a circle when they have common interaction patterns with the ego node. In fact, two organizations may be grouped into the same circle because they feed the ego with the same category of employees, even though they differ in scale or sector. Ego-centric features build on the aggregated features but consider only the transitions between node $u$ and $v_i$. We use $\mu_i$ to denote the categorical distribution vector of transitions between $u$ and $v_i$, and then define the feature vector as

$$\beta_i = [\beta_i^1, \beta_i^2, \ldots, \beta_i^{10}], \qquad \beta_i^c = \frac{\mu_i^c}{\alpha_u^c} \cdot w_{u,i}, \tag{3}$$

where $\alpha_u$ is the aggregated feature vector of the ego $u$ and $w_{u,i}$ is the weight of the edge between $u$ and $v_i$. Each dotted line in Figure 4 illustrates one dimension of the ego-centric features between the ego and node $v_i$ or $v_j$. $\beta_i$ is the fine-grained categorical distribution of transitions, normalized by the aggregated categorical distribution. Two similar vectors indicate that the corresponding organizations interact with the ego in a similar way, since the vector encodes the interaction pattern between the ego and an organization. Based on the feature vectors $\alpha_i$ and $\beta_i$, the similarity $Sim(v_i, v_j)$ is defined on the merged vectors $F_i = [\alpha_i, \beta_i]$ and $F_j = [\alpha_j, \beta_j]$, as shown in Equation 4.

$$Sim(v_i, v_j) = \frac{1}{\|F_i - F_j\|_2}. \tag{4}$$
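Equations 3 and 4 can be sketched together. The example uses three job categories instead of ten to stay short, and all numbers are invented. Two choices are assumptions of this sketch rather than part of the paper: categories with $\alpha_u^c = 0$ are mapped to 0, and identical feature vectors (zero distance) are given infinite similarity.

```python
import math


def ego_centric_features(mu_i, alpha_u, w_ui):
    """beta_i^c = (mu_i^c / alpha_u^c) * w_{u,i} for each category c."""
    # Assumption: categories the ego never exchanges (alpha_u^c == 0) map to 0.
    return [(m / a) * w_ui if a > 0 else 0.0 for m, a in zip(mu_i, alpha_u)]


def similarity(F_i, F_j):
    """Sim(v_i, v_j) = 1 / ||F_i - F_j||_2 on the merged feature vectors."""
    distance = math.dist(F_i, F_j)  # Euclidean distance (Python 3.8+)
    # Assumption: identical vectors are treated as infinitely similar.
    return math.inf if distance == 0 else 1.0 / distance


alpha_u = [0.25, 0.5, 0.25]                    # aggregated features of the ego
alpha_i, mu_i, w_ui = [0.5, 0.5, 0.0], [0.5, 0.5, 0.0], 0.2
alpha_j, mu_j, w_uj = [0.4, 0.6, 0.0], [0.5, 0.5, 0.0], 0.1

F_i = alpha_i + ego_centric_features(mu_i, alpha_u, w_ui)  # F_i = [alpha_i, beta_i]
F_j = alpha_j + ego_centric_features(mu_j, alpha_u, w_uj)
sim_ij = similarity(F_i, F_j)
```

Concatenating the two lists mirrors the merged vector $F_i = [\alpha_i, \beta_i]$, so both feature groups contribute to the same Euclidean distance.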

4. CIRCLE DETECTION MODEL
In this section, we introduce the details of our model for talent circle detection in the ego job transition network.

4.1 Basic Concepts
In this paper, we design a generative model to detect overlapping and hierarchical talent circles. In the ego job transition network, two organizations should be assigned to the same circle if they are closely connected and also similar to each other. Our model rests on three basic assumptions. First, two nodes have a chance of forming an edge when they co-appear in a circle. Second, circles that contain more strongly connected nodes lead to a higher probability of edges among the contained nodes. Third, two nodes with higher belongingness to a circle also have a higher edge existence probability.

The edge probabilities are generated based on the above assumptions. We then compare the inferred probabilities with the edge weights in an ordered manner. The objective of the model is to maximize the weighted similarity between the ordering of an organization's neighbors by edge weight and their ordering by inferred edge probability.

To build up the metrics of our model, we first define the concepts of natural ordered neighbor sequence, circle, and belonging coefficient as follows.

Natural Ordered Neighbor Sequence. Given a node $v_i$, the natural ordered neighbor sequence $A_i$ is the list of its neighbors in descending order of the edge weights $\{w_{i,k} \mid e_{i,k} \in E\}$. Formally, we have

$$A_i = (\ldots, v_j, v_{j+1}, \ldots),$$

where $w_{i,j} \geq w_{i,j+1}$. We use $A_{i,k}$ to refer to the $k$-th node in $A_i$. For example, if $v_i$ connects to $\{v_1, v_2, v_3\}$ and the weights satisfy $w_{i,2} > w_{i,1} > w_{i,3}$, then $A_i = (v_2, v_1, v_3)$. Neighbors with the same edge weight are arranged in index (e.g., alphabetic) order. When constructing the model for a specific organization, the ego node is excluded from the neighbor list.
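The ordering rule, including the tie-break, fits in one `sorted` call. The sketch assumes the weights of a node are given as a dict from neighbor name to edge weight, and it uses the alphabetic order of the names as the index order for ties.

```python
def natural_ordered_neighbors(weights_i):
    """Return A_i: neighbors sorted by descending edge weight, ties by name."""
    return sorted(weights_i, key=lambda v: (-weights_i[v], v))


# The example from the text: w_{i,2} > w_{i,1} > w_{i,3}.
A_i = natural_ordered_neighbors({"v1": 0.3, "v2": 0.5, "v3": 0.2})
# Equal weights fall back to alphabetic order.
A_tie = natural_ordered_neighbors({"b": 0.5, "a": 0.5, "c": 0.1})
```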

Talent Circle. A circle is a subset of the neighbors of an ego node in which the nodes are closely connected and similar to each other. We use $\{C_m \in C\}$ to denote circles, where $m = 1, 2, \ldots, M$ and $C_m \subseteq V$. Intuitively, if the circles are appropriately detected, nodes with strong connections should be more likely to be placed in the same circle than nodes with weak connections. Likewise, from the perspective of node similarity, similar nodes are more likely to appear in the same circle. Circles can be hierarchical, which means that circles of strongly connected nodes can be contained in circles of weakly connected nodes. Circles can also overlap; in other words, a node can belong to more than one circle.

Belonging Coefficient. Since circles may overlap, a node can belong more tightly to one circle than to another. The belonging coefficient is defined as the strength with which a node is attached to a circle. Specifically, we use a belonging matrix with entries $U_{i,m} \in \mathbb{R}$ ($0 \leq U_{i,m} \leq 1$, $i \in 1, 2, \ldots, N$, $m \in 1, 2, \ldots, M$) to denote the belonging strength of node $v_i$ to circle $C_m$, where a larger value of $U_{i,m}$ indicates tighter belongingness. In our model, $U_{i,m}$ is defined as the average similarity between node $v_i$ and the rest of the nodes in circle $C_m$, as shown in Equation 5. When there is only one node in a circle, the belonging coefficient is set to 1. The belonging coefficient affects both the inference of the edge existence probability in the model and the node-circle belonging dynamics in the learning process.

$$U_{i,m} = \frac{1}{|C_m| - 1} \sum_{v_j \in C_m,\, j \neq i} Sim(v_i, v_j). \tag{5}$$
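Equation 5 is an average over the other members of a circle, which the following sketch makes explicit. The similarity lookup table and its values are hypothetical, and the function assumes the node is itself a member of the circle.

```python
def belonging_coefficient(node, circle, sim):
    """U_{i,m}: average similarity between `node` and the other members of `circle`.

    sim: function (v_i, v_j) -> similarity; `node` is assumed to be in `circle`.
    """
    others = [v for v in circle if v != node]
    if not others:
        return 1.0  # single-node circle, as stated in the text
    return sum(sim(node, v) for v in others) / len(others)


# Hypothetical symmetric similarity values for three organizations.
sim_table = {
    frozenset(("a", "b")): 0.8,
    frozenset(("a", "c")): 0.4,
    frozenset(("b", "c")): 0.6,
}


def sim(x, y):
    return sim_table[frozenset((x, y))]


U_a = belonging_coefficient("a", {"a", "b", "c"}, sim)  # (0.8 + 0.4) / 2
U_single = belonging_coefficient("a", {"a"}, sim)
```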

4.2 Model Formalization
According to our assumptions, a pair of nodes within a circle has a chance of generating an edge, and the generated edge weight is determined by two factors: the number of common circles the two nodes belong to, and the belonging coefficients of the node pair with respect to those common circles. In reality, if two companies exchange employees more frequently, they should be more likely to be placed in common circles. We use an iterative method to approximate the real segmentation, starting from a simple temporary status in which circles are assumed to exist. Accordingly, the probability of two nodes forming an edge is defined based on the node-circle belonging relation in the temporary status.

In a temporary status, all nodes are assigned to circles. We define an indicator $\rho(i, j)$ to capture the edge existence possibility (or weight strength) between nodes $v_i$ and $v_j$. Specifically, the value of $\rho(i, j)$ is high if $v_i$ and $v_j$ are put into a common circle and have high belonging coefficients with respect to that circle. For each node $v_j$ that satisfies $|\{C_m \mid \{v_i, v_j\} \subseteq C_m\}| > 0$, $\rho(i, j)$ is defined as follows:

$$\rho(i, j) = \exp\left\{ \sum_{\{C_m \mid \{v_i, v_j\} \subseteq C_m\}} \frac{U_{i,m} \cdot U_{j,m}}{Sim(v_i, v_j)^{-1} - T_m + \eta} \right\}, \tag{6}$$

where $T_m$ is the minimal $U_{j,m}$ over all $v_j \in C_m$, and $\eta$ is the maximal $T_m$ over all circles $C_m$. In other words, $T_m$ is the similarity threshold of circle $C_m$, which indicates the tightness of the nodes within the circle. $\eta$ guarantees that the value of $(-T_m + \eta)$ is non-negative and becomes larger as $T_m$ gets smaller. A smaller value of $T_m$ therefore reduces the value of $\rho(i, j)$, such that a pair of nodes in circles with a lower similarity threshold generates a lower $\rho(i, j)$. Conversely, $U_{i,m}$ and $U_{j,m}$ guarantee that $\rho(i, j)$ has a high value if both belonging coefficients are high.
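A minimal sketch of Equation 6 follows, under stated assumptions: circles are a dict from circle id to member set, the belonging coefficients are a dict keyed by `(node, circle id)`, and the numeric values are invented. Returning `None` for a pair that shares no circle reflects that the indicator is only defined for co-appearing nodes; that return convention is a choice of this example.

```python
import math


def rho(i, j, circles, U, sim):
    """Edge existence indicator rho(i, j) of Equation 6.

    circles: dict {m: set of nodes}
    U:       dict {(node, m): belonging coefficient}
    sim:     function (v_i, v_j) -> similarity
    """
    # T_m: smallest belonging coefficient inside circle m (its tightness threshold).
    T = {m: min(U[(v, m)] for v in members) for m, members in circles.items()}
    eta = max(T.values())  # largest threshold over all circles
    common = [m for m, members in circles.items() if i in members and j in members]
    if not common:
        return None  # rho is only defined for nodes sharing at least one circle
    inverse_sim = 1.0 / sim(i, j)  # equals the Euclidean feature distance
    return math.exp(
        sum(U[(i, m)] * U[(j, m)] / (inverse_sim - T[m] + eta) for m in common)
    )


circles = {0: {"a", "b"}, 1: {"a", "b", "c"}}
U = {("a", 0): 0.8, ("b", 0): 0.8, ("a", 1): 0.6, ("b", 1): 0.7, ("c", 1): 0.5}
sims = {frozenset(("a", "b")): 0.8, frozenset(("a", "c")): 0.4}


def sim(x, y):
    return sims[frozenset((x, y))]


rho_ab = rho("a", "b", circles, U, sim)  # shares the tight circle 0 and circle 1
rho_ac = rho("a", "c", circles, U, sim)  # shares only the looser circle 1
```

In this toy setting the pair `(a, b)` scores higher than `(a, c)`: it co-appears in two circles, has larger belonging coefficients, and is more similar.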

Similar to the natural ordered neighbor sequence $A_i$, we arrange all the neighbors of node $v_i$ in descending order of $\rho(i, j)$ as $B_i = (\ldots, v_j, v_{j+1}, \ldots)$, where $\rho(i, j) \geq \rho(i, j+1)$. We use $B_{i,k}$ to refer to the $k$-th node in $B_i$. Ideally, given a pair of nodes $(v_j, v_l)$, if $w_{i,j} > w_{i,l}$ and the edge weights are well inferred, we should have $\rho(i, j) > \rho(i, l)$. The opposite relation indicates wrongly assigned circles. Thus, we treat the difference between the sequences $A_i$ and $B_i$ as a measure of the quality of the circle division. If $w_{i,j} > w_{i,l}$ and $\rho(i, j) < \rho(i, l)$, then the pair of nodes $(v_j, v_l)$ is an inversion of $A_i$ and $B_i$. The inversion number is the total number of inversions between the two sequences, $inv(v_i) = |\{(v_j, v_l) \mid w_{i,j} > w_{i,l} \wedge \rho(i, j) < \rho(i, l)\}|$. According to this definition, there is no inversion when the edge weights are perfectly inferred, and the more errors exist, the larger the inversion number becomes.
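The inversion number can be counted directly from its definition. The sketch assumes the weights and the inferred indicators are given as dicts over neighbor names and only compares neighbors present in both; the values are illustrative.

```python
def inversion_number(w_i, rho_i):
    """inv(v_i): pairs (v_j, v_l) with w_{i,j} > w_{i,l} but rho(i,j) < rho(i,l)."""
    # Only neighbors that appear in both dicts can be compared.
    common = [v for v in w_i if v in rho_i]
    return sum(
        1
        for j in common
        for l in common
        if w_i[j] > w_i[l] and rho_i[j] < rho_i[l]
    )


w_i = {"v1": 0.3, "v2": 0.5, "v3": 0.2}
rho_i = {"v1": 2.0, "v2": 1.5, "v3": 1.0}  # v1 is ranked above v2, unlike in w_i
inv = inversion_number(w_i, rho_i)
```

This quadratic count is fine for a sketch; the paper replaces the inversion number with an NDCG-based measure rather than optimizing this computation.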

However, the calculation of inversion number is computa-tional complex, because the length and elements of Ai andBi are not always match. Fortunately, since what we needis a metric that can measure the difference of two weightedsequences, we propose to use Normalized Discounted Cumu-lative Gain (NDCG)[7] for measuring the differences instead.Specifically, NDCG is a measure of ranking quality in infor-mation retrieval by calculating the weighted ranking resultsaccording to ideal ranking lists. When replacing documentwith neighbors, and relevance with edge weight in the defini-tion, NDCG is suitable for measuring the difference betweenBi and Ai. The basic assumptions of NDCG can be graftedon to measure sequence difference smoothly:

• Highly weighted neighbors are more meaningful when appearing earlier in $B_i$.

• Highly weighted neighbors are more meaningful than marginally weighted neighbors, which are in turn more meaningful than unconnected nodes.

Carrying these concepts over, the Discounted Cumulative Gain (DCG) of node $v_i$ is defined as follows.

$$DCG_i = \sum_{k=1}^{\deg(v_i)} \frac{2^{w_{i,k}} - 1}{\log_2(k + 1)}, \tag{7}$$

where $k$ is the sequence order index in $B_{i,k}$. According to the definition, $DCG_i$ increases when a node with a higher weighted edge is moved before a lower one in $B_i$. In other words, $DCG_i$ is high if $B_i$ is close to $A_i$, especially when the prefix of $B_i$ is close to the prefix of $A_i$. The ideal $DCG_i$ ($IDCG_i$) is calculated based on the neighbor sequence $A_{i,k}$, and it is the upper bound of $DCG_i$. $NDCG_i$ is then defined by normalizing $DCG_i$ by $IDCG_i$.
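As an illustration of Equation 7 and its normalization, the sketch below computes $NDCG_i$ for one node. It assumes that edge weights are scaled to a small range (so that $2^{w}$ stays representable), that nodes in $B_i$ without an edge to $v_i$ contribute weight 0, and that the sum is truncated at $\deg(v_i)$ positions; the function names are hypothetical.

```python
import math

def dcg(ranked_weights):
    """Equation 7: sum over positions k of (2^w - 1) / log2(k + 1)."""
    return sum((2 ** w - 1) / math.log2(k + 1)
               for k, w in enumerate(ranked_weights, start=1))

def ndcg(weights, rho):
    """NDCG_i for one node.

    weights: neighbor id -> edge weight (defines A_i)
    rho:     node id -> inferred score (defines B_i); may hold extra nodes
    """
    degree = len(weights)
    # B_i: nodes ordered by inferred score, highest first.
    b_order = sorted(rho, key=rho.get, reverse=True)
    # A_i: true neighbors ordered by edge weight, highest first.
    a_order = sorted(weights, key=weights.get, reverse=True)
    ideal = dcg([weights[j] for j in a_order])
    if ideal == 0:
        return 0.0
    # Non-neighbors in B_i carry zero weight; keep deg(v_i) positions.
    gains = [weights.get(j, 0.0) for j in b_order][:degree]
    return dcg(gains) / ideal
```

With this truncation, a ranking whose prefix matches $A_i$ scores 1 regardless of how many extra nodes follow it.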

Note that $B_i$ is the list of neighbors that co-appear with $v_i$ in circles, so the length of $B_i$ is usually not equal to $|A_i|$ in reality. Typically, $B_i$ may contain many elements as a suffix that do not belong to $A_i$. For example, if the nodes in $A_i$ are $(v_1, v_2)$, then $B'_i = (v_1, v_2, v_3, v_4)$ and $B''_i = (v_1, v_2)$ will have the same $NDCG_i$. Thus we use the symmetric set difference $A_i \triangle B_i$ to penalize this, and define the objective function of our model in Equation 8:

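$$F_U = \sum_{i=1}^{N} \left( \frac{n_i}{\sum_i n_i} \cdot \frac{DCG_i}{IDCG_i} \cdot \left( 1 - \frac{|A_i \triangle B_i|}{|A_i \cup B_i|} \right) \right), \tag{8}$$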

where $U$ is the belonging coefficient matrix and acts as a dynamic parameter in the learning process. Meanwhile, $F_U$ weights each node by its proportion of transitions, $n_i / \sum_i n_i$, such that a larger node has a higher priority.
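The objective in Equation 8 can be sketched as a weighted sum over nodes. The record layout below (keys `n`, `ndcg`, `A`, `B`) is a hypothetical container chosen for illustration, with `n` assumed to be the transition volume $n_i$ of a node and `ndcg` its precomputed $DCG_i / IDCG_i$.

```python
def objective(nodes):
    """Equation 8 for a list of per-node records.

    Each record is a dict with:
      'n'    : transition volume n_i of the node (assumed meaning)
      'ndcg' : DCG_i / IDCG_i
      'A'    : set of true neighbors
      'B'    : set of nodes co-appearing with the node in circles
    """
    total_n = sum(node["n"] for node in nodes)
    if total_n == 0:
        return 0.0
    score = 0.0
    for node in nodes:
        a, b = node["A"], node["B"]
        union = a | b
        # Symmetric-difference penalty: 0 when A_i and B_i coincide.
        penalty = len(a ^ b) / len(union) if union else 0.0
        score += node["n"] / total_n * node["ndcg"] * (1.0 - penalty)
    return score
```

In the example from the text, $B'_i = (v_1, v_2, v_3, v_4)$ against $A_i = (v_1, v_2)$ gives a penalty of $2/4$, so its contribution is halved relative to $B''_i$.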

5. MODEL LEARNING ALGORITHM

In this paper, we propose to use a bottom-up learning algorithm starting from a simple node-circle belonging status to maximize the objective in Equation 8, as shown in Algorithm 1.

In the initial step, each node $v_i$ is assigned to a corresponding circle $C_i$, such that $C_i = \{v_i\}$, $i = 1, 2, \ldots, N$. $U_{i,m}$ is initialized as an identity matrix, since the node-circle belonging coefficient is 1 if there is only one node in the circle. For each node, we first compute the natural ordered neighbor sequence $\{A_i\}$ according to $\{w_{i,k} \mid e_{i,k} \in E\}$. As illustrated in the left side of Figure 5, at step 0, $U^0$ is an identity matrix that encodes the initial node-circle belonging relationships.

Figure 5: Learning process illustration. (Left: the initial belonging matrix $U^0$, an identity matrix over nodes $v_i, v_j, v_k, v_l$ and circles $c_x, c_y, c_z, c_w$; right: the belonging matrix $U^t$ after $t$ iterations.)

In the learning process, the algorithm iteratively removes a node from circles it belongs to and adds the node to other circles to maximize the objective function in Equation 8. In each step, the leaving and joining circles are determined by both the belonging coefficient and the circle size. We use the notation $NC_i^t = \{C_m^t \mid v_i \in C_m^t\}$ to denote the node-circle membership in iteration $t$, which is a binary version of the non-zero elements in the corresponding row of $U_{i,m}$. The learning process then changes $NC_i^t$ by adding and removing nodes in circles, and evaluates the value of $F_U$ for the temporary status to decide whether to accept or reject the change.

Specifically, in iteration $t$, for each node $v_i$, we first calculate the edge existence possibility according to Equation 6, and get the ordered neighbor sequence $B_i$. Then, we adjust the node-circle membership as follows:

1. Remove $v_i$ from $d_{rm}$ circles. $d_{rm}$ is determined by $B_i$, the belongingness, and the number of circles to which $v_i$ belongs. If $B_i$ contains more nodes that do not belong to $A_i$, then $v_i$ should be removed from more circles. Meanwhile, the node should leave circles with lower $U_{i,m}$ with higher priority. Let $B_i \setminus A_i$ denote the set difference of $B_i$ and $A_i$; then we define

$$d_{rm} = |NC_i^t| \cdot \frac{|B_i \setminus A_i|}{|B_i|}.$$

Next, we delete $v_i$ from $d_{rm}$ circles by randomly selecting circles in $\{m \mid m \in NC_i^t\}$ according to $P_r(i, m)$, which is the inverse order of $U_{i,m}$, as $P_r(i, m) = \frac{U_{i,m}}{\sum_m U_{i,m}}$.

2. Add $v_i$ to $d_{add}$ circles. To increase the value of $F_U$, $v_i$ should be added to more circles when $A_i$ has many items not belonging to $B_i$. We define the number of circles to add as

$$d_{add} = \frac{1}{|NC_i|} \left( \sum_i |NC_i| - |NC_i| \right) \cdot \frac{|A_i \setminus B_i|}{|A_i|}.$$

Because $A_i$ is much longer than $B_i$ in the early stage of iteration, $d_{add}$ is usually too large to converge. This may trigger an explosion of the circles that $v_i$ belongs to. Therefore, we include the factor $\frac{1}{|NC_i|}$ to limit the scale of circle candidates. We first add nodes to circles with large size and high belongingness, and avoid adding nodes to empty circles. So $v_i$ is added to $d_{add}$ circles by randomly selecting circles in $\{m \mid m \notin NC_i^t\}$ according to the probability distribution $P_a(i) = \frac{|C_j|}{\sum_{k \notin NC_i^t} |C_k|}$.

3. Calculate temporary $\rho^t_{i,j}$. We calculate $\rho^t_{i,j}$ (Equation 6) according to $NC_i^t$ for all neighbors of $v_i$. Then, we form $B_i^t$ by ordering $\rho^t_{i,j}$.
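The two counts used in steps 1 and 2 can be written down directly. The sketch below is only an illustration: the text does not say how fractional values are turned into a number of circles, so rounding to the nearest integer is an assumption, as are the function and parameter names.

```python
def removal_count(num_circles, A, B):
    """d_rm = |NC_i| * |B_i \\ A_i| / |B_i|, rounded (rounding is assumed).

    num_circles: number of circles the node currently belongs to
    A, B:        sets of node ids for A_i and B_i
    """
    if not B:
        return 0
    return round(num_circles * len(B - A) / len(B))

def addition_count(num_circles, total_memberships, A, B):
    """d_add = (1/|NC_i|) * (sum_i |NC_i| - |NC_i|) * |A_i \\ B_i| / |A_i|.

    total_memberships: sum of |NC_i| over all nodes
    """
    if not A or num_circles == 0:
        return 0
    scale = (total_memberships - num_circles) / num_circles
    return round(scale * len(A - B) / len(A))
```

For instance, a node in 4 circles whose $B_i$ is half made up of non-neighbors would leave 2 circles under this reading.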

After the node-circle removing and adding step, we update $U_{i,m}$ for all the node-circle pairs and then calculate $F_U^t$ based on $U_{i,m}$, $A_i$ and $B_i^t$. If $F_U^t > F$, we accept the new circles, and set $NC_i = NC_i^t$ and $F = F_U^t$. The iteration repeats until $F$ becomes stable after sufficient steps. When the iteration stops, $U_{i,m}$ contains the detected node-circle belonging relationship.
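The accept-or-reject rule above has the shape of a simple hill-climbing loop. The following generic sketch shows only that control flow; `propose` and `score` are hypothetical callables standing in for the membership adjustment and the evaluation of $F_U$, and `patience` plays the role of the number of steps without change after which iteration stops.

```python
import random

def hill_climb(state, propose, score, patience, rng=None):
    """Accept a proposed state only if it raises the score.

    Stops after `patience` consecutive proposals without improvement.
    propose(state, rng) -> new candidate state (hypothetical callable)
    score(state)        -> objective value, e.g. F_U (hypothetical callable)
    """
    rng = rng or random.Random(0)
    best, best_score = state, score(state)
    stale = 0
    while stale < patience:
        candidate = propose(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            # Mirrors: NC_i <- NC_i^t and F <- F_U^t.
            best, best_score = candidate, candidate_score
            stale = 0
        else:
            stale += 1
    return best, best_score
```

Because only strict improvements are kept, the loop can stop in a local optimum; the random removal and addition of circles is what gives the search a chance to escape poor configurations.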

6. EXPERIMENTAL RESULTS

In this section, we evaluate our talent circle detection model with extensive experiments on real-world data.

6.1 Experimental Data

In this paper, we use a representative real-world data set from one of the largest commercial OPNs to study the circle detection problem. Specifically, the data set contains 1.98 million professional profile pages from Sept. 2014 to Dec. 2015. Each page contains a job experience list in which each item shows the job title, the organization, the time period at a monthly granularity and a brief text about the job description [17]. According to the URLs of organizations in resume records, there are 649,076 organizations in total. Since we are focusing on job-hopping behaviors among organizations, work experience records with keywords indicating freelancers, like “freelance”, “self employed” and “work at home”, are neglected. Table 1 shows the detailed statistics of our data set.

Algorithm 1 Talent circle detection by maximizing NDCG

Require: Similarity matrix $SIM$, $IDCG$, $A_i$, maximal iteration steps $\phi$
Ensure: Node-circle belonging coefficient matrix $U$
Initial: $U_{i,k} \leftarrow I$, $NC_i^0 \leftarrow \{v_i\}$, $F \leftarrow 0$
while $F$ changed in last $\phi$ steps do
    Calculate $\rho^t_{i,j}$, then arrange $B_i$
    for node $v_i \in V$ do
        $d_{rm} \leftarrow |NC_i^t| \cdot \frac{|B_i \setminus A_i|}{|B_i|}$
        $rmc \leftarrow$ sample $d_{rm}$ circles from $NC_i$ by $P_r(i, m)$
        $NC_i^t \leftarrow setdiff(NC_i, rmc)$
        $d_{add} \leftarrow \frac{1}{|NC_i|} \left( \sum_i |NC_i| - |NC_i| \right) \cdot \frac{|A_i \setminus B_i|}{|A_i|}$
        $adc \leftarrow$ sample $d_{add}$ circles by $P_a(i)$
        $NC_i^t \leftarrow setunion(NC_i^t, adc)$
    end for
    Calculate $U_{i,j}$
    $F_U^t \leftarrow F_U(NC^t, B_i, SIM, IDCG)$
    if $F_U^t > F$ then
        $F \leftarrow F_U^t$
        $NC_i \leftarrow NC_i^t$
    end if
end while

There are three steps of data pre-processing: job transition transformation, job title categorization and transition network initialization.

Table 1: Job transition record statistics.

| Data | Capacity | Data | Capacity |
|---|---|---|---|
| Resume | 1,980,000 | Job title | 284,245 |
| Transition | 2,123,383 | Category | 10 |
| Organization | 649,076 | Average flow | 30 |

Job Transition Transformation. Items in a resume are transformed into job transitions by joining successive job record pairs. Specifically, we sort the resume items by starting time in ascending order, and then compare the end dates to identify job-hopping activities, as shown in Figure 6. If the end date of record $j$ satisfies $et_j < et_{j+1}$, then we treat record pair $j$ and $j+1$ as a transition, as case c of $j+1$ in the figure. If the end date of $j$ is later than the start date of $j+1$, then we ignore the record pair $j$ and $j+1$ because the $j+1$ time period is included in the period $j$ (overlapping), as case b in Figure 6. Otherwise, as in case a in the figure, we check record $j+2$ (if it exists), and repeat the process until there is no more $j+1$ item. The job transition date is set as the start date of the successor, and the job title is determined by the title of the predecessor when it changed during a transition.

Figure 6: Transforming resume into job transition. (Timeline over Date showing record $j$ with $st_j$ and $et_j$, and three cases a, b, c of record $j+1$ with their start and end dates.)

Transition dates range from the 1970s to the date when we collected the data, with most of the transitions happening from the 2000s onward. As shown in Figure 7, job transition frequency has a clear cycle of 12 months, so we use 12 months as the time window in our experiments.

Figure 7: Aggregated job transition frequency. (x-axis: Month, 2010/01 to 2014/01; y-axis: Frequency; the window from 2012/01 to 2012/12 is annotated.)

Job Title Categorization. On professional profiles, users are free to specify their job titles, and usually write a short text description of their work [1]. In total, there are 284,245 different titles in our data set. In our experiments, we categorized the job titles into a few dozen classes according to job functions by using the online API of a third-party tool called Autocoder (footnote 2), which can classify job-related content such as resumes and job descriptions into a standardized hierarchy of occupation categories known as the Occupational Information Network. After that, we manually classified the titles into 142 classes and further integrated them into 10 major categories, as shown in Table 2.

Table 2: Job title category.

| Category | #Subclass | #Title |
|---|---|---|
| High Tech | 11 | 54,091 |
| Finance | 10 | 20,195 |
| Professional and Business Services | 21 | 75,565 |
| Transportation | 6 | 7,510 |
| Consumer Goods | 14 | 11,548 |
| Goods-Producing Industries | 6 | 7,850 |
| Public Administration | 22 | 27,994 |
| Leisure and Hospitality | 19 | 32,553 |
| Education and Health Services | 14 | 36,612 |
| Manufacturing | 19 | 10,327 |

Transition network initialization. We aggregated all the transitions at the organizational level and formed a job transition network. All the transitions from organization $i$ to organization $j$ form a directed edge starting from $i$ and ending at $j$ with the transition volume $n_{i,j}$. On average, there are 30 transitions between two organizations, and 84% of the flows happen among 20% of the nodes. Among all the transitions, approximately a quarter took place within the same organization, and we ignored these loops because the ego is excluded in talent source identification.
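The aggregation step can be sketched in a few lines. The input format, a sequence of (source organization, target organization) pairs, and the function name are assumptions made for illustration.

```python
from collections import Counter

def build_transition_network(transitions):
    """Aggregate individual job transitions into weighted directed edges.

    transitions: iterable of (source_org, target_org) pairs
    Returns a dict mapping (i, j) to the transition volume n_ij.
    Transitions within the same organization (self-loops) are dropped.
    """
    edges = Counter()
    for source, target in transitions:
        if source != target:
            edges[(source, target)] += 1
    return dict(edges)
```

Keeping the edges keyed by ordered pairs preserves direction, so the flow from $i$ to $j$ and the flow from $j$ to $i$ are counted separately.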

2 http://www.onetsocautocoder.com/plus/onetmatch

6.2 Modularity based Evaluation

To evaluate the performance of our model in terms of circle detection, we compare our model with four baseline algorithms on modularity [10], which is a widely used metric in community detection [15]. Since the original definition of modularity is defined on non-overlapping communities, in our experiments we use an extension introduced in [12]. Specifically, we first choose two recent algorithms as baselines, namely Coordinate Ascent (CA) from social network analysis [9] and Simulated Annealing (SA) from co-authorship network analysis [2]. Meanwhile, we choose two classic algorithms as baselines, namely Random Walks (RW) based community detection [13] and the Edge Betweenness (EB) based method [11]. Since there are some differences between network definitions, we follow the implementations of the four methods in [5] (RW and EB) and from the original websites (footnotes 3 and 4), respectively.

Figure 8: The comparison results of modularity. (x-axis: Methods, namely Our Model, CA, SA, RW and EB; y-axis: Modularity.)

Because the belongingness is usually normalized to sum to 1, we normalize the belonging coefficient $U_{i,m}$ by its row sum, i.e., $\bar{U}_{i,m} = \frac{U_{i,m}}{\sum_m U_{i,m}}$, and we set the belonging coefficient of a node to the multiplicative inverse of the number of circles it belongs to in CA and SA. The coefficient is set to 1 in RW and EB since there is no corresponding definition in these two methods. Since there is a modularity value for each ego network, we pay more attention to the performance on the majority of egos and compare it with the corresponding results for the same egos in the other algorithms, as shown in Figure 8.
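The row normalization is a one-line operation per node. The sketch below uses plain nested lists for the belonging matrix; leaving all-zero rows unchanged is an assumption to avoid division by zero, not something the text specifies.

```python
def normalize_rows(U):
    """Divide each row of the belonging matrix by its row sum.

    U: list of rows, one per node, with one coefficient per circle.
    Rows that sum to zero are left as zeros (assumption).
    """
    normalized = []
    for row in U:
        total = sum(row)
        normalized.append([x / total if total else 0.0 for x in row])
    return normalized
```

For the baselines with binary memberships, the same function applied to a 0/1 membership row yields the multiplicative inverse of the number of circles the node belongs to.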

The result is based on the experiments on 7,000 egos, which are the top 10 percent of nodes by degree. On average, each ego has 5.6 circles and each circle contains 3.1 organizations. As shown in Figure 8, the average modularity values are 0.6376 (Our Model), 0.5536 (CA), 0.5652 (SA), 0.5464 (RW), and 0.5420 (EB), respectively. As a result, our method outperforms all other methods by at least 0.072 in average modularity, i.e., 7.2 percentage points (0.6376 versus 0.5652 for SA, the strongest baseline). The majority of the results of our method are also higher than those of the other methods, despite a few outliers with low modularity. The results clearly indicate that our method is more accurate for circle detection in terms of modularity in the weighted and densely connected job transition networks.

3 http://cse.iitkgp.ac.in/resgrp/cnerg/circle/
4 http://cseweb.ucsd.edu/~jmcauley/

6.3 Circle Case Analysis

According to the model assumptions, organizations within a circle should have common interactions with the center organization and also exchange similar employees among each other. Thus different circles have different job transition characteristics. To this end, we select the results of several organizations for analysis.

To interpret the results, we rank the organizations in the circle according to the belonging coefficient and select the top ones as representative organizations. Besides, we extract related keywords from job descriptions to facilitate the interpretation. We trace job transitions between the ego and the set of companies in the circle, and extract the job descriptions of these transitions. All the descriptions related to a circle are joined into a document, and thus the set of documents forms a corpus. After removing stop words, we arranged the words in a document according to term frequency-inverse document frequency (TF-IDF), and manually selected several representative keywords from the top 10% of TF-IDF words.
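The keyword-ranking step can be sketched without external libraries. The text does not state which TF-IDF variant was used, so the sketch assumes relative term frequency multiplied by $\log(N / df)$; the function name, the tokenized input format and the stop-word set are likewise illustrative. The final manual selection of keywords is not modeled.

```python
import math
from collections import Counter

def top_tfidf_terms(documents, stop_words=frozenset(), top_fraction=0.1):
    """Return, per circle, the terms in the top fraction by TF-IDF.

    documents: dict mapping circle id -> list of tokens (one joined
               document per circle)
    TF-IDF variant (assumed): (count / doc length) * log(N / doc freq)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    counts = {}
    for doc_id, tokens in documents.items():
        tf = Counter(t for t in tokens if t not in stop_words)
        counts[doc_id] = tf
        doc_freq.update(tf.keys())
    result = {}
    for doc_id, tf in counts.items():
        total = sum(tf.values())
        scores = {t: (c / total) * math.log(n_docs / doc_freq[t])
                  for t, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keep = max(1, math.ceil(len(ranked) * top_fraction))
        result[doc_id] = ranked[:keep]
    return result
```

Terms that occur in every circle's document receive a score of zero under this variant, which is why generic words tend to drop out of the top list.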

Figure 9: Circles of AOL. Organizations in circle (A) provide quantitative analysts, (B) digital marketing specialists, (C) customer operation support, and (D) sales marketing analysts.

As shown in Table 3, we display the representative organizations and descriptive keywords for three ego examples, namely Citi Bank, AOL, and Accenture. Four circles of each ego are listed with the corresponding keywords for job title description. For instance, since Citi is a corporate financial organization, most of the organizations in the Citi circles focus on financial services. Although circles 1, 2 and 3 of Citi are banking or related financial consulting services, the characteristics of the talent sources from each circle are very different. For example, according to the keyword descriptions, circle 1 mainly involves talents for tasks such as corporate finance, project management and change management, while circle 2 focuses on job applicants specialized in corporate and financial risk management tasks, and circle 3 deals with talents for trading services, capital markets, or investment banking. The companies in circle 4 are mainly in the IT services industry, and the talent sources from this circle focus on quantitative research, desk strategy, and quantitative analytics for Citi Bank.

In summary, most of the companies in the circles of an ego are in an industry similar to that of the ego company, but different circles reflect different needs for talent specialties. Similar observations can be made for the media and advertising company AOL (as illustrated in Figure 9) and the very successful consulting company Accenture. A job title summary based on the top description keywords is also listed in the last column of the result table.

6.4 Talent Exchange Prediction

One of the major applications of talent circle detection in a job transition network is to identify the source organizations for selecting candidates in hiring. In fact, the detected circles are appropriate references for hiring different types of employees. For example, when the goal is to hire software engineers, recruiters can refer to the organizations appearing in circles which have labels such as “computer software”, “software engineer” or “programmer”. It is possible to precisely locate target organizations by using the circles.

Figure 10: The feature distribution of typical circles of Citi bank. (x-axis: Circle Feature Dimension; y-axis: Related Value; one area per category: Consumer Goods, Education and Health, Finance, Goods-Producing Industries, High Tech, Leisure and Hospitality, Manufacturing, Professional and Business, Public Administration, Transportation.)

However, enumerating related labels is inefficient when selecting circles for a job title category. As users are free to describe their work, there are too many alternative words for a title, and simply indexing keywords is likely to produce mismatches. Therefore, we design a quantitative method to identify related circles by ranking the average feature vectors. Specifically, the average feature of a circle $C_m$ is defined as a vector $FC_m$ where each dimension is defined as

$$FC^i_m = \frac{1}{|C_m|} \sum_{v_j \in C_m} \alpha^i_j.$$

For a given ego, $FC_m$ can be normalized by subtracting the average circle feature $\frac{1}{M} \sum_m FC_m$, where $M$ represents the number of circles. The normalized $FC_m$ is treated as the “feature of a circle”. Figure 10 shows the typical normalized circle feature distribution of ego “Citi”, in which each area is the value of a circle feature in 10 dimensions. It indicates that different circles have different combinations of peak feature dimensions [20]. This phenomenon also appears in the circles of other egos. It suggests that the average feature value is a reasonable way to identify different circle properties. We can then rank the circle features by different dimensions and select the top-ranked circles as related candidates.
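The averaging, centering and ranking steps can be sketched as follows. The sketch assumes that each organization carries a fixed-length feature vector whose $i$-th entry corresponds to $\alpha^i_j$ (for instance one entry per job category), that every circle is non-empty, and that the data are held in plain dictionaries; all names are illustrative.

```python
def circle_features(circles, node_features):
    """Average member features per circle, then subtract the mean circle.

    circles:       dict circle id -> non-empty list of organization ids
    node_features: dict organization id -> list of floats (assumed alpha)
    """
    averages = {}
    for m, members in circles.items():
        dims = len(node_features[members[0]])
        averages[m] = [
            sum(node_features[v][d] for v in members) / len(members)
            for d in range(dims)
        ]
    count = len(averages)
    dims = len(next(iter(averages.values())))
    mean = [sum(vec[d] for vec in averages.values()) / count
            for d in range(dims)]
    return {m: [x - mu for x, mu in zip(vec, mean)]
            for m, vec in averages.items()}

def rank_circles(features, dimension):
    """Order circle ids by one feature dimension, highest first."""
    return sorted(features, key=lambda m: features[m][dimension],
                  reverse=True)
```

After centering, a positive value in a dimension means the circle is above the ego's average circle in that category, which is what makes the ranking comparable across dimensions.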

Talent Exchange Prediction Problem. This problem is to predict, based on the detected circles, which organizations employees will move to when they leave an ego. In the real job market, the majority of employees go to the same set of companies where their forerunners in the same job type went, because the sector usually stays stable and forms specific job-hopping trajectories. Although other employees may go to other companies, it is highly likely that they still stay in organizations which have similar labor demand. Therefore, we predict that the employees in a given category will go to the circles that are labeled with the same category. By taking advantage of circle features, we design a simple talent exchange prediction mechanism.

For each given job title, we predict where employees with the title will move to. Specifically, we first rank the circle features [22, 19] according to the corresponding dimension in descending order, and then select the companies in the top circles as the predicted destinations. We use the job transition destinations from the ego in the next year as ground truth. The prediction and evaluation procedures have three stages.


Table 3: Top circles: representative organization and keywords.

| Ego | Circle | Top Organization | Top Keywords | Job Title Summary |
|---|---|---|---|---|
| Citi Bank | 1 | Capco, RBS Markets and International Banking, HSBC, PwC, Lehman Brothers, HSBC Global Banking and Markets, USAA | project, process, management, finance, back office, retail, firmwide, mssql, query, infrastructure | Corporate finance, Project management, Change management |
| Citi Bank | 2 | UBS, Credit Suisse, Barclays Investment Bank, Bank of Montreal, Newmark Grubb Knight Frank | stock, corporate, cash, compensation, extract, matlab, realtime, portfolio, oracle, model, data | Corporate/financial risk management |
| Citi Bank | 3 | YES Bank, John Hancock Financial Services, HSBC Private Bank, CHASE | finance, insurance, banker, private, systematic, equity, mortgage, rate, risk, business | Trading and Capital markets, Investment banking |
| Citi Bank | 4 | Sun Microsystems, Google, Palantir Technologies, D+H, Intel Corporation | visual, linear, predict, calculate, curve, java, system, algebra, compute, diagram, variable | Quantitative analytics, Desk strategist |
| AOL | 1 | Millennial Media, POPSUGAR, Discovery Networks Benelux, WSJ, Wunderman DC, Comcast, Townsquare Media | social, live, media, content, week, brand, advertise, broadband, freelance, independent, local, publish, editor | Digital marketing |
| AOL | 2 | Thomas Cook, Foursquare, Last.fm, Radio One, Wunderman | accuracy, analyze, online, metric, sale, customer, website, trend, user response, webdata, revenue, adhoc, analyst, dashboard | Sales marketing, Digital sales |
| AOL | 3 | Comcast, Adconion Media Group, Abril, Hearst, Los Angeles Times, Amdocs, WSJ | center, call, service, agent, maintain, quick, complaint, response, issue, communicate, customer, monitor | Customer operation, Sales operation and support |
| AOL | 4 | IDG, Photon, TechCrunch, Vodafone, MicroStrategy, Advertising.com, Meebo, NetApp, Hewlett Packard Enterprise, POPSUGAR | data, report, media, web, strategy, advertise, market, analyze, feedback, hadoop, api, insight, module, benchmark, research, track, response, statistics, techniques | Web analytics, Quantitative research |
| Accenture | 1 | Cognizant, Logica, Deloitte Digital, Hewlett Packard Enterprise, Avanade, Gartner, Gtech | technology, enterprise, system, application, communication, defencework, enterprise, architecture | Business analyst, ERP, CRM, Enterprise Architecture |
| Accenture | 2 | Capgemini Consulting, Deloitte, EY, Alexander Mann Solutions, Towers Watson | agil, management waterfall, employee, execution, procurement, trainer | Technology project management |
| Accenture | 3 | Deloitte, Morgan Stanley, American Express, PwC, Standard Chartered Bank, Alexander Mann Solutions | e-source, spend, procurement, pension, finance, pay, pipeline, workforce, resource | Financial and Accounting services |
| Accenture | 4 | Microsoft, IBM, Google, Oracle, Amazon, Ebay, Cisco, Facebook, Tata Consultancy, CSC, Sopra Steria, EMC, Avanade, Altran | digit, advance, project, innovation, deploy, data, interface, module, architecture, technology, crm, methodology, network, database | System or Enterprise integration and implementation |

First, for a given ego and title, we rank circle features by the corresponding dimension in descending order. Then we select organizations in the top-ranked circles as candidates. Finally, we compare the predicted organization candidates with the companies that employees in the same title category leave for in the next 12 months.
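The three stages map onto a small amount of set arithmetic. In the sketch below, the circle ranking is taken as given, and the function names and data layout are assumptions for illustration only.

```python
def predict_destinations(ranked_circles, circles, top_k):
    """Union of organizations in the top_k ranked circles.

    ranked_circles: circle ids ordered by the relevant feature dimension
    circles:        dict circle id -> list of organization ids
    """
    organizations = set()
    for m in ranked_circles[:top_k]:
        organizations.update(circles[m])
    return organizations

def precision_recall(predicted, actual):
    """Precision and recall of predicted versus observed destinations."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall
```

Increasing `top_k` can only grow the predicted set, so recall cannot decrease as more circles are included, while precision may rise or fall depending on how many of the added organizations are true destinations.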

Figure 11: The precision and recall performance of talent exchange prediction. (x-axis: Number of Circles, 1 to 8; y-axis: Precision and Recall.)

We measure precision and recall to evaluate the performance of talent exchange prediction. The circles are detected based on the data with transitions between January 2013 and December 2013, and the test data are the transitions between January 2014 and December 2014. Figure 11 shows the average, upper and lower bounds of precision and recall for the 10 job categories of Citi. As shown in the result, the precision and recall depend on the number of circles used in prediction. We find that the precision first increases with the number of circles, reaches a peak of 73% at 4-5 circles, and finally decreases when the number of circles is larger than 6. The recall increases consistently with the number of circles, and stays stable at 65% after 6 circles. The results indicate that a majority of employees in a given job category will leave for companies that appear in the circles with corresponding characteristics. This suggests that the detected circles are an appropriate node separation from the perspective of talent exchange flow.

7. RELATED WORK

In this section, we review two categories of literature that are related to this paper, namely research on data mining for recruitment analysis, and research on circle detection in social networks.

Data mining for recruitment analysis. Recent years have witnessed the increasing popularity of using data mining techniques for addressing human resource management (HRM) problems [14]. The recruitment process is one of the important sub-domains of HRM. Although many classic data mining tasks, such as classification, association rules and clustering [4, 6], have been applied to recruitment for personnel selection and talent prediction, only a few existing works focus on the analysis of job transition networks for recruitment. For example, [3] proposed a real-time system for mining job-related patterns from social media by analyzing the job transition network. [17] presented a novel approach for modeling professional similarity by mining professional career trajectories.


Circle detection in social networks. Social circle detection in ego networks was first proposed by McAuley and Leskovec [8, 9]. They formulated the node clustering problem and developed a model for identifying circles that uses both network structure and user profile information from several popular social networks. Recently, [2] applied automatic circle detection in ego networks to co-authorship networks by proposing an unsupervised method that combines various node features and node similarity measures. Also, [18] used a multi-view clustering method for automatically detecting social circles. Although we face a similar problem of discovering circles in ego networks, the background of job transition networks is different, and the node features and node structures are very different. Therefore, a different objective function and learning process have to be designed specifically for the job network scenario. To the best of our knowledge, this is the first attempt to detect circles for each company in a job transition network. We believe the findings from our model can further help to enhance the effectiveness of human resource tasks, such as staffing.

8. CONCLUSION

In this paper, we investigated how to identify the right talent sources for recruitment from Online Professional Networks (OPNs). Along this line, we first created a job transition network based on job transition trajectories at the organization level. Then, we proposed a talent circle detection model for extracting talent circles from the job transition network in a way that every circle includes the organizations with similar talent exchange patterns. With the help of these talent circles, organizations can find the right talent for recruitment and job seekers can locate suitable jobs for themselves. Moreover, based on these identified circles, we developed a talent exchange prediction method to predict the possible destination companies for job-hopping employees. As shown in the experimental results on real-world OPN data, our approach outperformed the benchmark methods in terms of modularity.

9. ACKNOWLEDGMENTS

This research was supported in part by the National Basic Research Program of China (No. 2015CB352400), the National Natural Science Foundation of China (No. 71329201, 61373119, 61332005, 61402369), Microsoft, and the Rutgers 2015 Chancellor’s Seed Grant Program.

10. REFERENCES

[1] R. Bekkerman and M. Gavish. High-precision phrase-based document classification on a modern scale. In SIGKDD. ACM, 2011.

[2] T. Chakraborty, S. Patranabis, P. Goyal, and A. Mukherjee. On the formation of circles in co-authorship networks. In SIGKDD. ACM, 2015.

[3] Y. Cheng, Y. Xie, Z. Chen, A. Agrawal, A. Choudhary, and S. Guo. Jobminer: A real-time system for mining job-related patterns from social media. In SIGKDD. ACM, 2013.

[4] C.-F. Chien and L.-F. Chen. Data mining to improve personnel selection and enhance human capital: A case study in high-technology industry. Expert Systems with Applications, 2008.

[5] G. Csardi and T. Nepusz. The igraph software package for complex network research. InterJournal, 2006.

[6] H. Jantan, A. R. Hamdan, and Z. A. Othman. Knowledge discovery techniques for talent forecasting in human resource application. World Academy of Science, Engineering and Technology, Penang, Malaysia, 2009.

[7] K. Järvelin and J. Kekäläinen. Cumulated gain-based evaluation of IR techniques. TOIS, 2002.

[8] J. Leskovec and J. J. McAuley. Learning to discover social circles in ego networks. In NIPS, 2012.

[9] J. McAuley and J. Leskovec. Discovering social circles in ego networks. TKDD, 2014.

[10] M. E. Newman. Modularity and community structure in networks. PNAS, 2006.

[11] M. E. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 2004.

[12] V. Nicosia, G. Mangioni, V. Carchiolo, and M. Malgeri. Extending the definition of modularity to directed graphs with overlapping communities. Journal of Statistical Mechanics: Theory and Experiment, 2009.

[13] P. Pons and M. Latapy. Computing communities in large networks using random walks. In Computer and Information Sciences-ISCIS 2005. Springer, 2005.

[14] S. Strohmeier and F. Piazza. Domain driven data mining in human resource management: A review of current research. Expert Systems with Applications, 2013.

[15] Z. Wang, D. Zhang, X. Zhou, D. Yang, Z. Yu, and Z. Yu. Discovering and profiling overlapping communities in location-based social networks. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 44(4):499–509, 2014.

[16] H. Xu, Z. Yu, H. Xiong, B. Guo, and H. Zhu. Learning career mobility and human activity patterns for job change analysis. In ICDM, pages 1057–1062. IEEE, 2015.

[17] Y. Xu, Z. Li, A. Gupta, A. Bugdayci, and A. Bhasin. Modeling professional similarity by mining professional career trajectories. In SIGKDD. ACM, 2014.

[18] Y. Yang, C. Lan, X. Li, B. Luo, and J. Huan. Automatic social circle detection using multi-view clustering. In CIKM. ACM, 2014.

[19] Z. Yu, Z. Wang, H. He, J. Tian, X. Lu, and B. Guo. Discovering information propagation patterns in microblogging services. TKDD, 10(1):7, 2015.

[20] Z. Yu, H. Xu, Z. Yang, and B. Guo. Personalized travel package with multi-point-of-interest recommendation based on crowdsourced user footprints. IEEE Transactions on Human-Machine Systems, 46(1):151–158, Feb 2016.

[21] J. Zhao, J. Wu, X. Feng, H. Xiong, and K. Xu. Information propagation in online social networks: a tie-strength perspective. Knowledge and Information Systems, 32(3):589–608, 2012.

[22] H. Zhu, E. Chen, H. Xiong, H. Cao, and J. Tian. Ranking user authority with relevant knowledge categories for expert finding. World Wide Web, 17(5):1081–1107, 2014.