social network extraction of academic researchers

35
1 Social Network Extraction of Academic Researchers Jie Tang, Duo Zhang, and Limin Yao Tsinghua University Oct. 29 th 2007

Upload: cloris

Post on 03-Feb-2016

31 views

Category:

Documents


0 download

DESCRIPTION

Social Network Extraction of Academic Researchers. Jie Tang, Duo Zhang, and Limin Yao Tsinghua University Oct. 29 th 2007. Outline. Motivation Related Work Problem Description Our Approach Experimental Results Summary. Motivation. More and more online social networks become available - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Social Network Extraction of Academic Researchers

1

Social Network Extraction of Academic Researchers

Jie Tang, Duo Zhang, and Limin YaoTsinghua University

Oct. 29th 2007

Page 2: Social Network Extraction of Academic Researchers

2

Outline

• Motivation

• Related Work

• Problem Description

• Our Approach

• Experimental Results

• Summary

Page 3: Social Network Extraction of Academic Researchers

3

Motivation

• More and more online social networks become available– e.g., YouTube.com, Facebook.com, etc.

• However, the social networks are usually separated• A question arises: can we build a integrated social

network from the separated ones automatically?• As a case study, how to build an social network

automatically for academic community? – ArnetMiner.org

Page 4: Social Network Extraction of Academic Researchers

4

Ruud Bolle Office: 1S-D58 Letters: IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 USA Packages: IBM T.J. Watson Research Center 19 Skyline Drive Hawthorne, NY 10532 USA Email: [email protected] Ruud M. Bolle was born in Voorburg, The Netherlands. He received the Bachelor's Degree in Analog Electronics in 1977 and the Master's Degree in Electrical Engineering in 1980, both from Delft University of Technology, Delft, The Netherlands. In 1983 he received the Master's Degree in Applied Mathematics and in 1984 the Ph.D. in Electrical Engineering from Brown University, Providence, Rhode Island. In 1984 he became a Research Staff Member at the IBM Thomas J. Watson Research Center in the Artificial Intelligence Department of the Computer Science Department. In 1988 he became manager of the newly formed Exploratory Computer Vision Group which is part of the Math Sciences Department.

Currently, his research interests are focused on video database indexing, video processing, visual human-computer interaction and biometrics applications.

Ruud M. Bolle is a Fellow of the IEEE and the AIPR. He is Area Editor of Computer Vision and Image Understanding and Associate Editor of Pattern Recognition. Ruud M. Bolle is a Member of the IBM Academy of Technology.

DBLP: Ruud Bolle 2006

Nalini K. Ratha, Jonathan Connell, Ruud M. Bolle, Sharat Chikkerur: Cancelable Biometrics: A Case Study in Fingerprints. ICPR (4) 2006: 370-373

EE50

Sharat Chikkerur, Sharath Pankanti, Alan Jea, Nalini K. Ratha, Ruud M. Bolle: Fingerprint Representation Using Localized Texture Features. ICPR (4) 2006: 521-524

EE49

Andrew Senior, Arun Hampapur, Ying-li Tian, Lisa Brown, Sharath Pankanti, Ruud M. Bolle: Appearance models for occlusion handling. Image Vision Comput. 24(11): 1233-1243 (2006)EE48

2005

Ruud M. Bolle, Jonathan H. Connell, Sharath Pankanti, Nalini K. Ratha, Andrew W. Senior: The Relation between the ROC Curve and the CMC. AutoID 2005: 15-20EE47

Sharat Chikkerur, Venu Govindaraju, Sharath Pankanti, Ruud M. Bolle, Nalini K. Ratha: Novel Approaches for Minutiae Verification in Fingerprint Images. WACV. 2005: 111-116

EE46

...

Ruud Bolle Office: 1S-D58 Letters: IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 USA Packages: IBM T.J. Watson Research Center 19 Skyline Drive Hawthorne, NY 10532 USA Email: [email protected] Ruud M. Bolle was born in Voorburg, The Netherlands. He received the Bachelor's Degree in Analog Electronics in 1977 and the Master's Degree in Electrical Engineering in 1980, both from Delft University of Technology, Delft, The Netherlands. In 1983 he received the Master's Degree in Applied Mathematics and in 1984 the Ph.D. in Electrical Engineering from Brown University, Providence, Rhode Island. In 1984 he became a Research Staff Member at the IBM Thomas J. Watson Research Center in the Artificial Intelligence Department of the Computer Science Department. In 1988 he became manager of the newly formed Exploratory Computer Vision Group which is part of the Math Sciences Department.

Currently, his research interests are focused on video database indexing, video processing, visual human-computer interaction and biometrics applications.

Ruud M. Bolle is a Fellow of the IEEE and the AIPR. He is Area Editor of Computer Vision and Image Understanding and Associate Editor of Pattern Recognition. Ruud M. Bolle is a Member of the IBM Academy of Technology.

Motivating ExampleContact Information

Educational history

Academic services

Publications

1

1

2

2

Ruud Bolle

Position

Affiliation

Address

Address

Email

PhdunivPhdmajor

Phddate

Msuniv

Msdate

Msmajor

BsunivBsdate

Bsmajor

Research Staff

IBM T.J. Watson Research CenterP.O. Box 704Yorktown Heights, NY 10598 USA

[email protected]

Brown University

1984

Electrical Engineering

Delft University of Technology

Analog Electronics

1977

Delft University of Technology

IBM T.J. Watson Research Center19 Skyline DriveHawthorne, NY 10532 USA

IBM T.J. Watson Research Center

Electrical Engineering

1980

Applied Mathematics

Msmajor

http://researchweb.watson.ibm.com/ecvg/people/bolle.html

Homepage

Ruud BolleName

video database indexingvideo processing

visual human-computer interactionbiometrics applications

Research_Interest

Photo

Publication 1#

Cancelable Biometrics: A Case Study in Fingerprints

ICPR 370

2006

Date

Start_page

Venue

Title

373

End_page

Publication 2#

Fingerprint Representation Using Localized Texture Features

ICPR 521

2006

Date

Start_page

Venue

Title

524

End_page

. . .

Co-authorCo-author

1

Ruud Bolle

2Publication #3

Publication #5

coauthor

coauthor

UIUCaffiliation

Professorposition

2

1

Page 5: Social Network Extraction of Academic Researchers

5

Ruud Bolle Office: 1S-D58 Letters: IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 USA Packages: IBM T.J. Watson Research Center 19 Skyline Drive Hawthorne, NY 10532 USA Email: [email protected] Ruud M. Bolle was born in Voorburg, The Netherlands. He received the Bachelor's Degree in Analog Electronics in 1977 and the Master's Degree in Electrical Engineering in 1980, both from Delft University of Technology, Delft, The Netherlands. In 1983 he received the Master's Degree in Applied Mathematics and in 1984 the Ph.D. in Electrical Engineering from Brown University, Providence, Rhode Island. In 1984 he became a Research Staff Member at the IBM Thomas J. Watson Research Center in the Artificial Intelligence Department of the Computer Science Department. In 1988 he became manager of the newly formed Exploratory Computer Vision Group which is part of the Math Sciences Department.

Currently, his research interests are focused on video database indexing, video processing, visual human-computer interaction and biometrics applications.

Ruud M. Bolle is a Fellow of the IEEE and the AIPR. He is Area Editor of Computer Vision and Image Understanding and Associate Editor of Pattern Recognition. Ruud M. Bolle is a Member of the IBM Academy of Technology.

DBLP: Ruud Bolle 2006

Nalini K. Ratha, Jonathan Connell, Ruud M. Bolle, Sharat Chikkerur: Cancelable Biometrics: A Case Study in Fingerprints. ICPR (4) 2006: 370-373

EE50

Sharat Chikkerur, Sharath Pankanti, Alan Jea, Nalini K. Ratha, Ruud M. Bolle: Fingerprint Representation Using Localized Texture Features. ICPR (4) 2006: 521-524

EE49

Andrew Senior, Arun Hampapur, Ying-li Tian, Lisa Brown, Sharath Pankanti, Ruud M. Bolle: Appearance models for occlusion handling. Image Vision Comput. 24(11): 1233-1243 (2006)EE48

2005

Ruud M. Bolle, Jonathan H. Connell, Sharath Pankanti, Nalini K. Ratha, Andrew W. Senior: The Relation between the ROC Curve and the CMC. AutoID 2005: 15-20EE47

Sharat Chikkerur, Venu Govindaraju, Sharath Pankanti, Ruud M. Bolle, Nalini K. Ratha: Novel Approaches for Minutiae Verification in Fingerprint Images. WACV. 2005: 111-116

EE46

...

Ruud Bolle Office: 1S-D58 Letters: IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 USA Packages: IBM T.J. Watson Research Center 19 Skyline Drive Hawthorne, NY 10532 USA Email: [email protected] Ruud M. Bolle was born in Voorburg, The Netherlands. He received the Bachelor's Degree in Analog Electronics in 1977 and the Master's Degree in Electrical Engineering in 1980, both from Delft University of Technology, Delft, The Netherlands. In 1983 he received the Master's Degree in Applied Mathematics and in 1984 the Ph.D. in Electrical Engineering from Brown University, Providence, Rhode Island. In 1984 he became a Research Staff Member at the IBM Thomas J. Watson Research Center in the Artificial Intelligence Department of the Computer Science Department. In 1988 he became manager of the newly formed Exploratory Computer Vision Group which is part of the Math Sciences Department.

Currently, his research interests are focused on video database indexing, video processing, visual human-computer interaction and biometrics applications.

Ruud M. Bolle is a Fellow of the IEEE and the AIPR. He is Area Editor of Computer Vision and Image Understanding and Associate Editor of Pattern Recognition. Ruud M. Bolle is a Member of the IBM Academy of Technology.

Motivating ExampleContact Information

Educational history

Academic services

Publications

1

1

2

2

Ruud Bolle

Position

Affiliation

Address

Address

Email

PhdunivPhdmajor

Phddate

Msuniv

Msdate

Msmajor

BsunivBsdate

Bsmajor

Research Staff

IBM T.J. Watson Research CenterP.O. Box 704Yorktown Heights, NY 10598 USA

[email protected]

Brown University

1984

Electrical Engineering

Delft University of Technology

Analog Electronics

1977

Delft University of Technology

IBM T.J. Watson Research Center19 Skyline DriveHawthorne, NY 10532 USA

IBM T.J. Watson Research Center

Electrical Engineering

1980

Applied Mathematics

Msmajor

http://researchweb.watson.ibm.com/ecvg/people/bolle.html

Homepage

Ruud BolleName

video database indexingvideo processing

visual human-computer interactionbiometrics applications

Research_Interest

Photo

Publication 1#

Cancelable Biometrics: A Case Study in Fingerprints

ICPR 370

2006

Date

Start_page

Venue

Title

373

End_page

Publication 2#

Fingerprint Representation Using Localized Texture Features

ICPR 521

2006

Date

Start_page

Venue

Title

524

End_page

. . .

Co-authorCo-author

1

Ruud Bolle

2Publication #3

Publication #5

coauthor

coauthor

UIUCaffiliation

Professorposition

2

1

Two key issues:• How to accurately extract the researcher profile information from the Web?• How to integrate the information from different sources?

Page 6: Social Network Extraction of Academic Researchers

6

Outline

• Motivation

• Related Work

• Problem Description

• Our Approach

• Experimental Results

• Summary

Page 7: Social Network Extraction of Academic Researchers

7

Related Work – Person Profiling

• Profile Information Extraction– E.g., Yu et al. (2005), resume IE– Alani et al. (2003), Artequakt system

• Contact Information Extraction– E.g., Kristjansson et al. (2004), Interactive extraction– Balog and Rijke (2006), Heuristic rules

• Information Extraction Methods– E.g., HMM (Ghahramani, 1997), – MEMM (McCallum, 2000), – CRFs (Lafferty, 2001)

Page 8: Social Network Extraction of Academic Researchers

8

Related Work – Name Disambiguation

• Unsupervised Methods– Hierarchy clustering, K-way spectral clustering, etc.– E.g. Han (2005), Mann (2003), Tan (2006)

• Supervised Methods– Support Vector Machines, Naïve Bayes, etc.– E.g. Han (2004)

• Graph-based Approach – Random Walk, etc.– E.g. Bekkerman (2005), Malin (2005), Minkov (2006)

Page 9: Social Network Extraction of Academic Researchers

9

Outline

• Motivation

• Related Work

• Problem Description

• Our Approach

• Experimental Results

• Summary

Page 10: Social Network Extraction of Academic Researchers

10

Researcher Social Network Extraction

Researcher

Homepage

Phone

Address

Email

Phduniv

Phddate

PhdmajorMsuniv

Bsmajor

Bsdate

Bsuniv

Affiliation

Postion

Msmajor

Msdate

Fax

Person Photo

Publication

Research_Interest

NameAuthored

Title

Publication_venue

Start_page

End_page

Date

Coauthor

70.60% of the researchers have at least one homepage

or an introducing page

There are a large number of person names having the

ambiguity problem

85.6% from universities

14.4% from companies

71.9% are homepages

28.1% are introducing

pages

60% are natural language text

40% are in lists and tables

70% moved at least one time

Even 3 “Yi Li” graduated the author’s lab

Page 11: Social Network Extraction of Academic Researchers

11

Outline

• Motivation

• Related Work

• Problem Description

• Our Approach

• Experimental Results

• Summary

Page 12: Social Network Extraction of Academic Researchers

12

Markov Random Field

aYbY

cY

eY

fY

dY

)~||()||( ijjiijji YYYYPYYYYP

Special Cases: - Conditional Random Fields - Hidden Markov Random Fields

Markov Property:

Page 13: Social Network Extraction of Academic Researchers

13

CRFs

He is a Professor at

OTH OTH

POS

AFF

POS

OTH

POS

UIUC

OTH

POS

AFF

OTH

POS

AFF

……

AFF

OTH

POS

AFF

… ADR

AFF

ADR

- Green nodes are hidden vars, - Purple nodes are observations

, ,

1( | ) exp ( , | , ) ( , | , )

( ) j j e k k ve E j v V k

p y x t e y x s v y xZ x

Page 14: Social Network Extraction of Academic Researchers

14

…..

ALC

FUC

AMC

PRV

RPA

DEL

PSB

PRV

DEL

PRV

DEL

AUC AUC

ALC

FUC

AMC

AUC

ALC

FUC

AMC

PRV

DEL

AUC

ALC

FUC

AMC

ofRuud is Fellow the IEEEBolle a

Ruud M. Bolle is a Fellow of the IEEE and the AIPR. He is Area Editor of Computer Vision and Image Understanding and Associate Editor of Pattern Recognition. Ruud M. Bolle is a Member of the IBM Academy of Technology...

ofRuud is Fellow the IEEE

…..

ALC

FUC

AMC

PRV

RPA

DEL

PSB

PRV

DEL

PRV

DEL

AUC AUC

ALC

FUC

AMC

AUC

ALC

FUC

AMC

Bolle a

PRV

DEL

AUC

ALC

FUC

AMC

Processing Flow for ProfilingPreprocessing

Determine Tokens

Standard word

Special word

Image Token

Term

Punc. mark

Labeling data

Labeled data

Learning a CRF model

Train

Test

Assigning tags

A unified tagging model

Model Learning

Tagging

Tagging results

Inputted docs

Featuredefinitions

Document

12

3

He obtained his BS in Computer Science in 1999...

Ruud M. Bolle is a Fellow of the IEEE...

...

Scienceobtained BS Computer

…..

ALC RPAPRV ALC FUC

his in

PRV

Page 15: Social Network Extraction of Academic Researchers

15

Token Definitions

Standard word Words in natural language

Special word

Including several general ‘special words’

e.g. email address, IP address, URL, date, number, money, percentage, unnecessary tokens (e.g. ‘===’ and ‘###’), etc.

Image token <IMAGE src="defaul3.jpg" alt=""/>

Term base NP, like “Computer Science”

Punctuation marks

Including period, question mark, and exclamation mark

Standard word Words in natural language

Special word

Including several general ‘special words’e.g. email address, IP address, URL, date, number, money, percentage, unnecessary tokens (e.g. ‘===’ and ‘###’), etc.

Image token

Punctuation marks

Including period, question mark, and exclamation mark

<IMAGE src="defaul3.jpg" alt=""/>

Term base NP, like “Computer Science”

Page 16: Social Network Extraction of Academic Researchers

16

Possible Tag Assignment

Token type Possible tags

Standard word All possible tags

Special wordPosition, Affiliation, Address, Email,

Phone, Fax, Phd/Ms/Bs-date

Image token Photo, Email

Term tokenPosition, Affiliation, Address, Phd/Ms/Bs-

univ, Phd/Ms/Bs-major

Punctuation marksPosition, Affiliation, Address, Email,

Phone, Fax, Phd/Ms/Bs-date

Page 17: Social Network Extraction of Academic Researchers

17

Feature Definition• Content features

Word features

Morphological features

Whether the current token is a word

Whether the word is capitalized

Standard Word

Image Token

Image size The size of the image

Image height/width ratio The value of height/width. The value of a person photo is often larger than 1

Image format JPG or BMP

The number of the “unique color” used in the image and the number of bits used for per pixel, i.e. 32,24,16,8,1

Face recognition Whether the current image contains a person face

Whether the filename contains (partially) the researcher name

Image filename

Whether the “alt” of the image contains (partially) the researcher name

Image “ALT”

Image positive keywords “myself”, “biology”

Image negative keywords “ads”, “banner”, “logo”

Image color

Page 18: Social Network Extraction of Academic Researchers

18

Feature Definition

• Pattern features

• Term features

Positive words “Fax:” for Fax, “director” for Position

Special tokensWhether the current token is a special

word

Term Whether the current token is a term

DictionaryWhether the current token is included

in a dictionary

Page 19: Social Network Extraction of Academic Researchers

19

Our Method to Name Disambiguation

• A hidden Markov Random Field model

• Observable Variables X represent publications

• Hidden Variables Y represent the labels of publications

• Constraints define the dependencies over hidden variables

x1

x2

x3

co-conference

cite

coauthor

cite

coauthor

coauthor

coauthort- coauthor

cite

co-conference

x4

x5

x6

x7

x8

x9

x10

x11

y1=1

y8=1

y9=3

y11=3

y10=3

y2=1

y3=1

y4=2

y7=2

y6=2

y5=2

Page 20: Social Network Extraction of Academic Researchers

20

Objective Function

( | ) ( ) ( | )Y X Y X YP P P

2

1( | ) exp( ( , ))

i

i ix X

X Y D x yZ

P

1

1

1

1

1exp( ( ))

1exp( ( ))

1exp( ( , ))

1exp( ( , ) ( ) [ ( , )])

i

i

k

NN N

i j

i j i j k k i ji j c C

V YZ

V YZ

V i jZ

D x x I y y w c y yZ

{ ( , ) ( ) [ ( , )]} ( , ) logk i

obj i j i j k k i j i ii j c C x X

f D x x I y y w c y y D x y Z

maximize

minimize1 22

Page 21: Social Network Extraction of Academic Researchers

21

Constraint Definition

p1: A, B, Cp2: A, Bp3: A, Dp4: C, D

Mp(0): p1 p2 p3

p1 p2

p3

1 0 00 1 00 0 1

p1: A, B, Cp2: A, Bp3: A, Dp4: C, D

Mp(1): p1 p2 p3

p1 p2

p3

1 1 01 1 00 0 1

p1: A, B, Cp2: A, Bp3: A, Dp4: C, D

Mp(2): p1 p2 p3

p1 p2

p3

1 1 11 1 01 0 1

p1: A, B, Cp2: A, Bp3: A, Dp4: C, D

Mp(3): p1 p2 p3

p1 p2

p3

1 1 11 1 11 1 1

C W Constraint Name Descriptionc1 w1 CoOrg ai

(0).affiliation = aj(0).affiliation

c2 w2 CoAuthor r, s>0, ai(r)=aj

(s)

c3 w3 Citation pi cites pj or pj cites pi

c4 w4 CoEmail ai(0).email = aj

(0).email

c5 w5 Feedback Constraints from user feedback

c6 w6 τ-CoAuthor one common author in τ extension

Page 22: Social Network Extraction of Academic Researchers

22

We define our distance function as follows:

where

We can see that actually maps each vector xi

into another new space, i.e. A1/2xi

To simplify our question, we define A as a diagonal matrix

Parameterized Distance Function

|| || Ti i iA Ax x x

( , ) 1|| || || ||

Ti j

i ji j

D A A

Ax xx x

x x

|| ||i Ax

Page 23: Social Network Extraction of Academic Researchers

23

• Initialization • use constraints to generate initial k clusters

• E-Step

• M-Step• Update cluster centroid• Update parameter matrix A

EM Framework

( , ) { ( , ) ( ) [ ( , )]} ( , )k

obj i i i j j k k i j i ii j c C

f y D x x I h l w c p p D x y

x

:

:

|| ||i

i

ii l h

ii

i l h

y

A

x

x

( , ) ( , ){ ( ) [ ( , )]}

k i

obj i j i ii j k k i j

i j c C x Xm m m

f D x x D x yI l l w c p p

a a a

2 2|| || || ||

|| || || ||( , ) 2 || || || ||

|| || || ||

im i jm jTim jm i j i j

i j i j

m i j

x x x xx x x x x x

D x x x x

a x x

2 2A A

A AA A

2 2A A

A

Page 24: Social Network Extraction of Academic Researchers

24

Outline

• Motivation

• Related Work

• Problem Description

• Our Approach

• Experimental Results

• Summary

Page 25: Social Network Extraction of Academic Researchers

25

Profiling Experiments

• Dataset– IK researchers from ArnetMiner.org

• Baseline– Amilcare– Support Vector Machines– Unified_NT (CRFs without transition features)

• Evaluation measures– Precision, Recall, F1

Page 26: Social Network Extraction of Academic Researchers

26

Profiling Results—5-fold cross validation

Profiling Task Unified Unified_NT SVM AmilcarePhoto 89.11 88.64 88.86 31.62

Position 69.44 64.70 64.68 56.48

Affiliation 83.52 72.16 73.86 46.65

Phone 91.10 78.72 79.71 83.33

Fax 90.83 64.28 64.17 86.88

Email 80.35 75.47 79.37 78.70

Address 86.34 75.15 77.04 66.24

Bsuniv 67.38 57.56 59.54 47.17

Bsmajor 64.20 59.18 60.75 58.67

Bsdate 53.49 40.59 28.49 52.34

Msuniv 57.55 47.49 49.78 45.00

Msmajor 63.35 61.92 62.10 57.14

Msdate 48.96 41.27 30.07 56.00

Phduniv 63.73 53.11 57.01 59.42

Phdmajor 67.92 59.30 59.67 57.93

Phddate 57.75 42.49 41.44 61.19

Overall 83.37 72.09 73.57 62.3083.37

Page 27: Social Network Extraction of Academic Researchers

27

Contribution of Features

68

70

72

74

76

78

80

82

84

F1

mea

sure

Features

content

content+term

content+pattern

all

Page 28: Social Network Extraction of Academic Researchers

28

Disambiguation Experiments

• Data Sets:

Name Set Publications Name Variations

C. Chang 402 97

G. Wu 152 46

K. Zhang 293 40

J. Li 551 102

B. Liang 55 14

M. Hong 108 30

X. Xie 136 36

P. Xu 39 5

H. Xu 182 60

W. Yang 263 82

Name Affiliation Publication

Jing Zhang(54/25)

Shanghai Jiao Tong Univ. 6

Dept. of Automation, Tsinghua Univ.

3

Alabama Univ. 8

Univ. of California, Davis 4

Carnegie Mellon University 5

State Univ. of New York at Albany 4

Yi Li(42/22)

National Univ. of Singapore 6

South China Univ. of Technology 2

George Mason Univ. 2

Chinese Academy of Sciences 5

Univ. of Washington 3

Nanjing Normal Univ. 4

Abbreviated Name dataset Real Name dataset

Page 29: Social Network Extraction of Academic Researchers

29

Experiment Setup

• Baseline MethodUnsupervised Hierarchical Clustering Method

• Measurement#PairsCorrectlyPredictedToSameAuthor

_TotalPairsPredictedToSameAuthor

Pairwise Precision

# PairsCorrectlyPredictedToSameAuthorPairwise_Recall

TotalPairsToSameAuthor

2_ 1

Precision RecallPairwise F measure

Precision+Recall

Page 30: Social Network Extraction of Academic Researchers

30

Disambiguation Results

NameUnsupervised Hierarchical

ClusteringConstraint-based Probabilistic

Framework

Precision Recall F1 Precision Recall F1

C. Chang 0.65 0.59 0.62 0.73 0.67 0.70

G. Wu 0.71 0.62 0.66 0.75 0.75 0.75

K. Zhang 0.75 0.60 0.67 0.79 0.71 0.75

J. Li 0.62 0.52 0.57 0.66 0.59 0.62

B. Liang 0.82 0.76 0.79 0.85 0.89 0.87

M. Hong 0.79 0.65 0.71 0.82 0.75 0.78

X. Xie 0.77 0.73 0.75 0.83 0.82 0.82

P. Xu 0.89 0.95 0.92 0.94 1.00 0.97

H. Xu 0.65 0.59 0.62 0.73 0.67 0.70

W. Yang 0.71 0.62 0.66 0.75 0.75 0.75

Average 0.75 0.60 0.67 0.79 0.71 0.75

Page 31: Social Network Extraction of Academic Researchers

31

Contribution of Different Constraint

0. 000 0. 200 0. 400 0. 600 0. 800 1. 000

Al l

CoAuthor

CoEmai l

CoOrg

k-Author

Reference

No-Coauthor

No-CoEmai l

No-CoOrg

No-k-Author

No-Reference

Basel i neCo

nstr

aint

Com

bina

tion

Pai rwi se F1-Measure

Yi LiJ i ng Zhang

Page 32: Social Network Extraction of Academic Researchers

32

How Profiling and Disambiguation Help Expert Finding

• Expert finding by using a PageRank-based method

0

0.2

0.4

0.6

0.8

1

P@5 P@10 P@20 P@30 R-prec MAP bpre MRR

EF EF+RPE EF+RPE+ND

Page 33: Social Network Extraction of Academic Researchers

33

Outline

• Motivation

• Related Work

• Problem Description

• Our Approach

• Experimental Results

• Summary

Page 34: Social Network Extraction of Academic Researchers

34

Summary

• Investigated the problem of researcher social network extraction

• Proposed a unified approach to perform profiling and a constraint-based probabilistic model to name disambiguation

• Experimental results show that our approaches outperform the baseline methods

• When applying it to expert finding, we obtain a significant improvement on performances

Page 35: Social Network Extraction of Academic Researchers

35

Thanks!

Q&AHP: http://keg.cs.tsinghua.edu.cn/persons/tj/

Online Demo: http://arnetminer.org