ferosa - insights

Upload: amrith-krishna

Post on 12-Apr-2017

Category:

Engineering


TRANSCRIPT

Page 1: Ferosa - Insights

FeRoSA: Faceted Recommendation System for Scientific Articles

Page 2: Ferosa - Insights

Recommendation Engine

Page 3: Ferosa - Insights

Scientific Articles: ACL Anthology, a collection of 20,000 articles in computational linguistics

Page 4: Ferosa - Insights

Faceted: Not just recommendations, but how they are related

Page 5: Ferosa - Insights

www.ferosa.org: Live and running

Page 6: Ferosa - Insights

• Edge labelling task

• Set of nodes
• Links between similar nodes
• Label the edges

• Analogy

• Nudge user – suggest why one should buy the combo offered in Flipkart

• Type of social ties in a friendship network

Page 7: Ferosa - Insights

CHALLENGES

Quality (Q)
• High specificity & precision
• Outperforms the current system for scientific article retrieval by a high margin

Ranking (R)
• Individual ranking per facet
• Most relevant entry comes first
• Aggregation of rank lists over content and citation network info

Accessibility (A)
• Categorized into 4 facets
• Easy to streamline as per need and filter results

Scalable (S)
• Random walks (with restarts)
• Independent of domain

Page 8: Ferosa - Insights

Information overload: Even for a relatively closed community like ACL

IR tools: Rather than text-based indexing

Varying intentions: Results are streamlined by intention, so entries may appear that would not appear in flat recommendations

Page 9: Ferosa - Insights
Page 10: Ferosa - Insights

Dataset: ACL Anthology Collection

Statistics                                        Full      Filtered
Number of papers                                  21,212    9,843
Average number of references (within ACL only)    5.23      6.21
Number of unique authors                          17,551    7,892
Number of unique venues                           451       280

• Computational Linguistics

• 1961 – 2013

• text data open to public

Page 11: Ferosa - Insights

Form Citation Network

• Identify citation contexts and section headings using ParsCit

• Section heading to Facet Mapping

• Refinement of facets from prior works
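The section-heading-to-facet step can be pictured as a keyword lookup. The sketch below is only an illustration: the slides do not give the actual mapping, so the keyword table `HYPOTHETICAL_KEYWORDS` and the function `facet_for_heading` are assumptions made up for this example. Only the four facet codes (BG, AA, MD, CM) come from the deck.

```python
# Hypothetical keyword table: the real FeRoSA mapping is not shown in the slides.
HYPOTHETICAL_KEYWORDS = {
    "BG": ("introduction", "background"),
    "AA": ("related work", "previous work"),
    "MD": ("method", "approach", "model"),
    "CM": ("experiment", "evaluation", "result"),
}


def facet_for_heading(heading):
    """Return the first facet whose keyword occurs in the heading, else None."""
    text = heading.lower()
    for facet, keywords in HYPOTHETICAL_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return facet
    return None


# Each citation context inherits the facet of the section it was found in.
example_facet = facet_for_heading("2 Related Work")
```

A heading that matches no keyword returns `None`, which leaves room for the refinement step the slide mentions.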

Number of citation contexts extracted    61,051
Number of BG edges                       23,022
Number of AA edges                       10,797
Number of MD edges                       8,828
Number of CM edges                       18,404

AA: Alternative Approaches
BG: Background
CM: Comparison
MD: Method

Page 12: Ferosa - Insights

Induced Subgraphs

Nodes
• Query paper
• Papers within 2 citation hops in either direction
• Highly similar papers based on cosine similarity

Edges
• Edges belonging to a particular facet
• 4 different subgraphs for each query paper
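The construction above can be sketched on a toy citation list. This is a minimal illustration, not the authors' code: the edge format `(citing, cited, facet)`, the function names, and the toy data are assumptions, and the cosine-similar papers are left out for brevity (they would simply be added to the node set).

```python
from collections import deque


def two_hop_nodes(edges, query, max_hops=2):
    """Papers within max_hops citations of the query, ignoring edge direction."""
    neighbours = {}
    for citing, cited, _facet in edges:
        neighbours.setdefault(citing, set()).add(cited)
        neighbours.setdefault(cited, set()).add(citing)
    hops = {query: 0}
    queue = deque([query])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return set(hops)


def facet_subgraph(edges, nodes, facet):
    """Edges of one facet whose endpoints both lie in the node set."""
    return [(s, d) for s, d, f in edges if f == facet and s in nodes and d in nodes]


# Toy data (hypothetical): (citing paper, cited paper, facet label)
EDGES = [("Q", "A", "BG"), ("A", "B", "MD"), ("B", "C", "CM"),
         ("D", "Q", "AA"), ("Q", "E", "MD")]
nodes = two_hop_nodes(EDGES, "Q")
subgraphs = {f: facet_subgraph(EDGES, nodes, f) for f in ("BG", "AA", "MD", "CM")}
```

Paper `C` is three hops from `Q`, so it falls outside the node set and the CM subgraph is empty for this toy query.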

Page 13: Ferosa - Insights

Random Walks

• Random walks with restarts
• The walker iteratively moves to a neighbouring node with a probability proportional to the edge weights.
• Restart probability c = 0.4, to return to the starting node i.
• Teleportation with probability 0.3
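A random walk with restart can be sketched by power iteration on a small weighted graph. This is an illustrative sketch under assumptions: the graph format (a dict of weighted neighbours), the function name, and the toy weights are invented here, and only the restart step with c = 0.4 is modelled. The teleportation step is omitted because the slide does not say how it combines with the restart.

```python
def random_walk_with_restart(weights, start, c=0.4, iterations=100):
    """Approximate RWR scores by power iteration.

    weights: dict mapping every node to a dict {neighbour: edge weight}.
    c: probability of jumping back to the start node at each step.
    """
    nodes = sorted(weights)
    score = {n: 0.0 for n in nodes}
    score[start] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            total = sum(weights[u].values())
            if total == 0:
                # Dangling node: send its walking mass back to the start.
                nxt[start] += (1 - c) * score[u]
                continue
            for v, w in weights[u].items():
                # Move to a neighbour with probability proportional to edge weight.
                nxt[v] += (1 - c) * score[u] * w / total
        nxt[start] += c * sum(score.values())  # restart mass
        score = nxt
    return score


# Toy undirected graph (hypothetical weights), query node "q"
WEIGHTS = {
    "q": {"a": 2.0, "b": 1.0},
    "a": {"q": 2.0, "b": 1.0},
    "b": {"q": 1.0, "a": 1.0},
}
scores = random_walk_with_restart(WEIGHTS, "q")
ranking = sorted(scores, key=scores.get, reverse=True)
```

Sorting nodes by score gives one ranked list per facet subgraph, which is the input the rank aggregation step expects.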

Page 14: Ferosa - Insights

Rank Aggregation

Aggregation of ranked lists based on

Content similarity

RWR Values

R package

Optimization problem

Spearman footrule
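Rank aggregation under the Spearman footrule can be illustrated as follows. This is a brute-force sketch for very short lists, not the R package the slide refers to: the function names and the toy rankings are assumptions, and a real system would use a proper optimizer because the search below is factorial in the list length.

```python
from itertools import permutations


def footrule(ranking_a, ranking_b):
    """Spearman footrule: sum of absolute position differences per item."""
    position_b = {item: i for i, item in enumerate(ranking_b)}
    return sum(abs(i - position_b[item]) for i, item in enumerate(ranking_a))


def total_cost(candidate, rankings):
    """Summed footrule distance from a candidate ranking to every input list."""
    return sum(footrule(candidate, r) for r in rankings)


def aggregate(rankings):
    """Exhaustively pick the ranking that minimises the total footrule distance."""
    items = sorted(rankings[0])
    return min(permutations(items), key=lambda cand: total_cost(cand, rankings))


# Toy rankings (hypothetical): one by content similarity, one by RWR value
by_content = ["p1", "p2", "p3", "p4"]
by_rwr = ["p2", "p1", "p3", "p4"]
consensus = aggregate([by_content, by_rwr])
consensus_cost = total_cost(consensus, [by_content, by_rwr])
```

With only two input lists several rankings can share the minimum cost, so ties are broken here by enumeration order. That is a choice of this sketch, not something the slides specify.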

Page 15: Ferosa - Insights

EXPERIMENTAL RESULTS

• The most cosine-similar paper appears within 1 or 2 hops
• Edge density decreases as citation count increases (due to single or few edges)
• MD subgraphs have nodes with high degree
• Average path length increases with citation count
• Clustering coefficient correlates with edge density
• 1-hop nodes contribute more in this measurement

Page 16: Ferosa - Insights

EVALUATION

FeRoSA

Google Scholar

Microsoft Academic Search

LDA-based system (Liang et al., 2011)

Page 17: Ferosa - Insights

EVALUATION

Page 18: Ferosa - Insights

EVALUATION

• All systems perform better in >2 hop
• Cosine similarity: FeRoSA works in all sections, while the others are marginally better than or equivalent to FeRoSA only in the high or mid buckets
• Pr: FeRoSA works in all 3 buckets, while the others suffer in low citation buckets

Page 19: Ferosa - Insights

Scalable solution

High specificity

Stratification

Flat recommendation

Multi-hop neighbors

Low citation buckets

Page 20: Ferosa - Insights

THANKS