s-mart: novel tree-based structured learning algorithms ... · ‣ a general class of tree-based...

116
S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking Yi Yang * and Ming-Wei Chang # * Georgia Institute of Technology, Atlanta # Microsoft Research, Redmond

Upload: others

Post on 18-Jun-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

S-MART: Novel Tree-based Structured Learning Algorithms

Applied to Tweet Entity Linking

Yi Yang* and Ming-Wei Chang#

*Georgia Institute of Technology, Atlanta #Microsoft Research, Redmond

Page 2: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Traditional NLP Settings

Page 3: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Traditional NLP Settings‣ High dimensional sparse features (e.g., lexical features)‣ Languages are naturally in high dimensional spaces.‣ Powerful! Very expressive.

Page 4: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

‣ Linear models‣ Linear Support Vector Machine‣ Maximize Entropy model

Traditional NLP Settings‣ High dimensional sparse features (e.g., lexical features)‣ Languages are naturally in high dimensional spaces.‣ Powerful! Very expressive.

Page 5: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

‣ Linear models‣ Linear Support Vector Machine‣ Maximize Entropy model

Traditional NLP Settings‣ High dimensional sparse features (e.g., lexical features)‣ Languages are naturally in high dimensional spaces.‣ Powerful! Very expressive.

Sparse features + Linear models

Page 6: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Rise of Dense Features

Page 7: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Rise of Dense Features‣ Low dimensional embedding features

Page 8: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Rise of Dense Features‣ Low dimensional embedding features

‣ Low dimensional statistics features

Named mention statisticsClick-through statistics

Page 9: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Rise of Dense Features‣ Low dimensional embedding features

‣ Low dimensional statistics features

Named mention statisticsClick-through statistics

Dense features + Non-linear models

Page 10: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Non-linear Models

Page 11: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Non-linear Models‣ Neural networks

Page 12: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Non-linear Models‣ Neural networks

# of

Pap

ers

in A

CL’

15

0

8.75

17.5

26.25

35

Neural Kernel Tree

35

Page 13: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Non-linear Models‣ Neural networks

# of

Pap

ers

in A

CL’

15

0

8.75

17.5

26.25

35

Neural Kernel Tree

35

‣ Kernel methods‣ Tree-based models (e.g., Random Forest, Boosted Tree)

Page 14: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Non-linear Models‣ Neural networks

# of

Pap

ers

in A

CL’

15

0

8.75

17.5

26.25

35

Neural Kernel Tree

35

‣ Kernel methods‣ Tree-based models (e.g., Random Forest, Boosted Tree)

15

Page 15: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Tree-based Models

Page 16: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Tree-based Models

‣ Empirical successes‣ Information retrieval [LambdaMART; Burges, 2010] ‣ Computer vision [Babenko et al., 2011]‣ Real world classification [Fernandez-Delgado et al., 2014]

Page 17: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Tree-based Models

‣Why tree-based models?‣ Handle categorical features and count data better.‣ Implicitly perform feature selection.

‣ Empirical successes‣ Information retrieval [LambdaMART; Burges, 2010] ‣ Computer vision [Babenko et al., 2011]‣ Real world classification [Fernandez-Delgado et al., 2014]

Page 18: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Contribution‣ We present S-MART: Structured Multiple Additive Regression

Trees‣ A general class of tree-based structured learning algorithms.‣ A friend of problems with dense features.

Page 19: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Contribution‣ We present S-MART: Structured Multiple Additive Regression

Trees‣ A general class of tree-based structured learning algorithms.‣ A friend of problems with dense features.

‣ We apply S-MART to entity linking on short and noisy texts‣ Entity linking utilizes statistics dense features.

Page 20: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Contribution‣ We present S-MART: Structured Multiple Additive Regression

Trees‣ A general class of tree-based structured learning algorithms.‣ A friend of problems with dense features.

‣ We apply S-MART to entity linking on short and noisy texts‣ Entity linking utilizes statistics dense features.

‣ Experimental results show that S-MART significantly outperforms all alternative baselines.

Page 21: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Outline

‣ S-MART: A family of Tree-based Structured Learning Algorithms

‣ S-MART for Tweet Entity Linking‣ Non-overlapping inference

‣ Experiments

Page 22: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Outline

‣ S-MART: A family of Tree-based Structured Learning Algorithms

‣ S-MART for Tweet Entity Linking‣ Non-overlapping inference

‣ Experiments

Page 23: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Structured Learning

Page 24: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Structured Learning

‣ Model a joint scoring function over an input structure and an output structure

S(x,y)

x

y

Page 25: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Structured Learning

‣ Model a joint scoring function over an input structure and an output structure

S(x,y)

x

y

‣Obtain the prediction requires inference (e.g., dynamic programming)

by = argmax

y2Gen(x)S(x,y)

Page 26: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Structured Multiple Additive Regression Trees (S-MART)

Page 27: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Structured Multiple Additive Regression Trees (S-MART)

‣ Assume a decomposition over factors

Page 28: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Structured Multiple Additive Regression Trees (S-MART)

‣ Assume a decomposition over factors

‣ Optimize with functional gradient descents

Page 29: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Structured Multiple Additive Regression Trees (S-MART)

‣ Assume a decomposition over factors

‣ Optimize with functional gradient descents

‣ Model functional gradients using regression trees hm(x,yk)

F (x,yk) = FM (x,yk) =MX

m=1

⌘mhm(x,yk)

Page 30: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Gradient Descent

Page 31: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Gradient Descent

F (x,yk) = w

>f(x,yk)

‣ Linear combination of parameters and feature functions

Page 32: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Gradient Descent

F (x,yk) = w

>f(x,yk)

wm = wm�1 � ⌘m@L

@wm�1

‣ Linear combination of parameters and feature functions

‣ Gradient descent in vector space

Page 33: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Gradient Descent

F (x,yk) = w

>f(x,yk)

wm = wm�1 � ⌘m@L

@wm�1 w0 w1

w2 w3

‣ Linear combination of parameters and feature functions

‣ Gradient descent in vector space

Page 34: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Gradient Descent in Function Space

Page 35: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Gradient Descent in Function Space

F0(x,yk) = 0

Page 36: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Gradient Descent in Function Space

F0(x,yk) = 0

Fm�1(x,yk)

Page 37: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Gradient Descent in Function Space

F0(x,yk) = 0

Fm�1(x,yk)

gm(x,yk) =

@L(y⇤, S(x,yk))

@F (x,yk)

F (x,yk)=Fm�1(x,yk)

Page 38: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Gradient Descent in Function Space

F0(x,yk) = 0

Fm�1(x,yk)

gm(x,yk) =

@L(y⇤, S(x,yk))

@F (x,yk)

F (x,yk)=Fm�1(x,yk)

�gm(x,yk)

Page 39: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Gradient Descent in Function Space

F0(x,yk) = 0

Fm�1(x,yk)

gm(x,yk) =

@L(y⇤, S(x,yk))

@F (x,yk)

F (x,yk)=Fm�1(x,yk)

�gm(x,yk)

Requiring Inference

Page 40: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Gradient Descent in Function Space

F0(x,yk) = 0

Fm(x,yk) = Fm�1(x,yk)� ⌘mgm(x,yk)

Fm(x,yk)

Page 41: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Model Functional Gradients

�gm(x,yk)

Page 42: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Model Functional Gradients

�gm(x,yk)

Page 43: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Model Functional Gradients‣ Pointwise Functional Gradients

Page 44: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Model Functional Gradients‣ Pointwise Functional Gradients‣ Approximation by regression

�gm(x,yk)

Tree

Page 45: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Model Functional Gradients‣ Pointwise Functional Gradients‣ Approximation by regression

�gm(x,yk)

Is linkprob > 0.5 ?

-0.5Is PER ?

-0.1 Is clickprob > 0.1 ?

-0.3

Tree

Page 46: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

S-MART vs. TreeCRF

F yt(x) F (x,yt)

TreeCRF [Dietterich+, 2004]

S-MART

Structure Linear chain Various structures

Loss function Logistic loss Various losses

Scoring function

Page 47: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

S-MART vs. TreeCRF

F yt(x) F (x,yt)

TreeCRF [Dietterich+, 2004]

S-MART

Structure Linear chain Various structures

Loss function Logistic loss Various losses

Scoring function

Page 48: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

S-MART vs. TreeCRF

F yt(x) F (x,yt)

TreeCRF [Dietterich+, 2004]

S-MART

Structure Linear chain Various structures

Loss function Logistic loss Various losses

Scoring function

Page 49: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

S-MART vs. TreeCRF

F yt(x) F (x,yt)

TreeCRF [Dietterich+, 2004]

S-MART

Structure Linear chain Various structures

Loss function Logistic loss Various losses

Scoring function

Page 50: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

S-MART vs. TreeCRF

F yt(x) F (x,yt)

TreeCRF [Dietterich+, 2004]

S-MART

Structure Linear chain Various structures

Loss function Logistic loss Various losses

Scoring function

Page 51: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Outline

‣ S-MART: A family of Tree-based Structured Learning Algorithms

‣ S-MART for Tweet Entity Linking‣ Non-overlapping inference

‣ Experiments

Page 52: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Entity Linking in Short Texts

Page 53: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Entity Linking in Short Texts

‣ Data explosion: noisy and short texts‣ Twitter messages‣Web queries

Page 54: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Entity Linking in Short Texts

‣ Data explosion: noisy and short texts‣ Twitter messages‣Web queries

‣ Downstream applications‣ Semantic parsing and question answering [Yih et al.,

2015]‣ Relation extraction [Riedel et al., 2013]

Page 55: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Tweet Entity Linking

Page 56: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Tweet Entity Linking

Page 57: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Tweet Entity Linking

Page 58: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Tweet Entity Linking

Page 59: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Entity Linking meets Dense Features

Page 60: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Entity Linking meets Dense Features

‣ Short of labeled data‣ Lack of context makes annotation more challenging.‣ Language changes, annotation may become stale and ill-suited

for new spellings and words. [Yang and Eisenstein, 2013]

Page 61: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Entity Linking meets Dense Features

‣ Short of labeled data‣ Lack of context makes annotation more challenging.‣ Language changes, annotation may become stale and ill-suited

for new spellings and words. [Yang and Eisenstein, 2013]

‣ Powerful statistic dense features [Guo et al., 2013]‣ The probability of a surface form to be an entity‣ View count of a Wikipedia page‣ Textual similarity between a tweet and a Wikipedia page

Page 62: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

System Overview

Candidate Generation

Joint Recognition and

Disambiguation

TokenizedMessage

Entity Linking Results

Page 63: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

System Overview

‣ Structured learning: select the best non-overlapping entity assignment ‣ Choose top 20 entity candidates for each surface form‣ Add a special NIL entity to represent no entity should be fired here

Candidate Generation

Joint Recognition and

Disambiguation

TokenizedMessage

Entity Linking Results

Page 64: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

System Overview

‣ Structured learning: select the best non-overlapping entity assignment ‣ Choose top 20 entity candidates for each surface form‣ Add a special NIL entity to represent no entity should be fired here

Candidate Generation

Joint Recognition and

Disambiguation

TokenizedMessage

Entity Linking Results

Page 65: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

System Overview

‣ Structured learning: select the best non-overlapping entity assignment ‣ Choose top 20 entity candidates for each surface form‣ Add a special NIL entity to represent no entity should be fired here

Candidate Generation

Joint Recognition and

Disambiguation

TokenizedMessage

Eli Manning and the New York Giants are going to win the World Series

Entity Linking Results

Page 66: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

System Overview

‣ Structured learning: select the best non-overlapping entity assignment ‣ Choose top 20 entity candidates for each surface form‣ Add a special NIL entity to represent no entity should be fired here

Candidate Generation

Joint Recognition and

Disambiguation

TokenizedMessage

Eli Manning and the New York Giants are going to win the World Series

NIL

Entity Linking Results

Page 67: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

System Overview

‣ Structured learning: select the best non-overlapping entity assignment ‣ Choose top 20 entity candidates for each surface form‣ Add a special NIL entity to represent no entity should be fired here

Candidate Generation

Joint Recognition and

Disambiguation

TokenizedMessage

Eli Manning and the New York Giants are going to win the World Series

NIL

Entity Linking Results

Page 68: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

System Overview

‣ Structured learning: select the best non-overlapping entity assignment ‣ Choose top 20 entity candidates for each surface form‣ Add a special NIL entity to represent no entity should be fired here

Candidate Generation

Joint Recognition and

Disambiguation

TokenizedMessage

Eli Manning and the New York Giants are going to win the World Series

Eli_Manning

Entity Linking Results

Page 69: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

System Overview

‣ Structured learning: select the best non-overlapping entity assignment ‣ Choose top 20 entity candidates for each surface form‣ Add a special NIL entity to represent no entity should be fired here

Candidate Generation

Joint Recognition and

Disambiguation

TokenizedMessage

Eli Manning and the New York Giants are going to win the World Series

Eli_Manning

Entity Linking Results

New_York_Giants World_Series

Page 70: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

S-MART for Tweet Entity Linking

Page 71: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

S-MART for Tweet Entity Linking

‣ Logistic loss

L(y⇤, S(x,y)) =� logP (y

⇤|x)= logZ(x)� S(x,y⇤

)

Page 72: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

S-MART for Tweet Entity Linking

‣ Logistic loss

‣ Point-wise gradients

L(y⇤, S(x,y)) =� logP (y

⇤|x)= logZ(x)� S(x,y⇤

)

gku =@L

@F (x, yk = uk)

= P (yk = uk|x)� 1[y⇤k = uk]

Page 73: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

S-MART for Tweet Entity Linking

‣ Logistic loss

‣ Point-wise gradients

L(y⇤, S(x,y)) =� logP (y

⇤|x)= logZ(x)� S(x,y⇤

)

gku =@L

@F (x, yk = uk)

= P (yk = uk|x)� 1[y⇤k = uk]

Non-overlapping Inference

Page 74: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Inference: Forward Algorithm

Eli Manning and the New York Giants are going to win the World SeriesEliEli Manning

Manning

NewNew York

York

GiantsNew York Giants

WorldWorld Series

Series

win

Page 75: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Inference: Forward Algorithm

Eli Manning and the New York Giants are going to win the World SeriesEliEli Manning

Manning

NewNew York

York

GiantsNew York Giants

WorldWorld Series

Series

win

Page 76: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Inference: Forward Algorithm

Eli Manning and the New York Giants are going to win the World SeriesEliEli Manning

Manning

NewNew York

York

GiantsNew York Giants

WorldWorld Series

Series

win

Page 77: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Inference: Forward Algorithm

Eli Manning and the New York Giants are going to win the World SeriesEliEli Manning

Manning

NewNew York

York

GiantsNew York Giants

WorldWorld Series

Series

win

Page 78: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Inference: Forward Algorithm

Eli Manning and the New York Giants are going to win the World SeriesEliEli Manning

Manning

NewNew York

York

GiantsNew York Giants

WorldWorld Series

Series

win

Page 79: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Inference: Forward Algorithm

Eli Manning and the New York Giants are going to win the World SeriesEliEli Manning

Manning

NewNew York

York

GiantsNew York Giants

WorldWorld Series

Series

win

Page 80: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Inference: Backward Algorithm

Eli Manning and the New York Giants are going to win the World SeriesEliEli Manning

Manning

NewNew York

York

GiantsNew York Giants

WorldWorld Series

Series

win

Page 81: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Inference: Backward Algorithm

Eli Manning and the New York Giants are going to win the World SeriesEliEli Manning

Manning

NewNew York

YorkGiants

New York Giants

WorldWorld Series

Series

win

Page 82: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Inference: Backward Algorithm

Eli Manning and the New York Giants are going to win the World SeriesEliEli Manning

Manning

NewNew York

YorkGiants

New York Giants

WorldWorld Series

Series

win

Page 83: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Inference: Backward Algorithm

Eli Manning and the New York Giants are going to win the World SeriesEliEli Manning

Manning

NewNew York

YorkGiants

New York Giants

WorldWorld Series

Series

win

Page 84: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Inference: Backward Algorithm

Eli Manning and the New York Giants are going to win the World SeriesEliEli Manning

Manning

NewNew York

YorkGiants

New York Giants

WorldWorld Series

Series

win

Page 85: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Outline

‣ S-MART: A family of Tree-based Structured Learning Algorithms

‣ S-MART for Tweet Entity Linking‣ Non-overlapping inference

‣ Experiments

Page 86: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Data‣ Named Entity Extraction & Linking (NEEL) Challenge datasets

[Cano et al., 2014]‣ TACL datasets [Fang & Chang, 2014]

Page 87: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Data‣ Named Entity Extraction & Linking (NEEL) Challenge datasets

[Cano et al., 2014]‣ TACL datasets [Fang & Chang, 2014]

Data #Tweet #Entity Date

NEEL Train 2,340 2,202 Jul. ~ Aug. 11

NEEL Test 1,164 687 Jul. ~ Aug. 11

TACL-IE 500 300 Dec. 12

TACL-IR 980 - Dec. 12

Page 88: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Evaluation Methodology ‣ IE-driven Evaluation [Guo et al., 2013]

‣ Standard evaluation of the system ability on extracting entities from tweets‣ Metric: macro F-score

Page 89: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Evaluation Methodology ‣ IE-driven Evaluation [Guo et al., 2013]

‣ Standard evaluation of the system ability on extracting entities from tweets‣ Metric: macro F-score

‣ IR-driven Evaluation [Fang & Chang, 2014]

‣ Evaluation of the system ability on disambiguation of the target entities in tweets‣ Metric: macro F-score on query entities

Page 90: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

AlgorithmsStructured Non-linear Tree-based

Structured Perceptron

Linear SSVM*

Polynomial SSVM

LambdaRank

MART#

S-MART

# winning system of NEEL challenge 2014

* previous state of the art system

Page 91: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IE-driven EvaluationN

EEL

Test

F1

65

70

75

80

85

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

0.7630.665

73.274.6 75.5

77.4

Page 92: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IE-driven EvaluationN

EEL

Test

F1

65

70

75

80

85

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

70.9

0.7630.665

73.274.6 75.5

77.4

Page 93: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IE-driven EvaluationN

EEL

Test

F1

65

70

75

80

85

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

70.9

0.7630.665

73.274.6 75.5

77.4Linear

Page 94: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IE-driven EvaluationN

EEL

Test

F1

65

70

75

80

85

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

70.9

0.7630.665

73.274.6 75.5

77.4Linear

Page 95: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IE-driven EvaluationN

EEL

Test

F1

65

70

75

80

85

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

70.9

0.7630.665

73.274.6 75.5

77.4Linear Non-linear

Page 96: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IE-driven EvaluationN

EEL

Test

F1

65

70

75

80

85

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

70.9

0.7630.665

73.274.6 75.5

77.4Linear

Page 97: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IE-driven EvaluationN

EEL

Test

F1

65

70

75

80

85

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

70.9

0.7630.665

73.274.6 75.5

77.4Linear

Neural based model

Page 98: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IE-driven EvaluationN

EEL

Test

F1

65

70

75

80

85

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

70.9

0.7630.665

73.274.6 75.5

77.4Linear

Page 99: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IE-driven EvaluationN

EEL

Test

F1

65

70

75

80

85

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

70.9

0.7630.665

73.274.6 75.5

77.4Linear

Kernel based model

Page 100: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IE-driven EvaluationN

EEL

Test

F1

65

70

75

80

85

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

70.9

0.7630.665

73.274.6 75.5

77.4Linear

Page 101: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IE-driven EvaluationN

EEL

Test

F1

65

70

75

80

85

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

70.9

0.7630.665

73.274.6 75.5

77.4

Tree based model

Linear

Page 102: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IE-driven EvaluationN

EEL

Test

F1

65

70

75

80

85

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

70.9

0.7630.665

73.274.6 75.5

77.4

81.1Linear

Page 103: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IE-driven EvaluationN

EEL

Test

F1

65

70

75

80

85

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

70.9

0.7630.665

73.274.6 75.5

77.4

81.1

Tree based structured model

Linear

Page 104: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IR-driven EvaluationTA

CL-

IR F

1

50

55

60

65

70

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

58.0

0.7630.665

62.263.6 63.0

67.4

Page 105: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IR-driven EvaluationTA

CL-

IR F

1

50

55

60

65

70

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

58.0

0.7630.665

62.263.6 63.0

67.4

Page 106: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IR-driven EvaluationTA

CL-

IR F

1

50

55

60

65

70

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

58.0

0.7630.665

62.263.6

56.8

63.0

67.4

Page 107: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IR-driven EvaluationTA

CL-

IR F

1

50

55

60

65

70

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

58.0

0.7630.665

62.263.6

56.8

63.0

67.4

Page 108: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IR-driven EvaluationTA

CL-

IR F

1

50

55

60

65

70

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

58.0

0.7630.665

62.263.6

56.8

63.0

67.4

Page 109: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IR-driven EvaluationTA

CL-

IR F

1

50

55

60

65

70

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

58.0

0.7630.665

62.263.6

56.8

63.0

67.4Linear

Page 110: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

IR-driven EvaluationTA

CL-

IR F

1

50

55

60

65

70

SP Linear SSVM Poly SSVM LambdaRank MART S-MART

58.0

0.7630.665

62.263.6

56.8

63.0

67.4Linear

Non-linear

Page 111: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Conclusion

Page 112: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Conclusion

‣ A novel tree-based structured learning framework S-MART‣ Generalization of TreeCRF

Page 113: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Conclusion

‣ A novel tree-based structured learning framework S-MART‣ Generalization of TreeCRF

‣ A novel inference algorithm for non-overlapping structure of the tweet entity linking task.

Page 114: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Conclusion

‣ A novel tree-based structured learning framework S-MART‣ Generalization of TreeCRF

‣ A novel inference algorithm for non-overlapping structure of the tweet entity linking task.

‣ Application: Knowledge base QA (outstanding paper of ACL’15)‣ Our system is a core component of the QA system.

Page 115: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Conclusion

‣ A novel tree-based structured learning framework S-MART‣ Generalization of TreeCRF

‣ A novel inference algorithm for non-overlapping structure of the tweet entity linking task.

‣ Application: Knowledge base QA (outstanding paper of ACL’15)‣ Our system is a core component of the QA system.

‣ Rise of non-linear models‣ We can try advanced neural based structured algorithms‣ It’s worth to try different non-linear models

Page 116: S-MART: Novel Tree-based Structured Learning Algorithms ... · ‣ A general class of tree-based structured learning algorithms. ‣ A friend of problems with dense features. ‣

Thank you!