
Semantic Textual Similarity

& more on Alignment

CMSC 723 / LING 723 / INST 725

MARINE CARPUAT


2 topics today

• P3 task: Semantic Textual Similarity

– Including Monolingual alignment

• Beyond IBM word alignment

– Synchronous CFGs

Semantic Textual Similarity

Series of tasks at the International Workshop on Semantic Evaluation (SemEval), since 2012

http://alt.qcri.org/semeval2017/task1/

What is Semantic Textual Similarity?

Semantic Similarity

[Figure: text snippets in several languages (Arabic, English, Russian, Korean), each embedded in filler text. The legible snippets read, in order: (Arabic, roughly) "Come along and don't worry about a thing, you will have the best time"; "Welcome to my world, trust me you will never be disappointed"; "Yo! Come over here, you will be pleasantly surprised"; (Russian) "Welcome to my world, believe me, you will never be disappointed"; (Korean, roughly) "Hello, I called you but it was no use ... I hope you were enjoying your time".]

• Quantitative graded similarity score

• Confidence score

• Principled interpretability: which semantic components/features led to the result (which will hopefully lead to a better understanding of semantics)

Why Semantic Textual Similarity?

• Most NLP applications need some notion of semantic

similarity to overcome brittleness and sparseness

• Provides evaluation beyond surface text processing

• A hub for semantic processing as a black box in

applications beyond NLP

• Lends itself to an extrinsic evaluation of scattered

semantic components

What is STS?

• The graded process by which two snippets of text (t1 and t2) are deemed semantically equivalent, i.e. bear the same meaning

• An STS system tells us quantitatively how similar t1 and t2 are, resulting in a similarity score

• An STS system tells us why t1 and t2 are similar, giving a nuanced interpretation of similarity based on the contributions of semantic components

What is STS?

• Word similarity has been relatively well studied

– For example according to WN

cord        smile       0.02
rooster     voyage      0.04
noon        string      0.04
fruit       furnace     0.05
...
hill        woodland    1.48
car         journey     1.55
cemetery    mound       1.69
...
cemetery    graveyard   3.88
automobile  car         3.92

(pairs are listed from less similar to more similar)

What is STS?

• Fewer datasets for similarity between sentences

A forest is a large area where trees grow close together.

VS.

The coast is an area of land that is next to the sea.

[0.25]

What is STS?

• Fewer datasets for similarity between sentences

A forest is a large area where trees grow close together.

VS.

Woodland is land with a lot of trees.

[2.51]

What is STS?

• Fewer datasets for similarity between sentences

Once there was a Czar who had three lovely daughters.

VS.

There were three beautiful girls, whose father was a Czar.

[4.3]

Related tasks

• Paraphrase detection

– Are 2 sentences equivalent in meaning?

• Textual Entailment

– Does premise P entail hypothesis H?

• STS provides graded similarity

judgments

Annotation: crowd-sourcing


• English annotation process

– Pairs annotated in batches of 20

– Annotators paid $1 per batch

– 5 annotations per pair

– Workers need to have the MTurk Master qualification

• Defining gold standard judgments

– Median value of annotations

– After filtering low-quality annotators (<0.80 correlation with leave-one-out gold & <0.20 Kappa)
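The gold-standard procedure above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the task organizers' actual pipeline: the annotator names and scores are made up, every annotator is assumed to have scored every pair, and only the correlation filter is implemented (the Kappa filter is left out).

```python
from statistics import mean, median


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm_x = sum((x - mx) ** 2 for x in xs) ** 0.5
    norm_y = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (norm_x * norm_y)


def reliable_annotators(annotations, min_corr=0.80):
    """Keep annotators whose scores correlate with the leave-one-out gold.

    The leave-one-out gold for an annotator is the per-pair median of
    everyone else's scores.
    """
    kept = []
    for name, scores in annotations.items():
        others = [s for n, s in annotations.items() if n != name]
        loo_gold = [median(column) for column in zip(*others)]
        if pearson(scores, loo_gold) >= min_corr:
            kept.append(name)
    return kept


def gold_scores(annotations, min_corr=0.80):
    """Per-pair median over the annotators that pass the filter."""
    kept = reliable_annotators(annotations, min_corr)
    return [median(column) for column in zip(*(annotations[n] for n in kept))]


# Hypothetical data: 5 annotators, 5 sentence pairs, scores on a 0-5 scale.
annotations = {
    "a1": [0, 2, 4, 5, 1],
    "a2": [1, 2, 4, 5, 0],
    "a3": [0, 3, 3, 5, 1],
    "a4": [0, 2, 4, 4, 1],
    "a5": [5, 1, 0, 2, 4],  # disagrees with everyone else
}
kept = reliable_annotators(annotations)
gold = gold_scores(annotations)
```

In this toy data the fifth annotator is filtered out, and the gold score of each pair is the median of the remaining four annotations.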

Diverse data sources

Evaluation: a shared task

Subset of 2016 results

(Score: Pearson correlation)
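For reference, the Pearson correlation between system scores and gold scores over $n$ sentence pairs can be written as follows. The symbols $s_i$ (system score for pair $i$) and $g_i$ (gold score for pair $i$) are names chosen here for illustration, not taken from the task description.

$$
r = \frac{\sum_{i=1}^{n} (s_i - \bar{s})(g_i - \bar{g})}{\sqrt{\sum_{i=1}^{n} (s_i - \bar{s})^2}\;\sqrt{\sum_{i=1}^{n} (g_i - \bar{g})^2}}
$$

Here $\bar{s}$ and $\bar{g}$ are the means of the system and gold scores. Because $r$ is invariant to shifting and positive rescaling, a system is rewarded for ranking and spacing pairs like the gold standard, not for reproducing the exact 0 to 5 values.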

STS models

from word to sentence vectors

• Can we perform STS by comparing sentence vector representations?

• This approach works well for word level similarity

• But can we capture the meaning of a sentence in a single

vector?

“Composing” by averaging

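$$
g(\text{``shots fired at residence''}) = \frac{1}{4}\left(x_{\text{shots}} + x_{\text{fired}} + x_{\text{at}} + x_{\text{residence}}\right)
$$

where $x_w$ is the vector for word $w$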

[Tai et al. 2015, Wieting et al. 2016]
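A minimal sketch of composition by averaging, followed by a cosine comparison of the resulting sentence vectors. The 3-dimensional word vectors are invented for illustration; a real system would use learned embeddings, and cosine similarity is one common choice of comparison, not necessarily the one used in the cited papers.

```python
import math

# Hypothetical 3-dimensional word vectors, for illustration only.
WORD_VECTORS = {
    "shots": [0.9, 0.1, 0.0],
    "fired": [0.8, 0.2, 0.1],
    "at": [0.1, 0.1, 0.1],
    "residence": [0.0, 0.9, 0.3],
    "gunfire": [0.9, 0.2, 0.0],
    "home": [0.1, 0.8, 0.4],
}


def sentence_vector(sentence):
    """g(sentence): the average of the vectors of its words."""
    vectors = [WORD_VECTORS[word] for word in sentence.lower().split()]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


g1 = sentence_vector("shots fired at residence")
g2 = sentence_vector("gunfire at home")
similarity = cosine(g1, g2)
```

Averaging ignores word order, so "shots fired at residence" and "residence fired at shots" receive the same vector. That is the price of the simplicity.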

How can we induce word vectors

for composition?

Pairs (𝒙𝟏, 𝒙𝟐) can come from:

• English paraphrases [Wieting et al. 2016]

– By our fellow members

– By our colleagues

• Bilingual sentence pairs [Hermann & Blunsom 2014]

– Thus in fact … by our fellow members

– Así que podríamos … nuestra colega disputado

• Bilingual phrase pairs

– by our fellow member

– de nuestra colega

STS models:

monolingual alignment

One (of many) approaches to monolingual alignment

Idea

• Exploit not only similarity

between words

• But also similarity

between their contexts

See Sultan et al. 2013

https://github.com/ma-sultan/
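To make the idea concrete, here is a toy aligner that scores a word pair by combining word similarity with the similarity of the neighboring words. It is a sketch under strong assumptions and not the Sultan et al. aligner: word similarity is exact string match, context is a window of one word on each side, and the weight and the greedy one-to-one matching are arbitrary illustrative choices.

```python
def word_sim(a, b):
    """Toy word similarity: 1.0 for identical words, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def context(sentence, i, window):
    """Words within `window` positions of index i, excluding position i."""
    lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
    return [sentence[k] for k in range(lo, hi) if k != i]


def context_sim(s1, i, s2, j, window=1):
    """Fraction of s1[i]'s neighbors that have a similar neighbor of s2[j]."""
    ctx1, ctx2 = context(s1, i, window), context(s2, j, window)
    if not ctx1 or not ctx2:
        return 0.0
    matched = sum(1 for a in ctx1 if any(word_sim(a, b) > 0 for b in ctx2))
    return matched / len(ctx1)


def align(s1, s2, weight=0.7):
    """Greedy one-to-one alignment over lexically similar word pairs."""
    candidates = []
    for i, a in enumerate(s1):
        for j, b in enumerate(s2):
            ws = word_sim(a, b)
            if ws == 0.0:
                continue  # only lexically similar words are candidates
            score = weight * ws + (1 - weight) * context_sim(s1, i, s2, j)
            candidates.append((score, i, j))
    # Best score first; ties broken by position for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    used1, used2, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used1 and j not in used2:
            pairs.append((i, j))
            used1.add(i)
            used2.add(j)
    return sorted(pairs)


s1 = "the cat saw the dog".split()
s2 = "the dog saw the cat".split()
alignment = align(s1, s2)
```

In this example word identity alone cannot decide which "the" aligns with which. The context term prefers aligning the "the" before "cat" in the first sentence with the "the" before "cat" in the second.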

2 topics today

• P3 task: Semantic Textual Similarity

– Including Monolingual alignment

• Beyond IBM word alignment

– Synchronous CFGs

Aligning words & constituents

• Alignment: mapping between spans of text in

lang1 and spans of text in lang2

– Sentences in document pairs

– Words in sentence pairs

– Syntactic constituents in sentence pairs

• Today: 2 methods for aligning constituents

– Parse and match

– biparse

Parse

&

Match

Parse(-Parse)-Match

• Idea

– Align spans that are consistent with existing

structure

• Pros

– Builds on existing NLP tools

• Cons

– Assumes availability of lots of resources

– Assumes that representations can be matched


2 methods for aligning constituents:

• Parse and match

– assume existing parses and alignment

• Biparse

– alignment = structure

A “straw man” hypothesis:

All languages have same grammar

The biparsing hypothesis:

All languages have nearly the same grammar

Example for the biparsing hypothesis:


Dekai Wu and Pascale Fung, IJCNLP-2005

HKUST Human Language Technology Center





The biparsing hypothesis :






ITG shorthand

VP -> [ VV PP ]
VP -> < VV PP >

SDTG/SCFG notation

VP -> VV PP , VV PP
VP -> VV PP , PP VV

Indexed SDTG/SCFG notation

VP -> VV(1) PP(2) , VV(1) PP(2)
VP -> VV(1) PP(2) , PP(2) VV(1)

Permuted SDTG/SCFG notation

VP -> VV PP ; 1 2
VP -> VV PP ; 2 1
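The permuted and indexed notations carry the same information, which a small conversion makes explicit. The function name and the representation of a rule as a left-hand side, a list of right-hand-side symbols, and a 1-based permutation are assumptions made for this sketch.

```python
def indexed_notation(lhs, rhs, permutation):
    """Render a permuted rule (lhs -> rhs ; permutation) in indexed notation.

    `permutation` lists, for each target position, the 1-based index of the
    source-side symbol that appears there.
    """
    source = " ".join(f"{sym}({i})" for i, sym in enumerate(rhs, start=1))
    target = " ".join(f"{rhs[k - 1]}({k})" for k in permutation)
    return f"{lhs} -> {source} , {target}"


straight = indexed_notation("VP", ["VV", "PP"], [1, 2])
inverted = indexed_notation("VP", ["VV", "PP"], [2, 1])
print(straight)  # VP -> VV(1) PP(2) , VV(1) PP(2)
print(inverted)  # VP -> VV(1) PP(2) , PP(2) VV(1)
```

The permutation 1 2 corresponds to a straight rule and 2 1 to an inverted rule in ITG terms.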

Synchronous

Context Free Grammars

• Context free grammars (CFG)

– Common way of representing syntax in (monolingual)

NLP

• Synchronous context free grammars (SCFG)

– Generate pairs of strings

– Align sentences by parsing them

– Translate sentences by parsing them

• Key algorithm: how to parse with SCFGs?

SCFG trade off

• Expressiveness

– SCFGs cannot represent all sentence

pairs in all languages

• Efficiency

– SCFGs let us view alignment as parsing

& benefit from well-studied formalism

Synchronous parsing cannot

represent all sentence pairs

A subclass of SCFGs:

Inversion Transduction Grammars

• ITGs are the subclass of SDTGs/SCFGs defined by any one of three equivalent restrictions:

– only straight and inverted transduction rules

– only transduction rules of rank ≤ 2

– only transduction rules of rank ≤ 3

• ITGs are context-free (like SCFGs).

For length-4 phrases (or frames),

ITGs can express 22 out of 24 permutations!
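The 22-out-of-24 count can be checked by brute force. The sketch below assumes the usual characterization: a permutation is expressible with straight and inverted binary rules if it can be split into two adjacent blocks that each cover a contiguous range of target positions, and each block is recursively expressible. The function names are chosen for this example.

```python
from itertools import permutations


def is_contiguous(values):
    """True if the distinct integers in `values` form an unbroken range."""
    return max(values) - min(values) == len(values) - 1


def itg_expressible(perm):
    """True if `perm` can be built from straight and inverted binary rules."""
    if len(perm) <= 1:
        return True
    for k in range(1, len(perm)):
        left, right = perm[:k], perm[k:]
        # Both halves must map to contiguous target ranges; the left half
        # landing first is a straight rule, landing last is an inverted rule.
        if is_contiguous(left) and is_contiguous(right):
            if itg_expressible(left) and itg_expressible(right):
                return True
    return False


all_perms = list(permutations(range(1, 5)))
expressible = [p for p in all_perms if itg_expressible(p)]
not_expressible = [p for p in all_perms if not itg_expressible(p)]
print(len(expressible), "of", len(all_perms))  # 22 of 24
print(not_expressible)  # [(2, 4, 1, 3), (3, 1, 4, 2)]
```

The two excluded permutations are the "inside-out" reorderings, in which no two adjacent source positions stay adjacent on the target side.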

ITGs enable efficient DP algorithms [Wu 1995]

[Figure: alignment grid over positions e0 … e7 of one sentence and c0 … c6 of the other]

Biparsing with CKY

• Given the following SCFG

A -> fat, gordos
A -> thin, delgados
N -> cats, gatos
VP -> eats, comen
NP -> A(1) N(2), N(2) A(1)
S -> NP(1) VP(2), NP(1) VP(2)

• Let’s parse a sentence pair

fat cats eats
gatos gordos comen

Example by Matt Post (JHU)

Biparsing with CKY

[Chart: English words fat (1), cats (2), eats (3) on the horizontal axis; Spanish words gatos (1), gordos (2), comen (3) on the vertical axis]

Chart now enumerates pairs of spans

Biparsing with CKY

Apply lexical rules:

A((1,1),(2,2))

N((2,2),(1,1))

VP((3,3),(3,3))

Biparsing with CKY

For each block, apply straight & inverted rules:

A((1,1),(2,2))

N((2,2),(1,1))

VP((3,3),(3,3))

NP((1,2),(1,2))

S((1,3),(1,3))

Biparsing with CKY

Complexity: $O(G N^3 M^3)$, where $G$ is the size of the grammar and $N$ and $M$ are the lengths of the two sentences
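A compact sketch of the biparsing walk-through for the toy grammar. It is an illustration under assumptions, not a general SCFG parser: lexical rules pair exactly one word with one word, all other rules are binary and either straight or inverted, and the English verb is written "eats" to match the lexical rule. Chart items follow the slide convention X((i,j),(k,l)), with 1-based inclusive spans over the English and Spanish sentences.

```python
# Lexical rules: (English word, Spanish word) -> nonterminal.
LEXICAL = {
    ("fat", "gordos"): "A",
    ("thin", "delgados"): "A",
    ("cats", "gatos"): "N",
    ("eats", "comen"): "VP",
}

# Binary rules: (lhs, left child, right child, inverted on the Spanish side?).
BINARY = [
    ("NP", "A", "N", True),    # NP -> A(1) N(2), N(2) A(1)
    ("S", "NP", "VP", False),  # S -> NP(1) VP(2), NP(1) VP(2)
]


def biparse(english, spanish):
    """Return the set of chart items (label, (i, j), (k, l))."""
    n, m = len(english), len(spanish)
    chart = set()
    # Apply lexical rules to every pair of single-word spans.
    for i in range(1, n + 1):
        for k in range(1, m + 1):
            label = LEXICAL.get((english[i - 1], spanish[k - 1]))
            if label is not None:
                chart.add((label, (i, i), (k, k)))
    # Combine smaller span pairs into larger ones, shortest spans first.
    for e_len in range(2, n + 1):
        for f_len in range(2, m + 1):
            for i in range(1, n - e_len + 2):
                j = i + e_len - 1
                for k in range(1, m - f_len + 2):
                    l = k + f_len - 1
                    for s in range(i, j):      # English split point
                        for t in range(k, l):  # Spanish split point
                            for lhs, left, right, inverted in BINARY:
                                if inverted:
                                    left_f, right_f = (t + 1, l), (k, t)
                                else:
                                    left_f, right_f = (k, t), (t + 1, l)
                                if ((left, (i, s), left_f) in chart
                                        and (right, (s + 1, j), right_f) in chart):
                                    chart.add((lhs, (i, j), (k, l)))
    return chart


chart = biparse("fat cats eats".split(), "gatos gordos comen".split())
for item in sorted(chart):
    print(item)
```

The six nested loops over span boundaries and split points, times the loop over rules, give the $N^3 M^3$ and $G$ factors of the stated complexity. The sentence pair is accepted when an S item covers both full sentences.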


2 different ways of looking at this problem:

• parse-parse-match

– assume existing parses and alignment

• biparse

– alignment = structure
