unsupervised and weakly-supervised probabilistic modeling of text

Unsupervised and Weakly-Supervised Probabilistic Modeling of Text

Ivan Titov

Outline

Introduction to the Topic

Seminar Plan

Requirements and Grading

What do we want to do with text?

One of the ultimate goals of natural language processing is to teach a computer to understand text

Text understanding in an open domain is a very complex problem, which cannot realistically be solved with a set of hand-crafted rules

Instead, essentially all modern approaches to natural language processing use statistical techniques

Examples of Ambiguities

… Nissan car and truck plant is located in …

… divide life into plant and animal kingdom …

… (Article This) (Noun can) (Modal will) (Verb rust) …

The dog bit the kid. He was taken to a (veterinarian | hospital).

Tiger was in Washington for the PGA tour

NLP Tasks

“Full” language understanding is beyond the state of the art and cannot be approached as a single task. Instead:

Practical applications: relation extraction, question answering, text summarization, translation, …

Prediction of linguistic representations: syntactic parsing, shallow semantic parsing (semantic role labeling), discourse parsing, …

Supervised Statistical Methods

Annotate texts with (structured) labels and learn a model from this data

Supervised Statistical Methods

More formally:

X – text, Y – label (e.g., syntactic structure)

Construct a parameterized model P(Y | X, W)

Estimate W on a collection $\{(X_i, Y_i)\}_{i=1}^{N}$

Maximum likelihood estimation:

$$\hat{W} = \arg\max_W \prod_{i=1}^{N} P(Y_i \mid X_i, W)$$

Predict a label for a new example X:

$$\hat{Y} = \arg\max_Y P(Y \mid X, \hat{W})$$
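
A minimal sketch of the two steps above (maximum likelihood estimation, then argmax prediction), assuming the simplest possible model: X is a single word, Y is its tag, and W is a table of conditional probabilities. For such a tabular model the maximum likelihood estimate is the relative frequency count(x, y) / count(x). The function names `fit_mle` and `predict` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

def fit_mle(pairs):
    """Return W with W[x][y] = count(x, y) / count(x).

    For a tabular model P(Y | X, W) this relative frequency is the
    maximum likelihood estimate on the labeled collection `pairs`.
    """
    joint = defaultdict(Counter)
    for x, y in pairs:
        joint[x][y] += 1
    return {x: {y: c / sum(ys.values()) for y, c in ys.items()}
            for x, ys in joint.items()}

def predict(W, x):
    """Return argmax_Y P(Y | X = x, W) for a word x seen in training."""
    return max(W[x], key=W[x].get)

# Toy (word, tag) pairs, invented for this sketch.
train = [("can", "Modal"), ("can", "Modal"), ("can", "Noun"),
         ("rust", "Verb"), ("rust", "Verb"), ("rust", "Noun"),
         ("will", "Modal")]
W = fit_mle(train)
print(predict(W, "can"), W["can"])
```

Real NLP models condition on far richer context than a single word, which is exactly why they need the large annotated collections discussed next.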

Supervised Statistical Models

Most tasks in NLP are complex, and therefore large amounts of data are needed

E.g., the standard Penn Treebank Wall Street Journal dataset: around 40,000 sentences (2 mln words)

Annotation is not just YES or NO, but usually complex graphs

Domain variability: models are brittle when applied out-of-domain

A question answering model learned on biological data will work poorly on news data

Many languages

Need data for every language, every domain, every task?

Not feasible for many tasks and very expensive for others


Unsupervised and Weakly-Supervised Models

Virtually unlimited amount of unlabeled text (e.g., on the Web)

Unsupervised models:

Do not use any kind of labeled data

Model jointly P(H, X | W), where H is the latent representation of interest for the task in question (latent semantic topics, syntactic relations, etc.)

Estimation on an unlabeled dataset $\{X_i\}_{i=1}^{N}$, maximum likelihood estimation:

$$\hat{W} = \arg\max_W \prod_{i} \sum_{H_i} P(H_i, X_i \mid W)$$

(Sum over the variable you do not observe)
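
To make the "sum over the variable you do not observe" concrete, here is a rough sketch that assumes a very simple joint model: each document has one latent topic H, and words are drawn independently given H (a mixture of unigrams). This model, the EM updates, the function names and the toy documents are all assumptions made for illustration, not the specific models covered in the seminar.

```python
import math

def posteriors(docs, prior, emit):
    """E-step: P(H = h | X_i, W) for every document X_i and topic h."""
    result = []
    for doc in docs:
        joint = [prior[h] * math.prod(emit[h][w] for w in doc)
                 for h in range(len(prior))]
        z = sum(joint)  # sum_H P(H, X_i | W): the marginal of X_i
        result.append([j / z for j in joint])
    return result

def log_marginal(docs, prior, emit):
    """sum_i log sum_H P(H, X_i | W), the quantity maximized over W."""
    return sum(
        math.log(sum(prior[h] * math.prod(emit[h][w] for w in doc)
                     for h in range(len(prior))))
        for doc in docs)

def em_step(docs, prior, emit):
    """One EM iteration: expected counts, then renormalization."""
    post = posteriors(docs, prior, emit)
    k = len(prior)
    new_prior = [sum(p[h] for p in post) / len(docs) for h in range(k)]
    new_emit = []
    for h in range(k):
        counts = {w: 0.0 for w in emit[h]}
        for doc, p in zip(docs, post):
            for w in doc:
                counts[w] += p[h]  # fractional count weighted by P(h | doc)
        z = sum(counts.values())
        new_emit.append({w: c / z for w, c in counts.items()})
    return new_prior, new_emit

# Toy unlabeled documents, invented for this sketch.
docs = [["hotel", "river", "street"], ["view", "tower", "thames"],
        ["hotel", "street", "tube"], ["view", "thames", "tower"]]
vocab = sorted({w for d in docs for w in d})

def init(favoured):
    """Slightly asymmetric start so the two topics can separate."""
    raw = {w: 2.0 if w == favoured else 1.0 for w in vocab}
    z = sum(raw.values())
    return {w: v / z for w, v in raw.items()}

prior, emit = [0.5, 0.5], [init("hotel"), init("view")]
history = [log_marginal(docs, prior, emit)]
for _ in range(5):
    prior, emit = em_step(docs, prior, emit)
    history.append(log_marginal(docs, prior, emit))
post = posteriors(docs, prior, emit)
```

Each EM iteration cannot decrease the marginal log-likelihood stored in `history`, and `post` holds the inferred topic distribution per document. EM only finds a local optimum, so the result depends on the initialization.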

Example: Unsupervised Topic Segmentation

Location: [The hotel is located on Maguire street, one block from the river. Public transport in London is straightforward, the tube station is about an 8 minute walk or you can get a bus for £1.50.]

View: [We had a stunning view (from the floor to ceiling window) of the Tower and the Thames.]

Rooms: [One thing we really enjoyed about this place – our huge bath tub with jacuzzi, this is so different from usually small European hotels. Rooms are nicely decorated and very light.]

...

Useful for:

Summarization (summarize multiple reviews along key aspects)

Sentiment prediction (predict star ratings for each aspect)

Visualization

....

Semi-Supervised Learning


Small amount of labeled data: $\{(X_i, Y_i)\}_{i=1}^{N_L}$

Large amount of unlabeled data: $\{X_i\}_{i=N_L+1}^{N_L+N_U}$

Define a joint model P(X, Y | W)

Model estimated on both datasets, maximum likelihood estimation:

$$\hat{W} = \arg\max_W \prod_{i=1}^{N_L} P(Y_i, X_i \mid W) \prod_{i=N_L+1}^{N_L+N_U} \sum_{Y_i} P(Y_i, X_i \mid W)$$

(Sum over the unobserved variable on the unlabeled dataset)
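
The semi-supervised objective above can be evaluated directly once a joint model is fixed. The sketch below assumes a Naive Bayes factorization, P(Y, X | W) = P(Y) times the product of P(word | Y). This factorization, the function names and the toy parameters are assumptions for illustration only. Labeled examples contribute log P(Y_i, X_i | W), and unlabeled examples contribute the log of the sum over all possible labels.

```python
import math

def log_joint(words, label, prior, emit):
    """log P(Y = label, X = words | W) under a Naive Bayes factorization."""
    return math.log(prior[label]) + sum(math.log(emit[label][w]) for w in words)

def semi_supervised_objective(labeled, unlabeled, prior, emit):
    """Log of the semi-supervised likelihood.

    Labeled part:   sum_i log P(Y_i, X_i | W)
    Unlabeled part: sum_i log sum_Y P(Y, X_i | W)
    """
    labeled_part = sum(log_joint(x, y, prior, emit) for x, y in labeled)
    unlabeled_part = sum(
        math.log(sum(math.exp(log_joint(x, y, prior, emit)) for y in prior))
        for x in unlabeled)
    return labeled_part + unlabeled_part

# Toy parameters W and data, invented for this sketch.
prior = {"pos": 0.5, "neg": 0.5}
emit = {"pos": {"good": 0.8, "bad": 0.2},
        "neg": {"good": 0.3, "bad": 0.7}}
labeled = [(["good"], "pos")]
unlabeled = [["bad"]]

value = semi_supervised_objective(labeled, unlabeled, prior, emit)
print(value)
```

Maximizing this quantity over W is typically done with EM, where the posterior over Y is computed only for the unlabeled examples and the labeled ones keep their observed label.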

Weakly-Supervised Learning (Web)


Texts are not just isolated sequences of sentences

We always have additional information

User-generated annotation

Can we learn how to summarize, segment, understand using this information?


Texts are not just isolated sequences of sentences

We always have additional annotation

Temporal relations between documents

Can we learn to translate, or port a semantic model from one language to another?


Texts are not just isolated sequences of sentences

We always have additional annotation

User-generated annotation

Temporal relations between documents

Links between documents

Clusters of similar documents

.......

How useful is it?

Can we project annotated resources from language to language?

Can we improve unsupervised / supervised models?

Hot topic in NLP recently

Why will we consider probabilistic models?


In the class we will focus on (Bayesian) probabilistic models

Why?

They provide a concise way to define model and approximation assumptions

They are like LEGO blocks – we can combine different models as building blocks to learn a new model for the task

Prior knowledge can be integrated into them in a simple and consistent way

Missing data can be easily accounted for (just sum over the corresponding variable)

We saw an example in semi-supervised learning

Goals of the seminar


Understand the methodology:

Classes of models considered in NLP

Approximation techniques for learning and inference (exact inference will not be tractable for most of the considered problems)

Learn interesting applications of the methods in NLP

See that sometimes we can substitute expensive annotation with a surrogate signal and obtain good results

Plan


Next class (April 23):

Introduction:

Topic models (PLSA, LDA)

Basic learning / inference techniques: EM and Gibbs sampling

Decide on the paper to present

On the basis of the survey and the number of registered students, I will adjust my list, and it will be online on Wednesday

Starting from April 30: paper presentations by you

Topics


Modeling semantic topics of data collections:

Topic segmentation models (including modeling the order of topics)

Topic hierarchies

Integrating syntax

Modeling syntax and topics

Shallow models of semantics

Grounded language acquisition

Joint modeling of multiple languages

Modeling multiple modes: gestures and discourse

Learning feature representations from text

Requirements


Present a paper to the class

We will see how long the presentations should be, depending on the number of students

Write 3 critical “reviews” of 3 selected papers (1.5 – 2 pages each)

A term paper (12 – 15 pages) for those getting 7 points

Make sure you are registered for the right “version” in HISPOS!

Read papers and participate in discussion

Grades


Class participation grade: 60 %

Your talk and the discussion after your talk

Your participation in the discussion of other talks

3 reviews (5 % each)

Term paper grade: 40 %

Only if you get 7 points; otherwise you do not need a term paper

Presentation


Present a paper in an accessible way

Have a critical view on the paper: discuss shortcomings, possible future work, etc.

To give a good presentation, in most cases you may need to read one or two additional papers (e.g., those referenced in the paper)

Links to tutorials on how to make a good presentation will be available on the class web page

Send me your slides 4 days before the talk, by 6 pm

If we keep the class on Friday, this means that the deadline is Monday, 6 pm

I will give my feedback within 2 days of receiving them

(The first 2 presenters can send me slides 2 days before if they prefer)

Term paper


Goal

Describe the paper you presented in class

Your ideas, analysis, comparison (more later)

It should be written in the style of a research paper; the only difference is that in this paper most of the work you present is not your own

Length: 12 – 15 pages

Grading criteria

Clarity

Paper organization

Technical correctness

New ideas are meaningful and interesting

Submitted in PDF to my email

Critical review


A short critical (!) essay reviewing one of the papers presented in class

One or two paragraphs presenting the essence of the paper

Other parts highlighting both the positive sides of the paper (what you like) and its shortcomings

The review should be submitted before the paper's presentation in class

(The exception is the additional reviews submitted for the seminars you skipped; more on this later)

No copy-paste from the paper

Length: 1.5 – 2 pages

Your ideas / analysis


Comparison of the methods used in the paper with other material presented in the class or any other related work

Any ideas on improvement of the approach

....

Attendance policy


You can skip ONE class without any explanation

Otherwise, you will need to write an additional critical review (for the paper which was presented while you were absent)

Office Hours


I would be happy to see you and discuss after the talk, from 16:00 to 17:00 on Fridays (may change if the seminar timing changes): Office 3.22, C 7.4

Otherwise, send me an email and I will find a time

Other stuff


Timing of the class

Survey (Doodle poll?)

Select a paper to present and papers to review by the next class (we will use Google docs)
