Unsupervised and Weakly-Supervised Probabilistic Modeling of Text
Ivan Titov
Outline
Introduction to the Topic
Seminar Plan
Requirements and Grading
What do we want to do with text?
One of the ultimate goals of natural language processing is to teach a computer to understand text.
Text understanding in an open domain is a very complex problem which cannot possibly be solved using a set of hand-crafted rules.
Instead, essentially all modern approaches to natural language processing use statistical techniques.
Examples of Ambiguities
… Nissan car and truck plant is located in …
… divide life into plant and animal kingdom …
… (Article This) (Noun can) (Modal will) (Verb rust) …
The dog bit the kid. He was taken to a (veterinarian | hospital).
Tiger was in Washington for the PGA tour
NLP Tasks
“Full” language understanding is beyond the state of the art and cannot be approached as a single task. Instead, we consider:
Practical applications: relation extraction, question answering, text summarization, translation, …
Prediction of linguistic representations: syntactic parsing, shallow semantic parsing (semantic role labeling), discourse parsing, …
Supervised Statistical Methods
Annotate texts with (structured) labels and learn a model from this data
Supervised Statistical Methods
More formally: X – text, Y – label (e.g., syntactic structure).
Construct a parameterized model P(Y | X, W).
Estimate W on a collection $\{(X_i, Y_i)\}_{i=1 \ldots N}$ with maximum likelihood estimation:
$$\hat{W} = \arg\max_W \prod_{i=1}^{N} P(Y_i \mid X_i, W)$$
Predict a label for a new example X:
$$\hat{Y} = \arg\max_Y P(Y \mid X, \hat{W})$$
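The two steps (estimate W, then predict with the argmax) can be made concrete with a minimal Python sketch under strong simplifying assumptions: X is a single word, Y is its part-of-speech tag (as in the "This can will rust" example), and W is just a table of conditional probabilities P(y | x). For this particular model, maximizing the product of P(Y_i | X_i, W) has a closed-form solution: relative frequencies. The function names `fit_mle` and `predict` and the toy data are hypothetical; a real tagger would also condition on context.

```python
from collections import Counter, defaultdict

def fit_mle(pairs):
    """Maximum likelihood estimate of a conditional categorical model P(Y | X, W).

    W is a table of probabilities P(y | x); for this model the MLE is the
    relative frequency of each label y among the training pairs with input x.
    """
    counts = defaultdict(Counter)
    for x, y in pairs:
        counts[x][y] += 1
    model = {}
    for x, label_counts in counts.items():
        total = sum(label_counts.values())
        model[x] = {y: c / total for y, c in label_counts.items()}
    return model

def predict(model, x):
    """Return argmax_Y P(Y | X = x, W) for an input seen in training."""
    dist = model[x]
    return max(dist, key=dist.get)

# Hypothetical toy data: (word, part-of-speech tag) pairs.
train = [("can", "Modal"), ("can", "Modal"), ("can", "Noun"),
         ("rust", "Verb"), ("rust", "Noun"), ("rust", "Verb")]
model = fit_mle(train)
print(predict(model, "can"))  # Modal
```

Because the model ignores context, "can" is always tagged Modal here, which is exactly the kind of error that motivates richer structured models.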
Supervised Statistical Models
Most tasks in NLP are complex, and therefore large amounts of data are needed.
E.g., the standard Penn Treebank Wall Street Journal dataset contains around 40,000 sentences (about 1 mln words).
Annotation is not just YES or NO, but usually complex graphs.
Domain variability: models are brittle when applied out of domain.
A question answering model learned on biological data will work poorly on news data.
Many languages: we need data for every language, every domain, every task?
Not feasible for many tasks and very expensive for others.
Unsupervised and Weakly-Supervised Models
Virtually unlimited amount of unlabeled text (e.g., on the Web)
Unsupervised Models
Do not use any kind of labeled data.
Model jointly P(H, X | W), where H represents the latent structure of interest for the task in question (latent semantic topics, syntactic relations, etc.).
Estimation on an unlabeled dataset $\{X_i\}_{i=1 \ldots N}$ with maximum likelihood estimation, summing over the variable you do not observe:
$$\hat{W} = \arg\max_W \prod_{i=1}^{N} \sum_{H_i} P(H_i, X_i \mid W)$$
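One standard way to maximize this marginal likelihood is the EM algorithm, which alternates between computing the posterior over the unobserved H_i (E-step) and re-estimating W from expected counts (M-step). The sketch below assumes the simplest latent-topic model, a mixture of unigrams (each document X_i has a single latent topic H_i; this is simpler than PLSA or LDA), with W consisting of topic proportions `pi` and per-topic word distributions `phi`. All names and the toy documents are hypothetical. It records the marginal log-likelihood at every iteration, which EM guarantees never decreases.

```python
import math
import random
from collections import Counter

def safe_log(p):
    return math.log(p) if p > 0 else float("-inf")

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def em_mixture_of_unigrams(docs, k, iters=20, seed=0):
    """EM for a mixture of unigrams: document X_i has one latent topic H_i.

    W = (pi, phi): topic proportions pi[t] and word distributions phi[t][w].
    Also returns the marginal log-likelihood sum_i log sum_H P(H, X_i | W),
    recorded before each M-step.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]
    pi = [1.0 / k] * k
    phi = []
    for _ in range(k):  # random positive initialization breaks symmetry
        raw = [rng.random() + 0.1 for _ in vocab]
        z = sum(raw)
        phi.append({w: r / z for w, r in zip(vocab, raw)})

    trace = []
    for _ in range(iters):
        # E-step: posterior P(H_i = t | X_i, W) from log P(H_i = t, X_i | W)
        resp, loglik = [], 0.0
        for c in counts:
            joint = [safe_log(pi[t]) + sum(n * safe_log(phi[t][w]) for w, n in c.items())
                     for t in range(k)]
            marginal = log_sum_exp(joint)  # log of the sum over unobserved H_i
            loglik += marginal
            resp.append([math.exp(j - marginal) for j in joint])
        trace.append(loglik)

        # M-step: maximum likelihood re-estimation from expected counts
        pi = [sum(r[t] for r in resp) / len(docs) for t in range(k)]
        phi = []
        for t in range(k):
            expected = Counter()
            for r, c in zip(resp, counts):
                for w, n in c.items():
                    expected[w] += r[t] * n
            total = sum(expected.values())
            phi.append({w: (expected[w] / total if total > 0 else 1.0 / len(vocab))
                        for w in vocab})
    return pi, phi, trace

# Hypothetical unlabeled "documents" drawn from two unnamed themes.
docs = [["hotel", "room", "bath"], ["room", "bath", "bed"],
        ["tube", "bus", "street"], ["bus", "street", "tube"]]
pi, phi, trace = em_mixture_of_unigrams(docs, k=2)
print(trace[0], trace[-1])
```

EM only finds a local maximum, so in practice one would run it from several random seeds and keep the solution with the highest final log-likelihood.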
Example: Unsupervised Topic Segmentation
[The hotel is located on Maguire street, one block from the river. Public transport in London is straightforward, the tube station is about an 8 minute walk or you can get a bus for £ 1.50. ] [We had a stunning view (from the floor to ceiling window) of the Tower and the Thames.] [One thing we really enjoyed about this place – our huge bath tub with jacuzzi, this is so different from usually small European hotels. Rooms are nicely decorated and very light.] ...
Segments: [Location] [View] [Rooms]
Useful for:
Summarization (summarize multiple reviews along key aspects)
Sentiment prediction (predict star ratings for each aspect)
Visualization
...
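The slide does not say which model produces this segmentation. As one illustrative possibility (an assumption, not necessarily the method behind the example), the sketch below treats each sentence's aspect as the hidden state of a "sticky" hidden Markov model and decodes the most probable aspect sequence with the Viterbi algorithm. The topic-word probabilities, `switch_prob`, and `unk_prob` are hypothetical; in a truly unsupervised setting they would themselves be estimated from unlabeled reviews (e.g., with EM). `unk_prob` is a crude smoothing shortcut, so emission scores are not exactly normalized.

```python
import math

def segment(sentences, topic_word_probs, switch_prob=0.1, unk_prob=1e-4):
    """Viterbi decoding of one aspect (topic) label per sentence.

    sentences: list of token lists.
    topic_word_probs: one dict per topic, word -> P(word | topic); assumed given.
    switch_prob: probability of changing topic between adjacent sentences.
    unk_prob: floor probability for words missing from a topic's dict.
    """
    k = len(topic_word_probs)
    stay = math.log(1.0 - switch_prob)
    move = math.log(switch_prob / (k - 1)) if k > 1 else float("-inf")

    def emit(t, sent):
        # log P(sentence | topic t), treating the sentence as a bag of words
        return sum(math.log(topic_word_probs[t].get(w, unk_prob)) for w in sent)

    # delta[t]: best log-score of any topic sequence ending in topic t
    delta = [math.log(1.0 / k) + emit(t, sentences[0]) for t in range(k)]
    backpointers = []
    for sent in sentences[1:]:
        new_delta, pointers = [], []
        for t in range(k):
            scores = [delta[s] + (stay if s == t else move) for s in range(k)]
            best = max(range(k), key=scores.__getitem__)
            new_delta.append(scores[best] + emit(t, sent))
            pointers.append(best)
        delta = new_delta
        backpointers.append(pointers)

    # Trace the best path backwards from the best final topic.
    t = max(range(k), key=delta.__getitem__)
    path = [t]
    for pointers in reversed(backpointers):
        t = pointers[t]
        path.append(t)
    return path[::-1]

# Hypothetical topic-word distributions for a hotel review.
topics = [
    {"street": 0.3, "tube": 0.3, "bus": 0.3},  # 0: Location
    {"view": 0.5, "thames": 0.4},              # 1: View
    {"bath": 0.4, "rooms": 0.5},               # 2: Rooms
]
review = [["street", "river"], ["tube", "bus"], ["view", "thames"], ["bath", "rooms"]]
print(segment(review, topics))  # [0, 0, 1, 2]
```

A small `switch_prob` encodes the assumption that consecutive sentences tend to discuss the same aspect, which is what turns per-sentence topic assignment into segmentation.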
Semi-Supervised Learning
Small amount of labeled data: $\{(X_i, Y_i)\}_{i=1 \ldots N_L}$
Large amount of unlabeled data: $\{X_i\}_{i=N_L+1 \ldots N_L+N_U}$
Define a joint model P(X, Y | W).
The model is estimated on both datasets with maximum likelihood estimation, summing over the unobserved variable on the unlabeled dataset:
$$\hat{W} = \arg\max_W \prod_{i=1}^{N_L} P(Y_i, X_i \mid W) \prod_{i=N_L+1}^{N_L+N_U} \sum_{Y_i} P(Y_i, X_i \mid W)$$
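The semi-supervised objective can be evaluated with a short Python sketch. It assumes (this is not specified on the slide) a naive Bayes joint model P(Y, X | W) = P(Y) ∏_w P(w | Y), where X is a bag of words and Y a document label; the parameter values and datasets are hypothetical. Labeled examples contribute their joint probability, while unlabeled examples contribute the joint probability summed over Y. Maximizing this quantity over W (for example with EM) is the semi-supervised estimation step; the sketch only computes the objective for fixed W.

```python
import math

def log_joint(y, doc, prior, word_probs):
    """log P(Y = y, X = doc | W) under a naive Bayes model."""
    return math.log(prior[y]) + sum(math.log(word_probs[y][w]) for w in doc)

def semi_supervised_log_likelihood(labeled, unlabeled, prior, word_probs):
    """Log of the semi-supervised objective for fixed parameters W.

    labeled: list of (doc, label) pairs, i.e. (X_i, Y_i) for i = 1..N_L.
    unlabeled: list of docs X_i for i = N_L+1..N_L+N_U.
    """
    # Labeled part: sum_i log P(Y_i, X_i | W)
    total = sum(log_joint(y, doc, prior, word_probs) for doc, y in labeled)
    # Unlabeled part: sum_i log sum_Y P(Y, X_i | W), via a stable log-sum-exp
    for doc in unlabeled:
        joints = [log_joint(y, doc, prior, word_probs) for y in prior]
        m = max(joints)
        total += m + math.log(sum(math.exp(j - m) for j in joints))
    return total

# Hypothetical parameters W and tiny datasets.
prior = {"pos": 0.5, "neg": 0.5}
word_probs = {"pos": {"good": 0.8, "bad": 0.2}, "neg": {"good": 0.3, "bad": 0.7}}
labeled = [(["good"], "pos")]
unlabeled = [["bad"]]
# equals log(0.5 * 0.8) + log(0.5 * 0.2 + 0.5 * 0.7)
print(semi_supervised_log_likelihood(labeled, unlabeled, prior, word_probs))
```

With an empty unlabeled list the function reduces to the supervised joint log-likelihood, which makes explicit how the unlabeled data only adds the marginalized term.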
Weakly-Supervised Learning (Web)
Texts are not just isolated sequences of sentences. We always have additional information:
User-generated annotation
Can we learn to summarize, segment, and understand text using this information?
Weakly-Supervised Learning (Web)
Texts are not just isolated sequences of sentences. We always have additional annotation:
Temporal relations between documents
Can we learn to translate, or port semantic models from one language to another?
Weakly-Supervised Learning (Web)
Texts are not just isolated sequences of sentences. We always have additional annotation:
User-generated annotation
Temporal relations between documents
Links between documents
Clusters of similar documents
...
How useful is it? Can we project annotated resources from language to language? Can we improve unsupervised / supervised models?
A hot topic in NLP recently
Why will we consider probabilistic models?
In this class we will focus on (Bayesian) probabilistic models.
Why?
They provide a concise way to define the model and approximation assumptions.
They are like LEGO blocks: we can combine different models as building blocks to learn a new model for the task.
Prior knowledge can be integrated into them in a simple and consistent way.
Missing data can be easily accounted for (just sum over the corresponding variable). We saw an example in semi-supervised learning.
Goals of the seminar
Understand the methodology:
Classes of models considered in NLP
Approximation techniques for learning and inference (exact inference will not be tractable for most of the considered problems)
Learn interesting applications of the methods in NLP
See that sometimes we can substitute expensive annotation with a surrogate signal and obtain good results
Plan
Next class (April 23): Introduction:
Topic models (PLSA, LDA)
Basic learning / inference techniques: EM and Gibbs sampling
Decide on the paper to present. On the basis of the survey and the number of registered students, I will adjust my list, and it will be online on Wednesday.
Starting from April 30: paper presentations by you
Topics
Modelling semantic topics of data collections:
Topic segmentation models (including modelling the order of topics)
Topic hierarchies
Integrating syntax: modeling syntax and topics
Shallow models of semantics
Grounded language acquisition
Joint modelling of multiple languages
Modelling multiple modes: gestures and discourse
Learning feature representations from text
Requirements
Present a paper to the class. We will see how long the presentations should be depending on the number of students.
Write 3 critical “reviews” of 3 selected papers (1.5 - 2 pages each).
A term paper (12-15 pages) for those getting 7 points.
Make sure you are registered for the right “version” in HISPOS!
Read papers and participate in discussion.
Grades
Class participation grade: 60 %
Your talk and the discussion after your talk
Your participation in discussion of other talks
3 reviews (5 % each)
Term paper grade: 40 %
Term paper (only if you get 7 points; otherwise you do not need one)
Presentation
Present a paper in an accessible way.
Have a critical view of the paper: discuss shortcomings, possible future work, etc.
To give a good presentation, in most cases you may need to read one or two additional papers (e.g., those referenced in the paper).
Links to tutorials on how to make a good presentation will be available on the class web page.
Send me your slides 4 days before the talk by 6 pm. If we keep the class on Friday, this means the deadline is Monday, 6 pm. I will give you feedback within 2 days of receiving them.
(The first 2 presenters can send me slides 2 days before if they prefer.)
Term paper
Goal: describe the paper you presented in class, plus your ideas, analysis, and comparison (more later).
It should be written in the style of a research paper; the only difference is that most of the work you present is not your own.
Length: 12 – 15 pages
Grading criteria: clarity, paper organization, technical correctness, new ideas that are meaningful and interesting
Submitted in PDF to my email
Critical review
A short critical (!) essay reviewing one of the papers presented in class:
One or two paragraphs presenting the essence of the paper
Other parts highlighting both the positive sides of the paper (what you like) and its shortcomings
The review should be submitted before the paper is presented in class (the exception is the additional reviews submitted for seminars you skipped; more about this later).
No copy-paste from the paper
Length: 1.5 – 2 pages
Your ideas / analysis
Comparison of the methods used in the paper with other material presented in class or any other related work
Any ideas on improving the approach
...
Attendance policy
You can skip ONE class without any explanation. Otherwise, you will need to write an additional critical review (for the paper that was presented while you were absent).
Office Hours
I would be happy to see you and discuss after the talk, from 16:00 – 17:00 on Fridays (may change if the seminar timing changes): Office 3.22, C 7.4
Otherwise, send me an email and I will find a time.
Other stuff
Timing of the class: survey (Doodle poll?)
Select a paper to present and papers to review by the next class (we will use Google Docs)