in4080 natural language processing · can use various learners, e.g. logistic regression (maxent)...

IN4080 – 2019 FALL NATURAL LANGUAGE PROCESSING

Jan Tore Lønning


Lecture 8, 8 Oct

Information extraction, pipelines


Today

Sentence structure:

Constituents and phrases

Treebanks

Information extraction, IE

Chunking

Named entity recognition

Relation extraction, 5 different ways

Pipelines


Sentences have inner structure

So far:

Sentence: a sequence of words

Properties of words: morphology, tags, embeddings

Probabilities of sequences

Flat

But:

Sentences have inner structure

The structure determines whether the sentence is grammatical or not

The structure determines how to understand the sentence

Why syntax?

Some sequences of words are well-formed, meaningful sentences.

Others are not:

Are meaningful of some sentences sequences well-formed words

It makes a difference:

A dog bit the man.

The man bit a dog.

BOW-models don't capture this difference
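A minimal sketch of why bag-of-words models miss the difference. The helper name `bag_of_words` and the naive whitespace tokenization are our own assumptions, used only for illustration:

```python
from collections import Counter

def bag_of_words(sentence):
    """Lower-case, strip the final full stop, split on whitespace, count words."""
    tokens = sentence.lower().rstrip(".").split()
    return Counter(tokens)

a = bag_of_words("A dog bit the man.")
b = bag_of_words("The man bit a dog.")

# Same words, same counts: the two bags are identical,
# although the sentences describe different events.
print(a == b)
```

Any model that only sees these counts has to treat the two sentences alike.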


Two ways to describe sentence structure

Phrase structure: focus of INF2820

Dependency structure: focus of IN2110


Constituent: A group of words which functions as a unit in the sentence

See Wikipedia: Constituent for criteria of constituency

Phrase: A sequence of words which "belong together"

= constituent (for us)

In some theories a phrase is a constituent of more than one word

NP: Mary | The small, cute dog | The dog from Baskerville | You

VP = V + NP

V: ate | saw | enjoyed

NP: the apple | the small, cute dog | the apple that Kim had stolen from the store | it

https://en.wikipedia.org/wiki/Constituent_(linguistics)

Phrases

Phrases can be classified into categories:

Noun Phrases, Verb Phrases, Prepositional Phrases, etc.

Phrases of the same category have similar distribution,

e.g. NPs can replace names

(but there are restrictions on case, number, person, gender agreement, etc.)

Phrases of the same category have similar structure, simplified:

NP (roughly): (DET) ADJ* N PP* (+ some alternatives, e.g. pronoun)

PP: PREP NP
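The simplified NP and PP patterns above are mutually recursive (an NP may contain PPs, and a PP contains an NP). A minimal sketch of a recognizer over a list of coarse tags follows. The tag names (`DET`, `ADJ`, `N`, `PREP`, `PRON`) and the function names are assumptions made for this illustration, and agreement restrictions are ignored:

```python
def match_np(tags, i=0):
    """Greedily match NP -> PRON | (DET) ADJ* N PP* starting at tags[i].

    Returns the index just past the NP, or None if no NP starts at i.
    """
    if i < len(tags) and tags[i] == "PRON":
        return i + 1
    j = i
    if j < len(tags) and tags[j] == "DET":      # optional determiner
        j += 1
    while j < len(tags) and tags[j] == "ADJ":   # any number of adjectives
        j += 1
    if j >= len(tags) or tags[j] != "N":        # the head noun is obligatory
        return None
    j += 1
    while True:                                 # any number of PPs
        k = match_pp(tags, j)
        if k is None:
            return j
        j = k

def match_pp(tags, i):
    """Match PP -> PREP NP starting at tags[i]; return end index or None."""
    if i < len(tags) and tags[i] == "PREP":
        return match_np(tags, i + 1)
    return None

# "the dog from Baskerville" as DET N PREP N
print(match_np(["DET", "N", "PREP", "N"]))
```

The recognizer only says where an NP ends; it does not build a tree.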


Phrase structure

A sentence is hierarchically ordered into phrases

Various syntactic theories, models and NLP tools differ with respect to the actual trees:

Models based on X-bar theory prefer "deep trees": binary branching

The Penn Treebank prefers shallow trees

A Penn treebank tree


Treebanks

A collection of analyzed sentences/trees

The Penn Treebank is the best known

Treebanks

Treebanks are corpora in which each sentence has been paired with a parse tree (presumably the right one).

These are generally created

By first parsing the collection with an automatic parser

And then having human annotators correct each parse as necessary.

This requires detailed annotation guidelines that provide a POS tagset, a grammar and instructions for how to deal with particular grammatical constructions.

Treebank Grammars

Treebanks implicitly define a grammar for the language covered in the treebank.

Such grammars tend to be very flat because they tend to avoid recursion, which eases the annotators' burden.

For example, the Penn Treebank has 4500 different rules for VPs. Among them...

Speech and Language Processing - Jurafsky and Martin


Different types of treebanks

Hand-made

Human annotators assign trees.

The trees define a grammar:

Many rules

Penn uses flat trees

Parse bank

Start with a grammar

And a parser

Parse the sentences

A human annotator selects the best analysis among the candidates

May be used for training a parse ranker

October 7, 2019


Treebanks

Free dependency treebanks are available for many languages

The place to start these days: http://universaldependencies.org/

CoNLL formats:

One word per line, a number of columns for various information

CoNLL-X, CoNLL-U: different POS tags
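A minimal sketch of reading the one-word-per-line format, assuming the ten tab-separated CoNLL-U columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The sample annotation is hand-made for illustration and not taken from a treebank; multiword tokens and empty nodes are ignored:

```python
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]

# Hand-made example, not from a real treebank.
SAMPLE = "\n".join([
    "# text = A dog bit the man.",
    "1\tA\ta\tDET\tDT\t_\t2\tdet\t_\t_",
    "2\tdog\tdog\tNOUN\tNN\t_\t3\tnsubj\t_\t_",
    "3\tbit\tbite\tVERB\tVBD\t_\t0\troot\t_\t_",
    "4\tthe\tthe\tDET\tDT\t_\t5\tdet\t_\t_",
    "5\tman\tman\tNOUN\tNN\t_\t3\tobj\t_\t_",
    "6\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_",
])

def parse_conllu_sentence(block):
    """Turn one sentence block into a list of dicts, one per token line."""
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):   # skip blanks and comments
            continue
        tokens.append(dict(zip(CONLLU_FIELDS, line.split("\t"))))
    return tokens

tokens = parse_conllu_sentence(SAMPLE)
# HEAD "0" marks the root of the dependency tree.
print([t["form"] for t in tokens if t["head"] == "0"])
```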

From Andrei's INF5830 slides


IE basics

Bottom-Up approach

Start with unrestricted texts, and do the best you can

The approach was in particular developed by the Message Understanding Conferences (MUC) in the 1990s

Select a particular domain and task


Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. (Wikipedia)

Steps

(Some approaches do these steps in a different order, or simultaneously.)

From NLTK

Some example systems

Stanford CoreNLP: http://corenlp.run/

spaCy (Python): https://spacy.io/docs/api/

OpenNLP (Java): https://opennlp.apache.org/docs/

GATE (Java): https://gate.ac.uk/

UDPipe: http://ufal.mff.cuni.cz/udpipe

Online demo: http://lindat.mff.cuni.cz/services/udpipe/



Next steps

Chunk words together into phrases

NP-chunks

Exactly what is an NP-chunk?

It is an NP

But not all NPs are chunks

Flat structure: no NP-chunk is part of another NP chunk

Maximally large

Opposing restrictions


[ The/DT market/NN ] for/IN

[ system-management/NN software/NN ] for/IN

[ Digital/NNP ]

[ 's/POS hardware/NN ] is/VBZ fragmented/JJ enough/RB that/IN

[ a/DT giant/NN ] such/JJ as/IN

[ Computer/NNP Associates/NNPS ] should/MD do/VB well/RB there/RB ./.

Regular Expression Chunker

Input POS-tagged sentences

Use a regular expression over POS to identify NP-chunks

NLTK example:

It inserts parentheses

grammar = r"""
  NP: {<DT|PP\$>?<JJ>*<NN>}   # optional determiner/possessive, adjectives, noun
      {<NNP>+}                # sequences of proper nouns
"""

http://www.nltk.org/book/ch07.html
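To show the idea without NLTK, here is a minimal sketch of a regular-expression chunker in plain Python. It is not NLTK's implementation: the tags are written as one string of `<TAG>` items and both rules are merged into a single alternation, whereas NLTK applies its rules one after the other. The function name and the example sentence are our own choices:

```python
import re

# (DT or PP$)? JJ* NN   or   NNP+
NP_PATTERN = re.compile(r"(?:<DT>|<PP\$>)?(?:<JJ>)*<NN>|(?:<NNP>)+")

def regex_chunk(tagged):
    """Return the NP chunks (lists of words) in a POS-tagged sentence."""
    tag_string = "".join("<%s>" % tag for _, tag in tagged)
    chunks = []
    for m in NP_PATTERN.finditer(tag_string):
        # Each token contributes exactly one "<", so counting "<" maps
        # character offsets back to token positions.
        start = tag_string[:m.start()].count("<")
        length = m.group().count("<")
        chunks.append([word for word, _ in tagged[start:start + length]])
    return chunks

sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(regex_chunk(sentence))
```

Returning the chunks as lists leaves the text untouched, unlike the bracketed notation.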

IOB-tags

B-NP: First word in NP

I-NP: Part of NP, not first word

O: Not part of NP (phrase)

Properties

One tag per token

Unambiguous

Does not insert anything in the text itself
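A minimal sketch of the encoding, assuming chunks are given as (start, end) token spans with the end exclusive. The function name `to_iob` is ours:

```python
def to_iob(n_tokens, chunks, label="NP"):
    """Encode non-overlapping (start, end) chunk spans as one IOB tag per token."""
    tags = ["O"] * n_tokens
    for start, end in chunks:
        tags[start] = "B-" + label          # first word of the chunk
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # remaining words of the chunk
    return tags

tokens = ["The", "market", "for", "system-management", "software",
          "for", "Digital", "'s", "hardware"]
# [The market] for [system-management software] for [Digital] ['s hardware]
chunks = [(0, 2), (3, 5), (6, 7), (7, 9)]
for token, tag in zip(tokens, to_iob(len(tokens), chunks)):
    print(token, tag)
```

The adjacent chunks [Digital] ['s hardware] show why B is needed: without it, two neighbouring chunks could not be told apart from one long chunk.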


Assigning IOB-tags

The process can be considered a form of tagging

POS-tagging: Word to POS-tag

IOB-tagging: POS-tag to IOB-tag

But one may also use additional features, e.g. the words themselves

Can use various types of classifiers

NLTK uses a MaxEnt Classifier (=LogReg, but the implementation is slow)

We can modify along the lines of mandatory assignment 2, using scikit-learn


J&M, 3. ed.

Evaluating (IOB-)chunkers

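import nltk
from nltk.corpus import conll2000

cp = nltk.RegexpParser("")  # empty grammar: finds no chunks at all
test_sents = conll2000.chunked_sents('test.txt', chunk_types=['NP'])
print(cp.evaluate(test_sents))

IOB Accuracy: 43.4%
Precision: 0.0%
Recall: 0.0%
F-Measure: 0.0%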

What do we evaluate?

IOB-tags? or

Whole chunks?

Yields different results

For IOB-tags:

Baseline: majority class O, yields > 33%

Whole chunks:

Which chunks did we find?

Harder

Lower numbers
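A minimal sketch of the two ways of scoring, for NP chunks only. The function names are ours, and reporting precision as 0 when no chunks are predicted is a convention chosen here:

```python
def chunks_from_iob(tags):
    """Return the set of (start, end) NP spans (end exclusive) in an IOB sequence."""
    spans, start = set(), None
    for i, tag in enumerate(list(tags) + ["O"]):    # sentinel closes a final chunk
        if tag != "I-NP" and start is not None:
            spans.add((start, i))
            start = None
        if tag == "B-NP" or (tag == "I-NP" and start is None):
            start = i
    return spans

def evaluate(gold, pred):
    """Return (IOB accuracy, chunk precision, chunk recall, chunk F-measure)."""
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    gold_chunks, pred_chunks = chunks_from_iob(gold), chunks_from_iob(pred)
    correct = len(gold_chunks & pred_chunks)        # whole chunks must match exactly
    precision = correct / len(pred_chunks) if pred_chunks else 0.0
    recall = correct / len(gold_chunks) if gold_chunks else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return accuracy, precision, recall, f_measure

gold = ["B-NP", "I-NP", "O", "B-NP", "I-NP"]
print(evaluate(gold, ["O"] * 5))                          # tag-level credit only
print(evaluate(gold, ["B-NP", "I-NP", "O", "B-NP", "O"])) # one chunk cut short
```

The all-O prediction still gets some tag accuracy but no chunk credit, and a chunk that is one token short counts as wrong at the chunk level.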

Evaluating (IOB-)chunkers, contd.

Empty grammar (baseline): IOB accuracy 43.4%, precision, recall and F-measure 0.0%

A simple improvement: chunk any run of tags beginning with C, D, J, N or P

cp = nltk.RegexpParser(r"NP: {<[CDJNP].*>+}")
print(cp.evaluate(test_sents))

IOB Accuracy: 87.7%
Precision: 70.6%
Recall: 67.8%
F-Measure: 69.2%


Named entities

Named entity:

Anything you can refer to by a proper name

i.e. not all NP (chunks):

high fuel prices

May be a longer NP than just a chunk:

Bank of America

Find the phrases

Classify them

Citing high fuel prices, [ORG United Airlines] said [TIME Friday] it has increased fares by [MONEY $6] per round trip on flights to some cities also served by lower-cost carriers. [ORG American Airlines], a unit of [ORG AMR Corp.], immediately matched the move, spokesman [PER Tim Wagner] said. [ORG United], a unit of [ORG UAL Corp.], said the increase took effect [TIME Thursday] and applies to most routes where it competes against discount carriers, such as [LOC Chicago] to [LOC Dallas] and [LOC Denver] to [LOC San Francisco].

Types of NE

The set of types varies between systems

Which classes are useful depends on the application

Ambiguities

Gazetteer

Useful: lists of names, e.g.

Gazetteer: list of geographical names

But does not remove all ambiguities, cf. example

Representation (IOB)

Feature-based NER

Similar to tagging and chunking

You will need features from several layers

Features may include

Words, POS-tags, chunk-tags, graphical properties

and more (see J&M, 3. ed.)
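A minimal sketch of a feature extractor combining these layers. The feature names, the `word_shape` helper (upper case to X, lower case to x, digits to d) and the tiny gazetteer are assumptions for illustration, not the feature set of any particular system:

```python
def word_shape(word):
    """Abstract a word to its shape: upper case -> X, lower case -> x, digit -> d."""
    shape = []
    for ch in word:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)              # keep punctuation and symbols as they are
    return "".join(shape)

def ner_features(words, pos_tags, chunk_tags, i, gazetteer=frozenset()):
    """Features for token i, drawn from the word, POS, chunk and spelling layers."""
    word = words[i]
    return {
        "word": word.lower(),
        "pos": pos_tags[i],
        "chunk": chunk_tags[i],
        "shape": word_shape(word),
        "is_capitalized": word[0].isupper(),
        "in_gazetteer": word in gazetteer,
        "prev_word": words[i - 1].lower() if i > 0 else "<s>",
        "next_word": words[i + 1].lower() if i + 1 < len(words) else "</s>",
    }

words = ["flights", "to", "Chicago"]
pos = ["NNS", "TO", "NNP"]
chunk = ["B-NP", "O", "B-NP"]
print(ner_features(words, pos, chunk, 2, gazetteer={"Chicago", "Dallas"}))
```

A dictionary like this can be passed to a vectorizer and a classifier in the same way as for tagging and chunking.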


Feature-based NER algorithms

Greedy decoding

"Word-by-word": decide for the first word, then for the second word, etc.

Can use various learners, e.g. Logistic regression (MaxEnt)

We can use our set-up for mandatory 2 with smaller adjustments

For shortcomings and better alternatives, cf. J&M, 3. ed., ch. 8:

Maximum Entropy Markov Models (MEMM)

Conditional random fields (preferred approach until recently)
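A minimal sketch of greedy decoding. The `classify` argument stands in for a trained classifier such as logistic regression; `toy_classify` is a hand-written, hypothetical stand-in used only to make the example runnable:

```python
def greedy_decode(tokens, classify):
    """Label tokens left to right; each decision is final and feeds the next one."""
    labels = []
    for token in tokens:
        prev_label = labels[-1] if labels else "<START>"
        labels.append(classify(token, prev_label))
    return labels

def toy_classify(token, prev_label):
    """Hypothetical rule: capitalized words are ORG; continue an open ORG with I-."""
    if token[0].isupper():
        return "I-ORG" if prev_label in ("B-ORG", "I-ORG") else "B-ORG"
    return "O"

print(greedy_decode(["a", "unit", "of", "AMR", "Corp.", "said"], toy_classify))
```

Since a decision is never revised, an early mistake is passed on to the following words, which is one of the shortcomings the alternatives above address.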

Neural NER

In recent years, neural architectures have shown the best results

J&M, 3. ed., ch. 17, sec. 17.1.3, not curriculum in IN4080

IN5550


Evaluation

Have we found the correct NEs?

Evaluate precision and recall as for chunking

For the correctly identified NEs, have we labelled them correctly?


Goal

Extract the relations that exist between the (named) entities in the text

A fixed set of relations (normally)

Determined by application:

Jeopardy

Preventing terrorist attacks

Detecting illness from medical records

…


• Born_in

• Date_of_birth

• Parent_of

• Author_of

• Winner_of

• Part_of

• Located_in

• Acquire

• Threaten

• Has_symptom

• Has_illness

Examples

Methods for relation extraction

1. Hand-written patterns

2. Machine Learning (Supervised classifiers)

3. Semi-supervised classifiers via bootstrapping

4. Semi-supervised classifiers via distant supervision

5. Unsupervised


Example: acquisitions

[ORG] … (buy(s)|bought|acquire(s|d)) … [ORG]

Hand-write patterns like this

Properties:

High precision

Will only cover a small set of patterns

Low recall

Time consuming

(Also in NLTK, sec 7.6)
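A minimal sketch of the acquisition pattern as a Python regular expression. It assumes the text has already been through NER and that organizations are marked inline as `[ORG name]`, as in the named-entity example earlier; the function name is ours:

```python
import re

# [ORG] ... (buy(s)|bought|acquire(s|d)) ... [ORG]
ACQUIRE = re.compile(
    r"\[ORG ([^\]]+)\]"                           # first organization
    r".*?\b(?:buys?|bought|acquire[sd]?)\b.*?"    # trigger word, as a whole word
    r"\[ORG ([^\]]+)\]"                           # second organization
)

def extract_acquisitions(text):
    """Return (buyer, bought) pairs matched by the hand-written pattern."""
    return [(m.group(1), m.group(2)) for m in ACQUIRE.finditer(text)]

print(extract_acquisitions(
    "[ORG IBM] has acquired computing services provider [ORG AlchemyAPI]."))
print(extract_acquisitions(
    "[ORG IBM] is buying machine-learning systems maker [ORG AlchemyAPI]."))
```

The second sentence is missed because "buying" is not among the listed forms, which illustrates the low recall of hand-written patterns.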


Example

2. Supervised classifiers

A corpus

A fixed set of entities and relations

The sentences in the corpus are hand-annotated:

Entities

Relations between them

Split the corpus into parts for training and testing

Train a classifier:

Choose learner: Naive Bayes, Logistic regression (Max Ent), SVM, …

Select features

2. Supervised classifiers, contd.

Training:

Use pairs of entities within the same sentence with no relation between them as negative data

Classification

1. Find the NEs

2. For each pair of NEs, determine whether there is a relation between them

3. If there is, label the relation
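A minimal sketch of how training instances could be built from one annotated sentence: every ordered pair of entities becomes an instance, and pairs without an annotated relation become negative data. The function name, the `NO_RELATION` label and the choice of `Part_of` for the example are assumptions for illustration:

```python
from itertools import permutations

def make_instances(entities, gold_relations, negative_label="NO_RELATION"):
    """Pair up the entities of one sentence and label each ordered pair."""
    instances = []
    for e1, e2 in permutations(entities, 2):      # relations are directed
        label = gold_relations.get((e1, e2), negative_label)
        instances.append(((e1, e2), label))
    return instances

entities = ["American Airlines", "AMR", "Tim Wagner"]
gold = {("American Airlines", "AMR"): "Part_of"}

for pair, label in make_instances(entities, gold):
    print(pair, label)
```

With three entities there are six ordered pairs, so negatives quickly outnumber positives.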

Examples of features

American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said

Properties

The bottleneck is the availability of training data

Hand-labelling data is time-consuming

Mostly applied to restricted domains

Does not generalize well to other domains






3. Semi-supervised, bootstrapping

If we know a pattern for a relation, we can determine whether a pair stands in the relation

Conversely: If we know that a pair stands in a relationship, we can find patterns that describe the relation


Pairs:

IBM – AlchemyAPI

Google – YouTube

Facebook – WhatsApp

Patterns:

[ORG]…bought…[ORG]

Relation

ACQUIRE

Example

(IBM, AlchemyAPI): ACQUIRE

Search for sentences containing IBM and AlchemyAPI

Results (web search, Google, among the first 10 results):

IBM's Watson makes intelligent acquisition of Denver-based AlchemyAPI (Denver Post)

IBM is buying machine-learning systems maker AlchemyAPI Inc. to bolster its Watson technology as competition heats up in the data analytics and artificial intelligence fields. (Bloomberg)

IBM has acquired computing services provider AlchemyAPI to broaden its portfolio of Watson-branded cognitive computing services. (ComputerWorld)

Example contd.

Extract patterns


Procedure

From the extracted sentences, we extract patterns

Use these patterns to extract more pairs of entities that stand in these patterns

These pairs may again be used for extracting more patterns, etc.

…makes intelligent acquisition …

… is buying …

… has acquired …
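The procedure can be sketched as a loop that alternates between collecting patterns and collecting pairs. To keep it short, this sketch assumes the corpus has already been reduced to (entity, text between, entity) triples and uses the text between the entities as the pattern. The toy corpus is made up for illustration:

```python
def bootstrap(corpus, seed_pairs, max_rounds=10):
    """Grow a set of entity pairs and a set of patterns from a few seed pairs."""
    pairs, patterns = set(seed_pairs), set()
    for _ in range(max_rounds):
        # Known pairs -> patterns that connect them in the corpus.
        new_patterns = {mid for e1, mid, e2 in corpus if (e1, e2) in pairs}
        # Patterns -> all pairs that occur with one of them.
        new_pairs = {(e1, e2) for e1, mid, e2 in corpus if mid in new_patterns}
        if new_patterns <= patterns and new_pairs <= pairs:
            break                                 # nothing new: stop
        patterns |= new_patterns
        pairs |= new_pairs
    return pairs, patterns

# Toy corpus, made up for illustration.
corpus = [
    ("IBM", "has acquired", "AlchemyAPI"),
    ("Google", "has acquired", "YouTube"),
    ("Google", "bought", "YouTube"),
    ("Facebook", "bought", "WhatsApp"),
    ("Apple", "sued", "Samsung"),
]
pairs, patterns = bootstrap(corpus, {("IBM", "AlchemyAPI")})
print(sorted(pairs))
print(sorted(patterns))
```

This sketch accepts every new pattern without checking it, so a single bad pattern would pull in wrong pairs in all later rounds.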

Bootstrapping

A little more

We could

either extract pattern templates and search for these

or extract features for classification and build a classifier

If we use patterns we should generalize

makes intelligent acquisition → (make(s)|made) JJ* acquisition

During the process we should evaluate before we extend:

Does the new pattern recognize other pairs we know stand in the relation?

Does the new pattern return pairs that are not in the relation? (Precision)







4. Distant supervision for RE

Combine:

A large external knowledge base, e.g. Wikipedia, WordNet

Large amounts of unlabeled text

Extract tuples that stand in a known relation from the knowledge base:

Many tuples

Follow the bootstrapping technique on the text
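A minimal sketch of the labelling step: every sentence that mentions both entities of a knowledge-base tuple becomes a training example for that tuple's relation. The knowledge base, the sentences and the crude substring matching are assumptions for illustration; a real system would match recognized entities:

```python
def distant_label(sentences, knowledge_base):
    """Label sentences with the relation of any KB pair they mention."""
    examples = []
    for sentence in sentences:
        for (e1, e2), relation in knowledge_base.items():
            if e1 in sentence and e2 in sentence:     # crude substring match
                examples.append((sentence, e1, e2, relation))
    return examples

# Toy knowledge base and sentences, made up for illustration.
kb = {("Google", "YouTube"): "Acquire",
      ("Facebook", "WhatsApp"): "Acquire"}
sentences = [
    "Google bought YouTube in 2006.",
    "Facebook and WhatsApp announced a new feature.",
    "IBM opened a new office.",
]
for example in distant_label(sentences, kb):
    print(example)
```

The second sentence is labelled Acquire although it does not express an acquisition; such noisy labels are the price of obtaining training data without hand annotation.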


4. Distant supervision for RE

Properties:

Large data sets allow for

fine-grained features

combinations of features

Evaluation

Requirement

Large knowledge-base








5. Unsupervised relation extraction

Open IE

Example:

1. Tag and chunk

2. Find all word sequences satisfying certain syntactic constraints, in particular containing a verb

These are taken to be the relations

3. For each such, find the immediate non-vacuous NP to the left and to the right

4. Assign a confidence score

United has a hub in Chicago, which is the headquarters of United Continental Holdings.

r1: <United, has a hub in, Chicago>

r2: <Chicago, is the headquarters of, United Continental Holdings>
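A heavily simplified sketch of steps 2 and 3 for r1. Assumptions made here: the input is already POS-tagged with Penn-style tags, a relation is a verb optionally followed by other words ending in a preposition, and only runs of proper nouns count as arguments. It does not resolve "which" to Chicago, so it would not find r2, and it skips the confidence score:

```python
import re

def coarse(tag):
    """Map a Penn-style tag to V(erb), P(reposition), E(ntity), W(ord) or O(ther)."""
    if tag.startswith("VB"):
        return "V"
    if tag in ("IN", "TO", "RP"):
        return "P"
    if tag.startswith("NNP"):
        return "E"
    if tag.startswith(("NN", "JJ", "RB", "PRP", "DT")):
        return "W"
    return "O"

RELATION = re.compile(r"V(?:W*P)?")   # one code letter per token

def open_ie(tagged):
    """Return (left argument, relation phrase, right argument) triples."""
    words = [word for word, _ in tagged]
    codes = "".join(coarse(tag) for _, tag in tagged)
    triples = []
    for m in RELATION.finditer(codes):
        left = m.start()
        while left > 0 and codes[left - 1] == "E":          # entity run to the left
            left -= 1
        right = m.end()
        while right < len(codes) and codes[right] == "E":   # entity run to the right
            right += 1
        if left < m.start() and right > m.end():            # both arguments present
            triples.append((" ".join(words[left:m.start()]),
                            " ".join(words[m.start():m.end()]),
                            " ".join(words[m.end():right])))
    return triples

tagged = [("United", "NNP"), ("has", "VBZ"), ("a", "DT"),
          ("hub", "NN"), ("in", "IN"), ("Chicago", "NNP")]
print(open_ie(tagged))
```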


Evaluating relation extraction

Supervised methods can be evaluated on each of the examples in a test set.

For the semi-supervised methods:

we don’t have a test set.

we can evaluate the precision of the returned examples manually

Beware the difference between

Determine for a sentence whether an entity pair in the sentence is in a particular relation

Recall and precision

Determine from a text:

We may use several occurrences of the pair in the text to draw a conclusion

Precision

We skip the confidence scoring

More fine-grained IE

So far:

Tokenization+tagging

Identifying the "actors"

Chunking

Named-entity recognition

Co-reference resolution

Relation detection

Possible refinements:

Event detection

Co-reference resolution of events

Temporal extraction

Template filling


Some example systems

Stanford CoreNLP: http://corenlp.run/

spaCy (Python): https://spacy.io/docs/api/

OpenNLP (Java): https://opennlp.apache.org/docs/

GATE (Java): https://gate.ac.uk/

UDPipe: http://ufal.mff.cuni.cz/udpipe

Online demo: http://lindat.mff.cuni.cz/services/udpipe/
