
Sequence Classification: Chunking & NER

Shallow Processing Techniques for NLP

Ling570, November 23, 2011

Roadmap

Named Entity Recognition

Chunking

HW #9

Named Entity Recognition

Roadmap: Named Entity Recognition

Definition

Motivation

Challenges

Common Approach

Named Entity Recognition

Task: identify named entities in (typically) unstructured text

Typical entities:
Person names
Locations
Organizations
Dates
Times


Example

Microsoft released Windows Vista in 2007.

<ORG>Microsoft</ORG> released <PRODUCT>Windows Vista</PRODUCT> in <YEAR>2007</YEAR>

Entities are often application/domain specific:
Business intelligence: products, companies, features
Biomedical: genes, proteins, diseases, drugs, …

Example due to F. Xia

Named Entity Types

Common categories

Named Entity Examples

For common categories:


Why NER?

Machine translation:
Person names are typically not translated, though possibly transliterated: Waldheim
Numbers: 9/11 could be a date or a ratio; 911 could be the emergency phone number or a simple number


Why NER?

Information extraction:
MUC task: joint ventures/mergers
Focus on company names, person names (CEO), valuations

Information retrieval:
Named entities are the focus of retrieval; in some data sets, 60+% of queries target NEs

Text-to-speech: 206-616-5728
Phone numbers are read differently from other digit strings, and conventions differ by language


Challenges

Ambiguity:
Washington chose: D.C.? the state? George? etc.
Most digit strings are ambiguous
cat (95 results): CAT(erpillar) stock ticker, Computerized Axial Tomography, Chloramphenicol Acetyl Transferase, small furry mammal

Context & Ambiguity

Evaluation

Precision

Recall

F-measure

Resources

Online:
Name lists: baby names, who's who, newswire services, census.gov
Gazetteers
SEC listings of companies

Tools: LingPipe, OpenNLP, Stanford NLP toolkit


Approaches to NER

Rule/regex-based:
Match names/entities in lists
Regex: e.g. \d\d/\d\d/\d\d matches 11/23/11; currency: \$\d+\.\d+

Machine learning via sequence labeling:
Better for names, organizations

Hybrid
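As an illustration of the rule/regex approach, here is a minimal Python sketch of regex tagging for dates and currency amounts; the patterns mirror the slide's examples, and a real system would add many more rules plus list lookups.

    import re

    # Patterns from the slide: dates like 11/23/11 and currency amounts.
    # Note the escaped \$; a bare $ anchors at end of string.
    DATE_RE = re.compile(r"\b\d\d/\d\d/\d\d\b")
    CURRENCY_RE = re.compile(r"\$\d+\.\d+")

    def regex_ner(text):
        """Return (label, string, span) triples for regex-matchable entities."""
        entities = []
        for label, pattern in (("DATE", DATE_RE), ("MONEY", CURRENCY_RE)):
            for m in pattern.finditer(text):
                entities.append((label, m.group(), m.span()))
        return entities

    print(regex_ner("Windows Vista shipped on 01/30/07 for $239.00."))
    # [('DATE', '01/30/07', (25, 33)), ('MONEY', '$239.00', (38, 45))]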

NER as Sequence Labeling


NER as Classification Task

Instance: token

Labels:
Position: B(eginning), I(nside), O(utside)
NER types: PER, ORG, LOC, NUM
Label: Type-Position, e.g. PER-B, PER-I, O, …

How many tags? (|NER types| × 2) + 1
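A quick sanity check of the tag count in Python (the type list follows the slide):

    # B- and I- variants for each NER type, plus a single O tag.
    NER_TYPES = ["PER", "ORG", "LOC", "NUM"]
    tags = ["O"] + [f"{t}-{pos}" for t in NER_TYPES for pos in ("B", "I")]
    print(tags)       # ['O', 'PER-B', 'PER-I', 'ORG-B', ..., 'NUM-I']
    print(len(tags))  # (4 * 2) + 1 = 9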


NER as Classification: Features

What information can we use for NER?

Predictive tokens: e.g. MD, Rev, Inc., …

How general are these features? Language? Genre? Domain?


NER as Classification: Shape Features

Shape types:
lower: e.g. cumming (all lower case)
capitalized: e.g. Washington (first letter uppercase)
all caps: e.g. WHO (all letters capitalized)
mixed case: e.g. eBay (mixed upper and lower case)
capitalized with period: e.g. H.
ends with digit: e.g. A9
contains hyphen: e.g. H-P
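A minimal sketch of one way to compute these shape classes in Python; the class names and test order are my own, not a standard.

    import re

    def word_shape(token):
        # More specific patterns are tested before the general case checks.
        if re.fullmatch(r"[A-Z]\.", token):
            return "capitalized-period"   # H.
        if token[-1].isdigit() and not token.isdigit():
            return "ends-with-digit"      # A9
        if "-" in token:
            return "contains-hyphen"      # H-P
        if token.isupper():
            return "all-caps"             # WHO
        if token.islower():
            return "lower"                # cumming
        if token[0].isupper() and token[1:].islower():
            return "capitalized"          # Washington
        return "mixed-case"               # eBay

    for tok in ["cumming", "Washington", "WHO", "eBay", "H.", "A9", "H-P"]:
        print(tok, "->", word_shape(tok))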

Example Instance Representation

Example

Sequence Labeling Example


Evaluation

System: output of automatic tagging
Gold standard: true tags

Precision: # correct chunks / # system chunks
Recall: # correct chunks / # gold chunks
F-measure: F1 = 2PR / (P + R)

F1 balances precision & recall
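A minimal sketch of these chunk-level measures in Python, assuming chunks are represented as (label, start, end) spans:

    def prf(system_chunks, gold_chunks):
        system, gold = set(system_chunks), set(gold_chunks)
        correct = len(system & gold)              # exact-match chunks only
        p = correct / len(system) if system else 0.0
        r = correct / len(gold) if gold else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1

    gold = [("NP", 0, 3), ("PP", 3, 4), ("NP", 4, 5), ("VP", 5, 7)]
    system = [("NP", 0, 3), ("NP", 4, 5), ("VP", 5, 6)]
    print(prf(system, gold))  # roughly (0.667, 0.5, 0.571)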


Evaluation

Standard measures:
Precision, recall, F-measure, computed on entity types (CoNLL evaluation)

Classifiers vs. evaluation measures:
Classifiers optimize tag accuracy
Most common tag? O, since most tokens aren't NEs
Evaluation measures focus on NEs

State of the art:
Standard tasks: PER, LOC: 0.92; ORG: 0.84


Hybrid Approaches

Practical systems exploit lists, rules, learning…

Multi-pass:
Early passes: high precision, low recall
Later passes: noisier sequence learning

Hybrid system:
High-precision rules tag unambiguous mentions
Use string matching to capture substring matches
Tag items from domain-specific name lists
Apply sequence labeler
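A minimal sketch of such a multi-pass pipeline in Python; the toy gazetteer and the stub sequence labeler are placeholders, not the systems the slides describe.

    GAZETTEER = {"Microsoft": "ORG", "Denver": "LOC"}  # toy domain name list

    def rule_pass(tokens, labels):
        # Early pass: high-precision list lookup on still-untagged tokens.
        for i, tok in enumerate(tokens):
            if labels[i] == "O" and tok in GAZETTEER:
                labels[i] = GAZETTEER[tok] + "-B"
        return labels

    def sequence_pass(tokens, labels):
        # Later pass: a learned sequence labeler (MaxEnt, CRF, ...) would
        # re-tag the remaining O tokens here; stubbed out in this sketch.
        return labels

    def hybrid_ner(tokens):
        labels = ["O"] * len(tokens)
        for tagger in (rule_pass, sequence_pass):  # high-precision passes first
            labels = tagger(tokens, labels)
        return list(zip(tokens, labels))

    print(hybrid_ner("Microsoft released Windows Vista in 2007 .".split()))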

Chunking

Roadmap: Chunking

Definition

Motivation

Challenges

Approach


What is Chunking?

Form of partial (shallow) parsing: extracts major syntactic units, but not full parse trees

Task: identify and classify flat, non-overlapping segments of a sentence
Basic non-recursive phrases
Correspond to major POS categories
May ignore some categories, e.g. base NP chunking

Creates a simple bracketing:
[NP The morning flight] [PP from] [NP Denver] [VP has arrived]
[NP The morning flight] from [NP Denver] has arrived


Why Chunking?

Used when a full parse is unnecessary, or infeasible or impossible (when?)

Extraction of subcategorization frames: identify verb arguments
e.g. VP -> NP; VP -> NP NP; VP -> NP to NP

Information extraction: who did what to whom

Summarization: base information, remove modifiers

Information retrieval: restrict indexing to base NPs


Processing Example

Tokenization: The morning flight from Denver has arrived
POS tagging: DT JJ N PREP NNP AUX V
Chunking: NP PP NP VP
Extraction: NP NP VP
etc.


Approaches

Finite-state approaches:
Grammatical rules encoded in FSTs
Cascade to produce more complex structure

Machine learning:
Similar to POS tagging


Finite-State Rule-Based Chunking

Hand-crafted rules model phrases; typically application-specific

Left-to-right longest match (Abney 1996):
Start at beginning of sentence
Find longest matching rule
Greedy approach, not guaranteed optimal


Finite-State Rule-Based Chunking

Chunk rules cannot contain recursion:
NP -> Det Nominal: okay
Nominal -> Nominal PP: not okay

Examples:
NP -> (Det) Noun* Noun
NP -> Proper-Noun
VP -> Verb
VP -> Aux Verb

Consider: Time flies like an arrow
Is this what we want?
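A minimal Python sketch of left-to-right longest-match chunking, using regexes over space-joined POS tag strings as stand-ins for FST rules; the rule set loosely follows the examples above.

    import re

    # Each rule is a label plus a pattern over space-joined POS tags.
    RULES = [
        ("NP", re.compile(r"(DT )?(NN )*NN\b|NNP\b")),
        ("VP", re.compile(r"(AUX )?VB\b")),
    ]

    def chunk(tags):
        chunks, i = [], 0
        while i < len(tags):
            best = None  # (label, length in tags) of longest match at i
            for label, pat in RULES:
                m = pat.match(" ".join(tags[i:]))
                if m and (best is None or len(m.group().split()) > best[1]):
                    best = (label, len(m.group().split()))
            if best:
                chunks.append((best[0], tags[i:i + best[1]]))
                i += best[1]
            else:
                chunks.append(("O", [tags[i]]))  # no rule matches: pass one tag
                i += 1
        return chunks

    print(chunk("DT NN NN IN NNP AUX VB".split()))
    # [('NP', ['DT', 'NN', 'NN']), ('O', ['IN']), ('NP', ['NNP']), ('VP', ['AUX', 'VB'])]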


Cascading FSTs

Richer partial parsing: pass the output of one FST to the next

Approach:
First stage: base phrase chunking
Next stage: larger constituents (e.g. PPs, VPs)
Highest stage: sentences

Example


Chunking by Classification

Model chunking as a task similar to POS tagging

Instance: tokens

Labels: simultaneously encode segmentation & identification
IOB (or BIO) tagging (also BIOE or BIOSE)
Segment: B(eginning), I(nside), O(utside)
Identity: phrase category: NP, VP, PP, etc.

The morning flight from Denver has arrived
NP-B NP-I NP-I PP-B NP-B VP-B VP-I
NP-B NP-I NP-I O NP-B O O (base NP chunking)
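A minimal Python sketch of decoding such Type-Position tags back into (label, start, end) chunk spans:

    def bio_to_chunks(tags):
        chunks, start, label = [], None, None
        for i, tag in enumerate(tags + ["O"]):      # sentinel flushes last chunk
            typ, _, pos = tag.partition("-")
            if pos != "I" and label is not None:    # close any open chunk
                chunks.append((label, start, i))
                start, label = None, None
            if pos == "B":                          # open a new chunk
                start, label = i, typ
        return chunks

    tags = ["NP-B", "NP-I", "NP-I", "PP-B", "NP-B", "VP-B", "VP-I"]
    print(bio_to_chunks(tags))
    # [('NP', 0, 3), ('PP', 3, 4), ('NP', 4, 5), ('VP', 5, 7)]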


Features for Chunking

What are good features?

Preceding chunk tags: for the 2 preceding words
Words: 2 preceding, current, 2 following
Parts of speech: 2 preceding, current, 2 following

The vector includes those features + the true label
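A minimal Python sketch of extracting that feature window for token i, assuming parallel lists of words, POS tags, and the chunk tags already predicted for earlier positions (the feature names are mine):

    def chunk_features(words, pos, prev_chunk_tags, i):
        # Out-of-range positions get a boundary symbol.
        pad = lambda seq, j: seq[j] if 0 <= j < len(seq) else "<S>"
        feats = {}
        for k in (-2, -1, 0, 1, 2):
            feats[f"w{k:+d}"] = pad(words, i + k)  # word window
            feats[f"p{k:+d}"] = pad(pos, i + k)    # POS window
        for k in (-2, -1):
            feats[f"t{k:+d}"] = pad(prev_chunk_tags, i + k)  # preceding tags
        return feats

    words = "The morning flight from Denver has arrived".split()
    pos = ["DT", "NN", "NN", "IN", "NNP", "AUX", "VB"]
    print(chunk_features(words, pos, ["NP-B", "NP-I"], 2))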

Chunking as Classification: Example

Evaluation

System: output of automatic tagging
Gold standard: true tags, typically extracted from a parsed treebank

Precision: # correct chunks / # system chunks
Recall: # correct chunks / # gold chunks
F-measure: F1 = 2PR / (P + R)

F1 balances precision & recall


State-of-the-Art

Base NP chunking: 0.96

Complex phrases:
Learning: 0.92-0.94 (most learners achieve similar results)
Rule-based: 0.85-0.92

Limiting factors:
POS tagging accuracy
Inconsistent labeling (parse tree extraction)
Conjunctions:
Late departures and arrivals are common in winter
Late departures and cancellations are common in winter

HW #9

Building a MaxEnt POS Tagger

Q1: Build feature vector representations for POS tagging in SVMlight format

maxent_features.* training_file testing_file rare_wd_threshold rare_feat_threshold outdir

training_file, testing_file: same format as HW #7: w1/t1 w2/t2 … wn/tn

Filter rare words and infrequent features

Store vectors & intermediate representations in outdir
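For reference, SVMlight format puts one instance per line: the class label, then feature index:value pairs in ascending index order, optionally followed by a # comment. A hypothetical line (the indices and comment are made up, not the assignment's actual mapping):

    3 101:1 205:1 1024:1 # arrived/VB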

Feature Representations

Features: Ratnaparkhi, 1996, Table 1 (duplicated in the MaxEnt slides)

Character issues:
Replace "," with "comma"
Replace ":" with "colon"
Mallet and svmlight formats use these characters as delimiters
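A one-line sketch of that substitution in Python (the feature name shown is a made-up example):

    def escape_feature(name):
        # ',' and ':' are delimiters in Mallet/svmlight files.
        return name.replace(",", "comma").replace(":", "colon")

    print(escape_feature("curW=,"))  # curW=comma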

Q2: Experiments

Run MaxEnt classification using your training and test files

Compare the effects of different thresholds on feature count, accuracy, and runtime

Note: Big files

This assignment will produce even larger sets of results than HW #8. Please gzip your tar files. If the DropBox won't accept the files, you can store them on patas; just let Sanghoun know where to find them.
