parsing with context-free grammars for asr julia hirschberg cs 4706 slides with contributions from...
TRANSCRIPT
Parsing withContext-Free Grammars for
ASR
Julia Hirschberg
CS 4706
Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin
What is Syntax?
• Structure of language• How words are arranged together and related to one
another• Goal of syntactic analysis: relate surface form (what
someone says or writes) to underlying structure, to support semantic analysis (what the utterance or text means)
• Syntactic representation: typically a tree structure
Structure in Strings
• A set of words, or, a lexicon: the a small nice big very boy girl sees likes
• Some `good’ (grammatical) sentences:– the boy likes a girl – the small girl likes the big girl– a very small nice boy sees a very nice boy
• Some bad (ungrammatical) sentences:– *the boy the girl– *small boy likes nice girl
• Can we find a way of distinguishing between the two kinds of sequences?
• Can we identify similarities among grammatical subsequences?
One Version of Constituent Structure
• Lexicon: the a small nice big very boy girl sees likes• Grammatical sentences:
– (the) boy (likes a girl) – (the small) girl (likes the big girl)– (a very small nice) boy (sees a very nice boy)
• Ungrammatical sentences:– *(the) boy (the girl)– *(small) boy (likes the nice girl)
Another Constituency Hypothesis
• Lexicon: the a small nice big very boy girl sees likes• Grammatical sentences:
– (the boy) likes (a girl) – (the small girl) likes (the big girl)– (a very small nice boy) sees (a very nice boy)
• Ungrammatical sentences:– *(the boy) (the girl)– *(small boy) likes (the nice girl)
• Better: fewer types of constituents (blue and red are of same type)
Even More Structures
• Lexicon: the a small nice big very boy girl sees likes• Grammatical sentences:
– ((the) boy) likes ((a) girl) – ((the) (small) girl) likes ((the) (big) girl)– ((a) ((very) small) (nice) boy) sees ((a) ((very) nice)
girl)• Ungrammatical sentences:
– *((the) boy) ((the) girl)– *((small) boy) likes ((the) (nice) girl)
From Substrings to Trees
• (((the) boy) likes ((a) girl))
boythe
likesgirl
a
How do we Label the Nodes?
• ( ((the) boy) likes ((a) girl) )• Choose constituents so each one has one non-bracketed
word: the head• Group words by distribution of constituents they head
(POS)– Noun (N), verb (V), adjective (Adj), adverb (Adv),
determiner (Det)• Category of constituent: XP, where X is POS
– NP, S, AdjP, AdvP, DetP
Types of Nodes
• (((the/Det) boy/N) likes/V ((a/Det) girl/N))
boy
the
likes
girl
a
DetP
NP NP
DetP
S
Phrase-structuretree
nonterminalsymbols= constituents
terminal symbols = words
Determining Part-of-Speech
A blue seat/a child seat: noun or adjective?– Syntax:
• a blue seat a child seat• a very blue seat *a very child seat • this seat is blue *this seat is child
– Morphology:• bluer *childer
– blue and child are not the same POS – blue is Adj, child is Noun
Determining Part-of-Speech
– Preposition or particle?• A he threw out the garbage• B he threw the garbage out the door• A he threw the garbage out • B *he threw the garbage the door out
– The two out are not same POS• A is particle, B is Preposition
Constituency
• Some Noun phrases (NPs)• A red dog on a blue tree• A blue dog on a red tree• Some big dogs and some little dogs• A dog• I• Big dogs, little dogs, red dogs, blue dogs,
yellow dogs, green dogs, black dogs, and white dogs
• How do we know these form a constituent?
NP Constituency
• NPs can all appear before a verb:– Some big dogs and some little dogs are going
around in cars…– Big dogs, little dogs, red dogs, blue dogs,
yellow dogs, green dogs, black dogs, and white dogs are all at a dog party!
– I do not• But individual words can’t always appear before
verbs:– *little are going…– *blue are…– *and are
• Must be able to state generalizations like:– Noun phrases occur before verbs
PP Constituency
• Preposing and postposing:– Under a tree is a yellow dog. – A yellow dog is under a tree.
• But not:– *Under, is a yellow dog a tree. – *Under a is a yellow dog tree.
• Prepositional phrases notable for ambiguity in attachment– I saw a man on a hill with a telescope.
Context-Free Grammars
• Defined in formal language theory– Terminals: e.g. cat – Non-terminal symbols: e.g. NP, VP– Start symbol: e.g. S– Rewriting rules: e.g. S NP VP
• Start with start symbol, rewrite using rules, done when only terminals left
A Fragment of English
S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the
Input: the cat is on the mat
Derivations in a CFG
S
S
S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the
Derivations in a CFG
NP VP
NP
S
VP
S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the
Derivations in a CFG
DetP N VP
DetP
NP
S
VP
N
S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the
Derivations in a CFG
the cat VP
catthe
DetP
NP
S
VP
N
S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the
Derivations in a CFG
the cat V PP
catthe
DetP
NP
PP
S
VP
N V
S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the
Derivations in a CFG
the cat is Prep NP
catthe is
DetP
NP
PP
Prep
S
VP
N
NP
V
S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the
Derivations in a CFG
the cat is on Det N
catthe is
DetP
NP
DetP
PP
Prep
S
VP
N
NP
V
S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the
on N
Derivations in a CFG
the cat is on the mat
catthe is
DetP
NP
DetP
PP
Prep
S
VP
N
NP
V
S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the
on N
the mat
A More Complicated Fragment of English
• S NP VP• S VP• VP V PP• VP V NP• VP V• NP DetP NP• NP N NP• NP N• PP Prep NP
• N cat | mat | food | bowl | Mary• V is | likes | sits• Prep on | in | under• DetP the | a
Mary likes the cat bowl.
The cat ate the tasty food.
Hello. Nice talking to you.
Pocket Sphinx Grammar Format
• Variables go in angle brackets, e.g. <city>• Terminals must appear in your pronunciation
dictionary (case sensitive)• X Y is concatenation -- e.g., I WANT• (X | Y) means X or Y -- e.g., (WANT|NEED)• Square brackets mean optional -- e.g., [ON]
FRIDAY• * means that the expansion may be spoken zero or
more times -- e.g. <digit>*• + means one or more times -- e.g. <digit>+
Example
• <city> = BOSTON | NEWYORK | WASHINGTON | BALTIMORE;
• <time> = MORNING | EVENING; • <day> = FRIDAY | MONDAY; • public <query> = (((WHAT TRAINS LEAVE) |
(WHAT TIME CAN I TRAVEL) | (IS THERE A TRAIN)) (FROM|TO) <city> [(FROM|TO) <city>] ON <day> [<time>]);
Hello. No. I want to go on Tuesday. When does the train leave?
Next Class
• Language modeling for large vocabulary applications: Ngrams