LEARNING SEMANTICS BEFORE SYNTAX
Dana Angluin (dana.angluin@yale.edu), Leonor Becerra-Bonache (leonor.becerra-bonache@yale.edu)
CONTENTS
1. MOTIVATION
2. MEANING AND DENOTATION FUNCTIONS
3. STRATEGIES FOR LEARNING MEANINGS
4. OUR LEARNING ALGORITHM
   4.1. Description
   4.2. Formal results
   4.3. Empirical results
5. DISCUSSION AND FUTURE WORK
1. MOTIVATION
Among the more interesting remaining theoretical questions [in Grammatical Inference] are: inference in the presence of noise, general strategies for interactive presentation and the inference of systems with semantics.
[Feldman, 1972]
Results obtained in Grammatical Inference show that learning formal languages from positive data is hard. Approaches in this setting typically:
- Omit semantic information
- Reduce the learning problem to syntax learning
Important role of semantics and context in the early stages of children’s language acquisition, especially in the 2-word stage.
Can semantic information simplify the learning problem?
Inspired by the 2-word stage, we propose a simple computational model that takes semantics and context into account.
Differences with respect to other approaches:
- Our model does not rely on a complex syntactic mechanism.
- The input to our learning algorithm consists of utterances and the situations in which these utterances are produced.
Our model is also designed to address the issue of the kinds of input available to the learner.
- Positive data plays the main role in the process of language acquisition.
- We also want to model another kind of information that is available to the child during the 2-word stage: corrections, given by means of meaning-preserving expansions of incomplete sentences uttered by the child.
CHILD: Eve lunch
ADULT: Eve is having lunch
[Brown and Bellugi, 1964]
In the presence of semantics determined by a shared context, such corrections appear to be closely related to positive data.
SITUATION shared by child and adult:
ADULT (positive data): Daddy is throwing the ball!
CHILD: Daddy throw / ADULT (correction): Daddy is throwing the ball!
Our model accommodates two different tasks: comprehension and production.
We focus initially on a simple formal framework.
Comprehension task (example utterance given to the learner): the red triangle
Production task (example utterance produced by the learner): red triangle
Here we consider comprehension and positive data.
The scenario is cross-situational and supervised.
The goal of the learner is to learn the meaning function, allowing the learner to comprehend novel utterances.
2. MEANING AND DENOTATION FUNCTIONS
To specify a meaning function, we use:
- A finite state transducer M that maps sequences of words to sequences of predicate symbols.
- A path-mapping function π that maps sequences of predicate symbols to sequences of logical atoms.
A meaning transducer M1 for a class of sentences in English, with states q0 through q6 and transitions labeled word / output:
- the / ε
- blue / bl, green / gr, red / re
- circle / ci, square / sq, triangle / tr
- above / ab, below / abt
- to / ε, the / ε, of / ε
- left / le, right / let
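Such a transducer can be sketched as a transition table. The state numbering and topology below are an assumption (the slides' diagram is not fully recoverable); only the word/output pairs come from the figure.

```python
# A minimal sketch of a meaning transducer like M1. Keys are (state, word);
# values are (next state, output symbol), with None standing for epsilon.
TRANS = {
    (0, "the"): (1, None),
    (1, "blue"): (2, "bl"), (1, "green"): (2, "gr"), (1, "red"): (2, "re"),
    (1, "circle"): (3, "ci"), (1, "square"): (3, "sq"), (1, "triangle"): (3, "tr"),
    (2, "circle"): (3, "ci"), (2, "square"): (3, "sq"), (2, "triangle"): (3, "tr"),
    (3, "above"): (0, "ab"), (3, "below"): (0, "abt"),  # then a second noun phrase
    (3, "to"): (4, None),
    (4, "the"): (5, None),
    (5, "left"): (6, "le"), (5, "right"): (6, "let"),
    (6, "of"): (0, None),                               # then a second noun phrase
}

def transduce(utterance):
    """Run the transducer on a word sequence, collecting non-epsilon outputs."""
    state, out = 0, []
    for word in utterance.split():
        state, sym = TRANS[(state, word)]
        if sym is not None:
            out.append(sym)
    return out

# transduce("the blue triangle above the square") → ['bl', 'tr', 'ab', 'sq']
```

On this topology, "the circle to the right of the square" yields ['ci', 'let', 'sq'], matching the predicate sets used later in the slides.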
the blue triangle above the square
→ (transducer M1) < bl, tr, ab, sq >
→ (path-map FST) < bl(x1), tr(x1), ab(x1, x2), sq(x2) >
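A plausible sketch of the path-mapping step: each unary predicate symbol applies to the current variable, while a binary symbol links the current variable to a fresh next one. The arity table is an assumption listing the symbols that appear in these slides.

```python
def path_map(preds, arity):
    """Turn a predicate-symbol sequence into atoms over variables x1, x2, ...
    Unary symbols apply to the current variable; a binary symbol links the
    current variable to the next fresh variable and advances."""
    atoms, i = [], 1
    for p in preds:
        if arity[p] == 1:
            atoms.append(f"{p}(x{i})")
        else:
            atoms.append(f"{p}(x{i},x{i+1})")
            i += 1
    return atoms

# Assumed arities for the predicate symbols used in the slides.
ARITY = {"bi": 1, "bl": 1, "gr": 1, "re": 1, "ci": 1, "sq": 1, "tr": 1,
         "ab": 2, "abt": 2, "le": 2, "let": 2}

# path_map(["bl", "tr", "ab", "sq"], ARITY)
# → ['bl(x1)', 'tr(x1)', 'ab(x1,x2)', 'sq(x2)']
```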
u = the blue triangle above the square
π(M1(u)) = < bl(x1), tr(x1), ab(x1, x2), sq(x2) >
S1 = {bi(t1), bl(t1), tr(t1), ab(t1, t2), bi(t2), gr(t2), sq(t2)}
To determine a denotation: f(x1) = t1 and f(x2) = t2 is the unique match in S1.
A denotation function is specified by a choice of the parameter which from {first, last}:
English: which = first; Mandarin: which = last.
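One way to sketch the match search: try injective assignments of variables to objects and keep those under which every atom of the meaning occurs in the situation. Atoms are encoded here as (symbol, argument-tuple) pairs; this brute-force enumeration is for illustration only.

```python
from itertools import permutations

def matches(atoms, situation, objects):
    """All injective variable-to-object assignments f such that every atom
    of the meaning, with f applied, occurs in the situation."""
    variables = sorted({v for _, args in atoms for v in args})
    found = []
    for chosen in permutations(objects, len(variables)):
        f = dict(zip(variables, chosen))
        if all((p, tuple(f[v] for v in args)) in situation for p, args in atoms):
            found.append(f)
    return found

# The situation S1 and meaning from the slide, in encoded form:
S1 = {("bi", ("t1",)), ("bl", ("t1",)), ("tr", ("t1",)), ("ab", ("t1", "t2")),
      ("bi", ("t2",)), ("gr", ("t2",)), ("sq", ("t2",))}
meaning = [("bl", ("x1",)), ("tr", ("x1",)), ("ab", ("x1", "x2")), ("sq", ("x2",))]

# matches(meaning, S1, ["t1", "t2"]) → [{'x1': 't1', 'x2': 't2'}]  (unique match)
```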
3. STRATEGIES FOR LEARNING MEANINGS
Assumption 1. For all states q ∈ Q and words w ∈ W, γ(q, w) is independent of q.
English: input = triangle, output = tr (independently of the state)
Cross-situational conjunctive learning strategy: for each encountered word w, we consider all utterances ui containing w and their corresponding situations Si, and form the intersection of the sets of predicates occurring in these Si.
C(w) = ∩ { predicates(Si) : w occurs in ui }
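The cross-situational conjunctive strategy can be sketched directly from the formula above. The toy data is hypothetical, using the situation encoding of the earlier examples.

```python
def cross_situational(data):
    """C(w): for each word w, intersect the predicate symbols occurring in
    every situation S_i whose paired utterance u_i contains w."""
    C = {}
    for situation, utterance in data:
        preds = {p for p, _ in situation}
        for w in set(utterance.split()):
            C[w] = C[w] & preds if w in C else set(preds)
    return C

# Hypothetical toy data in the geometric-shapes domain:
data = [
    ({("re", ("t1",)), ("tr", ("t1",))}, "the red triangle"),
    ({("re", ("t1",)), ("ci", ("t1",))}, "the red circle"),
    ({("bl", ("t1",)), ("ci", ("t1",))}, "the blue circle"),
]
# cross_situational(data)['red'] → {'re'}
# cross_situational(data)['the'] → set(): no predicate survives all three situations
```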
Background predicates removed (they are present in every situation).
4.1. Description
Input: a sequence of pairs (Si, ui).
Goal: to learn a meaning function γ' such that γ'(u) = γ(u) for all utterances u ∈ L(M).
1. Find the current background predicates.
2. Form the partition K according to word co-occurrence classes.
3. Find the set of unary predicates that occur in every situation in which K occurred, and assign at most one non-background unary predicate to each word co-occurrence class.
4. Find all the binary predicates that are possible meanings of K, and assign at most one non-background binary predicate to each word co-occurrence class not already assigned a unary predicate.
5. For each word not yet assigned a value, assign ε.
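Step 2 can be sketched as grouping words by the exact set of utterances in which they occur; words that always co-occur fall into one class. The Mandarin-like sample below is hypothetical.

```python
def cooccurrence_classes(utterances):
    """Partition the vocabulary into classes of words occurring in exactly
    the same set of utterances (a sketch of Step 2)."""
    occurrences = {}
    for i, u in enumerate(utterances):
        for w in set(u.split()):
            occurrences.setdefault(w, set()).add(i)
    classes = {}
    for w, occ in occurrences.items():
        classes.setdefault(frozenset(occ), set()).add(w)
    return sorted(classes.values(), key=sorted)

# Hypothetical Mandarin-like sample in which 'san' and 'jiao' always co-occur:
sample = ["hong de san jiao", "lan de san jiao", "hong de yuan"]
# → 'san' and 'jiao' land in one class; 'hong', 'lan', 'yuan', 'de' each in their own.
```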
Background predicates: bi (representing big); the slides show the Step 1 and Step 2 tables for the initial examples.
A new example is added: (brtlbbt, el triangulo rojo a la izquierda del triangulo azul), i.e., "the red triangle to the left of the blue triangle"; the slides show the resulting Step 3 and Step 5 tables.
u = the green circle to the right of the red triangle
S = {bi(t1), re(t1), tr(t1), le(t1, t2), bi(t2), gr(t2), ci(t2), ab(t2, t3), bi(t3), re(t3), sq(t3)}
The set of unary predicates found in Step 3 is used to define a partial meaning function: < gr, ci, re, tr >.
Find the possible orders of arguments of binary predicates. Candidate argument orderings: < t2, t1 >, < t3, t2, t1 >, < t2, t3, t1 >, < t2, t1, t3 >.
Among the orderings compatible with < gr, ci, re, tr >: possible(S, u) = {let}.
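The compatibility check in this example can be sketched as follows: given the sequence of objects denoted by the learned unary predicates, collect each binary predicate (or its transpose) that relates consecutive objects in the situation. The transpose table (le ↔ let, ab ↔ abt) is an assumption consistent with the predicate names above, and this is a simplified illustration of possible(S, u), not the full ordering search.

```python
TRANSPOSE = {"le": "let", "let": "le", "ab": "abt", "abt": "ab"}

def possible_binary(object_seq, situation):
    """Binary predicates relating consecutive denoted objects in utterance
    order; a reversed atom contributes its transpose."""
    out = set()
    for a, b in zip(object_seq, object_seq[1:]):
        for p, args in situation:
            if len(args) != 2:
                continue
            if args == (a, b):
                out.add(p)
            elif args == (b, a):
                out.add(TRANSPOSE[p])
    return out

# The situation S from the slide, in encoded form:
S = {("bi", ("t1",)), ("re", ("t1",)), ("tr", ("t1",)), ("le", ("t1", "t2")),
     ("bi", ("t2",)), ("gr", ("t2",)), ("ci", ("t2",)), ("ab", ("t2", "t3")),
     ("bi", ("t3",)), ("re", ("t3",)), ("sq", ("t3",))}

# The unary predicates <gr, ci, re, tr> denote t2 then t1:
# possible_binary(["t2", "t1"], S) → {'let'}
```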
4.2. Formal results
Assumption 1. For all states q ∈ Q and words w ∈ W, γ(q, w) is independent of q.
Assumption 2. The output function γ is well-behaved with respect to co-occurrence classes.
Examples of co-occurrence classes: Mandarin: {san, jiao} → tr; Greek: {o, kyklos} → ci.
Theorem 1. Under Assumptions 1 through 6, the learning algorithm finitely converges to a meaning function γ' such that γ'(u) = γ(u) for every u ∈ L(M).
Assumption 3. For all co-occurrence classes K, the set of predicates common to meanings of utterances from L(M) containing K is just γ(K).
English, for the class {to, of}:
the circle to the right of the square → {ci, let, sq}
the triangle to the left of the circle → {tr, le, ci}
the square to the right of the triangle → {sq, let, tr}
Intersection = Ø
Assumption 4. Kn converges to the correct co-occurrence classes.
Spanish: 6 random examples (circulo rojo)
Assumption 5. For each co-occurrence class K, C'(K) converges to the set of primary predicates that occur in meanings of utterances containing K.
Spanish: after 6 random examples, triangulo – ((gr 1) (tr 1)); after 1 more example, triangulo – ((tr 1)).
Assumption 6. If the unary predicates are correctly learned, then every incorrect binary predicate is eliminated by incompatibility with some situation in the data.
English: with the partial meaning < gr, ci, re, tr >, only orderings compatible with the situation remain, giving possible(S, u) = {let} and eliminating le.
4.3. Empirical results
Implementation and test of our algorithm on nine languages: Arabic, English, Greek, Hebrew, Hindi, Mandarin, Russian, Spanish, and Turkish.
In addition, we created a second English sample labeled Directions (e.g., go to the circle and then north to the triangle).
Goal: to assess the robustness of our assumptions for the domain of geometric shapes and the adequacy of our model to deal with cross-linguistic data.
EXPERIMENT 1
Native speakers translated a set of 15 utterances.
Results:
- For the English, Mandarin, Spanish, and English Directions samples, 15 initial examples are sufficient for the word co-occurrence classes to converge and for correct resolution of the binary predicates.
- For the other samples, 15 initial examples are not sufficient to ensure convergence to the final sets of predicates associated with each class of words.
Spanish: results for initial sample have converged
Greek: results after convergence; kokkinos and prasinos not sufficiently resolved
EXPERIMENT 2
Construction of meaning transducers for each language in our study.
Large random samples.
Results:
- Our theoretical assumptions are satisfied and a correct meaning function is found in all cases except Arabic and Greek; for these two, some of our assumptions are violated and a fully correct meaning function is not guaranteed, although a largely correct meaning function is achieved.
EXPERIMENT 3
10 runs for each language, each run consisting of generating a sequence of random examples until convergence.
Statistics on the number of examples needed for convergence in the random runs are shown in the table on the slide.
5. DISCUSSION AND FUTURE WORK
What about computational feasibility?
- Word co-occurrence classes, the sets of predicates that have occurred with them, and background predicates can all be maintained efficiently and incrementally.
- Determining whether there is a match of π(M(u)) in a situation S with N variables and at least N things includes, as a special case, finding a directed path of length N in the situation graph, which is NP-hard in general.
- It is likely that human learners do not cope well with situations involving arbitrarily many things; finding good models of focus of attention is therefore important.
Future work:
- Relax some of the more restrictive assumptions (in the current framework, disjunctive meanings cannot be learned, nor can a function that assigns meaning to more than one of a set of co-occurring words).
- Explore statistical approaches, which may produce more powerful versions of the models we consider.
- Incorporate production and syntax learning by the learner, as well as corrections and expansions from the teacher.
REFERENCES
Angluin, D., Becerra-Bonache, L.: Learning Meaning Before Syntax. Technical Report YALE/DCS/TR1407, Computer Science Department, Yale University (2008).
Brown, R., Bellugi, U.: Three processes in the child's acquisition of syntax. Harvard Educational Review 34, 133-151 (1964).
Feldman, J.: Some decidability results on grammatical inference and complexity. Information and Control 20, 244-262 (1972).