Learning from Text
Colin de la Higuera, University of Nantes
Zadar, August 2010


Page 1: Learning from Text

Zadar, August 2010

Learning from Text

Colin de la Higuera, University of Nantes

Page 2: Learning from Text


Cdlh 2010

Acknowledgements: Laurent Miclet, Jose Oncina, Tim Oates, Anne-Muriel Arigon, Leo Becerra-Bonache, Rafael Carrasco, Paco Casacuberta, Pierre Dupont, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Jean-Christophe Janodet, Satoshi Kobayachi, Thierry Murgue, Frédéric Tantini, Franck Thollard, Enrique Vidal, Menno van Zaanen, ...

http://pagesperso.lina.univ-nantes.fr/~cdlh/
http://videolectures.net/colin_de_la_higuera/

Page 3: Learning from Text


Outline
1. Motivations, definition and difficulties
2. Some negative results
3. Learning k-testable languages from text
4. Learning k-reversible languages from text
5. Conclusions

http://pagesperso.lina.univ-nantes.fr/~cdlh/slides/ (Chapters 8 and 11)

Page 4: Learning from Text


1 Identification in the limit

𝕃: a class of languages. 𝔾: a class of grammars. L: 𝔾 → 𝕃 is the naming function, telling which language a grammar generates. Pres: ℕ → X is a presentation; yields(Pres) ∈ 𝕃 is the language a presentation denotes. a is the learner, mapping finite portions of a presentation to grammars.

Two presentations with the same range denote the same language: f(ℕ) = g(ℕ) ⟹ yields(f) = yields(g). The learner identifies 𝕃 in the limit when it converges on every presentation to a grammar naming the target: L(a(f)) = yields(f).

Page 5: Learning from Text


Learning from text

Only positive examples are available
Danger of over-generalization: why not return Σ*?

The problem is "basic":
Negative examples might not be available
Or they might be heavily biased: near-misses, absurd examples, ...

Baseline: all the rest is learning with help

Page 6: Learning from Text


GI as a search problem

[Figure: the search space of automata, starting from the PTA]

Page 7: Learning from Text


Questions?

Data is unlabelled... Is this a clustering problem? Is this a problem posed in other settings?

Page 8: Learning from Text


2 The theory

Gold (1967): no super-finite class can be identified from positive examples (text) only

Necessary and sufficient conditions for learning

Literature: inductive inference, ALT series, …

Page 9: Learning from Text


Limit point

A class 𝕃 of languages has a limit point if there exists an infinite sequence (Ln)n∈ℕ of languages in 𝕃 such that L0 ⊊ L1 ⊊ … ⊊ Ln ⊊ …, and there exists another language L ∈ 𝕃 such that L = ∪n∈ℕ Ln.

L is called a limit point of 𝕃.

Page 10: Learning from Text


L is a limit point

[Figure: the increasing chain L0 ⊊ L1 ⊊ L2 ⊊ L3 ⊊ … ⊊ Li ⊊ … with union L]

Page 11: Learning from Text


Theorem

If 𝕃 admits a limit point, then 𝕃 is not learnable from text.

Proof: Let si be a presentation in length-lex order for Li, and s a presentation in length-lex order for L. Then ∀n∈ℕ ∃i such that ∀k≤n, si(k) = s(k): every finite prefix of s is also a prefix of some presentation of some Li ≠ L, so the learner can never converge on s.

Note: having a limit point is a sufficient condition for non-learnability, not a necessary one.

Page 12: Learning from Text


Mincons classes

A class is mincons if there is an algorithm which, given a sample S, builds a grammar G ∈ 𝔾 such that S ⊆ L(G) and, for every L ∈ 𝕃 with S ⊆ L, L(G) ⊆ L.

I.e. there is a unique minimum (for inclusion) consistent grammar.

Page 13: Learning from Text


Accumulation point (Kapur 91)

A class 𝕃 of languages has an accumulation point if there exists an infinite sequence (Sn)n∈ℕ of sets such that S0 ⊊ S1 ⊊ … ⊊ Sn ⊊ …, with L = ∪n∈ℕ Sn ∈ 𝕃, and for any n∈ℕ there exists a language Ln' in 𝕃 such that Sn ⊆ Ln' ⊊ L.

The language L is called an accumulation point of 𝕃.

Page 14: Learning from Text


L is an accumulation point

[Figure: the chain S0 ⊊ S1 ⊊ S2 ⊊ S3 ⊊ … ⊊ Sn ⊊ … with union L, and for each n a language Ln' between Sn and L]

Page 15: Learning from Text


Theorem (for Mincons classes)

𝕃 admits an accumulation point iff 𝕃 is not learnable from text.

Page 16: Learning from Text


Infinite Elasticity

If a class of languages has a limit point, there exists an infinite ascending chain of languages L0 ⊊ L1 ⊊ … ⊊ Ln ⊊ …. This property is called infinite elasticity.

Page 17: Learning from Text


Infinite Elasticity

[Figure: strings x0, x1, x2, x3, …, xi and languages Xi+1, Xi+2, Xi+3, Xi+4 illustrating infinite elasticity]

Page 18: Learning from Text


Finite elasticity

𝕃 has finite elasticity if it does not have infinite elasticity

Page 19: Learning from Text


Theorem (Wright)

If 𝕃(𝔾) has finite elasticity and is mincons, then 𝔾 is learnable.

Page 20: Learning from Text


Tell-tale sets

[Figure: L(G) with tell-tale set TG = {x1, x2, x3, x4}; a language L(G') with TG ⊆ L(G') ⊊ L(G) is forbidden]

Page 21: Learning from Text


Theorem (Angluin)

𝔾 is learnable iff there is a computable partial function φ: 𝔾 × ℕ → Σ* such that:

1) ∀n∈ℕ, φ(G,n) is defined iff G ∈ 𝔾 and L(G) ≠ ∅
2) ∀G ∈ 𝔾, TG = {φ(G,n): n∈ℕ} is a finite subset of L(G), called a tell-tale subset
3) ∀G, G' ∈ 𝔾, if TG ⊆ L(G') then L(G') is not a proper subset of L(G)

Page 22: Learning from Text


Proposition (Kapur 91)

A language L in 𝕃 has a tell-tale subset iff L is not an accumulation point (for mincons classes).

Page 23: Learning from Text


Summarizing

Many alternative ways of proving that identification in the limit is not feasible
Methodological-philosophical discussion
We still need practical solutions

Page 24: Learning from Text


3 Learning k-testable languages

P. García and E. Vidal. Inference of k-testable languages in the strict sense and applications to syntactic pattern recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(9):920–925, 1990.

P. García, E. Vidal, and J. Oncina. Learning locally testable languages in the strict sense. In Workshop on Algorithmic Learning Theory (ALT 90), pages 325–338, 1990.

Page 25: Learning from Text


Definition

Let k > 0. A k-testable language in the strict sense (k-TSS) is given by a 5-tuple Zk = (Σ, I, F, T, C) with:
Σ a finite alphabet
I, F ⊆ Σ^(k-1) (allowed prefixes and suffixes of length k−1)
T ⊆ Σ^k (allowed segments)
C ⊆ Σ^(<k) (the allowed strings of length less than k)
Note that I ∩ F = C ∩ Σ^(k-1)

Page 26: Learning from Text


The k-testable language is L(Zk) = ((IΣ* ∩ Σ*F) − Σ*(Σ^k − T)Σ*) ∪ C

Strings of length at least k have to use a good prefix and a good suffix of length k−1, and all their substrings of length k have to belong to T. Strings of length less than k must be in C.

Or: Σ^k − T defines the prohibited segments.

Key idea: use a window of size k
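The set expression above translates directly into a membership test. A minimal sketch in Python (the function name is mine; the component sets are passed in explicitly):

```python
def in_k_tss(w, k, I, F, T, C):
    """Membership in L(Z_k) = ((I·Σ* ∩ Σ*·F) − Σ*(Σ^k − T)Σ*) ∪ C (sketch)."""
    if len(w) < k:
        return w in C                           # short strings must be in C
    return (w[:k - 1] in I                      # good prefix of length k-1
            and w[len(w) - (k - 1):] in F       # good suffix of length k-1
            and all(w[i:i + k] in T             # every window of size k in T
                    for i in range(len(w) - k + 1)))
```

With the 2-testable example used later in these slides (I = {a}, F = {a}, T = {aa, ab, ba}, C = {λ, a}), `in_k_tss` accepts a, aba, aaaa and rejects ab and abba.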

Page 27: Learning from Text


An example (2-testable)

I = {a}
F = {a}
T = {aa, ab, ba}
C = {λ, a}

[Figure: the corresponding automaton]

Page 28: Learning from Text


Window language

By sliding a window of size 2 over a string we can parse

ababaaababababaaaab OK aaabbaaaababab not OK
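The sliding-window parse can be sketched in a couple of lines (function name is mine; only the window condition against T is checked here):

```python
def windows_ok(w, T, k=2):
    """Slide a window of size k over w; accept iff every window is in T."""
    return all(w[i:i + k] in T for i in range(len(w) - k + 1))
```

With T = {aa, ab, ba} the first example string parses, while the second fails as soon as the window hits the prohibited segment bb.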

Page 29: Learning from Text


The hierarchy of k-TSS languages

k-TSS(Σ) = {L ⊆ Σ*: L is k-TSS}
All finite languages are in k-TSS(Σ) if k is large enough!
k-TSS(Σ) ⊊ [k+1]-TSS(Σ): (ba^k)* ∈ [k+1]-TSS(Σ) but (ba^k)* ∉ k-TSS(Σ)

Page 30: Learning from Text


A language that is not k-testable

[Figure: an automaton accepting a language that is not k-testable]

Page 31: Learning from Text


k-TSS inference

Given a sample S, L(a_{k-TSS}(S)) = L(Zk) where Zk = (Σ(S), I(S), F(S), T(S), C(S)) and:
Σ(S) is the alphabet used in S
C(S) = Σ(S)^(<k) ∩ S
I(S) = Σ(S)^(k-1) ∩ Pref(S)
F(S) = Σ(S)^(k-1) ∩ Suff(S)
T(S) = Σ(S)^k ∩ {v: uvw ∈ S}
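The inference step amounts to collecting short strings, prefixes, suffixes and windows from the sample. A sketch (function and variable names are mine):

```python
def k_tss_components(S, k):
    """Collect (Sigma(S), I(S), F(S), T(S), C(S)) from a sample S."""
    sigma = {c for w in S for c in w}                            # alphabet used in S
    C = {w for w in S if len(w) < k}                             # short sample strings
    I = {w[:k - 1] for w in S if len(w) >= k - 1}                # prefixes of length k-1
    F = {w[len(w) - (k - 1):] for w in S if len(w) >= k - 1}     # suffixes of length k-1
    T = {w[i:i + k] for w in S for i in range(len(w) - k + 1)}   # windows of size k
    return sigma, I, F, T, C
```

On S = {a, aa, abba, abbbba} with k = 3 this reproduces the component sets of the example on the next slide.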

Page 32: Learning from Text


Example

S = {a, aa, abba, abbbba}. Let k = 3.

Σ(S) = {a, b}
I(S) = {aa, ab}
F(S) = {aa, ba}
C(S) = {a, aa}
T(S) = {abb, bbb, bba}

L(a_{3-TSS}(S)) = {a, aa} ∪ {ab^m a: m ≥ 2}

Page 33: Learning from Text


Building the corresponding automaton

Each string in I ∪ C, and each prefix of such a string, is a state
Each substring of length k−1 of strings in T is a state
λ is the initial state
Add a transition labeled b from u to ub for each state ub
Add a transition labeled b from au to ub for each aub in T
Each state/substring that is in F is a final state
Each state/substring that is in C is a final state

Page 34: Learning from Text


Running the algorithm

S={a, aa, abba, abbbba}

I = {aa, ab}
F = {aa, ba}
T = {abb, bbb, bba}
C = {a, aa}

[Figure: the automaton built from these components, with states λ, a, aa, ab, bb, ba]

Page 35: Learning from Text


Properties (1)

S ⊆ L(a_{k-TSS}(S))

L(a_{k-TSS}(S)) is the smallest k-TSS language that contains S: if there were a smaller one, some prefix, suffix or substring would have to be absent.

Page 36: Learning from Text


Properties (2)

a_{k-TSS} identifies any k-TSS language in the limit from polynomial data: once all the prefixes, suffixes and substrings have been seen, the correct automaton is returned.

If Y ⊆ S, then L(a_{k-TSS}(Y)) ⊆ L(a_{k-TSS}(S))

Page 37: Learning from Text


Properties (3)

L(a_{(k+1)-TSS}(S)) ⊆ L(a_{k-TSS}(S))

In I_{k+1} (resp. F_{k+1} and T_{k+1}) there are fewer allowed prefixes (resp. suffixes and substrings) than in I_k (resp. F_k and T_k).

∀k > max_{x∈S} |x|, L(a_{k-TSS}(S)) = S, because for such a large k, T_k(S) = ∅

Page 38: Learning from Text


4 Learning k-reversible languages from text

D. Angluin. Inference of reversible languages. Journal of the Association for Computing Machinery, 29(3):741–765, 1982

Page 39: Learning from Text


The k-reversible languages

The class was proposed by Angluin (1982)
The class is identifiable in the limit from text
The class is composed of the regular languages that can be accepted by a DFA whose reversal is deterministic with a look-ahead of k

Page 40: Learning from Text


Let A = (Σ, Q, δ, I, F) be an NFA. We denote by A^T = (Σ, Q, δ^T, F, I) the reversal automaton, with:

δ^T(q, a) = {q' ∈ Q: q ∈ δ(q', a)}
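A small sketch of the reversal operation on a transition-dict NFA (the representation is mine):

```python
def reverse_nfa(delta, I, F):
    """A^T: reverse every transition and swap initial and final states.

    delta maps (state, symbol) to the set of successor states.
    """
    dT = {}
    for (q, a), succs in delta.items():
        for q2 in succs:
            dT.setdefault((q2, a), set()).add(q)   # q' ∈ δ(q,a)  ⟹  q ∈ δ^T(q',a)
    return dT, F, I
```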

Page 41: Learning from Text


[Figure: an automaton A with states 0–4 and its reversal A^T]

Page 42: Learning from Text


Some definitions

u is a k-successor of q if |u| = k and δ(q, u) ≠ ∅

u is a k-predecessor of q if |u| = k and δ^T(q, u^T) ≠ ∅

λ is a 0-successor and 0-predecessor of any state

Page 43: Learning from Text


[Figure: an automaton A with states 0–4]

aa is a 2-successor of 0 and 1 but not of 3
a is a 1-successor of 3
aa is a 2-predecessor of 3 but not of 1

Page 44: Learning from Text


An NFA is deterministic with look-ahead k if ∀q, q' ∈ Q with q ≠ q':

if (q, q' ∈ I) or (q, q' ∈ δ(q'', a) for some q'' ∈ Q, a ∈ Σ),

then no string u is a k-successor of both q and q'.

Page 45: Learning from Text


Prohibited:

[Figure: two distinct states 1 and 2 reached from the same state by the same letter a, both followed by the same string u with |u| = k]

Page 46: Learning from Text


Example

This automaton is not deterministic with look-ahead 1 but is deterministic with look-ahead 2

[Figure: an automaton with states 0–4 that is deterministic with look-ahead 2 but not with look-ahead 1]

Page 47: Learning from Text


K-reversible automata

A is k-reversible if A is deterministic and AT is deterministic with look-ahead k

Example

[Figure: a DFA whose reversal is deterministic with look-ahead 1, i.e. a 1-reversible automaton]

Page 48: Learning from Text


Notations

RL(Σ, k) is the set of all k-reversible languages over alphabet Σ

RL(Σ) is the set of all k-reversible languages over Σ (i.e. for all values of k)

a_{k-RL} is the learning algorithm we describe

Page 49: Learning from Text


Properties

There are regular languages that are not in RL(Σ)

RL(Σ, k−1) ⊊ RL(Σ, k)

Page 50: Learning from Text


Violation of k-reversibility

Two states q, q’ violate the k-reversibility condition if

they violate the deterministic condition: q,q’(q”,a)

or they violate the look-ahead condition:

q,q’F, uk: u is k-predecessor of both q and q’

uk, (q,a)=(q’,a) and u is k-predecessor of both q and q’

Page 51: Learning from Text


Learning k-reversible automata

Key idea: the order in which the merges are performed does not matter!

Just merge states that do not comply with the conditions for k-reversibility

Page 52: Learning from Text


k-RL algorithm (a_{k-RL})

Data: k ∈ ℕ, S a sample of a k-RL language L
A0 = PTA(S); π = {{q}: q ∈ Q}
While ∃B, B' ∈ π that are k-reversibility violators do
  π = π − {B} − {B'} ∪ {B ∪ B'}
A = A0/π

Page 53: Learning from Text


k-RL algorithm (a_{k-RL}), merging version

Data: k ∈ ℕ, S a sample of a k-RL language L
A = PTA(S)
While ∃q, q' that are k-reversibility violators do
  A = merge(A, q, q')

Page 54: Learning from Text


Let S = {a, aa, abba, abbbba}, k = 2

[Figure: PTA(S); the states abba and abbbba are violators — both are final and share the 2-predecessor u = ba]

Page 55: Learning from Text


S = {a, aa, abba, abbbba}, k = 2

[Figure: after merging abba and abbbba; the states abb and abbbb are now violators — they share the 2-predecessor u = bb and an a-successor]

Page 56: Learning from Text


S = {a, aa, abba, abbbba}, k = 2

[Figure: after merging abb and abbbb the automaton is 2-reversible and the algorithm stops]

Suppose k = 1: then a, aa and abba now violate.

Page 57: Learning from Text


Properties (1)

∀k ≥ 0, ∀S, L(a_{k-RL}(S)) is a k-reversible language

L(a_{k-RL}(S)) is the smallest k-reversible language that contains S

The class RL(Σ, k) is identifiable in the limit from text

Page 58: Learning from Text


Properties (2)

A regular language L is k-reversible iff for all u1, u2, v with |v| = k:

(u1v)^{-1}L ≠ ∅ and (u2v)^{-1}L ≠ ∅ ⟹ (u1v)^{-1}L = (u2v)^{-1}L

(if two prefixes of strings of L share a common suffix of length k, then they are Nerode-equivalent)

Page 59: Learning from Text


Properties (3)

L(a_{k-RL}(S)) ⊆ L(a_{(k-1)-RL}(S))

RL(Σ, k−1) ⊆ RL(Σ, k)

Page 60: Learning from Text


Properties (4)

The time complexity is O(k‖S‖^3)

The space complexity is O(‖S‖)

Page 61: Learning from Text


Properties (5): polynomial aspects

Polynomial characteristic sets
Polynomial update time
But not necessarily a polynomial number of mind changes

Page 62: Learning from Text


Extensions

Sakakibara built an extension for context-free grammars whose tree language is k-reversible

Marion & Besombes propose an extension to tree languages

Different authors propose to learn these automata and then estimate the probabilities, as an alternative to learning stochastic automata

Page 63: Learning from Text


Exercises

Build a language L that is not k-reversible for any k ≥ 0

Prove that the class of all k-reversible languages is not learnable from text

Run a_{k-RL} on S = {aa, aba, abb, abaaba, baaba} for k = 0, 1, 2, 3

Page 64: Learning from Text


Solution (idea)

Lk = {a^i: i ≤ k}

Then for each k: Lk is k-reversible but not (k−1)-reversible.

And ∪k Lk = a*

So there is an accumulation point…

Page 65: Learning from Text


5 Conclusions

Window languages

Page 66: Learning from Text


Exercise (1)

Let Jn = {w ∈ Σ*: |w| ≤ n} and J = ∪{Jn}

Find an algorithm that identifies J in the limit from text
Prove that this algorithm works in polynomial update time
Prove that it admits a polynomial locking sequence (characteristic set)
Prove that the algorithm does not meet Yokomori's conditions

Page 67: Learning from Text


Exercise (2)

Let Bn,w = {u ∈ Σ*: d_edit(u, w) ≤ n} and B = ∪{Bn,w}

Find an algorithm that identifies B in the limit from text.

Does your algorithm meet Yokomori's conditions?