![Page 1: 1 Property testing and learning on strings and trees Michel de Rougemont University Paris II & LRI Joint work with E. Fischer, Technion, F. Magniez, LRI](https://reader035.vdocuments.net/reader035/viewer/2022070305/55141ff65503466d1a8b47ec/html5/thumbnails/1.jpg)
1
Property testing and learning on strings and trees
Michel de Rougemont
University Paris II & LRI
Joint work with E. Fischer, Technion,
F. Magniez, LRI
2
Property testing
1. Testers and Correctors on a class K
2. Tester for regular words and regular trees with the Edit Distance with Moves
3. Detailed proof of a key result (u.stat captures the distance)
4. Application to learning regular properties
3
1. Testers on a class K
Let F be a property on a class K of structures U.
An ε-tester for F is a probabilistic algorithm A such that:
• if U |= F, A accepts;
• if U is ε-far from F, A rejects with high probability;
• Time(A) is independent of n.
Robust characterizations of polynomials, R. Rubinfeld, M. Sudan, 1994.
Property Testing and its Connection to Learning and Approximation, O. Goldreich, S. Goldwasser and D. Ron, 1996.
A tester usually implies a linear-time corrector.
(ε1, ε2)-tolerant tester (ε1 < ε2): accepts with high probability if U is ε1-close to F, and rejects with high probability if U is ε2-far from F.
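As a minimal illustration of the definition (a toy example, not taken from the slides), the sketch below is an ε-tester for the property "the word contains only 0s", assuming the Hamming distance instead of the edit distance with moves. It reads a number of positions that depends only on ε, never on n, always accepts words with the property, and rejects ε-far words with probability at least 2/3. The function name is hypothetical.

```python
import math
import random

def all_zeros_tester(w: str, eps: float, rng: random.Random) -> bool:
    """Toy eps-tester for the property "w contains only 0s" (Hamming distance).

    It queries q = ceil(2 / eps) random positions, a number independent of len(w).
    If w has the property, every query reads '0', so it always accepts.
    If w is eps-far (at least eps * n ones), all q queries miss the ones with
    probability at most (1 - eps) ** q <= exp(-2) < 1/3, so it rejects with
    probability at least 2/3.
    """
    q = math.ceil(2 / eps)
    return all(w[rng.randrange(len(w))] == "0" for _ in range(q))

rng = random.Random(0)
print(all_zeros_tester("0" * 100_000, 0.1, rng))  # always True
print(all_zeros_tester("01" * 50_000, 0.1, rng))  # False with high probability
```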
4
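Approximate Satisfiability and Equivalence
1. Satisfiability: T |= F
2. Approximate satisfiability: T |=_ε F, i.e. T is ε-close to some T' such that T' |= F
3. Approximate equivalence: F ≡_ε G, i.e. every structure satisfying F is ε-close to a structure satisfying G, and conversely
[Figure: the images of F and G on a class K of trees, with a tree ε-far from F.]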
5
History of Testers
Self-testers and correctors for Linear Algebra, Blum & Kannan 1989
Robust characterizations of polynomials, R. Rubinfeld, M. Sudan, 1994
Testers for graph properties: k-colorability, Goldreich et al. 1996
Regular languages have testers, Alon et al. 2000
Testers for regular tree languages, MdR and Magniez, 2004
Characterization of testable properties on graphs, Alon et al. 2005
New areas: Sublinear algorithms, Approximation of decision problems
6
2. Edit Distance with moves
1. Edit distance: insertions, deletions, substitutions.
2. Edit distance with moves: the same operations plus moves of a factor, each at unit cost. Example, one move of the block 1111:
0111000011110011001
0111011110000011001
3. The edit distance with moves generalizes to ordered trees.
$\mathrm{dist}(W, W')$; $\quad \mathrm{dist}(W, L) = \min_{W' \in L} \mathrm{dist}(W, W')$
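The move in the example above can be checked mechanically. A minimal sketch; the helper name `move` is hypothetical. It cuts a factor out of a word and pastes it elsewhere, which is the operation the edit distance with moves adds to insertions, deletions and substitutions.

```python
def move(w: str, i: int, j: int, p: int) -> str:
    """One move of the edit distance with moves: cut the factor w[i:j] out
    of w and paste it so that it starts at position p of the remaining word."""
    factor, rest = w[i:j], w[:i] + w[j:]
    return rest[:p] + factor + rest[p:]

W = "0111000011110011001"
W2 = "0111011110000011001"
# W = 01110.000.1111.0011001: cutting 1111 (positions 8 to 11) and pasting it
# at position 5, in front of 000, gives W2. Since W != W2, dist(W, W2) = 1.
assert move(W, 8, 12, 5) == W2
```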
7
Uniform Statistics
W = 001010101110 of length n; n - k + 1 blocks (subwords) of length k = 1/ε.
$u.stat(W) = \frac{1}{n-k+1}\left(\#\,00\ldots0,\ \#\,00\ldots1,\ \ldots,\ \#\,11\ldots1\right)$, a vector of dimension $2^k$.
For k = 2, n - k + 1 = 11: $u.stat(W) = \frac{1}{11}\,(1, 4, 4, 2)$.
When the words have close lengths, $\mathrm{dist}(W, W') \approx \|u.stat(W) - u.stat(W')\|$.
Distance between words:
• NP-complete;
• testable in O(1): sample N subwords of length k in each word, giving Y(W) and Y(W'); if |Y(W) - Y(W')| < ε, accept, otherwise reject.
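A minimal sketch computing u.stat exactly, assuming a binary alphabet and coordinates in lexicographic order (00...0 first, 11...1 last); `ustat` is a hypothetical helper name. It reproduces the k = 2 example above with exact fractions.

```python
from fractions import Fraction
from itertools import product

def ustat(w: str, k: int) -> list[Fraction]:
    """Uniform statistics: frequency of each binary word of length k among the
    n - k + 1 overlapping subwords of w, in lexicographic order (00..0 first)."""
    m = len(w) - k + 1
    counts = {"".join(u): 0 for u in product("01", repeat=k)}
    for i in range(m):
        counts[w[i:i + k]] += 1
    return [Fraction(c, m) for c in counts.values()]

print(ustat("001010101110", 2))  # equals (1, 4, 4, 2) / 11
```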
8
Tester for a regular language
W: 0000000000111111111111
Y: 000001000011111101111
Z: 1111111111110000000000
T: 01001010001011000111010101
[Figure: automaton A with states a and b and transitions labelled 0, 1, 1; the polytope H in the space of u.stats, with vertices (1,0,0,0) and (0,0,0,1); the points u.stat(W), u.stat(Y), u.stat(Z), and a point (0.25, 0.25, 0.25, 0.25) labelled u.stat(T).]
The automaton A defines L and a polytope H in the space of u.stats.
Tester for x ∈ L:
• testable in O(1): compute Y(W);
• if dist(Y(W), H) < ε, accept, otherwise reject.
Remark: the tester is robust to noise.
9
Pair (A,H)
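Blocks: k = 2, m = 4, |Σ| = 4, |Σ|^k + 1 = 17.
Loops of size one block: {(aa, ca: 1), (bb: 2), (cc, ac: 3), (dd: 4)}
[Figure: automaton A with states 1, 2, 3, 4 and transitions labelled a, b, c, d, next to the polytope H, whose vertices are labelled aa/ca, bb, ac/cc and dd.]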
10
Corrector of a regular language
Y: 000001000011111101111 is ε-close to L(A).
Deterministic correction:
1. Decomposition into admissible subwords: 000001000011111101111 → 000001 000111111 1111
2. Decomposition into connected components: 000001 000 111111 1111
3. Recomposition (moves): 000 000001 111111 1111, at distance 3 from Y
[Figure: the automaton A, with states a and b and transitions labelled 0, 1, 1.]
11
Corrector of an ordered tree
2 moves, dist = 2
Tree automaton or DTD: t: l,r   r: l,r
12
XML Corrector: http://www.lri.fr/~mdr/xml/
13
Applications
Testers:
• estimate the distance between two XML files;
• decide if an XML file F is ε-valid;
• decide if two DTDs are close.
Correctors: if an XML file F is ε-close to a DTD,
• find a valid F' that is ε-close to F;
• rank XML files for a set of DTDs (supervised learning).
Program verification:
• decide if two automata are ε-close in polynomial time;
• approximate model checking (specification language, model, distance): http://www.lri.fr/~mdr/vera/
14
3. Block and Uniform statistics
W = 001010101110 of length n.
• b.stat: consecutive, non-overlapping subwords of length k, n/k blocks.
• u.stat: all subwords of length k, n - k + 1 blocks.
$b.stat(W) = \frac{k}{n}\left(\#\,00\ldots0,\ \#\,00\ldots1,\ \ldots,\ \#\,11\ldots1\right)$, a vector of dimension $2^k$.
For k = 2, n/k = 6: $b.stat(W) = \frac{1}{6}(1, 0, 4, 1)$ and $u.stat(W) = \frac{1}{11}(1, 4, 4, 2)$.
Main object of study: $\|u.stat(W) - u.stat(W')\|_1$.
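A companion sketch for b.stat, assuming a binary alphabet and lexicographic coordinates; `bstat` is a hypothetical helper name. The second part illustrates why b.stat is not sound: a single move can change every block.

```python
from fractions import Fraction
from itertools import product

def bstat(w: str, k: int) -> list[Fraction]:
    """Block statistics: frequency of each binary word of length k among the
    len(w) // k consecutive, non-overlapping blocks of w (lexicographic order;
    a trailing partial block is ignored)."""
    nb = len(w) // k
    counts = {"".join(u): 0 for u in product("01", repeat=k)}
    for j in range(nb):
        counts[w[j * k:(j + 1) * k]] += 1
    return [Fraction(c, nb) for c in counts.values()]

print(bstat("001010101110", 2))  # equals (1, 0, 4, 1) / 6

# One move (the first letter sent to the end) changes every block.
w = "01" * 6
shifted = w[1:] + w[0]  # "101010101010"
print(bstat(w, 2), bstat(shifted, 2))  # all mass on "01", then all mass on "10"
```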
15
Tester for equality of strings
Edit distance with moves. NP-complete problem, but approximable in constant time with additive error.
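Uniform statistics (k = 2): W = 001010101110, $u.stat(W) = \frac{1}{11}(1, 4, 4, 2)$.
Theorem 1. $\|u.stat(w) - u.stat(w')\|$ approximates $\mathrm{dist}(w, w')$.
Sample N subwords of length k and compute $Y(w) = \frac{1}{N}\sum_{i=1}^{N} X_i$, where $X_i = (0, \ldots, 0, 1, 0, \ldots, 0)$ is the indicator vector of the i-th sampled subword; $Y(w')$ is defined in the same way.
Lemma (Chernoff). Y(w) approximates u.stat(w).
Corollary. $\|Y(w) - Y(w')\|$ approximates $\mathrm{dist}(w, w')$.
Tester: if $\|Y(w) - Y(w')\| < \varepsilon$, accept, else reject.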
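A sketch of the equality tester, under assumptions not stated in the slides: both words have the same length, the same N random start positions are used in both words (so equal words always give identical Y and are accepted, as the tolerant-tester theorem requires), and the norm is L1. The name `equality_tester` and its parameters are hypothetical.

```python
import random

def equality_tester(w1: str, w2: str, k: int, eps: float, n_samples: int, seed=None) -> bool:
    """Sampling tester: accept iff ||Y(w1) - Y(w2)||_1 < eps.

    Assumes len(w1) == len(w2) >= k; the same random start positions are
    used in both words, so w1 == w2 always yields identical estimates.
    """
    assert len(w1) == len(w2) >= k
    rng = random.Random(seed)
    y1, y2 = [0.0] * (2 ** k), [0.0] * (2 ** k)
    for _ in range(n_samples):
        i = rng.randrange(len(w1) - k + 1)
        y1[int(w1[i:i + k], 2)] += 1 / n_samples  # indicator X_i for w1
        y2[int(w2[i:i + k], 2)] += 1 / n_samples  # indicator X_i for w2
    return sum(abs(a - b) for a, b in zip(y1, y2)) < eps

w = "0" * 500 + "1" * 500
print(equality_tester(w, w, 2, 0.5, 2000, seed=3))                    # True
print(equality_tester(w, "1" * 500 + "0" * 500, 2, 0.5, 2000, seed=3))  # one move away
print(equality_tester(w, "01" * 500, 2, 0.5, 2000, seed=3))           # far
```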
16
Soundness and Robustness
Let F be a property on strings.
Soundness: ε-close strings ($\mathrm{dist}(w, w') \le \varepsilon n$) have close statistics.
Robustness: ε-far strings ($\mathrm{dist}(w, w') \ge \varepsilon n$) have far statistics.
Here F is equality on pairs of strings. For Theorem 1, we prove:
1. b.stat is robust;
2. u.stat is sound;
3. u.stat is robust.
17
Robustness of b.stat
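Robustness of b.stat: $\mathrm{dist}(w, w') \le \left(\tfrac{1}{2}\|b.stat(w) - b.stat(w')\|_1 + \varepsilon\right) n$.
• If $b.stat(w) = b.stat(w')$, then $\mathrm{dist}(w, w') \le \varepsilon n$: w and w' have the same blocks, and rearranging the n/k = εn blocks takes at most εn moves.
• If $b.stat(w) \ne b.stat(w')$, construct w'' such that $b.stat(w'') = b.stat(w')$ after at most $\tfrac{1}{2}\|b.stat(w) - b.stat(w')\|_1\, n$ substitutions on w.
Example: $b.stat(W) = \frac{1}{6}(1, 0, 4, 1)$ and $b.stat(W') = \frac{1}{6}(2, 0, 3, 1)$: #"00" is 1 in W and 2 in W', but #"10" is 4 in W and 3 in W'. To get W'', take one block "10" in W and change it into "00".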
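The construction of w'' can be sketched as follows, assuming equal lengths that are multiples of k; `match_block_stats` is a hypothetical name, and the test word W' = 001010001110 is a made-up word with b.stat(W') = (2, 0, 3, 1)/6. Surplus blocks of w are rewritten into the blocks w lacks, at most k substitutions each; once the block multisets agree, at most n/k moves rearrange w'' into w'.

```python
from collections import Counter

def blocks(w: str, k: int) -> list[str]:
    """The len(w) // k consecutive, non-overlapping blocks of length k of w."""
    return [w[i:i + k] for i in range(0, len(w) - len(w) % k, k)]

def match_block_stats(w: str, w2: str, k: int) -> tuple[str, int]:
    """Rewrite blocks of w so that b.stat(w'') = b.stat(w2).

    Returns (w'', number of letter substitutions). Assumes len(w) == len(w2),
    a multiple of k. Exactly (n/k) * ||b.stat(w) - b.stat(w2)||_1 / 2 blocks
    are rewritten, each costing at most k substitutions.
    """
    have, want = Counter(blocks(w, k)), Counter(blocks(w2, k))
    surplus = have - want                     # blocks w has too many of
    missing = list((want - have).elements())  # blocks w needs more of
    out, subs = [], 0
    for b in blocks(w, k):
        if surplus[b] > 0:
            surplus[b] -= 1
            new = missing.pop()
            subs += sum(x != y for x, y in zip(b, new))
            out.append(new)
        else:
            out.append(b)
    return "".join(out), subs

print(match_block_stats("001010101110", "001010001110", 2))  # ('000010101110', 1)
```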
18
Soundness of u.stat
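Soundness of u.stat: $\mathrm{dist}(w, w') \le \varepsilon^2 n \Rightarrow \|u.stat(w) - u.stat(w')\|_1 \le 6\varepsilon$.
Simple edit: $\|u.stat(w) - u.stat(w')\|_1 \le \frac{2k}{n-k+1}$.
Move w = A.B.C.D, w' = A.C.B.D: only the subwords crossing one of the 3 cut points change, at most 3(k - 1) of them, so $\|u.stat(w) - u.stat(w')\|_1 \le \frac{2 \cdot 3(k-1)}{n-k+1}$.
Hence, for ε²n operations, with k = 1/ε and n large enough: $\|u.stat(w) - u.stat(w')\|_1 \le 6\varepsilon$.
Remark: b.stat is not sound.
Problem: is u.stat robust? This is harder: we need an auxiliary distribution and two key lemmas.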
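A quick empirical check (not a proof) of the two bounds on random binary words: a substitution changes at most k subwords, and a move changes at most 3(k - 1) of them, each change moving mass 1/(n - k + 1) out of one coordinate and into another. The parameters and helper names are arbitrary choices for illustration.

```python
import random
from fractions import Fraction
from itertools import product

def ustat(w: str, k: int) -> list[Fraction]:
    m = len(w) - k + 1
    counts = {"".join(u): 0 for u in product("01", repeat=k)}
    for i in range(m):
        counts[w[i:i + k]] += 1
    return [Fraction(c, m) for c in counts.values()]

def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def random_move(w: str, rng: random.Random) -> str:
    """One random move: w = A.B.C.D becomes A.C.B.D."""
    i, j, l = sorted(rng.sample(range(len(w) + 1), 3))
    return w[:i] + w[j:l] + w[i:j] + w[l:]

def random_substitution(w: str, rng: random.Random) -> str:
    i = rng.randrange(len(w))
    return w[:i] + ("1" if w[i] == "0" else "0") + w[i + 1:]

def check_bounds(n: int = 200, k: int = 3, trials: int = 200, seed: int = 0) -> bool:
    rng = random.Random(seed)
    m = n - k + 1
    for _ in range(trials):
        w = "".join(rng.choice("01") for _ in range(n))
        assert l1(ustat(w, k), ustat(random_substitution(w, rng), k)) <= Fraction(2 * k, m)
        assert l1(ustat(w, k), ustat(random_move(w, rng), k)) <= Fraction(6 * (k - 1), m)
    return True

print(check_bounds())
```

For the two words of the edit-distance-with-moves example, which differ by one move, the u.stats for k = 2 even coincide.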
19
Statistics on words
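[Figure: w cut into consecutive words v_1, ..., v_i, ... of length K, each cut into blocks of length k after an offset t; labels k, K, t, k - t.]
Block statistics: b.stat
Uniform statistics: u.stat
Block uniform statistics: bu.stat, with $X_1 = b.stat(v_1), \ldots, X_i = b.stat(v_i), \ldots$:
$bu.stat(w) = \frac{K}{n}\sum_{i=1}^{n/K} \mathbb{E}\big(b.stat(v_i)\big)$, where $K = c\,k^2$ and the expectation is over the random offset t.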
20
Uniform Statistics
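Number of subwords of length k missed by bu: $(k-1)(n/K - 1) = |B \setminus A|$.
Lemma: let $A \subseteq B$, and let $\mu_A$, $\mu_B$ be the uniform distributions on A and B. Then $\|\mu_A - \mu_B\|_1 \le \frac{2\,|B \setminus A|}{|B|}$.
Apply the previous lemma with $|B| = n - k + 1$:
$\|u.stat(w) - bu.stat(w)\|_1 \le \frac{2(k-1)(n/K - 1)}{n-k+1} = O(k/K)$, which is $O(\varepsilon/c)$ since $K = c\,k^2$ and $k = 1/\varepsilon$.
Lemma 2: for all w, $\|bu.stat(w) - u.stat(w)\|_1 \le \varepsilon/4$ (for c large enough).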
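For finite, non-empty $A \subseteq B$, the bound of the lemma is in fact an equality; the computation takes one line:

$$\|\mu_A - \mu_B\|_1 = \sum_{x \in A}\left(\frac{1}{|A|} - \frac{1}{|B|}\right) + \sum_{x \in B \setminus A} \frac{1}{|B|} = 1 - \frac{|A|}{|B|} + \frac{|B \setminus A|}{|B|} = \frac{2\,|B \setminus A|}{|B|}.$$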
21
Block Uniform Statistics
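$bu.stat(w) = \frac{K}{n}\sum_{i=1}^{n/K} \mathbb{E}\big(b.stat(v_i)\big)$
$X_i = b.stat(v_i)$, $X_i[u] = b.stat(v_i)[u]$, $0 \le X_i[u] \le 1$.
The $X_i[u]$ are independent, and the average over i of their expectations is $bu.stat(w)[u]$.
Chernoff bound: $\Pr\big[\,|b.stat(v)[u] - bu.stat(w)[u]| \ge t\,\big] \le 2e^{-2t^2 n/K}$.
Union bound over the $2^k$ words u: $\Pr\big[\,\|b.stat(v) - bu.stat(w)\|_1 \ge 2^k t\,\big] \le 2^{k+1} e^{-2t^2 n/K}$.
For k and n large enough: $\Pr\big[\,\|b.stat(v) - bu.stat(w)\|_1 \le \varepsilon/2\,\big] > 0$.
Lemma 1: there exists a word $v_w$, close to w, such that $\|bu.stat(w) - b.stat(v_w)\|_1 \le \varepsilon/2$.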
22
Robustness of the uniform Statistics
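Robustness of u.stat: $\mathrm{dist}(w, w') \ge 5\varepsilon n \Rightarrow \|u.stat(w) - u.stat(w')\|_1 \ge 6.5\varepsilon$.
By Lemma 1: $\|bu.stat(w) - b.stat(v_w)\|_1 \le \varepsilon/2$.
By Lemma 2: $\forall w,\ \|bu.stat(w) - u.stat(w)\|_1 \le \varepsilon/4$.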
Get v, v' close to w, w'.
Robustness of b.stat implies robustness of u.stat.
Tolerant tester:
Theorem: for two words w and w' large enough, the tester:
1. accepts if w = w', with probability 1;
2. accepts if w, w' are ε²-close, with probability at least 2/3;
3. rejects if w, w' are ε-far, with probability at least 2/3.
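Robustness of b.stat: $\mathrm{dist}(w, w') \le \left(\tfrac{1}{2}\|b.stat(w) - b.stat(w')\|_1 + \varepsilon\right) n$; hence $\mathrm{dist}(w, w') \ge 5\varepsilon n$ implies $\|b.stat(w) - b.stat(w')\|_1 \ge 8\varepsilon$.
Tester: sample N subwords in each word, with N independent of n, and accept iff $\|Y(w) - Y(w')\|_1$ is below a threshold between $6\varepsilon$ and $6.5\varepsilon$.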
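The constant 6.5ε can be recovered by the following chain. This is a reconstruction assuming that Lemma 1 gives $\|bu.stat(w) - b.stat(v_w)\|_1 \le \varepsilon/2$, that Lemma 2 gives $\|bu.stat(w) - u.stat(w)\|_1 \le \varepsilon/4$, and that $\mathrm{dist}(w, v_w)$ and $\mathrm{dist}(w', v_{w'})$ are negligible compared with $\varepsilon n$:

$$5\varepsilon n \le \mathrm{dist}(v_w, v_{w'}) \le \Big(\tfrac{1}{2}\|b.stat(v_w) - b.stat(v_{w'})\|_1 + \varepsilon\Big) n \;\Rightarrow\; \|b.stat(v_w) - b.stat(v_{w'})\|_1 \ge 8\varepsilon,$$

$$\|u.stat(w) - u.stat(w')\|_1 \ge \|b.stat(v_w) - b.stat(v_{w'})\|_1 - 2\Big(\frac{\varepsilon}{2} + \frac{\varepsilon}{4}\Big) \ge 8\varepsilon - 1.5\varepsilon = 6.5\varepsilon,$$

where the second line is the triangle inequality through $bu.stat(w)$ and $bu.stat(w')$. Since soundness gives at most $6\varepsilon$ for $\varepsilon^2 n$-close words, the two cases are separated by the gap between $6\varepsilon$ and $6.5\varepsilon$, once the sampling error of Y is accounted for.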
23
Membership and Equivalence tester
Membership tester for w in L (regular):
1. Construction of the tester: precompute H_ε.
2. Tester: compute Y(w) (an approximation of b.stat(w)); accept iff Y(w) is at distance less than ε from H_ε.
Construction: time $m^{O(k)}$. Tester: query complexity $2^{O(k)}$, time complexity $m^{O(k)}$.
Remark 1: the time complexity of previous testers was exponential in m.
Remark 2: the same method works for context-free L.
Tester for A ≡_ε B:
1. Compute H_{ε,A} and H_{ε,B}.
2. Reject if H_{ε,A} and H_{ε,B} are different.
Time polynomial in m = max(|A|, |B|): $m^{O(k)}$.
24
4. Application to learning
Model: take random words according to a distribution D.
Representation by u.stat.
Negative examples could include their distance to L.
Learning algorithm: convex hull of the positive examples.
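A sketch of the convex-hull learner, under stated assumptions: binary words, k = 2, Euclidean distance in u.stat space, and a tolerance `tol` for "close to the hull". The slides do not say how hull membership is decided; the sketch swaps in the Frank-Wolfe algorithm with exact line search, which returns an upper bound on the distance to the hull. All names are hypothetical.

```python
from itertools import product

def ustat(w: str, k: int) -> list[float]:
    """u.stat of a binary word, coordinates in lexicographic order."""
    m = len(w) - k + 1
    index = {"".join(u): i for i, u in enumerate(product("01", repeat=k))}
    v = [0.0] * len(index)
    for i in range(m):
        v[index[w[i:i + k]]] += 1 / m
    return v

def dist_to_hull(y, points, iters=500):
    """Upper bound on the Euclidean distance from y to conv(points),
    computed with Frank-Wolfe and exact line search."""
    x = list(points[0])
    for _ in range(iters):
        g = [a - b for a, b in zip(x, y)]  # x - y, half the gradient of ||x - y||^2
        v = min(points, key=lambda p: sum(gi * pi for gi, pi in zip(g, p)))
        d = [a - b for a, b in zip(x, v)]  # x - v
        dd = sum(t * t for t in d)
        if dd == 0:  # the best vertex is x itself, so x is optimal
            break
        gamma = max(0.0, min(1.0, sum(gi * di for gi, di in zip(g, d)) / dd))
        x = [a + gamma * (b - a) for a, b in zip(x, v)]
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

class HullLearner:
    """Hypothetical learner: the concept is the convex hull of the u.stats of
    the positive examples; a word is predicted positive if its u.stat lies
    within tol of that hull."""

    def __init__(self, k: int, tol: float):
        self.k, self.tol, self.points = k, tol, []

    def fit(self, positives):
        self.points = [ustat(w, self.k) for w in positives]
        return self

    def predict(self, w: str) -> bool:
        return dist_to_hull(ustat(w, self.k), self.points) <= self.tol

learner = HullLearner(k=2, tol=0.05).fit(["0" * 5 + "1" * 35, "0" * 35 + "1" * 5])
print(learner.predict("0" * 20 + "1" * 20))  # True: midpoint of the two examples
print(learner.predict("01" * 20))            # False: about 0.97 from the hull
```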
25
PAC learning
In the space of u.stats, a regular language is a polytope.
Polytopes with a bounded number of facets have finite VC dimension; hence they are PAC-learnable.
Problem: the learnt concept may be ε-far from the language L.
For special distributions D, it may be ε-close. Example: D is uniform and the polytopes are "large".
26
Conclusion
1. Tester for the Edit Distance with Moves
2. Tester for membership in a regular set
3. Equivalence tester for automata
• Polynomial-time approximate algorithm (the exact problem is PSPACE-complete)
• Generalization to Büchi automata: approximate model checking
• Context-free languages: exponential algorithm (the exact problem is undecidable)
4. PAC learning versus dist-learning
27
Generalizations
Büchi automata. Distance on infinite words: two words w and w' are ε-close if $\limsup_{n} \frac{\mathrm{dist}(w(n), w'(n))}{n} \le \varepsilon$, where w(n) is the prefix of length n of w.
A word w is ε-close to a language L if there exists w' in L such that w and w' are ε-close.
Statistics: the set of accumulation points of $b.stat(w(n))$.
H: compatible loops of connected components of accepting states.
Tester for Büchi automata: compute H_A and H_B; reject if H_A and H_B are different.
Equivalence of CF grammars is undecidable; approximate equivalence is in exponential time.