An Index of Data Size to Extract Decomposable Structures in LAD
Hirotaka Ono
Mutsunori Yagiura
Toshihide Ibaraki
(Kyoto University)
Overview
1. Overview of LAD
2. Decomposability
   - Importance & motivation
3. An index of decomposability
   - #data vectors needed to extract reliable decomposable structures
   - Based on probabilistic analyses
4. Numerical experiments
5. Conclusion
Logical Analysis of Data (LAD)
For a phenomenon:
Input: $T, F \subseteq \{0, 1\}^n$
Output: discriminant function
$$f(x) = \begin{cases} 1 & \text{for } x \in T \\ 0 & \text{for } x \in F \end{cases}$$
T: positive examples (the phenomenon occurs)
F: negative examples (the phenomenon does not occur)
f(x): a logical explanation of the phenomenon
Example: influenza
1 = Yes, 0 = No

| | Fever ($x_1$) | Headache ($x_2$) | Cough ($x_3$) | Snivel ($x_4$) | Stomachache ($x_5$) |
|---|---|---|---|---|---|
| T | 1 | 1 | 0 | 1 | 1 |
| T | 1 | 0 | 1 | 1 | 1 |
| T | 1 | 1 | 1 | 1 | 0 |
| F | 1 | 0 | 0 | 1 | 1 |
| F | 1 | 1 | 0 | 0 | 0 |
| F | 0 | 1 | 0 | 1 | 1 |

T: Set of patients having influenza
F: Set of patients having common cold
An example of discriminant functions: $f(x) = x_1 x_2 x_4 \vee x_1 x_3 x_4$
Discriminant function f(x) represents knowledge “influenza”.
One kind of knowledge acquisition
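The influenza example can be checked mechanically. The following is a minimal sketch, assuming the first three rows of the table form T and the last three form F, and taking the discriminant as $f(x) = x_1 x_2 x_4 \vee x_1 x_3 x_4$; the helper name `is_discriminant` is hypothetical and not from the slides.

```python
# Rows of the influenza table: (fever, headache, cough, snivel, stomachache).
# Assumption: the first three rows are T (influenza), the last three are F (common cold).
T = [(1, 1, 0, 1, 1), (1, 0, 1, 1, 1), (1, 1, 1, 1, 0)]
F = [(1, 0, 0, 1, 1), (1, 1, 0, 0, 0), (0, 1, 0, 1, 1)]


def f(x):
    """Discriminant f(x) = x1 x2 x4 OR x1 x3 x4 (attributes are 1-indexed on the slide)."""
    x1, x2, x3, x4, _x5 = x
    return int((x1 and x2 and x4) or (x1 and x3 and x4))


def is_discriminant(func, positives, negatives):
    """True if func is 1 on every positive example and 0 on every negative one."""
    return (all(func(x) == 1 for x in positives)
            and all(func(x) == 0 for x in negatives))


print(is_discriminant(f, T, F))  # → True
```

Any function that passes this check is a valid discriminant for (T, F); the slides then ask which of the many valid functions is simple and structurally meaningful.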
Guideline to find a discriminant function
• Simplicity
• Explain the structure of the phenomenon
Decomposability

| | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $h(x[S_1])$ |
|---|---|---|---|---|---|---|
| T | 1 | 1 | 0 | 1 | 1 | 1 |
| T | 1 | 0 | 1 | 1 | 1 | 1 |
| T | 1 | 1 | 1 | 1 | 0 | 1 |
| F | 1 | 0 | 0 | 1 | 1 | 0 |
| F | 1 | 1 | 0 | 0 | 0 | 1 |
| F | 0 | 1 | 0 | 1 | 1 | 1 |

$S_0 = \{1, 4, 5\}$, $S_1 = \{2, 3\}$
$h(x[S_1]) = x_2 \vee x_3$
$f(x) = x_1 x_2 x_4 \vee x_1 x_3 x_4 = x_1 x_4 \, h(x[S_1])$: decomposable!
$f$ is decomposable $\Leftrightarrow$ $f(x) = g(x[S_0], h(x[S_1]))$
$(T, F)$ is decomposable $\Leftrightarrow$ $\exists$ decomposable discriminant $f$
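The decomposition claimed for the influenza discriminant can be verified by exhaustive enumeration. This is a small sketch under the slide's definitions ($S_0 = \{1, 4, 5\}$, $S_1 = \{2, 3\}$, $h = x_2 \vee x_3$, $g = x_1 x_4 h$); the helper `project` is a hypothetical name introduced only for illustration.

```python
from itertools import product

S0 = (1, 4, 5)  # attribute indices, 1-based as on the slide
S1 = (2, 3)


def project(x, S):
    """Restrict vector x to the 1-based attribute indices in S."""
    return tuple(x[i - 1] for i in S)


def h(xs1):
    """Sub-concept h(x[S1]) = x2 OR x3."""
    return int(any(xs1))


def g(xs0, hv):
    """Outer function g(x[S0], h) = x1 x4 h (x5 is unused)."""
    x1, x4, _x5 = xs0
    return x1 & x4 & hv


def f(x):
    """Original discriminant f(x) = x1 x2 x4 OR x1 x3 x4."""
    x1, x2, x3, x4, _x5 = x
    return (x1 & x2 & x4) | (x1 & x3 & x4)


# Compare both forms on all 2^5 input vectors.
same = all(f(x) == g(project(x, S0), h(project(x, S1)))
           for x in product((0, 1), repeat=5))
print(same)  # → True
```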
Example: concept of “square”

| | $x_1$ | $x_2$ | $x_3$ | $x_4$ |
|---|---|---|---|---|
| i | 1 | 1 | 1 | 0 |
| ii | 1 | 1 | 1 | 1 |
| iii | 0 | 1 | 1 | 0 |
| iv | 1 | 0 | 0 | 1 |
| v | 1 | 1 | 0 | 1 |

$x_1$: the lengths of all edges are equal
$x_2$: the number of vertices is 4
$x_3$: contains a right angle
$x_4$: the area is over 100
T = {i, ii}, F = {iii, iv, v}
Example: concept of “square”
Without a sub-concept:
Square
- the lengths of all edges are equal
- the number of vertices is 4
- contains a right angle
With the sub-concept “rhombus”:
Square
- contains a right angle
- rhombus
  - the lengths of all edges are equal
  - the number of vertices is 4
Hierarchical structures and decomposable structures
[Diagram: a concept $f(x)$ connected directly to all of its attributes]
[Diagram: a concept $f(x) = g(x[S_0], h(x[S_1]))$ connected to the attributes in $S_0$ and to a sub-concept $h(x[S_1])$, which is in turn connected to the attributes in $S_1$]
Previous research on decomposability
• Finding basic decomposable functions (e.g., $g(x[S_0], h(x[S_1]))$) for given $(T, F)$ and attribute sets
  • $g(x[S_0], h(x[S_1]))$ case: polynomial time [Boros, et al. 1994]
• Finding other classes (positive, Horn, and their mixtures) of decomposable functions for $(T, F)$ and attribute set [Makino, et al. 1995]
• Finding a (positive) decomposable function $g(x[S_0], h(x[S_1]))$ for given $(T, F)$ ($(S_0, S_1)$ is not given)
  • NP-hard
  • proposing a heuristic algorithm [Ono, et al. 1999]
The number of data and decomposable structures
• Case 1: The size of given data is small.
  – Advantage: Less computational time is needed to find a decomposable structure.
  – Disadvantage: Decomposable structures easily exist in data (because of fewer constraints) = Most decomposable structures are deceptive.
• Case 2: The size of given data is large.
  – Advantage: Deceptive decomposable structures will not be found.
  – Disadvantage: More computational time is needed.
How many data vectors should be prepared to extract real decomposable structures?
Index of decomposability
Overview of our approach
$(T, F)$ is decomposable $\Leftrightarrow$ conflict graph of $(T, F)$ is bipartite
Assume that $(T, F)$ is the set of $l$ randomly chosen vectors from $\{0, 1\}^n$.
1. Compute the probability of an edge to appear in the conflict graph
2. Regard the conflict graph as a random graph
   Investigate the probability of the conflict graph to be non-bipartite
Conflict graph

| | $x[S_0]$ | $x[S_1]$ |
|---|---|---|
| T | 1 0 0 | 1 1 |
| T | 0 1 0 | 1 0 |
| T | 1 0 0 | 1 0 |
| F | 1 0 0 | 0 1 |
| F | 0 1 0 | 1 1 |
| F | 0 1 0 | 0 1 |

Vertices of the conflict graph: the assignments 00, 01, 10, 11 to $x[S_1]$. Two vertices are joined by an edge when $h(x[S_1])$ must take different values on them.
Suppose $h(11) = 1$ $\Rightarrow$ $h(01) = 0$ $\Rightarrow$ $h(10) = 1$ $\Rightarrow$ $h(11) = 0$, a contradiction: the vertices 11, 01, 10 form an odd cycle.
$(T, F)$ is decomposable $\Leftrightarrow$ conflict graph of $(T, F)$ is bipartite
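The bipartiteness test can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes the first three vectors of the slide's example are T and the last three are F, that $S_0$ is the first three positions and $S_1$ the last two (0-based indices in the code), and that T and F are disjoint. The function names are hypothetical.

```python
from collections import deque
from itertools import product


def project(x, S):
    """Restrict vector x to the 0-based attribute positions in S."""
    return tuple(x[i] for i in S)


def conflict_graph(T, F, S0, S1):
    """Edges {a, b}: some y on S0 has ya and yb in opposite classes (a linked pair)."""
    edges = set()
    for t, f in product(T, F):
        if project(t, S0) == project(f, S0):
            # T and F are assumed disjoint, so the two S1-parts differ here.
            edges.add(frozenset((project(t, S1), project(f, S1))))
    return edges


def is_bipartite(vertices, edges):
    """Breadth-first 2-coloring; returns False as soon as an odd cycle is detected."""
    adj = {v: set() for v in vertices}
    for edge in edges:
        a, b = tuple(edge)
        adj[a].add(b)
        adj[b].add(a)
    color = {}
    for start in vertices:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False
    return True


T = [(1, 0, 0, 1, 1), (0, 1, 0, 1, 0), (1, 0, 0, 1, 0)]
F = [(1, 0, 0, 0, 1), (0, 1, 0, 1, 1), (0, 1, 0, 0, 1)]
S0, S1 = (0, 1, 2), (3, 4)
vertices = list(product((0, 1), repeat=len(S1)))
print(is_bipartite(vertices, conflict_graph(T, F, S0, S1)))  # → False
```

Under these assumptions the example contains the triangle 11, 01, 10, so it is not decomposable with respect to this $(S_0, S_1)$.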
Probability of an edge to appear in conflict graph
Let $y \in \{0, 1\}^{S_0}$ and $a, b \in \{0, 1\}^{S_1}$.
A pair of vectors $(ya, yb)$ is called linked if
$ya \in T$ and $yb \in F$, or $ya \in F$ and $yb \in T$.
Edge $e = (a, b)$ appears in the conflict graph $\Leftrightarrow$ there exists a linked pair $(ya, yb)$.
Define a random variable $X_e$ by
$$X_e = \sum_{y \in \{0,1\}^{S_0}} X_{ey},$$
where
$$X_{ey} = \begin{cases} 1 & (ya, yb) \text{ is linked,} \\ 0 & \text{otherwise.} \end{cases}$$
Edge $e = (a, b)$ appears in the conflict graph $\Leftrightarrow$ there exists a linked pair $(ya, yb)$, so
$X_e \ge 1$ $\Leftrightarrow$ edge $e$ appears in the conflict graph.
We want to compute
$$\Pr(X_e \ge 1) = \Pr\Bigl(\sum_{y \in \{0,1\}^{S_0}} X_{ey} \ge 1\Bigr).$$
Assumptions
• Generation of $(T, F)$
  - $|T| + |F| = l$ vectors are randomly sampled from $\{0, 1\}^n$ without replacement.
  - A sampled vector is in $T$ with probability $p$, and in $F$ with probability $q = 1 - p$.
• $M = 2^n$
• $m = 2^{|S_0|}$
How to compute $\Pr(X_e \ge 1) = \Pr\bigl(\sum_{y \in \{0,1\}^{S_0}} X_{ey} \ge 1\bigr)$
$\Pr(X_{ey} = 1)$ is easier to compute.
$X_{ey} = 1$ $(e = (a, b))$ $\Leftrightarrow$
1. Both of $ya$ and $yb$ are chosen in $T \cup F$.
2. They have different values (i.e., 0 and 1).
$$\Pr(X_{ey} = 1) = \underbrace{2pq}_{2.} \cdot \underbrace{\frac{l(l-1)}{M(M-1)}}_{1.}$$
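The two factors of $\Pr(X_{ey} = 1)$ can be checked with exact arithmetic. The sketch below assumes the sampling model stated on the Assumptions slide (sampling without replacement, independent labelling with probabilities $p$ and $q = 1 - p$); the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def pr_both_sampled(l, M):
    """Probability that two fixed vectors are both among l vectors drawn
    without replacement from M, counted via subsets (requires 2 <= l <= M)."""
    return Fraction(comb(M - 2, l - 2), comb(M, l))


def pr_linked(l, M, p):
    """Pr(X_ey = 1) = 2pq * l(l-1) / (M(M-1)), with q = 1 - p."""
    q = 1 - p
    return 2 * p * q * Fraction(l * (l - 1), M * (M - 1))


# The subset count agrees with the closed form l(l-1) / (M(M-1)).
print(pr_both_sampled(4, 8) == Fraction(4 * 3, 8 * 7))  # → True
print(pr_linked(4, 8, Fraction(1, 2)))                  # → 3/28
```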
Upper and lower bounds on $\Pr(X_e \ge 1)$
$$\Pr(X_{ey} = 1) = 2pq \, \frac{l(l-1)}{M(M-1)}$$
Upper bound: by Markov’s inequality and linearity of expectation,
$$\Pr(X_e \ge 1) \le \mathrm{Ex}(X_e) = \mathrm{Ex}\Bigl(\sum_{y \in \{0,1\}^{S_0}} X_{ey}\Bigr) = \sum_{y \in \{0,1\}^{S_0}} \mathrm{Ex}(X_{ey}) = m \, \mathrm{Ex}(X_{ey}) = 2mpq \, \frac{l(l-1)}{M(M-1)}$$
Lower bound: by the principle of inclusion and exclusion,
$$\Pr(X_e \ge 1) \ge \sum_{y \in \{0,1\}^{S_0}} \Pr(X_{ey} = 1) - \sum_{y \ne y' \in \{0,1\}^{S_0}} \Pr(X_{ey} = X_{ey'} = 1)$$
Approximation of $\Pr(X_e \ge 1)$
$$\Pr(X_e \ge 1) \approx 2mpq \, \frac{l(l-1)}{M(M-1)} \approx 2mpq \, \frac{l^2}{M^2}$$
holds when the right-hand side is small.
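To get a feel for how close the Markov upper bound and the simplified approximation are, both can be evaluated numerically. This is a rough sketch; the parameter choice ($n = 10$, $|S_0| = 5$, $l = 45$, $p = 0.5$) is an arbitrary illustration and not taken from the slides, and the function names are hypothetical.

```python
def edge_prob_upper(l, n, s0, p):
    """Markov upper bound 2 m p q l(l-1) / (M(M-1)), with M = 2^n and m = 2^|S0|."""
    M, m, q = 2 ** n, 2 ** s0, 1 - p
    return 2 * m * p * q * l * (l - 1) / (M * (M - 1))


def edge_prob_approx(l, n, s0, p):
    """Simplified approximation 2 m p q l^2 / M^2."""
    M, m, q = 2 ** n, 2 ** s0, 1 - p
    return 2 * m * p * q * l ** 2 / M ** 2


upper = edge_prob_upper(45, 10, 5, 0.5)
approx = edge_prob_approx(45, 10, 5, 0.5)
print(upper, approx, abs(upper - approx) / approx)
```

For these values the two expressions differ by a few percent, mainly because $l(l-1)$ is replaced by $l^2$.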
Random graph
Random graph $G(N, r)$
- $N$: the number of vertices
- Each edge $e = (u, v)$ appears in $G(N, r)$ with probability $r$ independently
[Figure: example random graphs for several values of $r$]
In our analysis, $r$ is assumed to be the probability of an edge to appear in the conflict graph.
Probability of a random graph to be non-bipartite
$Y_{\mathrm{odd}}$: random variable representing the number of odd cycles in $G(N, r)$
$\Pr(Y_{\mathrm{odd}} \ge 1)$: probability that $G(N, r)$ is not bipartite
By Markov’s inequality,
$$\Pr(Y_{\mathrm{odd}} \ge 1) \le \mathrm{Ex}(Y_{\mathrm{odd}}) = \sum_{k = 3,\ k:\,\mathrm{odd}}^{N} \frac{(N)_k}{2k} \, r^k$$
$(N)_k = N(N-1)\cdots(N-k+1)$: the number of sequences of $k$ vertices
When does $\mathrm{Ex}(Y_{\mathrm{odd}}) < 1$ hold?
$$\mathrm{Ex}(Y_{\mathrm{odd}}) = \sum_{k = 3,\ k:\,\mathrm{odd}}^{N} \frac{(N)_k}{2k} \, r^k \le \sum_{k = 3,\ k:\,\mathrm{odd}}^{\infty} \frac{(Nr)^k}{2k} = \frac{1}{2}\left(\frac{1}{2} \ln\frac{1+z}{1-z} - z\right) = U(z)$$
where $z = Nr$ $(0 \le z < 1)$, using the Taylor series of $\ln(1 \pm z)$.
Upper bound: $Nr \le 0.9950 \Rightarrow \mathrm{Ex}(Y_{\mathrm{odd}}) \le U(z) < 1$
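The constant 0.9950 can be reproduced numerically from the closed form $U(z) = \frac{1}{2}\bigl(\frac{1}{2}\ln\frac{1+z}{1-z} - z\bigr)$. The following is a sketch only: the bisection routine and the function names are illustrative choices, not something specified on the slides.

```python
import math


def expected_odd_cycles(N, r):
    """Ex(Y_odd) = sum over odd k in [3, N] of (N)_k / (2k) * r**k."""
    total, term = 0.0, 1.0  # term accumulates (N)_k * r**k incrementally
    for k in range(1, N + 1):
        term *= (N - k + 1) * r
        if k >= 3 and k % 2 == 1:
            total += term / (2 * k)
    return total


def U(z):
    """Closed-form upper bound, valid for 0 <= z < 1."""
    return 0.5 * (0.5 * math.log((1 + z) / (1 - z)) - z)


def threshold(lo=0.0, hi=1.0 - 1e-12, iters=100):
    """Bisection for the z with U(z) = 1 (U is increasing on [0, 1))."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if U(mid) < 1:
            lo = mid
        else:
            hi = mid
    return lo


print(threshold())                                   # close to 0.995
print(expected_odd_cycles(100, 0.005) <= U(0.5))     # → True
```

The second line illustrates the inequality $\mathrm{Ex}(Y_{\mathrm{odd}}) \le U(Nr)$ for one arbitrary choice, $N = 100$ and $r = 0.005$.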
When does $\mathrm{Ex}(Y_{\mathrm{odd}}) < 1$ hold?
Upper bound: $Nr \le 0.9950 \Rightarrow \mathrm{Ex}(Y_{\mathrm{odd}}) < 1$
Lower bound when $Nr \ge 1$:
$$\mathrm{Ex}(Y_{\mathrm{odd}}) = \sum_{k = 3,\ k:\,\mathrm{odd}}^{N} \frac{(N)_k}{2k} \, r^k \ge c \sum_{k = 3,\ k:\,\mathrm{odd}}^{N^{0.5 - \varepsilon}} \frac{(Nr)^k}{2k} \ge \frac{c \, (0.5 - \varepsilon)}{4} \ln N + O(1)$$
($c \in [0, 1)$ and $\varepsilon \in (0, 0.5)$ are constants)
$\mathrm{Ex}(Y_{\mathrm{odd}}) \to \infty$ as $N \to \infty$ if $Nr \ge 1$
For sufficiently large $N$, $Nr \ge 1 \Rightarrow \mathrm{Ex}(Y_{\mathrm{odd}}) > 1$
Our index
Assumptions
- probabilities $p$ and $q$ are given by $p : q = |T| : |F|$
- conflict graph is a random graph
$M = 2^n$, $m = 2^{|S_0|}$, $l = |T| + |F|$ $(|S_0| + |S_1| = n)$
Probability of an edge to appear in conflict graph:
$$\Pr(X_e \ge 1) \approx 2mpq \, \frac{l^2}{M^2}$$
Threshold for a random graph to be bipartite or not: $Nr = 1$
With $N = 2^{|S_1|}$ and $r = \Pr(X_e \ge 1)$,
$$1 = Nr = 2^{|S_1|} \Pr(X_e \ge 1) \approx \frac{M}{m} \cdot 2mpq \, \frac{l^2}{M^2} = \frac{2pq \, l^2}{2^n}$$
$$l = \sqrt{2^{n-1} / pq}$$
Our index
$$|T| + |F| = \sqrt{2^{n-1} / pq} \qquad (p : q = |T| : |F|,\ p + q = 1)$$
• If $|T| + |F| \ll \sqrt{2^{n-1} / pq}$, $(T, F)$ tends to have many deceptive decomposable structures.
• If $|T| + |F| \gg \sqrt{2^{n-1} / pq}$, $(T, F)$ tends to have no deceptive decomposable structure.
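The index is straightforward to evaluate. As a quick sketch, the function below (a hypothetical helper, not from the slides) computes $\sqrt{2^{n-1}/pq}$ for the $(n, p)$ settings that appear in the experiments.

```python
import math


def decomposability_index(n, p):
    """Data-size index sqrt(2**(n-1) / (p*q)) with q = 1 - p."""
    q = 1.0 - p
    return math.sqrt(2 ** (n - 1) / (p * q))


# (n, p) pairs used in the slides' experiments; q = 1 - p in each case.
for n, p in [(10, 0.5), (10, 0.9), (20, 0.5), (11, 0.730)]:
    print(f"n={n}, p={p}: index is about {decomposability_index(n, p):.1f} vectors")
```

The index grows like $2^{n/2}$, so it is far smaller than the $2^n$ possible vectors, and it increases as $p$ and $q$ become more unbalanced because $pq$ shrinks.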
Numerical Experiments
1. Prepare non-decomposable randomly generated functions and construct 10 $(T, F)$ for each data size ($|T| + |F|$)
2. Check their decomposability
Randomly generated data
- Target functions are not decomposable
- Dimensions of data are $n = 10, 20$
- Two types of data: $p$ and $q$ are biased and not biased
Randomly generated data
$n = 10$, $(p, q) = (0.5, 0.5)$
[Figure: ratio of decomposable (T, F)s (%) versus sampling ratio (%), with our index marked]
Randomly generated data
[Figure, two panels: $n = 10$, $(p, q) = (0.9, 0.1)$ and $n = 20$, $(p, q) = (0.5, 0.5)$; each plots the ratio of decomposable (T, F)s (%) versus sampling ratio (%), with our index marked]
Real-world data
- Breast Cancer in Wisconsin (a.k.a. BCW)
- Already binarized
- The dimension is $n = 11$
- Comparison with randomly generated data with the same $n$, $p$ and $q$
BCW and randomly generated data
$n = 11$, $(p, q) = (0.730, 0.270)$
[Figure, two panels: BCW and randomly generated data; each plots the ratio of decomposable (T, F)s (%) versus sampling ratio (%), with our index marked]
Discussion and conclusion
An index to extract reliable decomposable structures:
$$|T| + |F| = \sqrt{2^{n-1} / pq} \qquad (p : q = |T| : |F|,\ p + q = 1)$$
Computational experiments on random & real-world data
- proposed index is a good estimate
- |S0| 1 or |S1| 2 threshold behavior is not clear
Future work
- Analyses on sharpness of the threshold behavior: to know sufficient $|T| + |F|$ to extract reliable decomposable structures
- Apply similar approach to other classes of Boolean functions
[Figure: #decomposable structures versus $|T| + |F|$, marking the proposed index and the sufficient data size we want to estimate]