semantic pattern transformation
Post on 13-Sep-2014
DESCRIPTION
Talk at IKNOW 2013 describing the Semantic Pattern Transformation. This process transforms feature vectors, which are commonly used in machine learning, into a semantic representation. The advantage is that the same model can be used across all domains, which is not possible for raw feature vectors without cumbersome preprocessing operations.
TRANSCRIPT
IAIK
Semantic Pattern Transformation
IKNOW 2013
Peter Teufl, Herbert Leitold, Reinhard Posch
Our Background
Topics
Mobile device security
Cloud security
Security consulting for public institutions (Austria)
IT security research
IT security lectures
e-Government
A-SIT
Why does he talk about Knowledge Discovery?
How does IT security relate to knowledge discovery?
eGov - eParticipation: document analysis, twitter etc.
intrusion detection systems (network traffic analysis)
malware detection (network traffic, mobile phones)
mobile application analysis (metadata, market descriptions)
mobile application security (hot topic, BYOD, etc.)
What to expect?
Motivation for the Semantic Pattern Transformation
Basic concepts, techniques
How does it work? Evaluation?
Applications, results, current topics!
Environment
Arbitrary features
No a priori knowledge
Heterogeneous domains
Clustering
Supervised learning
Anomaly detection
Semantic search
Visualization
Extracting knowledge
Text analysis
Android market descriptions
histograms
flexible deployment
new domains
terms
numbers
Process...
• Different processing steps
• From defining the goals
• To extracting the desired knowledge
• Machine learning algorithms are often used within KDD
• However, the complete machine learning process is quite similar to KDD
[Figure: two process chains side by side. KDT (Fayyad et al.): knowledge discovery goals, target data set, preprocessing, data extraction, data mining method, data mining algorithm, knowledge extraction, with the stage labels "Data mining" and "Knowledge processing". ML-KDT (machine learning): domain-specific data set, machine learning goals, instance extraction, feature selection and construction, instance selection, machine learning algorithm, preprocessing, algorithm application, interpretation.]
ADAPTATION COMPLEXITY?
• Assuming an arbitrary data set (e-Participation, Android Market applications)
• Further assuming a knowledge discovery goal, e.g., unsupervised clustering
• Then: we need to adapt the steps on the left
• And: we need to adapt this setup when the data changes, even when the knowledge discovery goals remain the same!
• Android Market applications vs. text documents vs. network traffic vs. malware detection?
[Figure: the machine learning process (domain-specific data set, machine learning goals, instance extraction, feature selection and construction, instance selection, algorithm selection, preprocessing, algorithm application, interpretation), each step marked by its dependence on domain data and goals: high, medium or low]
TOWARDS A SEMANTIC REPRESENTATION
• Finding a new representation...
• New representation is called Semantic Patterns
• Key properties:
  • Still a vector representation (compatible with the old representation)
  • Not the feature values themselves, but their semantic relations are represented
  • All values have the same meaning and feature type (activation)
• Transformation from raw data into Semantic Patterns: Semantic Pattern Transformation
SEMANTIC PATTERN TRANSFORMATION
• The Semantic Pattern Transformation is arranged in five layers
• Layer 1 - Feature extraction
• Layer 2 - Associative network - Node generation
• Layer 3 - Associative network - Link generation
• Layer 4 - Spreading activation (SA)
• Layer 5 - Analysis (machine learning, semantic search etc.)
[Figure: overview of the layers. Layer 1, Feature Extraction: data set, relation (FROM, TO, TIME), instances with the features SF 1, SF 2, DF 1 and DF 2 and values marked SV or MV, mapped (Map) into the network. Layers 2-3, Associative Network Generation. Layer 4, Spreading Activation: patterns P 1 to P 4. Layer 5, Analysis: supervised learning, unsupervised clustering, semantic relations, feature value relevance, anomaly detection, semantic development over time, pattern similarity.]
SPT: Layer 1 - Feature extraction
Extract features, their values and determine the type (categorical, distance-based)
Categorical: Exports
Distance-based: Unemployment rate, fertility rate
Country  Exports               Unemployment rate  Fertility rate
C1       coffee                20%                5
C2       cacao                 20%                5
C3       coffee, cacao         20%                5
C4       machinery             5%                 2
C5       chemicals             5%                 2
C6       chemicals, machinery  5%                 2
C7       chemicals, cacao      20%                missing data
C8       missing data          20%                5
C9       coffee, cacao         missing data       missing data
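The example data set and its feature types can be written down as a small data structure. The following is a minimal Python sketch, not the authors' implementation: the names FEATURE_TYPES, COUNTRIES and extract_feature_values are illustrative, and encoding "missing data" as None is an assumption.

```python
# Feature types as named on the slide.
FEATURE_TYPES = {
    "exports": "categorical",
    "unemployment_rate": "distance-based",
    "fertility_rate": "distance-based",
}

# One record per country; None stands for "missing data" (assumed encoding).
COUNTRIES = {
    "C1": {"exports": {"coffee"}, "unemployment_rate": 20, "fertility_rate": 5},
    "C2": {"exports": {"cacao"}, "unemployment_rate": 20, "fertility_rate": 5},
    "C3": {"exports": {"coffee", "cacao"}, "unemployment_rate": 20, "fertility_rate": 5},
    "C4": {"exports": {"machinery"}, "unemployment_rate": 5, "fertility_rate": 2},
    "C5": {"exports": {"chemicals"}, "unemployment_rate": 5, "fertility_rate": 2},
    "C6": {"exports": {"chemicals", "machinery"}, "unemployment_rate": 5, "fertility_rate": 2},
    "C7": {"exports": {"chemicals", "cacao"}, "unemployment_rate": 20, "fertility_rate": None},
    "C8": {"exports": None, "unemployment_rate": 20, "fertility_rate": 5},
    "C9": {"exports": {"coffee", "cacao"}, "unemployment_rate": None, "fertility_rate": None},
}


def extract_feature_values(record):
    """Return (feature, type, value) triples for every non-missing value."""
    triples = []
    for feature, value in record.items():
        if value is None:
            continue  # missing data yields no feature value
        kind = FEATURE_TYPES[feature]
        if kind == "categorical":
            # A categorical feature may carry several values per instance.
            triples.extend((feature, kind, v) for v in sorted(value))
        else:
            triples.append((feature, kind, value))
    return triples


print(extract_feature_values(COUNTRIES["C7"]))
```

Missing values are simply skipped here, so an instance such as C9 contributes only its export values.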
SPT: Layer 2 - Node generation
Categorical feature values: one node for each value
Distance-based feature values: map value ranges to single nodes
[Figure: associative network with the nodes coffee, cacao, machinery, chemicals, 20%, 5%, 5 and 2]
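Node generation for the country example can be sketched as follows. This is a hedged illustration: the talk does not say how value ranges are chosen, so the fixed-width bins (widths 10 and 2) and the node naming scheme are assumptions made only for this sketch.

```python
# Rows of the example table: (exports, unemployment in %, fertility rate).
# None stands for "missing data" (assumed encoding).
ROWS = [
    ({"coffee"}, 20, 5),
    ({"cacao"}, 20, 5),
    ({"coffee", "cacao"}, 20, 5),
    ({"machinery"}, 5, 2),
    ({"chemicals"}, 5, 2),
    ({"chemicals", "machinery"}, 5, 2),
    ({"chemicals", "cacao"}, 20, None),
    (None, 20, 5),
    ({"coffee", "cacao"}, None, None),
]


def nodes_for(row, widths=(10, 2)):
    """Return the node labels one instance maps to."""
    exports, unemployment, fertility = row
    nodes = set()
    if exports is not None:
        # Categorical feature: one node for each value.
        nodes.update(f"exports={value}" for value in exports)
    if unemployment is not None:
        # Distance-based feature: every value in a range shares one node.
        nodes.add(f"unemployment~bin{unemployment // widths[0]}")
    if fertility is not None:
        nodes.add(f"fertility~bin{fertility // widths[1]}")
    return nodes


ALL_NODES = set().union(*(nodes_for(row) for row in ROWS))
print(sorted(ALL_NODES))
```

With these assumed bin widths the nine instances produce eight nodes, the same number as in the network on the slide.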
SPT: Layer 3 - Link generation
[Figure: associative network with weighted links between the nodes coffee, cacao, machinery, chemicals, 20%, 5%, 5 and 2. Link weight legend: 0.25, 0.5, 0.75, 1.00. Example instances shown: "coffee, 20%, 5" and "chemicals, cacao, 20%".]
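One way to derive link weights is to count how often two nodes occur in the same instance and to normalize by the largest count. The slide does not state the weighting rule, so this is an assumption, and build_links is a hypothetical helper name. Under this assumption the weights for the example fall on the 0.25, 0.5, 0.75 and 1.00 steps of the legend.

```python
from collections import Counter
from itertools import combinations

# Node labels per country, missing values left out.
INSTANCES = [
    ["coffee", "20%", "5"],                 # C1
    ["cacao", "20%", "5"],                  # C2
    ["coffee", "cacao", "20%", "5"],        # C3
    ["machinery", "5%", "2"],               # C4
    ["chemicals", "5%", "2"],               # C5
    ["chemicals", "machinery", "5%", "2"],  # C6
    ["chemicals", "cacao", "20%"],          # C7
    ["20%", "5"],                           # C8
    ["coffee", "cacao"],                    # C9
]


def build_links(instances):
    """Weight each node pair by its normalized co-occurrence count."""
    counts = Counter()
    for nodes in instances:
        # Sorting makes (a, b) and (b, a) the same undirected link.
        for pair in combinations(sorted(set(nodes)), 2):
            counts[pair] += 1
    strongest = max(counts.values())
    return {pair: count / strongest for pair, count in counts.items()}


LINKS = build_links(INSTANCES)
for pair, weight in sorted(LINKS.items()):
    print(pair, weight)
```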
SPT: Layer 4 - Spreading activation
Creating a Semantic Pattern: in this case for “coffee” and “cacao”
Set the activation value of the two nodes to 1.0
Spread this activation value to neighboring nodes via the weighted links
[Figure: associative network with the nodes coffee and cacao activated at 1.0]
SPT: Layer 4 - Spreading activation
Typically, one would create Semantic Patterns for all instances within the data set
E.g. a pattern for C1 by activating coffee, 20% and 5
However, we can also create patterns for feature values: e.g. “coffee”
SPT: Layer 4 - Spreading activation
After SA: each node in the network has an activation value
By representing the nodes and their activation values as a vector, we gain a Semantic Pattern
Node        coffee  cacao  machinery  chemicals  20%   5%    5     2
Activation  1.15    1.15   0.00       0.08       0.38  0.00  0.30  0.00
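Spreading activation for "coffee" and "cacao" can be sketched as below. This is a simplified illustration under stated assumptions, not the rule used in the talk: the link weights are normalized co-occurrence counts, the decay factor of 0.5 and the single spreading step are arbitrary choices, and spread is a hypothetical function name. The absolute activation values therefore differ from those on the slide.

```python
# Undirected link weights (assumed: normalized co-occurrence counts).
LINKS = {
    ("20%", "5"): 1.0, ("20%", "cacao"): 0.75, ("2", "5%"): 0.75,
    ("20%", "coffee"): 0.5, ("5", "coffee"): 0.5, ("5", "cacao"): 0.5,
    ("cacao", "coffee"): 0.5, ("2", "machinery"): 0.5,
    ("5%", "machinery"): 0.5, ("2", "chemicals"): 0.5,
    ("5%", "chemicals"): 0.5, ("chemicals", "machinery"): 0.25,
    ("20%", "chemicals"): 0.25, ("cacao", "chemicals"): 0.25,
}


def spread(links, start_nodes, decay=0.5, steps=1):
    """Activate start_nodes with 1.0 and pass activation along the links."""
    neighbors = {}
    for (a, b), weight in links.items():
        neighbors.setdefault(a, {})[b] = weight
        neighbors.setdefault(b, {})[a] = weight
    activation = {node: 0.0 for node in neighbors}
    frontier = {node: 1.0 for node in start_nodes}
    for node, value in frontier.items():
        activation[node] = activation.get(node, 0.0) + value
    for _ in range(steps):
        incoming = {}
        for node, value in frontier.items():
            for other, weight in neighbors.get(node, {}).items():
                # Each hop is damped by the link weight and the decay factor.
                incoming[other] = incoming.get(other, 0.0) + value * weight * decay
        for node, value in incoming.items():
            activation[node] += value
        frontier = incoming
    return activation


pattern = spread(LINKS, ["coffee", "cacao"])
print(pattern)
```

The resulting ordering agrees with the slide: coffee and cacao are strongest, followed by 20%, 5 and chemicals, while machinery, 5% and 2 stay at zero.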
[Figure: three bar charts of unsorted Semantic Patterns over the nodes coffee, cacao, machinery, chemicals, 20%, 5%, 5 and 2 (activation axis from 0 to 0.50): "Export: Cacao", "Export: Coffee" and "Fertility: 2"]
Each feature value is represented by a semantic fingerprint
Allows for an instant analysis of semantic relations to other feature values
Sort, mean, variance, adding, subtracting
SPT: Layer 5 - Analysis
Calculating the distance between two patterns (Euclidean distance, cosine similarity)
For unsupervised clustering, semantic-aware search algorithms
Keyword search for coffee
C1  coffee                20%           5
C3  coffee, cacao         20%           5
C9  coffee, cacao         missing data  missing data
Semantic-aware search for coffee
C9  coffee, cacao         missing data  missing data
C1  coffee                20%           5
C3  coffee, cacao         20%           5
C2  cacao                 20%           5
C8  missing data          20%           5
C7  chemicals, cacao      20%           missing data
C5  chemicals             5%            2
C6  chemicals, machinery  5%            2
C4  machinery             5%            2
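Distance calculation and a similarity-based ranking can be sketched with the country patterns listed in the "Comparing the two models" table of this talk. The helper names are illustrative, and the query used here is the coffee-and-cacao pattern (identical to the pattern of C9), so the ranking is not expected to reproduce the slide's result for a plain "coffee" query.

```python
import math

# Semantic Patterns per country; vector order:
# coffee, cacao, machinery, chemicals, 20%, 5%, 5, 2
PATTERNS = {
    "C1": [1.30, 0.53, 0.00, 0.08, 1.45, 0.00, 1.45, 0.00],
    "C2": [0.45, 1.38, 0.00, 0.15, 1.53, 0.00, 1.45, 0.00],
    "C3": [1.45, 1.53, 0.00, 0.15, 1.68, 0.00, 1.60, 0.00],
    "C4": [0.00, 0.00, 1.30, 0.38, 0.00, 1.38, 0.00, 1.38],
    "C5": [0.00, 0.08, 0.38, 1.30, 0.08, 1.38, 0.00, 1.38],
    "C6": [0.00, 0.08, 1.37, 1.37, 0.08, 1.53, 0.00, 1.53],
    "C7": [0.30, 1.30, 0.08, 1.15, 1.30, 0.15, 0.45, 0.15],
    "C8": [0.30, 0.38, 0.00, 0.08, 1.30, 0.00, 1.30, 0.00],
    "C9": [1.15, 1.15, 0.00, 0.08, 0.38, 0.00, 0.30, 0.00],
}


def euclidean(a, b):
    """Euclidean distance between two patterns of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two patterns (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query, patterns):
    """Return pattern names ordered from most to least similar to query."""
    return sorted(patterns,
                  key=lambda name: cosine_similarity(query, patterns[name]),
                  reverse=True)


ranking = rank_by_similarity(PATTERNS["C9"], PATTERNS)
print(ranking)
```

Unlike a keyword search, the ranking covers every instance, including those with missing values or without the queried term.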
SPT: Layer 5 - Analysis
Machine learning: apply any machine learning algorithm to the Semantic Patterns
Unsupervised clustering
Supervised learning
Semantic-aware search
Knowledge discovery: semantic relations, arbitrary procedures: mean, variance etc.
Anomaly detection, feature relevance, simple operations (variance, mean, etc.)
Visualization
Benefits?
[Figure: the machine learning process (domain-specific data set, machine learning goals, instance extraction, feature selection and construction, instance selection, algorithm selection, preprocessing, algorithm application, interpretation) shown twice, each step marked by its dependence on domain data and goals: high, medium or low]
Application in heterogeneous domains regardless of the nature of the data
Except for Layer 1, we do not need any manual setup for the layers
Regardless of the analyzed data, the Semantic Patterns always use the same model
This means: Regardless of the deployed knowledge discovery method, we can always use the same methods for knowledge extraction!
Comparing the two models
Semantic Patterns
Country  Coffee  Cacao  Machinery  Chemicals  20%   5%    5     2
C1       1.30    0.53   0.00       0.08       1.45  0.00  1.45  0.00
C2       0.45    1.38   0.00       0.15       1.53  0.00  1.45  0.00
C3       1.45    1.53   0.00       0.15       1.68  0.00  1.60  0.00
C4       0.00    0.00   1.30       0.38       0.00  1.38  0.00  1.38
C5       0.00    0.08   0.38       1.30       0.08  1.38  0.00  1.38
C6       0.00    0.08   1.37       1.37       0.08  1.53  0.00  1.53
C7       0.30    1.30   0.08       1.15       1.30  0.15  0.45  0.15
C8       0.30    0.38   0.00       0.08       1.30  0.00  1.30  0.00
C9       1.15    1.15   0.00       0.08       0.38  0.00  0.30  0.00
[Figure: bar charts of unsorted Semantic Patterns over the nodes coffee, cacao, machinery, chemicals, 20%, 5%, 5 and 2: "Mean pattern: C4, C5, C6" (axis from 0 to 1.50) and "Mean pattern: C1, C2, C3" (axis from 0 to 2.00)]
Value-centric feature vectors
Country  Coffee        Cacao         Machinery     Chemicals     Unemployment rate  Fertility rate
C1       1             0             0             0             20%                5
C2       0             1             0             0             20%                5
C3       1             1             0             0             20%                5
C4       0             0             1             0             5%                 2
C5       0             0             0             1             5%                 2
C6       0             0             1             1             5%                 2
C7       0             1             0             1             20%                missing data
C8       missing data  missing data  missing data  missing data  20%                5
C9       1             1             0             0             missing data       missing data
Same model: Android application, a country or a document... the activation values always have the same meaning
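Because every component of a Semantic Pattern is an activation value, group summaries such as the mean patterns for C1 to C3 and for C4 to C6 reduce to element-wise arithmetic. A minimal sketch using the pattern values from this slide; mean_pattern is an illustrative helper name.

```python
NODES = ["coffee", "cacao", "machinery", "chemicals", "20%", "5%", "5", "2"]

# Semantic Patterns of the first six countries, in NODES order.
PATTERNS = {
    "C1": [1.30, 0.53, 0.00, 0.08, 1.45, 0.00, 1.45, 0.00],
    "C2": [0.45, 1.38, 0.00, 0.15, 1.53, 0.00, 1.45, 0.00],
    "C3": [1.45, 1.53, 0.00, 0.15, 1.68, 0.00, 1.60, 0.00],
    "C4": [0.00, 0.00, 1.30, 0.38, 0.00, 1.38, 0.00, 1.38],
    "C5": [0.00, 0.08, 0.38, 1.30, 0.08, 1.38, 0.00, 1.38],
    "C6": [0.00, 0.08, 1.37, 1.37, 0.08, 1.53, 0.00, 1.53],
}


def mean_pattern(names):
    """Element-wise mean of the patterns of the given instances."""
    rows = [PATTERNS[name] for name in names]
    return [sum(column) / len(rows) for column in zip(*rows)]


mean_a = mean_pattern(["C1", "C2", "C3"])
mean_b = mean_pattern(["C4", "C5", "C6"])
for node, a, b in zip(NODES, mean_a, mean_b):
    print(f"{node:10s} {a:.2f} {b:.2f}")
```

The same function works unchanged for any other data set, since no feature-specific scaling or encoding is involved.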
Evaluation
26 data sets from the UCI machine learning repository
Supervised: SVM
Unsupervised: EM and k-Means
Application to raw data and to Semantic Patterns
SP-Parameters: D=0.5, Comb=E, Norm=L, MDL=1.5, σ = 0.2
Data set        Label  Inst  DF  SF  Classes  SVM (N)  SVM (NN)  SVM (P)  KM (N)  KM (NN)  KM (P)  EM (NN)  EM (P)
Categorical
Breast Cancer   BC     286   -   9   2        0.03     0.04      0.04     0.01    0.01     0.06    0.00     0.08
Dermatology     DE     366   1   33  6        0.93     0.92      0.95     0.58    0.09     0.86    0.87     0.87
KR vs. KP       KR     3196  -   36  2        0.75     0.75      0.72     0.00    0.01     0.00    0.04     0.00
Lymph           LY     148   -   18  4        0.53     0.51      0.48     0.13    0.18     0.25    0.26     0.27
Mushroom        MU     8124  -   22  2        1.00     1.00      1.00     0.48    0.47     0.45    0.61     0.59
Soybean         SO     683   -   35  19       0.92     0.92      0.93     0.59    0.62     0.73    0.79     0.79
Splice          SP     3190  -   60  3        0.71     0.72      0.80     0.03    0.03     0.44    0.41     0.31
Vote            VO     435   -   16  2        0.76     0.74      0.67     0.47    0.48     0.47    0.49     0.45
Zoo             ZO     101   -   17  7        0.94     0.94      0.97     0.78    0.78     0.82    0.82     0.85
Total                                         0.73     0.73      0.73     0.34    0.30     0.45    0.48     0.47
Mixed
Anneal          AN     898   6   32  6        0.86     0.86      0.92     0.23    0.03     0.30    0.31     0.32
Colic           CO     368   7   15  2        0.31     0.32      0.31     0.13    0.03     0.05    0.10     0.12
Credit-A        CA     689   6   9   2        0.41     0.41      0.39     0.16    0.02     0.25    0.17     0.21
Credit-G        CG     1000  7   13  2        0.11     0.10      0.12     0.01    0.01     0.00    0.01     0.02
Heart-C         HC     303   6   7   5        0.36     0.36      0.29     0.24    0.01     0.36    0.31     0.28
Heart-H         HH     294   6   7   5        0.32     0.31      0.33     0.27    0.01     0.32    0.28     0.25
Hepatitis       HE     155   5   14  2        0.25     0.28      0.21     0.13    0.00     0.21    0.22     0.24
Total                                         0.37     0.38      0.37     0.17    0.02     0.21    0.20     0.20
Numerical
Breast-w        BW     699   9   -   2        0.78     0.78      0.77     0.73    0.74     0.82    0.72     0.58
Diabetes        DI     768   8   -   2        0.18     0.18      0.15     0.05    0.03     0.10    0.10     0.08
Glass           GL     214   9   -   7        0.30     0.30      0.50     0.34    0.39     0.33    0.37     0.36
Heart-Statlog   HS     270   13  -   2        0.36     0.36      0.37     0.25    0.02     0.39    0.29     0.27
Ionosphere      IO     351   34  -   2        0.48     0.48      0.50     0.12    0.12     0.16    0.25     0.25
Iris            IR     150   4   -   3        0.87     0.87      0.87     0.71    0.71     0.75    0.81     0.78
Segment         SE     2310  19  -   7        0.88     0.88      0.90     0.61    0.53     0.59    0.62     0.60
Sonar           SO     208   60  -   2        0.23     0.23      0.23     0.01    0.01     0.02    0.01     0.01
Vehicle         VE     846   18  -   4        0.51     0.51      0.48     0.11    0.19     0.19    0.10     0.19
Vowel           VO     990   10  3   11       0.63     0.63      0.76     0.06    0.34     0.23    0.19     0.25
Total                                         0.52     0.52      0.55     0.30    0.31     0.36    0.35     0.34
DOES IT WORK?
• Applications described in several publications, which analyze
  • e-Participation (Egyptian revolution, Fukushima, Mitmachen): text documents
  • Intrusion detection: event correlation
  • RDF data analysis (semantic web)
  • WiFi privacy (analyzing captured emails)
  • Android Market application analysis
Current Project
Android application security
Container applications for BYOD (require encryption, secure communication, key derivation functions, root checks etc.)
Manual analysis is cumbersome
Semantic Patterns
Extract Dalvik VM code, features (opcodes, methods, local variables etc.)
Apply Semantic Patterns technique
Clustering, supervised learning, anomaly detection etc.
Current Project
Also works directly on the phone...
Detecting SMS catchers/sniffers
More fine grained detection
asymmetric cryptography
symmetric cryptography
Outlook
Publish the Java API...
basically a converter from arbitrary feature vectors to Semantic Patterns (e.g. in/out in ARFF format)
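Writing Semantic Patterns in ARFF format could look roughly like this. The sketch is in Python and is not the Java API mentioned above; to_arff and the attribute names are hypothetical, and node names are chosen as plain identifiers so that no ARFF quoting is needed.

```python
def to_arff(relation, attributes, rows):
    """Serialize numeric pattern vectors as ARFF text."""
    lines = [f"@RELATION {relation}", ""]
    # Every component of a Semantic Pattern is an activation value,
    # so all attributes share the NUMERIC type.
    lines.extend(f"@ATTRIBUTE {name} NUMERIC" for name in attributes)
    lines.extend(["", "@DATA"])
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("row length does not match attribute count")
        lines.append(",".join(f"{value:.2f}" for value in row))
    return "\n".join(lines) + "\n"


ATTRIBUTES = [
    "coffee", "cacao", "machinery", "chemicals",
    "unemployment_20", "unemployment_5", "fertility_5", "fertility_2",
]
# Pattern for "coffee" and "cacao" as shown on the spreading activation slide.
ROWS = [[1.15, 1.15, 0.00, 0.08, 0.38, 0.00, 0.30, 0.00]]

ARFF_TEXT = to_arff("semantic_patterns", ATTRIBUTES, ROWS)
print(ARFF_TEXT)
```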
Deep learning...
Thx!
K-Means and EM results by parameter (Par), categorical data sets
        K-Means                                                               EM
Par     Total  BC     DE     KR     LY     MU     SO     SP     VO     ZO     Total  BC     DE     KR     LY     MU     SO     SP     VO     ZO
Raw Data
N       0.341  0.012  0.584  0.004  0.131  0.475  0.587  0.031  0.467  0.782  Not available
NN      0.296  0.007  0.094  0.010  0.176  0.472  0.616  0.030  0.476  0.783  0.477  0.002  0.871  0.036  0.258  0.610  0.789  0.410  0.494  0.822
Semantic Patterns
D 0.0   0.443  0.025  0.849  0.003  0.199  0.413  0.728  0.465  0.493  0.814  0.449  0.004  0.767  0.001  0.222  0.590  0.740  0.423  0.489  0.801
Comb=E Norm=L
D 0.1   0.442  0.029  0.811  0.004  0.245  0.545  0.726  0.387  0.476  0.759  0.441  0.074  0.885  0.000  0.271  0.615  0.786  0.004  0.505  0.826
D 0.3   0.447  0.068  0.846  0.004  0.241  0.482  0.724  0.424  0.476  0.758  0.460  0.079  0.875  0.001  0.258  0.592  0.788  0.250  0.449  0.846
D 0.5   0.452  0.061  0.856  0.000  0.245  0.448  0.733  0.437  0.467  0.820  0.468  0.079  0.874  0.001  0.265  0.592  0.789  0.306  0.452  0.850
D 0.7   0.422  0.069  0.826  0.000  0.209  0.275  0.728  0.419  0.463  0.804  0.465  0.079  0.874  0.001  0.252  0.579  0.799  0.312  0.445  0.847
Comb=S Norm=L
D 0.1   0.441  0.056  0.853  0.000  0.244  0.453  0.733  0.399  0.476  0.759  0.433  0.079  0.872  0.001  0.270  0.572  0.794  0.001  0.476  0.829
D 0.3   0.434  0.075  0.820  0.000  0.228  0.411  0.718  0.431  0.472  0.750  0.466  0.079  0.881  0.001  0.280  0.592  0.802  0.298  0.437  0.828
D 0.5   0.439  0.060  0.792  0.000  0.235  0.416  0.741  0.405  0.463  0.836  0.466  0.079  0.871  0.001  0.251  0.581  0.805  0.310  0.445  0.848
D 0.7   0.422  0.067  0.798  0.000  0.224  0.364  0.726  0.376  0.462  0.782  0.462  0.087  0.875  0.001  0.254  0.580  0.776  0.292  0.445  0.845
Comb=E Norm=S
D 0.1   0.418  0.029  0.790  0.006  0.236  0.311  0.705  0.449  0.496  0.742  0.472  0.002  0.893  0.000  0.263  0.571  0.767  0.432  0.495  0.820
D 0.3   0.452  0.030  0.860  0.001  0.231  0.470  0.715  0.475  0.491  0.799  0.476  0.002  0.914  0.000  0.261  0.586  0.775  0.427  0.495  0.823
D 0.5   0.448  0.048  0.799  0.009  0.215  0.539  0.725  0.450  0.493  0.758  0.472  0.002  0.897  0.000  0.267  0.584  0.758  0.427  0.484  0.829
D 0.7   0.448  0.033  0.850  0.000  0.230  0.495  0.712  0.435  0.493  0.787  0.473  0.002  0.903  0.000  0.250  0.586  0.773  0.427  0.484  0.829
Comb=S Norm=S
D 0.1   0.439  0.029  0.806  0.009  0.250  0.435  0.727  0.439  0.494  0.760  0.475  0.002  0.903  0.000  0.254  0.576  0.764  0.429  0.495  0.852
D 0.3   0.420  0.015  0.775  0.004  0.210  0.436  0.717  0.409  0.443  0.774  0.474  0.002  0.901  0.000  0.271  0.584  0.763  0.427  0.484  0.837
D 0.5   0.429  0.030  0.789  0.009  0.226  0.410  0.716  0.448  0.485  0.749  0.476  0.002  0.904  0.000  0.255  0.586  0.767  0.427  0.484  0.854
D 0.7   0.438  0.040  0.839  0.006  0.246  0.418  0.726  0.409  0.480  0.775  0.480  0.002  0.910  0.000  0.269  0.615  0.771  0.431  0.494  0.825
K-Means and EM results by parameter (Par), mixed data sets
        K-Means                                                 EM
Par     Total  AN     CO     CA     CG     HC     HH     HE     Total  AN     CO     CA     CG     HC     HH     HE
Raw Data
N       0.165  0.226  0.129  0.155  0.009  0.237  0.269  0.131  Not available
NN      0.017  0.028  0.030  0.016  0.012  0.014  0.012  0.004  0.201  0.312  0.103  0.171  0.013  0.309  0.278  0.223
Semantic Patterns
        D=0.0 MDL=2.0                                           D=0.0 MDL=1.0
σ 0.0   0.193  0.253  0.135  0.113  0.007  0.356  0.293  0.195  0.190  0.291  0.098  0.227  0.003  0.228  0.258  0.227
σ 0.2   0.198  0.271  0.147  0.116  0.007  0.356  0.301  0.189  0.182  0.280  0.098  0.162  0.003  0.244  0.258  0.231
σ 0.4   0.204  0.240  0.157  0.145  0.009  0.356  0.327  0.194  0.184  0.226  0.099  0.229  0.004  0.245  0.258  0.227
σ 0.6   0.194  0.221  0.154  0.145  0.008  0.359  0.275  0.196  0.194  0.291  0.097  0.240  0.003  0.217  0.281  0.229
σ 0.8   0.200  0.258  0.152  0.098  0.007  0.358  0.327  0.197  0.192  0.293  0.097  0.232  0.004  0.228  0.258  0.230
        D=0.5 MDL=1.0                                           D=0.7 MDL=1.0
σ 0.0   0.211  0.320  0.042  0.262  0.001  0.325  0.311  0.215  0.210  0.327  0.127  0.218  0.021  0.237  0.311  0.229
σ 0.2   0.201  0.257  0.032  0.262  0.001  0.323  0.311  0.222  0.210  0.322  0.126  0.218  0.021  0.237  0.320  0.229
σ 0.4   0.208  0.299  0.035  0.261  0.001  0.326  0.311  0.220  0.211  0.322  0.127  0.218  0.021  0.237  0.320  0.229
σ 0.6   0.204  0.281  0.029  0.262  0.001  0.325  0.311  0.220  0.211  0.321  0.128  0.218  0.021  0.237  0.320  0.229
σ 0.8   0.207  0.292  0.041  0.263  0.001  0.326  0.311  0.216  0.209  0.310  0.127  0.218  0.021  0.237  0.320  0.229
        D=0.5 MDL=1.5                                           D=0.7 MDL=1.5
σ 0.0   0.216  0.317  0.065  0.249  0.001  0.357  0.320  0.203  0.204  0.322  0.123  0.212  0.016  0.275  0.247  0.233
σ 0.2   0.211  0.295  0.052  0.247  0.000  0.355  0.320  0.209  0.204  0.322  0.123  0.212  0.016  0.275  0.247  0.236
σ 0.4   0.216  0.314  0.074  0.248  0.001  0.357  0.320  0.198  0.205  0.323  0.123  0.206  0.016  0.275  0.252  0.237
σ 0.6   0.212  0.308  0.046  0.249  0.001  0.356  0.320  0.209  0.204  0.320  0.125  0.208  0.016  0.275  0.246  0.236
σ 0.8   0.211  0.293  0.063  0.248  0.000  0.354  0.320  0.201  0.204  0.323  0.125  0.208  0.016  0.275  0.249  0.232
        D=0.5 MDL=2.0                                           D=0.7 MDL=2.0
σ 0.0   0.217  0.304  0.048  0.244  0.000  0.390  0.311  0.219  0.206  0.319  0.117  0.229  0.010  0.255  0.277  0.233
σ 0.2   0.218  0.313  0.062  0.244  0.000  0.388  0.311  0.208  0.207  0.317  0.126  0.239  0.010  0.255  0.268  0.233
σ 0.4   0.221  0.309  0.084  0.243  0.000  0.389  0.311  0.209  0.205  0.319  0.127  0.224  0.010  0.255  0.268  0.233
σ 0.6   0.213  0.285  0.057  0.243  0.000  0.387  0.311  0.210  0.206  0.307  0.127  0.240  0.010  0.255  0.268  0.233
σ 0.8   0.211  0.295  0.036  0.244  0.000  0.387  0.311  0.205  0.204  0.305  0.127  0.240  0.010  0.255  0.259  0.233
        D=0.5 MDL=3.0                                           D=0.7 MDL=3.0
σ 0.0   0.203  0.294  0.030  0.248  0.000  0.335  0.315  0.196  0.192  0.323  0.108  0.248  0.009  0.201  0.250  0.205
σ 0.2   0.208  0.306  0.059  0.248  0.000  0.334  0.315  0.193  0.190  0.321  0.107  0.237  0.009  0.201  0.251  0.205
σ 0.4   0.205  0.310  0.050  0.248  0.000  0.334  0.315  0.178  0.193  0.322  0.122  0.243  0.009  0.201  0.249  0.205
σ 0.6   0.207  0.300  0.063  0.248  0.001  0.333  0.313  0.192  0.192  0.321  0.122  0.243  0.010  0.201  0.245  0.205
σ 0.8   0.210  0.330  0.050  0.246  0.001  0.336  0.315  0.191  0.192  0.323  0.122  0.243  0.009  0.201  0.240  0.205
K-Means and EM results by parameter (Par), numerical data sets
        K-Means                                                                      EM
Par     Total  BW     DI     GL     HS     IO     IR     SE     SO     VE     VO     Total  BW     DI     GL     HS     IO     IR     SE     SO     VE     VO
Raw Data
N       0.299  0.734  0.052  0.335  0.254  0.121  0.708  0.608  0.006  0.113  0.057  Not available
NN      0.307  0.735  0.030  0.388  0.019  0.123  0.705  0.529  0.008  0.188  0.342  0.346  0.718  0.103  0.370  0.289  0.254  0.806  0.621  0.005  0.103  0.194
Semantic Patterns
        D=0.0 MDL=1.5                                                                D=0.0 MDL=1.0
σ 0.0   0.315  0.724  0.039  0.329  0.309  0.045  0.717  0.582  0.026  0.198  0.183  0.317  0.777  0.006  0.312  0.239  0.218  0.651  0.592  0.016  0.174  0.186
σ 0.2   0.323  0.724  0.025  0.334  0.344  0.071  0.730  0.590  0.012  0.198  0.196  0.327  0.752  0.001  0.318  0.240  0.218  0.766  0.598  0.016  0.167  0.197
σ 0.4   0.318  0.719  0.026  0.285  0.316  0.051  0.769  0.600  0.008  0.199  0.203  0.323  0.727  0.011  0.287  0.229  0.217  0.749  0.600  0.018  0.176  0.218
σ 0.6   0.317  0.722  0.025  0.298  0.357  0.040  0.712  0.602  0.013  0.199  0.201  0.317  0.732  0.009  0.316  0.232  0.221  0.637  0.606  0.025  0.175  0.214
σ 0.8   0.299  0.646  0.015  0.294  0.328  0.026  0.686  0.581  0.014  0.198  0.200  0.325  0.703  0.006  0.305  0.233  0.216  0.796  0.594  0.019  0.181  0.195
        D=0.5 MDL=1.0                                                                D=0.7 MDL=1.0
σ 0.0   0.333  0.817  0.072  0.293  0.338  0.181  0.611  0.614  0.009  0.164  0.234  0.302  0.579  0.082  0.332  0.285  0.184  0.633  0.634  0.006  0.099  0.183
σ 0.2   0.333  0.817  0.076  0.278  0.340  0.181  0.621  0.621  0.009  0.151  0.237  0.300  0.579  0.082  0.307  0.285  0.184  0.636  0.632  0.006  0.117  0.176
σ 0.4   0.326  0.817  0.068  0.286  0.335  0.181  0.587  0.604  0.009  0.149  0.228  0.301  0.579  0.086  0.310  0.285  0.184  0.639  0.643  0.006  0.095  0.183
σ 0.6   0.327  0.817  0.072  0.269  0.337  0.181  0.604  0.580  0.009  0.166  0.232  0.301  0.579  0.076  0.319  0.285  0.184  0.639  0.632  0.006  0.109  0.185
σ 0.8   0.334  0.817  0.071  0.303  0.336  0.181  0.610  0.605  0.011  0.163  0.244  0.300  0.579  0.079  0.311  0.285  0.184  0.633  0.633  0.006  0.109  0.183
        D=0.5 MDL=1.5                                                                D=0.7 MDL=1.5
σ 0.0   0.352  0.817  0.099  0.298  0.382  0.143  0.751  0.601  0.018  0.193  0.218  0.339  0.579  0.086  0.348  0.324  0.242  0.761  0.596  0.013  0.187  0.252
σ 0.2   0.358  0.817  0.100  0.330  0.385  0.163  0.751  0.588  0.015  0.194  0.232  0.339  0.579  0.086  0.356  0.324  0.242  0.761  0.595  0.012  0.192  0.239
σ 0.4   0.352  0.817  0.096  0.315  0.387  0.143  0.738  0.576  0.019  0.193  0.231  0.340  0.579  0.092  0.348  0.324  0.242  0.761  0.603  0.012  0.194  0.241
σ 0.6   0.348  0.817  0.103  0.288  0.383  0.158  0.716  0.579  0.015  0.194  0.226  0.339  0.579  0.094  0.355  0.324  0.242  0.761  0.602  0.012  0.181  0.240
σ 0.8   0.356  0.817  0.098  0.296  0.378  0.166  0.776  0.604  0.012  0.190  0.225  0.338  0.579  0.107  0.355  0.324  0.242  0.752  0.597  0.012  0.177  0.236
        D=0.5 MDL=2.0                                                                D=0.7 MDL=2.0
σ 0.0   0.329  0.817  0.054  0.339  0.330  0.064  0.752  0.563  0.017  0.151  0.199  0.323  0.579  0.105  0.347  0.266  0.228  0.784  0.585  0.015  0.092  0.227
σ 0.2   0.328  0.817  0.052  0.320  0.330  0.064  0.753  0.585  0.017  0.144  0.196  0.325  0.579  0.098  0.359  0.266  0.228  0.784  0.584  0.015  0.098  0.238
σ 0.4   0.331  0.817  0.055  0.313  0.330  0.109  0.767  0.562  0.012  0.149  0.194  0.323  0.579  0.105  0.358  0.266  0.228  0.784  0.576  0.015  0.090  0.230
σ 0.6   0.330  0.817  0.059  0.335  0.328  0.073  0.765  0.560  0.019  0.148  0.199  0.326  0.579  0.099  0.351  0.266  0.228  0.798  0.595  0.015  0.091  0.235
σ 0.8   0.333  0.817  0.064  0.321  0.330  0.068  0.764  0.593  0.013  0.158  0.200  0.326  0.579  0.104  0.361  0.266  0.228  0.798  0.585  0.015  0.090  0.237
        D=0.5 MDL=3.0                                                                D=0.7 MDL=3.0
σ 0.0   0.322  0.817  0.026  0.326  0.333  0.099  0.739  0.567  0.022  0.136  0.153  0.304  0.579  0.001  0.362  0.200  0.228  0.728  0.574  0.032  0.114  0.224
σ 0.2   0.322  0.817  0.029  0.326  0.320  0.127  0.702  0.583  0.017  0.150  0.150  0.307  0.579  0.000  0.364  0.208  0.228  0.735  0.573  0.029  0.113  0.236
σ 0.4   0.317  0.817  0.035  0.318  0.320  0.099  0.705  0.556  0.024  0.140  0.154  0.306  0.579  0.001  0.355  0.211  0.228  0.726  0.572  0.035  0.113  0.237
σ 0.6   0.328  0.817  0.026  0.342  0.328  0.118  0.759  0.563  0.020  0.150  0.153  0.307  0.579  0.001  0.363  0.219  0.228  0.729  0.575  0.029  0.113  0.233
σ 0.8   0.323  0.817  0.029  0.330  0.322  0.099  0.731  0.563  0.023  0.151  0.161  0.304  0.579  0.001  0.356  0.204  0.224  0.713  0.589  0.030  0.119  0.226
Distance  Euc                                             Cos
Data      Raw                     Semantic Patterns       Raw                     Semantic Patterns
Missing   0%    10%   50%   90%   0%    10%   50%   90%   0%    10%   50%   90%   0%    10%   50%   90%
Categorical
BC        0.52  0.52  0.52  0.52  0.54  0.54  0.53  0.50  0.53  0.53  0.53  0.51  0.54  0.54  0.53  0.51
DE        0.68  0.66  0.55  0.32  0.81  0.80  0.38  0.22  0.66  0.66  0.67  0.36  0.81  0.80  0.74  0.46
KR        0.54  0.54  0.53  0.52  0.52  0.52  0.51  0.50  0.54  0.54  0.53  0.51  0.52  0.52  0.52  0.51
LY        0.63  0.68  0.63  0.30  0.63  0.59  0.64  0.48  0.59  0.53  0.51  0.32  0.61  0.58  0.56  0.35
MU        0.64  0.64  0.62  0.57  0.68  0.67  0.62  0.53  0.57  0.57  0.56  0.54  0.67  0.67  0.67  0.62
SO        0.65  0.63  0.53  0.22  0.75  0.70  0.09  0.08  0.58  0.56  0.50  0.18  0.73  0.72  0.63  0.28
SP        0.48  0.47  0.44  0.38  0.62  0.46  0.39  0.39  0.44  0.44  0.41  0.37  0.57  0.57  0.54  0.45
VO        0.80  0.79  0.76  0.67  0.78  0.78  0.68  0.51  0.62  0.63  0.67  0.62  0.79  0.79  0.78  0.72
ZO        0.83  0.81  0.72  0.31  0.86  0.85  0.64  0.24  0.80  0.79  0.71  0.31  0.86  0.84  0.76  0.41
Total     0.64  0.64  0.59  0.42  0.69  0.66  0.50  0.38  0.59  0.58  0.57  0.41  0.68  0.67  0.64  0.48
Mixed
AN        0.64  0.63  0.55  0.38  0.66  0.67  0.51  0.38  0.44  0.46  0.50  0.38  0.66  0.66  0.61  0.42
CO        0.59  0.59  0.56  0.51  0.59  0.58  0.52  0.50  0.50  0.50  0.51  0.51  0.62  0.62  0.60  0.57
CA        0.62  0.61  0.59  0.54  0.65  0.65  0.60  0.52  0.55  0.55  0.54  0.51  0.65  0.64  0.63  0.57
CG        0.52  0.52  0.52  0.50  0.52  0.53  0.54  0.53  0.51  0.51  0.52  0.51  0.52  0.52  0.52  0.52
HC        0.86  0.86  0.85  0.81  0.87  0.87  0.85  0.81  0.81  0.81  0.82  0.81  0.87  0.87  0.86  0.84
HH        0.87  0.86  0.85  0.82  0.87  0.87  0.83  0.80  0.84  0.84  0.83  0.81  0.88  0.88  0.87  0.83
HE        0.59  0.58  0.56  0.50  0.64  0.64  0.58  0.55  0.52  0.51  0.55  0.52  0.65  0.65  0.64  0.57
Total     0.67  0.67  0.64  0.58  0.69  0.69  0.63  0.58  0.60  0.60  0.61  0.58  0.69  0.69  0.68  0.62
Numerical
BW        0.86  0.86  0.76  0.68  0.91  0.91  0.84  0.69  0.62  0.61  0.59  0.50  0.90  0.89  0.88  0.84
DI        0.55  0.54  0.53  0.53  0.56  0.55  0.54  0.50  0.53  0.53  0.52  0.50  0.56  0.55  0.55  0.53
GL        0.49  0.45  0.31  0.30  0.53  0.52  0.42  0.31  0.51  0.51  0.48  0.29  0.53  0.52  0.48  0.34
HS        0.64  0.63  0.59  0.52  0.69  0.69  0.61  0.53  0.54  0.54  0.55  0.51  0.69  0.69  0.65  0.60
IO        0.51  0.52  0.55  0.54  0.61  0.61  0.56  0.46  0.46  0.46  0.47  0.51  0.61  0.61  0.60  0.57
IR        0.81  0.60  0.47  0.33  0.83  0.81  0.75  0.67  0.87  0.84  0.77  0.34  0.84  0.81  0.76  0.75
SE        0.61  0.53  0.21  0.15  0.57  0.57  0.43  0.17  0.39  0.40  0.44  0.27  0.57  0.57  0.55  0.41
SO        0.54  0.53  0.51  0.50  0.54  0.54  0.51  0.50  0.52  0.52  0.52  0.52  0.54  0.54  0.54  0.53
VE        0.35  0.33  0.29  0.26  0.37  0.37  0.35  0.28  0.36  0.36  0.36  0.31  0.37  0.37  0.36  0.33
VO        0.15  0.15  0.12  0.09  0.22  0.21  0.16  0.10  0.20  0.20  0.17  0.10  0.21  0.21  0.20  0.13
Total     0.55  0.51  0.43  0.39  0.58  0.58  0.52  0.42  0.50  0.50  0.49  0.38  0.58  0.58  0.56  0.50
          RAW                          Baseline            Semantic Patterns
Data set  EUC (N)  EUC (NN)  COS (NN)  EUC (NN)  COS (NN)  EUC (NN)  COS (NN)
Categorical
BC        0.52     0.53      0.53      0.52      0.53      0.54      0.54
DE        0.68     0.68      0.66      0.67      0.67      0.81      0.81
KR        0.54     0.54      0.54      0.54      0.54      0.52      0.52
LY        0.63     0.63      0.59      0.60      0.57      0.63      0.61
MU        0.64     0.64      0.57      0.64      0.64      0.68      0.67
SO        0.65     0.65      0.58      0.69      0.70      0.75      0.73
SP        0.48     0.48      0.44      0.48      0.48      0.62      0.57
VO        0.80     0.80      0.62      0.80      0.80      0.78      0.79
ZO        0.84     0.83      0.80      0.85      0.84      0.86      0.86
Total     0.64     0.64      0.59      0.64      0.64      0.69      0.68
Mixed
AN        0.64     0.64      0.44      0.64      0.65      0.65      0.66
CO        0.59     0.59      0.50      0.59      0.60      0.58      0.62
CA        0.62     0.62      0.55      0.61      0.61      0.61      0.65
CG        0.52     0.52      0.51      0.52      0.52      0.52      0.52
HC        0.86     0.86      0.81      0.85      0.85      0.86      0.87
HH        0.87     0.87      0.84      0.86      0.86      0.86      0.88
HE        0.59     0.59      0.52      0.61      0.60      0.63      0.65
Total     0.67     0.67      0.60      0.67      0.67      0.67      0.69
Numerical
BW        0.86     0.86      0.62      0.74      0.74      0.89      0.90
DI        0.55     0.55      0.53      0.54      0.54      0.55      0.56
GL        0.49     0.49      0.51      0.51      0.51      0.53      0.53
HS        0.64     0.64      0.54      0.63      0.63      0.66      0.69
IO        0.51     0.51      0.46      0.55      0.55      0.63      0.61
IR        0.81     0.81      0.87      0.73      0.73      0.81      0.83
SE        0.61     0.61      0.39      0.54      0.54      0.57      0.57
SO        0.54     0.54      0.52      0.54      0.54      0.54      0.54
VE        0.35     0.35      0.36      0.37      0.37      0.36      0.37
VO        0.15     0.15      0.20      0.21      0.21      0.22      0.21
Total     0.55     0.55      0.50      0.54      0.54      0.58      0.58