Towards CI Foundations
Włodzisław Duch
Department of Informatics, Nicolaus Copernicus University, Toruń, Poland
Google: W. Duch
WCCI’08 Panel Discussion
Questions
• Nature of CI
• Current state of CI
• Promoting CI
• CI and Smart Adaptive Systems
• CI and Nature-inspiration
• Future of CI
CI definition
Computational Intelligence. An International Journal (1984) + 10 other journals with “Computational Intelligence”,
D. Poole, A. Mackworth & R. Goebel, Computational Intelligence - A Logical Approach. (OUP 1998), GOFAI book, logic and reasoning.
CI should:
• be problem-oriented, not method-oriented;
• cover all that the CI community is doing now, and is likely to do in the future;
• include AI – they also think they are CI ...
CI: science of solving (effectively) non-algorithmizable problems.
Problem-oriented definition, firmly anchored in computer science/engineering.
AI: focused on problems requiring higher-level cognition; the rest of CI is more focused on problems related to perception/action/control.
Are we really so good?
Surprise!
Almost nothing can be learned using current CI tools!
Ex: complex logic; natural language; natural perception.
How much can we learn?
Linearly separable or almost separable problems are relatively simple – deform or add dimensions to make data separable.
How to define “slightly non-separable”? There is only separable and the vast realm of the rest.
Boolean functions
n=2, 16 functions, 14 separable, 2 not separable (XOR and its negation).
n=3, 256 functions, 104 separable (41%), 152 not separable.
n=4, 64K=65536 functions, only 1882 separable (3%).
n=5, 4G functions, but << 1% separable ... bad news!
Existing methods may learn some non-separable functions, but most functions cannot be learned!
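The separable counts for small n can be checked by brute force. The sketch below is only an illustration: it enumerates small integer weight vectors and half-integer thresholds and collects the distinct Boolean functions they realize. The function name `count_separable` and the weight bound `max_weight` are assumptions made here, not part of the talk; the bound must be large enough for the chosen n (a bound of 3 is more than enough for n=2 and n=3).

```python
from itertools import product

def count_separable(n, max_weight=3):
    """Count n-bit Boolean functions realizable as [W.X > theta] (illustrative sketch)."""
    vertices = list(product([0, 1], repeat=n))
    # Integer weights give integer projections, so half-integer thresholds avoid ties.
    thresholds = [t + 0.5 for t in range(-n * max_weight - 1, n * max_weight + 1)]
    found = set()
    for w in product(range(-max_weight, max_weight + 1), repeat=n):
        projections = [sum(wi * xi for wi, xi in zip(w, x)) for x in vertices]
        for theta in thresholds:
            # Each (W, theta) pair defines one truth table over the cube vertices.
            found.add(tuple(p > theta for p in projections))
    return len(found)

if __name__ == "__main__":
    for n in (2, 3):
        print(n, 2 ** (2 ** n), count_separable(n))
```

For n=2 the enumeration finds 14 separable functions (all except XOR and its negation), and for n=3 it finds 104. Larger n needs a larger weight bound and quickly becomes expensive, which is itself a reminder of how fast the space of Boolean functions grows.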
Example: n-bit parity problem; many papers in top journals.
No off-the-shelf systems are able to solve such problems.
For parity problems SVM may go below base rate! Such problems are solved only by special neural architectures or special classifiers – if the type of function is known.
But parity is still trivial ... solved by $y = \cos\left(\omega \sum_{i=1}^{n} b_i\right)$
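A quick check of the single-node cosine solution, as a minimal sketch. It assumes the frequency is set to ω = π, so that the output is +1 for an even number of ones and −1 for an odd number; the function name `parity_by_cosine` is chosen here for illustration.

```python
import math
from itertools import product

def parity_by_cosine(bits, omega=math.pi):
    """Return +1 for even parity, -1 for odd parity, using y = cos(omega * sum(bits))."""
    # round() removes the tiny floating-point error in cos(k * pi).
    return round(math.cos(omega * sum(bits)))

if __name__ == "__main__":
    for bits in product([0, 1], repeat=3):
        print(bits, parity_by_cosine(bits))
```

The node needs only the projection on W = [1, ..., 1]; all the difficulty of parity for standard classifiers comes from insisting on a monotonic transfer function and a single separating hyperplane.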
kD case
3-bit functions: X=[b1,b2,b3], from [0,0,0] to [1,1,1]
f(b1,b2,b3) and ¬f(b1,b2,b3) are symmetric (color change)
8 cube vertices, 2^8=256 Boolean functions.
0 to 8 red vertices: 1, 8, 28, 56, 70, 56, 28, 8, 1 functions.
For arbitrary direction W index projection W.X gives:
k=1 in 2 cases, all 8 vectors in 1 cluster (all black or all white)
k=2 in 14 cases, 8 vectors in 2 clusters (linearly separable)
k=3 in 42 cases, clusters B R B or W R W
k=4 in 70 cases, clusters R W R W or W R W R
Symmetrically, k=5-8 for 70, 42, 14, 2.
Most logical functions have 4 or 5-separable projections.
Learning = find best projection for each function. Number of k=1 to 4-separable functions is: 2, 102, 126 and 26.
126 of all functions may be learned using 3-separability.
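The per-direction counts listed above (2, 14, 42, 70, 70, 42, 14, 2) can be reproduced by fixing one direction and counting, for every 3-bit function, how many pure clusters appear along the projection line. This is a sketch under one assumption: the direction W = (1, 2, 4) is chosen here only because it gives 8 distinct projections; any direction without ties gives the same histogram. The helper names are illustrative.

```python
from collections import Counter
from itertools import product

def count_clusters(w, labels, vertices):
    """Number of same-class runs when vertices are ordered by the projection W.X."""
    order = sorted(range(len(vertices)),
                   key=lambda i: sum(wi * xi for wi, xi in zip(w, vertices[i])))
    seq = [labels[i] for i in order]
    # A new cluster starts wherever the class changes along the line.
    return 1 + sum(a != b for a, b in zip(seq, seq[1:]))

def cluster_histogram(w=(1, 2, 4)):
    """Map k -> number of 3-bit Boolean functions with k clusters along direction w."""
    vertices = list(product([0, 1], repeat=3))
    hist = Counter()
    for labels in product([0, 1], repeat=len(vertices)):  # all 256 functions
        hist[count_clusters(w, labels, vertices)] += 1
    return dict(sorted(hist.items()))

if __name__ == "__main__":
    print(cluster_histogram())
```

Learning in this picture means searching over directions W for the one that minimizes k for a given function; the code above evaluates only a single fixed direction and does not perform that search.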
RBF for XOR
Is an RBF solution with 2 hidden Gaussian nodes possible?
Typical architecture: 2 inputs – 2 Gaussians – 1 linear output, ML training
50% errors, but there is perfect separation - not a linear separation! Network knows the answer, but cannot say it ...
A single Gaussian output node may solve the problem. Output weights provide reference hyperplanes (red and green lines), not separating hyperplanes as in the case of MLP.
3-bit parity in 2D and 3D
Output is mixed, errors are at base level (50%), but in the hidden space ...
Conclusion: separability in the hidden space is perhaps too much to desire ... inspection of clusters is sufficient for perfect classification; add second Gaussian layer to capture this activity; train second RBF on the data (stacking), reducing number of clusters.
Spying on networks
After initial transformation, what still needs to be done?
Conclusion: separability in the hidden space is perhaps too much to desire ... rules, similarity or linear separation, depending on the case.
Parity n=9
Simple gradient learning; quality index shown below.
More meta-learning
Meta-learning: learning how to learn, replacing experts who search for the best models by running many experiments.
The search space of models is too large to explore exhaustively; design the system architecture to support knowledge-based search.
• Abstract view, uniform I/O, uniform results management.
• Directed acyclic graphs (DAG) of boxes representing scheme placeholders and particular models, interconnected through I/O.
• Configuration level for meta-schemes, expanded at runtime level.
An exercise in software engineering for data mining!
Intemi, Intelligent Miner
Meta-schemes: templates with placeholders
• May be nested; the role decided by the input/output types.
• Machine learning generators based on meta-schemes.
• Granulation level allows novel methods to be created.
• Complexity control: Length + log(time)
• A unified meta-parameters description ...
• InteMi, intelligent miner, coming “soon”.
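The idea of a meta-scheme as a template with placeholders, expanded into concrete configurations and ordered by complexity, can be sketched in a few lines. Everything in this sketch is hypothetical and is not the Intemi API: the placeholder names, the candidate models, and the (length, seconds) cost figures are made-up illustrative values, and the natural logarithm is an assumed choice for the "Length + log(time)" measure.

```python
import math
from itertools import product

# Hypothetical meta-scheme: each placeholder lists the models that may fill it.
META_SCHEME = {
    "transform": ["none", "standardize", "pca"],
    "classifier": ["knn", "svm", "tree"],
}

# Made-up illustrative costs: model -> (description length, estimated seconds).
COST = {
    "none": (1, 1.0), "standardize": (2, 1.0), "pca": (4, 5.0),
    "knn": (2, 10.0), "svm": (5, 60.0), "tree": (3, 5.0),
}

def expand(meta_scheme):
    """Yield every concrete configuration obtained by filling all placeholders."""
    names = list(meta_scheme)
    for combo in product(*(meta_scheme[name] for name in names)):
        yield dict(zip(names, combo))

def complexity(config, cost):
    """Length + log(time), summed over the models in one configuration."""
    length = sum(cost[model][0] for model in config.values())
    seconds = sum(cost[model][1] for model in config.values())
    return length + math.log(seconds)

def ordered_candidates(meta_scheme, cost):
    """Configurations sorted from simplest to most complex."""
    return sorted(expand(meta_scheme), key=lambda cfg: complexity(cfg, cost))

if __name__ == "__main__":
    for cfg in ordered_candidates(META_SCHEME, COST):
        print(round(complexity(cfg, COST), 2), cfg)
```

Ordering candidates this way lets a search try cheap, short schemes first and move to more complex ones only when the simple ones fail, instead of exploring the whole space exhaustively.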
Biological justification
• Cortical columns may learn to respond to stimuli with complex logic, resonating in different ways.
• The second column will learn without problems that such different reactions have the same meaning: inputs xi and training targets yj are the same => Hebbian learning Wij ~ xi yj => identical weights.
• Effect: same line y=W.X projection, but inhibition turns off one perceptron when the other is active.
• Simplest solution: oscillators based on a combination of two neurons, σ(W.X-b) – σ(W.X-b’), give localized projections!
• We have used them in MLP2LN architecture for extraction of logical rules from data.
• Note: k-sep. learning is not a multistep output neuron, targets are not known, same class vectors may appear in different intervals!
• We need to learn how to find intervals and how to assign them to classes; new algorithms are needed to learn it!
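The two ingredients mentioned in the list above can be illustrated with a small sketch: finding class-pure intervals along a given projection, and building a localized "window" unit from two neurons. It assumes the two neurons are sigmoidal, so that their difference σ(W.X−b) − σ(W.X−b’) is close to 1 inside [b, b’] and close to 0 outside; the function names, the slope parameter, and the fixed direction W = (1, 1, 1) are illustrative choices. It does not learn W, which is the hard part that new algorithms have to address.

```python
import math
from itertools import product

def find_intervals(projections, labels):
    """Group points, sorted by projection, into maximal same-class intervals.

    Returns a list of (low, high, label). Assumes points with equal
    projections share a label; mixed ties would be split arbitrarily.
    """
    intervals = []
    for z, y in sorted(zip(projections, labels)):
        if intervals and intervals[-1][2] == y:
            low, _, _ = intervals[-1]
            intervals[-1] = (low, z, y)   # extend the current interval
        else:
            intervals.append((z, z, y))   # class changed: start a new interval
    return intervals

def soft_window(z, b, b_prime, slope=10.0):
    """Difference of two sigmoids: a soft window localized on [b, b_prime]."""
    def sigma(t):
        return 1.0 / (1.0 + math.exp(-t))
    return sigma(slope * (z - b)) - sigma(slope * (z - b_prime))

if __name__ == "__main__":
    vertices = list(product([0, 1], repeat=3))
    labels = [sum(x) % 2 for x in vertices]   # 3-bit parity targets
    projections = [sum(x) for x in vertices]  # projection on W = (1, 1, 1)
    print(find_intervals(projections, labels))
    print(soft_window(1.0, 0.5, 1.5), soft_window(3.0, 0.5, 1.5))
```

For 3-bit parity the projection on W = (1, 1, 1) yields four alternating intervals, so the function is 4-separable along this direction; one window unit per interval of a given class is then enough to assign intervals to classes.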