Transcript
Page 1: Towards comprehensive foundations of Computational Intelligence Włodzisław Duch Department of Informatics, Nicolaus Copernicus University, Toruń, Poland

Towards comprehensive foundations of Computational Intelligence

Włodzisław Duch

Department of Informatics, Nicolaus Copernicus University, Toruń, Poland

School of Computer Engineering, Nanyang Technological University, Singapore

Google: Duch

ICONIP HK 11/2006

Page 2:

Plan

• What is Computational Intelligence (CI)?

• What can we learn?

• Why solid foundations are needed.

• Similarity based framework.

• Transformations and heterogeneous systems.

• Meta-learning.

• Beyond pattern recognition.

• Scaling up intelligent systems to human level competence?

Page 3:

What is Computational Intelligence?

The Field of Interest of the Society shall be the theory, design, application, and development of biologically and linguistically motivated computational paradigms emphasizing neural networks, connectionist systems, genetic algorithms, evolutionary programming, fuzzy systems, and hybrid intelligent systems in which these paradigms are contained.

Artificial Intelligence (AI) was established in 1956!

AI Magazine 2005, Alan Mackworth: “In AI's youth, we worked hard to establish our paradigm by vigorously attacking and excluding apparent pretenders to the throne of intelligence, pretenders such as pattern recognition, behaviorism, neural networks, and even probability theory. Now that we are established, such ideological purity is no longer a concern. We are more catholic, focusing on problems, not on hammers. Given that we do have a comprehensive toolbox, issues of architecture and integration emerge as central.”

Page 4:

CI definition

Computational Intelligence. An International Journal (1984) + 10 other journals with “Computational Intelligence” in the title.

D. Poole, A. Mackworth & R. Goebel, Computational Intelligence - A Logical Approach. (OUP 1998), GOFAI book, logic and reasoning.

CI should:

• be problem-oriented, not method-oriented;

• cover all that the CI community is doing now, and is likely to do in the future;

• include AI – they also think they are CI ...

CI: science of solving (effectively) non-algorithmizable problems.

Problem-oriented definition, firmly anchored in computer science.

AI: focused on problems requiring higher-level cognition; the rest of CI is more focused on problems related to perception and control.

Page 5:

The future of computational intelligence ...

Page 6:

What can we learn?

A good part of CI is about learning. What can we learn?

Neural networks are universal approximators and evolutionary algorithms solve global optimization problems – so everything can be learned? Not quite ...

Duda, Hart & Stork, Ch. 9, No Free Lunch + Ugly Duckling Theorems:

• Uniformly averaged over all target functions, the expected error for all learning algorithms is the same.

• Averaged over all target functions, no learning algorithm yields generalization error that is superior to any other.

• There is no problem-independent or “best” set of features.

“Experience with a broad range of techniques is the best insurance for solving arbitrary new classification problems.”
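The No Free Lunch statement above can be checked by hand on a tiny case. The sketch below is an illustration only, not taken from the slides: it assumes a 2-bit problem where three input vectors are used for training and the fourth is held out, and three made-up learners. Averaged uniformly over all 16 target functions, each learner ends up with the same error on the held-out vector.

```python
from itertools import product

# Four possible inputs of a 2-bit problem; the last one is held out for testing.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def majority_learner(train_labels):
    """Predict the most frequent training label (ties go to 1)."""
    return int(2 * sum(train_labels) >= len(train_labels))


def contrarian_learner(train_labels):
    """Predict the opposite of the majority learner."""
    return 1 - majority_learner(train_labels)


def always_zero(train_labels):
    """Ignore the training data entirely."""
    return 0


def average_test_error(learner):
    """Held-out error averaged uniformly over all 16 target functions."""
    targets = list(product([0, 1], repeat=len(INPUTS)))
    errors = 0
    for target in targets:
        prediction = learner(target[:3])   # the learner sees only the training labels
        errors += int(prediction != target[3])
    return errors / len(targets)


for learner in (majority_learner, contrarian_learner, always_zero):
    print(learner.__name__, average_test_error(learner))
```

Whatever a learner predicts from the three training labels, exactly one of the two compatible targets disagrees with it on the held-out vector, which is why every learner scores 0.5 here.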

Page 7:

Data mining packages

• No free lunch => provide different types of tools for knowledge discovery: decision trees, neural, neurofuzzy, similarity-based, SVM, committees, tools for visualization of data.

• Support the process of knowledge discovery/model building and evaluating, organizing it into projects.

• Many other interesting DM packages of this sort exist: Weka, Yale, Orange, Knime ... 168 packages on the-data-mine.com list!

• We are building Intemi, a completely new set of tools.

Surprise! Almost nothing can be learned using such tools!

GhostMiner, data mining tools from our lab + Fujitsu: http://www.fqspl.com.pl/ghostminer/

• Separate the process of model building (hackers) and knowledge discovery, from model use (lamers) => GM Developer & Analyzer

Page 8:

What DM packages do?

Hundreds of components ... transforming, visualizing ...

Yale 3.3:

Type                        # components
Data preprocessing          74
Experiment operations       35
Learning methods            114
Metaoptimization schemes    17
Postprocessing              5
Performance validation      14

Visualization, presentation, plugin extensions ...

Visual “knowledge flow” to link components, or script languages (XML) to define complex experiments.

Page 9:

What NN components really do?

Vector mappings from the input space to hidden space(s) and to the output space + adapt parameters to improve cost functions.

Hidden-Output mapping done by MLPs:

T = {X_i}: training data, N-dimensional.

H = {h_j(T)}: image of X in the hidden space, j = 1..N_H (N_H-dimensional).

... more transformations in hidden layers

Y = {y_k(H)}: image of X in the output space, k = 1..N_C (N_C-dimensional).

ANN goal:

data image H in the last hidden space should be linearly separable; internal representations will determine network generalization.

But we never look at them!
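To make "looking at" the hidden image concrete, here is a minimal sketch for XOR. The weights are hand-set assumptions (one OR-like and one AND-like threshold unit), not a trained network from the talk. The script prints the image of each training vector in the hidden space and in the output space.

```python
def step(z):
    """Hard-threshold activation."""
    return 1 if z > 0 else 0


def hidden_image(x):
    """Map an input to the 2-D hidden space (hand-set OR-like and AND-like units)."""
    s = x[0] + x[1]
    return (step(s - 0.5), step(s - 1.5))


def output(h):
    """A single hyperplane in the hidden space: h1 - 2*h2 - 0.5 > 0."""
    return step(h[0] - 2 * h[1] - 0.5)


T = [(0, 0), (0, 1), (1, 0), (1, 1)]   # training data
labels = [0, 1, 1, 0]                  # XOR

H = [hidden_image(x) for x in T]       # image of T in the hidden space
Y = [output(h) for h in H]             # image of T in the output space

for x, h, y in zip(T, H, Y):
    print(x, "->", h, "->", y)
```

In the input space XOR is not linearly separable, but its hidden image collapses to three points, (0,0), (1,0) and (1,1), which one hyperplane separates.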

Page 10:

Why solid foundations are needed

Hundreds of components ... thousands of combinations ...

Our treasure box is full! We can publish forever!

But what would we really like to have?

Press the button and wait for the truth!

Computer power is with us, meta-learning should find all interesting data models = sequences of transformations/procedures.

Many considerations: optimal cost solutions, various costs of using feature subsets; models that are simple & easy to understand; various representations of knowledge: crisp, fuzzy or prototype rules, visualization, confidence in predictions ...

Page 11:

Principles: information compression

Neural information processing in perception and cognition: information compression, or algorithmic complexity. In computing: minimum length (message, description) encoding.

Wolff (2006): cognition and computation as compression by multiple alignment, unification and search. Analysis and production of natural language, fuzzy pattern recognition, probabilistic reasoning and unsupervised inductive learning.

So far only models for sequential data and 1D alignment.

Information compression: encoding new information in terms of old has been used to define the measure of syntactic and semantic information (Duch, Jankowski 1994); based on the size of the minimal graph representing a given data structure or knowledge-base specification, thus it goes beyond alignment.


Page 12:

Graphs of consistent concepts

Learn new concepts in terms of old ones: use a large semantic network and add new concepts by linking them to known ones.

Disambiguate concepts by spreading activation and selecting those that are consistent with already active subnetworks.

Page 13:

Similarity-based framework

(Dis)similarity:

• more general than feature-based description,

• no need for vector spaces (structured objects),

• more general than fuzzy approach (F-rules are reduced to P-rules),

• includes nearest neighbor algorithms, MLPs, RBFs, separable function networks, SVMs, kernel methods and many others.

Similarity-Based Methods (SBMs) are organized in a framework: p(Ci|X;M) posterior classification probability or y(X;M) approximators; models M are parameterized in increasingly sophisticated ways.

A systematic search (greedy, beam, evolutionary) in the space of all SBM models is used to select the optimal combination of parameters and procedures, opening different types of optimization channels, trying to discover the appropriate bias for a given problem.

Results: several candidate models; even a very limited version gives the best results in 7 out of 12 Statlog problems.

Page 14:

SBM framework

• Pre-processing: from objects (cases) O to features X, or directly to (dis)similarities D(O,O’).

• Calculation of similarity between features d(xi,yi) and objects D(X,Y).

• Reference (or prototype) vector R selection/creation/optimization.

• Weighted influence of reference vectors G(D(Ri,X)), i=1..k.

• Functions/procedures to estimate p(C|X;M) or y(X;M).

• Cost functions E[DT;M] and model selection/validation procedures.

• Optimization procedures for the whole model M_a.

• Search control procedures to create more complex models M_(a+1).

• Creation of ensembles of (local, competent) models.

• M = {X(O), d(.,.), D(.,.), k, G(D), {R}, {pi(R)}, E[.], K(.), S(.,.)}, where:

• S(Ci,Cj) is a matrix evaluating similarity of the classes;

• a vector of observed probabilities pi(X) instead of hard labels.

The kNN model p(Ci|X;kNN) = p(Ci|X;k,D(.),{DT}); the RBF model: p(Ci|X;RBF) = p(Ci|X;D(.),G(D),{R}), etc.
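As a rough illustration of how one framework slot changes the model, the sketch below writes kNN as p(C|X; k, D, {DT}) with the distance D passed in as a parameter. The function names and the toy training set are assumptions made for this example; they are not taken from GhostMiner or any other package.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance D(X,Y)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def canberra(x, y):
    """Canberra distance; terms with a zero denominator are skipped."""
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if abs(a) + abs(b) > 0)


def knn_posterior(x, train, k, dist):
    """Estimate p(C|X; k, D, {DT}) as class frequencies among the k nearest cases."""
    nearest = sorted(train, key=lambda case: dist(x, case[0]))[:k]
    counts = Counter(label for _, label in nearest)
    return {c: n / k for c, n in counts.items()}


# Toy training set DT (made up): the first feature carries the class,
# the second one is on a much larger scale.
DT = [((1.0, 100.0), "A"), ((1.2, 300.0), "A"),
      ((3.0, 110.0), "B"), ((3.3, 290.0), "B")]

x = (1.1, 280.0)
print("Euclidean:", knn_posterior(x, DT, k=1, dist=euclidean))
print("Canberra: ", knn_posterior(x, DT, k=1, dist=canberra))
```

With the Euclidean distance the large-scale second feature dominates and the query is assigned to class B; the Canberra distance normalizes each feature and assigns it to class A. Swapping D is one "optimization channel" in the sense used above.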

Page 15:

Meta-learning in SBM scheme

Start from kNN, k=1, all data & features, Euclidean distance, end with a model that is a novel combination of procedures and parameterizations.

k-NN: 67.5/76.6%

+ d(x,y), Canberra: 89.9/90.7%

+ si=(0,0,1,0,1,1): 71.6/64.4%

+ selection: 67.5/76.6%

+ k opt: 67.5/76.6%

+ d(x,y) + si=(1,0,1,0.6,0.9,1), Canberra: 74.6/72.9%

+ d(x,y) + selection or opt k, Canberra: 89.9/90.7%
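The search scheme can be sketched as a greedy loop: start from kNN with k=1, all features and the Euclidean distance, try every single modification, and keep the best one while it improves a validation score. Everything below is a simplified assumption for illustration (toy data, leave-one-out accuracy as the score, three kinds of modifications); it does not reproduce the data or the numbers listed above.

```python
import math
from collections import Counter


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def canberra(x, y):
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if abs(a) + abs(b) > 0)


DISTANCES = {"euclidean": euclidean, "canberra": canberra}

# Toy data (made up): feature 0 carries the class, feature 1 is large-scale noise.
DATA = [((1.0, 100.0), "A"), ((1.2, 300.0), "A"), ((0.9, 200.0), "A"), ((1.1, 400.0), "A"),
        ((3.0, 110.0), "B"), ((3.3, 290.0), "B"), ((2.9, 210.0), "B"), ((3.1, 390.0), "B")]


def score(data, model):
    """Leave-one-out accuracy of a kNN model given by k, distance name and feature scaling."""
    dist, k, scale = DISTANCES[model["dist"]], model["k"], model["scale"]
    scaled = [([a * s for a, s in zip(x, scale)], label) for x, label in data]
    hits = 0
    for i, (x, label) in enumerate(scaled):
        rest = scaled[:i] + scaled[i + 1:]
        nearest = sorted(rest, key=lambda case: dist(x, case[0]))[:k]
        vote = Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
        hits += int(vote == label)
    return hits / len(data)


def one_step_changes(model):
    """Models differing from `model` by the distance, by k, or by one switched-off feature."""
    for name in DISTANCES:
        if name != model["dist"]:
            yield {**model, "dist": name}
    for k in (1, 3):
        if k != model["k"]:
            yield {**model, "k": k}
    active = [i for i, s in enumerate(model["scale"]) if s != 0]
    if len(active) > 1:
        for i in active:
            scale = list(model["scale"])
            scale[i] = 0.0
            yield {**model, "scale": tuple(scale)}


def greedy_search(data, start):
    """Accept the best single change as long as it improves the score."""
    best, best_score = start, score(data, start)
    while True:
        top_score, top = max(((score(data, m), m) for m in one_step_changes(best)),
                             key=lambda pair: pair[0])
        if top_score <= best_score:
            return best, best_score
        best, best_score = top, top_score


START = {"dist": "euclidean", "k": 1, "scale": (1.0, 1.0)}
print("start:", START, score(DATA, START))
print("found:", *greedy_search(DATA, START))
```

A beam or evolutionary search would replace only `greedy_search`; the model description and the scoring function stay the same.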

Page 16:

Transformation-based framework

Extend SBM adding fine granulation of methods and relations between them to enable meta-learning by search in the model space.

For example, first transformation (layer) after pre-processing:

• PCA networks, with each node computing principal component.

• LDA networks, each node computes LDA direction (including FDA).

• ICA networks, nodes computing independent components.

• KL, or Kullback-Leibler networks with orthogonal or non-orthogonal components; max. of mutual information is a special case.

• χ² and other statistical tests for dependency to aggregate features.

• Factor analysis networks, computing common and unique factors.

• Matching pursuit networks for signal decomposition.

Evolving Transformation Systems (Goldfarb 1990-2006), unified paradigm for inductive learning and structural representations.

Page 17:

Heterogeneous systems

Problems requiring different scales or types.

2-class problems, two situations:

C1 inside the sphere, C2 outside.

MLP: at least N+1 hyperplanes, O(N^2) parameters.

RBF: 1 Gaussian, O(N) parameters.

C1 in the corner defined by the (1,1 ... 1) hyperplane, C2 outside.

MLP: 1 hyperplane, O(N) parameters.

RBF: many Gaussians, O(N^2) parameters, poor approximation.

Combination: needs both hyperplane and hypersphere!

Logical rule: IF x1>0 & x2>0 THEN C1 Else C2

is not represented properly by MLP and RBF!

Different types of functions in one model, first step beyond inspirations from single neurons => heterogeneous models.

Page 18:

Heterogeneous everything

Homogeneous systems: one type of “building blocks”, same type of decision borders, e.g. neural networks, SVMs, decision trees, kNNs.

Committees combine many models together, but lead to complex models that are difficult to understand.

Ockham's razor: simpler systems are better. Discovering simplest class structures, inductive bias of the data, requires Heterogeneous Adaptive Systems (HAS).

HAS examples:

• NN with different types of neuron transfer functions.

• k-NN with different distance functions for each prototype.

• Decision Trees with different types of test criteria.

1. Start from large networks, use regularization to prune.

2. Construct network adding nodes selected from a candidate pool.

3. Use very flexible functions, force them to specialize.

Page 19:

Taxonomy of NN activation functions

Page 20:

Taxonomy of NN output functions

Perceptron: implements logical rule x > θ for x with Gaussian uncertainty.

Page 21:

Taxonomy - TF

Page 22:

HAS decision trees

Decision trees select the best feature/threshold value for univariate and multivariate trees:

$X_i < \theta_k$ or $T(X; W, \theta_k) = \sum_i W_i X_i < \theta_k$

Decision borders: hyperplanes.

Introducing tests based on the $L_\alpha$ Minkowski metric:

$T(X; R, \theta_R) = \|X - R\|_\alpha = \left( \sum_i |X_i - R_i|^\alpha \right)^{1/\alpha} < \theta_R$

For L2, spherical decision borders are produced.

For L∞, rectangular borders are produced.

Many choices, for example Fisher Linear Discrimination decision trees.

For large databases, first cluster the data to get candidate references R.
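The three kinds of node tests mentioned on this slide (single feature, linear combination, distance from a reference vector R) can be sketched as interchangeable predicates. The helper names, the reference vector and the thresholds below are hypothetical values chosen for illustration; this is not the SSV implementation.

```python
def minkowski(x, r, alpha):
    """L_alpha distance between x and r; alpha=float('inf') gives the L-infinity case."""
    diffs = [abs(a - b) for a, b in zip(x, r)]
    if alpha == float("inf"):
        return max(diffs)
    return sum(d ** alpha for d in diffs) ** (1.0 / alpha)


def feature_test(i, theta):
    """Univariate test X_i < theta (axis-parallel hyperplane)."""
    return lambda x: x[i] < theta


def linear_test(w, theta):
    """Multivariate test sum_i W_i X_i < theta (oblique hyperplane)."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) < theta


def distance_test(r, theta, alpha=2):
    """Prototype test ||X - R||_alpha < theta (sphere for L2, box for L-infinity)."""
    return lambda x: minkowski(x, r, alpha) < theta


R = (0.0, 0.0)      # hypothetical reference vector
x = (0.8, 0.8)      # hypothetical case to classify

print("X_0 < 1:          ", feature_test(0, 1.0)(x))
print("X_0 + X_1 < 1:    ", linear_test((1.0, 1.0), 1.0)(x))
print("||X - R||_2 < 1:  ", distance_test(R, 1.0, alpha=2)(x))
print("||X - R||_inf < 1:", distance_test(R, 1.0, alpha=float("inf"))(x))
```

The same point falls outside the unit sphere (L2) but inside the unit box (L∞), which is the difference in border shape noted above. A heterogeneous tree would let each node pick whichever test type scores best.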

Page 23:

SSV HAS DT example

SSV HAS tree in GhostMiner 3.0, Wisconsin breast cancer (UCI).

699 cases, 9 features (cell parameters, 1..10).

Classes: benign 458 (65.5%) & malignant 241 (34.5%).

Single rule gives simplest known description of this data:

IF ||X-R303|| < 20.27 THEN malignant ELSE benign

This rule comes up most often in 10xCV.

97.4% accuracy (18 errors); good prototype for malignant!

Simple thresholds, that’s what MDs like the most!

Best 10CV around 97.5±1.8% (Naïve Bayes + kernel, or SVM)

SSV without distances: 96.4±2.1%

C4.5 gives 94.7±2.0%

Several simple rules of similar accuracy but different specificity or sensitivity may be created using HAS DT. Need to select or weight features and select good prototypes.

Page 24:

How much can we learn?

Linearly separable or almost separable problems are relatively simple – deform or add dimensions to make data separable.

How to define “slightly non-separable”? There is only separable and the vast realm of the rest.

Page 25:

Difficult case: complex logic

For n bits there are 2^n nodes; in extreme cases such as parity all neighbors are from the wrong class, so localized networks will fail.

Achieving linear separability without special architecture may be impossible.

Projection on 111 ... 111 gives clusters with 0, 1, 2 ... n bits.

Page 26:

Easy and difficult problems

Linear separation: good goal if simple topological deformation of decision borders is sufficient.

Linear separation of such data is possible in higher dimensional spaces; this is frequently the case in pattern recognition problems. RBF/MLP networks with one hidden layer solve such problems.

Difficult problems: disjoint clusters, complex logic.

Continuous deformation is not sufficient; networks with localized functions need an exponentially large number of nodes.

Boolean functions: for n bits there are K=2^n binary vectors that can be represented as vertices of an n-dimensional hypercube. Each Boolean function is identified by K bits: BoolF(Bi) = 0 or 1 for i=1..K, giving 2^K Boolean functions.

Ex: n=2, vectors {00,01,10,11}, Boolean functions {0000, 0001 ... 1111}, decimal numbers 0 to 15.

Page 27:

Boolean functions

n=2, 16 functions, 12 separable, 4 not separable.

n=3, 256 f, 104 separable (41%), 152 not separable.

n=4, 64K=65536, only 1880 separable (3%)

n=5, 4G, but << 1% separable ... bad news!
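For small n the counts can be checked by brute force. The sketch below enumerates threshold units [W.X > θ] with small integer weights and collects the distinct truth tables they realize. It relies on the assumption, true for n ≤ 3, that weights in {-2..2} and half-integer thresholds are enough; it is not meant for larger n. For n=2 it finds 14 truth tables including the two constant functions, i.e. 12 non-constant ones, which appears to be the convention behind the figure of 12 above; for n=3 it finds 104, constants included.

```python
from itertools import product


def separable_functions(n, max_weight=2):
    """Truth tables realizable as [W.X > theta] with small integer weights.

    Assumption: integer weights in [-max_weight, max_weight] and half-integer
    thresholds suffice, which holds for n <= 3.
    """
    vertices = list(product([0, 1], repeat=n))
    bound = n * max_weight
    found = set()
    for w in product(range(-max_weight, max_weight + 1), repeat=n):
        for t in range(-bound - 1, bound + 1):       # theta = t + 0.5
            table = tuple(
                int(2 * sum(wi * xi for wi, xi in zip(w, x)) > 2 * t + 1)
                for x in vertices)
            found.add(table)
    return found


for n in (2, 3):
    tables = separable_functions(n)
    non_constant = [t for t in tables if 0 < sum(t) < len(t)]
    print(f"n={n}: {2 ** (2 ** n)} functions, "
          f"{len(tables)} separable, {len(non_constant)} of them non-constant")
```

The two functions missing for n=2 are XOR and its negation.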

Existing methods may learn some non-separable functions, but most functions cannot be learned!

Example: n-bit parity problem; many papers in top journals.

No off-the-shelf systems are able to solve such problems.

For all parity problems SVM is below base rate! Such problems are solved only by special neural architectures or special classifiers – if the type of function is known.

But parity is still trivial ... solved by

$y = \cos\left( \omega \sum_{i=1}^{n} b_i \right)$
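A quick check of the single-node parity solution, assuming the formula is a cosine of the bit sum with the frequency set to π (the value of the frequency is an assumption made here): cos(π Σ b_i) equals +1 when the number of set bits is even and -1 when it is odd.

```python
import math
from itertools import product


def parity_by_cosine(bits, omega=math.pi):
    """y = cos(omega * sum(b_i)); with omega = pi: +1 for even sums, -1 for odd sums."""
    return math.cos(omega * sum(bits))


# Exhaustive check for all bit strings up to n = 8.
for n in range(1, 9):
    for bits in product([0, 1], repeat=n):
        expected = 1 if sum(bits) % 2 == 0 else -1
        assert round(parity_by_cosine(bits)) == expected

print("cos(pi * sum b_i) reproduces parity for n = 1..8")
```

The point is that a single periodic unit needs no hidden layer at all, provided the type of function is known in advance.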

Page 28:

Learning trajectories

• Take weights Wi from iterations i = 1..K;

PCA on Wi covariance matrix usually captures 95-98% variance, so error function in 2D shows realistic learning trajectories.

Instead of local minima large flat valleys are seen – why?

Data far from decision borders has almost no influence, the main reduction of MSE is achieved by increasing ||W||, sharpening sigmoidal functions.

Papers by M. Kordos & W. Duch

Page 29:

RBF for XOR

Is an RBF solution with 2 hidden Gaussian nodes possible?

Typical architecture: 2 inputs – 2 Gaussians – 1 linear output, EM training.

50% errors, but there is perfect separation - not a linear separation! Network knows the answer, but cannot say it ...

Single Gaussian output node may solve the problem. Output weights provide reference hyperplanes (red and green lines), not the separating hyperplanes like in case of MLP.

Page 30:

3-bit parity

For RBF, parity problems are difficult; 8-node solution:

1) Output activity; 2) reduced output, summing activity of 4 nodes; 3) hidden 8D space activity, near the ends of coordinate unit vectors; 4) parallel coordinate representation.

8 nodes solution has zero generalization, 50% errors in L1O.

Page 31:

3-bit parity in 2D and 3D

Output is mixed, errors are at base level (50%), but in the hidden space ...

Conclusion: separability in the hidden space is perhaps too much to desire ... inspection of clusters is sufficient for perfect classification; add second Gaussian layer to capture this activity; train second RBF on the data (stacking), reducing number of clusters.

Page 32:

Goal of learning

If simple topological deformation of decision borders is sufficient, linear separation is possible in higher dimensional spaces, “flattening” non-linear decision borders; this is frequently the case in pattern recognition problems. RBF/MLP networks with one hidden layer solve the problem.

For complex logic this is not sufficient; networks with localized functions need exponentially large number of nodes.

Such situations arise in AI problems, real perception, object recognition, text analysis, bioinformatics ...

Linear separation is too difficult, set an easier goal. Linear separation: projection on 2 half-lines in the kernel space: line y=WX, with y<0 for class – and y>0 for class +.

Simplest extension: separation into k-intervals. For parity: find direction W with minimum # of intervals, y=W.X
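Counting intervals along a projection is easy to sketch. The helper below (a hypothetical name, not from the talk) projects labelled binary vectors on y = W.X, sorts the projected values, and counts how many times the class changes along the line. For n-bit parity and W = (1, ..., 1) the vectors fall into clusters with 0, 1, ..., n set bits and alternating classes, so n+1 intervals are needed.

```python
from itertools import product


def count_intervals(w, data):
    """Number of single-class intervals along y = W.X.

    Returns None if vectors from different classes project to the same value.
    """
    classes_at = {}
    for x, label in data:
        y = sum(wi * xi for wi, xi in zip(w, x))
        classes_at.setdefault(y, set()).add(label)
    if any(len(labels) > 1 for labels in classes_at.values()):
        return None
    ordered = [next(iter(classes_at[y])) for y in sorted(classes_at)]
    return 1 + sum(a != b for a, b in zip(ordered, ordered[1:]))


def parity_data(n):
    """All n-bit vectors labelled with the parity of their bit sum."""
    return [(x, sum(x) % 2) for x in product([0, 1], repeat=n)]


for n in range(2, 7):
    k = count_intervals((1,) * n, parity_data(n))
    print(f"{n}-bit parity, W = (1,...,1): {k} intervals")
```

A projection that ignores a bit, such as W = (1, 0) for 2-bit parity, mixes both classes at the same value and is rejected.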

Page 33:

k-separability

Can one learn all Boolean functions?

Problems may be classified as 2-separable (linear separability); non separable problems may be broken into k-separable, k>2.

Blue: sigmoidal neurons with threshold, brown – linear neurons.

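[Figure: network diagram with inputs X1..X4 feeding a linear unit y = W.X, followed by three threshold units whose outputs are combined with weights +1 and −1.]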

Neural architecture for k=4 intervals, or 4-separable problems.

Page 34:

k-sep learning

Try to find the lowest k with a good solution, starting from k=2.

• Assume k=2 (linear separability), try to find a good solution;

• if k=2 is not sufficient, try k=3; two possibilities are C+, C−, C+ and C−, C+, C−; this requires only one interval for the middle class;

• if k<4 is not sufficient, try k=4; two possibilities are C+, C−, C+, C− and C−, C+, C−, C+; this requires one closed and one open interval.

Network solution is equivalent to optimization of specific cost function.

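[Equation: cost function E, summed over all training vectors X and expressed through the projections W.X and the class labels C; the formula is not recoverable from this transcript.]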

Simple backpropagation solved almost all n=4 problems for k=2-5 finding lowest k with such architecture! Parity-like problems up to n=8 work fine.
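The "lowest k" idea can be illustrated without any training. The sketch below swaps backpropagation for an exhaustive search over small integer weight vectors, which is only feasible for tiny n; the weight range is an assumption made for this example. For each candidate W it counts the class intervals along y = W.X and keeps the smallest count.

```python
from itertools import product


def count_intervals(w, data):
    """Single-class intervals along y = W.X, or None if classes overlap at some value."""
    classes_at = {}
    for x, label in data:
        y = sum(wi * xi for wi, xi in zip(w, x))
        classes_at.setdefault(y, set()).add(label)
    if any(len(labels) > 1 for labels in classes_at.values()):
        return None
    ordered = [next(iter(classes_at[y])) for y in sorted(classes_at)]
    return 1 + sum(a != b for a, b in zip(ordered, ordered[1:]))


def lowest_k(data, n, max_weight=2):
    """Smallest k, and one direction W achieving it, over integer weights in [-max_weight, max_weight]."""
    best = None
    for w in product(range(-max_weight, max_weight + 1), repeat=n):
        k = count_intervals(w, data)
        if k is not None and (best is None or k < best[0]):
            best = (k, w)
    return best


def boolean_data(n, f):
    """All n-bit vectors labelled by the Boolean function f."""
    return [(x, int(f(x))) for x in product([0, 1], repeat=n)]


examples = {
    "AND (n=2)": boolean_data(2, lambda x: x[0] and x[1]),
    "XOR (n=2)": boolean_data(2, lambda x: x[0] != x[1]),
    "parity (n=3)": boolean_data(3, lambda x: sum(x) % 2),
}
for name, data in examples.items():
    k, w = lowest_k(data, len(data[0][0]))
    print(f"{name}: k = {k} with W = {w}")
```

AND is 2-separable (linearly separable), XOR needs 3 intervals and 3-bit parity needs 4.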

Page 35:

A better solution?

What is needed to learn Boolean functions?

• cluster non-local areas in the X space, use W.X

• capture local clusters after transformation, use G(W.X)

SVM cannot solve this problem! The number of directions W that should be considered grows exponentially with the size of the problem n.

Constructive neural network solution:

1. Train the first neuron using the G(W.X) transfer function on the whole data T, capture the largest pure cluster TC.

2. Train the next neuron on the reduced data T1 = T − TC.

3. Repeat until all data is handled; this creates the transformation X => H.

4. Use a linear transformation H => Y for classification.

Page 36:

Beyond pattern recognition

A step towards problems requiring combinatorial reasoning: learning from partial observations. We have observed unicorns, and know/guess/infer that:

• If the unicorn is mythical, then it is immortal.

• But if it is not mythical, then it is a mortal mammal.

• If the unicorn is either immortal or a mammal, then it is horned.

• The unicorn is magical if it is horned.

Can you draw any firm conclusions about unicorns?

Variables: mythical, mortal, mammal, horned, magical.

Using intuitive computing – search based on neural heuristics – the answer may be found quickly ...
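For reference, the puzzle is small enough to settle by plain enumeration. The sketch below is not the intuitive, heuristic search referred to above; it simply lists all 32 truth assignments, keeps those consistent with the four statements, and reports the variables that take the same value in every consistent assignment. "Immortal" is modelled as "not mortal", which is an assumption of this encoding.

```python
from itertools import product

VARIABLES = ["mythical", "mortal", "mammal", "horned", "magical"]


def implies(a, b):
    """Material implication a => b."""
    return (not a) or b


def consistent(v):
    """True if the assignment v satisfies all four statements about the unicorn."""
    return (implies(v["mythical"], not v["mortal"])
            and implies(not v["mythical"], v["mortal"] and v["mammal"])
            and implies((not v["mortal"]) or v["mammal"], v["horned"])
            and implies(v["horned"], v["magical"]))


assignments = (dict(zip(VARIABLES, values))
               for values in product([False, True], repeat=len(VARIABLES)))
models = [v for v in assignments if consistent(v)]

# A conclusion is firm if the variable has the same value in every consistent model.
firm = {name: models[0][name] for name in VARIABLES
        if len({m[name] for m in models}) == 1}

print(len(models), "consistent assignments")
print("firm conclusions:", firm)
```

Only two conclusions are firm: the unicorn is horned and magical. Whether it is mythical, mortal or a mammal stays undetermined.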

Page 37:

Scaling-up to human level ...

Recent discussions (see also our Friday panel):

• Roadmap to human level intelligence + special session – WCCI 2006

• Cognitive Systems, ICANN panel 2007

• Books: Challenges to CI (with J Mandziuk); Roadmap (with J Taylor)

The roadmap to human-level intelligence should define steps and challenges on the way to human level of competence.

Neuromorphic, mesoscopic, hybrid neuro-symbolic architectures? Designs for artificial brains, blueprints for billion neuron cortex models, scalable neuromorphic approaches.

Cognitive/affective architectures, artificial minds with human characteristics require integration of perception, affect and cognition, large-scale semantic memories, implementing control/attention.

Is human-style creativity using CI possible?

What is the role of CI in brain-like computing systems?

Page 38:

Summary

• CI is a branch of science dealing with problems for which effective algorithms do not exist; it includes AI, machine learning and all the rest.

• Similarity-based framework enables meta-learning as search in the model space, heterogeneous systems add fine granularity.

• Transformation-based learning extends that, formalizing component-based approach to DM, automating discovery of interesting models.

• Known and new learning methods result from such framework.

• No off-the-shelf classifiers are able to learn difficult Boolean functions.

• Visualization of activity of the hidden neurons shows that frequently perfect but non-separable solutions are found despite base-rate outputs.

• Linear separability is not the best goal of learning, other targets that allow for easy handling of final non-linearities should be defined.

• Simplest extension is to isolate non-linearity in form of k intervals, breaking the non-separable class of problems into k-separable classes.

Many interesting new methods arise from this line of thinking.

Page 39:

Thank you for lending your ears ...

Google: Duch => Papers & presentations

