![Page 1: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/1.jpg)
Challenges in Physically Inspired Machine Learning (PIALM Task Force)
Dymitr Ruta, PhD
BT Group
Bogdan Gabrys, PhD
Bournemouth University
![Page 2: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/2.jpg)
© British Telecommunications plc
Agenda
• Motivation
• The links between physics and information theory
• Data fields methodology for classification, clustering, data condensation and visualisation
• Information theoretic learning (ITL)
• Hybrid large scale optimisation (simulations)
• Concluding remarks
• Discussion, Q&A
![Page 3: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/3.jpg)
Motivation (Business): Ability to Predict is the Key to Survival & Success
• DISCOVER the data driven problem that can be improved using intelligent learning techniques
• DESCRIBE the problem and its characteristics, prior knowledge, input and output data
• MODEL the relationship between inputs and outputs by adopting existing algorithms or devising new ones
• LEARN to reproduce outputs based on previously unseen inputs
• PREDICT the future outputs and save/earn money
Extract Clean Pre-process …… Tune Implement Productise Deploy Support …
![Page 4: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/4.jpg)
Motivation (Personal): Physical phenomena guide artificial learning mechanisms
• Deep analogies between information and matter, energy and uncertainty, complexity and entropy
• Any knowledge can only be conveyed using a certain amount of matter/energy
• Physics limits the ability to access, learn and know
• Convergence of matter and information at the quantum level: “It From Bit”
• Computational intelligence sciences are in chaos:
– Lack of a unified theory of information and its processing
– Vast amounts of data, yet mostly numerical data are used
– Many models, too many assumptions, poor performance
• Guidance of well established physical models
![Page 5: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/5.jpg)
Key Analogies Between Physics and Information Theory
• Energy, Work → Uncertainty, Information
• Law of the Total Energy/Uncertainty Preservation
• Thermodynamic Entropy → Shannon Entropy
• Matter, Space → Information, Knowledge Space
• Heisenberg Principle of Uncertainty → Breiman Principle of Uncertainty
• Information exists only in the physical context
• Physics and information theory converge at the quantum level
![Page 6: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/6.jpg)
Boundaries of information processing
• Physical constraints on information processing
– Mass/energy, Speed of light, Location, Time
• Spatial bounds on information capacity
– Quantum mechanics at the elementary particle level
– Gravitational collapse into a black hole at the macro scale
• Communication is a dynamic process and requires a certain energy transmitted with power P
GcR sb23
]/[ /AP
[S.Lloyd et al. Phys. Rev. 93(10) 2004]
![Page 7: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/7.jpg)
Quantum effects about to emerge
Turing / von Neumann: Universal Machine
We can make computers
Landauer: Information is Physical
Computers need cooling fans
Deutsch: Information is Quantum
Computers get weird
![Page 8: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/8.jpg)
Where lies the problem: stop the atom
![Page 9: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/9.jpg)
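Quantum computing

| State | Amplitude $(\alpha + i\beta)$ | Probability $(\lvert\alpha\rvert^2 + \lvert\beta\rvert^2)$ |
|---|---|---|
| 000 | a = 0.37 + i 0.04 | 0.14 |
| 001 | b = 0.35 + i 0.43 | 0.31 |
| 010 | c = 0.09 + i 0.31 | 0.10 |
| 011 | d = 0.30 + i 0.30 | 0.18 |
| 100 | e = 0.11 + i 0.18 | 0.04 |
| 101 | f = 0.40 + i 0.01 | 0.16 |
| 110 | g = 0.09 + i 0.12 | 0.02 |
| 111 | h = 0.15 + i 0.16 | 0.05 |

Bit: $\{0, 1\}$
Qubit: $\lvert\psi\rangle = \alpha\lvert 0\rangle + \beta\lvert 1\rangle$, with $\lvert\alpha\rvert^2 + \lvert\beta\rvert^2 = 1$

• Multitude of states, inherent parallelism
• Applications:
– Search in an unsorted database
– Factoring large numbers (cryptography)
– Simulating quantum effects in complex systems

500 qubit system: $2^{500}$ states at a single pulse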
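The link between the amplitude and probability columns can be checked numerically. The following is a minimal sketch, assuming the eight amplitudes are exactly the rounded values printed on the slide; the variable names are illustrative only.

```python
# Amplitudes of the 3-qubit register, copied from the slide (rounded values).
amplitudes = {
    "000": complex(0.37, 0.04),
    "001": complex(0.35, 0.43),
    "010": complex(0.09, 0.31),
    "011": complex(0.30, 0.30),
    "100": complex(0.11, 0.18),
    "101": complex(0.40, 0.01),
    "110": complex(0.09, 0.12),
    "111": complex(0.15, 0.16),
}

# Probability of observing each state is the squared modulus of its amplitude.
probabilities = {state: abs(a) ** 2 for state, a in amplitudes.items()}

# The probabilities should sum to 1, up to the rounding of the printed amplitudes.
total = sum(probabilities.values())

for state, p in probabilities.items():
    print(state, round(p, 2))
print("total:", round(total, 3))

# An n-qubit register spans 2**n basis states.
states_500_qubits = 2 ** 500
print("decimal digits in 2**500:", len(str(states_500_qubits)))
```

Because the amplitudes are rounded to two decimals, the total is only approximately 1.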
![Page 10: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/10.jpg)
Information Thermodynamics & Complexity
$$H(X) = \sum_{x \in A_X} p(x) \log \frac{1}{p(x)}$$
• Landauer: Any logical data processing must be accompanied by a corresponding entropy increase of the environment: heat waste of at least $kT \ln 2$ per bit
• Equivalence between thermodynamic and information (Shannon) entropy
• Information complexity: size, dimensionality, structure
• Complexity measures the cost of obtaining information
• Kolmogorov Algorithmic Complexity: the shortest program code that can obtain the requested information
• Information distance: $D_I(x, y) = \max\{K(x|y), K(y|x)\}$
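As a small worked example of the two quantitative statements on this slide, the sketch below evaluates the Shannon entropy of a discrete distribution and the Landauer limit $kT \ln 2$ at a given temperature. The function names and the choice of 300 K are assumptions made for illustration.

```python
import math

BOLTZMANN_K = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def shannon_entropy(p):
    """Entropy in bits: sum of p(x) * log2(1 / p(x)), skipping zero-probability outcomes."""
    return sum(px * math.log2(1.0 / px) for px in p if px > 0)

def landauer_limit(temperature_kelvin):
    """Minimum heat in joules dissipated per bit of irreversible processing: k * T * ln 2."""
    return BOLTZMANN_K * temperature_kelvin * math.log(2)

fair_coin = shannon_entropy([0.5, 0.5])      # one bit of uncertainty
biased_coin = shannon_entropy([0.9, 0.1])    # less than one bit
room_temp_cost = landauer_limit(300.0)       # about 2.87e-21 J per bit

print(fair_coin, biased_coin, room_temp_cost)
```

The biased coin carries less uncertainty than the fair one, which is the sense in which entropy measures the information gained by observing an outcome.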
![Page 11: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/11.jpg)
Data Particles – The Prime Inspiration
• Across different scales in Physics, two particle interaction paradigms are dominant:
– Dynamic particle models where particles act upon each other and/or the environment and move accordingly
– Static or statistical particle models where scale and complexity force a statistical description of particles
• Both areas are now the field of our exploration towards the possibility of a synergic merger:
– Dynamic data fields provide the whole methodology for dealing with mobile data particles, while…
– Kernel Machines and Information Theoretic Learning typically treat data statically
– Can a unified methodology be proposed?
![Page 12: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/12.jpg)
Data Field Models
TS
TS
TSS XXNXXNXXD )1,(2),1( 11
• Distance matrix calculation
• Charged data points
• Central, potential field
• Attracting/repelling force
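The ingredients listed on this slide can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: squared Euclidean distances are computed with the usual expansion $\|x_i\|^2 - 2x_i^Tx_j + \|x_j\|^2$, and the field is taken to be a Coulomb-like central field with potential proportional to $q/r$. The function names, the `charges` vector and the softening constant `eps` are hypothetical.

```python
import numpy as np

def pairwise_sq_distances(X):
    """Matrix of squared Euclidean distances between the rows of X, via one matrix product."""
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] - 2.0 * (X @ X.T) + sq[None, :]
    return np.maximum(D2, 0.0)  # clip tiny negative values caused by rounding

def field_at(point, X, charges, eps=1e-9):
    """Potential and force felt by a unit positive test charge at `point` (assumed q/r field)."""
    diff = point - X                                  # vectors from each data point to the test point
    r = np.sqrt(np.sum(diff ** 2, axis=1)) + eps      # distances, softened to avoid division by zero
    potential = np.sum(charges / r)
    force = np.sum((charges / r ** 3)[:, None] * diff, axis=0)  # like charges repel
    return potential, force

X = np.array([[0.0, 0.0], [3.0, 4.0]])
D2 = pairwise_sq_distances(X)

potential, force = field_at(np.array([1.0, 0.0]), X[:1], np.array([1.0]))
print(D2)
print(potential, force)
```

Flipping the sign of a charge turns the repelling force into an attracting one, which is what lets the same field model pull a test point towards one class and push it away from another.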
![Page 13: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/13.jpg)
Electrostatic Field for Classification
![Page 14: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/14.jpg)
Field generated clustering
• All the data points are left free to merge into a single point
• Data hierarchies are arranged as time passes
• Data trajectories form dynamic clustering dendrograms
Figure panels: Gravity Field; Lennard-Jones Potential
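The merging process described above can be imitated with a toy simulation. The sketch below is illustrative only and makes several assumptions not stated on the slide: unit initial masses, inverse-square attraction, a fixed step length per iteration, and a hypothetical merge `radius` below which two particles are fused into their centre of mass. The number of surviving particles over time gives the levels of a dendrogram.

```python
import numpy as np

def gravity_step(X, m, step=0.05, eps=1e-6):
    """Move every particle a fixed distance along the net gravitational pull of the others."""
    diff = X[None, :, :] - X[:, None, :]              # diff[i, j] = X[j] - X[i]
    d2 = np.sum(diff ** 2, axis=-1) + eps
    np.fill_diagonal(d2, np.inf)                      # no self-attraction
    force = np.sum(m[None, :, None] * diff / d2[:, :, None] ** 1.5, axis=1)
    norm = np.linalg.norm(force, axis=1, keepdims=True)
    return X + step * force / np.maximum(norm, 1e-12)

def merge_close(X, m, radius):
    """Repeatedly fuse the closest pair while it is nearer than `radius`."""
    X, m = X.copy(), m.copy()
    while len(X) > 1:
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        i, j = np.unravel_index(np.argmin(d), d.shape)
        if d[i, j] > radius:
            break
        X[i] = (m[i] * X[i] + m[j] * X[j]) / (m[i] + m[j])  # centre of mass
        m[i] += m[j]
        X = np.delete(X, j, axis=0)
        m = np.delete(m, j)
    return X, m

def simulate(X, radius=0.1, max_steps=500):
    """Return final positions, masses and the cluster count after every step."""
    X = np.asarray(X, dtype=float)
    m = np.ones(len(X))
    history = [len(X)]
    for _ in range(max_steps):
        if len(X) == 1:
            break
        X = gravity_step(X, m)
        X, m = merge_close(X, m, radius)
        history.append(len(X))
    return X, m, history

rng = np.random.default_rng(0)
X0 = rng.normal(size=(20, 2))
final_X, final_m, history = simulate(X0)
print(history)
```

Recording which particles fuse at each step, rather than only the count, would give the full merge tree; swapping the force law for a Lennard-Jones style term would add short-range repulsion.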
![Page 15: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/15.jpg)
Dynamic Data Condensation
• Terabytes of complex corporate data – unexploited
• State-of-the-art machine learning techniques – $O(n^2)$
• Real-time and adaptive models require frequent retraining
• Data are being condensed using:
– Random sub-sampling
– Parzen density based methods
– Multi-resolution spatial analysis
– Hierarchical clustering models
– …..
• …but dynamic data condensation is not approached
• …but labelled data are not being condensed?
![Page 16: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/16.jpg)
Soft Fixed Field Electrostatic Condensation Process
• Builds a soft Parzen density estimate for each class of data
• Normalises and freezes the original class distribution
• Electrostatic field with Gaussian relation on the distance is built:
• The data are left free to move and merge towards lower energy states, yet the original field continuously guards the distribution
• Fast matrix implementation in Matlab
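The first two steps of this process, a Parzen density estimate per class followed by normalisation, can be sketched as below. This is a minimal NumPy illustration assuming an isotropic Gaussian kernel of width `sigma` and class priors taken from class frequencies; it does not reproduce the condensation dynamics themselves, and all names are hypothetical.

```python
import numpy as np

def parzen_density(query, samples, sigma):
    """Gaussian Parzen window estimate of the density at each query point."""
    query = np.atleast_2d(np.asarray(query, dtype=float))
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    d = samples.shape[1]
    sq = np.sum((query[:, None, :] - samples[None, :, :]) ** 2, axis=-1)
    norm = (2.0 * np.pi * sigma ** 2) ** (-d / 2.0)
    return norm * np.exp(-sq / (2.0 * sigma ** 2)).mean(axis=1)

def class_posteriors(query, X, y, sigma):
    """Prior-weighted class densities, normalised so that each row sums to one."""
    classes = np.unique(y)
    dens = np.stack(
        [np.mean(y == c) * parzen_density(query, X[y == c], sigma) for c in classes],
        axis=1,
    )
    return classes, dens / dens.sum(axis=1, keepdims=True)

X = np.array([[0.0], [0.2], [3.0], [3.1]])
y = np.array([0, 0, 1, 1])
classes, post = class_posteriors([[0.1]], X, y, sigma=0.5)
print(classes, post)
```

Evaluating these posteriors on a grid once and keeping them fixed corresponds to the frozen class distribution that the moving data are measured against.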
![Page 17: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/17.jpg)
99% data reduction, 99% performance retention
![Page 18: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/18.jpg)
Discriminant Function Visualisation
Quadratic
Decision Tree
$$d = \arg\max_{j=1,\dots,C} P_j(x)$$
$$D = \max_{j=1,\dots,C} P_j(x)$$
![Page 19: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/19.jpg)
Visualisation of Classifier Fusion
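$$F_{\text{fus}} = \arg\max_{j=1,\dots,C} \; \mathop{F}_{i=1}^{N} P_{ij}$$
• Mean: $F_{\text{mean}} = \arg\max_{j=1,\dots,C} \sum_{i=1}^{N} P_{ij}$
• Max: $F_{\text{max}} = \arg\max_{j=1,\dots,C} \max_{i=1,\dots,N} P_{ij}$
• Min: $F_{\text{min}} = \arg\max_{j=1,\dots,C} \min_{i=1,\dots,N} P_{ij}$
• Product: $F_{\text{prod}} = \arg\max_{j=1,\dots,C} \prod_{i=1}^{N} P_{ij}$
• Majority Vote: $F_{\text{vote}} = \arg\max_{j=1,\dots,C} \mathop{\text{vote}}_{i=1}^{N} P_{ij}$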
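The fusion rules listed on this slide can be compared on a small example. In the sketch below, `P[i, j]` is assumed to be the support that classifier `i` gives to class `j`, with N classifiers in rows and C classes in columns; for majority voting each classifier is assumed to cast one vote for its own top class. The function name and the example matrix are illustrative.

```python
import numpy as np

def fuse(P, rule):
    """Return the index of the winning class under the chosen fusion rule."""
    P = np.asarray(P, dtype=float)
    if rule == "mean":
        scores = P.mean(axis=0)
    elif rule == "max":
        scores = P.max(axis=0)
    elif rule == "min":
        scores = P.min(axis=0)
    elif rule == "product":
        scores = P.prod(axis=0)
    elif rule == "vote":
        # Each classifier votes for the class it supports most strongly.
        scores = np.bincount(P.argmax(axis=1), minlength=P.shape[1])
    else:
        raise ValueError(f"unknown fusion rule: {rule}")
    return int(np.argmax(scores))

# Three classifiers (rows) scoring three classes (columns).
P = np.array([
    [0.60, 0.30, 0.10],
    [0.10, 0.50, 0.40],
    [0.45, 0.15, 0.40],
])

decisions = {rule: fuse(P, rule) for rule in ["mean", "max", "min", "product", "vote"]}
print(decisions)
```

On this matrix every rule except min selects class 0, while min selects class 1 because class 0 is strongly rejected by one classifier. Differences of this kind are what the fusion visualisation is meant to expose.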
![Page 20: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/20.jpg)
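Information Theoretic Learning for data transformation
• Renyi’s Entropy:
$$H_R(X) = \frac{1}{1-\alpha} \log \sum_{x \in A_X} p(x)^{\alpha}$$
• Mutual Information:
$$I(C, Y) = H(C) - H(C|Y)$$
• Information potentials:
$$I(C, Y) = \underbrace{\sum_{c} \int_Y p(c, y)^2 \, dy}_{V_{IN}} + \underbrace{\sum_{c} \int_Y P(c)^2 p(y)^2 \, dy}_{V_{ALL}} - 2 \underbrace{\sum_{c} \int_Y p(c, y) P(c) p(y) \, dy}_{V_{BTW}}$$
• Information Forces:
$$\frac{\partial I}{\partial y_{ci}} = \frac{\partial V_{IN}}{\partial y_{ci}} + \frac{\partial V_{ALL}}{\partial y_{ci}} - 2 \frac{\partial V_{BTW}}{\partial y_{ci}}$$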
Linear Feature Transformation
[Torkkola NIPS 2001]
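To make the idea of an information potential concrete, the sketch below estimates Renyi's quadratic entropy (the case $\alpha = 2$) from samples. It assumes the standard Parzen-based estimator used in information theoretic learning, where the integral of the squared density reduces to an average of Gaussian kernels of variance $2\sigma^2$ over all sample pairs. The function names and the kernel width `sigma` are assumptions; the class-conditional potentials and the feature transform are not reproduced here.

```python
import numpy as np

def information_potential(Y, sigma):
    """Parzen estimate of the integral of p(y)^2 for samples Y of shape (N, d)."""
    Y = np.asarray(Y, dtype=float)
    n, d = Y.shape
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    var = 2.0 * sigma ** 2                      # two Gaussians convolved: variances add
    kernel = (2.0 * np.pi * var) ** (-d / 2.0) * np.exp(-sq / (2.0 * var))
    return kernel.mean()                        # average over all N^2 sample pairs

def renyi_quadratic_entropy(Y, sigma):
    """H_2 = -log of the information potential."""
    return -np.log(information_potential(Y, sigma))

rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 2))

h_tight = renyi_quadratic_entropy(Y, sigma=1.0)
h_spread = renyi_quadratic_entropy(10.0 * Y, sigma=1.0)   # same shape, ten times wider
h_collapsed = renyi_quadratic_entropy(np.zeros((5, 1)), sigma=1.0)
print(h_tight, h_spread, h_collapsed)
```

Each pairwise kernel term behaves like an interaction between two data particles, and differentiating the potential with respect to a sample position gives the information force acting on that sample.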
![Page 21: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/21.jpg)
Information Theoretic Learning for classification and clustering
• ITL Framework [Principe et al 1999]
• Class label transmission [Archambeau et al 2004]: a new generic method for classification based on ITL and Parzen density model
• Generalised information distances used for feature generation [Kaplan & Hafner 2006]
• Classification with unlabelled data using ITL-linked density divergence minimisation [Jeong et al 2005]
• Clustering by separating cell identities using MIM [Schneidman et al 2003]
• Unsupervised Clustering by MIM between data and parameters [Herschkowitz & Nadal 1999]
![Page 22: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/22.jpg)
The Challenges to Tackle
• Data obesity and data quality issues
• Model and data complexity control
• Multidimensional information uncertainty and fusion
• Natural language processing
Zadeh’s Generalised Theory of Information Uncertainty:
Information is a generalised constraint
Most Swedes are tall: $GC(h) = \mu_{\text{likely}}\left(\int_a^b h(u)\,\mu_{\text{tall}}(u)\,du\right)$
![Page 23: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/23.jpg)
Particle Dynamics based Exploration Models
• Simulated Annealing – random particle exploration of the input space in a cooling environment that gradually slows particle velocity
• Stochastic Diffusion Search – random agent exploration with one-to-one communication
• Ant Colony Optimisation – spatial path optimisation inspired by ant-laid pheromone trails
• Particle Swarm Optimisation – swarm dynamically led by the local best (one-to-all communication)
• Particle filters – a flexible sequential predictor based on sampling from a sequence of probability distributions using a large number of particles
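As an illustration of the first model in this list, the sketch below runs simulated annealing on a one-dimensional function. It is a minimal example under stated assumptions: a geometric cooling schedule, Gaussian proposals whose size shrinks with the temperature (the "slowing particle"), and the Metropolis acceptance rule. All parameter names and default values are hypothetical.

```python
import math
import random

def simulated_annealing(f, x0, t0=1.0, cooling=0.95, steps=500, seed=0):
    """Minimise f starting from x0; returns the best point seen and its value."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, 1.0) * t      # smaller moves as the system cools
        fc = f(candidate)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling                                 # geometric cooling schedule
    return best_x, best_f

def objective(x):
    return (x - 2.0) ** 2

start = 10.0
best_x, best_f = simulated_annealing(objective, start)
print(best_x, best_f)
```

Early on, the high temperature lets the particle accept uphill moves and escape poor regions; as the temperature falls, the search settles into the best basin it has found.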
![Page 24: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/24.jpg)
Business Vision for the Future
• Distributed data mining
• Multimedia mining (voice, text, video)
• Open and flexible data structures
• Unified data processing framework
• Online secured predictive services
• Networked, evolving and adaptable software
• Automated knowledge discovery
• Artificial awareness: self-aware software (>2020)
![Page 25: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/25.jpg)
Composite Optimisation Problem
• Each component treated separately
• Lack of coordination
• Modelling inconsistencies
• The challenge: Full Optimisation
![Page 26: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/26.jpg)
Task Force Achievements
• Seminars given:
– IDSIA, Switzerland, Prof. J. Schmidhuber
– Birmingham University, Dr Peter Tino
– Aston University, Prof. David Lowe
• Conferences:
– ICCMSE’2007 with a paper published in American Institute of Physics (AIP) Conference Proceeding Series
• Publications:
– D. Ruta, B. Gabrys. Reducing Spatial Data Complexity for Classification Models. Accepted to the International Conference of Computational Methods in Sciences and Engineering ICCMSE 2007, American Institute of Physics Proceeding Series
– D. Ruta, B. Gabrys. A Framework for Machine Learning based on Dynamic Physical Fields. Accepted to the Special Issue of Natural Computing Journal on Nature-inspired Learning and Adaptive Systems
• Establishing an active group of about 20 researchers networking around the PIALM and related issues
![Page 27: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/27.jpg)
PIALM Follow-up and Future Activities
• Proposal submitted to the EU 7th Framework Programme to merge different PIALM directions and follow up research centred around Information Theoretical Learning and Dynamic Particle Models.
• Transforming the PIALM contacts into a prospective project support group with a regular meetings agenda, newsletter and closer collaborative ventures.
• Organisation of Special Sessions during related Conferences
• Further applications for networking/travel grants
• Widening the scope of PIALM into several focus themes to strengthen the link with other NISIS projects and better address changing needs of the society
![Page 28: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/28.jpg)
Conclusions
• Business analytics quite disparate from state-of-the-art research in machine learning, pattern recognition etc.
• Over-complex black-box type models unusable in business applications
• Customer analytics gains in importance, as do the modelling tools for customer-centric service providers
• Online predictive and adaptable services soon to emerge
• Nature continues to provide inspirations for data-driven modelling and learning
![Page 29: Challenges in Physically Inspired Machine Learning (PIALM Task Force) Dymitr Ruta, PhD BT Group Bogdan Gabrys, PhD Bournemouth University](https://reader038.vdocuments.net/reader038/viewer/2022103023/56649ddf5503460f94ad8aa4/html5/thumbnails/29.jpg)