Title: Time series forecasting with FCM dynamic learning

Professor Jose L. Salmeron

University Pablo de Olavide (Seville, Spain)

August 10, 2016


Outline

1 FCM fundamentals: Basics, Construction, Analysis

2 Time series forecasting proposal: Problem and approach, Proposal, Results

3 Open topics



FCM fundamentals

Basics

Fuzzy Cognitive Maps components

Representation

FCM is a soft computing technique closely related to a one-layer recurrent dynamic neural network.

FCMs consist of concepts that represent different aspects of the system’s behavior; these concepts interact with each other, showing the dynamics of the system.

FCMs are represented by directed graphs capable of modelling relationships or causalities between concepts. Concepts ($c_i$) are represented by nodes, and edges ($e_{ij}$) represent the relationships between them.


Introduction

Definitions

An FCM can be represented as a 4-tuple

$$\Gamma = \langle N, E, f, r \rangle$$

where $N = \{\langle n_i \rangle\}$ is the set of nodes $n_i$, $E$ is the set of edges between nodes, $f(\cdot)$ is the activation function, and $r$ is the nodes’ range, either $[0, +1]$ or $[-1, +1]$.

$E$ is represented as

$$E = \{\langle e_{ij}, w_{ij} \rangle \mid n_i, n_j \in N\}$$

where $e_{ij}$ is the edge from node $n_i$ to node $n_j$, and $w_{ij}$ is the weight of the edge $e_{ij}$.


Topology

One-layer recurrent dynamical NN

An FCM models a system as a one-layer NN.

Note that it is not a neural network. The figure shows just an analogy.

Main difference with NN

Each node has a meaning in an FCM.


Construction

Expert based

Experts issues

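Expert-based construction depends strongly on the selection of experts and on their knowledge.

Augmented Weighted FCM

Each expert may have a credibility weight ($e_k$).

$$A^* = \sum_{k=1}^{E} \left( \frac{e_k}{\sum_{m=1}^{E} e_m} \right) \cdot A_k$$

where $E$ is the number of experts and $A_k$ is the adjacency matrix provided by expert $k$.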
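The credibility-weighted aggregation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `aggregate_expert_maps`, the two example matrices, and the credibility values are hypothetical and do not come from the slides.

```python
import numpy as np

def aggregate_expert_maps(matrices, credibilities):
    """Credibility-weighted average of expert adjacency matrices."""
    A = np.asarray(matrices, dtype=float)        # shape (E, n, n)
    e = np.asarray(credibilities, dtype=float)   # shape (E,)
    if A.shape[0] != e.shape[0]:
        raise ValueError("one credibility weight per expert is required")
    weights = e / e.sum()                        # normalise so the weights sum to 1
    return np.tensordot(weights, A, axes=1)      # sum_k weights[k] * A[k]

# Hypothetical maps from two experts over the same two concepts
A1 = np.array([[0.0, 0.8], [-0.4, 0.0]])
A2 = np.array([[0.0, 0.4], [-0.2, 0.0]])
A_star = aggregate_expert_maps([A1, A2], credibilities=[3.0, 1.0])
print(A_star)
```

With credibilities 3 and 1, the first expert contributes 75% of each aggregated weight and the second 25%.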

Defuzzification

It is necessary to convert the fuzzy values into crisp ones for further FCM processing, for instance with the Center of Gravity (CoG):

$$\mathrm{CoG}_y = \frac{\int_{x \in X} x \cdot \mu_A(x)\,dx}{\int_{x \in X} \mu_A(x)\,dx}$$
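A numerical version of the Center of Gravity can be sketched as follows. The integrals are approximated with the trapezoidal rule on a sampled domain; the triangular membership function used as input is a hypothetical example, not one taken from the slides.

```python
import numpy as np

def centre_of_gravity(x, mu):
    """Approximate CoG = int x*mu(x) dx / int mu(x) dx with the trapezoidal rule."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    dx = np.diff(x)

    def integral(values):
        return float(np.sum(0.5 * (values[1:] + values[:-1]) * dx))

    area = integral(mu)
    if area == 0.0:
        raise ValueError("membership function has zero area")
    return integral(x * mu) / area

# Hypothetical triangular fuzzy set on [0.25, 0.75] with its peak at 0.5
x = np.linspace(0.0, 1.0, 1001)
mu = np.maximum(0.0, 1.0 - np.abs(x - 0.5) / 0.25)
cog = centre_of_gravity(x, mu)
print(round(cog, 6))
```

For a symmetric membership function such as this one, the crisp value coincides with the peak.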


Delphi/Augmented FCM

S. Bueno and J.L. Salmeron (2008). Fuzzy modelling Enterprise Resource Planning tool selection. Computer Standards and Interfaces 30(3), pp. 137-147.

J.L. Salmeron (2009). Augmented Fuzzy Cognitive Maps for modelling LMS Critical Success Factors. Knowledge-Based Systems 22(4), pp. 275-278.

Delphi method

It applies only to expert-based FCMs.

Experts design the FCM in several rounds.

After the first round, each expert knows the overall data and can adjust his/her previous judgement.

Augmented FCM

Unlike the Delphi methodology, it does not require experts to change their initial judgement to reach consensus.

The augmented adjacency matrix ($A^* = [w^*_{ij}]_{m \times m}$) is built by adding the adjacency matrices of each data source.

If there are common nodes within the adjacency matrices, the element $w^*_{ij}$ would be

$$w^*_{ij} = \frac{1}{n} \cdot \sum_{k=1}^{n} w_{ijk}$$
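A possible reading of the augmented construction is sketched below. It assumes that each data source supplies its own list of node labels together with its adjacency matrix, and that $n$ counts the sources in which both nodes of an edge appear; the function name `augment` and the example maps are hypothetical.

```python
import numpy as np

def augment(maps):
    """maps: list of (labels, matrix) pairs. Returns (all_labels, A_star)."""
    labels = []
    for names, _ in maps:
        for name in names:
            if name not in labels:
                labels.append(name)          # union of nodes, first-seen order
    index = {name: i for i, name in enumerate(labels)}
    m = len(labels)
    total = np.zeros((m, m))
    count = np.zeros((m, m))
    for names, matrix in maps:
        idx = [index[name] for name in names]
        total[np.ix_(idx, idx)] += np.asarray(matrix, dtype=float)
        count[np.ix_(idx, idx)] += 1         # how many sources cover this pair
    # Average where at least one source covers the pair, zero elsewhere
    A_star = np.divide(total, count, out=np.zeros_like(total), where=count > 0)
    return labels, A_star

# Hypothetical sources: the second one adds a node "c"
source_1 = (["a", "b"], [[0.0, 0.6], [0.0, 0.0]])
source_2 = (["a", "b", "c"], [[0.0, 0.2, 0.5], [0.0, 0.0, -0.3], [0.0, 0.0, 0.0]])
labels, A_star = augment([source_1, source_2])
print(labels)
print(A_star)
```

The edge from "a" to "b" appears in both sources and is averaged, while edges involving "c" are copied from the only source that contains that node.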


Data-driven

Goal

Improve the FCM model

Alg. | Learning goal | Human involvement | Data type | Activation function | Learning type
DHL | Adj. mat. | No | Single | N/A | Hebbian
BDA | Adj. mat. | No | Single | Binary | modified Hebbian
NHL | Adj. mat. | Initial | Single | Continuous | modified Hebbian
AHL | Adj. mat. | Initial | Single | Continuous | modified Hebbian
GS | Adj. mat. | No | Multiple | Continuous | Genetic
PSO | Adj. mat. | No | Multiple | Continuous | Swarm
GA | Initial vector | N/A | N/A | Continuous | Genetic
RCGA | Adj. mat. | No | Single | Continuous | Genetic

AHL/NHL human involvement

Initial human intervention is necessary, but no further human intervention is needed once the algorithm is applied.


Comparison

Expert-based vs. data-driven approaches

Criterion | Expert-based | Data-driven
Type of modeling | Deductive | Inductive
Main objective | To create a model that is structurally understandable | To create a model that provides accurate simulations
Main application | Static analysis | Dynamic analysis
Main shortcoming | Dynamic analysis could be inaccurate | Static analysis could be inaccurate and make no sense


Analysis

Static

Concept

Static analysis studies the characteristics of the FCM weighted directed graph that represents the model, using graph theory techniques.

Density

Density ($D$) relates the number of edges ($E$) to the number of nodes ($N$) of the model.

$$D = \frac{E}{N \cdot (N - 1)}$$

High density indicates increased complexity in the model and, correspondingly, in the problem that the model represents.


Some measures

In-degree: $d^+(c_j) = \sum_{i=1}^{n} |e_{ij}|$

Out-degree: $d^-(c_i) = \sum_{j=1}^{n} |e_{ij}|$

Centrality: $c(c_i) = d^+(c_i) + d^-(c_i)$

Weighted in-degree: $d^+_w(c_j) = \sum_{i=1}^{n} |w_{ij}|$

Weighted out-degree: $d^-_w(c_i) = \sum_{j=1}^{n} |w_{ij}|$

Weighted centrality: $c_w(c_i) = d^+_w(c_i) + d^-_w(c_i)$

Input node: $n_i \mid d^+ = 0$

Output node: $n_i \mid d^- = 0$
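The static measures, together with the density $D = E / (N \cdot (N - 1))$, can be computed directly from the adjacency matrix. The sketch below assumes that `W[i, j]` holds the weight of the edge from node $i$ to node $j$ and that a zero entry means no edge; the function name and the three-node example are hypothetical.

```python
import numpy as np

def static_measures(W):
    """Graph measures of an FCM; W[i, j] is the weight of the edge i -> j."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]                       # assumes n >= 2 so density is defined
    E = (W != 0).astype(int)             # |e_ij| = 1 where an edge exists
    in_deg, out_deg = E.sum(axis=0), E.sum(axis=1)
    w_in, w_out = np.abs(W).sum(axis=0), np.abs(W).sum(axis=1)
    return {
        "density": E.sum() / (n * (n - 1)),
        "in_degree": in_deg,
        "out_degree": out_deg,
        "centrality": in_deg + out_deg,
        "weighted_centrality": w_in + w_out,
        "input_nodes": np.flatnonzero(in_deg == 0).tolist(),
        "output_nodes": np.flatnonzero(out_deg == 0).tolist(),
    }

# Hypothetical chain: node 0 -> node 1 -> node 2
W = np.array([[0.0, 0.5, 0.0],
              [0.0, 0.0, -0.8],
              [0.0, 0.0, 0.0]])
m = static_measures(W)
print(m)
```

In this chain, node 0 is the only input node, node 2 the only output node, and node 1 has the highest centrality.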


FCM Dynamics

Concept

An FCM is a deterministic model. It predicts the evolution of future states deterministically, as given by the updating rules, with the expectation that the state vector finally converges from its initial value to a fixed point.

Updating nodes

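$$c_i(t+1) = f\left(\sum_{j=1}^{n} w_{ji} \cdot c_j(t)\right)$$

$$c_i(t+1) = f\left(\sum_{j=1}^{n} w_{ji} \cdot c_j(t) + c_i(t)\right)$$

$$c_i(t+1) = f\left(k_1 \cdot \sum_{j=1}^{n} w_{ji} \cdot c_j(t) + k_2 \cdot c_i(t)\right)$$

1st iteration

$$c(0) = \big(c_1(0) \cdots c_4(0)\big) \rightarrow c(1) = \big(f(\cdot) \cdots f(\cdot)\big)$$

2nd iteration

$$c(1) = \big(c_1(1) \cdots c_4(1)\big) \rightarrow c(2) = \big(f(\cdot) \cdots f(\cdot)\big)$$

$\cdots$

Last iteration

$$\sqrt{\sum_{i=1}^{N} \big(c_i(t) - c_i(t-1)\big)^2} < \overbrace{\varepsilon}^{\text{tolerance}}$$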
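The updating rules and the stopping criterion can be sketched as a short simulation loop. This is an illustration only: `W[j, i]` is assumed to store $w_{ji}$ (the weight from node $j$ to node $i$), a unipolar sigmoid is assumed as $f$, and the weights, initial state, and `max_iter` cap are hypothetical choices.

```python
import numpy as np

def sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + np.exp(-lam * x))

def run_fcm(W, c0, f=sigmoid, k1=1.0, k2=0.0, eps=1e-5, max_iter=100):
    """Iterate c_i(t+1) = f(k1 * sum_j w_ji * c_j(t) + k2 * c_i(t)).

    k1=1, k2=0 gives the first rule; k1=k2=1 gives the second rule.
    Stops when the Euclidean distance between successive states is below eps.
    """
    W = np.asarray(W, dtype=float)
    c = np.asarray(c0, dtype=float)
    history = [c]
    for _ in range(max_iter):
        nxt = f(k1 * (W.T @ c) + k2 * c)     # (W.T @ c)[i] = sum_j w_ji * c_j
        history.append(nxt)
        if np.linalg.norm(nxt - c) < eps:
            break
        c = nxt
    return np.array(history)

# Hypothetical three-concept map and initial state
W = np.array([[0.0, 0.5, 0.0],
              [0.0, 0.0, -0.8],
              [0.3, 0.0, 0.0]])
states = run_fcm(W, c0=[0.2, 0.9, 0.5])
print(len(states) - 1, "iterations, final state:", states[-1])
```

The returned array holds one row per iteration, so the trajectory of every concept can be inspected or plotted.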


Activation functions

Bueno, S. and Salmeron, J.L. (2009). Benchmarking Main Activation Functions in Fuzzy Cognitive Maps. Expert Systems with Applications 36(3 part 1), pp. 5221-5229.

Concept

The FCM inference process finishes when stability is reached. After the iterations, the FCM reaches one of the following states.

Fixed-point attractor: $c(t-1) = c(t)$

Limit cycle: $\exists t, k \mid (c(t-k) = c(t)) \wedge (c(t-k+1) \neq c(t))$

Chaotic attractor

Main activation functions

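Bivalent:

$$f(c_i) = \begin{cases} 0 & \text{if } c_i \le 0 \\ 1 & \text{if } c_i > 0 \end{cases}$$

Trivalent:

$$f(c_i) = \begin{cases} -1 & \text{if } c_i \le -0.5 \\ 0 & \text{if } -0.5 < c_i < 0.5 \\ +1 & \text{if } c_i \ge 0.5 \end{cases}$$

Unipolar sigmoid:

$$f(c_i) = \frac{1}{1 + e^{-\lambda \cdot c_i}}$$

Hyperbolic tangent:

$$f(c_i) = \frac{e^{2 \cdot \lambda \cdot c_i} - 1}{e^{2 \cdot \lambda \cdot c_i} + 1}$$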
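The four activation functions can be written compactly with NumPy. In this sketch the trivalent upper threshold is taken as $+0.5$ so that the three branches are mutually exclusive, and the hyperbolic tangent is computed with `np.tanh`, which equals $(e^{2 \lambda c_i} - 1) / (e^{2 \lambda c_i} + 1)$; the function names are illustrative.

```python
import numpy as np

def bivalent(x):
    return np.where(np.asarray(x, dtype=float) > 0, 1.0, 0.0)

def trivalent(x):
    x = np.asarray(x, dtype=float)
    return np.where(x <= -0.5, -1.0, np.where(x >= 0.5, 1.0, 0.0))

def unipolar_sigmoid(x, lam=1.0):
    # Output in (0, 1); lam controls the steepness
    return 1.0 / (1.0 + np.exp(-lam * np.asarray(x, dtype=float)))

def hyperbolic_tangent(x, lam=1.0):
    # Output in (-1, 1); lam controls the steepness
    return np.tanh(lam * np.asarray(x, dtype=float))

print(bivalent([-1.0, 0.0, 2.0]))
print(trivalent([-0.7, 0.2, 0.5]))
print(unipolar_sigmoid(0.0), hyperbolic_tangent(0.3, lam=2.0))
```

The bivalent and sigmoid functions suit nodes with range $[0, +1]$, while the trivalent and hyperbolic tangent functions suit the range $[-1, +1]$.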


Forward-Backward

What-if analysis

Goal: impact of $c(0)$ on $c(t)$

Method: running the FCM dynamics

Population-based analysis

Method: evolving a population of initial vector states ($\{c(0)\}_{i=1}^{n}$)

Fitness: $\min(err)$ where $err_i = |c(t)_i - c(t)^*_i|$

Forward analysis

Finding the optimum final vector state ($c(t)$)

Backward analysis

Finding the $c(0)$ that generates a specific $c(t)$
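Backward analysis can be illustrated with a very small population-based search. The sketch below is a simplified stand-in for the evolutionary methods mentioned above, not the method used in the talk: it keeps the better half of the population and mutates it with Gaussian noise. The sigmoid activation, the fixed number of FCM steps, and all parameter values are assumptions.

```python
import numpy as np

def simulate(W, c0, steps, lam=1.0):
    """Run the FCM for a fixed number of steps with a unipolar sigmoid."""
    W = np.asarray(W, dtype=float)
    c = np.asarray(c0, dtype=float)
    for _ in range(steps):
        c = 1.0 / (1.0 + np.exp(-lam * (W.T @ c)))
    return c

def backward_search(W, target, steps, pop_size=40, generations=60,
                    sigma=0.1, seed=0):
    """Search for an initial state c(0) whose state after `steps` is near target."""
    rng = np.random.default_rng(seed)
    target = np.asarray(target, dtype=float)
    pop = rng.random((pop_size, len(target)))           # candidate c(0) in [0, 1]

    def err(c0):
        return np.abs(simulate(W, c0, steps) - target).sum()

    for _ in range(generations):
        errors = np.array([err(c0) for c0 in pop])
        elite = pop[np.argsort(errors)[: pop_size // 2]]   # keep the better half
        children = np.clip(elite + rng.normal(0.0, sigma, elite.shape), 0.0, 1.0)
        pop = np.vstack([elite, children])
    errors = np.array([err(c0) for c0 in pop])
    return pop[np.argmin(errors)], float(errors.min())

# Hypothetical map; the target is generated from a known initial state
W = np.array([[0.0, 0.5, 0.0],
              [0.0, 0.0, -0.8],
              [0.3, 0.0, 0.0]])
target = simulate(W, [0.2, 0.9, 0.5], steps=1)
best_c0, best_err = backward_search(W, target, steps=1)
print(best_c0, best_err)
```

Note that when the map converges to the same fixed point from every initial state, many different $c(0)$ vectors produce the same $c(t)$, so the recovered initial state is not necessarily unique.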


FCM extensions

Fuzzy Grey Cognitive Maps

Rule-Based Fuzzy Cognitive Maps

Probabilistic Fuzzy Cognitive Maps

Intuitionistic Fuzzy Cognitive Maps

Dynamical Cognitive Networks

Belief Degree-Distributed Fuzzy Cognitive Maps

Rough Cognitive Maps

Dynamic Random Fuzzy Cognitive Maps

Fuzzy Cognitive Networks

Evolutionary FCMs

Fuzzy Time Cognitive Maps

Fuzzy Rules Incorporated with FCMs

Timed Automata-based Fuzzy Cognitive Maps


Time series forecasting proposal


Model

Contributions

We propose retraining the FCM dynamically, adapting it to the local characteristics of the forecasted time series.

We propose optimizing the FCM model by selecting which concepts of the FCM should be included within its structure.

We propose optimizing the selection of the FCM’s transformation function using a pool of functions. After the function is selected, its parameters and thus its shape are optimized.



Problem and approach

Preliminaries

Problem

Let $y \in \mathbb{R}$ be a real-valued variable whose values are observed at a discrete time scale $t \in [1, 2, \ldots, n]$, where $n \in \mathbb{N}$ is the length of the considered period. A time series is a sequence $\{y(t)\} = \{y(1), y(2), \ldots, y(n)\}$.

The goal of one-step ahead forecasting is to calculate $\hat{y}(t) = M(\{y(t-1)\})$, where $\hat{y}(t)$ denotes the forecast and $M$ is the forecasting model.

The challenge is to select and train a model $M$ that produces the lowest absolute values of the forecasting errors, calculated as $e(t) = \hat{y}(t) - y(t)$.

Approach

For learning and testing $M$, we use a sliding window. During the period $t \in [start(L), t-1]$, the predictive FCM model is learned.

A single one-step ahead forecast is made as $\hat{y}(t)$. The value of $start(L)$ is a parameter. The time step $t = end(L) + 1$, at which the forecast is made, moves forward as time flows.



Comparisons

Comparison with state-of-the-art methods

Naive. The naive approach is a trivial forecasting method that sets each forecast to the previously observed value, i.e. $\hat{y}(t) = y(t-1)$.

ARIMA. The Autoregressive Integrated Moving Average model is one of the most popular conventional statistical forecasting models.

ES. The Exponential Smoothing model is usually an effective method for forecasting stationary time series.

HW. The Holt-Winters method is an extended version of ES.

GARCH. The Generalized Auto-Regressive Conditional Heteroscedastic model is a nonlinear model.



Proposal

Machine Learning Algorithms

RCGA

Real-Coded Genetic Algorithm (RCGA) generates solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection and crossover.

PSO

Particle Swarm Optimization (PSO) is a stochastic, population-based, bio-inspired optimization algorithm, composed of a swarm of particles moving in the n-dimensional search space that contains all the candidate solutions.

ABC

Artificial Bee Colony (ABC) is an optimization algorithm inspired by the collective behavior of honey bee colonies. The conventional ABC algorithm uses three control parameters: the number of food sources, the limit value (frequency of scout bee search) and the maximum cycle (iteration) number.

DE

Differential Evolution (DE) is a stochastic direct search and global optimization algorithm. At each generation, it transforms the population into another one in which the individuals are more likely to minimize the objective function.

SA

Simulated Annealing (SA) is a technique for solving both unconstrained and bound-constrained optimization problems. SA models the physical process of heating a solid and then cooling it slowly to decrease defects, thus minimizing the system energy.
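Of the algorithms listed above, PSO is perhaps the shortest to sketch. The version below is a generic textbook-style PSO, not the configuration used in the talk: the inertia and acceleration coefficients, the swarm size, and the toy objective (squared distance to a hypothetical target weight vector) are all assumptions. In an FCM setting each particle would typically encode a flattened weight matrix with entries in $[-1, +1]$.

```python
import numpy as np

def pso(objective, dim, n_particles=30, iterations=200, bounds=(-1.0, 1.0),
        inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimise `objective` over a box with a basic global-best PSO."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    pos = rng.uniform(low, high, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    best_pos = pos.copy()                                 # personal bests
    best_val = np.array([objective(p) for p in pos])
    g = best_pos[np.argmin(best_val)].copy()              # global best
    for _ in range(iterations):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = (inertia * vel
               + c_personal * r1 * (best_pos - pos)
               + c_global * r2 * (g - pos))
        pos = np.clip(pos + vel, low, high)
        val = np.array([objective(p) for p in pos])
        improved = val < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = val[improved]
        g = best_pos[np.argmin(best_val)].copy()
    return g, float(best_val.min())

# Toy objective: squared distance to a hypothetical target weight vector
target = np.array([0.5, -0.3, 0.8, 0.0])
best_w, best_value = pso(lambda w: float(np.sum((w - target) ** 2)), dim=4)
print(best_w, best_value)
```

For FCM learning, the toy objective would be replaced by a forecasting error computed by simulating the map encoded in each particle.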



Main algorithm

Require: {y(n)} = {y(1), ..., y(n)} - historical time series
return M - best model
while t ≤ n do
    M ← LearnModel({y(t − 1)})
    ŷ(t) ← M(h = 1) - 1-step ahead forecasting
    e(t) ← ŷ(t) − y(t) - forecasting error
    t ← t + 1 - simulation of time flow
end while
Calculate

$$\mathrm{MAPE}(\{e(n)\}) \leftarrow \frac{1}{n} \cdot \sum_{t=1}^{n} \left| \frac{e(t)}{y(t)} \right| \cdot 100\%$$
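The main loop can be sketched in Python as a sliding-window evaluation. To keep the example self-contained, `learn_naive_model` is a hypothetical stand-in for `LearnModel` that simply repeats the last observed value; in the proposal this step would train the FCM instead. The sketch also assumes that MAPE is averaged over the forecasted steps only and that no observed value is zero.

```python
import numpy as np

def learn_naive_model(history):
    """Stand-in for LearnModel: returns a model that repeats the last value."""
    last = history[-1]
    return lambda: last

def sliding_window_mape(y, learn_model=learn_naive_model, window=5):
    """Relearn the model at every step and score one-step ahead forecasts."""
    y = np.asarray(y, dtype=float)
    errors, actuals = [], []
    for t in range(window, len(y)):
        model = learn_model(y[t - window:t])   # learn on the window ending at t-1
        forecast = model()                     # one-step ahead forecast for t
        errors.append(forecast - y[t])         # e(t) = forecast - actual
        actuals.append(y[t])
    errors, actuals = np.array(errors), np.array(actuals)
    return float(np.mean(np.abs(errors / actuals)) * 100.0)

# Hypothetical short series
y = [100, 110, 121, 110, 100, 125]
mape = sliding_window_mape(y, window=2)
print(f"MAPE = {mape:.4f}%")
```

Any callable with the same interface can be passed as `learn_model`, which makes it easy to compare a learned FCM against the naive baseline on identical windows.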



Chromosome

The main idea is to optimize not only the FCM’s weights but also the entire learning process and the other elements, in order to increase forecasting accuracy.

The FCM is applied to one-step ahead forecasting using the equation

$$c_i(t+1) = f_t\left(\sum_{j=1, j \neq i}^{card(C)} w_{ij} \cdot c_j(t)\right)$$

The transformation function is selected dynamically before every forecast is made.



Results

Data

Datasets

The challenge that we faced was the mapping of the FCM structure to the water demand time series.

The considered water demand time series covered 2254 days, with data missing for 78 of them. This was due to breaks in data transmission caused by discharged batteries or other hardware problems in the data transmission channels.



Acknowledgements

European Commission - FP7

The work was supported by the ISS-EWATUS project, which has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 619228.


Open topics

Learning


E.I. Papageorgiou and J.L. Salmeron (2011). Learning Fuzzy Grey Cognitive Maps using Nonlinear Hebbian-based approach. International Journal of Approximate Reasoning, 53(1), pp. 54-65.

NHL-FGCM

The NHL learning rule for FGCMs introduces a learning rate parameter, the determination of input and output nodes, and the termination conditions.

$$\Delta \otimes w_{ji}(k) = \eta_k \cdot \otimes c_j(k-1) \cdot \big(\otimes c_i(k-1) - \otimes c_j(k-1) \cdot \otimes w_{ji}(k-1)\big)$$

where the coefficient $\eta_k$ is a very small positive scalar factor called the learning parameter.

This simple rule states that if $c_i(k)$ is the value of node $c_i$ at iteration $k$, and $c_j$ is the value of the triggering node $c_j$ which triggers the node $c_i$, the corresponding grey weight $\otimes w_{ji}$ from node $c_j$ towards node $c_i$ is increased in proportion to their product multiplied by the learning rate parameter, minus the grey weight decay at iteration step $k$.

As a result

$$\otimes w_{ji}(k) = \Big[\ \overbrace{\underline{w}_{ji}(k-1) + \eta_k \cdot \otimes c_j \cdot \big(\otimes c_i(k) - \otimes c_j \cdot \otimes w_{ji}(k-1)\big)}^{\underline{w}_{ji}(k)},\ \underbrace{\overline{w}_{ji}(k-1) + \eta_k \cdot \otimes c_j \cdot \big(\otimes c_i(k) - \otimes c_j \cdot \otimes w_{ji}(k-1)\big)}_{\overline{w}_{ji}(k)}\ \Big]$$


W. Froelich and J.L. Salmeron (2014). Evolutionary Learning of Fuzzy Grey Cognitive Maps for the Forecasting of Multivariate, Interval-Valued Time Series. International Journal of Approximate Reasoning 55(6), pp. 1319-1335.

Population

Data sources (final states’ set vs. time series)

The objective is to optimize the matrix $[\otimes A_{n \times n}]$ with respect to the forecasting accuracy.

Fitness

The fitness function depends on the raw data.


FGCM scenarios

TOPSIS-based rank

The scenario closest to the positive-ideal scenario is the best solution.

$(d^+_1 = d^+_4) \wedge (d^-_1 < d^-_4)$

$A_4 \succ A_1 \succ A_3 \succ A_2$

TOPSIS

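1 Determine the normalized decision matrix

$$\otimes R = [\otimes r_{ij}] \mid \otimes r_{ij} = \frac{\otimes x_{ij}}{\sqrt{\sum_{i=1}^{m} \otimes x_{ij}^2}}$$

2 Compute the weighted normalized matrix

$$\otimes V = [\otimes v_{ij}] \mid \otimes v_{ij} = \otimes r_{ij} \cdot \otimes w_j$$

3 Define the positive-ideal $C^+$ and negative-ideal $C^-$

$$\otimes C^+ = \{(\max_{i=1}^{n} \otimes v_{ij} \mid j \in I^+), (\min_{i=1}^{n} \otimes v_{ij} \mid j \in I^-)\}$$

$$\otimes C^- = \{(\min_{i=1}^{n} \otimes v_{ij} \mid j \in I^+), (\max_{i=1}^{n} \otimes v_{ij} \mid j \in I^-)\}$$

4 Compute the distance measures

$$\otimes d^+_i = \sqrt{\sum_{j=1}^{m} \big(\otimes v_{ij} - \otimes v^+_j\big)^2} \qquad \otimes d^-_i = \sqrt{\sum_{j=1}^{m} \big(\otimes v_{ij} - \otimes v^-_j\big)^2}$$

5 Compute the relative closeness to the positive-ideal

$$\otimes C^{\oplus}_i = \frac{\otimes d^-_i}{\otimes d^+_i + \otimes d^-_i}$$

A larger $\otimes C^{\oplus}_i$ means a better scenario.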
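The five TOPSIS steps can be sketched for ordinary (crisp) numbers. This is a deliberate simplification: the slides apply TOPSIS to grey numbers ($\otimes$), whereas the code below uses plain floats. The scenario matrix, the weights, and the split into benefit criteria ($I^+$) and cost criteria ($I^-$) are hypothetical.

```python
import numpy as np

def topsis(X, weights, benefit):
    """Crisp TOPSIS; rows of X are scenarios, columns are criteria."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)                  # True for j in I+
    R = X / np.sqrt((X ** 2).sum(axis=0))                      # step 1: normalise
    V = R * w                                                  # step 2: weight
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))    # step 3: C+
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))     #         C-
    d_plus = np.sqrt(((V - ideal) ** 2).sum(axis=1))           # step 4: distances
    d_minus = np.sqrt(((V - anti) ** 2).sum(axis=1))
    return d_minus / (d_plus + d_minus)                        # step 5: closeness

# Hypothetical scenarios: first criterion is a benefit, second is a cost
X = [[0.9, 0.2],
     [0.5, 0.5],
     [0.1, 0.9]]
closeness = topsis(X, weights=[0.5, 0.5], benefit=[True, False])
ranking = np.argsort(-closeness)
print(closeness, ranking)
```

The first scenario is best on both criteria, so it coincides with the positive-ideal and obtains a closeness of 1, while the last scenario coincides with the negative-ideal and obtains 0.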


More ...

Other ideas

FGCM learning and automatic construction with new algorithms

FGCM in control systems

FGCM in biomedical engineering

FGCM in environmental control

FGCM synaptic plasticity

. . .

