![Page 1: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/1.jpg)
System Identification and Curve Fitting with a
Genetic Algorithm Hierarchy
Alice E. Smith and Mehmet Gulsen
Department of Industrial Engineering
University of Pittsburgh
INFORMS Fall 1997
![Page 2: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/2.jpg)
Curve Fitting
Process of approximating a closed form function to a given data set of independent variables and a dependent variable (variable selection, closed form function selection, coefficient estimation).
Used for:
– System identification
– Judging the strength of relationship
– Identifying main variables and interactions between variables
– Interpolating/extrapolating to new data
![Page 3: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/3.jpg)
Conventional Approaches
– Various regression techniques
– Time series analysis
– Spline fitting
– Neural networks
![Page 4: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/4.jpg)
Genetic Algorithm Hierarchy
Upper Module: Function and Variable Selection. Passes candidate functions to the lower module, e.g. $y = c_1 x_1 + c_2 \cos(c_3 + c_4 x_2^2)$
Lower Module: Coefficient Estimation. Returns optimized coefficients for functions, e.g. $y = 9.234 x_1 + 2.123 \cos(0.093 + 4.823 x_2^2)$, SSE = 0.34627
![Page 5: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/5.jpg)
Search Structure
[Diagram: Upper GA Search over the Upper GA Population; Lower GA Search over the Lower GA Population; Data]
![Page 6: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/6.jpg)
Genetic Search Process
[Diagram: Initial Population (n), Offspring (n1) and Mutants (n2), best (n), Final Population (n); selection labels: Top Half Selection, Uniform Selection]
![Page 7: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/7.jpg)
Upper GA - Function Selection
Explore the possible functional forms that could represent the underlying relationship between independent and dependent variables of a data set.
Objective Function: Minimize “adjusted” total error corresponding to the functional form. Adjustment is performed by penalizing more complex representations (more variables, higher order terms).
Stopping Criteria: Search is terminated when no improvement is observed for a specific number of generations.
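The stopping rule above can be sketched in a few lines of Python. This is only an illustration: the function name `run_until_stalled`, the `patience` parameter, and the idea that each generation reports a single best error are assumptions, not details from the slides.

```python
def run_until_stalled(step, patience):
    """Call step() once per generation; stop after `patience`
    consecutive generations without a new best error."""
    best = float("inf")
    stalled = 0
    generations = 0
    while stalled < patience:
        error = step()          # best error found in this generation
        generations += 1
        if error < best:
            best, stalled = error, 0   # improvement: reset the counter
        else:
            stalled += 1               # no improvement this generation
    return best, generations


# Usage with a made-up error trace (not results from the talk).
trace = iter([5.0, 4.0, 4.0, 3.0, 3.0, 3.0, 3.0])
result = run_until_stalled(lambda: next(trace), patience=3)
```

Here the search stops once three generations in a row fail to improve on the best error seen so far.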
![Page 8: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/8.jpg)
Upper GA: Function Selection - Encoding
Tree Structure: $y = C_1 + C_2 x_1^3 + C_3 \cos(C_4 x_2) + C_5 x_1 x_2$
[Tree diagram: internal nodes +, * and cos; leaf nodes C1 to C5, x1, x2 and 1]
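A minimal Python sketch of how a function could be held as a tree and evaluated for one data point. The nested-tuple layout and the helper names (`evaluate`, `C`, `X`) are illustrative assumptions, not the authors' data structure.

```python
import math

# A node is ("const", k), ("var", j), or (operator, child, ...).
def evaluate(node, coeffs, x):
    """Recursively evaluate an expression tree for one data point."""
    kind = node[0]
    if kind == "const":            # coefficient C_k (1-based index)
        return coeffs[node[1] - 1]
    if kind == "var":              # variable x_j (1-based index)
        return x[node[1] - 1]
    args = [evaluate(child, coeffs, x) for child in node[1:]]
    if kind == "+":
        return args[0] + args[1]
    if kind == "*":
        return args[0] * args[1]
    if kind == "cos":
        return math.cos(args[0])
    raise ValueError(f"unknown node type: {kind}")


def C(k):
    return ("const", k)


def X(j):
    return ("var", j)


# y = C1 + C2*x1^3 + C3*cos(C4*x2) + C5*x1*x2
tree = ("+",
        ("+", C(1), ("*", C(2), ("*", X(1), ("*", X(1), X(1))))),
        ("+", ("*", C(3), ("cos", ("*", C(4), X(2)))),
              ("*", C(5), ("*", X(1), X(2)))))

# Arbitrary illustrative coefficients and data point.
y = evaluate(tree, coeffs=[1.0, 2.0, 3.0, 0.0, 4.0], x=[2.0, 5.0])
```

With these values the four terms are 1, 16, 3 and 40.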
![Page 9: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/9.jpg)
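Upper GA: Function Selection - Penalty Function
[Tree diagram of $y = C_1 + C_2 x_1^3 + C_3 \cos(C_4 x_2) + C_5 x_1 x_2$, as on the encoding slide]
Penalty: $\left[\frac{\text{number of nodes}}{\text{constant}}\right]^{m}$
Example: $\left(\frac{14}{5}\right)^{0.05} = 1.0528$
Penalty Factor = 0.05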
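A small Python sketch of the complexity penalty, assuming it has the form (number of nodes / constant) raised to the penalty factor, with constant 5 and factor 0.05 as in the slide's worked example. Multiplying the SSE by the penalty to get the "adjusted" error is an assumption made here for illustration; the slides do not state how the two are combined.

```python
def complexity_penalty(num_nodes, constant=5.0, m=0.05):
    """Penalty that grows slowly with tree size: (nodes / constant) ** m."""
    return (num_nodes / constant) ** m


def adjusted_error(sse, num_nodes, constant=5.0, m=0.05):
    """Assumed combination: raw error scaled by the complexity penalty."""
    return sse * complexity_penalty(num_nodes, constant, m)


penalty_14_nodes = complexity_penalty(14)
```

Under this form a tree with exactly `constant` nodes has penalty 1, and larger trees are penalized more.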
![Page 10: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/10.jpg)
Upper GA: Function Selection - Crossover
Before:
Parent 1: $y = C_1 + C_2 x_1^3 + C_3 \cos(C_4 x_2) + C_5 x_1 x_2$
Parent 2: $y = C_1 \sin\left(\frac{C_2 + C_3 x_1}{\ln(C_4 x_2)}\right) + C_5$
After:
Offspring 1: $y = C_1 + \frac{C_2 + C_3 x_1}{\ln(C_4 x_2)} + C_5 \cos(C_6 x_2) + C_7 x_1 x_2$
Offspring 2: $y = C_1 \sin(C_2 x_1^3) + C_3$
[Tree diagrams: crossover swaps the subtree $C_2 x_1^3$ of Parent 1 with the subtree $\frac{C_2 + C_3 x_1}{\ln(C_4 x_2)}$ of Parent 2]
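Subtree crossover of this kind can be sketched in Python as follows. The tuple representation, the helper names, and the unnumbered coefficient leaf `"C"` are assumptions for illustration; `parent2` is a simplified stand-in for the second parent, not the exact tree from the slide.

```python
import random

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], new),) + tree[i + 1:]

def swap(p1, a, p2, b):
    """Exchange the subtree of p1 at path a with the subtree of p2 at path b."""
    return put(p1, a, get(p2, b)), put(p2, b, get(p1, a))

def crossover(p1, p2, rng):
    """Pick one node in each parent at random and swap the subtrees."""
    return swap(p1, rng.choice(list(paths(p1))), p2, rng.choice(list(paths(p2))))

def size(tree):
    return 1 + sum(size(c) for c in tree[1:]) if isinstance(tree, tuple) else 1

cube = ("*", "C", ("*", "x1", ("*", "x1", "x1")))            # C * x1^3
fraction = ("/", ("+", "C", ("*", "C", "x1")), ("ln", ("*", "C", "x2")))
parent1 = ("+", ("+", "C", cube),
           ("+", ("*", "C", ("cos", ("*", "C", "x2"))),
                 ("*", "C", ("*", "x1", "x2"))))
parent2 = ("sin", fraction)

# Deterministic swap mirroring the slide: cube term <-> fraction.
child1, child2 = swap(parent1, (1, 2), parent2, (1,))
```

After the swap, `child2` is `sin(C * x1^3)` and `child1` carries the fraction in place of the cubic term. In a full implementation the coefficients would be renumbered after the swap.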
![Page 11: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/11.jpg)
Upper GA: Function Selection - Mutation
Before (Parent 1): $y = C_1 + C_2 x_1^3 + C_3 \cos(C_4 x_2) + C_5 x_1 x_2$
After (Mutant): the terms $C_1 + C_2 x_1^3 + C_3 \cos(C_4 x_2)$ are kept, and part of the last term is replaced by a randomly generated tree, giving new terms in $C_5 x_1$, $C_6 x_1$, $C_7 x_1$ and $\exp(\cdot)$.
[Tree diagrams: Parent 1, the randomly generated tree, and the Mutant]
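A hedged Python sketch of subtree mutation: one randomly chosen node is replaced by a randomly generated tree. The operator sets, the leaf set, the 0.3 leaf probability, and the depth limit are all assumptions chosen for illustration.

```python
import random

BINARY = ["+", "*"]
UNARY = ["cos", "exp"]
LEAVES = ["C", "x1", "x2"]        # "C" stands for any coefficient

def random_tree(rng, depth):
    """Grow a random expression tree of at most `depth` operator levels."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(LEAVES)
    if rng.random() < 0.5:
        return (rng.choice(UNARY), random_tree(rng, depth - 1))
    return (rng.choice(BINARY),
            random_tree(rng, depth - 1),
            random_tree(rng, depth - 1))

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def put(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], new),) + tree[i + 1:]

def mutate(tree, rng, max_depth=3):
    """Replace one randomly chosen subtree with a random tree."""
    target = rng.choice(list(paths(tree)))
    return put(tree, target, random_tree(rng, max_depth))

def leaves(tree):
    """Yield every leaf symbol of a tree."""
    if isinstance(tree, tuple):
        for child in tree[1:]:
            yield from leaves(child)
    else:
        yield tree

parent = ("+", ("*", "C", "x1"), ("cos", ("*", "C", "x2")))
mutant = mutate(parent, random.Random(42))
```

Because trees are immutable tuples, `parent` is left unchanged and `mutant` is a new tree.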
![Page 12: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/12.jpg)
Lower GA - Coefficient Estimation
Estimate the coefficients of a given closed form function which minimize the total error over the set of data points.
Objective Function: Minimize total squared error
Minimize $\sum_{i=1}^{K} \left(y_i^{\text{actual}} - y_i^{\text{model}}\right)^2$
K: number of data points
Stopping Criteria: Search is terminated when no improvement is observed for a specific number of generations
Detailed results are published in “International Journal of Production Research”, Vol. 33, No. 7, 1995
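The objective can be written directly in Python. The sketch below uses the example function from the encoding slides; the coefficient values and the three data points are synthetic and purely illustrative.

```python
import math

def model(c, x):
    """y = C1 + C2*x1^3 + C3*cos(C4*x2) + C5*x1*x2"""
    x1, x2 = x
    return c[0] + c[1] * x1**3 + c[2] * math.cos(c[3] * x2) + c[4] * x1 * x2

def total_squared_error(c, data):
    """Sum over the K data points of (y_actual - y_model)^2."""
    return sum((y_actual - model(c, x)) ** 2 for x, y_actual in data)

# Synthetic, noise-free data generated from known coefficients.
true_c = [1.0, 2.0, 3.0, 0.5, 4.0]
points = [(0.0, 0.0), (1.0, 2.0), (2.0, -1.0)]
data = [(x, model(true_c, x)) for x in points]

error_at_truth = total_squared_error(true_c, data)
error_shifted = total_squared_error([2.0, 2.0, 3.0, 0.5, 4.0], data)
```

Shifting only the constant term by 1 moves every prediction by 1, so the error equals the number of data points.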
![Page 13: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/13.jpg)
Lower GA: Coefficient Estimation - Encoding
$y = C_1 + C_2 x_1^3 + C_3 \cos(C_4 x_2) + C_5 x_1 x_2$
[Diagram: the coefficients are encoded as the string C1 C2 C3 C4 C5]
![Page 14: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/14.jpg)
Lower GA - Selection/Breeding
– Parents are selected for breeding uniformly from the superior half of the population
– The values of the offspring’s coefficients are determined by calculating the arithmetic mean of the corresponding coefficients of two parents
Parent A:   45.876  32.958  12.098   -3.892  0.2356
Parent B:   12.988  35.832   0.234  -12.984  2.4576
Offspring:  29.432  34.395   6.166   -8.438  1.3466
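The breeding step can be reproduced in a few lines of Python. The function names are illustrative, and drawing the two parents without replacement is an assumption; the parent values are the ones shown on the slide.

```python
import random

def average(a, b):
    """Offspring coefficients are the arithmetic mean of the parents'."""
    return [(u + v) / 2 for u, v in zip(a, b)]

def breed(population, errors, rng):
    """Pick two parents uniformly from the better half and average them."""
    ranked = [c for _, c in sorted(zip(errors, population), key=lambda p: p[0])]
    top_half = ranked[: max(2, len(ranked) // 2)]
    parent_1, parent_2 = rng.sample(top_half, 2)   # assumed: distinct parents
    return average(parent_1, parent_2)

parent_a = [45.876, 32.958, 12.098, -3.892, 0.2356]
parent_b = [12.988, 35.832, 0.234, -12.984, 2.4576]
offspring = average(parent_a, parent_b)
```

Rounded to the precision used on the slide, `offspring` matches the values listed there.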
![Page 15: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/15.jpg)
Lower GA - Mutation
– Perturbing existing solutions to explore new regions of search space
– Perturbation value is obtained by multiplying the current population range with a random factor
[Diagram: each coefficient C1 to C5 is perturbed by a random factor k times the current population range of that coefficient]
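A minimal Python sketch of this mutation, assuming the random factor is drawn uniformly from [-1, 1] and that the range is taken per coefficient across the current population. Both choices are assumptions; the slide only states that the perturbation is the population range times a random factor.

```python
import random

def mutate(coeffs, population, rng):
    """Perturb each coefficient by (random factor) * (population range)."""
    mutant = []
    for i, c in enumerate(coeffs):
        column = [member[i] for member in population]
        value_range = max(column) - min(column)   # current range of C_i
        k = rng.uniform(-1.0, 1.0)                # assumed random factor
        mutant.append(c + k * value_range)
    return mutant

# Two-member population using illustrative coefficient vectors.
population = [
    [45.876, 32.958, 12.098, -3.892, 0.2356],
    [12.988, 35.832, 0.234, -12.984, 2.4576],
]
mutant = mutate(population[0], population, random.Random(1))
```

As the population converges, the ranges shrink, so the same rule produces smaller perturbations late in the search.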
![Page 16: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/16.jpg)
Test Problem

C     Run 1    Run 2    Run 3    Run 4    Run 5    Run 6    Mean     Sd.Dv.
1     9.986    9.998    10.002   10.000   9.996    10.001   9.997    0.005
2     9.999    10.000   10.000   10.000   10.000   10.000   10.000   0.000
3     10.000   10.000   10.000   10.000   10.000   10.000   10.000   0.000
4     10.000   10.000   10.000   10.000   10.000   10.000   10.000   0.000
5     10.000   10.000   10.000   10.000   10.000   10.000   10.000   0.000
6     10.000   10.000   10.000   10.000   10.000   10.000   10.000   0.000
7     10.000   10.000   10.000   10.000   10.000   10.000   10.000   0.000
8     10.000   10.000   10.000   10.000   10.000   10.000   10.000   0.000
9     10.000   10.000   10.000   10.000   10.000   10.000   10.000   0.000
10    10.000   10.000   10.000   10.000   10.000   10.000   10.000   0.000
SE    0.0017   0.000    0.0000   0.000    0.000    0.000    0.000    -

$y = C_1 + C_2 x_1 + C_3 x_2 + C_4 x_3 + C_5 x_1^2 + C_6 x_2^2 + C_7 x_3^2 + C_8 x_1 x_2 + C_9 x_1 x_3 + C_{10} x_2 x_3$
![Page 17: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/17.jpg)
Test Problem: Different Error Metrics
[Plot: Log10 of Squared Error (0 to 8) versus Number of Generations (0 to 1500); series: Squared Error, Absolute Error, Maximum Error]
![Page 18: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/18.jpg)
Test Problem: Different Numbers of Data Points
[Plot: Log10 of Squared Error (-8 to 8) versus Number of Generations (0 to 4000); series: 25 Points, 100 Points]
![Page 19: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/19.jpg)
Empirical Data Sets
Five benchmark problems from the literature:
1. onion growth
2. children growth
3. sunspots
4. chemical plant
5. slip casting
Single variable/50 observations to 13 variables/1000 observations
Nonlinear regression, time series analysis, model identification
![Page 20: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/20.jpg)
Test Problem 3, Sunspot Data
– Sunspot data from 1700 to 1995
– Highly cyclic, with peak and bottom values approximately every 11.1 years
– Cycle is not symmetric: the number of counts reaches its maximum value faster than it drops to a minimum
– Training range: 1700-1979
– Validation range: 1980-1995
![Page 21: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/21.jpg)
Functions Identified
Model A (SSE = 61964): $1.1965\,x(t-1) - 0.4585\,x(t-2) + 0.2471\,x(t-9)$
Model B (SSE = 45533): $0.8337\,x(t-1) - 1.1989\exp(-0.3512\,x(t-9)) + 15.7476\exp(-2.7260\exp(-0.3263\,x(t-1)) - 0.6271\,x(t-2))$
Model C (SSE = 40341): $1.2410\exp(-0.6099\cos(1.4097\,x(t-4) + 0.4282\,x(t-1)) - 0.8446\,x(t-2))\,x(t-1) + 0.8064\,x(t-1) - 0.1316\,x(t-4) + 0.1148\,x(t-9)$
Model D (SSE = 38715): $1.6258\exp(-0.5564\cos(-1.4893\,x(t-4) + 0.6979\exp(-3.3442\cos(0.2561\,x(t-4)) + 2.8807\,x(t-2)) - 3.1756\,x(t-2)) - 0.7485\,x(t-2)) - 0.9362\,x(t-2)\,x(t-1) + 0.8253\,x(t-1) - 0.1413\,x(t-4) + 0.1046\,x(t-9)$
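As an illustration of how such a lagged model would be applied, here is a Python sketch of one-step-ahead prediction with Model A. It assumes Model A reads as 1.1965 x(t-1) - 0.4585 x(t-2) + 0.2471 x(t-9), with x(t) the series value in year t. The series used below is a constant placeholder, not sunspot data.

```python
def model_a(history):
    """Predict the next value; history[-k] is the value k years earlier."""
    return (1.1965 * history[-1]
            - 0.4585 * history[-2]
            + 0.2471 * history[-9])

def one_step_sse(series, model, start):
    """Sum of squared one-step-ahead errors from index `start` onward."""
    return sum((series[t] - model(series[:t])) ** 2
               for t in range(start, len(series)))

# Placeholder input: a constant series, chosen only to make the
# arithmetic easy to check by hand.
flat = [100.0] * 12
prediction = model_a(flat[:9])
flat_sse = one_step_sse(flat, model_a, start=9)
```

For a constant input the prediction is simply the input scaled by the sum of the three coefficients, 0.9851. At least nine past values are needed before the first prediction because of the lag-9 term.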
![Page 22: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/22.jpg)
Model D
[Plot: 20 x Annual Number of Sunspots (0 to 10) versus Year (1700 to 2000)]
![Page 23: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/23.jpg)
Extrapolation of Model D
[Plot: vertical axis 0 to 9 versus Year (1980 to 1995); series: Data, Fitted Function]
![Page 24: System Identification Curve Fitting](https://reader030.vdocuments.net/reader030/viewer/2022012909/577cce0c1a28ab9e788d295c/html5/thumbnails/24.jpg)
Conclusions
– A unique approach for curve fitting problems
– Provides closed form function for the given data set
– Can handle non-linear, discontinuous functions
– Flexible in terms of error metric
– Can be used separately for function selection and coefficient optimization
– Computationally intensive and needs a priori setting of search parameters and penalty function components
Forthcoming paper: “A hierarchical genetic algorithm for system identification and curve fitting with a supercomputer implementation,” Mehmet Gulsen and Alice E. Smith, Institute for Mathematics and its Applications, Volumes in Mathematics and its Applications, Volume on Evolutionary Computing.