Lecture Notes for ENEE660, Fall 2008



    Lecture Notes

    in

    System Theory

    by

Professor John S. Baras¹

    Notes Used in the course

ENEE 663: System Theory, Department of Electrical Engineering, University of Maryland, College Park

Copyright © 1980, John S. Baras. All Rights Reserved

¹Electrical Engineering Department, Computer Science Department, and Institute for Systems Research, University of Maryland, College Park, Maryland 20742.


Copyright © 1980, 1989, John S. Baras. All Rights Reserved


    Contents

1 GENERAL SYSTEMS 1
  1.1 Introduction 1
  1.2 (B) Basic Mathematical Structures 11
  1.3 Realization Theory 18
  1.4 Discrete Time Linear Systems 31
  1.5 Differential Dynamical Systems 38

2 LINEAR SYSTEMS 45
  2.1 Linearization 45
  2.2 (B) Vector Spaces 47
  2.3 Linear Discrete Time Dynamical Systems 54
  2.4 Linear Differential Dynamical Systems 55
  2.5 Transition Matrices and Their Properties 56
  2.6 (B) Changing Coordinates 62
  2.7 Contributions From the Input 65
  2.8 Periodic Linear Continuous Time Systems 66
  2.9 Linear Matrix Equations 71

3 Controllability, Reachability, Observability, Reconstructibility of LCTS 77
  3.1 Reachability and Controllability 77
  3.2 (B) Cayley-Hamilton Theorem 82
  3.3 Reachability and Controllability of LTIS 86
  3.4 Observability and Reconstructibility 90

4 REALIZATION THEORY FOR LCTS 99
  4.1 (B) Computational Linear Algebra 99
  4.2 Realization Theory for LCTS 106
  4.3 Realization Theory for LTIS 116
  4.4 (B) Reduction of Matrices 125
  4.5 Minimal Realizations for LTIS 128

5 Polynomial Matrices and Transfer Function Methods 139
  5.1 Preliminary Constructions 139
  5.2 (B) Polynomial Algebra 144
  5.3 Polynomial Matrices 148


  5.4 Multivariable Systems 174
    5.4.1 Multivariable Companion Forms 174
  5.5 Matrix Fractions 179

6 Design Problems 227
  6.1 Controllability, Feedback and Pole Assignment 227
  6.2 Asymptotic State Estimation and Dynamic Observers 237
  6.3 Dynamic Compensation 249


    Chapter 1

    GENERAL SYSTEMS

    1.1 Introduction

In this chapter we attempt to introduce the student to some fundamental but abstract concepts of general systems theory. These concepts play a significant role in modern mathematical studies in system theory and provide the underlying uniform framework for various classes of systems. Although abstract and general, they help clarify some standard constructions and lead to connections with other fields, notably Automata Theory. Most of the material in this first chapter will not be used directly in the course; it is therefore mainly presented for the student interested in further deep study of the structural properties of systems. Nevertheless, we will build on a few ideas introduced here in later chapters. The student is advised that a complete understanding of some of these ideas will be impossible at the first introduction, and that a frequent return to this first chapter, as the course progresses, will prove to be highly beneficial.

1.1.1 Definition: A system Σ is an entity that accepts inputs and produces outputs.

This is a remarkably vague definition! We usually think of inputs as variables we can manipulate, while outputs represent observations.

Figure 1.1.1: A general system as a black box

Inputs (stimuli) are functions on a parameter set T (usually having the interpretation of time) taking values in a set U which we shall call the input value set. We shall denote by 𝒰 the set of all inputs that our system accepts. Typically for this course, T will be either a subset of the integers (discrete time), or an interval (a, b) from the real numbers (continuous time). The input value set U can be quite general, but again typically for this course will be either:

    i. a finite set


ii. a subset of the real numbers R

iii. a subset of the complex numbers C

iv. a subset of the real n-dimensional Euclidean space R^n

v. a subset of the complex n-dimensional Euclidean space C^n

In cases ii) and iii) we have single-input systems, while in cases iv) and v) we have multi-input systems. When we are facing a continuous time problem, the inputs (input functions) u(·) are assumed to be at least piecewise continuous functions on T with values in U.

Some typical examples of inputs are: input voltage of an amplifier, time variation of a traffic light, the current in the magnetron of a radar transmitter, time variation of the control rod setting in a nuclear reactor, the n-vector of input currents in a 2n-port network, the priority settings in a network switch.

Outputs (effects, responses) are similarly functions on a parameter set T taking values in a set Y which we shall call the output value set. We shall denote by 𝒴 the set of all outputs that the system can produce. Our comments on inputs are valid for outputs as well.

Some typical examples of outputs are: output voltage of an amplifier, traffic flow detector data, radar return, neutron flux in a nuclear reactor, the n-vector of output currents in a 2n-port network, the total throughput from the network switch.

To completely characterize a system, we generally need a table that lists all outputs for all possible inputs. That is, we need to know the map

f : 𝒰 → 𝒴

which assigns an output to every input that the system accepts.

1.1.2 Definition: The map f is called the input-output map of the system Σ.

1.1.3 Definition: The triple Σ_{I/O} = {𝒰, 𝒴, f} consisting of the set of admissible inputs 𝒰, the resulting outputs 𝒴, and the input-output map f is called the input-output or external description of the system Σ. That is, the external description ignores the underlying structure of the system and corresponds exactly to the traditional black box description of systems.

Some remarks are now in order. It is obvious that the external description cannot be completely obtained from experimental data for nontrivial systems. Simply put, it asks for too much. Typically, we know the response of the system to some characteristic test inputs, or we have some input-output data base. It is a highly nontrivial and still basically unresolved problem to obtain f from such a partial description. The solution typically employs interpolation, additional information from the underlying discipline that characterizes the system (i.e., is it a network, a chemical reactor, a biological system, an economics system, etc.) and gives at best an approximate input-output map. The construction of


input-output descriptions that fit experimental data is the basic question in modeling. It is hard to overemphasize the difficulty associated with the construction of input-output models from input-output data. To illustrate the point, consider the relatively simple problem of computing the impedance of an L,R,C network (a standard experimental problem in undergraduate Electrical Engineering laboratories). You typically use as test inputs sinusoidal currents at various frequencies and you record the amplitude and phase of the output voltage oscillation. Finally, to obtain the impedance you interpolate between data points so that you can fit a specific type of frequency domain function. It is also important to realize the dependence of the input-output data base (and therefore of the resulting input-output map) on the initialization or preparation of the system for the experiments that produce the input-output data base. To obtain a consistent input-output data base, the same preparation must be followed prior to each experiment. Put simply, a system is not described by a single input-output map, but rather by a family of input-output maps parametrized by the initial preparation procedure.

The final point that we must remember is that there are many input-output models that can be built from a given input-output database. The efficient utilization of additional information, engineering judgment (or intuition from other sciences related to the system) and careful consideration of the purpose of the particular model building task influence dramatically the further success and progress of a control systems engineer's work.

Often, we are forced to consider inputs and outputs in more general function spaces for a variety of reasons. Although this is not the case for this course, it is useful to realize that there are two requirements on any candidate for 𝒰 or 𝒴. First, observe that any part of an admissible input should also be admissible. That is, if the input (or output) in Figure 1.1.2a) is admissible, then the input (or output) in Figure 1.1.2b) should be also.

Figure 1.1.2: Illustrating slicing of inputs or outputs

So 𝒰 and 𝒴 must be closed under slicing. Second, if u1, u2 are two inputs with finite time record, we should obtain an admissible input if we first apply u1 and then follow with u2. This operation is called concatenation and is denoted by u1 ∘ u2. That is, if the inputs (or outputs) in Figure 1.1.3a) are admissible, then so is the input (or output) in Figure 1.1.3b).

Figure 1.1.3: Illustrating concatenation of inputs or outputs


So 𝒰 and 𝒴 must be closed under concatenation. By the usual abuse of notation we use u for an element of 𝒰 (a function) or an element of U, and similarly for outputs. The meaning will be clear from the context.

Note: The space of continuous functions is not closed under slicing. If we want the system to accept continuous functions, real valued say, on the interval [0,1], we must make 𝒰 at least as big as the piecewise continuous functions on [0,1].

In this course we will only consider input-output descriptions corresponding to the same initial preparation. That is, we will only consider initialized systems. We now have the following classification of input-output maps of initialized systems:

    1.1.4 Definition: Classification of input-output maps:

i. A. linear, if U, 𝒰, Y, 𝒴 are vector spaces and f(αu1 + βu2) = αf(u1) + βf(u2) for all α, β in a field F and u1, u2 ∈ 𝒰 (for definitions of some of these mathematical structures, see Section 1.2(B)),
   B. nonlinear, otherwise,

ii. A. memoryless, if f(u)(t) = g(u(t), t) for all t ∈ T, u ∈ 𝒰,
    B. with memory, otherwise,

iii. A. time-invariant, if y = f(u) implies y_τ = f(u_τ) for all τ ∈ T, where u_τ(t) = u(t + τ),
     B. time-varying, otherwise,

iv. A. causal, if u1(t) = u2(t) for t ≤ t′ implies y1(t) = y2(t) for t ≤ t′,
    B. noncausal, otherwise.

Although real world systems are causal, there are many examples of noncausal systems in algorithmic, non real-time, data processing schemes.
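To make the classification concrete, here is a small sketch in Python. The two maps below are illustrative examples of ours, not systems from the notes; they represent inputs as finite discrete-time records and test the defining properties of Definition 1.1.4 numerically.

```python
# Sketch: testing Definition 1.1.4 on discrete-time input-output maps.
# Both maps are illustrative examples, not systems from the notes.

def cumsum_map(u):
    """f(u)(t) = sum of u(k) for k <= t: linear, causal, with memory."""
    y, acc = [], 0.0
    for v in u:
        acc += v
        y.append(acc)
    return y

def square_map(u):
    """f(u)(t) = u(t)**2: memoryless and causal, but nonlinear."""
    return [v * v for v in u]

def is_linear(f, u1, u2, a=2.0, b=-3.0):
    """Check f(a*u1 + b*u2) == a*f(u1) + b*f(u2) on one pair of inputs."""
    lhs = f([a * x + b * y for x, y in zip(u1, u2)])
    rhs = [a * x + b * y for x, y in zip(f(u1), f(u2))]
    return all(abs(x - y) < 1e-9 for x, y in zip(lhs, rhs))

def memoryless_at(f, u1, u2, t):
    """A memoryless map must agree at t whenever u1(t) == u2(t)."""
    return f(u1)[t] == f(u2)[t]

u1, u2 = [1.0, 2.0, 3.0], [0.0, -1.0, 3.0]   # the records agree only at t = 2

print(is_linear(cumsum_map, u1, u2))          # True
print(is_linear(square_map, u1, u2))          # False
print(memoryless_at(square_map, u1, u2, 2))   # True: only u(2) matters
print(memoryless_at(cumsum_map, u1, u2, 2))   # False: the past matters
```

A failed check proves a map is outside the class; passing on samples only supports, but does not prove, membership.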

In view of the difficulties associated with obtaining and storing the input-output map f of a system, other ways of describing a system have been created. The most important alternative is the state space model or internal model for the system Σ. Critical here is the notion of state variable (or internal parameters).

    1.1.5 Definition: The state of a system is the totality of certain variables, called statevariables, which if known allow the determination of future outputs given future inputs.

There are several important remarks that should be made about this concept. The state represents the preparation of the system or its past history. For the definition of the state it is necessary that the parameter set T has a total ordering, according to which future and past are defined. Many different sets of variables, of different cardinality, may qualify as state variables.

Typical examples of state variables are: velocity and coordinate vectors in classical mechanics, capacitor voltage and inductor current in electrical network theory, register content or memory content in computer systems. It is often useful to observe that state variables are always associated with memory components of complex systems.


1.1.6 Definition: A state space model or an internal description of a system Σ is a quintuple M = {U, Y, X, φ, η} where U, Y are the input and output value sets as before, X is a set whose elements are the states and is called the state space, φ is the state transition map and η is the readout map. If we let R²₊ denote the subset of R²: {(t0, t1) : t0 ≤ t1}, then

φ : R²₊ × X × 𝒰 → X

and

φ(t1; t0, x0, u_{t0,t1}) = [state of the system at time t1, starting at state x0 at time t0 and applying the input u_{t0,t1} for the time interval (t0, t1)]

where u_{t0,t1} is the segment of the input u for the interval (t0, t1). Similarly

η : R × X × U → Y

and

η(t, x, u(t)) = output at time t due to state x and current input value u(t).

Finally, the following properties must be valid:

i. φ(t0; t0, x0, u) = x0 for any u ∈ 𝒰 (consistency).

ii. φ(t2; t1, φ(t1; t0, x0, u_{t0,t1}), u_{t1,t2}) = φ(t2; t0, x0, u_{t0,t2}) (semigroup property or composition rule).

iii. φ(t1; t0, x0, u¹_{t0,t1}) = φ(t1; t0, x0, u²_{t0,t1}) whenever u¹(t) = u²(t) for t ∈ (t0, t1) (determinism).

The internal description is an attempt to exploit the inner structure of a system. Usually we obtain state space models from the description of a system's components and their functional interconnections via the underlying physical, mathematical, or other theories that provide models for the elemental components of the system. This is the classical way of arriving at state space models and relies heavily on the discipline of the particular system (i.e., is it an electrical network or an aircraft). State space models constructed in this fashion always utilize state variables that have an explicit physical identity (like the voltage of a particular capacitor). The modern point of view is that a state space model is just a codification of the input-output description of a system. As a result, states are treated as abstract entities which may not have a direct physical interpretation. So a state space model is just a more compact representation of input-output data and may actually be completely different from the true internal structure of a system as derived from the physical theories that describe its elemental components. There are many advantages in this abstraction and we mention just the most important ones according to our opinion: a) in design problems the designer can


choose a convenient abstract state space model which facilitates the solution and then decide about the technology to be utilized in the implementation, b) aggregation can be achieved by building abstract state space models for input-output observations from a small number of input, output terminals of a very complex system, c) once ways of coding a given f in a state space model have been understood, efficient design procedures result.

1.1.7 Definition: The function φ(· ; t0, x0, u) is called the state trajectory (or flow, or orbit) under control (input) u.

Notice that the output, given the state x0 at time t0 and the input after t0, is expressed as

y(t0, x0, u)(t) = η(t, φ(t; t0, x0, u_{t0,t}), u(t)),

which is defined for t ≥ t0. Observe also that, according to our Definition 1.1.6, the state variables must be chosen so as to make the readout map η memoryless (cf. Definition 1.1.4 ii) A.).

We now have the following classification:

    1.1.8 Definition: Classification of state space models:

i. A. linear, if U, 𝒰, Y, 𝒴, X are vector spaces and φ, η are linear in x and u,
   B. nonlinear, otherwise,

ii. A. memoryless, if X is empty,
    B. with memory, otherwise,

iii. A. time-invariant, if η does not depend explicitly on t and φ(t + τ; t0 + τ, x0, u_τ) = φ(t; t0, x0, u) for all τ ∈ T, where u_τ(t) = u(t + τ),
     B. time-varying, otherwise.

It should be evident that there are many state space models for a given system Σ. In order to be able to choose the most appropriate one for the problem at hand, and in order to understand the structure and construction of state space models, we now describe their fundamental qualitative properties.

1.1.9 Definition: A state x1 ∈ X is reachable from (x0, t0) if there exist a finite t1 ≥ t0 and u(·) ∈ 𝒰 such that

φ(t1; t0, x0, u_{t0,t1}) = x1

1.1.10 Definition: The state space model is reachable from (x0, t0) if every state x1 is reachable from (x0, t0).

1.1.11 Definition: A state x1 ∈ X is controllable to (x0, t0) if there exist a finite t1 ≤ t0 and u(·) ∈ 𝒰 such that

φ(t0; t1, x1, u_{t1,t0}) = x0

1.1.12 Definition: The state space model is controllable to (x0, t0) if every state x1 is controllable to (x0, t0).

The student is warned that the properties of reachability and controllability are not the same!!!

1.1.13 Definition: The state space model M is connected if given any pair x0, x1 ∈ X we can find t0 ≤ t1 and u(·) ∈ 𝒰 such that

φ(t1; t0, x0, u_{t0,t1}) = x1
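When the state space and input value set are finite and the model is time-invariant, reachability in the sense of Definition 1.1.10 reduces to graph search. The following sketch uses a hypothetical three-state transition table of our own (not a system from the notes) and computes the set of states reachable from a given initial state by breadth-first search.

```python
# Sketch: deciding reachability (Definition 1.1.10) for a finite,
# time-invariant state space model by breadth-first search.
# The transition table is a hypothetical example, not a system from the notes.
from collections import deque

delta = {   # delta[(state, input)] = next state
    ('s0', 0): 's1', ('s0', 1): 's0',
    ('s1', 0): 's2', ('s1', 1): 's0',
    ('s2', 0): 's2', ('s2', 1): 's2',
}
inputs = {0, 1}

def reachable_from(x0):
    """All states phi(t1; t0, x0, u) attainable with some finite input word u."""
    seen, queue = {x0}, deque([x0])
    while queue:
        x = queue.popleft()
        for a in inputs:
            nxt = delta[(x, a)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(reachable_from('s0')))   # ['s0', 's1', 's2']: reachable from s0
print(sorted(reachable_from('s2')))   # ['s2']: s2 is a trap state
```

The run also illustrates the warning above: every state is reachable from s0, yet s0 is not reachable from s2, so the model is not connected and reachability from one state says nothing about controllability to it.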

    Here are some practical questions related to controllability (as we shall see):

a. drive a particular nuclear reactor from a given excitation level to another excitation level.

b. drive a spacecraft from a given orbit around the earth to another orbit around the earth.

c. spacecraft rendezvous problems.

d. make the output of the system follow a predetermined time history.

e. design a control law that will eliminate the effects of exogenous disturbances on the behavior of the system.

These questions are all related to analyzing the effects on the state as we vary the controls (inputs). Often we have available information about inputs and outputs and want to infer the state of the system. Questions like these are dual to a) - e) above.

1.1.14 Definition: x1, x2 ∈ X are at t0 in the same observation class with respect to input u ∈ 𝒰 if y(t0, x1, u) = y(t0, x2, u).

1.1.15 Definition: x1, x2 ∈ X are at t0 in the same observation class if they are in the same observation class with respect to every input function u ∈ 𝒰.

In the first case, by using u(·) and observing the output record in the future, we cannot distinguish between x1 and x2. In the second case this is true for any input.

A natural question is: which is a larger set, observation classes or observation classes with respect to a specific input u? The answer is immediate if we note that an observation class with respect to a specific input can be further subdivided by other inputs.

1.1.16 Definition: M is observable at t0 with input u ∈ 𝒰 if no pair of states x1, x2 belongs at t0 to the same observation class with respect to u.


1.1.17 Definition: M is observable at t0 if no pair of states x1, x2 belongs at t0 to the same observation class.

    The following remarks should facilitate the understanding of these notions.

M observable at t0 with input u ⟺ given any two states x1, x2 ∈ X, the output segments due to u are different (this property is also called fixed input observability). That is, we can distinguish between any two states using the same fixed input function u for every pair of states.

M observable at t0 ⟺ given any two states x1, x2 ∈ X, there exists some input u such that the corresponding output segments satisfy y(t0, x1, u) ≠ y(t0, x2, u). It is important to realize that the input u used can be different for different pairs (i.e., u can depend on the particular pair of states).

Therefore we obviously have

M observable at t0 with fixed input u  ⟹  M observable at t0        (1.1.1)

Here is a straightforward proof of (1.1.1):

If M is not observable at t0, there exists a pair x1, x2 which belongs to the same observation class for any input.

⟹ this same pair x1, x2 belongs to the same observation class with respect to u.

⟹ M is not observable at t0 with respect to u.

The meaning of (1.1.1) is that observability with fixed input is a stronger property (more difficult to satisfy) than observability. From the practical application point of view, it is much easier to distinguish states in a state space model having the former property than the latter. All you need to do is observe the outputs due to the same input u, while initializing the system in a different way.

A simple example should illustrate these ideas. Consider the system Σ described by the diagram below. Here U = {1, 2}, Y = {a, b}, X = {1, 2, 3}. The arrows indicate the state transition that occurs when the input indicated on the arrow (i.e., the number) is applied. The letters on the arrows indicate the resulting outputs.


Figure 1.1.4: State transition diagram for the system in the example

Is Σ observable? It is easy to show that it is. What about fixed input observability? That is, does there exist an input function (here a sequence) that distinguishes between any pair of states?

Let us look at an input string, and see what elements it should contain, and in what order, to satisfy the requirements for fixed input observability.

The first element cannot be 1 because then, no matter what the rest is, states 1 and 3 are indistinguishable. So the first element has to be 2, and the input string looks like

2 . . . . .

Let us look at 2 1 . . . . . If the initial state is 2, the output string is b a . . . and the state transition is 2 → 1 → . . .; if the initial state is 3, the output string is also b a . . . and the state transition is 3 → 1 → . . . Therefore, no matter what the rest is, states 2 and 3 are indistinguishable. It is easy to show that if the input string is 2 2 . . . then states 2 and 3 are again indistinguishable. Since every u is either 1u′ or 2u′, there does not exist a fixed u which can distinguish between any pair of states.
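The brute-force version of this argument can be automated. Since the diagram of Figure 1.1.4 does not survive text extraction, the machine below is a hypothetical three-state Mealy machine of our own, built to exhibit the same phenomenon: it is observable, yet no single input word distinguishes every pair of states. The search covers words up to length 3; a regress argument just like the one in the text (a distinguishing word would have to begin with 2, after which the remainder faces the same obstruction) shows that longer words do not help either.

```python
# Sketch: observability vs fixed-input observability for a finite Mealy
# machine. The tables below are hypothetical (the diagram of Figure 1.1.4
# is not recoverable); they only mirror the flavor of the example.
from itertools import combinations, product

# delta[state][input] = next state; lam[state][input] = output letter
delta = {1: {1: 1, 2: 2}, 2: {1: 3, 2: 1}, 3: {1: 1, 2: 3}}
lam   = {1: {1: 'a', 2: 'b'}, 2: {1: 'b', 2: 'b'}, 3: {1: 'a', 2: 'b'}}

def out_seq(x, word):
    """Output string produced from initial state x under the input word."""
    ys = []
    for a in word:
        ys.append(lam[x][a])
        x = delta[x][a]
    return ys

def distinguishes(word, x1, x2):
    return out_seq(x1, word) != out_seq(x2, word)

pairs = list(combinations(delta, 2))
words = [w for n in (1, 2, 3) for w in product((1, 2), repeat=n)]

# Observable: for EVERY pair there is SOME word telling the two apart.
observable = all(any(distinguishes(w, *p) for w in words) for p in pairs)

# Fixed-input observable: ONE word tells EVERY pair apart.
fixed = any(all(distinguishes(w, *p) for p in pairs) for w in words)

print(observable, fixed)   # True False
```

Here the pair (1, 3) is merged by any word starting with 1, while a word starting with 2 swaps the obstruction onto the pair (2, 3), exactly the regress pattern of the worked example.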

A question related to the ideas above is to determine the current state knowing past histories of input and output.

1.1.18 Definition: x1, x2 ∈ X belong at t0 in the same reconstruction class with respect to input u ∈ 𝒰 if y(σ, x′, u) = y(σ, x″, u) on σ ≤ t ≤ t0, for all x′ such that φ(t0; σ, x′, u) = x1 and all x″ such that φ(t0; σ, x″, u) = x2.

1.1.19 Definition: x1, x2 ∈ X belong at t0 in the same reconstruction class if they belong in the same reconstruction class with respect to every u ∈ 𝒰.

In the first case we cannot determine the current state from past output observations, using u as input. In the second case we cannot do it, no matter what input we use.

1.1.20 Definition: M is reconstructible at t0 with input u if no pair of states x1, x2 belongs at t0 to the same reconstruction class with respect to u.


1.1.21 Definition: M is reconstructible at t0 if no pair x1, x2 belongs at t0 to the same reconstruction class.

It is again easy to show that

M reconstructible at t0 with respect to fixed input u  ⟹  M reconstructible at t0        (1.1.2)

meaning that the former property is stronger.

    Here are some practical questions related to observability and reconstructibility:

a. Given a system, design another dynamical system which estimates the state of the first (needed in many control laws where we cannot directly observe the state), e.g. observer theory.

    b. In output feedback controllers, eliminate the effect of output disturbances.


    1.2 (B) Basic Mathematical Structures

There are certain mathematical structures that play a very important role in recent developments in system theory, and particularly in algebraic system theory. In this section we have collected some introductory material about these algebraic structures; in later sections the relations to some fundamental system theoretic questions will be briefly illustrated. More details can be found in the appropriate references at the end of this chapter.

1.2.1 Definition: A group G is a set together with a binary operation

∘ : G × G → G
(a, b) ↦ a ∘ b

the group multiplication, which has the following properties:

i. it is associative: (a ∘ b) ∘ c = a ∘ (b ∘ c) for all a, b, c ∈ G;

ii. G has an identity element e: e ∘ a = a ∘ e = a for all a ∈ G;

iii. for every a ∈ G there exists an inverse element a⁻¹: a⁻¹ ∘ a = e.

Whenever a ∘ b = b ∘ a, i.e., the multiplication is commutative, the group is called abelian or commutative.

1.2.2 Definition: A group-homomorphism of a group G into another group G′ is a map f such that f(a ∘ b) = f(a) ∘ f(b) for all a, b ∈ G and such that the identity element of G is mapped to the identity element of G′. The kernel of a group-homomorphism is the inverse image of the identity element of the image group. A one-one, onto group-homomorphism is called a group-isomorphism.

    We now give some representative examples:

a. Z, the integers with addition (abelian).

b. The set of all n × n nonsingular real matrices, denoted by GL(n, R) (nonabelian).

c. The set of all n × n orthogonal matrices with determinant 1, denoted by SO(n) (nonabelian).

d. Let S be any set. A(S) is the set of all one-to-one, onto (or bijective) mappings of S onto itself. A(S) is a group. Note that if S has a finite number of elements, then one-to-one implies onto and vice versa.

e. A particular example of the above is the case where S has n elements. Then A(S) has n! elements and is denoted by Sn; the symmetric or permutation group of degree n.
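For n = 3 the group A(S) can be enumerated and the group axioms checked mechanically. The short sketch below is our own illustration: a permutation is represented as a tuple p with p[i] the image of i, and composition is the group multiplication of A(S).

```python
# Sketch: the symmetric group S_n of example e., realized concretely for
# n = 3, with composition of tuples as the group operation.
from itertools import permutations

S3 = list(permutations(range(3)))            # all bijections of {0, 1, 2}

def compose(p, q):
    """(p o q)(i) = p(q(i)): the group multiplication of A(S)."""
    return tuple(p[q[i]] for i in range(len(q)))

identity = (0, 1, 2)

assert len(S3) == 6                          # |S_3| = 3! = 6
assert all(compose(p, identity) == p for p in S3)              # identity law
assert all(any(compose(q, p) == identity for q in S3) for p in S3)  # inverses

# S_3 is nonabelian: two transpositions that do not commute
p, q = (1, 0, 2), (0, 2, 1)
print(compose(p, q), compose(q, p))          # (1, 2, 0) (2, 0, 1)
```

The last line exhibits two elements with p ∘ q ≠ q ∘ p, confirming the claim that the permutation group is nonabelian for n ≥ 3.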


f. When a group has a finite number of elements, it is called finite. The number of its elements is the order of the group. An example is provided by Zn = Z mod n: the set of numbers we obtain from Z if we identify two integers x, y whenever x − y = kn (an integer multiple of n). So Zn = {0, 1, 2, . . . , n − 1}. Zn is a finite group with addition mod n. The operation table (addition mod 3) for Z3 = {0, 1, 2}, for example, is:

  0 1 2
0 0 1 2
1 1 2 0
2 2 0 1
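The table above can be generated, and the group axioms verified, mechanically. A minimal sketch of ours for Zn under addition mod n:

```python
# Sketch: generating the operation table of Z_n under addition mod n,
# reproducing the Z_3 table above and checking the group axioms.
def addition_table(n):
    return [[(i + j) % n for j in range(n)] for i in range(n)]

n = 3
t = addition_table(n)
for row in t:
    print(*row)   # prints the Z_3 table row by row

# Group axioms for Z_3 (closure is built into the table):
assert all(t[i][0] == i and t[0][i] == i for i in range(n))          # identity 0
assert all(any(t[i][j] == 0 for j in range(n)) for i in range(n))    # inverses
assert all(t[i][j] == t[j][i] for i in range(n) for j in range(n))   # abelian
```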

1.2.3 Definition: Let A be a set and R a subset of A × A. We call R a relation, and we write aRb (meaning a is related to b) whenever (a, b) ∈ R. R is an equivalence relation whenever:

i. aRa (reflexivity);

ii. aRb implies bRa (symmetry);

iii. aRb and bRc imply aRc (transitivity).

Note that we usually replace aRb by a ∼ b.

1.2.4 Definition: Let A be a set with an equivalence relation R. The equivalence class of an element a ∈ A under R is the subset {b ∈ A such that bRa} and will be denoted by [a]_R.

Then we have the following well known result.

1.2.5 Theorem: The distinct equivalence classes of an equivalence relation R on a set A provide a decomposition of A as a union of mutually disjoint subsets. Conversely, to every decomposition of A as a union of mutually disjoint, nonempty subsets, there corresponds an equivalence relation on A for which these subsets are the distinct equivalence classes.

Some examples are: Z with a ∼ b iff n divides (a − b); R² with x ∼ y iff ‖x‖ = ‖y‖, where ‖x‖ = √(x1² + x2²).

Although the concept of a group is of fundamental importance for many engineering, mathematics and physics problems, the structure that plays a major role in system theory is that of a semigroup.

1.2.6 Definition: A semigroup S is a set together with a binary operation

∘ : S × S → S
(a, b) ↦ a ∘ b

which is associative, i.e. (a ∘ b) ∘ c = a ∘ (b ∘ c). If S has an identity (cf. Definition 1.2.1), we call S a monoid. A monoid-homomorphism and a monoid-isomorphism are defined exactly as group-homomorphisms and group-isomorphisms (see Definition 1.2.2).


Some examples are:

a. Z with the operation being integer multiplication.

b. Zn with the operation being integer multiplication mod n. For example, the multiplication table for Z3 is:

  0 1 2
0 0 0 0
1 0 1 2
2 0 2 1

c. {n ∈ Z such that n > 1} forms a semigroup with integer multiplication, which is not a monoid.

1.2.7 Definition: A subset H of a group G is a subgroup of G if, under the operation of G, H is a group. Similarly, a subsemigroup S′ of a semigroup S is a subset of S which is a semigroup under the operation of S.

1.2.8 Definition: An element r of a semigroup S is called a left (or right) identity if r ∘ s = s (s ∘ r = s) for all s ∈ S. It is an idempotent if r² = r.

1.2.9 Definition: A subset A of a semigroup S is a left (or right) ideal of S if S ∘ A ⊆ A (A ∘ S ⊆ A). We say A is a proper ideal if A ≠ S.

1.2.10 Exercise: Show that a semigroup S is a group iff it has no proper left or right ideals.

1.2.11 Definition: A ring R is a set together with two binary operations called ring multiplication and ring addition,

    · : R × R → R          + : R × R → R
    (a, b) ↦ a · b         (a, b) ↦ a + b

satisfying the following conditions:

i. with respect to addition, R is an abelian group,

ii. the multiplication is associative and has a unit element,

iii. for all a, b, c ∈ R, we have (a + b) · c = a · c + b · c and a · (b + c) = a · b + a · c (this is called distributivity).

1.2.12 Definition: The group of units of R, denoted by R∗, is the subset of all elements of R which have both left and right multiplicative inverses.


1.2.13 Definition: A ring R is commutative if a · b = b · a for all a, b ∈ R.

1.2.14 Definition: Elements r, s ∈ R are zero divisors if r ≠ 0, s ≠ 0 (0 here is the identity element of addition) and r · s = 0. A commutative ring with no zero divisors is called entire (or an integral domain).

1.2.15 Definition: A ring is a division ring if its nonzero elements are all invertible (i.e., form a multiplicative group).

1.2.16 Definition: A field F is a commutative division ring.

1.2.17 Definition: A subring B of a ring R is a subset of R which is an additive subgroup of R, contains the multiplicative identity, and is such that a, b ∈ B implies a · b ∈ B.

1.2.18 Definition: A left (right) ideal A of a ring R is an additive subgroup of R such that R · A ⊆ A (A · R ⊆ A). A two-sided ideal is a subset of R which is both a left and a right ideal.

1.2.19 Definition: If R is a ring and a ∈ R, then Ra is a left ideal A, called principal. We say that a is the generator of A over R. A commutative ring such that every ideal is principal is called a principal ring.

1.2.20 Definition: A map f from a ring R to a ring R' is a ring-homomorphism if:

i. f(a + a') = f(a) + f(a'),

ii. f(a · a') = f(a) · f(a'),

iii. f(1) = 1,

iv. f(0) = 0,

for all a, a' ∈ R.

The kernel of a ring-homomorphism is its kernel viewed as an additive homomorphism. A one-one, onto ring-homomorphism is a ring-isomorphism.

1.2.21 Theorem: Let R be an entire ring and consider the set of all pairs (a, b) where a, b ∈ R and b ≠ 0. The relation (a, b) ∼ (c, d) iff ad = bc is an equivalence relation on this set; let (R∗)^{-1}R denote the set of equivalence classes. Define two operations on (R∗)^{-1}R via:

i. [(a, b)] + [(c, d)] = [(ad + bc, bd)],

ii. [(a, b)] · [(c, d)] = [(ac, bd)].

Then (R∗)^{-1}R is a field, usually called the field of quotients of R.

Proof: See Lang, pp. 66-69.
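The construction in Theorem 1.2.21 can be made concrete for R = Z: pairs (a, b) with b ≠ 0 represent fractions a/b, and the two operations above become ordinary fraction arithmetic. A small Python sketch (the helper names are ours, not from the notes):

```python
# Field of quotients of Z, built directly from Theorem 1.2.21:
# pairs (a, b) with b != 0, and (a, b) ~ (c, d) iff a*d == b*c.

def equiv(p, q):
    """(a, b) ~ (c, d)  iff  a*d == b*c."""
    (a, b), (c, d) = p, q
    return a * d == b * c

def add(p, q):
    """[(a, b)] + [(c, d)] = [(a*d + b*c, b*d)]."""
    (a, b), (c, d) = p, q
    return (a * d + b * c, b * d)

def mul(p, q):
    """[(a, b)] * [(c, d)] = [(a*c, b*d)]."""
    (a, b), (c, d) = p, q
    return (a * c, b * d)

half, third = (1, 2), (1, 3)
print(add(half, third))      # (5, 6)   i.e. 1/2 + 1/3 = 5/6
print(mul(half, third))      # (1, 6)
print(equiv((2, 4), half))   # True: (2, 4) and (1, 2) are the same class
# Nonzero classes are invertible: [(a, b)] * [(b, a)] ~ [(1, 1)]
print(equiv(mul((3, 5), (5, 3)), (1, 1)))   # True
```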

    We now give some examples of rings and related concepts:


1. R is the set of integers, positive, negative and 0, + is the usual addition and · the usual multiplication of integers. R is a commutative ring.

2. R is the set of rational numbers under the usual addition and multiplication of rational numbers. This is the field of quotients of the previous ring.

3. R is Z7 under addition and multiplication mod 7. This is a finite field.

4. R is Z6 under addition and multiplication mod 6. This is a commutative ring but not a field. For example, 2 · 3 = 0 mod 6, so it has zero divisors.

5. Let Q be the set of all symbols α0 + α1 i + α2 j + α3 k where α0, α1, α2, α3 are real numbers. The operations are defined via

(a) (α0 + α1 i + α2 j + α3 k) + (β0 + β1 i + β2 j + β3 k)
    = (α0 + β0) + (α1 + β1) i + (α2 + β2) j + (α3 + β3) k

(b) (α0 + α1 i + α2 j + α3 k)(β0 + β1 i + β2 j + β3 k)
    = (α0β0 − α1β1 − α2β2 − α3β3) + (α0β1 + α1β0 + α2β3 − α3β2) i
    + (α0β2 + α2β0 + α3β1 − α1β3) j + (α0β3 + α3β0 + α1β2 − α2β1) k

Notice that the formula for multiplication results from multiplying out formally and collecting terms using the relations: i^2 = j^2 = k^2 = ijk = −1, ij = −ji = k, jk = −kj = i, ki = −ik = j. This is the ring of real quaternions, very useful in applications in mechanics of rotational motion of rigid bodies. It is a division ring, but the corresponding multiplicative group is nonabelian, so it is not a field.
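Formula (b) can be transcribed coefficient by coefficient into executable form; the following Python sketch (the 4-tuple encoding is ours) verifies the defining relations:

```python
# Quaternion multiplication, transcribing formula (b) above:
# a quaternion a0 + a1*i + a2*j + a3*k is encoded as a 4-tuple.

def qmul(a, b):
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,   # real part
            a0*b1 + a1*b0 + a2*b3 - a3*b2,   # i part
            a0*b2 + a2*b0 + a3*b1 - a1*b3,   # j part
            a0*b3 + a3*b0 + a1*b2 - a2*b1)   # k part

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(qmul(i, i))   # (-1, 0, 0, 0)   i.e. i^2 = -1
print(qmul(i, j))   # (0, 0, 0, 1)    i.e. ij = k
print(qmul(j, i))   # (0, 0, 0, -1)   i.e. ji = -k  (noncommutative)
```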

6. Let R be the set of all polynomials π(z) in the indeterminate z with coefficients in a field K, with the usual addition and multiplication of polynomials. This is denoted by K[z] and is an entire ring. The units of K[z] are the polynomials of degree 0.

7. The field of quotients of the previous ring is the field of rational functions over K, denoted by K(z).

8. A formal power series with coefficients in K and indeterminate z is an infinite sequence

    (α0, α1, ⋯)

which we denote formally as

    Σ_{i≥0} αi z^i

Defining addition and multiplication as in the polynomial case, we obtain a ring which is denoted by K[[z]]. The units of K[[z]] are the power series with α0 ≠ 0.
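The claim that a series with α0 ≠ 0 is a unit can be checked computationally: the inverse is computed coefficient by coefficient from the Cauchy product. A Python sketch over the rationals, truncated to finitely many coefficients (the function names are ours):

```python
# Formal power series over Q, truncated to len(a) coefficients.
# Multiplication is the Cauchy product; a series with a[0] != 0 is
# invertible, the inverse being computed coefficient by coefficient.

from fractions import Fraction

def series_mul(a, b):
    """Cauchy product, truncated to len(a) coefficients."""
    return [sum(a[i] * b[k - i] for i in range(k + 1))
            for k in range(len(a))]

def series_inv(a):
    """Inverse of a power series with a[0] != 0, same truncation."""
    inv = [Fraction(1) / a[0]]
    for k in range(1, len(a)):
        # Solve sum_{i=0}^{k} a[i] * inv[k-i] = 0 for inv[k].
        inv.append(-sum(a[i] * inv[k - i] for i in range(1, k + 1)) / a[0])
    return inv

# 1/(1 - z) = 1 + z + z^2 + ... : the geometric series.
one_minus_z = [Fraction(c) for c in (1, -1, 0, 0, 0)]
print([int(c) for c in series_inv(one_minus_z)])   # [1, 1, 1, 1, 1]
prod = series_mul(one_minus_z, series_inv(one_minus_z))
print([int(c) for c in prod])                      # [1, 0, 0, 0, 0]
```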


1.2.22 Definition: A vector space X over a field F is a set with a binary operation + called vector addition

    + : X × X → X
    (x, y) ↦ x + y

and an operation called scalar multiplication

    · : F × X → X
    (α, x) ↦ α · x

such that:

i. (X, +) is an abelian group,

ii. α · (x + y) = α · x + α · y,

iii. (α + β) · x = α · x + β · x,

iv. α · (β · x) = (αβ) · x,

v. 1 · x = x,

for all α, β ∈ F and x, y ∈ X.

Usually we refer to elements of X as vectors and to elements of F as scalars. We note also that there are four binary operations in the definition above: multiplication and addition in the field F, vector addition and scalar multiplication. Vector spaces will be the most frequent structure in this course. We give some representative examples.

1) Let F be a field and K a field which contains F as a subfield. We can consider K as a vector space over F, using as vector addition the addition of elements of K and defining the scalar multiplication α · x, for α ∈ F, x ∈ K, as the product of α and x under the multiplication operation of K.

2) Let F be a field and let V be the totality of all ordered n-tuples (α1, ⋯, αn) with αi ∈ F. We define vector addition by (α1, ⋯, αn) + (β1, ⋯, βn) = (α1 + β1, α2 + β2, ⋯, αn + βn), and scalar multiplication by β · (α1, ⋯, αn) = (βα1, ⋯, βαn) for β ∈ F. We will use the notation F^n for this vector space. The most common ones of this type are the real and complex Euclidean spaces R^n and C^n.

    3) Let F be a field and consider F[z] the set of polynomials in the indeterminate zwith coefficients from F. Vector addition is usual addition of polynomials and scalarmultiplication results in a polynomial with all its coefficients multiplied by the scalarfrom F. This makes F[z] a vector space over F.


1.2.23 Definition: A subset W of a vector space X over F is a subspace of X if it is a vector space over F under the operations of X.

1.2.24 Definition: Let X, Y be vector spaces over the same field F. A mapping f from X into Y is a vector space homomorphism if

i) f(x1 + x2) = f(x1) + f(x2)

ii) f(α x1) = α f(x1)

for all x1, x2 ∈ X and α ∈ F. If f is in addition one-to-one and onto, it is called a vector space isomorphism.

If we view scalar multiplication as defining an operation of F on X, we arrive at a very useful generalization of the notion of a vector space.

1.2.25 Definition: Let R be a ring. A left module over R, or a left R-module M, is an abelian group under an operation + called vector addition, together with an operation of R on M (i.e., for every r ∈ R and m in M there is an element denoted by rm in M which represents the action of r on m) such that:

i. r(m1 + m2) = rm1 + rm2

ii. r(sm) = (rs)m

iii. (r + s)m = rm + sm

for all r, s ∈ R and m, m1, m2 ∈ M. Similarly, we define a right R-module.

We note that a left (or right) module over a field is a vector space. Modules play a very important role in modern developments of system theory and appear quite naturally in many applications. We give some representative examples:

1. Every abelian group G is a module over Z (the ring of integers). Vector addition is the group multiplication in G, and we define na, for n ∈ Z and a ∈ G, to be the product of a with itself n times under the multiplication of G.

2. Let R be any subring of the ring of n × n real matrices and let M be R^n. Vector addition is addition in R^n, and the operation of R on R^n is the usual action of matrices on vectors from R^n.
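Example 1 can be made concrete with the abelian group G = Z7 under addition mod 7: the action n · a is "a combined with itself n times" under the group operation, which reduces to (n a) mod 7. The sketch below (our own encoding, not from the notes) checks the three module axioms of Definition 1.2.25 exhaustively over a range of ring elements:

```python
# Example 1 made concrete: the abelian group G = Z_7 (addition mod 7)
# as a module over the ring of integers Z.

def act(n, a, m=7):
    """Action of n in Z on a in Z_m: a combined with itself n times."""
    return (n * a) % m

G = range(7)
R = range(-10, 11)
axiom_i = all(act(r, (a + b) % 7) == (act(r, a) + act(r, b)) % 7
              for r in R for a in G for b in G)
axiom_ii = all(act(r, act(s, a)) == act(r * s, a)
               for r in R for s in R for a in G)
axiom_iii = all(act(r + s, a) == (act(r, a) + act(s, a)) % 7
                for r in R for s in R for a in G)
print(axiom_i, axiom_ii, axiom_iii)   # True True True
```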

1.2.26 Definition: An R-submodule N of M is an additive subgroup of M such that if r ∈ R and n ∈ N, then rn ∈ N.

1.2.27 Definition: Let M and N be left R-modules. An R-module homomorphism (or R-linear map) is a map f from M into N which is an additive group homomorphism and such that

    f(rm) = r f(m)

for all r ∈ R, m ∈ M. Similarly for right R-modules. If f is one-to-one and onto, it is called an R-module isomorphism.


    1.3 Realization Theory

In this section we give an introductory discussion of a subject that is of fundamental importance in modern system theory and which was initiated through the pioneering work of R.E. Kalman in the early 60s. If we have a description of a system via a state space model M, then it is very easy to describe the input-output map f of this system. Indeed, from Section 1.1 it follows that the output value at time t due to input u is given by

    y(t) = η(t, φ(t; t0, x0, u_{[t0,t)}), u(t))          (1.3.1)

and belongs to Y. The map defined thus from input segments to output segments,

    u_{[t0,t)} ↦ y_{[t0,t]},          (1.3.2)

is the input-output map f of the system. Therefore, to construct the input-output map from M is very easy.

The question of interest to us is the converse one: how do we construct M from the input-output map? This is the subject of realization theory, and it has far reaching consequences in such diverse areas as: modeling, practical design of electronic control systems, stochastic processes, detection and filtering, analysis of complexity of algorithms, mathematical physics, computer science. Most of modern algebraic or mathematical systems theory is occupied with questions related to realization theory. For further details we refer to Kalman, Falb and Arbib or to S. Eilenberg in the references at the end of this chapter.

It is obvious (to illustrate again a point made in Section 1.1) that M can produce a family of input-output maps parametrized by the initial state. We shall assume, as was mentioned before, that we have an initialized system, i.e. a fixed preparation for experimentation or the existence of a reset, and therefore a unique input-output map. Realization theory has many points in common with automata theory and in particular the algebraic theory of machines. We find it instructive to give a brief exposition of the simplest kind of realization theory in order to indicate the goals of more complex theories. The "simplest" above refers to the case of finite (state) automata or machines.

1.3.1 Definition: A machine (or automaton) is a quintuple

    M = {U, Y, X, a, c}

where

    U is a finite set, the input value set,
    Y is a finite set, the output value set,
    X is a set, the set of states,
    a : X × U → X is the next state map,
    c : X × U → Y is the next output map,


with a, c satisfying properties identical to those of the corresponding maps in Definition 1.1.6. When the set of states X is finite, we say that we have a finite state automaton or machine.

Clearly, comparing with Definition 1.1.6, we see that a machine is a special case of a state space model. In more familiar terms, we have a difference equation:

    x(k + 1) = a(x(k), u(k))          (1.3.3)
    y(k + 1) = c(x(k), u(k))

Let U* denote the set of all sequences from U. Clearly U* is the set of all admissible inputs. Concatenation turns U* into a semigroup (see Definition 1.2.6), and since the empty string, denoted by Λ, is the identity for concatenation, U* becomes a monoid (see Definition 1.2.6). In the sequel we will denote concatenation by juxtaposing consecutive inputs, for ease of notation. Notice that a and c can be immediately extended to maps

    a : X × U* → X
    c : X × U* → Y

respectively, via

    a(x, u1 u2) = a(a(x, u1), u2), etc.          (1.3.4)
    c(x, u1 u2) = c(a(x, u1), u2), etc.

c(x, Λ) is not defined and a(x, Λ) = x. We use the same letters for the new maps, since the meaning should be clear from the context.

Every finite initialized machine thus induces an input-output map:

    f : U* → Y          (1.3.5)

which is described by the following:

    f(u1 u2) = c(a(x0, u1), u2), etc.          (1.3.6)

Realization theory for this type of systems considers the problem: given an input-output map f : U* → Y, when can we find an initialized finite (state) machine to realize it?

1.3.2 Definition: (X, x0, a, c) is called a realization of f.

1.3.3 Definition: A subset R of U* is realizable if and only if the characteristic function χ_R : U* → {0, 1} is realizable.

This is related to language theory. How many of the subsets of U* are realizable?
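The extended maps (1.3.4) and the induced input-output map (1.3.6) are just folds of the one-step maps over an input string. A Python sketch (the dictionary encoding of a and c is ours):

```python
# Extension of the one-step maps a, c to input strings, as in (1.3.4):
# a(x, u1 u2 ... un) folds the next-state map over the string, and
# c(x, u1 ... un) applies c at the last step, as in (1.3.6).

def a_ext(a, x, u):
    """Extended next-state map; a(x, Lambda) = x."""
    for sym in u:
        x = a[(x, sym)]
    return x

def c_ext(a, c, x, u):
    """Extended output map; undefined (error) on the empty string."""
    if not u:
        raise ValueError("c(x, Lambda) is not defined")
    return c[(a_ext(a, x, u[:-1]), u[-1])]

# Example: a two-state machine computing the parity of 1s seen so far.
A = {(s, b): (s + int(b)) % 2 for s in (0, 1) for b in "01"}
C = {(s, b): (s + int(b)) % 2 for s in (0, 1) for b in "01"}
print(a_ext(A, 0, "101"))      # 0   (even number of 1s)
print(c_ext(A, C, 0, "101"))   # 0
```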


Notice also that f can be thought of as a map

    f : U* → Y*          (1.3.7)
    via f(u) = (f(a), f(ab), f(abc), f(abcd)), when u = abcd,

which is exactly the input-output map introduced in Section 1.1. To answer the realization question we associate with f an equivalence relation on U*, via the following.

1.3.4 Definition: u1, u2 ∈ U* are equivalent in the sense of Nerode if and only if f(u1 v) = f(u2 v) for all v ∈ U*. Then we write u1 E_f u2. The lengths of u1, u2 need not be the same. We thus can talk about the set of Nerode equivalence classes (of U* under the equivalence relation E_f; see Definitions 1.2.3, 1.2.4 and Theorem 1.2.5), which is denoted by U*/E_f. We denote as usual by [x]_{E_f} an element of U*/E_f. Notice also that if x E_f y, then xu E_f yu for all u ∈ U*.

Given now an input-output map f : U* → Y, we not only want a realization of f, but we also want a simple realization of f. If we think of f as the observed data, then we want a realization which accounts for the observed data in as simple a way as possible. Caution should be exercised, since "simple" may have different interpretations:

i. One idea is to have as simple a state set as possible. In our particular setting this means no redundant memory devices. Let us first give an intuitive interpretation of this kind of simplicity, and then we will formalize it mathematically. First of all, we do not need any states that cannot be reached from our initial state x0 (compare to reachability). We think here of our input-output map as describing outcomes of experiments, provided we each time bring our system to the same pre-experiment preparation. Thus the reset-button interpretation of initialized machines becomes clear. On the other hand, the state is used as a memory device, which allows us to predict future outputs given the past history and future inputs. Suppose now that for two states x1, x2 we have the property that c(x1, u) = c(x2, u) ∀u ∈ U*. Then certainly we do not need both x1, x2, since we cannot distinguish them from input-output observations. Compare the above with the discussion of controllability, reachability, observability, reconstructibility of Section 1.1.

ii. Another idea about simplicity is to have the state transition in a particularly simple form.

In the language of automata theory, when two states x, x' belong to the same observation class (c.f. Definition 1.1.15), they are called equivalent. For each state x we let f_x be the map U* → Y given via f_x(u) = c(x, u). A machine M is in reduced form if and only if the map x ↦ f_x is one-to-one, i.e. no two states are equivalent. All states of a reduced machine are inequivalent. The idea is very simple. If two states x, x' are equivalent, then c(x, u) = c(x', u) for all u ∈ U*. There is no way we can distinguish these two states from output observations, so we cannot identify them. In our terminology, a machine in reduced form is an observable state space model (c.f. Definition 1.1.17). Since we are interested in simple models, simplicity must be defined precisely. In the framework of automata theory, simplicity of a machine is characterized by the size of its memory, or

    20 Copyright c1980, and 1989, John S. Baras. All Rights Reserved

  • 8/3/2019 Lecture Notes for ENEE660 Fall 2008

    25/274

    CHAPTER 1. GENERAL SYSTEMS 1.3. REALIZATION THEORY

in mathematical terms, by the number of elements of the state set X (assumed finite in this exposition). We need to develop notions for comparing two machines that realize the same input-output map, to find whether or not there exists a simplest possible realization, whether or not it is unique, and ways of constructing it given the input-output map f.

1.3.5 Definition: Two machines M, M1 are strictly equivalent if and only if for each x ∈ X there is an x1 ∈ X1 such that f_x = f_{x1}, and conversely.

1.3.6 Definition: Let f : U* → Y be an input-output map. A minimal realization of f, as the input-output map of a finite-state initialized machine, is a quadruple {X, a, c, x0} such that f(u) = c(a(⋯a(a(x0, u1), u2), ⋯, u_{n−1}), u_n) for all u = u1 u2 ⋯ un ∈ U*, and which has X with minimum cardinality.

Note that a minimal realization necessarily has the property (the way to reduce a machine which does not have this property to one which does is obvious) that every state can be reached from x0. Also obvious is the fact that any two states in a minimal realization are inequivalent. Conversely, every realization in which every state is reachable from x0 and which is reduced, is minimal. So a minimal realization is an observable and reachable state space model of a given input-output map, in our terminology (c.f. Definitions 1.1.10 and 1.1.17). We now give a brief and concise summary of some basic theorems about the realization theory of this simple class of systems (that is, finite state automata). Any realization theory must contain at least similar basic theorems, as appropriate to the class of systems for which it is being developed.

1.3.7 Theorem: (Realizability criterion) There exists a finite state initialized machine which realizes f if and only if U*/E_f is a finite set.

Proof: Suppose U*/E_f is finite. Then let

    X = U*/E_f
    x0 = [Λ]_{E_f}
    a : X × U → X given via ([x]_{E_f}, u) ↦ [xu]_{E_f},
    c : X × U → Y given via ([x]_{E_f}, u) ↦ f(xu).

The maps a, c are well defined by the properties of Nerode equivalence. Then

    c(a(⋯, a(x0, u1), u2), ⋯, u_{n−1}), u_n)
    = c(a(⋯, a([u1]_{E_f}, u2), ⋯, u_{n−1}), u_n)
    = c([u1 u2 ⋯ u_{n−1}]_{E_f}, u_n) = f(u1 u2 ⋯ u_n) = f(u),    ∀u1, ⋯, u_n ∈ U.

So the above is a realization. It is called the Nerode realization.


Conversely, suppose that f is realizable by a finite state initialized machine. This realization may or may not be minimal. We assume with no loss of generality that it has the property that all states are reachable from x0. Let X be the state set of this realization. The number of states in a minimal realization is obviously ≤ #X (the cardinality of X), and also equals the number of inequivalent states in X. Now x1, x2 are equivalent if and only if c(x1, u) = c(x2, u), ∀u ∈ U*. This is so if and only if f(u1 u) = f(u2 u) for any u ∈ U* and all u1, u2 ∈ U* such that a(x0, u1) = x1, a(x0, u2) = x2. Therefore we conclude that x1, x2 are equivalent if and only if u1 E_f u2 for any u1, u2 reaching x1, x2 from x0. Since obviously all u1 reaching x1 are equivalent in the sense of Nerode, we have that the inequivalent states are in a one-to-one correspondence with the Nerode equivalence classes. Therefore

    #(U*/E_f) ≤ #X,

thus U*/E_f is a finite set.

This simple theorem (or its appropriate generalizations), which constructively gives a realization, is the starting point for several more complex realization theories. Let us not forget, however, that the finiteness restriction imposed on the state space models under consideration (i.e. U, Y, X finite) is the crucial hypothesis which makes the results so elegant and relatively straightforward. Just mimicking the construction given here in more general situations will not suffice. Additional hard work is necessary to get a respectable theory.

Let f be an input-output map which satisfies our realizability Theorem 1.3.7. The Nerode realization has state set U*/E_f and has the following two properties (by construction):

    i. every state can be reached from x0 (i.e. it is reachable).

    ii. it is in reduced form (i.e. it is observable).

Now by definition (almost), a minimal realization has these two properties. The converse derives from the following.

1.3.8 Theorem: Any observable and reachable realization of the same input-output map f has the same number of states. This number is the cardinality of U*/E_f.

Proof: It suffices to show that this is the case between the Nerode realization and any other such realization. Let {X, x0, a, c} be such a realization. If x ∈ X, then by reachability there exists u ∈ U* such that x = a(x0, u). Let T be the map

    T : X → U*/E_f
    Tx = [u]_{E_f}.

Observe that T does not depend on u: if u1, u2 have the property a(x0, u1) = a(x0, u2) = x, then f(u1 u) = c(a(x0, u1), u) = c(x, u) = c(a(x0, u2), u) = f(u2 u), ∀u ∈ U*, so u1 E_f u2. We claim T is one-one. Indeed, suppose Tx1 = Tx2 = [u]_{E_f}; then there exist u', u'' ∈ [u]_{E_f} such that x1 = a(x0, u'), x2 = a(x0, u''). Then c(x1, u) = f(u'u) and c(x2, u) = f(u''u), which are equal for all u ∈ U* since u' E_f u''. This implies x1 = x2, by observability. We claim T is onto also. Let [u]_{E_f} ∈ U*/E_f. Then there exists x ∈ X such that Tx = [u]_{E_f}: indeed, let x = a(x0, u); then Tx = [u]_{E_f}, and this proves the claim. The conclusion is that any realization which is reachable and observable has number of states equal to #(U*/E_f). This completes the proof of the theorem.

As a result of the theorem, if we let m = the cardinality of a minimal state set, then m is unique and equals #(U*/E_f). Moreover, a realization is minimal if and only if it is reachable and observable.

1.3.9 Theorem: (Uniqueness of minimal realization) The Nerode realization is minimal, and any other minimal realization differs from the Nerode realization by a relabeling of the states.

Proof: Given [x]_{E_f} ∈ U*/E_f, we have [x]_{E_f} = a([Λ]_{E_f}, x), so every state is reachable from the initial state; and if c([x]_{E_f}, u) = c([y]_{E_f}, u) ∀u ∈ U*, then f(xu) = f(yu), i.e. x E_f y, which implies [x]_{E_f} = [y]_{E_f}, so the realization is in reduced form.

That is, any two minimal realizations of the same input-output map are strictly equivalent (c.f. Definition 1.3.5).

We would like to give a natural interpretation to the states in the Nerode realization. So let u ∈ U* and consider the map f_u : U* → Y given via f_u(u') = f(uu'). Obviously f_u = f_{u'} if and only if u E_f u'. So the Nerode equivalence classes (or the states in the Nerode realization) are all the distinct maps f_u that are induced from U* into Y by elements of U*. The number of these distinct maps equals the cardinality of the minimal state set. We can actually give the same natural interpretation of the Nerode equivalence classes utilizing any realization, not necessarily a minimal one.

An example should clarify these concepts. Let U = {0, 1} and let R be the subset {0, 01, 01^2, ⋯}, where 1^k denotes a sequence of k 1s. Is R realizable? The input-output map is

    f = χ_R

where χ_R(·) is the characteristic function of R. We have three equivalence classes in U*/E_f:

    {Λ},  R,  and  U* − (R ∪ {Λ}).

To show this, note that x E_f y means that xu ∈ R if and only if yu ∈ R. Now if u ∈ U* − (R ∪ {Λ}), then it cannot happen that uu' belongs to R. Therefore f(uu') = 0 for all u' ∈ U*. So all elements of U* − (R ∪ {Λ}) are equivalent. Now if u1, u2 ∈ R, let u_i = 01^{n_i}, n1, n2 = 0, 1, 2, ⋯. Then if u = 11⋯1, u_i u ∈ R, and for any u of other form u_i u does not belong to R. So the elements of R are equivalent. Certainly 0 ∈ R and 1 ∈ U* − (R ∪ {Λ}) are inequivalent, so these two equivalence classes are distinct. At last, Λ ∉ R, Λ ∉ U* − (R ∪ {Λ}), and f(Λ) = 0; obviously Λ is inequivalent to any element of R or U* − (R ∪ {Λ}). Let us construct the Nerode realization (or state space model). We have U = {0, 1}, Y = {0, 1}. We have three states; call them x0, x1, x2. We identify

    x0 ↔ {Λ},  x1 ↔ R,  x2 ↔ U* − (R ∪ {Λ})


a is given by:

    a  | 0   1
    x0 | x1  x2
    x1 | x2  x1
    x2 | x2  x2

and c is given by:

    c  | 0   1
    x0 | 1   0
    x1 | 0   1
    x2 | 0   0

The diagram for the automaton is shown below.

[Figure 1.3.1: Diagram for automaton in example — states x0, x1, x2, with edges labeled by the inputs 0 and 1.]

    The state x2 is a trap state. A sequence is accepted if it sends the machine from x0 to x1.
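A direct transcription of the two tables above (the dictionary encoding is ours) lets one check that this machine indeed computes χ_R:

```python
# The Nerode realization of the example, transcribed from the tables:
# f(u1...un) = c(a(...a(x0, u1)..., u_{n-1}), un), the characteristic
# function of R = {0, 01, 011, 0111, ...}.

a = {("x0", "0"): "x1", ("x0", "1"): "x2",
     ("x1", "0"): "x2", ("x1", "1"): "x1",
     ("x2", "0"): "x2", ("x2", "1"): "x2"}

c = {("x0", "0"): 1, ("x0", "1"): 0,
     ("x1", "0"): 0, ("x1", "1"): 1,
     ("x2", "0"): 0, ("x2", "1"): 0}

def f(u):
    """Output of the machine started at x0 on a nonempty string u."""
    x = "x0"
    for sym in u[:-1]:
        x = a[(x, sym)]
    return c[(x, u[-1])]

for u in ["0", "01", "0111", "1", "00", "010"]:
    print(u, f(u))   # 0, 01, 0111 -> 1 (in R); 1, 00, 010 -> 0 (not in R)
```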

We come now to the discussion of our second idea about simplicity of a machine. This is related to the question: can we have a multiplication in U*/E_f? Let us try; with w, x ∈ U*, we would like

    [w]_{E_f} · [x]_{E_f} = [wx]_{E_f}          (1.3.8)

For this to make sense the right hand side has to be independent of the representatives. This is not true in general. We need a congruence to be able to do that.

1.3.10 Definition: An equivalence relation R in a semigroup S is a congruence relation if it is both right and left invariant; that is, x R y implies xz R yz, ∀z ∈ S, and zx R zy, ∀z ∈ S. Then, if R is a congruence, we can make the set of distinct equivalence classes S/R a semigroup via the multiplication defined in (1.3.8) above. This object is then called the factor semigroup.

1.3.11 Definition: The equivalence relation introduced by f via

    x ≡_f y if and only if f(wxv) = f(wyv)

for all w, v belonging to U*, is called the Myhill equivalence relation. U*/≡_f is called the Myhill semigroup of f.


Note that #(U*/≡_f) ≥ #(U*/E_f), because if x ≡_f y then x E_f y; or, if you like, to prove x ≡_f y we have to do more tests. So a Nerode equivalence class can contain Myhill-inequivalent elements.

We would like to give a natural interpretation to the Myhill semigroup of f (and consequently to the Myhill semigroup of a finite state initialized machine M). We give first an interpretation in terms of f. Consider f and U*/E_f. We think of the elements of U*/E_f as maps f(u ·) : U* → Y. A given u ∈ U* induces a map

    t_u : U*/E_f → U*/E_f

via t_u(f(u' ·)) = f(u'u ·). Now

    u ≡_f u'  ⇔  f(u''uv) = f(u''u'v), ∀u'', v ∈ U*
              ⇔  f(u''u ·) = f(u''u' ·), ∀u'' ∈ U*
              ⇔  t_u = t_{u'}

(as functions between U* and Y). t_u can be thought of as a state transition map induced by u on the minimum state set of the Nerode realization. Therefore, a very important remark is that the elements of the Myhill semigroup can be identified with all the distinct state transition functions on the Nerode state set, induced by strings in U*. (Notice that in the Nerode realization, a(f(u' ·), u) = f(u'u ·).)

Now suppose we have a state space model M which is reachable and observable. We know that U*/E_f is the set of all distinct maps f_u, or more precisely all distinct maps c(a(x0, u), ·), where a, c are the extended maps on U*; equivalently, all distinct maps c(x, ·) for x ∈ X (provided all states are reachable from x0). Consider the set of all functions X → X; it is a monoid under composition. So if ψ1, ψ2 are two such functions, we define their product by ψ1 · ψ2 = ψ1 ∘ ψ2 = their composition = ψ1(ψ2(·)). Now for each u let φ_u : X → X, where φ_u(x) = a(x, u). We claim φ_u = φ_{u'} if and only if u ≡_f u'. Indeed,

    φ_u = φ_{u'}  ⇔  a(x, u) = a(x, u'), ∀x ∈ X
                  ⇔  a(a(x0, u''), u) = a(a(x0, u''), u'), ∀u'' ∈ U*
                  ⇔  a(x0, u''u) = a(x0, u''u'), ∀u'' ∈ U*
                  ⇔  c(a(x0, u''u), v) = c(a(x0, u''u'), v), ∀u'', v ∈ U*
                  ⇔  f(u''uv) = f(u''u'v), ∀u'', v ∈ U*
                  ⇔  u ≡_f u'

and conversely, by reversing the arguments.

Therefore, we can think of the elements of U*/≡_f as functions of the form a(⋯(a(a(·, u), v), ⋯), w), where u, v, ⋯, w ∈ U, and this is the second interpretation, in terms of a state space model.
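For the three-state example above, these transition functions can be enumerated by closing the one-step maps φ_0, φ_1 under composition. A Python sketch (the tuple encoding is ours; we include the identity map, i.e. the class of the empty string Λ, since the Myhill realization takes the identity as its initial state):

```python
# Distinct state transition functions phi_u : X -> X for the example
# automaton with states (x0, x1, x2); phi_u(x) = a(x, u).  A function
# X -> X is encoded as the tuple of images of (x0, x1, x2).

ident = (0, 1, 2)   # phi_Lambda: the identity map
phi0 = (1, 2, 2)    # a(., 0): x0 -> x1, x1 -> x2, x2 -> x2
phi1 = (2, 1, 2)    # a(., 1): x0 -> x2, x1 -> x1, x2 -> x2

def compose(g, h):
    """phi_{uv} = phi_v o phi_u: first apply h (= phi_u), then g."""
    return tuple(g[h[x]] for x in range(3))

# Close {ident, phi0, phi1} under right multiplication by phi0, phi1:
semigroup = {ident, phi0, phi1}
frontier = list(semigroup)
while frontier:
    g = frontier.pop()
    for h in (phi0, phi1):
        p = compose(h, g)        # extend the string by one more symbol
        if p not in semigroup:
            semigroup.add(p)
            frontier.append(p)
print(sorted(semigroup))
print(len(semigroup))   # 4: ident, phi0, phi1 and the "trap" map (2, 2, 2)
```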

    1.3.12 Theorem: The Myhill semigroup is finite if and only if the set of Nerode equivalenceclasses is finite.


Proof: We have obviously, as mentioned before,

    #(U*/≡_f) ≥ #(U*/E_f).

Since U*/≡_f can be identified with a subset of the functions of U*/E_f into itself, we also have

    #(U*/≡_f) ≤ (#(U*/E_f))^{#(U*/E_f)},

and the proof is complete.

Some remarks are now in order. If #(U*/E_f) = n and #(U*/≡_f) = n', then we have the bounds n ≤ n' ≤ n^n. Both bounds are attainable. Indeed, suppose U = {u1, u2, ⋯, u_{n−1}} = Y and let f(Λ) = u1, f(u u_k) = u_k, ∀u ∈ U*; that is, a reset. Then u E_f u' ⇔ f(uu'') = f(u'u''), ∀u'' ∈ U*, and each side equals the last element of u'' by definition. This is not going to be altered if we put v in front of u or u'. Therefore u ≡_f u', and consequently n' = n. For n' = n^n, let Y = {y1, ⋯, yn}, U = {u1, u2, ⋯, u_{n^n}}, and let {g1, g2, ⋯, g_{n^n}} be the set of functions of Y into Y. Define now an input-output map via

    f(Λ) = y1 ∈ Y
    f(v u_k) = g_k(f(v)).

This is a recursive definition. For example:

    f(u1 u2 u3) = g3(f(u1 u2)) = g3(g2(f(u1))) = g3(g2(g1(y1))).

Let us find the Nerode equivalence classes:

    u E_f u'  ⇔  f(uv) = f(u'v), ∀v ∈ U*  ⇔  g_k(f(u)) = g_k(f(u')), ∀g_k  ⇔  f(u) = f(u').

So the elements of U*/E_f are in one-to-one correspondence with the distinct images of elements of U*, which are obviously n, since Y has n elements. The elements of U*/≡_f are therefore the maps of Y into itself induced by elements of U*, and by the definition of f these include all of the g_k's.

If we denote by S_M the Myhill semigroup of the finite state initialized machine M (or of its input-output map f), we saw that we can view the elements of S_M as all the state transition functions induced by elements of U*. Elements of S_M have a natural multiplication defined on them by composition. Let T be this correspondence between U* and the elements of S_M. Then

    T(u1 u2) = φ_{u1 u2}(·) = φ_{u2}(φ_{u1}(·))          (1.3.9)

We see that T is a monoid antihomomorphism between U* and the set of functions f : U*/E_f → U*/E_f, which we denote by F(U*/E_f). Moreover, we see that we can indeed identify S_M as a subsemigroup of F(U*/E_f).


    We can now explicitly construct a realization with a very simple state transition mech-anism. Given an input-output map f we know from Theorems 1.3.7 and 1.3.12 that theMyhill semigroup is finite if and only if f has a finite state automaton realization. Thefollowing realization is called the Myhill realization:

    X = SMx0 = identity of SM = []faM([x]f, u) = [xu]fcM([x]f, u) = f(xu) (1.3.10)

    On the other hand if we view SM as a subsemigroup of F(U/Ef) the maps aM, cM take theextremely simple form

    aM(u , u) = u u

    cM(u , u) = f(uu) = cM(uu, x

    0) (1.3.11)

    where denotes composition of functions. To see that (1.3.10) is a finite state automatonrealizing f, observe that the output due to input u = u1u2u3 un is given bycM(aM(aM (aM([]f, u1), u2)u3), , un1), un) = cM(aM([u1u2 un1]f, un)) = f(u).This simple construction has far reaching implications for finite state automata and systemsin general. It establishes the fact that finite state automata are in one-to-one corre-spondence with finite semigroups. Therefore the classification, analysis of structure anddecomposition of finite state automata can be derived by the mathematical theory of finitesemigroups. This leads to an elegant, completely algebraic structure theory of finite statemachines, the well-known Krohn and Rhodes decomposition theory of finite state automata,

details of which can be found in M.A. Arbib's part of Kalman, Falb and Arbib. The same identification holds true for very general systems, but unfortunately the mathematical theory of the corresponding semigroups is not yet available in the same detail, a fact that has, for example, prevented the rapid development of nonlinear realization theory. Recently, encouraging results have appeared, as described in the notes at the end of this chapter.

It is important to notice the simple form of the dynamics in the Myhill realization. Indeed

    a_M([x]_f, u) = [x]_f · [u]_f                                              (1.3.12)

i.e. just right multiplication of the state by the element of S_M corresponding to the input. This is very close to a difference equation. To see this we view S_M as a subsemigroup of F(U*/E_f), and we let s(k) denote s_{u1 u2 ··· u_{k−1}}, i.e. the function corresponding to the input string up to time k (not including k). Then since the Myhill realization has initial state the identity function, the next state is given by

    s(k + 1) = s_{u_k} ∘ s(k) = a(·, u_k) ∘ s(k)                               (1.3.13)

where ∘ denotes composition of functions and a is the next state map of any state space model of f. If c is the output map we have

    y(k + 1) = c_M(s_{u1 ··· u_{k−1}}, u_k)


             = f(u1u2 · · · u_{k−1}u_k)
             = c(a(a(· · · a(x_0, u1) · · · , u_{k−1}), u_k))
             = c(s_{u1u2···u_{k−1}}(x_0), u_k)
             = c(s_{u1u2···u_k}(x_0)) = c(s(k + 1)(x_0))                       (1.3.14)

Through (1.3.13) and (1.3.14) we can construct the Myhill realization directly from any realization:

    X = S_M = all functions of the form a(a(· · · (a(·, u), v), · · ·), w),  u, v, · · · , w ∈ U
    s(k + 1) = a(·, u_k) ∘ s(k)
    y(k) = c(s(k)(x_0))                                                        (1.3.15)

So instead of studying the difficult original equations we can study the above simple difference model. However the difficulty is all hidden in computing and understanding the structure of S_M this way, and in the complexity of the product in S_M. To illustrate these concepts we compute the Myhill semigroup of the same input-output map as in the previous example on the Nerode realization:

    f = f_R : U* → Y*,  U = Y = {0, 1},  R = {0, 01, 011, · · · , 01^k, · · ·}.

Then we have shown that U*/E_f = { {∅}, R, U* − (R ∪ {∅}) }. We have to subdivide U* into equivalence classes so that the transitions on U*/E_f induced by elements of each class are the same. We know that a in the Nerode realization is given by

    a    0    1
    x0   x1   x2
    x1   x2   x1
    x2   x2   x2

Using a, we can find the transitions induced by input strings; for example:

    u:    ∅    0    1    00   01   10   11
    x0    x0   x1   x2   x2   x1   x2   x2
    x1    x1   x2   x1   x2   x2   x2   x1
    x2    x2   x2   x2   x2   x2   x2   x2

(each column lists where the string u sends each state), and longer strings induce no new functions.


So the only possible transition functions are the four maps tabulated below:

          x0   x1   x2
    s0:   x0   x1   x2
    s1:   x1   x2   x2
    s2:   x2   x1   x2
    s3:   x2   x2   x2

Obviously s0 is induced by u ∈ {∅} = x0, s1 is induced by u ∈ R = x1, s2 is induced by u ∈ {1^m, m ≥ 1}, and s3 is induced by u ∈ x2 − {1^m, m ≥ 1}.

The multiplication table for the Myhill semigroup is (where we have to recall that s1 · s2 = s2 ∘ s1, the semigroup product being composition of functions):

    ·    s0   s1   s2   s3
    s0   s0   s1   s2   s3
    s1   s1   s3   s1   s3
    s2   s2   s3   s2   s3
    s3   s3   s3   s3   s3
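These enumerations can be checked mechanically. The following is a minimal sketch (ours, not from the notes; the state names and string encoding are assumptions) that recovers the four-element semigroup of this example by breadth-first enumeration of the induced transition functions:

```python
# Sketch: enumerate the Myhill semigroup S_M of the example automaton.
# A transition function s_w is represented by the tuple (s_w(x0), s_w(x1), s_w(x2)).

states = ("x0", "x1", "x2")
a = {("x0", "0"): "x1", ("x0", "1"): "x2",   # Nerode next-state map a
     ("x1", "0"): "x2", ("x1", "1"): "x1",
     ("x2", "0"): "x2", ("x2", "1"): "x2"}

def induced(w):
    """Transition function induced by the input string w."""
    out = []
    for x in states:
        for ch in w:
            x = a[(x, ch)]
        out.append(x)
    return tuple(out)

# Breadth-first search over strings until no new functions appear.
reps = {induced(""): ""}        # identity function = s0, induced by the empty string
frontier = [""]
while frontier:
    new = []
    for w in frontier:
        for ch in "01":
            s = induced(w + ch)
            if s not in reps:   # a transition function not seen before
                reps[s] = w + ch
                new.append(w + ch)
    frontier = new
# reps now holds 4 functions: s0 (identity), s1 = s_"0", s2 = s_"1", s3 = s_"00"
```

The search returns 00 as its first representative of s3, while the notes use [10]_f; both strings induce the same function, as the multiplication table confirms.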

Note also that s0 = [∅]_f, s1 = [0]_f, s2 = [1]_f, s3 = [10]_f.

Finally the Myhill realization for this input-output map is given by:

    aM   0    1
    s0   s1   s2
    s1   s3   s1
    s2   s3   s2
    s3   s3   s3

    cM   0    1
    s0   1    0
    s1   0    1
    s2   0    0
    s3   0    0
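As a sanity check, here is a short sketch (ours; the dictionaries simply transcribe the tables for aM and cM) verifying that this realization reproduces f, i.e. outputs 1 exactly on strings in R = {0, 01, 011, · · ·}:

```python
# Sketch: run the tabulated Myhill realization (aM, cM) on sample strings.
aM = {("s0", "0"): "s1", ("s0", "1"): "s2",
      ("s1", "0"): "s3", ("s1", "1"): "s1",
      ("s2", "0"): "s3", ("s2", "1"): "s2",
      ("s3", "0"): "s3", ("s3", "1"): "s3"}
cM = {("s0", "0"): 1, ("s0", "1"): 0,
      ("s1", "0"): 0, ("s1", "1"): 1,
      ("s2", "0"): 0, ("s2", "1"): 0,
      ("s3", "0"): 0, ("s3", "1"): 0}

def f_real(u):
    """Last output of the realization after reading the nonempty string u."""
    s = "s0"                     # initial state: the identity of S_M
    for ch in u[:-1]:
        s = aM[(s, ch)]
    return cM[(s, u[-1])]

def in_R(u):
    """Membership in R = {0, 01, 011, ...} = {0 followed by ones}."""
    return u == "0" + "1" * (len(u) - 1)

for u in ["0", "1", "01", "10", "011", "0110", "0111"]:
    assert f_real(u) == (1 if in_R(u) else 0)
```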

The diagram representing this automaton is shown in Figure 1.3.2, where the readout is 0 from every state except s1.


[State transition diagram: states s0, s1, s2, s3, with edges labeled by inputs 0 and 1 according to the table for aM above.]

    Figure 1.3.2: Diagram for Myhill automaton in example

This concludes this introductory discussion on realization theory. In this course we will establish a similar realization theory for linear input-output maps, with linear state space models, with X a finite dimensional vector space. Frequently we will return to this section in order to compare the two resulting theories.


1.4 Discrete Time Linear Systems

M in general is a codification of f, a way of transmitting the enormous information required to describe f in a compact way. In the discrete time case T = the integers. Define x(t) = φ(t; t0, x0, u(·)); then x(t + 1) = φ(t + 1; t, x(t), u(t)). So a Discrete Time Dynamical System has the familiar state space model:

    x(t0) = x0 = given
    x(t + 1) = φ(t + 1; t, x(t), u(t)) =: f(t, x(t), u(t))
    y(t) = η(t, x(t), u(t)) =: h(t, x(t), u(t))
    t = 0, 1, 2, · · ·                                                         (1.4.1)

where in general f, h are nonlinear.

We are going to be concerned primarily with linear, time invariant, causal systems, in the classification provided by 1.1.4. To be specific, let us first consider single input, single output such systems, with real valued inputs and outputs. Then the input and output function spaces are nothing else but sets of sequences of real numbers:

    U = Y = { {ω(i)}_{i=−∞}^{+∞} ; ω(i) ∈ IR, ∀ i }                            (1.4.2)

We will denote the integers by Z = {· · · , −1, 0, 1, · · ·} and the nonnegative integers by Z+ = {0, 1, 2, · · ·}.

Let us first consider the input-output or external model of such systems. It is clear that the input and output function spaces can be made into vector spaces by defining vector addition and scalar multiplication pointwise. Since the input-output map is linear, we can, at least formally, represent it by an infinite matrix F:

    [  ⋮    ]   [                    ⋮                      ] [  ⋮    ]
    [ y(−1) ]   [ · · · F_{−1,−1} F_{−1,0} F_{−1,1} F_{−1,2} · · · ] [ u(−1) ]
    [ y(0)  ] = [ · · · F_{0,−1}  F_{0,0}  F_{0,1}  F_{0,2}  · · · ] [ u(0)  ]   ← 0th row of F
    [ y(1)  ]   [ · · · F_{1,−1}  F_{1,0}  F_{1,1}  F_{1,2}  · · · ] [ u(1)  ]
    [  ⋮    ]   [                    ⋮                      ] [  ⋮    ]
                              ↑
                        0th column of F                                        (1.4.3)

However the system is time invariant. Recalling Definition 1.1.4.A we see that this implies, symbolically,

    F_{i,j} = F_{i−j} ,  i, j = −∞, · · · , +∞.                                (1.4.4)


    Indeed consider the input function

    u = { · · · , 0, 0, 1, 0, 0, · · · },                                      (1.4.5)
                      ↑
                 jth position

where j is an arbitrary integer. The corresponding output function is

    y = f(u) = { · · · , F_{−1,j}, F_{0,j}, F_{1,j}, · · · }

with

    y(i) = F_{i,j}                                                             (1.4.6)

Let k be an arbitrary integer and shift the input function by k units:

    u_k(i) := u(i + k).                                                        (1.4.7)

Therefore

    u_k = { · · · , 0, 0, 1, 0, 0, · · · },                                    (1.4.8)
                        ↑
                 (j−k)th position

By definition of time invariance the output corresponding to u_k should be

    y_k(i) := y(i + k) = F_{i+k,j}

or

    y_k = { · · · , F_{−1,j}, F_{0,j}, F_{1,j}, · · · }.                       (1.4.9)
                               ↑
                        (−k)th position

On the other hand the output corresponding to u_k is also f(u_k), and in view of (1.4.3)

    f(u_k)(i) = F_{i,j−k}                                                      (1.4.10)

    Time invariance thus implies


    F_{i+k,j} = F_{i,j−k}   for all i, j, k ∈ Z                                (1.4.11)

Now letting

    λ = i + k
    μ = j − (i + k)

(1.4.11) reads

    F_{λ,λ+μ} = F_{i,i+μ}   for all i, λ, μ ∈ Z                                (1.4.12)

which is the same as (1.4.4).

A matrix with the property (1.4.12), which is usually described as being constant along the diagonals, has a special name:

1.4.1 Definition: An infinite matrix F_{ij}, i, j ∈ Z (or i, j ∈ Z+), with the property (1.4.12) is called a Toeplitz matrix.

    We can now state the following:

1.4.2 Theorem: The input-output map of a linear, discrete time, time invariant system, with input and output function spaces consisting of real valued sequences, can be represented by a Toeplitz matrix.

As a consequence the elements of the matrix representing the input-output map depend only on the difference of the row index minus the column index, a fact indicated by the notation of (1.4.4). We can thus introduce a sequence T(i), i ∈ Z, via:

    F_{ij} =: T(i − j)                                                         (1.4.13)

Note that the values T(i) are the elements appearing along the diagonals of the matrix representing the input-output map F, as indicated pictorially below:

    F = [ · · ·                                ]
        [ · · ·  T(0)   T(−1)  T(−2)  · · ·    ]   ← 0th row
        [ · · ·  T(1)   T(0)   T(−1)  · · ·    ]
        [ · · ·  T(2)   T(1)   T(0)   · · ·    ]
                   ↑
               0th column                                                      (1.4.14)


1.4.3 Definition: The sequence T(i), i ∈ Z, is called the weighting pattern or impulse response of the single input, single output, discrete time, linear, time invariant system.

Note that time invariance implies a reduction in the information needed to describe the input-output map of a linear system: instead of describing all elements of the matrix F, we need only describe the diagonal elements via T.

In view of this notation the input-output relationship of a single input, single output linear system is given by the familiar convolution:

    y(t) = Σ_{j=−∞}^{+∞} T(t − j) u(j) =: [T ∗ u](t)                           (1.4.15)

Note also that no information is lost by considering the behavior of the system for nonnegative time inputs and outputs only, that is, by describing the input-output map by the truncated matrix

    [ y(0) ]   [ T(0)   T(−1)  T(−2)  · · · ] [ u(0) ]
    [ y(1) ] = [ T(1)   T(0)   T(−1)  · · · ] [ u(1) ]                         (1.4.16)
    [ y(2) ]   [ T(2)   T(1)   T(0)   · · · ] [  ⋮   ]
    [  ⋮   ]   [  ⋮                    ⋱   ]

    We shall use the representation (1.4.16) in the sequel.
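As a concrete illustration, the truncated representation (1.4.16) is easy to code; the sketch below (the geometric weighting pattern is ours, purely for illustration) builds the matrix with entries T(i − j) and checks that a unit pulse at t = 0 returns the weighting pattern itself:

```python
# Sketch of the truncated Toeplitz representation (1.4.16).

def weighting_pattern(k):
    # hypothetical pattern: T(k) = (1/2)^k for k >= 0, and T(k) = 0 for k < 0
    return 0.5 ** k if k >= 0 else 0.0

def toeplitz(T, n):
    """n x n truncated matrix of (1.4.16): entry (i, j) is T(i - j)."""
    return [[T(i - j) for j in range(n)] for i in range(n)]

def apply_F(T, u):
    """y(t) = sum_j T(t - j) u(j): the convolution (1.4.15), truncated as in (1.4.16)."""
    n = len(u)
    F = toeplitz(T, n)
    return [sum(F[i][j] * u[j] for j in range(n)) for i in range(n)]

y = apply_F(weighting_pattern, [1.0, 0.0, 0.0, 0.0])   # unit pulse at t = 0
# y reproduces T(0), T(1), T(2), T(3): [1.0, 0.5, 0.25, 0.125]
```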

We will primarily be interested in causal systems in these notes. We thus want to understand next the implications of causality on the input-output map and thus on the weighting pattern T. We will apply the classification 1.1.4 (iv) to (1.4.16). So let u1, u2 be two input sequences which have the same values for i = 0, 1, · · · , τ, where τ is an arbitrary integer from Z+:

    u1(i) = u2(i) ,  i = 0, 1, · · · , τ                                       (1.4.17)

The two inputs u1, u2 may differ for i > τ. Causality implies that the corresponding outputs y1 = f(u1) and y2 = f(u2) must also have the same values for i = 0, 1, · · · , τ. In view of (1.4.15), (1.4.16),

    y1(i) = Σ_{j=0}^{∞} T(i − j) u1(j)
    y2(i) = Σ_{j=0}^{∞} T(i − j) u2(j)   ;  i = 0, 1, · · ·                    (1.4.18)

and therefore

    [y1 − y2](i) = Σ_{j=0}^{∞} T(i − j)[u1 − u2](j) ;  i = 0, 1, 2, · · · .    (1.4.19)


Since

    [u1 − u2](j) = 0 ;  j = 0, 1, · · · , τ

because of (1.4.17), (1.4.19) results in

    [y1 − y2](i) = Σ_{j=τ+1}^{∞} T(i − j)[u1 − u2](j) ;  i = 0, 1, 2, · · ·    (1.4.20)

Again causality implies that for an arbitrary integer τ, and two arbitrary inputs u1, u2 satisfying (1.4.17),

    [y1 − y2](i) = 0 ;  i = 0, 1, · · · , τ.                                   (1.4.21)

It is immediately seen from (1.4.20) that this implies

    T(k) = 0 ,  k < 0                                                          (1.4.22)

1.4.4 Theorem: The input-output map of a single input, single output, linear, discrete time, time-invariant, causal system can be represented by a lower triangular Toeplitz matrix. Equivalently, the weighting pattern is identically zero for negative arguments.

We turn now to the state space model for a linear, time invariant, discrete time system. The input and output function spaces will again be described by (1.4.2). As state space X we are interested primarily in the case where

    X = IR^n ,  for some n.                                                    (1.4.23)

Now (1.4.23) represents a restriction on the system, the implications of which we will study in the sequel. First, according to the classification 1.1.8 (i), a linear, discrete time, state space model will be described, in view of (1.4.1), by

    x(t + 1) = A(t)x(t) + b(t)u(t)                                             (1.4.24)
    y(t) = c(t)^T x(t) + d(t)u(t)
    x(t0) = x0

where A is n × n, b and c are n × 1, and d is a scalar. For time invariant systems, according to 1.1.8 (iii), A, b, c, d must be constant (not depending on time).

    A digital simulation diagram for a discrete time system is:


[Block diagram: u(k) enters gain b and feedforward gain d; b's output feeds an adder whose output x(k+1) passes through a bank of unit delay lines to give x(k); x(k) feeds back through A to the adder and forward through c^T to a second adder, which also receives d·u(k) and produces y(k).]

    Figure 1.4.1: Digital simulation of linear discrete time system

Notice that the number of memory elements equals the dimension of X (the state space). The upper path (that involving d) is often called the direct feedforward path.

In section 1.3 we discussed the so called realization problem for finite automata. We discuss now the realization problem for linear time invariant systems. First observe that (1.4.24) for A, b, c, d constant can be readily solved to give

    x(t) = A^t x0 + Σ_{k=0}^{t−1} A^{t−1−k} b u(k) ,  t ∈ Z+                   (1.4.25)

    y(t) = c^T A^t x0 + Σ_{k=0}^{t−1} c^T A^{t−1−k} b u(k) + d u(t)

We thus see that a linear, discrete time, time invariant state space model induces an input-output map, via the second equation of (1.4.25). This input-output map will be linear only if x0 = 0. As a result, when we refer from now on to the input-output map of a linear system, we shall always understand the system initialized to zero. Thus the state space model

    x(t + 1) = Ax(t) + bu(t)
    y(t) = c^T x(t) + du(t)                                                    (1.4.26)

induces the input-output map

    y(t) = Σ_{k=0}^{t−1} c^T A^{t−1−k} b u(k) + d u(t) ,  t ∈ Z+               (1.4.27)

Clearly the input-output map (1.4.27) is linear, causal and time invariant according to the classification 1.1.4. Thus we have concluded that any linear, time invariant, discrete time state space model (like (1.4.26)) induces a linear, causal, time invariant input-output map (like (1.4.27)). We can thus associate to the input-output map (1.4.27), according to Theorem 1.4.4, a weighting pattern T. Clearly by comparison between

    y(t) = Σ_{j=0}^{t} T(t − j) u(j)

and (1.4.27) we deduce

    T(0) = d                                                                   (1.4.28)
    T(ℓ) = c^T A^{ℓ−1} b ,  ℓ = 1, 2, · · ·
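Formula (1.4.28) is easy to verify numerically. The sketch below uses a made-up realization {A, b, c, d} (the numbers are ours, purely for illustration) and checks that the impulse response of the simulated model (1.4.26), started at x0 = 0, matches d, c^T b, c^T Ab, c^T A²b, · · ·:

```python
# Sketch: impulse response of (1.4.26) versus the formula (1.4.28).
A = [[0.0, 1.0], [-0.2, 0.9]]    # hypothetical 2 x 2 example
b = [0.0, 1.0]
c = [1.0, 0.0]
d = 2.0

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))

def simulate(u):
    """Run x(t+1) = A x(t) + b u(t), y(t) = c^T x(t) + d u(t) from x0 = 0."""
    x = [0.0, 0.0]
    y = []
    for ut in u:
        y.append(dot(c, x) + d * ut)
        x = [axi + bi * ut for axi, bi in zip(mat_vec(A, x), b)]
    return y

n = 6
T_sim = simulate([1.0] + [0.0] * (n - 1))    # response to a unit pulse at t = 0

T_formula = [d]                  # T(0) = d
Akb = b[:]                       # A^0 b
for _ in range(1, n):
    T_formula.append(dot(c, Akb))    # T(l) = c^T A^(l-1) b
    Akb = mat_vec(A, Akb)
```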


We see now that (1.4.28) prescribes in a precise way the weighting pattern associated with (1.4.27) directly from the parameters of the state space model (1.4.26).

1.4.5 Definition: Given a weighting pattern T(i), i ∈ Z+, a quadruple {A, b, c, d}, where A is n × n, b, c are n × 1, and d is a scalar, will be called a realization of T if (1.4.28) holds.

Note that since (1.4.26) and (1.4.27) are linked together, according to the above discussion we can call either model a linear, discrete time, time invariant, causal system. As we already indicated it is rather trivial to pass from a state space model to the input-output map for systems in this class. The harder question is how to pass from the input-output map, namely the weighting pattern T, to the state space model. This is the realization problem for linear, discrete time, causal, time invariant input-output maps. Two important questions should be addressed in reference to this problem: (i) characterize all weighting patterns that admit realizations like (1.4.26); (ii) examine the question of uniqueness of realizations leading to the same input-output map.

The second question has an intricate theory, because, as is easily seen, there are infinitely many state space models like (1.4.26) that lead to the same weighting pattern. We shall address this question, together with algorithmic solutions to (1.4.28), in a later section.


1.5 Differential Dynamical Systems

In continuous time t, we are more used to systems that are described by differential equations. How can we get from M a differential equation description of the system? To do this we have to make some assumptions about the dependence of φ on t. The result we are going to describe can be generalized easily to more general input and output spaces.

1.5.1 Theorem: Assume that

a) U ⊆ IR^m, U contains the space of continuous functions from T to U, and X ⊆ IR^n.

b) for each t0, x0 and u(·) in C_m(I) (where I is a closed and bounded subinterval of T and C_m(I) is the space of continuous IR^m-valued functions on I), φ(·; t0, x0, u(·)) is a continuously differentiable function of time from I to IR^n such that the derivative (d/dt)φ(t; t0, x0, u(·)) is continuous for each t, in t0, x0, u(·) simultaneously.

Then φ(·; t0, x0, u(·)), for u ∈ C_m(I), is the solution of

    dx(t)/dt = f(t, x(t), u(t))
    x(t0) = x0

for t0, t ∈ I, in C¹_n(I), and f is continuous in t, x, and u.

    Proof: Left as an exercise.

This gives us the opportunity to discuss some interesting spaces of functions. Denote by C_m[t0, t1] the set of m-tuples whose elements are continuous functions of time defined on the interval t0 ≤ t ≤ t1:

    u(t) = [ u1(t)  u2(t)  · · ·  um(t) ]^T

C_m[t0, t1] is a vector space over IR:

    (u + v)(t) = [ ui(t) + vi(t) ]                                             (1.5.1)

How do we measure the distance of two functions in C_m[t0, t1]?


    d(u(·), v(·)) = sup_{t∈[t0,t1]} ( (u(t) − v(t))^T (u(t) − v(t)) )^{1/2}    (1.5.2)
                  = sup_{t∈[t0,t1]} { Σ_{i=1}^{m} (ui(t) − vi(t))² }^{1/2}

For notation we shall write ‖u(·) − v(·)‖_C for d(u(·), v(·)) and ‖x‖ for (Σ_{i=1}^{n} x_i²)^{1/2}. Having a distance we can define convergence via:

    un(·) → u(·)  if  d(un(·), u(·)) → 0 as n → ∞                              (1.5.3)

We typically work in function spaces with distance d such that if d(un(·), um(·)) → 0 as n, m → ∞, then there exists u(·) such that un(·) → u(·).
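A standard cautionary example (ours, not from the notes): xn(t) = t^n on [0, 1] converges pointwise, but its sup-distance (1.5.2) to the pointwise limit does not go to 0, so pointwise convergence and convergence in the metric d are genuinely different:

```python
# Sketch: pointwise versus sup-distance convergence of x_n(t) = t**n on [0, 1].

def sup_dist(f, g, num=1001):
    """Discretized version of the sup metric (1.5.2) on C[0, 1]."""
    ts = [k / (num - 1) for k in range(num)]
    return max(abs(f(t) - g(t)) for t in ts)

def xn(n):
    return lambda t: t ** n

def limit(t):
    # pointwise limit of t**n: 0 for t < 1, 1 at t = 1 (not continuous!)
    return 1.0 if t == 1.0 else 0.0

# pointwise: x_100(1/2) is already tiny ...
small = xn(100)(0.5)
# ... but the sup-distance to the limit stays close to 1 for every n
gap = [sup_dist(xn(n), limit) for n in (1, 10, 100)]
```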

In IR^n everybody knows what xn → x means. In C[t0, t1] we have to be careful. So consider a sequence of functions un(·) ∈ C[t0, t1]. Then un(·) → u(·) pointwise if lim_{n→∞} un(t) = u(t) for each t ∈ [t0, t1]. On a set of functions it is important to realize that we can have various modes of convergence (or various topologies). For simplicity consider C[0, 1]. Suppose ‖xn − xm‖_C → 0 as n, m → ∞ (i.e. {xn} is a Cauchy sequence of functions in C[0, 1]). Fix t ∈ [0, 1]; then |xn(t) − xm(t)| ≤ ‖xn − xm‖_C, so xn(t) converges as a sequence of real numbers, say to x(t). That is, the xn(·) converge pointwise to x(·). Now given ε > 0 choose N(ε) such that ‖xn − xm‖_C < ε/2 for n, m > N(ε). Then

    |xn(t) − x(t)| ≤ |xn(t) − xm(t)| + |xm(t) − x(t)|
                  ≤ ‖xn − xm‖_C + |xm(t) − x(t)|                               (1.5.4)

For n > N(ε) we can choose m large enough to make the r.h.s. < ε. So xn → x uniformly. This is a well known fact about continuous functions, as is the fact that the limiting function x is also continuous. To see this fix ε > 0. Then for every δ, t and n

    |x(t + δ) − x(t)| ≤ |x(t + δ) − xn(t + δ)| + |xn(t + δ) − xn(t)| + |xn(t) − x(t)|

The first and third terms on the r.h.s. are made small by choosing n large (uniform convergence), and the second by choosing δ appropriately. So x is continuous. So now we have the following

description of a Differential Dynamical System:

    x(t0) = x0 = given
    dx(t)/dt = f(t, x(t), u(t))
    y(t) = η(t, x(t), u(t)) = h(t, x(t), u(t))                                 (1.5.5)

As a result we are interested in looking at differential equations of this type (this is in fact a family of differential equations parametrized by the input). We give now some basic


results about the existence, uniqueness, and meaning of solutions of ordinary differential equations of this type. If we substitute u(t) as a function of time then the above becomes a differential equation of the form

    x(t0) = x0
    dx(t)/dt = F(t, x(t))                                                      (1.5.6)

For each t, F(t, ·) : IR^n → IR^n assigns a vector to every x ∈ IR^n which gives the tangential direction at the corresponding point of the solution of (1.5.6). This is a simple and well known geometric interpretation of (1.5.6): we want to find a curve so that the tangent of the curve at x(t) is F(t, x(t)). F(t, ·) is called a vector field in the language of differential geometry. Here is where control theory and the theory of ordinary differential equations differ drastically. For the qualitative properties of the dynamical system we need to study the family of vector fields f(t, x(t), u(t)) parametrized by the controls u(·). When we apply control u(·) the trajectory follows the tangential direction indicated by f(t, x(t), u(t)). Differential geometric ideas have been applied very successfully to the study of Nonlinear Control Systems in the work of R. Brockett, H. Sussmann, A. Krener, R. Hermann and many others. However, as far as existence and uniqueness are concerned, what needs to be done is to restrict the controls u(·), so that standard results on existence and uniqueness of differential equations can be applied. One of the best references is J. Hale's Ordinary Differential Equations, in particular Chapter 1. By a local solution of (1.5.6) through x0, t0 we mean that we can find an open interval I containing t0 and a continuously differentiable function on I to IR^n such that (1.5.6) is satisfied.
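A minimal numerical sketch (ours) of the tangential-direction picture: the forward-Euler polygonal approximation to (1.5.6) follows F(t, x(t)) over small steps, and such polygonal approximations are also the construction behind existence proofs of the Peano type:

```python
# Sketch: forward-Euler polygonal approximation to dx/dt = F(t, x), x(t0) = x0.

def euler(F, x0, t0, t1, n):
    h = (t1 - t0) / n                # step size
    t, x = t0, x0
    for _ in range(n):
        x = x + h * F(t, x)          # step along the tangential direction F(t, x)
        t = t + h
    return x

# Example vector field: dx/dt = x, whose exact solution through (0, 1) is e**t.
approx = euler(lambda t, x: x, 1.0, 0.0, 1.0, 100000)   # approximates e at t = 1
```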

1.5.2 Theorem (Peano): If F is continuous on I × D, I an open interval containing t0, D an open set containing x0, then there exists at least one solution of (1.5.6) passing through (x0, t0).

Proof: We only give a sketch; the details can be found in Hale's book. Approx