Bidirectional Associative Memory




Bidirectional associative memory (BAM) is a type of recurrent neural network. BAM was introduced by Bart Kosko in 1988.[1] There are two types of associative memory, auto-associative and hetero-associative. BAM is hetero-associative, meaning given a pattern it can return another pattern which is potentially of a different size. It is similar to the Hopfield network in that they are both forms of associative memory. However, Hopfield nets return patterns of the same size.

Topology

A BAM contains two layers of neurons, which we shall denote X and Y. Layers X and Y are fully connected to each other. Once the weights have been established, input into layer X presents the pattern in layer Y, and vice versa.

Procedure

Learning

Imagine we wish to store two associations, A1:B1 and A2:B2.

A1 = (1, 0, 1, 0, 1, 0), B1 = (1, 1, 0, 0)
A2 = (1, 1, 1, 0, 0, 0), B2 = (1, 0, 1, 0)

These are then transformed into the bipolar forms:

X1 = (1, -1, 1, -1, 1, -1), Y1 = (1, 1, -1, -1)
X2 = (1, 1, 1, -1, -1, -1), Y2 = (1, -1, 1, -1)

From there, we calculate the weight matrix M = X1^T Y1 + X2^T Y2, where ^T denotes the transpose. So,

M =
 ( 2  0  0 -2 )
 ( 0 -2  2  0 )
 ( 2  0  0 -2 )
 (-2  0  0  2 )
 ( 0  2 -2  0 )
 (-2  0  0  2 )

Recall

To retrieve the association A1, we multiply it by M to get (4, 2, -2, -4), which, when run through a threshold, yields (1, 1, 0, 0), which is B1. To find the reverse association, multiply this by the transpose of M.
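A minimal numpy sketch of this learning and recall procedure (the variable names and the threshold convention are illustrative choices, not taken from the original article):

import numpy as np

# Stored associations in binary form
A1, B1 = np.array([1, 0, 1, 0, 1, 0]), np.array([1, 1, 0, 0])
A2, B2 = np.array([1, 1, 1, 0, 0, 0]), np.array([1, 0, 1, 0])

def bipolar(v):
    """Map a binary vector {0,1} to bipolar form {-1,+1}."""
    return 2 * v - 1

# Learning: M is the sum of outer products of the bipolar pairs
X1, Y1, X2, Y2 = bipolar(A1), bipolar(B1), bipolar(A2), bipolar(B2)
M = np.outer(X1, Y1) + np.outer(X2, Y2)

def threshold(v):
    """Hard threshold back to binary form (values above 0 become 1)."""
    return (v > 0).astype(int)

# Recall: A1 @ M gives (4, 2, -2, -4), which thresholds to B1 = (1, 1, 0, 0)
print(A1 @ M)                 # [ 4  2 -2 -4]
print(threshold(A1 @ M))      # [1 1 0 0]
# Reverse direction: multiply by the transpose of M
print(threshold(B1 @ M.T))    # [1 0 1 0 1 0], which is A1

Running the recall in both directions like this illustrates the bidirectional behaviour described above: either layer can be driven as the input.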


Recurrent neural network

From Wikipedia, the free encyclopedia

A recurrent neural network (RNN) is a class of neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. This makes them applicable to tasks such as unsegmented connected handwriting recognition, where they have achieved the best known results.[1]

Contents

1 Architectures
  1.1 Fully recurrent network
  1.2 Hopfield network
  1.3 Elman networks and Jordan networks
  1.4 Echo state network
  1.5 Long short term memory network
  1.6 Bi-directional RNN
  1.7 Continuous-time RNN
  1.8 Hierarchical RNN
  1.9 Recurrent multilayer perceptron
  1.10 Second Order Recurrent Neural Network
  1.11 Pollack's sequential cascaded networks
2 Training
  2.1 Gradient descent
  2.2 Global optimization methods
3 Related fields
4 Issues with recurrent neural networks
5 References
6 External links

Architectures

Fully recurrent network

This is the basic architecture developed in the 1980s: a network of neuron-like units, each with a directed connection to every other unit. Each unit has a time-varying real-valued activation. Each connection has a modifiable real-valued weight. Some of the nodes are called input nodes, some output nodes, the rest hidden nodes. Most architectures below are special cases.

For supervised learning in discrete time settings, training sequences of real-valued input vectors become sequences of activations of the input nodes, one input vector at a time. At any given time step, each non-input unit computes its current activation as a nonlinear function of the weighted sum of the activations of all units from which it receives connections. There may be teacher-given target activations for some of the output units at certain time steps. For example, if the input sequence is a speech signal corresponding to a spoken digit, the final target output at the end of the sequence may be a label classifying the digit. For each sequence, its error is the sum of the deviations of all target signals from the corresponding activations computed by the network. For a training set of numerous sequences, the total error is the sum of the errors of all individual sequences. Algorithms for minimizing this error are mentioned in the section on training algorithms below.

In reinforcement learning settings, there is no teacher providing target signals for the RNN; instead, a fitness function or reward function is occasionally used to evaluate the RNN's performance, which influences its input stream through output units connected to actuators affecting the environment. Again, compare the section on training algorithms below.

Hopfield network

The Hopfield network is of historic interest although it is not a general RNN, as it is not designed to process sequences of patterns. Instead it requires stationary inputs. It is an RNN in which all connections are symmetric. Invented by John Hopfield in 1982, it guarantees that its dynamics will converge. If the connections are trained using Hebbian learning then the Hopfield network can perform as robust content-addressable memory, resistant to connection alteration.

A variation on the Hopfield network is the bidirectional associative memory (BAM). The BAM has two layers, either of which can be driven as an input, to recall an association and produce an output on the other layer.[2]

Elman networks and Jordan networks

Figure: The Elman SRN


The following special case of the basic architecture above was employed by Jeff Elman. A three-layer network is used (arranged vertically as x, y, and z in the illustration), with the addition of a set of "context units" (u in the illustration). There are connections from the middle (hidden) layer to these context units fixed with a weight of one.[3] At each time step, the input is propagated in a standard feed-forward fashion, and then a learning rule is applied. The fixed back connections result in the context units always maintaining a copy of the previous values of the hidden units (since they propagate over the connections before the learning rule is applied). Thus the network can maintain a sort of state, allowing it to perform such tasks as sequence-prediction that are beyond the power of a standard multilayer perceptron.

Jordan networks, due to Michael I. Jordan, are similar to Elman networks. The context units are however fed from the output layer instead of the hidden layer. The context units in a Jordan network are also referred to as the state layer, and have a recurrent connection to themselves with no other nodes on this connection.[3] Elman and Jordan networks are also known as "simple recurrent networks" (SRN).
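A minimal numpy sketch of an Elman-style forward pass (the layer sizes, weight names, and tanh nonlinearity are illustrative assumptions, not taken from the article):

import numpy as np

# Elman-style simple recurrent network: the context units hold a copy of the
# previous hidden state and feed back into the hidden layer at the next step.
n_in, n_hid, n_out = 4, 8, 3
rng = np.random.default_rng(1)
W_in = rng.normal(scale=0.1, size=(n_hid, n_in))     # input -> hidden
W_ctx = rng.normal(scale=0.1, size=(n_hid, n_hid))   # context -> hidden (recurrent)
W_out = rng.normal(scale=0.1, size=(n_out, n_hid))   # hidden -> output

def step(x, context):
    """One time step: combine the input with the saved context, then update the context."""
    hidden = np.tanh(W_in @ x + W_ctx @ context)
    output = W_out @ hidden
    return output, hidden          # the new hidden state becomes the next context

context = np.zeros(n_hid)          # context units start empty
for x in rng.normal(size=(5, n_in)):   # a short input sequence
    y, context = step(x, context)

In a Jordan network, the context would instead be updated with a copy of the previous output rather than the previous hidden state.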

Echo state network

The echo state network (ESN) is a recurrent neural network with a sparsely connected random hidden layer. The weights of the output neurons are the only part of the network that can change and be trained. ESNs are good at reproducing certain time series.[4] A variant for spiking neurons is known as the liquid state machine.[5]

Long short term memory network

The Long short term memory (LSTM) network, developed by Hochreiter & Schmidhuber in 1997,[6] is an artificial neural net structure that, unlike traditional RNNs, doesn't have the problem of vanishing gradients (compare the section on training algorithms below). It works even when there are long delays, and it can handle signals that have a mix of low and high frequency components. LSTM RNNs outperformed other methods in numerous applications such as language learning[7] and connected handwriting recognition.[8]

Bi-directional RNN

Invented by Schuster & Paliwal in 1997,[9] bi-directional RNNs or BRNNs use a finite sequence to predict or label each element of the sequence based on both the past and the future context of the element. This is done by adding the outputs of two RNNs, one processing the sequence from left to right, the other from right to left. The combined outputs are the predictions of the teacher-given target signals. This technique proved to be especially useful when combined with LSTM RNNs.[10]

Continuous-time RNN

A continuous time recurrent neural network (CTRNN) is a dynamical systems model of biological neural networks. A CTRNN uses a system of ordinary differential equations to model the effects on a neuron of the incoming spike train. CTRNNs are more computationally efficient than directly simulating every spike in a network, as they do not model neural activations at this level of detail.[citation needed]

For a neuron i in the network with action potential y_i, the rate of change of activation is given by:

\tau_i \dot{y}_i = -y_i + \sum_{j=1}^{n} w_{ji}\,\sigma(y_j - \Theta_j) + I_i(t)

Where:

\tau_i : time constant of postsynaptic node
y_i : activation of postsynaptic node
\dot{y}_i : rate of change of activation of postsynaptic node
w_{ji} : weight of connection from presynaptic to postsynaptic node
\sigma(x) : sigmoid of x, e.g. \sigma(x) = 1/(1 + e^{-x})
y_j : activation of presynaptic node
\Theta_j : bias of presynaptic node
I_i(t) : input (if any) to node

CTRNNs have frequently been applied in the field of evolutionary robotics, where they have been used to address, for example, vision,[11] co-operation[12] and minimally cognitive behaviour.[13]
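A minimal sketch of simulating the equation above with forward Euler integration (the network size, step size, and parameter values are illustrative assumptions):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative CTRNN parameters for a small fully connected network
n = 3
rng = np.random.default_rng(2)
tau = np.ones(n)                          # time constants tau_i
w = rng.normal(size=(n, n))               # w[i, j]: weight from presynaptic node j to postsynaptic node i
theta = np.zeros(n)                       # biases Theta_j
y = np.zeros(n)                           # activations y_i

def external_input(t):
    """Input I_i(t) to each node; here a constant drive to the first node only."""
    return np.array([0.5, 0.0, 0.0])

# Euler steps for tau_i * dy_i/dt = -y_i + sum_j w_ji * sigma(y_j - theta_j) + I_i(t)
dt = 0.01
for step in range(1000):
    t = step * dt
    dydt = (-y + w @ sigmoid(y - theta) + external_input(t)) / tau
    y = y + dt * dydt

Smaller step sizes give a closer approximation to the underlying differential equation.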

Hierarchical RNN

There are many instances of hierarchical RNN whose elements are connected in various ways to decompose hierarchical behavior into useful subprograms.[14][15]

Recurrent multilayer perceptron

Generally, a Recurrent Multi-Layer Perceptron (RMLP) consists of a series of cascaded subnetworks, each of which consists of multiple layers of nodes. Each of these subnetworks is entirely feed-forward except for the last layer, which can have feedback connections among itself. Each of these subnets is connected only by feed forward connections.[16]

Second Order Recurrent Neural Network

Second order RNNs use higher order weights instead of the standard weights, and inputs and states can be a product. This allows a direct mapping to a finite state machine both in training and in representation.[17][18] Long short term memory is an example of this.

Pollack’s sequential cascaded networks


Training

Gradient descent

To minimize total error, gradient descent can be used to change each weight in proportion to its derivative with respect to the error, provided the non-linear activation functions are differentiable. Various methods for doing so were developed in the 1980s and early 1990s by Paul Werbos, Ronald J. Williams, Tony Robinson, Jürgen Schmidhuber, Sepp Hochreiter, Barak Pearlmutter, and others.

The standard method is called "backpropagation through time" or BPTT, and is a generalization of back-propagation for feed-forward networks,[19][20] and like that method, is an instance of Automatic differentiation in the reverse accumulation mode or Pontryagin's minimum principle. A more computationally expensive online variant is called "Real-Time Recurrent Learning" or RTRL,[21][22] which is an instance of Automatic differentiation in the forward accumulation mode with stacked tangent vectors. Unlike BPTT this algorithm is local in time but not local in space.[23][24]
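A minimal numpy sketch of the idea behind BPTT for a vanilla tanh RNN, with a squared-error target at the final step only (the sizes, names, and single-target setup are illustrative assumptions, not the article's formulation):

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 3, 5, 2, 4

W_xh = rng.normal(scale=0.1, size=(n_hid, n_in))
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))
W_hy = rng.normal(scale=0.1, size=(n_out, n_hid))

xs = [rng.normal(size=n_in) for _ in range(T)]   # input sequence
target = rng.normal(size=n_out)                  # teacher-given target at the final step

# Forward pass: store activations for reuse in the backward pass
hs = [np.zeros(n_hid)]
for x in xs:
    hs.append(np.tanh(W_xh @ x + W_hh @ hs[-1]))
y = W_hy @ hs[-1]
err = y - target                                 # dE/dy for E = 0.5 * ||y - target||^2

# Backward pass: propagate the error back through the unrolled time steps
dW_xh = np.zeros_like(W_xh)
dW_hh = np.zeros_like(W_hh)
dW_hy = np.outer(err, hs[-1])
dh = W_hy.T @ err                                # gradient w.r.t. the final hidden state
for t in reversed(range(T)):
    dpre = (1.0 - hs[t + 1] ** 2) * dh           # back through the tanh nonlinearity
    dW_xh += np.outer(dpre, xs[t])
    dW_hh += np.outer(dpre, hs[t])
    dh = W_hh.T @ dpre                           # pass the gradient to the previous step

# One gradient-descent update
lr = 0.1
W_xh -= lr * dW_xh; W_hh -= lr * dW_hh; W_hy -= lr * dW_hy

Repeatedly multiplying by W_hh in the backward loop is also where the vanishing-gradient problem discussed below comes from.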

There also is an online hybrid between BPTT and RTRL with intermediate complexity,[25][26] and there are variants for continuous time.[27] A major problem with gradient descent for standard RNN architectures is that error gradients vanish exponentially quickly with the size of the time lag between important events.[28] [29] The Long short term memory architecture together with a BPTT/RTRL hybrid learning method was introduced in an attempt to overcome these problems.[6]

Global optimization methods

Training the weights in a neural network can be modeled as a non-linear global optimization problem. A target function can be formed to evaluate the fitness or error of a particular weight vector as follows: First, the weights in the network are set according to the weight vector. Next, the network is evaluated against the training sequence. Typically, the sum-squared-difference between the predictions and the target values specified in the training sequence is used to represent the error of the current weight vector. Arbitrary global optimization techniques may then be used to minimize this target function.

The most common global optimization method for training RNNs is genetic algorithms, especially in unstructured networks.[30][31][32]

Initially, the genetic algorithm is encoded with the neural network weights in a predefined manner, where one gene in the chromosome represents one weight link; hence, the whole network is represented as a single chromosome. The fitness function is evaluated as follows: 1) each weight encoded in the chromosome is assigned to the respective weight link of the network; 2) the training set of examples is then presented to the network, which propagates the input signals forward; 3) the mean-squared-error is returned to the fitness function; 4) this function then drives the genetic selection process.
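A minimal sketch of steps 1-4 for a single chromosome, assuming a one-hidden-layer feedforward network (the network shape, decoding order, and fitness definition are illustrative assumptions):

import numpy as np

n_in, n_hid, n_out = 2, 4, 1
n_weights = n_in * n_hid + n_hid * n_out       # one gene per weight link

def decode(chromosome):
    """Step 1: assign each gene to its weight link in a fixed, predefined order."""
    w1 = chromosome[: n_in * n_hid].reshape(n_in, n_hid)
    w2 = chromosome[n_in * n_hid:].reshape(n_hid, n_out)
    return w1, w2

def forward(w1, w2, x):
    """Step 2: propagate the input signals forward through the network."""
    return np.tanh(x @ w1) @ w2

def fitness(chromosome, inputs, targets):
    """Steps 3-4: return the reciprocal of the mean-squared-error as the fitness."""
    w1, w2 = decode(chromosome)
    mse = np.mean((forward(w1, w2, inputs) - targets) ** 2)
    return 1.0 / (mse + 1e-12)                 # higher fitness means lower error

# Example: score a random population on a toy training set
rng = np.random.default_rng(3)
inputs = rng.normal(size=(16, n_in))
targets = rng.normal(size=(16, n_out))
population = rng.normal(size=(20, n_weights))  # 20 chromosomes
scores = [fitness(c, inputs, targets) for c in population]

A genetic algorithm would then select, cross over, and mutate chromosomes in proportion to these scores.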


There are many chromosomes that make up the population; therefore, many different neural networks are evolved until a stopping criterion is satisfied. A common stopping scheme is: 1) when the neural network has learnt a certain percentage of the training data, or 2) when the minimum value of the mean-squared-error is satisfied, or 3) when the maximum number of training generations has been reached. The stopping criterion is evaluated by the fitness function as it gets the reciprocal of the mean-squared-error from each neural network during training. Therefore, the goal of the genetic algorithm is to maximize the fitness function and hence reduce the mean-squared-error.

Other global (and/or evolutionary) optimization techniques may be used to seek a good set of weights such as Simulated annealing or Particle swarm optimization.


Figure: Hopfield network

Stages of Memory: Encoding, Storage and Retrieval

by Saul McLeod, published 2007

“Memory is the process of maintaining information over time.” (Matlin, 2005)

“Memory is the means by which we draw on our past experiences in order to use this information in the present.” (Sternberg, 1999)


Memory is the term given to the structures and processes involved in the storage and subsequent retrieval of information.

Memory is essential to all our lives. Without a memory of the past, we cannot operate in the present or think about the future. We would not be able to remember what we did yesterday, what we have done today or what we plan to do tomorrow. Without memory we could not learn anything.

Memory is involved in processing vast amounts of information. This information takes many different forms, e.g. images, sounds or meaning.

For psychologists the term memory covers three important aspects of information processing:

1. Memory Encoding

When information comes into our memory system (from sensory input), it needs to be changed into a form that the system can cope with, so that it can be stored. (Think of this as similar to changing your money into a different currency when you travel from one country to another.)


For example, a word which is seen (on the whiteboard) may be stored if it is changed (encoded) into a sound or a meaning (i.e. semantic processing).

There are three main ways in which information can be encoded (changed):

1. Visual (picture)

2. Acoustic (sound)

3. Semantic (meaning)

For example, how do you remember a telephone number you have looked up in the phone book? If you can see it then you are using visual coding, but if you are repeating it to yourself you are using acoustic coding (by sound).

Evidence suggests that the principal coding system in short-term memory (STM) is acoustic coding. When a person is presented with a list of numbers and letters, they will try to hold them in STM by rehearsing them (verbally). Rehearsal is a verbal process regardless of whether the list of items is presented acoustically (someone reads them out) or visually (on a sheet of paper).

The principal encoding system in long-term memory (LTM) appears to be semantic coding (by meaning). However, information in LTM can also be coded both visually and acoustically.

2. Memory Storage

This concerns the nature of memory stores, i.e. where the information is stored, how long the memory lasts (duration), how much can be stored at any time (capacity) and what kind of information is held. The way we store information affects the way we retrieve it. There has been a significant amount of research regarding the differences between short-term memory (STM) and long-term memory (LTM).

Most adults can store between 5 and 9 items in their short-term memory. Miller (1956) put this idea forward and he called it the magic number 7. He thought that short-term memory capacity was 7 (plus or minus 2) items because it only had a certain number of “slots” in which items could be stored. However, Miller didn’t specify the amount of information that can be held in each slot. Indeed, if we can “chunk” information together we can store a lot more information in our short-term memory. In contrast, the capacity of LTM is thought to be unlimited.

Information can only be stored for a brief duration in STM (0-30 seconds), but LTM can last a lifetime.

3. Memory Retrieval


This refers to getting information out of storage. If we can’t remember something, it may be because we are unable to retrieve it. When we are asked to retrieve something from memory, the differences between STM and LTM become very clear.

STM is stored and retrieved sequentially. For example, if a group of participants are given a list of words to remember, and then asked to recall the fourth word on the list, participants go through the list in the order they heard it in order to retrieve the information.

LTM is stored and retrieved by association. This is why you can remember what you went upstairs for if you go back to the room where you first thought about it.

Organizing information can aid retrieval. You can organize information in sequences (such as alphabetically, by size or by time). Imagine a patient being discharged from hospital whose treatment involved taking various pills at various times, changing their dressing and doing exercises. If the doctor gives these instructions in the order in which they must be carried out throughout the day (i.e. in sequence of time), this will help the patient remember them.

Criticisms of Memory Experiments

A large part of the research on memory is based on experiments conducted in laboratories. Those who take part in the experiments - the participants - are asked to perform tasks such as recalling lists of words and numbers. Both the setting - the laboratory - and the tasks are a long way from everyday life. In many cases, the setting is artificial and the tasks fairly meaningless. Does this matter?

Psychologists use the term ecological validity to refer to the extent to which the findings of research studies can be generalized to other settings. An experiment has high ecological validity if its findings can be generalized, that is, applied or extended to settings outside the laboratory.

It is often assumed that if an experiment is realistic or true-to-life, then there is a greater likelihood that its findings can be generalized.  If it is not realistic - if the laboratory setting and the tasks are artificial - then there is less likelihood that the findings can be generalized. In this case, the experiment will have low ecological validity.

Many experiments designed to investigate memory have been criticized for having low ecological validity.  First, the laboratory is an artificial situation. People are removed from their normal social settings and asked to take part in a psychological experiment.  They are directed by an 'experimenter' and may be placed in the company of complete strangers.  For many people, this is a brand new experience, far removed from their everyday lives. Will this setting affect their actions, will they behave normally?

Often, the tasks participants are asked to perform can appear artificial and meaningless. Few, if any, people would attempt to memorize and recall a list of unconnected words in their daily lives. And it is not clear how tasks such as this relate to the use of memory in everyday life. The artificiality of many experiments has led some researchers to question whether their findings can be generalized to real life. As a result, many memory experiments have been criticized for having low ecological validity.

Memory has the ability to encode, store and recall information. Memories give an organism the capability to learn and adapt from previous experiences as well as build relationships. Encoding allows the perceived item of use or interest to be converted into a construct that can be stored within the brain and recalled later from short term or long term memory. Working memory stores information for immediate use or manipulation which is aided through hooking onto previously archived items already present in the long-term memory of an individual.

Visual encoding

Visual encoding is the process of encoding images and visual sensory information. This means that people can convert new information into mental pictures (Harrison & Semin, 2009, Psychology, New York, p. 222). Visual sensory information is temporarily stored within our iconic memory[1] and working memory before being encoded into permanent long-term storage.[2][3] Baddeley’s model of working memory states that visual information is stored in the visuo-spatial sketchpad.[1] The amygdala is a complex structure that has an important role in visual encoding. It accepts visual input in addition to input from other systems and encodes the positive or negative values of conditioned stimuli.[4]

Elaborative encoding

Elaborative encoding is the process of actively relating new information to knowledge that is already in memory. Memories are a combination of old and new information, so the nature of any particular memory depends as much on the old information already in our memories as it does on the new information coming in through our senses. In other words, how we remember something depends on how we think about it at the time. Many studies have shown that long-term retention is greatly enhanced by elaborative encoding.[5]

Acoustic encoding

Acoustic encoding is the encoding of auditory impulses. According to Baddeley, processing of auditory information is aided by the concept of the phonological loop, which allows input within our echoic memory to be subvocally rehearsed in order to facilitate remembering.[1] When we hear any word, we do so by hearing individual sounds, one at a time. Hence the memory of the beginning of a new word is stored in our echoic memory until the whole sound has been perceived and recognized as a word.[6] Studies indicate that lexical, semantic and phonological factors interact in verbal working memory. The phonological similarity effect (PSE) is modified by word concreteness. This emphasizes that verbal working memory performance cannot exclusively be attributed to phonological or acoustic representation but also includes an interaction of linguistic representation.[7] What remains to be seen is whether linguistic representations are expressed at the time of recall or whether they participate in a more fundamental role in encoding and preservation.


There are three or four main types of encoding:

Acoustic encoding is the processing and encoding of sound, words and other auditory input for storage and later retrieval. This is aided by the concept of the phonological loop, which allows input within our echoic memory to be sub-vocally rehearsed in order to facilitate remembering.

Visual encoding is the process of encoding images and visual sensory information. Visual sensory information is temporarily stored within the iconic memory before being encoded into long-term storage. The amygdala (within the medial temporal lobe of the brain which has a primary role in the processing of emotional reactions) fulfills an important role in visual encoding, as it accepts visual input in addition to input from other systems and encodes the positive or negative values of conditioned stimuli.

Tactile encoding is the encoding of how something feels, normally through the sense of touch. Physiologically, neurons in the primary somatosensory cortex of the brain react to vibrotactile stimuli caused by the feel of an object.

Semantic encoding is the process of encoding sensory input that has particular meaning or can be applied to a particular context, rather than deriving from a particular sense.

Fuzzy logic

From Wikipedia, the free encyclopedia

Fuzzy logic is a form of many-valued logic; it deals with reasoning that is approximate rather than fixed and exact. Compared to traditional binary sets (where variables may take on true or false values), fuzzy logic variables may have a truth value that ranges in degree between 0 and 1. Fuzzy logic has been extended to handle the concept of partial truth, where the truth value may range between completely true and completely false.[1] Furthermore, when linguistic variables are used, these degrees may be managed by specific functions.

The term "fuzzy logic" was introduced with the 1965 proposal of fuzzy set theory by Lotfi A. Zadeh.[2][3] Fuzzy logic has been applied to many fields, from control theory to artificial intelligence. Fuzzy logics however had been studied since the 1920s as infinite-valued logics notably by Łukasiewicz and Tarski.[4]

Contents

1 Overview
  1.1 Applying truth values
  1.2 Linguistic variables
2 Early applications
3 Example
  3.1 Hard science with IF-THEN rules
4 Logical analysis
  4.1 Propositional fuzzy logics
  4.2 Predicate fuzzy logics
  4.3 Decidability issues for fuzzy logic
  4.4 Synthesis of fuzzy logic functions given in tabular form
5 Fuzzy databases
6 Comparison to probability
7 Relation to ecorithms
8 See also
9 References
10 Bibliography
11 External links

??? Did You Know ???

When presented with a visual stimulus, the part of the brain which is activated the most depends on the nature of the image. A blurred image, for example, activates the visual cortex at the back of the brain most. An image of an unknown face activates the associative and frontal regions most. An image of a face which is already in working memory activates the frontal regions most, while the visual areas are scarcely stimulated at all.

Overview

Classical logic only permits propositions having a value of truth or falsity. The notion that 1+1=2 is an absolute, immutable, mathematical truth. However, there exist certain propositions with variable answers, such as asking various people to identify a color. The notion of truth doesn't fall by the wayside, but rather a means of representing and reasoning over partial knowledge is afforded, by aggregating all possible outcomes into a dimensional spectrum.

Both degrees of truth and probabilities range between 0 and 1 and hence may seem similar at first. For example, let a 100 ml glass contain 30 ml of water. Then we may consider two concepts: Empty and Full. The meaning of each of them can be represented by a certain fuzzy set. Then one might define the glass as being 0.7 empty and 0.3 full. Note that the concept of emptiness would be subjective and thus would depend on the observer or designer. Another designer might equally well design a set membership function where the glass would be considered full for all values down to 50 ml. It is essential to realize that fuzzy logic uses truth degrees as a mathematical model of the vagueness phenomenon while probability is a mathematical model of ignorance.
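A minimal sketch of the glass example, assuming simple linear membership functions (the exact shape of the functions is an illustrative assumption; as noted above, another designer could reasonably choose different ones):

# Degrees of "empty" and "full" for a 100 ml glass, using linear membership
# functions chosen for illustration; the 0.7/0.3 split matches 30 ml of water.
CAPACITY_ML = 100.0

def membership_full(volume_ml):
    """Degree to which the glass is considered full (0.0 to 1.0)."""
    return max(0.0, min(1.0, volume_ml / CAPACITY_ML))

def membership_empty(volume_ml):
    """Degree to which the glass is considered empty (complement of full here)."""
    return 1.0 - membership_full(volume_ml)

volume = 30.0
print(membership_empty(volume), membership_full(volume))   # 0.7 0.3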

Applying truth values

A basic application might characterize subranges of a continuous variable. For instance, a temperature measurement for anti-lock brakes might have several separate membership functions defining particular temperature ranges needed to control the brakes properly. Each function maps the same temperature value to a truth value in the 0 to 1 range. These truth values can then be used to determine how the brakes should be controlled.


Figure: Fuzzy logic temperature

In this image, the meanings of the expressions cold, warm, and hot are represented by functions mapping a temperature scale. A point on that scale has three "truth values"—one for each of the three functions. The vertical line in the image represents a particular temperature that the three arrows (truth values) gauge. Since the red arrow points to zero, this temperature may be interpreted as "not hot". The orange arrow (pointing at 0.2) may describe it as "slightly warm" and the blue arrow (pointing at 0.8) "fairly cold".
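A sketch of membership functions in this style (the breakpoints are illustrative assumptions, chosen only so that a single temperature reading can carry several truth values at once):

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 below a, rising to 1 between a and b,
    staying at 1 until c, and falling back to 0 at d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Illustrative temperature sets (degrees Celsius)
def cold(t):  return trapezoid(t, -40, -40, 10, 20)
def warm(t):  return trapezoid(t, 10, 20, 25, 35)
def hot(t):   return trapezoid(t, 25, 35, 60, 60)

t = 12.0
print(cold(t), warm(t), hot(t))   # 0.8 0.2 0.0: one point, three truth values

With these particular breakpoints, a reading of 12 degrees comes out as 0.8 cold, 0.2 warm and 0.0 hot, mirroring the three arrows described above.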

Linguistic variables

While variables in mathematics usually take numerical values, in fuzzy logic applications non-numeric values are often used to facilitate the expression of rules and facts.[5]

A linguistic variable such as age may have a value such as young or its antonym old. However, the great utility of linguistic variables is that they can be modified via linguistic hedges applied to primary terms. The linguistic hedges can be associated with certain functions.

Early applications

The Japanese were the first to utilize fuzzy logic for practical applications. The first notable application was on the high-speed train in Sendai, in which fuzzy logic was able to improve the economy, comfort, and precision of the ride.[6] It has also been used in recognition of handwritten symbols in Sony pocket computers[citation needed], Canon auto-focus technology[citation needed], Omron auto-aiming cameras[citation needed], earthquake prediction and modeling at the Institute of Seismology Bureau of Metrology in Japan[citation needed], etc.

Example

Hard science with IF-THEN rules

Fuzzy set theory defines fuzzy operators on fuzzy sets. The problem in applying this is that the appropriate fuzzy operator may not be known. For this reason, fuzzy logic usually uses IF-THEN rules, or constructs that are equivalent, such as fuzzy associative matrices.

Rules are usually expressed in the form:

IF variable IS property THEN action

For example, a simple temperature regulator that uses a fan might look like this:

IF temperature IS very cold THEN stop fan
IF temperature IS cold THEN turn down fan
IF temperature IS normal THEN maintain level
IF temperature IS hot THEN speed up fan


There is no "ELSE" – all of the rules are evaluated, because the temperature might be "cold" and "normal" at the same time to different degrees.

The AND, OR, and NOT operators of boolean logic exist in fuzzy logic, usually defined as the minimum, maximum, and complement; when they are defined this way, they are called the Zadeh operators. So for the fuzzy variables x and y:

NOT x = (1 - truth(x))
x AND y = minimum(truth(x), truth(y))
x OR y = maximum(truth(x), truth(y))
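A minimal sketch of the fan regulator above, combining illustrative membership functions with the Zadeh operators (the membership shapes, and the fact that only rule strengths are reported rather than a final fan speed, are assumptions for the sake of the example):

# Evaluate the IF-THEN rules above with Zadeh operators (min/max/complement).
def very_cold(t): return max(0.0, min(1.0, (5.0 - t) / 10.0))
def cold(t):      return max(0.0, min(1.0, (22.0 - t) / 10.0))
def normal(t):    return max(0.0, 1.0 - abs(t - 21.0) / 5.0)
def hot(t):       return max(0.0, min(1.0, (t - 24.0) / 10.0))

def zadeh_not(a):    return 1.0 - a
def zadeh_and(a, b): return min(a, b)
def zadeh_or(a, b):  return max(a, b)

temperature = 19.0
strengths = {
    "stop fan":       very_cold(temperature),
    "turn down fan":  cold(temperature),
    "maintain level": normal(temperature),
    "speed up fan":   hot(temperature),
}
# There is no ELSE: every rule fires to some degree. Here "cold" and "normal"
# are both partly true at the same time (0.3 and 0.6).
print(strengths)
print("cold AND normal:", zadeh_and(cold(temperature), normal(temperature)))   # 0.3
print("cold OR normal:", zadeh_or(cold(temperature), normal(temperature)))     # 0.6
print("NOT hot:", zadeh_not(hot(temperature)))                                 # 1.0
print("strongest action:", max(strengths, key=strengths.get))

A full controller would then combine the fired rules into a single fan command (defuzzification), which is beyond this sketch.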

There are also other operators, more linguistic in nature, called hedges that can be applied. These are generally adverbs such as "very", or "somewhat", which modify the meaning of a set using a mathematical formula.

Logical analysis

In mathematical logic, there are several formal systems of "fuzzy logic"; most of them belong among so-called t-norm fuzzy logics.

Propositional fuzzy logics

The most important propositional fuzzy logics are:

Monoidal t-norm-based propositional fuzzy logic MTL is an axiomatization of logic where conjunction is defined by a left continuous t-norm, and implication is defined as the residuum of the t-norm. Its models correspond to MTL-algebras that are prelinear commutative bounded integral residuated lattices.

Basic propositional fuzzy logic BL is an extension of MTL logic where conjunction is defined by a continuous t-norm, and implication is also defined as the residuum of the t-norm. Its models correspond to BL-algebras.

Łukasiewicz fuzzy logic is the extension of basic fuzzy logic BL where standard conjunction is the Łukasiewicz t-norm. It has the axioms of basic fuzzy logic plus an axiom of double negation, and its models correspond to MV-algebras.

Gödel fuzzy logic is the extension of basic fuzzy logic BL where conjunction is Gödel t-norm. It has the axioms of BL plus an axiom of idempotence of conjunction, and its models are called G-algebras.

Product fuzzy logic is the extension of basic fuzzy logic BL where conjunction is product t-norm. It has the axioms of BL plus another axiom for cancellativity of conjunction, and its models are called product algebras.

Fuzzy logic with evaluated syntax (sometimes also called Pavelka's logic), denoted by EVŁ, is a further generalization of mathematical fuzzy logic. While the above kinds of fuzzy logic have traditional syntax and many-valued semantics, in EVŁ the syntax is also evaluated. This means that each formula has an evaluation. Axiomatization of EVŁ stems from Łukasiewicz fuzzy logic. A generalization of the classical Gödel completeness theorem is provable in EVŁ.

Predicate fuzzy logics


These extend the above-mentioned fuzzy logics by adding universal and existential quantifiers in a manner similar to the way that predicate logic is created from propositional logic. The semantics of the universal (resp. existential) quantifier in t-norm fuzzy logics is the infimum (resp. supremum) of the truth degrees of the instances of the quantified subformula.

Decidability issues for fuzzy logic

The notions of a "decidable subset" and "recursively enumerable subset" are basic ones for classical mathematics and classical logic. Thus the question of a suitable extension of these concepts to fuzzy set theory arises. A first proposal in such a direction was made by E.S. Santos by the notions of fuzzy Turing machine, Markov normal fuzzy algorithm and fuzzy program (see Santos 1970). Successively, L. Biacino and G. Gerla argued that the proposed definitions are rather questionable and therefore they proposed the following ones. Denote by Ü the set of rational numbers in [0,1]. Then a fuzzy subset s : S [0,1] of a set S is recursively enumerable if a recursive map h : S×N Ü exists such that, for every x in S, the function h(x,n) is increasing with respect to n and s(x) = lim h(x,n). We say that s is decidable if both s and its complement –s are recursively enumerable. An extension of such a theory to the general case of the L-subsets is possible (see Gerla 2006). The proposed definitions are well related with fuzzy logic. Indeed, the following theorem holds true (provided that the deduction apparatus of the considered fuzzy logic satisfies some obvious effectiveness property).

Theorem. Any axiomatizable fuzzy theory is recursively enumerable. In particular, the fuzzy set of logically true formulas is recursively enumerable in spite of the fact that the crisp set of valid formulas is not recursively enumerable, in general. Moreover, any axiomatizable and complete theory is decidable.

It is an open question to give support for a "Church thesis" for fuzzy mathematics, i.e. to show that the proposed notion of recursive enumerability for fuzzy subsets is the adequate one. To this aim, an extension of the notions of fuzzy grammar and fuzzy Turing machine should be necessary (see for example Wiedermann's paper). Another open question is to start from this notion to find an extension of Gödel's theorems to fuzzy logic.

Synthesis of fuzzy logic functions given in tabular form

It is known that any boolean logic function can be represented using a truth table mapping each set of variable values into a set of values {0,1}. The task of synthesis of a boolean logic function given in tabular form is one of the basic tasks in traditional logic; it is solved via the disjunctive (conjunctive) perfect normal form.

Each fuzzy (continuous) logic function can be represented by a choice table containing all possible variants of comparing arguments and their negations. A choice table maps each variant into the value of an argument or a negation of an argument. For instance, for two arguments a row of the choice table contains a variant of comparing values x1, ¬x1, x2, ¬x2 and the corresponding function value:

f(x2 ≤ ¬x1 ≤ x1 ≤ ¬x2) = ¬x1


The task of synthesis of fuzzy logic functions given in tabular form was solved in [7]. New concepts of constituents of minimum and maximum were introduced. The sufficient and necessary conditions that a choice table defines a fuzzy logic function were derived.

Fuzzy databases

Once fuzzy relations are defined, it is possible to develop fuzzy relational databases. The first fuzzy relational database, FRDB, appeared in Maria Zemankova's dissertation. Later, some other models arose like the Buckles-Petry model, the Prade-Testemale Model, the Umano-Fukami model or the GEFRED model by J.M. Medina, M.A. Vila et al. In the context of fuzzy databases, some fuzzy querying languages have been defined, highlighting the SQLf by P. Bosc et al. and the FSQL by J. Galindo et al. These languages define some structures in order to include fuzzy aspects in the SQL statements, like fuzzy conditions, fuzzy comparators, fuzzy constants, fuzzy constraints, fuzzy thresholds, linguistic labels and so on.

Much progress has been made to take fuzzy logic database applications to the web and let the world easily use them, for example: http://sullivansoftwaresystems.com/cgi-bin/fuzzy-logic-match-algorithm.cgi?SearchString=garia This enables fuzzy logic matching to be incorporated into a database system or application.

Comparison to probability

Fuzzy logic and probability are different ways of expressing uncertainty. While both fuzzy logic and probability theory can be used to represent subjective belief, fuzzy set theory uses the concept of fuzzy set membership (i.e., how much a variable is in a set), and probability theory uses the concept of subjective probability (i.e., how probable do I think that a variable is in a set). While this distinction is mostly philosophical, the fuzzy-logic-derived possibility measure is inherently different from the probability measure, hence they are not directly equivalent. However, many statisticians are persuaded by the work of Bruno de Finetti that only one kind of mathematical uncertainty is needed and thus fuzzy logic is unnecessary. On the other hand, Bart Kosko argues[citation needed] that probability is a subtheory of fuzzy logic, as probability only handles one kind of uncertainty. He also claims[citation needed] to have proven a derivation of Bayes' theorem from the concept of fuzzy subsethood. Lotfi A. Zadeh argues that fuzzy logic is different in character from probability, and is not a replacement for it. He fuzzified probability to fuzzy probability and also generalized it to what is called possibility theory. (cf.[8]) More generally, fuzzy logic is one of many different proposed extensions to classical logic, known as probabilistic logics, intended to deal with issues of uncertainty in classical logic, the inapplicability of probability theory in many domains, and the paradoxes of Dempster-Shafer theory.

Relation to ecorithms

Harvard's Dr. Leslie Valiant, co-author of the Valiant-Vazirani theorem, uses the term "ecorithms" to describe how many less exact systems and techniques like fuzzy logic (and "less robust" logic) can be applied to learning algorithms. Valiant essentially redefines machine learning as evolutionary. Ecorithms and fuzzy logic also have the common property of dealing with possibilities more than probabilities, although feedback and feedforward, basically stochastic "weights," are a feature of both when dealing with, for example, dynamical systems.

In general use, ecorithms are algorithms that learn from their more complex environments (hence eco) to generalize, approximate and simplify solution logic. Like fuzzy logic, they are methods used to overcome continuous variables or systems too complex to completely enumerate or understand discretely or exactly. See in particular p. 58 of the reference comparing induction/invariance, robust, mathematical and other logical limits in computing, where techniques including fuzzy logic and natural data selection (ala "computational Darwinism") can be used to shortcut computational complexity and limits in a "practical" way (such as the brake temperature