Communicated by Jeffrey Elman

Finite State Automata and Simple Recurrent Networks

Axel Cleeremans, Department of Psychology, Carnegie-Mellon University, Pittsburgh, PA 15213 USA

David Servan-Schreiber, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA 15213 USA

James L. McClelland, Department of Psychology, Carnegie-Mellon University, Pittsburgh, PA 15213 USA

Neural Computation 1, 372-381 (1989). © 1989 Massachusetts Institute of Technology

We explore a network architecture introduced by Elman (1988) for predicting successive elements of a sequence. The network uses the pattern of activation over a set of hidden units from time-step t-1, together with element t, to predict element t + 1. When the network is trained with strings from a particular finite-state grammar, it can learn to be a perfect finite-state recognizer for the grammar. When the network has a minimal number of hidden units, patterns on the hidden units come to correspond to the nodes of the grammar, although this correspondence is not necessary for the network to act as a perfect finite-state recognizer. We explore the conditions under which the network can carry information about distant sequential contingencies across intervening elements. Such information is maintained with relative ease if it is relevant at each intermediate step; it tends to be lost when intervening elements do not depend on it. At first glance this may suggest that such networks are not relevant to natural language, in which dependencies may span indefinite distances. However, embeddings in natural language are not completely independent of earlier information. The final simulation shows that long distance sequential contingencies can be encoded by the network even if only subtle statistical properties of embedded strings depend on the early information.

1 Introduction

Several connectionist architectures that are explicitly constrained to capture sequential information have been proposed. Examples are Time Delay Networks (for example, Sejnowski and Rosenberg 1986) - also called "moving window" paradigms - or algorithms such as backpropagation in time (Rumelhart et al. 1986; Williams and Zipser 1988). Such architectures use explicit representations of several consecutive events, if not of the entire history of past inputs. Recently, Elman (1988) has introduced a simple recurrent network (SRN) that has the potential to master an infinite corpus of sequences with the limited means of a learning procedure that is completely local in time (Fig. 1).

Figure 1: The simple recurrent network (Elman 1988). In the SRN, the pattern of activation on the hidden units at time step t - 1, together with the new input pattern, is allowed to influence the pattern of activation at time step t. This is achieved by copying the pattern of activation on the hidden layer at time step t - 1 to a set of input units - called the "context units" - at time step t. All the forward connections in the network are subject to training via backpropagation.

In this paper, we show that the SRN can learn to mimic closely a finite-state automaton (FSA), both in its behavior and in its state representations. In particular, we show that it can learn to process an infinite corpus of strings based on experience with a finite set of training exemplars. We then explore the capacity of this architecture to recognize and use nonlocal contingencies between elements of a sequence.

2 Discovering a Finite-State Grammar

In our first experiment, we asked whether the network could learn the contingencies implied by a small finite-state grammar (Fig. 2). The network was presented with strings derived from this grammar, and required to try to predict the next letter at every step. These predictions are context dependent since each letter appears twice in the grammar and is followed in each case by different successors.

Figure 2: The small finite-state grammar (Reber 1967).

A single unit on the input layer represented a given letter (six input units in total; five for the letters and one for a begin symbol "B"). Similar local representations were used on the output layer (with the "begin" symbol being replaced by an end symbol "E"). There were three hidden units.

2.1 Training. On each of 60,000 training trials, a string was generated from the grammar, starting with the "B." Successive arcs were selected randomly from the two possible continuations with a probability of 0.5. Each letter was then presented sequentially to the network. The activations of the context units were reset to 0.5 at the beginning of each string. After each letter, the error between the network's prediction and the actual successor specified by the string was computed and backpropagated. The 60,000 randomly generated strings ranged from 3 to 30 letters (mean, 7; SD, 3.3).
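A string generator consistent with this procedure can be sketched as follows; the transition table is one common rendering of the Reber (1967) grammar of Figure 2 (the node numbering is arbitrary), and the one-hot encoding corresponds to the six input units described above:

import random

# One rendering of the Reber grammar: node -> list of (letter, next node).
# "None" marks the end of a string.
REBER = {
    0: [("T", 1), ("P", 2)],
    1: [("S", 1), ("X", 3)],
    2: [("T", 2), ("V", 4)],
    3: [("X", 2), ("S", 5)],
    4: [("P", 3), ("V", 5)],
    5: [("E", None)],
}

def generate_string(rng=random):
    """Random walk through the grammar, choosing each arc with probability 0.5."""
    node, letters = 0, ["B"]
    while node is not None:
        letter, node = rng.choice(REBER[node])
        letters.append(letter)
    return "".join(letters)          # a possible output: "BTSSXXVVE"

# One-hot encoding for the six input units ("B" plus the five letters).
ALPHABET = "BTSXVP"
def one_hot(letter):
    vec = [0.0] * len(ALPHABET)
    vec[ALPHABET.index(letter)] = 1.0
    return vec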

2.2 Performance. Three tests were conducted. First, we examined the network's predictions on a set of 70,000 random strings. During this test, the network is first presented with the "B," and one of the five letters or "E" is then selected at random as a successor. If that letter is predicted by the network as a legal successor (that is, activation is above 0.3 for the corresponding unit), it is then presented to the input layer on the next time step, and another letter is drawn at random as its successor. This procedure is repeated as long as each letter is predicted as a legal successor until "E" is selected as the next letter. The procedure is interrupted as soon as the actual successor generated by the random procedure is not predicted by the network, and the string of letters is then considered "rejected." A string is considered "accepted" if all its letters have been predicted as possible continuations up to "E." Of the 70,000 random strings, 0.3% happened to be grammatical and 99.7% were ungrammatical. The network performed flawlessly, accepting all the grammatical strings and rejecting all the others.
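In outline, this acceptance test can be sketched as follows, assuming a hypothetical function srn_predict(prefix) that returns the network's output activations (one per letter plus "E") after the prefix has been presented:

import random

OUTPUTS = "TSXVPE"   # the five letters plus the end symbol

def accepts(srn_predict, rng=random):
    """Draw random successors after "B" and check that every drawn letter
    was predicted as a legal successor (activation above 0.3)."""
    prefix = ["B"]
    while True:
        activations = srn_predict(prefix)      # one activation per symbol in OUTPUTS
        letter = rng.choice(OUTPUTS)           # random successor
        if activations[OUTPUTS.index(letter)] <= 0.3:
            return False                       # successor not predicted: string rejected
        if letter == "E":
            return True                        # end symbol reached: string accepted
        prefix.append(letter)                  # otherwise present it and continue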

In a second test, we presented the network with 20,000 strings generated at random from the grammar, that is, all these strings were grammatical. Using the same criterion as above, all of these strings were correctly "accepted."

Finally, we constructed a set of very long grammatical strings - more than 100 letters long - and verified that at each step the network correctly predicted all the possible successors (activations above 0.3) and none of the other letters in the grammar.

2.3 Analysis of Internal Representations. What kinds of internal representations have developed over the set of hidden units that allow the network to associate the proper predictions to intrinsically ambiguous letters? We recorded the hidden units' activation patterns generated in response to the presentation of individual letters in different contexts. These activation vectors were then used as input to a cluster analysis program that groups them according to their similarity. Figure 3 shows the results of such an analysis conducted on a small random set of grammatical strings. The patterns of activation are grouped according to the nodes of the grammar: all the patterns that are used to predict the successors of a given node are grouped together independently of the current letter. This observation sheds some light on the behavior of the network: at each point in a sequence, the pattern of activation stored over the context units provides information about the current node in the grammar. Together with information about the current letter (represented on the input layer), this contextual information is used to produce a new pattern of activation over the hidden layer that uniquely specifies the next node. In that sense, the network closely approximates the FSA that would encode the grammar from which the training exemplars were derived. However, a closer look at the cluster analysis reveals that within a cluster corresponding to a particular node, patterns are further divided according to the path traversed before the node is reached. For example, looking at the bottom cluster - node #5 - patterns produced by a "VV," "PS," "XS," or "SXS" ending are grouped separately by the analysis: they are more similar to each other than to other examples of paths leading to node #5. This tendency to preserve information about the path is not a characteristic of traditional finite-state automata.

Figure 3: Hierarchical cluster analysis of the hidden unit activation patterns after 60,000 presentations of strings generated at random from the finite-state grammar. A small set of strings was used to test the network. The single uppercase letter in each string shown in the figure corresponds to the letter actually presented to the network on that trial.
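Such an analysis can be reproduced in outline with standard agglomerative clustering; the sketch below uses scipy rather than the original cluster analysis program, and assumes a dictionary mapping each labelled presentation (path plus current letter) to the three-element hidden activation vector recorded for it:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

def cluster_hidden_patterns(patterns):
    """patterns: dict mapping a string label (path plus current letter)
    to the hidden-unit activation vector recorded for that presentation."""
    labels = list(patterns)
    vectors = np.array([patterns[k] for k in labels])
    # Group activation vectors by Euclidean distance, as in a standard
    # agglomerative (hierarchical) cluster analysis.
    tree = linkage(vectors, method="average", metric="euclidean")
    dendrogram(tree, labels=labels, orientation="right")
    plt.show()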

It must be noted that an SRN can perform the string acceptance tests described above and still fail to produce representations that clearly delineate the nodes of the grammar as in the case shown in Figure 3. This tendency to approximate the behavior but not the representation of the grammar is exhibited when there are more hidden units than are absolutely necessary to perform the task. Thus, using representations that correspond to the nodes in the FSA shown in Figure 2 is only one way to act in accordance with the grammar encoded by that machine: representations correspond to the nodes in the FSA only when resources are severely constrained. In other cases, the network tends to preserve the separate identities of the arcs, and may preserve other information about the path leading to a node, even when this is not task relevant.

3 Encoding Path Information

In a different set of experiments, we asked whether the network could learn to use the information about the path that is encoded in the hidden units' patterns of activation. In one of these experiments, we tested whether the network could master length constraints. When strings generated from the small finite-state grammar may have a maximum of only eight letters, the prediction following the presentation of the same letter in position number six or seven may be different. For example, following the sequence "TSSSXXV," "V" is the seventh letter and only another "V" would be a legal successor. In contrast, following the sequence "TSSXXV," both "V" and "P" are legal successors. A network with 15 hidden units was trained on 21 of the 43 legal strings of length 3 to 8. It was able to use the small activation differences present over the context units - and due to the slightly different sequences presented - to master contingencies such as those illustrated above. Thus, after "TSSXXV," the network predicted "V" or "P"; but after "TSSSXXV," it predicted only "V" as a legal successor.

How can information about the path be encoded in the hidden layer patterns of activation? As the initial papers about backpropagation pointed out, the hidden unit patterns of activation represent an "encoding" of the features of the input patterns that are relevant to the task. In the recurrent network, the hidden layer is presented with information about the current letter, but also - on the context layer - with an encoding of the relevant features of the previous letter. Thus, a given hidden layer pattern can come to encode information about the relevant features of two consecutive letters. When this pattern is fed back on the context layer, the new pattern of activation over the hidden units can come to encode information about three consecutive letters, and so on. In this manner, the context layer patterns can allow the network to maintain prediction-relevant features of an entire sequence.

Learning progresses through three phases. During the first phase, the context information tends to be ignored because the patterns of activation on the hidden layer - of which the former are a copy - are changing continually as a result of the learning algorithm. In contrast, the network is able to pick up the stable association between each letter and all its possible successors. At the end of this phase, the network thus predicts all the successors of each letter in the grammar, independently of the arc to which each letter corresponds. In the second phase, patterns copied on the context layer are now represented by a unique code designating which letter preceded the current letter, and the network can exploit this stability of the context information to start distinguishing between different occurrences of the same letter - different arcs in the grammar. Finally, in a third phase, small differences in the context information that reflect the occurrence of previous elements can be used to differentiate position-dependent predictions resulting from length constraints (see Servan-Schreiber et al. 1988, for more details).

It is important to note that information about the path that is not locally relevant tends not to be encoded in the next hidden layer pattern. It may then be lost for subsequent processing. This tendency is decreased when the network has extra degrees of freedom (that is, more hidden units) so as to allow small and locally useless differences to survive for several processing steps.

4 Processing Embedded Sequences

This observation raises an important question about the relevance of this simple architecture to natural language processing. Any natural language processing system must have the ability to preserve information about long-distance contingencies in order to correctly process sentences containing embeddings. For example, in the following problem of number agreement, information about the head of the sentence has to be preserved in order to make a correct grammatical judgment about the verb following the embedding:

The dog [that chased the cat] is playful
The dogs [that chased the cat] are playful

At first glance, however, information about the head of the sentence does not seem to be relevant for processing of the embedding itself. Yet, from the perspective of a system that is continuously generating expectations about possible succeeding events, information about the head is relevant within the embedding. For example, "itself" may follow the action verb only in the first sentence, whereas "each other" is acceptable only in the second. There is ample empirical evidence to support the claim that human subjects do generate expectations continuously in the course of natural language processing (see McClelland 1988, for a review).

To show that such expectations may contribute to maintaining information about a nonlocal context, we devised a new finite-state grammar in which the identity of the last letter depends on the identity of the first one (see Fig. 4). In a first experiment, contingencies within the embedded subgrammars were identical (0.5 on all arcs). Therefore, information about the initial letter is not locally relevant to predicting the successor of any letter in the embedding.

Figure 4: A complex finite-state grammar involving embedded sequences. The last letter is contingent on the first, and the intermediate structure is shared by the two branches of the grammar. In the first experiment, transition probabilities of all arcs were equal and set to 0.5. In the second experiment, the probabilities were biased toward the top arcs for the top embedding, and toward the bottom arcs for the bottom embedding. The numbers above each arc in the figure indicate the transition probabilities in the biased version of the grammar.
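Since the exact transition probabilities are given only in the figure, the generator below is purely illustrative: it assumes that each embedding is a copy of the Figure 2 grammar (consistent with the "PVPXVV" example discussed below), that the final letter simply repeats the initial "T" or "P", that the "T" branch is the top embedding, and that the favored arc receives a hypothetical probability of 0.8:

import random

# Shared embedded structure: a copy of the grammar of Figure 2.
# At each node the first arc is taken to be the "top" one.
EMBEDDED = {
    0: [("T", 1), ("P", 2)],
    1: [("S", 1), ("X", 3)],
    2: [("T", 2), ("V", 4)],
    3: [("X", 2), ("S", 5)],
    4: [("P", 3), ("V", 5)],
}

def generate_biased(first, top_bias=0.8, rng=random):
    """first: "T" or "P". The top branch favors top arcs, the bottom branch
    favors bottom arcs; 0.8 is an illustrative value, not the one in Figure 4."""
    p_top = top_bias if first == "T" else 1.0 - top_bias
    node, letters = 0, ["B", first]
    while node != 5:
        top, bottom = EMBEDDED[node]
        letter, node = top if rng.random() < p_top else bottom
        letters.append(letter)
    # Assumption: the last letter before "E" repeats the initial letter.
    letters += [first, "E"]
    return "".join(letters)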

An SRN with 15 hidden units was trained on exemplars (mean length, 7; SD, 2.6) generated from this grammar for 2.4 million letter presentations. Even after such extensive training, the network indifferently predicted [1] both possible final letters, independently of the initial letter of the string.

However, when the contingencies within the embedded grammars depend, even very slightly, on the first letter of the string, the performance of the network can improve dramatically. For example, we adjusted the transition probabilities in the subgrammars such that the top subgrammar was biased toward the top arcs and the bottom subgrammar was biased toward the bottom arcs (see Fig. 4). An SRN trained on exemplars (mean length, 7; SD, 2.6) derived from this biased grammar performed considerably better. After 2.4 million presentations, the network correctly predicted the last letter of the string in 66.9% of the trials. There were 11.3% errors - that is, cases where the incorrect alternative was predicted - and 21.8% cases in which both alternatives were predicted about equally. Accuracy was very high for strings with a short embedding, and fell gradually to chance level at seven embedded elements. Note that we tested the network with strings generated from the unbiased version of the grammar. The network must therefore have learned to preserve information about the first letter throughout the embedding independently of whether the particular embedding is probable or not given the first letter. Moreover, with the biased grammar, some embeddings have exactly the same probability of occurring during training independently of whether the first letter is a "T" or a "P" (e.g., "PVPXVV"). As we have seen, training with such exemplars makes it very difficult [2] for the network to maintain information about the first letter. Yet, when such embeddings are part of an ensemble of embeddings that as a whole has a probability structure that depends on the first letter, information about this first letter can be preserved much more easily.

[1] We considered that the final letter was correctly or mistakenly predicted if the Luce ratio of the activation of the corresponding unit to the sum of all activations on the output layer was above 0.6. Any Luce ratio below 0.6 was considered as a failure to respond (that is, a miss).

[2] Other experiments showed that when the embedding is totally independent of the head, the network's ability to maintain information about a nonlocal context depends on the number of hidden units. For a given number of hidden units, learning time increases exponentially with the length of the embedding.
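The response criterion of footnote 1 can be written out directly; the sketch below assumes a vector of output activations and the indices of the two candidate final letters:

import numpy as np

def luce_ratio(output_activations, unit_index):
    """Luce ratio of one output unit: its activation divided by the sum
    of all activations on the output layer (footnote 1)."""
    a = np.asarray(output_activations, dtype=float)
    return a[unit_index] / a.sum()

def score_final_letter(output_activations, correct_index, wrong_index):
    """Classify a trial as correct, error, or miss using the 0.6 criterion."""
    if luce_ratio(output_activations, correct_index) > 0.6:
        return "correct"
    if luce_ratio(output_activations, wrong_index) > 0.6:
        return "error"
    return "miss"   # neither alternative clearly predicted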

5 Conclusion

We have presented a network architecture first explored by Elman (1988) that is capable of mastering an infinite corpus of strings generated from a finite-state grammar after training on a finite set of exemplars with a learning algorithm that is local in time. When it contains only a minimal set of hidden units, the network can develop internal representations that correspond to the nodes of the grammar, and closely approximates the corresponding minimal finite-state recognizer. We have also shown that the simple recurrent network is able to encode information about long-distance contingencies as long as information about critical past events is relevant at each time step for generating predictions about potential alternatives. Finally, we demonstrated that this architecture can be used to maintain information across embedded sequences, even when the probability structure of ensembles of embedded sequences depends only subtly on the head of the sequence. It appears that the SRN possesses characteristics that recommend it for further study as one useful class of idealizations of the mechanism of natural language processing.

Acknowledgments

Axel Cleeremans was supported in part by a fellowship from the Belgian American Educational Foundation, and in part by a grant from the National Fund for Scientific Research (Belgium). David Servan-Schreiber was supported by an NIMH Individual Fellow Award MH-09696-01. James L. McClelland was supported by an NIMH Research Scientist Career Development Award MH-00385. Support for computational resources was provided by NSF (BNS-86-09729) and ONR (N00014-86-G-0146).

References

Elman, J.L. 1988. Finding Structure in Time. CRL Tech. Rep. 8801. Center for Research in Language, University of California, San Diego, CA.

McClelland, J.L. 1988. The case for interactionism in language processing. In Attention and Performance XII, M. Coltheart, ed. Erlbaum, London.

Reber, A.S. 1967. Implicit learning of artificial grammars. J. Verbal Learning Verbal Behav. 6, 855-863.

Rumelhart, D.E., Hinton, G.E., and Williams, R.J. 1986. Learning internal representations by backpropagating errors. Nature (London) 323, 533-536.

Sejnowski, T.J., and Rosenberg, C. 1986. NETtalk: A Parallel Network That Learns to Read Aloud. Tech. Rep. JHU-EECS-86-01, Johns Hopkins University.

Servan-Schreiber, D., Cleeremans, A., and McClelland, J.L. 1988. Learning Sequential Structure in Simple Recurrent Networks. Tech. Rep. CMU-CS-183, Carnegie-Mellon University.

Williams, R.J., and Zipser, D. 1988. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks. ICS Tech. Rep. 8805. Institute for Cognitive Science, University of California, San Diego, CA.

Received 6 March 1989; accepted 20 April 1989.
