
Innovative Teaching and Learning


Studies in Fuzziness and Soft Computing

Editor-in-chief Prof. Janusz Kacprzyk Systems Research Institute Polish Academy of Sciences ul. Newelska 6 01-447 Warsaw, Poland E-mail: [email protected]

Vol. 3. A. Geyer-Schulz Fuzzy Rule-Based Expert Systems and Genetic Machine Learning, 2nd ed. 1996 ISBN 3-7908-0964-0

Vol. 4. T. Onisawa and J. Kacprzyk (Eds.) Reliability and Safety Analyses under Fuzziness, 1995 ISBN 3-7908-0837-7

Vol. 5. P. Bosc and J. Kacprzyk (Eds.) Fuzziness in Database Management Systems, 1995 ISBN 3-7908-0858-X

Vol. 6. E. S. Lee and Q. Zhu Fuzzy and Evidence Reasoning, 1995 ISBN 3-7908-0880-6

Vol. 7. B. A. Juliano and W. Bandler Tracing Chains-of-Thought, 1996 ISBN 3-7908-0922-5

Vol. 8. F. Herrera and J. L. Verdegay (Eds.) Genetic Algorithms and Soft Computing, 1996 ISBN 3-7908-0956-X

Vol. 9. M. Sato et al. Fuzzy Clustering Models and Applications, 1997, ISBN 3-7908-1026-6

Vol. 10. L. C. Jain (Ed.) Soft Computing Techniques in Knowledge-based Intelligent Engineering Systems, 1997 ISBN 3-7908-1035-5

Vol. 11. W. Mielczarski (Ed.) Fuzzy Logic Techniques in Power Systems, 1998, ISBN 3-7908-1044-4

Vol. 12. B. Bouchon-Meunier (Ed.) Aggregation and Fusion of Imperfect Information, 1998 ISBN 3-7908-1048-7

Vol. 13. E. Orlowska (Ed.) Incomplete Information: Rough Set Analysis, 1998 ISBN 3-7908-1049-5

Vol. 14. E. Hisdal Logical Structures for Representation of Knowledge and Uncertainty, 1998 ISBN 3-7908-1056-8

Vol. 15. G.J. Klir and M.J. Wierman Uncertainty-Based Information, 2nd ed. 1999 ISBN 3-7908-1242-0

Vol. 16. D. Driankov and R. Palm (Eds.) Advances in Fuzzy Control, 1998 ISBN 3-7908-1090-8

Vol. 17. L. Reznik, V. Dimitrov and J. Kacprzyk (Eds.) Fuzzy Systems Design, 1998 ISBN 3-7908-1118-1

Vol. 18. L. Polkowski and A. Skowron (Eds.) Rough Sets in Knowledge Discovery I, 1998 ISBN 3-7908-1119-X

Vol. 19. L. Polkowski and A. Skowron (Eds.) Rough Sets in Knowledge Discovery 2, 1998 ISBN 3-7908-1120-3

Vol. 20. J. N. Mordeson and P. S. Nair Fuzzy Mathematics, 1998 ISBN 3-7908-1121-1

Vol. 21. L. C. Jain and T. Fukuda (Eds.) Soft Computing for Intelligent Robotic Systems, 1998 ISBN 3-7908-1147-5

Vol. 22. J. Cardoso and H. Camargo (Eds.) Fuzziness in Petri Nets, 1999 ISBN 3-7908-1158-0

Vol. 23. P. S. Szczepaniak (Ed.) Computational Intelligence and Applications, 1999 ISBN 3-7908-1161-0

Vol. 24. E. Orlowska (Ed.) Logic at Work, 1999 ISBN 3-7908-1164-5



Lakhmi C. Jain (Editor)

Innovative Teaching and Learning Knowledge-Based Paradigms

With 121 Figures and 18 Tables

Springer-Verlag Berlin Heidelberg GmbH


Professor Lakhmi C. Jain Director, KES Centre University of South Australia Adelaide Mawson Lakes South Australia 5095

Cataloging-in-Publication Data applied for Die Deutsche Bibliothek - CIP-Einheitsaufnahme Innovative teaching and learning: knowledge-based paradigms / Lakhmi C. Jain.

(Studies in Fuzziness and Soft Computing; Vol. 36)

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag Berlin Heidelberg GmbH. Violations are liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 2000 Originally published by Physica-Verlag Heidelberg New York in 2000 Softcover reprint of the hardcover 1st edition 2000

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Hardcover Design: Erich Kirchner, Heidelberg

ISBN 978-3-7908-2465-0 ISBN 978-3-7908-1868-0 (eBook) DOI 10.1007/978-3-7908-1868-0


Dedication

This book is dedicated to all my students.

L.C. Jain


Preface

The engineers and scientists of tomorrow require a valid image of science and its interactions with technology and society to enable them to take an active, informed role in society. Today's educational institutions are presented with the challenge of exposing students to ever-widening domains. Not only do mathematical techniques need to be addressed, but also computing techniques, and environmental and management aspects.

In the engineering field in particular, the rate of obsolescence is so high that curricula must be revised and updated much more frequently than ever before.

Traditional teaching methods cannot cope with this challenge, and hence there is a need to develop more effective teaching and learning strategies.

This book presents innovative teaching and learning techniques for the teaching of knowledge-based paradigms. The main knowledge-based intelligent paradigms are expert systems, artificial neural networks, fuzzy systems and evolutionary computing. Expert systems are designed to mimic the performance of biological systems. Artificial neural networks can mimic the biological information processing mechanism in a very limited sense. Evolutionary computing algorithms are used for optimization applications, and fuzzy logic provides a basis for representing uncertain and imprecise knowledge.

The first chapter, by Tedman and Jain, presents an introduction to innovative teaching and learning. A valid image of the nature of the interaction between science, technology and society is presented.

Chapter 2, by Lee and Liu, is on teaching and learning AI modeling. The authors present their study of teaching tools that help students learn and understand the concepts of neural networks, fuzzy systems and genetic algorithms.



Chapter 3, by Karr, Sunal and Smith, describes an innovative course developed and taught at The University of Alabama, U.S.A. for students attending the College of Education. This course presents an overview of artificial intelligence (AI) techniques including expert systems, fuzzy systems, neural networks, and genetic algorithms. Its goal is to provide future educators with enough information about the science of the twenty-first century to effectively educate and motivate their future students.

Chapter 4, by Vega-Riveros, presents the architecture of an intelligent tutoring system for a neural networks course. A new intelligent tutoring system architecture using collaborating agents is proposed.

Chapter 5, by Devedzic, focuses on teaching knowledge modeling. It presents a survey of knowledge modeling techniques that are taught at the School of Business Administration and the School of Electrical Engineering, University of Belgrade, Yugoslavia. Theoretical and architectural concepts, design approaches, and research issues of various knowledge modeling techniques used in the classroom are discussed.

Chapter 6, by Devedzic, Radovic and Jerinic, is devoted to innovative modeling techniques for intelligent tutoring systems. Three modeling techniques for use in a teaching environment are presented.

Chapter 7, by Fulcher, is concerned with a teaching course on artificial neural networks. A key component of this course is the use of an artificial neural network simulator to undertake laboratory assignments. The visualization of key neural network parameters via the simulator has been found to significantly aid the students' learning process.

Chapter 8, by Hiyama, introduces an innovative education for fuzzy logic stabilization of electric power systems. Matlab/Simulink-based transient stability simulation programs for multi-machine power systems are introduced. The programs are used to teach fuzzy logic stabilization of electric power systems as well as in the development of generator controllers using fuzzy logic and neural networks.



Chapter 9, by Goh and Amarasinghe, describes a neural network workbench for teaching and learning. The workbench permits users to create, train and test various neural network algorithms. One unique feature of this workbench is the use of real-time displays for tracking progress when training a neural network.

The final chapter, by Higgins and Mansouri, outlines a coursework system for the automatic assessment of AI programs. The system usefully assesses students' work, improves learning, and allows the marking and assessment of students' progress while learning a particular programming language.

This book will be useful to professors, researchers, scientists, practicing engineers and students who wish to develop successful learning and teaching tools for the teaching of knowledge-based paradigms.

I wish to express my thanks to Berend Jan van der Zwaag and Ashlesha Jain, for their assistance in the preparation of the manuscript. I am grateful to the authors for their contributions. I also thank Professor Janusz Kacprzyk for the opportunity to publish this book, and the Springer-Verlag Company for their excellent editorial assistance.

L.C. Jain, Australia

Contents

Preface

Chapter 1 An Introduction to Innovative Teaching and Learning
D. Tedman and L.C. Jain, Australia

Chapter 2 Teaching and Learning the AI Modeling
R.S.T. Lee and J.N.K. Liu, Hong Kong

Chapter 3 Artificial Intelligence Techniques for an Interdisciplinary Science Course
C.L. Karr, C. Sunal, and C. Smith, U.S.A.

Chapter 4 On the Architecture of Intelligent Tutoring Systems and Its Application to a Neural Networks Course
J.P. Vega-Riveros, Colombia

Chapter 5 Teaching Knowledge Modeling at the Graduate Level - a Case Study
V. Devedzic, Yugoslavia

Chapter 6 Innovative Modeling Techniques for Intelligent Tutoring Systems
V. Devedzic, D. Radovic and L. Jerinic, Yugoslavia

Chapter 7 Teaching Course on Artificial Neural Networks
J. Fulcher, Australia

Chapter 8 Innovative Education for Fuzzy Logic Stabilization of Electric Power Systems in a Matlab/Simulink Environment
T. Hiyama, Japan

Chapter 9 A Neural Network Workbench for Teaching and Learning
W.L. Goh and S.K. Amarasinghe, Singapore

Chapter 10 PRAM: A Courseware System for the Automatic Assessment of AI Programs
C.A. Higgins and F.Z. Mansouri, U.K.

Index


CHAPTER 1

AN INTRODUCTION TO INNOVATIVE TEACHING AND LEARNING

D. Tedman Flexible Learning Centre

University of South Australia Adelaide, Underdale, S.A. 5032

Australia

L.C. Jain Knowledge-Based Intelligent Engineering Systems Centre

University of South Australia Adelaide, Mawson Lakes, S.A. 5095

Australia

This chapter presents an introduction to innovative teaching and learning and knowledge-based intelligent paradigms. The intrinsic nature of knowledge-based intelligent techniques involves an accommodation with the pervasive imprecision of the real world, with the human mind as the role model [1]. Thus there are two important issues that should be considered in the design of effective teaching and learning strategies in this area. The first is the need for careful consideration of the discussions over the years by eminent researchers in regard to the epistemology and thinking processes involved in science and technology, as an appropriate starting point for the design of innovative teaching strategies for knowledge-based intelligent techniques. Secondly, since an aim of education in science and technology is to prepare students for their lives in societies which are increasingly dependent upon technology, reflection upon the nature of science and technology is of great benefit for the design of curricula and learning strategies in knowledge-based intelligent techniques.



The main knowledge-based intelligent paradigms include expert systems, artificial neural networks, fuzzy systems and evolutionary computing. Expert systems are designed to mimic the performance of biological systems. Artificial neural networks can mimic the biological information processing mechanism in a very limited sense. Evolutionary computing algorithms are used for optimization applications, and fuzzy logic provides a basis for representing uncertain and imprecise knowledge.

1 Introduction

The knowledge-based intelligent paradigms are those that are inspired by an understanding of information processing in biological systems. When this is the case the process will include an element of adaptive or evolutionary behavior similar to biological systems, and like the biological model there will be a high level of interconnection between distributed processing elements [2]-[7]. We have at our disposal the necessary hardware and software for building knowledge-based systems. A number of universities in the world have established teaching and research programs in this field. It is also important that we invent and introduce innovative teaching and learning practices in this important area. Effective learning about knowledge-based intelligent techniques requires the development of a wide range of well-developed thinking techniques in students to enable them to develop an understanding of areas such as fuzzy logic, neural networks and evolutionary computing.

By developing a strong and coherent understanding of issues resulting from the interactions between Science, Technology and Society (STS) students would be empowered to take an active role in decision-making in regard to STS issues resulting from the use of knowledge-based intelligent techniques and similar technologies. University graduates would then be committed to ethical and social responsibility as professionals and citizens [8].



1.1 The Nature of Work in Science and Technology

There is a need to present to university students a revised view of science and technology that emphasizes the interactions between science, technology and society. The STS view of science has been accepted gradually by scientists and educators, and a world-wide shift or reorientation towards the inclusion of STS objectives in science and technology courses has evolved. The impetus for the changing perception of science and reorientation of science and technology courses and curricula has been due to the writings of many scholars, e.g., see [9]-[12]. Their publications and theories about the nature and philosophy of science have changed understandings of the nature of science. Several decades later, these ideas are finding their way into education. The work of these and other eminent scholars provides an introduction to modern views of the nature and epistemology of science. Consideration of both changes in philosophical and epistemological models of science as well as the educational implications of this changing picture of science is a necessary foundation for the development of curricula and teaching strategies in science and technology courses.

1.2 The Epistemology of Science

Conant's views [9] on the strong influence that the attachment of scientists to some scientific theories had on the advancement of science were shared by Kuhn [10]. When Conant [9] was President of Harvard University, his work on the history of science inspired Kuhn [10] and thereby initiated a dramatic change in Kuhn's conception of the nature of scientific advance. Kuhn [13] suggested that "normal science" consisted of research based on past scientific achievements that received sufficient support from the scientific community to provide models for further scientific work. These models, or accepted examples of scientific practice, included law, theory, application and instrumentation, and Kuhn referred to them as "paradigms." Students were prepared for membership of the scientific community by studying the paradigms they would later practice. In Kuhn's model of the scientific method, research in a particular field necessitated a commitment to the rules and standards of practice prescribed for that branch of science.



When a paradigm ceased to explain all of the observations and would not stand up to testing, there was a transition to another conceptual scheme, or paradigm, through a "scientific revolution" [13]. Kuhn maintained that this was the characteristic developmental pattern of mature science. Paradigm change was radical, and Kuhn believed that paradigms were "incommensurable" and the progress of change from one paradigm to another was not entirely logical. Conant [9] suggested that a new conceptual scheme was accepted because it was at least as satisfactory as, or more satisfactory than, the old one in explaining the facts, and proved more productive of new experiments. Once a scientific community accepted a paradigm it also gained criteria for choosing problems that could be assumed to have solutions, but only as long as the paradigm was accepted. To a large extent, these were the only problem-solving exercises that the community would encourage its members to undertake [13].

The idea that a paradigm guided research conducted by scientists was shared by Polanyi [14]. As Polanyi suggested, such a view of the scientific method entails the presumption that any evidence which disagrees with the existing paradigm is invalid. Thus any deviant evidence is discarded, even if it could not be explained. This is a dangerous practice, but the scientific community protects itself by allowing some difference of opinion.

The concepts of the paradigm and normal science proposed by Kuhn have significant implications for education. Kuhn [13] wrote that after the transition to a new paradigm scientists must be re-educated in their perceptions of nature so that they are able to see things in a different way. After this has been accomplished, the scientist's new world of research would seem incommensurable with the previous one. Kuhn stressed that the observations and measurements that a scientist undertakes in the laboratory are not, therefore, what a scientist sees, but concrete indicators for the elaboration of an accepted paradigm. He argued that since it was difficult to make nature fit a paradigm, the puzzles of science were extremely challenging [13].

Kuhn was a practicing scientist before he became a philosopher and historian of science, and his work displays an accurate awareness of the ways in which scientists work. An unsubstantiated aspect of Kuhn's work arises from his notion of the incommensurability of paradigms, since it is not clear how it is possible to progress closer to a more valid picture of nature by changing from one paradigm to another. Kuhn's view of science contrasts markedly with the view of Popper, and other philosophers who suggested that science is essentially a cumulative process.

Feyerabend [15] agreed with Kuhn in regard to the incommensurability of paradigms since there was no way of knowing that the new paradigm was better than the old. However, Feyerabend believed that the theory of science proposed by Kuhn was disquieting, since it might increase the anti-humanitarian tendencies of modern science. Feyerabend maintained that modern science was an ideology with insufficient concern for humanity. He therefore aimed to expose, demystify and weaken the hold of the scientific ideology.

Feyerabend's [15] evolutionary model of science synthesized Lakatos's belief that proliferation and tenacity were both always present in science. In this model, development and growth occur as a result of scientists comparing the central paradigm with alternative theories. This comparison features the active interplay of various tenaciously held views. Feyerabend criticized Kuhn for failing to discuss the aim of science. One of his most cogent criticisms of Kuhn, however, was that in Kuhn's normal science, when scientists struggle to articulate the paradigm and make it more coherent, they cease to be explorers and develop closed minds. Kuhn suggested, however, that scientific revolutions, or periods in which scientists lost faith in the prevailing paradigm, occurred, and in certain circumstances competing theories or paradigms were accepted. It is important to consider the method used by scientists in the formulation of these competing theories.

2 The Empiricist-Inductivist Model of the Scientific Method

The scientific method has been portrayed traditionally as being ruled by empirical facts and logic [16]. Consequently, it is important to consider the derivation of these scientific facts. The empirical method of science implies that science starts with observation, and the universal statements that make up scientific knowledge are then derived from the singular statements that result from observation by the process of induction. Inductivists claim that:

provided certain conditions are satisfied, it is legitimate to generalize from a finite list of singular statements to a universal law [17, p. 2].

The early English scholar Bacon suggested that science progressed by this empirical approach, which involved settling questions by direct observations, since proof always required solid evidence [18]. The empirical method of science comprises the gathering of "facts" by careful observation and experiment, and then deriving laws and theories from those facts by some kind of logical procedure [17]. Bacon argued that from a sufficiently large number of observations, the method of induction allowed generalizations to be formulated in laws and theories of nature. Bacon's empiricist-inductivist model came to be regarded as an account of the way in which scientific knowledge was processed. These ideas of the epistemology of science were embodied in logical positivism, which dominated philosophy from the seventeenth century until the mid-twentieth century. However, Conant [9] argued that the method of "pure empiricism" was an ancient method of solving problems by experiment and "let's try and see" reasoning. Although this method had, in Conant's opinion, led to amazing results throughout the years, and was still a part of the modern scientific procedure, its role had been reduced by the activities of scientists in modern pure and applied science.

2.1 Logical Positivism and Falsificationism

Inductive reasoning in science features in the philosophical position known as logical positivism. Proponents of logical positivism believe that scientific events are meaningful if they can be verified by observation.

The Australian philosopher Chalmers defined logical positivism as:

an extreme form of empiricism according to which theories are not only to be justified by the extent to which they can be verified by an appeal to facts acquired through observation, but are considered to have meaning only insofar as they can be so derived [17, p. xviii].

Chalmers advanced a potent admonition of this philosophical position. He wrote that quantum physics and Einstein's theory of relativity could not be explained by logical positivism.

Prior to Chalmers' consideration of the nature of the development of scientific understanding, Lakatos [11] had also argued that Einstein's results convinced many philosophers and scientists that positivism was not a valid view of the philosophy of science. Lakatos further suggested that this whole classical structure of intellectual values fell into ruins and had to be replaced. Lakatos believed that both Popper and Kuhn rejected the notion that evolution of scientific knowledge involved the accumulation of eternal truths. Furthermore, the overthrow of Newtonian physics by Einstein also provided inspiration for both of these philosophers. However, while Popper believed that everyday science was based upon criticism and was somewhat revolutionary by nature, in Kuhn's vision of normal science, revolution was exceptional. The clash between Popper and Kuhn concerned intellectual values. It has been argued that for Popper, scientific change was a rational process rather than what he termed the "religious conversions" that characterized Kuhn's scientific revolutions [11].

The Austrian-born philosopher of science Popper [19] also refuted logical positivism in a convincing manner. In Popper's view, the problem with positivism was that generalizations made by induction could not be regarded as certain, since they could be overturned by another contrary event. Popper attacked this problem with induction. He criticized and undermined the logical positivists' view of science. Popper concluded that theories could never be conclusively proven by confirmation derived from repeated observations. He concluded that they could, however, be falsified by even one contradictory instance [18].

Popper wrote that scientific revolutions were induced by falsification of the theory or paradigm that was currently accepted. A statement or theory is falsifiable if at least one possible basic statement conflicts with it logically. The ideas of Popper and Kuhn as to falsificationism therefore display some agreement. Popper [19] also believed in the conservative power of the paradigm, since he wrote that it was possible for a falsification to be insufficiently cogent to convince scientists to regard the existing theory as false. Popper contended that scientists then found a way of ignoring the empirical falsification by introducing a subsidiary hypothesis.

Lakatos [11] argued that dogmatic falsificationism was based upon false assumptions. The first incorrect assumption was that there was a natural borderline between theoretical propositions and observational propositions. The second assumption was that if a proposition was observational, then it was true. These assumptions are not entirely correct, as Lakatos asserted, and the implications for the nature of scientific knowledge are discussed below in the section on the theory-dependence of observation. Lakatos [11] further contradicted the falsificationists' claim that theories were admitted as scientific when they were disprovable. He argued that it was necessary to label scientific theories like those of Newton, Maxwell and Einstein as "unscientific" since they could not be disproven or proven by a finite set of observations.

This refutation of falsificationism was supported by Chalmers [17], who suggested that the theory-dependence of observation rendered all observations fallible. As a consequence, conclusive falsifications of a theory were not possible. He also suggested that the inductivists did not give a true account of science. Chalmers used the Copernican Revolution as a case study to support his claims, since he asserted that in this case, the concepts of force and inertia arose from novel conceptions to which the proponents continued to adhere despite apparent falsifications. Furthermore, these concepts did not result from observation and experiment. Chalmers [17] argued that induction and falsification were inadequate accounts of science. Chalmers' suggestion that the theory-dependence of observation undermined the adequacy of the falsificationist account of the nature of science is supported by the argument developed in this chapter.

Popper and Eccles [12] asserted that although scientific theories and problems were made by humans, they were not always the result of planned production by individuals. Once theories existed, they produced previously invisible consequences, or problems. It was argued, therefore, that the task of the scientist was to discover the relevant logical consequences of the new theory, and to discuss them in light of existing theories. He further wrote that scientific knowledge is objectively criticizable by arguments and tests. Tests are attempted refutations.

2.2 The Human Face of Science

Since scientific theories are made by human beings, who are subject to the whole range of human weaknesses and fallibilities, as well as strengths and inspirations, it is important to consider the human face of science in this discussion. What are the implications of the views of science as described by Conant, Kuhn, Popper and Eccles and Chalmers for the objectivity of science and the freedom of scientific inquiry? Much has been written about the value-laden nature of science, and the fact that there is not always a free flow of information in science. Kuhn completed his book with the following statement:

Scientific knowledge, like language, is intrinsically the common property of a group or else nothing at all. To understand it we shall need to know the special characteristics of the groups that create and use it [13, p. 210].

The STS view of science acknowledges that scientists, as humans, may be influenced by a number of factors when engaged in scientific activities. These factors, or values, which have a profound effect on the direction of science, include: religious views, gender issues, financial concerns such as the pursuit of research grants and rewarding salaries, legal issues, moral issues such as personal views on the violence of wars, and the desire for personal recognition and fame.

Polanyi [14] argued that since the existence of true human values, which motivate people, was acknowledged, the claim that human actions could be explained without any reference to the exercise of moral judgment was implicitly denied. The assertion that scientists made value-free scientific pronouncements was, therefore proved to be inconsistent. Polanyi concluded that if people explained all their human actions by value-free observations, then none of these persons' actions could claim to be motivated by moral values.



Science certainly is not a totally objective activity and scientists, as humans, cannot be completely objective. Scientists often take one stance or another because they have a particular ideology based on their social position or education. Longino [22] addressed the question of how human cultural and personal values related to scientific practice. She suggested that science was governed by quite real values and normative constraints. Effective study of the methodology of science therefore operated on the basic understanding that scientific practice was influenced by scientists' subjective preferences regarding what ought to be.

Proponents of the inductive theory of science suggest that scientific knowledge is built by induction from the secure basis of observation, so it would therefore be reasonable to regard experience as the source of knowledge [17]. This theory also leads to the belief that scientists, and therefore science, cannot always be totally objective. The method of induction involves framing a general hypothesis by generalizing from observed cases to all cases of the kind [23]. The central factor is the expectation that future cases will be like past ones, and it cannot be expected that every trait shared by past cases would carry forward to future cases. Quine and Ullian [23] therefore concluded that induction is essentially just a matter of learning what to expect.

2.3 The Theory-Dependence of Observation

Kuhn argued that all scientific observations are theory-dependent. Observations may be guided by a hypothesis and they may be consequences of the hypothesis together with other assumptions that scientists make. In science, observation normally leads theory, but in extreme cases of well-established theories an observation which conflicts with the theory may be waived. The suggestion has been made by Quine and Ullian [23] that science as a whole is a system of hypotheses that accommodate all observations to date, minus such ones as scientists have found it in their conscience to pass over. These authors defined hypotheses as explanations which might be framed to make up the shortage in predicting the future provided by observations, plus self-evident truths.



There are many similarities between optical vision and the understanding of objects produced by humans. Humans "learn" to behave and to experience as if they were "direct realists." Therefore the learning process associated with objects and knowledge produced by humans is, according to Popper, not natural, but cultural and social. He suggested that learning occurs by practice and active participation rather than by direct vision or contemplation. In this process, published or incorporated theories may also play a role. Part of a scientist's training is "learning how to see" things in a particular way or experience in perception. It is, for example, difficult to perceive the mitochondrion in slides under the microscope before training and experience [12]. This eminent scholar concluded that all observations (and even more, all experiments) are "theory impregnated" since they are interpretations in the light of theories. Popper further wrote that humans observe only what their problems, interests, expectations and action programs make relevant.

The inductivist view of science, that science starts with observation, is, according to Chalmers [17], undermined by "the theory-dependence of observation." Chalmers explained the notion of the theory-dependence of observation when he discussed an experiment in which subjects were asked to draw a card from a pack. On drawing a red ace of spades, which had been printed and inserted in the pack by the researchers, subjects either called it a normal ace of diamonds or a normal ace of spades. In a regular pack of cards it is not possible to draw a red ace of spades, but only a black ace of spades or a red ace of diamonds. However, in this experiment, the personal experience and therefore knowledge and expectations of the observer incorrectly determined what was seen. He concluded that:

what observers see, that is, the visual experience that observers have when viewing an object, depends in part on their past experience, their knowledge and their expectations [17, p. 25].

The views of Chalmers as to the theory-dependence of observation agree with Popper's views. Chalmers even suggested that the differences in what a person sees were not due to differences in interpretation, and concluded vehemently that visual experiences were not given uniquely, but varied due to the knowledge and experience of the observer.

It is important to note that Chalmers focused his discussion on occasional scenarios and case studies that may occur, and examples that he referred to as "contrived." Most of these cases might have only a minor effect on scientific advance. Furthermore, it is true that within the scientific enterprise, power struggles develop, and these may affect the course of inquiry, as is true of most academic enterprises.

2.4 A Realistic View of Science

There is a need to present a more realistic picture of science than either that of positivism or falsificationism. In addition to Chalmers' account of the way in which the theory-dependence of observation may sometimes affect the objectivity of science, his particular view of objectivism is more realistic than the traditional philosophical accounts. However, this view only applies in some instances to the scientific method. Objectivists stress that knowledge has properties that may transcend the beliefs of the individuals who devise and contemplate it. In the analysis of knowledge the first concern of objectivism is with the characteristics of knowledge rather than with the attitudes or beliefs of individuals. While advocates of constructivism and post-structuralism decry this assertion, as Chalmers suggested, it has a place in science.

Chalmers believed that propositions have objective properties that were independent of an individual's awareness. He explained this belief by citing Maxwell's lack of awareness of one of the most dramatic consequences of his own electromagnetic theory, which involved the prediction of a new kind of phenomenon, radio waves. In addition, Maxwell's work undermined the view that the material world might be explained according to Newton's laws.

A cogent argument in support of Chalmers' claims is that science is a complex social activity in which the results of experimental work are subjected to critical appraisal and scrutiny by other scientists. These scientific colleagues either review the work of other scientists by conducting further testing procedures, or by acting as referees for journals [17]. Science has an invisible series of checks and balances that ensures, in most cases, that the power structures in science can be relied on to serve the general interests of the scientific community. Some scientific ideas and technological applications also receive public scrutiny through the media and public debate.

Science should not be defined by its negative aspects since it is easily seen from the effective use of science in contemporary societies that, on the whole, science works. Consequently, the positive features of science and the methods which gave rise to them, as well as a realistic appraisal of the limitations of science, should characterize STS education. Ziman [24] discussed this realistic "positive" picture of science, and concluded that STS education should be concerned with what science is and what science knows and does. He wrote that science is a social process that receives communal validation by critical interactions and agreement achieved between scientists.

2.5 Implications for Science and Technology Education

The preceding outline of the STS view of science highlights the need for science and technology courses to shift from presenting science as an objective body of absolute truth to a more accurate representation of science. Science should be viewed as a way in which humans explain the world around them. As a consequence, science may be influenced by social ideals and values, as well as by the values of individuals. This view has gained a great deal of support in the past 20 years. In 1980, Ziman wrote that science was being taught as a "valid" field of learning with a true spirit that was objective, broad-minded, critical and creative. He believed that this inadequately prepared students for life in the real world, since it was ill-founded and presented scientific knowledge as material to be valued without separate justification.

2.5.1 The Social Construction of Science

As Ziman [24] has emphasized, "valid" science cannot be taught as if it were unconnected with the world around it, since there are many ways in which science and society are linked, especially through the technological applications of science. The importance of this interconnection between society and science was upheld by Holton [25] when he wrote that any product of scientific work was profoundly influenced by the sociological setting in which the science was developed, the cultural context of the time, and the scientific knowledge of the larger community. Science is socially constructed in this way, so it is not possible to teach science in a valid manner without looking at areas of STS education such as the influence of sociological and cultural factors, including politics and religion, upon the progress of science. These controversial issues, and conflicts between their supporters, are extremely important features of the social context of science, and should be included in science courses which aim to prepare students to function as vital participants in a changing world.

Lowe [26] agreed with this view that science education should not focus upon the objectivity of science, when he suggested that science should not be taught as a stable body of eternal truths, but as a sequence of different world views. He strongly emphasized the STS view that observations in science are theory-bound and are thus affected by subjective influences. Science is not socially neutral and values play an important role in the choice or acceptance of paradigms. As Lowe therefore concluded, a science course that included discussion of STS issues would present a much more realistic impression of scientific activities.

2.5.2 The Need to Revise Science and Technology Courses Constantly

The profound social and economic changes that have occurred in recent decades are only an indication of the changes that will occur in the lifetimes of present students. The past 30 years have also witnessed significant developments in science and technology that are of vital importance to the community and may also profoundly affect their lives. The interaction between society and these major developments in science and technology presents issues upon which citizens need to formulate value judgments. The requirement for citizens to make decisions in relation to these science-based issues therefore highlights the need to teach science and technology courses in a valid manner. Solomon [27] supported this view and stressed that science education must be a vital ingredient in people's thinking about science-based issues which the members of society have to resolve, such as global environmental problems and the use of new medical technologies and computers.

The discussion in this chapter is based on the assertion that as times change, so too do the purposes of education. This assertion highlights the need to revise science and technology courses constantly. The aims and goals of education in science and technology have changed in order to inform students of the relationships between science, technology and society, as well as equipping them with the basis to make decisions on issues that profoundly affect their lives. Aikenhead and Ryan [28] agreed with the need for revision of science courses when they concluded that to provide motivation for students, it is important that science and technology are taught in a way that gives students some relevant, meaningful reference points in terms of their experience and knowledge of the world around them. Science and technology educators who value a student-centered, socially-relevant science syllabus therefore promote an STS approach.

A central goal of education is to help students to understand their environment and themselves. Humans aim to use the earth's resources to improve their quality of life and effective science education should encourage them to do this responsibly. Science, technology and society education should be an extremely important part of the curriculum as the resolution of a number of social problems depends, in part, on knowledge and skills in science and technology. In addition, this type of science and technology education offers a sound means of developing skills and insights that have educational value well beyond science and technology themselves. Cutcliffe [29] suggested that the major curricular mission of the inter-disciplinary field of STS education was to present science and technology as enterprises that are shaped by and help to shape human values and, in turn, cultural, political and economic institutions.

Science does not hold all of the answers to current social problems. It is important that science and technology courses have been revised to present a realistic picture of science, which includes considerations of science as an evolving view of the world rather than as an objective body of absolute fact. The philosophy and epistemology of science are important considerations for science and technology courses, as discussed in this chapter. This shift in science and technology education to include studies in STS allows students to develop a realistic understanding of the nature of science and technology. Thus, as Ziman [24] concluded, STS education has become a vehicle by which tolerance of controversy, diverse opinions and unpredictability of the outcome of action may be illustrated to the student.

Change is necessary for any scientific community which is to thrive. Scientists and technologists must continually change their techniques in response to their discoveries. Thus the best scientists and technologists must also possess individual abilities which enable them to use techniques such as prophetic guesses [30]. The major "leaps forward" or major contributions to science have employed experimental techniques other than those which adhered to the traditional scientific method [24]. These great contributions to knowledge have also required personal traits such as highly-developed intelligence, imagination, originality and creativity. Conant [9] supported this view when he emphasized his belief that brilliant hypotheses often originated in the minds of scientists by processes he described as an "inspired guess", "intuitive hunch" or a "brilliant flash of imagination". When students of knowledge-based intelligent techniques understand that these sorts of creative methods have been used successfully in science and technology throughout history, they will be confident in the use of the methods that facilitate work in this field. They will also be empowered for thinking and learning about knowledge-based intelligent paradigms such as fuzzy systems, evolutionary computing, artificial neural networks and expert systems. Having established the need for using STS as a foundation for courses in knowledge-based intelligent techniques, the pedagogical basis for teaching the issues of knowledge-based intelligent techniques needs to be addressed.

3 Knowledge-Based Intelligent Paradigms

The main knowledge-based intelligent paradigms include expert systems, artificial neural networks, evolutionary computing and fuzzy systems. In the following sections we present these paradigms briefly.



3.1 Expert Systems

Expert systems, a subset of knowledge-based systems, have not only proved useful in configuring computers and in medical diagnosis but also, during the last decade in particular, in a wide spectrum of applications in virtually every area of engineering, science, and business. The major components of an expert system are a knowledge base, an inference engine, an explanation facility and a knowledge acquisition facility. A knowledge engineer gathers the expertise about a particular domain from one or more experts, and organizes that knowledge into the form required by the particular expert system tool that is to be used. The engineered knowledge is called the knowledge base.

The inference engine is the driver program. It traverses the knowledge base to provide possible outcomes or conclusions. The explanation facility is simply the appropriate part of the knowledge base. The knowledge acquisition facility is generally an integral part of the expert system. Most expert systems available for use are shells. It is the responsibility of users to organize the creation of the required knowledge bases. Expert systems have been used successfully in many applications including design, diagnosis, control [31]-[33], monitoring and prediction.
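To make the interplay between the knowledge base and the inference engine concrete, the following minimal sketch shows a forward-chaining inference engine in Python. The rules, the facts and the diagnosis-flavored domain are hypothetical illustrations, not the API of any particular expert system shell.

```python
# A minimal forward-chaining inference engine (illustrative sketch).
# Each rule maps a set of antecedent facts to a single conclusion.
rules = [
    ({"fever", "rash"}, "suspect_measles"),
    ({"fever", "cough"}, "suspect_flu"),
    ({"suspect_flu", "high_risk_patient"}, "recommend_antivirals"),
]

def infer(facts):
    """Repeatedly fire rules whose antecedents are satisfied,
    adding conclusions to the fact base until nothing changes."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, conclusion in rules:
            if antecedents <= facts and conclusion not in facts:
                facts.add(conclusion)  # fire the rule
                print(f"fired: {sorted(antecedents)} -> {conclusion}")
                changed = True
    return facts

print(infer({"fever", "cough", "high_risk_patient"}))
```

The printed trace plays the role of a rudimentary explanation facility, showing which rules fired to reach each conclusion.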

3.2 Artificial Neural Networks

Artificial Neural Networks (ANNs) [34] mimic biological information processing mechanisms. They are typically designed to perform a nonlinear mapping from a set of inputs to a set of outputs. ANNs are developed to try to achieve biological system type performance using a dense interconnection of simple processing elements analogous to biological neurons. ANNs are information driven rather than data driven. They are non-programmed adaptive information processing systems that can autonomously develop operational capabilities in response to an information environment. ANNs learn from experience and generalize from previous examples. They modify their behavior in response to the environment, and are ideal in cases where the required mapping algorithm is not known and tolerance to faulty input information is required. Feed-forward neural networks are popular and they include perceptrons, multilayer neural networks and radial basis function networks.

The multilayer neural network is composed of at least three layers of neurons. The neurons perform a weighted sum of their inputs and use this sum as the input of an activation function. A supervised learning algorithm is typically used to train the network. It consists of updating all the weights of the network until the output of the network reaches the pre-specified desired output.
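As a minimal sketch of this description, the Python/NumPy fragment below performs the weighted sums, applies a sigmoid activation function, and repeatedly updates all weights by gradient descent until the output approaches the desired value. The network size, learning rate and training pair are arbitrary assumptions chosen to keep the example short; the update rule is standard backpropagation for a squared-error loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2-3-1 multilayer network: input, one hidden layer, output.
W1 = rng.normal(size=(3, 2))   # hidden-layer weights
W2 = rng.normal(size=(1, 3))   # output-layer weights
lr = 0.5                       # learning rate (arbitrary choice)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.0, 1.0])       # example input (hypothetical)
t = np.array([1.0])            # pre-specified desired output

for step in range(1000):
    # Forward pass: weighted sums fed through the activation function.
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)

    # Backward pass: gradients of the squared error via the chain rule.
    delta_out = (y - t) * y * (1 - y)
    delta_hid = (W2.T @ delta_out) * h * (1 - h)

    # Update all the weights of the network.
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_hid, x)

print(float(sigmoid(W2 @ sigmoid(W1 @ x))))  # close to the target 1.0
```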

Artificial neural networks are used successfully in an increasing range of applications [35], [36], including areas as diverse as image processing, optimization, diagnosis, speech recognition and stock market prediction.

3.3 Evolutionary Computing

Evolutionary computation [37], [38] is the name given to a collection of algorithms based on the evolution of a population toward a solution of a certain problem. These algorithms can be used successfully in many applications requiring the optimization of a certain multi-dimensional function. The population of possible solutions evolves from one generation to the next, ultimately arriving at a satisfactory solution to the problem. These algorithms differ in the way a new population is generated from the present one, and in the way the members are represented within the algorithm. Three types of evolutionary computing techniques have been widely reported recently. These are Genetic Algorithms (GAs), Genetic Programming (GP) and Evolutionary Algorithms (EAs). The EAs can be divided into Evolutionary Strategies (ES) and Evolutionary Programming (EP). All three of these algorithms are modeled in some way after the evolutionary processes occurring in nature.

Genetic algorithms (GAs) are very popular and used in many areas including design, diagnosis, optimization, economics, business and scheduling. A number of software tools are available in the market to implement these algorithms.
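A bare-bones genetic algorithm can be sketched in a few lines of Python. The bit-string representation, the one-max fitness function, and the population size and mutation rate below are illustrative assumptions only; the point is the generational loop of selection, crossover and mutation described above.

```python
import random

random.seed(1)

GENOME_LEN, POP_SIZE, MUT_RATE = 20, 30, 0.02

def fitness(genome):
    # Toy objective: maximize the number of 1 bits ("one-max").
    return sum(genome)

def crossover(a, b):
    # Single-point crossover combines two parent genomes.
    point = random.randrange(1, GENOME_LEN)
    return a[:point] + b[point:]

def mutate(genome):
    # Flip each bit with a small probability.
    return [1 - g if random.random() < MUT_RATE else g for g in genome]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for generation in range(50):
    # Selection: keep the fitter half of the population as parents.
    population.sort(key=fitness, reverse=True)
    parents = population[:POP_SIZE // 2]
    # The next generation: parents plus mutated offspring.
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

print(max(fitness(g) for g in population))  # approaches GENOME_LEN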



3.4 Fuzzy Logic

Fuzzy logic was first developed by Zadeh in the mid 1960s for representing uncertain and imprecise knowledge [39]. It provides an approximate but effective means of describing the behavior of systems that are too complex, ill-defined, or not easily analyzed mathematically. Fuzzy variables are processed using a system called a fuzzy logic controller. It involves fuzzification, fuzzy inference, and defuzzification. The fuzzification process converts a crisp input value to a fuzzy value. The fuzzy inference is responsible for drawing conclusions from the knowledge base. The defuzzification process converts the fuzzy control actions into a crisp control action.
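The three stages can be illustrated with a toy fuzzy controller in Python. The triangular membership functions, the two rules, and the temperature-to-fan-speed mapping are hypothetical choices for this sketch rather than a standard controller design.

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fan_speed(temp):
    # Fuzzification: degrees of membership of the crisp temperature.
    cold = tri(temp, -10, 10, 25)
    hot = tri(temp, 20, 35, 55)

    # Inference: two rules map input fuzzy sets to output actions.
    #   IF temp is cold THEN fan is slow (about 20% speed)
    #   IF temp is hot  THEN fan is fast (about 90% speed)
    rules = [(cold, 20.0), (hot, 90.0)]

    # Defuzzification: a weighted average over the rule outputs
    # yields a single crisp control action.
    total = sum(w for w, _ in rules)
    return sum(w * out for w, out in rules) / total if total else 0.0

for t in (5, 22, 40):
    print(t, round(fan_speed(t), 1))
```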

Fuzzy logic techniques have been successfully applied in a number of applications: computer vision, diagnosis [40], [41], decision making, and system design including ANN training. The most extensive use of fuzzy logic is in the area of control, where examples include controllers for cement kilns, braking systems, elevators, washing machines, hot water heaters, air-conditioners, video cameras, rice cookers, and photocopiers.

4 Constructivism and Teaching and Learning about Knowledge-Based Intelligent Techniques

Students develop ideas and beliefs about the physical world long before they enter science classes, and the sound pedagogical principle of constructivism assumes that meaningful learning takes place when these ideas, as well as new ones, are used by the individual to make sense of the world around them. Furthermore, according to the constructivist view of learning, the meanings constructed in given situations are influenced by the individuals' knowledge and belief structures. The construction of meaning is a process which involves active participation by the learner [42].

Meaningful learning can occur in science and technology courses with an STS emphasis, since students construct their own meaning of events by active participation, by reflection, and by practice at transferring a scientific idea to an everyday context. In this way, students may incorporate new ideas into their existing common-sense knowledge framework, or replace these ideas with more precise scientific concepts [28]. Robottom [43] agreed with the previous statement, suggesting in his article that scientific experiments and their outcomes are social constructions. Furthermore, the author stressed that a more "human" and socially sympathetic form of science education was needed in order to "demystify science." Robottom reached the timely conclusion that a reassessment of the relationships between science and society embedded in science courses was necessary.

However, Driver [44] reported that students' attitudes were influenced by out-of-school factors, and that they did not easily use the scientific knowledge that was taught to them in conjunction with personal evaluation for social decision-making. It has also been suggested that in the constructivist approach to science teaching, incorrect ideas may serve as a foundation for the construction of further knowledge [45]. Educators should therefore employ teaching strategies that provide opportunities for students to discuss their existing ideas before constructing further understandings. Incorporation of STS issues would also provide motivation for students, by presenting science and technology in a way that would give students some meaningful reference points in terms of their experience and knowledge of the world around them [46].

It is important for educators to make allowance for the students' understanding of both the scientific concepts being covered and their social context. This would lead to effective student-centered education for life, since scientific knowledge is not received impersonally, but comes as part of life in the real world and is influenced by the values and views of the students. Constructivism can be an underlying principle for innovative teaching and learning strategies such as flexible learning and problem-based learning.

5 Flexible Learning and Knowledge-Based Intelligent Techniques

The philosophical ideals of flexible learning center upon increasing access to education and the control that learners have over their learning. Learning is generally considered to be more flexible when it caters for a wider range of learners and their learning styles and needs [47].

Flexible learning through online teaching and computer mediated communication (CMC) has many potential advantages that would assist both teaching and learning of knowledge-based intelligent techniques. These include:

1. the fostering of lifelong learning through the use of the internet [48];

2. the access that students have to a much wider range of educational sources than are available on one university campus [49];

3. the possibility of multimedia presentations, group discussions, and the provision of local and global classrooms [50]; and

4. the freeing up of lecturer time for other teaching and research activity, since some lectures and tutorials can be replaced by online materials [47].

Given the potential benefits of CMC for teaching and learning, it is unfortunate that a recent survey of students at the Northern Territory University demonstrated low levels of student satisfaction with computer mediated approaches to teaching and learning. After the students were exposed to approaches to course delivery ranging from traditional lectures to computer mediated approaches, they ranked computer-mediated approaches fourteenth out of the fourteen approaches for both effectiveness and comfort [51].

Lecturers' reactions to online teaching have not been overwhelmingly positive in all cases. Moran [52] suggested that the problem stemmed from lecturers' unwillingness to "surrender their traditional authorities and powers for a more traditional role". Academics' response to alternative delivery styles appears to turn on the additional personal resources required to achieve the levels of understanding, expertise and satisfaction experienced with the more traditional approach of lectures and tutorials [51].

However, online teaching represents an opportunity for academics to reflect on their current practice and consider the potential of other strategies for teaching and learning. Shaw [50] argues that the evolution of online teaching cannot be resisted, since "efforts to resist this change are a little like trying to stop the tide. In any case who would want to - it's more fun surfing."

Since computers have the potential to provide students with more flexible learning opportunities, it is important to consider ways in which the use of online teaching in the university learning experience can be supported. It appears that lecturers of knowledge-based intelligent techniques would not mount strong resistance to the use of computers in the teaching of their courses. In the light of this consideration, it is necessary to find ways in which students' positive experience of online teaching can be enhanced.

In order to achieve the potential benefits offered by computer mediated instruction, it is important to change teaching to incorporate technology, rather than try to adapt traditional ways of teaching to technology. Pettit [49] believes that this will engender greater freedom in education than was possible with the traditional didactic approach.

The advantages of mixed-mode delivery have been strongly argued by Palmer [53]. The use of more than one learning mode has the advantage of providing students with the opportunity to increase the quality of learning rather than the amount of learning.

Christie [54] contended that communication through computer mediated communication "cannot replace what happens when a group of human beings meet face to face to work towards a common educational goal". Face to face discussion is fundamental for quality learning, since learning is a social process which involves interacting with others. This mind and body interaction includes body language to convey meaning [49]. Christie [54] argues that this is especially the case for groups composed of students of different languages and cultures, regardless of how sophisticated the computer is.

There is a need for students to be engaged in dialogue in regard to what is being presented to them. This is a cogent consideration in an area such as knowledge-based intelligent techniques, where constructing understandings of fuzzy logic, neural computing and genetic computing is facilitated by students considering the work and concepts against their own experiences and life understandings that they bring to the learning environment. This is supported by Palmer's [53] contention that there is hard mental as well as emotional work involved in coming to grips with unfamiliar ideas.

6 Problem-Based Learning and Knowledge-Based Intelligent Techniques

Problem-Based Learning (PBL) would be effective in an area such as fuzzy logic, where absolute certainty is replaced by degrees of certainty. Students would be able to offer ideas for discussion by the group even if these ideas are still quite tentative or have little hard evidence at their foundation. Problem-based learning has evolved as an effective innovation in fields of teaching and learning such as medical education, since it is an interactive, student-centered model which motivates students by promoting independent, integrated learning [55]. In this learning method, students focus upon key problems while they discuss the skills and understandings required to solve them. In a similar way, PBL could be used effectively in teaching about knowledge-based intelligent techniques, with students focusing upon similar problems or case studies.

A particular advantage of PBL is that it is founded upon constructivism, since it encourages students to build their understandings on the basis of their previous knowledge and experiences [20]. The benefit of PBL for contemporary Australian universities appears to be an improvement in Graduate Course Experience Questionnaire (GCEQ) scores. After aspects of PBL were piloted in the Bachelor of Applied Science in Medical Radiations in 1993, a significant improvement in GCEQ results was observed [21].

The use of flexible learning and problem-based learning in knowledge-based intelligent techniques courses, as well as other student-centered techniques, would enable students to develop the appropriate thinking techniques for constructing their understandings of knowledge-based intelligent techniques with the STS approach as a foundation.

7 Conclusion

It is important to consider the human face of science, and to discuss the nature of science and of scientific knowledge, in knowledge-based intelligent techniques courses. The characteristics of scientists and technologists, including their values, imagination, creativity and personal preferences, influence their observations and discoveries. Thus, there has been a worldwide change in science education to include consideration of STS issues. Moreover, in modern societies the work of scientists and technologists receives considerable scrutiny through peer review and public examination through the media. Science and technology have significant effects upon the lives of citizens in modern societies. It is necessary, therefore, to present students with a valid image of the nature of science and of the interactions between science, technology and society, to enable citizens to take an active role in social debates about decisions involving STS issues.

In order to fulfill the purposes of science and technology education in a community that is increasingly dependent upon science and technology, science and technology courses need to be revised constantly. This revision should include consideration of the thinking processes that are needed to develop an understanding of the concepts of the subject. Teaching development should also be guided by reflection upon the nature of science and technology as well as of the methods used by scientists. This reflection would lead to the development of innovative curricula. Innovative teaching methods that could be used with these curricula include flexible learning and problem-based learning.


References

[1] Zadeh, L.A. (1997), "The Roles of Fuzzy Logic and Knowledge-Based Intelligent Techniques in the Conception, Design and Deployment of Intelligent Systems," in Nwana, H.S. and Azarmi, N. (Eds.), Software Agents and Knowledge-Based Intelligent Techniques, Springer-Verlag, Berlin, Germany, pp. 180-190.

[2] Jain, L.C. and Jain, R.K. (Eds.) (1998), Proceedings of the Second International Conference on Knowledge-Based Intelligent Engineering Systems, Vol. 2, IEEE Press, U.S.A.

[3] Jain, L.C. and Jain, R.K. (Eds.) (1998), Proceedings of the Second International Conference on Knowledge-Based Intelligent Engineering Systems, Vol. 1, IEEE Press, U.S.A.

[4] Jain, L.C. (Ed.) (1997), Proceedings of the First International Conference on Knowledge-Based Intelligent Engineering Systems, Vol. 2, IEEE Press, U.S.A.

[5] Jain, L.C. (Ed.) (1997), Proceedings of the First International Conference on Knowledge-Based Intelligent Engineering Systems, Vol. 1, IEEE Press, U.S.A.

[6] Jain, L.C. (Ed.) (1995), Electronic Technology Directions Towards 2000, ETD2000, IEEE Computer Society Press, U.S.A.

[7] Jain, L.C. and Allen, G.N. (1995), "Introduction to Artificial Neural Networks," Electronic Technology Directions Towards 2000, ETD2000, IEEE Computer Society Press, U.S.A., pp. 35-62.

[8] The University of South Australia (1996), Qualities of the University of South Australia Graduate, The University of South Australia, Adelaide, Australia.

[9] Conant, J.B. (1957), Harvard Case Histories in Experimental Science, Vol. 1, Harvard University Press, Cambridge, MA, U.S.A.


[10] Kuhn, T.S. (1963), "Scientific paradigms," in Barnes, B. (Ed.), Sociology of Science, Penguin Books, Harmondsworth, U.K., pp. 80-104.

[11] Lakatos, I. (1970), "Falsification and the methodology of scientific research programmes," in Lakatos, I. and Musgrave, A. (Eds.), Criticism and the Growth of Knowledge, Cambridge University Press, London, U.K., pp. 91-196.

[12] Popper, K.R. and Eccles, J. (1977), The Self and Its Brain, Springer International, U.S.A.

[13] Kuhn, T.S. (1970), The Structure of Scientific Revolutions, The University of Chicago Press, Chicago, U.S.A.

[14] Polanyi, M. (1969), Knowing and Being, Routledge and Kegan Paul, London, U.K.

[15] Feyerabend, P.K. (1970), "Consolations for the specialist," in Lakatos, I. and Musgrave, A. (Eds.), Criticism and the Growth of Knowledge, Cambridge University Press, London, U.K., pp. 197-230.

[16] Riggs, P.J. (1992), Whys and Ways of Science, Melbourne University Press, Melbourne, Australia.

[17] Chalmers, A.F. (1982), What is this Thing Called Science: An Assessment of the Nature and Status of Science and its Methods, University of Queensland Press, St. Lucia, Australia.

[18] Connole, R., Smith, B. and Wiseman, R. (1993), Issues and Methods in Research, Distance Education Centre, The University of South Australia, Adelaide, Australia.

[19] Popper, K.R. (1983), Realism and the Aim of Science, Rowman and Littlefield, Totowa, New Jersey, U.S.A.

[20] Camp, G. (1996), "Problem-based learning: A Paradigm shift or a passing fad?" Medical Education Online, 1:2, The University of Texas Medical Branch, Texas, U.S.A.


[21] University of South Australia (1996), School of Medical Radiations Course Description, Adelaide, Australia, Version 5, June.

[22] Longino, H. (1983), "Beyond 'bad science': Skeptical reflections on the value-freedom of scientific inquiry," Science, Technology and Human Values, Vol. 8, pp. 7-17.

[23] Quine, W. and Ullian, J. (1970), The Web of Belief, Random House, New York, U.S.A.

[24] Ziman, J. (1980), Teaching and Learning about Science and Society, Cambridge University Press, Cambridge, U.K.

[25] Holton, G. (1978), The Scientific Imagination: Case Studies, Cambridge University Press, Cambridge, U.K.

[26] Lowe, I. (1993), "Making science teaching exciting: Teaching complex global issues," 44th Conference of the National Australian Science Teachers' Association, Sydney, Australia.

[27] Solomon, J. (1992), "The classroom discussion of science-based social issues presented on television. Knowledge, attitudes and values," International Journal of Science Education, Vol. 14, pp. 431-444.

[28] Aikenhead, G.S. and Ryan, A.G. (1992), "The development of a new instrument: Views on Science-Technology-Society (VOSTS)," Science Education, Vol. 76, pp. 477-491.

[29] Cutcliffe, S.H. (1990), "The STS curriculum: What have we learned in twenty years?" Science, Technology and Human Values, Vol. 15, pp. 360-372.

[30] Goldstein, M. and Goldstein, I.F. (1978), How We Know - An Exploration of the Scientific Process, Plenum Press, New York, U.S.A.

[31] Jain, L.C. and de Silva, C.W. (Eds.) (1998), Intelligent Adaptive Control: Industrial Applications, CRC Press, U.S.A.


[32] Jain, L.C., Johnson, R.P., Takefuji, Y. and Zadeh, L.A. (Eds.) (1998), Computational Intelligence Techniques in Industry, CRC Press, U.S.A.

[33] Jain, L.C. and Vemuri, R. (Eds.) (1998), Industrial Applications of Neural Networks, CRC Press, U.S.A.

[34] Jain, L.C. (Ed.) (1997), Soft Computing Techniques in Knowledge-Based Intelligent Engineering Systems, Springer-Verlag, Germany.

[35] Jain, L.C. and Jain, R.K. (Eds.) (1997), Hybrid Intelligent Engineering Systems, World Scientific Publishing Co., Singapore.

[36] Narasimhan, V.L. and Jain, L.C. (Eds.) (1996), Proceedings of the Australian and New Zealand Conference on Intelligent Information Systems, IEEE Press, U.S.A.

[37] Vonk, E., Jain, L.C. and Johnson, R.P. (1997), Automatic Generation of Neural Networks Architecture Using Evolutionary Computing, World Scientific Publishing Co., Singapore.

[38] Van Rooij, A., Jain, L.C. and Johnson, R.P. (1996), Neural Network Training Using Genetic Algorithms, World Scientific Publishing Co., Singapore.

[39] Sato, M., Sato, Y. and Jain, L.C. (1997), Fuzzy Clustering Models and Applications, Springer-Verlag, Germany.

[40] Jain, L.C. and Martin, N.M. (Eds.) (1998), Fusion of Neural Networks, Fuzzy Systems and Evolutionary Computing Techniques: Industrial Applications, CRC Press, U.S.A.

[41] Teodorescu, H.N., Kandel, A. and Jain, L.C. (Eds.) (1998), Fuzzy and Neuro-Fuzzy Systems in Medicine, CRC Press, U.S.A.

[42] Driver, R. and Oldham, V. (1986), "A constructivist approach to curriculum development in science," Studies in Science Education, Vol. 13, pp. 105-122.


[43] Robottom, I. (1992), "Images of science and science education," Australian Science Teachers Journal, Vol. 38, No. 2, pp. 19-25.

[44] Driver, R. (1990), "Theory into practice II: A constructivist approach to curriculum development," in Fensham, P. (Ed.), Development and Dilemmas in Science Education, Falmer Press, London, U.K., pp. 133-149.

[45] Baird, J.R. and White, R.T. (1982), "A case study of learning styles in biology," European Journal of Science Education, Vol. 4, pp. 325-337.

[46] Yager, R.E. (1993), "Make a difference with STS," The Science Teacher, Vol. 60, pp. 45-48.

[47] Nunan, T. (1994), Flexible Delivery - a discussion of the issues, Distance Education Centre, University of South Australia, Adelaide, Australia.

[48] Young, R.M. (1998), "A developmental model for selecting computer mediated communication approaches for tertiary education," in Cameron, J.M.R. (Ed.), Online Teaching, Centre for Teaching and Learning in Diverse Educational Contexts, Northern Territory University, Australia, pp. 5-17.

[49] Pettit, A. (1998), "Teaching first, technology second: the possibilities for computer mediated communication," in Cameron, J.M.R. (Ed.), Online Teaching, Centre for Teaching and Learning in Diverse Educational Contexts, Northern Territory University, Australia, pp. 18-35.

[50] Shaw, G. (1998), "Using computer mediated communications in teaching tertiary teachers," in Cameron, J.M.R. (Ed.), Online Teaching, Centre for Teaching and Learning in Diverse Educational Contexts, Northern Territory University, Australia, pp. 36-49.

[51] Cameron, J.M.R. (1998), "Introduction," in Cameron, J.M.R. (Ed.), Online Teaching, Centre for Teaching and Learning in Diverse Educational Contexts, Northern Territory University, Australia, pp. 1-4.

[52] Moran, L. (1995), "Towards the year 2020 - Trends in flexible learning," Conference on Flexible Delivery of Training and Education, Sydney, Australia, July.

[53] Palmer, B. (1998), "The use of CMC for exploring educational issues," in Cameron, J.M.R. (Ed.), Online Teaching, Centre for Teaching and Learning in Diverse Educational Contexts, Northern Territory University, Australia, pp. 70-82.

[54] Christie, M.P. (1998), "Whose web? Cultural factors in the delivery of online courses: an Asia-Pacific case study," in Cameron, J.M.R. (Ed.), Online Teaching, Centre for Teaching and Learning in Diverse Educational Contexts, Northern Territory University, Australia, pp. 63-69.

[55] Feletti, G. (Ed.) (1991), "The challenge of problem-based learning," Kogan Page, London, U.K., pp. 177-185.


CHAPTER 2

TEACHING AND LEARNING THE AI MODELING

R.S.T. Lee and J.N.K. Liu
Department of Computing
Hong Kong Polytechnic University
Hung Hom, Hong Kong

[email protected], [email protected]

Learning new concepts and algorithms requires an analytical mind and intensive conceptual thinking. Illustration with appropriate applications and teaching tools can assist and enhance this learning.

Since the emergence of Artificial Intelligence (AI) in the past decades, numerous AI concepts and algorithms have been developed to help solve problems such as foreign currency prediction in the finance sector, missile tracking in the military, investigation of natural resources in science, and weather forecasting in meteorology.

In this chapter, we present our study of some innovative teaching tools to help in learning and understanding the concepts of three important AI models, namely Neural Nets, Fuzzy Systems and Genetic Algorithms. We examine the potential of these tools and illustrate them with examples of weather forecasting problems involving the integration of these models. The study sets out the main features and limitations of the different tools and opens room for the development of better AI systems. Experimental results demonstrating the feasibility of these tools in solving practical problems are given. We focus on predicting the 24-hour temperature and rainfall using historical meteorological data from multiple weather stations in the Hong Kong region.


1 Introduction

Advances in technology and in the computational power of computers have encouraged scientists to carry out endless exploration in search of solutions to two distinct classes of problems: (a) sophisticated analytical problems, such as the numerical simulation of global warming in earth science and astrophysical research on distant planets [24], [25]; and (b) highly unstructured problems, such as prediction in the shares and stock market and fault diagnosis in machinery maintenance [14], [17]. It is generally accepted that even the most powerful and sophisticated computer will never fully match human performance. The emergence of robot industries in the past decades has urged scientists and engineers to carry out research into "intelligent" systems and "simulated human capabilities", such as machine vision [2] for object recognition and artificial intelligence (AI) [20] for task scheduling in the computer industry.

Scientists have proposed various models and algorithms over the past fifty years to solve highly unstructured problems and to simulate human performance. One of the most remarkable models proposed is the Artificial Neural Network (ANN) [5], which simulates the architecture and processing mechanism of the human nervous system. In contrast to traditional analytical models, ANNs focus on problem solving by means of "machine learning". By using "supervised" and "unsupervised" training, "knowledge" is generated by regulating the synaptic weights of the neurons within the ANN architecture.

Many studies have shown that ANNs have the capability to learn the underlying mechanics of time series and other pattern recognition and prediction problems. However, it is often difficult to design good, optimized ANNs: many of the basic principles governing information processing in ANNs are hard to formulate and interpret, and the complex interactions among network units usually make engineering techniques such as the "divide-and-conquer" method inapplicable. Moreover, as network applications continue to grow in size and complexity, so do the combinations of performance criteria such as learning rate, generalization scheme and noise level; the human-engineering approach will not work, hence a more efficient automated solution is needed. On the other hand, Genetic Algorithms (GAs) [6], a biological metaphor that tries to emulate some of the processes observed in natural evolution, are one of the most eminent techniques for solving optimization and parameter selection problems. GAs express their ability by efficiently exploiting historical information to speculate on new offspring with expected improved performance, according to Darwin's rule of evolution. GAs have been widely applied in ANN design in several ways, such as neural network topology optimization, genetic training algorithms and control parameter optimization.

Another distinguished characteristic of human beings is the capability to interpret "inexact" or "imprecise" concepts such as "fairly", "quite" and "a lot", and, more importantly, to make judgments based on this "inexact" or "imprecise" information. To emulate this distinctive qualitative human characteristic in computers, Fuzzy Theory [9] was formulated to quantify such problems. In fuzzy systems, "membership functions" are used as a method of quantifying the "inexact" or "imprecise" information that conventional computer systems find so difficult to handle. Incorporated with ANNs, fuzzy systems can be applied as a data preprocessing scheme for "fuzzy" input nodes. Fuzzy systems can also be used as an effective classification scheme in the output nodes of ANNs.

Various application tools have been produced to illustrate the capabilities of these newly-developed techniques. Since different application tools have their own special features, limitations and target problem domains, it is not an easy task to choose an appropriate tool, especially for teaching and research purposes. In this chapter, we will examine several application tools for illustration: (1) Neuro-Forecaster from NIBS Pte Ltd.; (2) Professional II Plus from NeuralWare, Inc.; (3) NeuralSIM (formerly NeuralWorks Predict) from Aspen Technology; (4) NeuroSolutions from NeuroDimension Inc.; and an integrated prototype from Hong Kong Polytechnic University.

The aim of the study is not to compare the strengths and weaknesses of different application tools, but rather to provide a useful indication and recommendation to their counterparts, colleagues and collaborators for selecting appropriate application tools for their own problem domains and, more vitally, to demonstrate the usefulness of these application tools in teaching these new computer concepts.

To explain the main features of the application tools and to demonstrate the potential strength of integrating ANNs with GAs and fuzzy systems in dealing with real world problems, we look into the problems of weather forecasting in the Hong Kong region. During the study, 6-hourly meteorological data for predicting the temperature and rainfall in the next 24 hours, such as wet-bulb and dry-bulb temperatures, rainfall (RF), mean sea-level pressure (MSLP), relative humidity (RH), wind directions and wind speeds, were extracted from the records of 11 weather stations in Hong Kong for the period from 1993 to 1998. GAs and fuzzy systems were integrated selectively into the neural network model, whose implementation could be supported by such application tools in many different ways. For teaching and learning development, the problem solving methodology will be detailed and a hybrid system using a fuzzy neural network will be implemented [15]. Experimental results and comparisons with traditional ANN systems will be discussed in later sections.

This chapter will be set out and presented in seven sections as follows:

Section 1 Introduction

Section 2 Neural Nets, Fuzzy Systems and Genetic Algorithms

Section 3 System Application and Development Tools

Section 4 Teaching and Learning the AI Fundamentals: Weather Forecasting Problems

Section 5 Fuzzy Neural System Modeling

Section 6 Experimental Results

Section 7 Conclusion


2 Neural Nets, Fuzzy Systems and Genetic Algorithms

In this section, we will present a brief background on ANNs, GAs and Fuzzy systems and will illustrate the contemporary approach for the integration of these three techniques.

2.1 Neural Nets

2.1.1 Background

In the early days, researchers in Artificial Intelligence (AI) had always aimed to model the function of human brains, but their attempts were unsuccessful until the late 1940s, when Warren McCulloch and Walter Pitts [21] proposed the first neural network model; it was only in the late 1980s that this approach re-emerged as a promising means of dealing with some classes of vital AI problems that had defied solution.

Different from traditional computer systems, the main architecture of ANNs emulates the functionality of the human nervous system. The human nervous system, as now known, consists of an extremely large number (over 10^11) of nerve cells, or neurons, which operate in parallel to process various types of information. Tree-like networks of nerve fibres called dendrites are connected to the cell body or soma, where the cell nucleus is located. Extending from the cell body is a single long fibre called the axon, which eventually branches into strands and substrands connected to other neurons through contact points known as synapses. The transmission of signals from one neuron to another at a synapse is a complex chemical process in which specific transmitter substances are released from the sending side of the junction. The effect is to adjust the electrical potential inside the body of the receiving cell. If the potential reaches a threshold, a pulse is generated down the axon; this is known as "firing" (Figure 1).

2.1.2 ANN Architecture

As an analogy to the biological neuron, a schematic diagram (Figure 2) of the neuron structure can be interpreted as a mathematical model in which the synapses are represented by "weights" that modulate the effect of the associated input "signals". The non-linear characteristics exhibited by neurons are represented by a transfer function such as a binary or bipolar sigmoid function. The neuron impulse (output signal) is then computed as the weighted sum of the input signals, transformed by the transfer function. The learning capability of an artificial neuron is achieved by adjusting the weights in accordance with a pre-defined learning algorithm, usually by a small fraction Δw_j = ασx_j, where α is called the learning rate and σ the momentum rate.
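
As a concrete illustration, the following minimal Python sketch computes a neuron's output as the sigmoid of the weighted input sum and applies one delta-rule weight update; the input values, weights and the use of the sigmoid error term are illustrative assumptions, not values from the chapter.

    import math

    # Minimal neuron sketch: weighted sum -> binary sigmoid transfer
    # function, then one small corrective weight update.

    def neuron_output(x, w, bias=0.0):
        net = sum(wi * xi for wi, xi in zip(w, x)) + bias   # weighted sum
        return 1.0 / (1.0 + math.exp(-net))                 # binary sigmoid

    x = [0.5, 0.9, 0.1]              # input signals
    w = [0.4, -0.2, 0.7]             # synaptic weights
    alpha, target = 0.1, 1.0         # learning rate, desired output

    y = neuron_output(x, w)
    delta = (target - y) * y * (1 - y)                      # sigmoid error term
    w = [wj + alpha * delta * xj for wj, xj in zip(w, x)]   # w_j += alpha*delta*x_j
    print(round(y, 3), [round(wj, 3) for wj in w])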

Figure 1. Biological neurons.

Figure 2. Schematic model of a neuron: input signals x_j, modulated by weights w_j, are summed and passed through a transfer function to produce the output signal y.

Typical ANNs often contain intermediate layers known as "hidden layers", which facilitate the nonlinear computational capabilities of the network model. A classical ANN is the Feed Forward Neural Network (FFNN) (Figure 3), which allows signals to flow from the input units to the output units in a forward direction. Examples are Kohonen Self-Organizing Maps (SOM) and Learning Vector Quantization (LVQ) (neural networks based on competition), Adaptive Resonance Theory (ART) and the Feed-forward Backpropagation Neural Net (FFBPN).

Figure 3. Classical feed-forward neural network model, with input, hidden and output nodes.

ANNs can be regarded as multivariate nonlinear analytical tools, and are known to be superior at recognizing patterns in noisy, complex data and at estimating their nonlinear relationships. Many studies have revealed that ANNs have a distinguished capability to learn the underlying mechanics of time series problems, ranging from the prediction of stocks and foreign exchange rates in various financial markets to weather forecasting in meteorology [10], [11], [12], [15], [18], [28].

2.2 Fuzzy Systems

2.2.1 Basic Principle

Different from traditional computer systems, fuzzy theory [3], [8] looks at things in imprecise terms, in much the same way as our own brain takes in information. For example, to describe the degree of hotness of today's temperature, we say it is "hot", "very hot", or "quite hot", instead of giving a precise temperature reading. In contrast to classical set theory, fuzzy sets allow for the possibility of degrees of membership. That is, any value between 0 and 1 may be assigned. For example, given the fuzzy set "weather is hot", we may describe a particular day as being a 0.75 "member" of this fuzzy set "hot". This is hot, but not the hottest day imaginable. The function which assigns this value is called the "membership function" associated with the fuzzy set "weather is hot".

Figure 4. Membership functions for fuzzy sets "Very Hot", "Hot", and "Quite Hot". For temperature 32°C, the corresponding membership values of the three membership functions are: "Very Hot" = 0.32, "Hot" = 0.64, "Quite Hot" = 0.

A typical fuzzy system describing today's weather may consist of three distinct fuzzy sets, "Very Hot", "Hot", and "Quite Hot" (Figure 4), each represented by a piecewise trapezoidal function. Fuzzy sets can be combined through fuzzy rules to define a more sophisticated action such as "If today is very hot and relative humidity is high, then I have to set my air-conditioner to high power".
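
The trapezoidal membership functions and the rule combination can be sketched as follows; the breakpoints are read approximately off Figure 4, so they are assumptions rather than the exact published values.

    # Trapezoidal membership functions and a fuzzy-rule AND (sketch).

    def trapezoid(x, a, b, c, d):
        # Rises linearly a->b, is 1 between b and c, falls linearly c->d.
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)

    hot      = lambda t: trapezoid(t, 28.0, 30.0, 31.0, 33.8)
    very_hot = lambda t: trapezoid(t, 30.0, 33.0, 36.0, 40.0)
    humid    = lambda h: trapezoid(h, 70.0, 85.0, 100.0, 100.1)

    print(round(hot(32.0), 2))            # ~0.64, as in the Figure 4 caption
    # "If today is very hot AND humidity is high":
    # AND is commonly taken as the minimum of the antecedent memberships.
    print(min(very_hot(32.0), humid(80.0)))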

2.2.2 Fuzzy Expert System

Fuzzy systems can handle imprecise knowledge, and for that reason they are widely used in many practical commercial applications. Many Japanese cars now incorporate fuzzy systems for antilock braking, active suspension systems, automatic transmission, and engine emission controls. Fuzzy systems are easy to set up, typically require less processing power than alternative approaches, and provide robust performance. A typical schematic diagram of a fuzzy expert system, which incorporates a "Fuzzy Knowledge Base" to derive expert advice, is shown in Figure 5.

Figure 5. Schematic diagram of a fuzzy expert system for Cold Surge Prediction: a fuzzy inference engine combines fact values with a fuzzy knowledge base (membership functions, fuzzy rules, and a fuzzy strategy such as the logical or algebraic product method). Example rule: "If Temperature = Cool and Air Pressure = Increase Rapidly, then Chance of Cold Surge Arriving within 24 hours = Very High".

2.2.3 Hybridization with Other Models

The major shortcoming of the fuzzy system is the lack of learning capability. The design of fuzzy sets and the assignment of all fuzzy relations are done by the system designers (or experts), without any way to acquire the membership functions and inference rules automatically. To address these shortcomings, "hybridization" with other models, such as neural networks, has been proposed by many researchers. For instance, in the "FuzzyNet model" proposed by Wong and Wang in 1991 [27], an expert system was established by the hybridization of a neural network and a fuzzy system in the following manner (Figure 6). The model consists of three main modules: (1) Membership Function Generator (MFG) - generates the membership functions, which can either be provided by domain experts or be automatically generated using statistical methods based on historical data; (2) Fuzzy Information Processor (FIP) - accepts three types of information from the database: fuzzy rules with initial weights indicating the "credibility" of the rules, historical data, and current data; (3) BackPropagation Neural Network (BPN) - similar to the conventional back-propagation neural network, except that the processing elements used are the neural gates generated by the FIP module. Other possible hybridization schemes, such as the extraction of fuzzy rules from a multi-layered neural network proposed by Enbutsu et al. [4], and the learning of a fuzzy controller using a genetic algorithm by Janikow [7] in 1994, are typical examples.

Figure 6. FuzzyNet schematic diagram: membership functions from the MFG, together with fuzzy rules (with initial weights), historical data and current data processed by the FIP, feed the BPN, which produces the final decision / forecast.

2.3 Genetic Algorithms

2.3.1 Basic Principle


Evolution operates on the encodings of biological entities (chromosomes) rather than on the living beings themselves. Natural selection is based on "survival of the fittest": chromosomes with high fitness values reproduce more than those with low fitness values.

In Genetic Algorithms (GAs) [23], the basic entity is the chromosome, which is a sequence of values / states. The basic algorithm resembles natural evolution and involves the following operations (Figure 7):

1. Initialization of "Population"


2. Parent Selection process

3. Reproduction process involving crossover & mutation operations

4. Fitness value evaluation

5. Execute iteratively on the "new Population" until satisfactory performance is attained

In nature, an offspring is normally fitter if its ancestors are better. According to this theory, chromosomes will improve as the generations go on.

Figure 7. A typical flow diagram of a GA system: initialize the population, decode the chromosome strings, evaluate fitness, select parents, and reproduce until satisfactory performance is attained.

2.3.2 Population Initialization

A population is a collection of chromosomes, each representing a parameter set {x_1, x_2, x_3, ..., x_m}. This parameter set is encoded as a finite-length string over an alphabet of finite length, usually coded as binary values of 0's and 1's. To initialize the population, a random number generator is usually applied. For a chromosome of length m, the possible number of different chromosome strings is 2^m.
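
A minimal sketch of this initialization in Python (the population size, string length and seed are arbitrary choices):

    import random

    # Each chromosome is a binary string of length m encoding the
    # parameter set {x1, ..., xm}; a string of length m can take 2**m
    # distinct values, so the population is a random sample of that space.

    def init_population(pop_size, m, rng=random.Random(42)):
        return [[rng.randint(0, 1) for _ in range(m)] for _ in range(pop_size)]

    population = init_population(pop_size=20, m=16)
    print(len(population), len(population[0]))   # 20 chromosomes of length 16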


2.3.3 Fitness Evaluation

An evaluation function is applied to the population to compute the "fitness" of a chromosome. It is vital to the whole GA, since it is the only criterion for selecting chromosomes across the whole population, and higher fitness increases a chromosome's chance of reproduction. The stopping criterion of the GA usually depends on whether the best chromosome in the population has attained a sufficient fitness level, or whether the "evolution" (i.e., iteration of reproduction) has exceeded the generation limit (say, a maximum of 1000 generations).

2.3.4 Parent Selection Scheme

For parent selection, a "Roulette-wheel Parent Selection" scheme is used. The probability that a chromosome is selected for reproduction is directly proportional to its fitness value, conforming to the basic feature of natural selection that "fitter organisms have higher rates of survival, hence reproduction."
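
A sketch of roulette-wheel selection (the example population, fitness values and seed are illustrative):

    import random

    # Roulette-wheel selection: a chromosome's chance of being drawn is
    # directly proportional to its fitness.

    def roulette_select(population, fitnesses, rng=random.Random(0)):
        pick = rng.uniform(0.0, sum(fitnesses))
        running = 0.0
        for chrom, fit in zip(population, fitnesses):
            running += fit
            if running >= pick:
                return chrom
        return population[-1]        # numerical safety net

    pop = [[1, 0, 1], [0, 0, 1], [1, 1, 1]]
    fits = [2.0, 1.0, 3.0]           # the last chromosome is drawn most often
    print(roulette_select(pop, fits))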

2.3.5 Crossover and Mutation

In genetic algorithms, there are two main operators for reproduction, namely "Crossover" and "Mutation". In "Crossover", a pair of parent chromosomes is selected from the population. In 1-point "Crossover", a random location is selected in the chromosome string, and the chromosome elements beyond this "Crossover" point are exchanged to form a pair of new offspring, according to the crossover rate. Similarly, for two-point and uniform crossover, multiple points are selected for "Crossover" operations.

For "Mutation", a single chromosome is selected from the population which will be "scanned" throughout the whole list. A particular element will be changed according to the "Mutation" rate, which is normally much lower than the "Crossover" rate.

The main purpose of "Crossover" is to exchange information between randomly selected parent chromosomes without losing any improvement in information. The main objective of "Mutation" is to introduce some genetic diversity into the population; it is kept at a slow rate in order not to disrupt the genetic characteristics of "good" genes.
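
The two operators might be sketched as follows; the crossover rate (0.7) and mutation rate (0.01) are typical illustrative values, not prescribed ones.

    import random

    rng = random.Random(1)

    def one_point_crossover(p1, p2, crossover_rate=0.7):
        # Exchange the tails of two parents beyond a random cut point.
        if rng.random() < crossover_rate:
            cut = rng.randrange(1, len(p1))
            return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        return p1[:], p2[:]          # no crossover: copy parents unchanged

    def mutate(chrom, mutation_rate=0.01):
        # Scan the whole string, flipping each bit with low probability.
        return [1 - g if rng.random() < mutation_rate else g for g in chrom]

    c1, c2 = one_point_crossover([0] * 8, [1] * 8)
    print(c1, mutate(c2))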


2.3.6 Implementation of GAs

Based on different parent selection criteria, reproduction schemes, and crossover and mutation methods, there are numerous versions of schemes for GA implementation. The fundamental one is reproduction that replaces the whole parent population, using 1-point crossover and bit mutation. For parent selection, the "Roulette-wheel Parent Selection" scheme based on parent fitness values is applied.

In "Elitism" scheme, parents of the highest fitness value will be retained in the next generation in order to "guarantee" the "performance" of the population at a certain standard. For crossover operation, a 2-point crossover in the other extreme to uniform crossover can be applied for other GA schema.

For GA parameters, besides fixed crossover and mutation rates throughout the whole "evolution" process, a scheme with changing crossover and mutation rates can also be used. The mutation rate is normally reset to a higher value when the number of generations grows large, such as after 500 iterations. The main reason is to "induce" a higher diversity of chromosomes when the whole population has "evolved" to a more "mature" stage, since a higher mutation rate brings more "freshness" to the population.
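
Putting the pieces of Sections 2.3.2-2.3.6 together, the following self-contained sketch implements the fundamental scheme with an elitism slot and a mutation rate that rises after 500 generations; the bit-counting fitness function, sizes and seed are toy stand-ins for a real evaluation.

    import random

    rng = random.Random(7)
    M, POP, GENS = 24, 30, 800       # string length, population, generations

    def fitness(c):
        return sum(c)                # toy objective: count of 1-bits

    def select(pop, fits):
        # Roulette-wheel parent selection.
        pick, running = rng.uniform(0, sum(fits)), 0.0
        for c, f in zip(pop, fits):
            running += f
            if running >= pick:
                return c
        return pop[-1]

    pop = [[rng.randint(0, 1) for _ in range(M)] for _ in range(POP)]
    for gen in range(GENS):
        fits = [fitness(c) for c in pop]
        mut_rate = 0.01 if gen < 500 else 0.05   # raise mutation rate later
        best = max(pop, key=fitness)
        nxt = [best[:]]                          # elitism: keep best parent
        while len(nxt) < POP:
            p1, p2 = select(pop, fits), select(pop, fits)
            cut = rng.randrange(1, M)            # 1-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mut_rate else g for g in child]
            nxt.append(child)
        pop = nxt
    print(fitness(max(pop, key=fitness)))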

2.3.7 Hybridization of GAs with Neural Networks

Genetic Algorithms have been widely used with neural networks in two specific areas: 1) topology optimization; 2) genetic training algorithms. In topology optimization, GAs are used to select the optimal topology for the neural network, which in turn is trained using some fixed training scheme such as back-propagation. In genetic training algorithms, the learning of a neural network is formulated as a weight optimization problem, usually using the inverse mean square error as the fitness evaluation scheme. Instead of hybridizing a GA with a classical back-propagation neural net, Liu and Lee [16] have proposed a hybrid system for offline handwritten Chinese character recognition. In their proposed model, the GA is hybridized with a revised Dynamic Link Architecture (DLA) to enhance the accuracy and robustness of the Chinese character recognition system. According to the schematic diagram (Figure 8), the main function of the GA is to optimize the weights of the "dynamic links" within the hybrid DLA model. Experimental results revealed that GA hybridization provides an overall 20% improvement.
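
As a sketch of the genetic training idea (not of the Liu and Lee system itself), the following evolves the weights of a tiny 2-2-1 feed-forward network with inverse mean square error as the fitness; the XOR task, truncation selection and Gaussian mutation are illustrative choices.

    import math, random

    rng = random.Random(3)
    DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # XOR
    N_W = 9   # 2x(2 weights + bias) for the hidden layer, 2 weights + bias out

    def forward(w, x):
        sig = lambda z: 1 / (1 + math.exp(-z))
        h1 = sig(w[0] * x[0] + w[1] * x[1] + w[2])
        h2 = sig(w[3] * x[0] + w[4] * x[1] + w[5])
        return sig(w[6] * h1 + w[7] * h2 + w[8])

    def fitness(w):
        mse = sum((forward(w, x) - t) ** 2 for x, t in DATA) / len(DATA)
        return 1.0 / (mse + 1e-9)                # inverse mean square error

    pop = [[rng.uniform(-3, 3) for _ in range(N_W)] for _ in range(40)]
    for _ in range(200):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:10]                       # truncation selection
        children = []
        while len(children) < 30:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, N_W)
            child = a[:cut] + b[cut:]            # 1-point crossover
            child = [g + rng.gauss(0, 0.3) if rng.random() < 0.1 else g
                     for g in child]             # Gaussian mutation
            children.append(child)
        pop = parents + children
    print(fitness(pop[0]))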

Figure 8. Schematic diagram of the hybrid DLA model: in the encoding process, stored patterns are encoded; in the recognition process, a GA links-initialization engine encodes temporary links for the test pattern, and a DLA matching engine produces the recognition result.

3 System Application and Development Tools

3.1 Neuro-Forecaster

3.1.1 Introduction

Neuro-Forecaster is a PC Windows-based, integrated neural network application tool which aims to solve three main types of problems: (1) Time-Series Forecasting - e.g., stock and currency market forecasts, GDP forecasts; (2) Classification problems - e.g., stock selection, bond rating, credit assignment, property valuation; (3) Indicator Analysis - e.g., identification of useful input indicators. With built-in AI tools such as neural networks, fuzzy computing and non-linear dynamics, Neuro-Forecaster allows the user to select from 12 different types of transfer functions within the feed-forward, hierarchical neural network model, such as the standard Sigmoid Function, Radial Basis Function (RBF), FastProp Hyperbolic Tangent and Neuro Fuzzy Function.


3.1.2 GENETICA Net Builder

Besides the conventional feed-forward backpropagation network model, Neuro-Forecaster provides a GA modeling tool known as "GENETICA Net Builder". Making use of the optimization technique of GAs, GENETICA Net Builder generates many candidate networks which are evaluated, purged and recombined to produce the optimal network structure.

3.1.3 Neuro-Fuzzy Network

Neuro-Forecaster also provides "hybrid" networks such as the Neuro-Fuzzy Network, which integrates the neural network model with a fuzzy system. In the Neuro-Fuzzy network, the neural network module functions as a quantifier; its output nodes indicate the state of the current set of indicators in the form of a set of fuzzy membership functions. The conventional backpropagation learning algorithm can be applied to the learning of the weights. The fuzzy module serves as a defuzzifier for generating the output target (e.g., rainfall forecast results) and as a fuzzifier for learning from the error. A schematic diagram of the Neuro-Fuzzy model for weather forecasting (e.g., rainfall forecasting) is shown in Figure 9.

Figure 9. Schematic diagram for the Neuro-fuzzy network on rainfall (RF) forecast using meteorological data: relative humidity (RH), dry-bulb temperature (TT), dew-point temperature (DT), wind direction (WD), wind speed (WS), mean sea level pressure (PR) and rainfall (RF) are the input variables of the neural network module, whose outputs pass to the membership functions of the fuzzy module.


3.1.4 Network Training and Analytical Tools

In network "training" and "testing" phases, Neuro-Forecaster provides intensive visual interface for user to monitor the progress of network operations - e.g. monitor the real-time learning and testing errors. User can also fine-tune the learning rate, tolerance level and adjust noise level during the course of training, or set to "Auto" mode for the system to regulate the parameters automatically (Figure 10).

The application provides three types of analysis tools: (1) Re-scaled Range Analysis - in time series problems, the "Hurst exponent" is used to estimate the predictability and the fractal dimension of the time series, and to unveil any hidden cycle and the cycle length; (2) Correlation Analysis - a traditional analytical method to compute the correlation between the target and an indicator (e.g., the predicted output); (3) Accumulated Error Analysis - extracts the error accumulated at the input nodes of the neural network during and at the end of training. The accumulated error index (AEI) also indicates the relative significance of the indicator associated with each input node.

Figure 10. "Training" phase of Neuro-Forecaster for multi-station temperature prediction.

3.1.5 Windowing Feature

In time series problems such as weather forecasting, raw data are organized as consecutive records with a time relation (Figure 11). A window size (number of consecutive rows of data) greater than 1 will generally yield better results, especially for problems which exhibit long-term memory or periodic variation (such as the daily variation of temperature in weather prediction). A larger window size is a good way to capture the temporal information contained in the time series. However, if the input variables already contain such temporal information, as in the case of some technical variables such as stochastics and moving averages, one could reduce the window size to 2 or 1 to save memory space and computational load.

DATE       TT  FF   DB   DP   RH   MSLP  RF
93010106    6   5  156  117   78  10243   0
93010112   11  45  183  101   59  10248   0
93010118   11  30  175  125   72  10219   0
93010124    9   5  168  125   76  10231   0
93010206    6   5  160  130   82  10221   0
93010212   10  40  201  133   65  10224   0
93010218   11  25  191  150   77  10195   0
93010224   11  25  178  151   84  10208   0
93010306   10  50  172  148   86  10209   0
93010312    9  65  194  157   79  10217   0

Figure 11. Windowing features of forecast horizon in Neuro-Forecaster.

Neuro-Forecaster allows the user to set the window size at network creation time; there is no need to adjust or rearrange the input data manually for different window sizes, as required by other application tools. In the above example, a snapshot of weather data from 06:00HKT 01 Jan 1993 to 12:00HKT 03 Jan 1993 is shown. To capture the daily pattern of weather changes, a window size of 4 is chosen, and the 24-hour temperature (TT) is selected as the predicted output.
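
The windowing arrangement can be mimicked outside the tool with a few lines of Python; the record values and the field subset below are illustrative, following the Figure 11 snapshot.

    # With 6-hourly records and a window size of 4, each training sample
    # concatenates one full day of observations; the target is the
    # temperature (TT) 24 hours (4 records) after the last input record.

    records = [   # (TT, RH, MSLP) per 6-hour observation; values illustrative
        (16, 78, 10243), (18, 59, 10248), (17, 72, 10219), (16, 76, 10231),
        (16, 82, 10221), (20, 65, 10224), (19, 77, 10195), (17, 84, 10208),
        (17, 86, 10209), (19, 79, 10217),
    ]

    WINDOW, HORIZON = 4, 4           # 4 records in, predict 4 records ahead
    samples = []
    for i in range(len(records) - WINDOW - HORIZON + 1):
        x = [v for rec in records[i:i + WINDOW] for v in rec]  # flatten window
        y = records[i + WINDOW + HORIZON - 1][0]               # TT 24 h later
        samples.append((x, y))
    print(len(samples), len(samples[0][0]))   # 3 samples, 12 inputs each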

3.1.6 Strengths and Weaknesses

Neuro-Forecaster provides an excellent graphical user interface (GUI) which significantly reduces the time and difficulty of user training. The user-friendly visual interface for monitoring network training progress is particularly useful for teaching purposes in illustrating neural network operation. Network optimization using GENETICA Net Builder can also be used as a typical demonstration of GA functionality. The major limitation of the application is threefold: (1) only one output node is allowed for the network, which significantly limits the scope of the problem domain; (2) the application does not support any data processing schemes such as node selection or normalization tools; (3) system integration - the application does not provide any efficient way to "export" the user-established model to program code (e.g., C program code) in order to integrate it with other programs and systems, which in turn hinders the possibility of further development.

3.2 Professional II Plus

3.2.1 Introduction

Professional II Plus is a software package developed by NeuralWare, Inc. in the USA. Different from conventional neural net packages, which can only be operated on a single OS platform, Professional II Plus provides multi-platform operating environments in DOS, Windows, Unix and Mac OS. Besides, it provides a variety of network models such as Backpropagation nets, Learning Vector Quantization (LVQ), Radial Basis Function Nets, Adaptive Resonance Theory (ART) and the Self Organizing Map (SOM).

3.2.2 Network Architecture and Training

In the creation of a network model, the package provides a list of parameters and functions for the user to select (Figure 12), e.g., the learning rule, transfer function, data pre-processing schemes such as the MinMax Table, momentum rate and others. One of the most impressive features of the application is the visual display of the network architecture. In a standard backpropagation model, the application shows the "physical" structure of the network, and the user can select any component within the network and fine-tune its parameters; add, remove, or deactivate any network nodes; and redefine the whole architecture.

3.2.3 System Monitoring - Instrument

Another striking feature of Professional II Plus is the network monitoring facility. The software icon "instruments" is a visual object for the user to create, modify, clone, control or delete display charts such as the RMS error chart, network weights histogram, confusion matrices and classification rate diagrams. A snapshot of the above instruments, together with the network diagram for Rainfall (RF) and Temperature (TT) prediction using a back-propagation neural network based on a single weather station, is shown in Figure 13.


Figure 12. Parameters and functions selection tables.

Figure 13. A snapshot of the neural net model for temperature and rainfall prediction using Professional II Plus.


3.2.4 Strengths and Weaknesses

The capability of multi-platform operation, the wide variety of neural networks, the intensive visual monitoring schemes and the interactive user interface are all vital factors that make Professional II Plus an effective teaching and illustration tool for conveying neural network concepts and comparing the functionality of different neural network models. For system integration, Professional II Plus provides a facility called the "User IO Facility", an interface which allows users to write "C" programs that interact with the software. Users can also use this interface to control the data being presented to the network, as well as the results being returned from it. However, the software provides very limited and primitive support for data pre-processing, node selection, and network optimization schemes.

3.3 NeuralSIM

3.3.1 Introduction

NeuralSIM, formerly known as NeuralWorks Predict, is an integrated neural net software package developed by Aspen Technology, Inc. in the USA. Different from most other packages, which are "standalone" applications, NeuralSIM is fully integrated into Microsoft Excel, in the sense that the whole application is operated within the Windows Excel environment. In other words, once the software is installed, all the operations of NeuralSIM become part of the Excel environment. For example, after a network model is constructed, trained and tested, the user can invoke an Excel formula "predict()" to calculate the predicted output of any selected "cells", just like any conventional Excel formula.

3.3.2 Data Analysis Scheme

One of the most impressive functions of NeuralSIM is the data pre-processing scheme. For each input element (node), NeuralSIM automatically applies a variety of transformation schemes and chooses the best of them. The collection of transformations includes: (1) Continuous Transformations - e.g., linear, natural logarithm (log), hyperbolic tangent function (tanh); (2) Logical Transformations - e.g., logical and reverse logical transforms; (3) Integer / String Enumerated Transforms; (4) Quintile Transforms - five piece-wise linear transformations which map the input data into the target range. Figure 14 shows a snapshot of the "Data Analysis and Transformation" table denoting a list of preprocessed input elements such as wind speed and direction, humidity, and wet and dry bulb temperatures.

Figure 14. Data Analysis and Transformation Table.

3.3.2.1 Fuzzy Transformations

Another distinct data transformation scheme provided by NeuralSIM is fuzzy transformations. In the tool, there are four types of fuzzy transformations:

fzlft - fuzzy left

fzrgt - fuzzy right

fzraw - fuzzy center on raw data

fzval - fuzzy center on last continuous transform

Schematic diagrams of the different fuzzy transformations are shown in Figure 15.


Figure 15. Fuzzy transformations for data analysis: "fuzzy left", "fuzzy right" and "fuzzy center" membership shapes plotted against temperature.
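
The three shapes in Figure 15 might be approximated as follows; NeuralSIM's exact definitions are not reproduced here, so the ranges and linear forms are assumptions.

    # Rough sketches of the fuzzy-left, fuzzy-right and fuzzy-center
    # transformation shapes (breakpoints assumed, not NeuralSIM's own).

    def fuzzy_left(x, lo, hi):
        # 1 at the low end of the range, falling linearly to 0 at the high end.
        return max(0.0, min(1.0, (hi - x) / (hi - lo)))

    def fuzzy_right(x, lo, hi):
        # 0 at the low end, rising linearly to 1 at the high end.
        return max(0.0, min(1.0, (x - lo) / (hi - lo)))

    def fuzzy_center(x, lo, mid, hi):
        # Triangular bump peaking at mid (centred on raw data, fzraw,
        # or on the last continuous transform, fzval).
        if x <= lo or x >= hi:
            return 0.0
        return (x - lo) / (mid - lo) if x < mid else (hi - x) / (hi - mid)

    print(fuzzy_left(25, 20, 40), fuzzy_right(25, 20, 40),
          fuzzy_center(25, 20, 30, 40))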

3.3.3 Input Variable Selection Scheme - Genetic Algorithms

Besides the various transformations, Genetic Algorithms (GAs) are applied in the input variable selection phase. In this scheme, the algorithm starts off with a small set of input variables; successful groups of variables are maintained in the population and are used by the algorithm to select a larger set of variables if necessary. A snapshot of the input variable selection scheme using a GA is shown in Figure 16. In the figure, "Set" refers to the index of an individual in the current population, "Fitness" refers to the fitness of that individual, and "Size" refers to the number of variables in the current or best variable set. "Patience" is a mechanism used to control the convergence of the GA. Each time the population's average fitness does not improve by more than a certain tolerance, the patience factor is increased by 1. When the patience exceeds a certain number (4 by default), the evolution process stops, and the optimal set of input variables results.
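
The patience mechanism can be sketched as follows; evaluate_generation() is a hypothetical stand-in for running one GA generation and returning the population's average fitness.

    # Patience-based stopping rule (sketch): stop once average fitness has
    # failed to improve by more than a tolerance for too many generations.

    TOLERANCE, PATIENCE_LIMIT = 1e-3, 4

    def run_until_stale(evaluate_generation):
        best_avg, patience = float("-inf"), 0
        while patience <= PATIENCE_LIMIT:
            avg_fitness = evaluate_generation()       # one GA generation
            if avg_fitness > best_avg + TOLERANCE:
                best_avg, patience = avg_fitness, 0   # progress: reset patience
            else:
                patience += 1                         # stagnation: lose patience
        return best_avg

    # Demo with a plateauing fitness sequence standing in for a real GA:
    seq = iter([0.2, 0.5, 0.61, 0.615, 0.616, 0.616,
                0.616, 0.616, 0.616, 0.616])
    print(run_until_stale(lambda: next(seq)))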


Figure 16. A snapshot of the GA input variable selection scheme in NeuralSIM.

3.3.4 Other Facilities

Besides the striking data pre-processing facility, NeuralSIM provides a complete workflow for developing a neural net model, which includes: (1) data analysis and node selection; (2) network training; (3) network testing; and (4) system validation. For inexperienced users, NeuralSIM provides a "Wizard" function, a step-by-step screen layout that guides the user through building a complete network model.

NeuralSIM also allows experienced users to automate the whole network building and training process in "batch" mode, and allows "expert" users to fine-tune their models from a collection of provided parameters.

3.3.5 Strengths and Weaknesses

Perhaps the most impressive feature of NeuralSIM is its full integration with Microsoft Excel, which not only enhances usability but also effectively reduces the time needed for user training. The comprehensive data pre-processing scheme is another encouraging factor. For system integration, NeuralSIM also provides a facility called "FlashCode" which converts a specific model into C, Visual Basic or even Fortran code that can be compiled and linked with other system modules. The major limitation of this tool is that it is restricted to the feed-forward backpropagation network model, whereas other application tools such as Professional II Plus provide a collection of different network models for the user to choose from.


3.4 NeuroSolutions

3.4.1 Introduction

NeuroSolutions is a software package developed by NeuroDimension, Inc. in the USA. Unlike other application tools, NeuroSolutions provides a Windows-based, object-oriented simulation environment for neural network experiments. In other words, all the network components, such as input/output files, network nodes, and even network monitoring devices such as matrix viewers and barcharters, are system objects, and users are free to select and arrange these objects within their network models. Based on the needs of different users, the software is packaged at five different levels: (1) Educator; (2) Users; (3) Consultants; (4) Professional; and (5) Developers. The highest ("Developers") level supports a wide range of network models, from the simple Multilayer Perceptron (MLP) and Generalized Feedforward Networks to complex models such as Jordan-Elman Recurrent Networks and Time Lag Recurrent Networks (TLRN). Besides, the application can generate C++ code for system integration, or convert models into dynamic link libraries (DLLs) for future development.

3.4.2 A Snapshot of the Application Interface

A snapshot of the application interface for multi-station weather prediction is shown in Figure 17. The network model is illustrated as an interconnected diagram which links up the different object components. The most impressive function is that users can "insert" probing devices (e.g. a Barcharter) into any part of the network model, allowing flexible monitoring of the network operation. Besides, users can also define their own functions (e.g. for network training or pre-processing) and "insert" these as separate objects into the network.

3.4.3 Strengths and Weaknesses

Similar to NeuralSIM, NeuroSolutions provides a "Wizard" utility that lets inexperienced users build their own network models by going through a series of panels containing the model's configuration parameters. After all the panels are completed, the utility constructs the network according to the user's specifications. In addition to the wide range of available network models, the flexibility of the software itself for further integration is another promising factor for teaching and development. Again, its limited data pre-processing scheme is perhaps its major weakness.

Figure 17. A snapshot of the application interface for weather prediction.

4 Teaching and Learning the AI Fundamentals: Weather Forecasting Problems

4.1 Background

Weather forecasting has been one of the most challenging problems around the world for more than half a century, not only because of its practical value in meteorology, but also because it is a typical "unbiased" time-series forecasting problem in scientific research. Effective tools that can solve this forecasting problem can also be applied to other areas such as stock index forecasting in financial markets or fault detection in machine maintenance. Nowadays, meteorologists and weather forecasters base their predictions mainly on numerical models [22]. This classical approach attempts to model the fluid and thermal dynamic systems for grid-point time series prediction based on boundary meteorological data. The simulation often requires intensive computations involving complex differential equations and computational algorithms. Besides, the accuracy is bound by certain "inherent" constraints such as the adoption of incomplete boundary conditions, model assumptions and numerical instabilities [13].

4.2 ANN for Weather Prediction

Since the emergence of Artificial Neural Networks (ANNs), extensive research has been conducted on time-series forecasting. The classical application of ANNs in weather forecasting is found in the work of Widrow and Smith [26], which applied the Adaline to predict the occurrence of the next day's rainfall on the basis of fluctuations in barometric pressure over the two preceding days. Recent research by Chung and Kumar [1] and Li and Liu [12] using the Backpropagation Network (BPN) and the Naïve Bayesian Network (NBN) for rainfall prediction achieved an average accuracy rate of 65%. Among the many different meteorological parameters, rainfall is the most difficult one to predict. As explained by Li and Liu [12], the low accuracy of rainfall prediction is mainly due to insufficient data: so far, most weather prediction schemes using ANN models have been based on meteorological data from a single weather station, whereas human experts (weather forecasters) using the conventional approach would correlate extra information from surrounding areas in support of rainfall prediction.

4.3 Data Collection

"Good quality" of the training data sets contributes to the accuracy of the prediction. Observation data collected from Hong Kong Observatory between 1 Jan. 1993 to 31 Dec. 1997 via 11 weather stations (Figure 18) constitutes of the following elements taken every 6 hours (0600H, I200H, 1800H and 2400H):

• dry bulb temperature (TT);
• dew point temperature (DP);
• relative humidity (RR);
• mean sea-level pressure (MSLP);
• hourly rainfall (RF);
• 60-min prevailing wind direction (DD); and
• mean wind speed (FF).


Figure 18. Location map of the 11 Automatic Weather Stations (AWS) divided into five different regions.


Due to severe loss of data from some weather stations, the 11 stations were grouped into five regions (R1, R2, R3, R4 and R5) according to the distribution of weather records that could be collected in each region.

The process of collecting data also involves verifying it. A common problem is that the data formats of little-used fields may change over time without proper re-formatting of the database, or may even differ between programs. Checking for proper formats is easy, and it can save substantial processing time and effort.

4.4 Data Preprocessing

It is also important to check for missing or unavailable data in the input data set. Due to the vast amount of data missing from some stations (see Figure 19), the weather data of the stations Waglan Island, Wong Chuk Hang, Tai Po Kau, Peng Chau and Sha Lo Wan were discarded. In order to obtain better network performance, the remaining missing values were approximated using a linear interpolation function constructed from nearby values of the same element within the region.
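A minimal sketch of this gap-filling step, assuming each element arrives as a time-ordered series with NaN marking missing readings; the same function applies whether the neighboring values are taken in time or across nearby stations of the region. The sample values are hypothetical.

```python
import numpy as np

def fill_missing(series):
    """Replace NaNs in a 6-hourly weather series by linear interpolation
    between the nearest valid observations of the same element."""
    series = np.asarray(series, dtype=float)
    idx = np.arange(len(series))
    valid = ~np.isnan(series)
    return np.interp(idx, idx[valid], series[valid])

# Hypothetical dry-bulb temperature readings (0.1 degree C units) with gaps.
tt = [215, np.nan, 221, np.nan, np.nan, 230]
print(fill_missing(tt))   # [215. 218. 221. 224. 227. 230.]
```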


Figure 19. Distribution chart of missing data (in %) for the eleven weather stations CCH, EPC, HKO, HKS, JKB, LFS, SHA, SLW, TKL, TPO and WGL.

4.5 Analyzing and Transforming Data

Converting data into a form suitable for building effective models is an iterative process that interacts with model development. There are several methods for handling enumerated data; three of them are continuous encoding, binary encoding and the one-of-N code. A linear or continuous encoding uses one model input and simply scales the raw data into the target range. Binary encoding recognizes that the linear ordering has no meaning, and maps the enumerated values into an arbitrary binary code using two network inputs. The one-of-N code assigns a separate model input to each enumeration; this requires as many inputs as there are categories in the enumerated field.
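The three encodings can be illustrated in a few lines of Python, using wind direction as a hypothetical enumerated field with four categories:

```python
import numpy as np

directions = ["N", "E", "S", "W"]    # a hypothetical enumerated weather field

def continuous_encoding(value):
    """One input: scale the category index into [0, 1]."""
    return directions.index(value) / (len(directions) - 1)

def binary_encoding(value):
    """Two inputs: an arbitrary binary code for four categories."""
    i = directions.index(value)
    return [(i >> 1) & 1, i & 1]

def one_of_n_encoding(value):
    """N inputs: a separate model input for each category."""
    code = np.zeros(len(directions))
    code[directions.index(value)] = 1.0
    return code

print(continuous_encoding("S"))   # 0.666...
print(binary_encoding("S"))       # [1, 0]
print(one_of_n_encoding("S"))     # [0. 0. 1. 0.]
```

The one-of-N code avoids imposing a spurious ordering on the categories, at the cost of one network input per category.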

The performance of a neural or statistical model is often improved by transforming the continuous numeric inputs. The primary purpose of these transformations is to modify the distribution of the input or explanatory variables so that they better match the distribution of the dependent variables. By testing a variety of transformation functions, the transform which produces the distribution most similar to that of the output variable is selected.
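One plausible way to operationalize "most similar distribution" is to compare standardized quantiles, as in the sketch below. The candidate set and the quantile-based distance are illustrative assumptions, since the text does not specify the exact selection criterion.

```python
import numpy as np

CANDIDATES = {
    "linear": lambda x: x,
    "log":    lambda x: np.log(x - x.min() + 1.0),   # shift to keep log defined
    "tanh":   lambda x: np.tanh((x - x.mean()) / x.std()),
}

def distribution_distance(a, b, n_q=21):
    """Compare two standardized distributions by their quantiles."""
    q = np.linspace(0.05, 0.95, n_q)
    za = (a - a.mean()) / a.std()
    zb = (b - b.mean()) / b.std()
    return np.abs(np.quantile(za, q) - np.quantile(zb, q)).mean()

def best_transform(x, y):
    """Pick the candidate transform whose output distribution is closest
    to the distribution of the dependent variable."""
    scores = {name: distribution_distance(f(x), y) for name, f in CANDIDATES.items()}
    return min(scores, key=scores.get)

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)   # skewed explanatory variable
y = rng.normal(size=1000)                           # roughly normal target
print(best_transform(x, y))   # "log" normalizes the skew best here
```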


4.6 Variable Selection Scheme Using Genetic Algorithms


Picking the right input variables is critical to effective model development; a good subset of variables substantially improves a model's performance. A genetic algorithm is used to search for good sets of input variables, shaping a population of individuals through the survival of its fittest members.

Firstly, the individual potential solutions of the problem domain are encoded into representations that support the necessary variation and selection operations. In the second stage, mating and mutation algorithms, analogous to the sexual activity of biological life forms, produce a new generation of individuals that recombine features of their parents. Finally, a fitness function judges which individuals are the "best" life forms, that is, most appropriate for the eventual solution of the problem. These individuals are favored in survival and reproduction, shaping the next generation of potential solutions. Eventually, a generation of individuals is interpreted back into the original problem domain as a solution to the problem (Figure 20).


Figure 20. Schematic diagram for the input variable selection using GA.


5 Fuzzy Neural System Modeling

5.1 Introduction

In the course of solving real-world problems, we apply the aforementioned application tools and also develop a "hybrid" weather forecasting tool that integrates a fuzzy system with a neural network. Based on the theories described in Section 2, fuzzification of certain weather elements (e.g. rainfall) is applied to the feedforward backpropagation neural network model.

5.2 Neural Network Model

A Feedforward Backpropagation (FFBP) model with momentum was adopted for neural network training and testing. The backpropagation learning phase for a pattern consists of a forward phase followed by a backward phase. The main steps are as follows:

1. Initialize the weights to small random values.

2. Select a training vector pair (input and the corresponding output) from the training set and present the input vector to the inputs of the network.

3. Calculate the actual outputs - this is the forward phase.

4. According to the difference between actual and desired outputs (error), adjust the weights to reduce the difference - this is the backward phase.

5. Repeat from step 2 for all training vectors.

6. Repeat from step 2 until the error falls within the threshold value.

In the model, when some training data differ greatly from the majority of the data, the momentum modification of gradient descent is better used to avoid oscillation. In order to use momentum, the weight changes from previous training patterns must be saved. Momentum allows the net to make reasonably large weight adjustments as long as the corrections are in the same general direction over several patterns, while using a smaller learning rate to prevent a large response to the error from any one training pattern. A schematic diagram of the FFBP model used in our weather prediction is shown in Figure 21.
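The six steps plus the momentum term fit in a short, runnable sketch. XOR stands in here for the weather vectors, updates are applied in batch rather than strictly per-pattern for brevity, and the learning rate, momentum and hidden-layer size are illustrative choices rather than the settings of the chapter's system.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy training set (XOR) standing in for the weather vectors of the chapter.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# Step 1: initialize the weights to small random values.
W1 = rng.uniform(-0.5, 0.5, (2, 4)); b1 = np.zeros(4)
W2 = rng.uniform(-0.5, 0.5, (4, 1)); b2 = np.zeros(1)
vW1, vb1 = np.zeros_like(W1), np.zeros_like(b1)   # saved previous weight changes
vW2, vb2 = np.zeros_like(W2), np.zeros_like(b2)
lr, mom, tol = 0.5, 0.9, 0.05

for epoch in range(10000):
    # Steps 2-3: present the inputs and calculate the actual outputs (forward phase).
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    err = T - Y
    if np.sqrt(np.mean(err ** 2)) < tol:          # step 6: stop within threshold
        break
    # Step 4: adjust the weights to reduce the error (backward phase).
    dY = err * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    # Momentum: blend each correction with the previous weight change.
    vW2 = mom * vW2 + lr * (H.T @ dY); vb2 = mom * vb2 + lr * dY.sum(axis=0)
    vW1 = mom * vW1 + lr * (X.T @ dH); vb1 = mom * vb1 + lr * dH.sum(axis=0)
    W2 += vW2; b2 += vb2; W1 += vW1; b1 += vb1

print(f"stopped after {epoch} epochs, outputs: {Y.ravel().round(2)}")
```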


Figure 21. FFBP model for weather prediction: inputs such as dry-bulb temperature, dew-point temperature, wind speed, relative humidity and mean sea-level pressure feed forward to the output layer and output set.

5.3 Fuzzy System


Fuzzy logic was proposed by L.A. Zadeh to provide an appropriate technique for describing the behavior of systems that are too imprecise to be amenable to formal mathematical analysis. Unlike traditional logic, fuzzy logic aims to model the imprecise modes of human reasoning and decision making [8], and such reasoning was adopted in the system of this study. In a fuzzy logic environment, the measurement of weather phenomena can be described in linguistic terms which take values from a natural language. For the prediction of accumulated rainfall in our experiments, these weather inputs were first fuzzified into different fuzzy sets by a fuzzification function before neural network training. The linguistic terms can be assigned as 'Rain' or 'No Rain' (two fuzzy sets). Fuzzy sets allow for degrees of membership; that is, any value between 0 and 1 may be assigned. Each fuzzy set is described by its associated membership function. Figure 22 shows the membership functions of the two fuzzy sets.


Figure 22. Fuzzy membership functions for 'No Rain' and 'Rain', plotted as membership value against rainfall (mm); the two functions cross at a membership value of 0.5 at 0.075 mm.

In Figure 22 there are two fuzzy sets, 'Rain' and 'No Rain'. For example, if the raw rainfall datum is 21 mm, it is described as 'Rain'; if there is 0.03 mm of rainfall, it is classified as 'No Rain'. When the rainfall is 0.075 mm, the value is actually a member of both fuzzy sets according to the membership functions: it belongs to the 'Rain' fuzzy set with degree 0.5, and to the 'No Rain' fuzzy set with degree 0.5.

A membership function is often associated with a linguistic variable; through this, the fuzzy system can interface with the outside world. The domain of a membership function is the set of possible values of the given variable.
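A direct transcription of Figure 22's worked example, assuming the 'No Rain' membership stays at 1 up to 0.05 mm and falls linearly to 0 at 0.1 mm (with 'Rain' as its mirror image), reproduces the degrees quoted above:

```python
def mu_no_rain(rf_mm):
    """Membership in 'No Rain' (assumed breakpoints read from Figure 22:
    full membership up to 0.05 mm, falling linearly to 0 at 0.1 mm)."""
    if rf_mm <= 0.05:
        return 1.0
    if rf_mm >= 0.1:
        return 0.0
    return (0.1 - rf_mm) / 0.05

def mu_rain(rf_mm):
    """Membership in 'Rain': the mirror image of 'No Rain'."""
    return 1.0 - mu_no_rain(rf_mm)

for rf in (0.03, 0.075, 21.0):
    print(rf, "-> No Rain:", round(mu_no_rain(rf), 2), " Rain:", round(mu_rain(rf), 2))
# 0.03  -> No Rain: 1.0  Rain: 0.0
# 0.075 -> No Rain: 0.5  Rain: 0.5
# 21.0  -> No Rain: 0.0  Rain: 1.0
```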

In the experiments, after fuzzification, the fuzzified observations are taken as target outputs for the neural networks in training and testing. After sufficient training, the neural network generates predicted outputs. The predicted and target outputs are then defuzzified to determine the crisp result and to evaluate its accuracy according to some classification method (Figure 23).

Figure 23. Fuzzification scheme on the rainfall element: the normalized rainfall value from the raw data is fuzzified with two fuzzy sets (No Rain, Rain) into two fuzzified outputs ranging from 0 to 1.


5.4 Fuzzy Neural Network - System Architecture

The fuzzy neural system shares characteristics with both fuzzy and neural systems, since the two are similar in some ways. First, the rationale behind fuzzy inference and function approximation in neural networks is the same: to produce an interpolated output for related situations. Secondly, both approaches build nonlinear models based on bounded continuous variables. A schematic diagram of the fuzzy neural network model is shown in Figure 24.

Figure 24. Fuzzy neural network model for weather prediction: rainfall is fuzzified into two fuzzy sets (Rain, No Rain); other multi-station weather data such as temperature and wind speed act as input variables to the neural network; the two predicted fuzzified rainfall outputs are defuzzified and compared with the defuzzified target output by the classification method.

5.5 System Implementation

The fuzzy neural system was implemented with the C++ Builder development tool running under Microsoft Windows on a personal computer. There are two major functions in the system: neural computation and fuzzification (Figure 25).

In the experiments, the major elements considered as input variables to the neural network included dry bulb temperature, dew point temperature, wind speed, humidity, amount of rainfall and mean sea-level pressure. Four major sets of data were created, and a list of input weather parameters is shown in Table 1. Experimental results are described in the following sections.

Figure 25. Schematic diagram for the system implementation of the fuzzy neural network: particular input node(s) or target output(s) are fuzzified to form fuzzified training and test cases; accuracy is evaluated by correlation and absolute percentage error, and the classification of training and testing cases by the classification rate.

Set A: Five-year data collected at the five regions (R1, R2, R3, R4 and R5, as shown in Figure 18) every six hours (0600H, 1200H, 1800H and 2400H) were used as input variables to predict the weather phenomena in R3.

Set B: Five-year data taken at R3 only, every six hours, were used as input variables to forecast the weather phenomena in R3.

Set C: The five-year data in Set A were fuzzified before being fed into the neural network for training and testing.

Set D: The five-year data in Set B were fuzzified before being fed into the neural network for training and testing.

Table 1. List of input weather parameters.

Input Node   Description of the variable    Region   Remarks
1            60-Minute Mean Wind Speed      R1       in units of 0.1 m/s
2            Dry-Bulb Temperature           R1       in units of 0.1 degree C
3            Mean Sea-Level Pressure        R1       in units of 0.1 hPa
4            Amount of Rainfall             R1       in units of 0.1 mm
5            60-Minute Mean Wind Speed      R2       in units of 0.1 m/s
6            Dry-Bulb Temperature           R2       in units of 0.1 degree C
7            Dew-Point Temperature          R2       in units of 0.1 degree C
8            Relative Humidity              R2       in %
9            Mean Sea-Level Pressure        R2       in units of 0.1 hPa
10           60-Minute Mean Wind Speed      R3       in units of 0.1 m/s
11           Dry-Bulb Temperature           R3       in units of 0.1 degree C
12           Dew-Point Temperature          R3       in units of 0.1 degree C
13           Relative Humidity              R3       in %
14           Mean Sea-Level Pressure        R3       in units of 0.1 hPa
15           Amount of Rainfall             R3       in units of 0.1 mm
16           60-Minute Mean Wind Speed      R4       in units of 0.1 m/s
17           Dry-Bulb Temperature           R4       in units of 0.1 degree C
18           Dew-Point Temperature          R4       in units of 0.1 degree C
19           Relative Humidity              R4       in %
20           60-Minute Mean Wind Speed      R5       in units of 0.1 m/s
21           Dry-Bulb Temperature           R5       in units of 0.1 degree C
22           Dew-Point Temperature          R5       in units of 0.1 degree C
23           Relative Humidity              R5       in %
24           Amount of Rainfall             R5       in units of 0.1 mm


5.5.1 Fuzzification Scheme

The system allows users to fuzzify any input/output node(s) into two or more fuzzy sets. It was implemented to give users the means to define the points of the membership functions. This means that users can change the range of a particular fuzzy set and create a new training pattern file for training the neural network model. For example, if the output node for rainfall is fuzzified into four fuzzy sets, we can define 0-0.5 mm as the range of the first fuzzy set, 0.01-5 mm as the second, 1-20 mm as the third, and 10 mm and above as the fourth. A snapshot of such a setting in the model is shown in Figure 26.

Figure 26. User interface for fuzzification parameter setting.


5.5.2 Network Parameter Setting

Before starting network operation, different training and testing parameters, such as the learning rate, momentum, tolerance level and maximum number of epochs, need to be specified. The screen layout for parameter setting is shown in Figure 27.

Figure 27. Parameter setting for network training/testing.

The system was implemented to provide three different methods for the user to initialize the network weights.

1. Random Initialization - a common procedure that initializes the weights (and biases) to random values between -0.5 and 0.5.

2. Nguyen-Widrow Initialization - a modification of common random weight initialization that improves learning speed (see the sketch after this list). The approach is based on a geometrical analysis of the response of hidden neurons to a single input; the analysis is extended to the case of several inputs by using Fourier transforms. Weights from the hidden units to the output units are initialized to random values between -0.5 and 0.5.

3. Weight File - weights may be loaded into the system from a weight file provided by the user; the weight file must use the correct format.
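A minimal sketch of the Nguyen-Widrow scheme for the input-to-hidden layer follows. The 0.7 scale factor is the standard one from Nguyen and Widrow's derivation; the layer sizes shown are just the 24-20-1 configuration used elsewhere in this chapter, and the hidden-to-output weights stay uniform in [-0.5, 0.5] as the text states.

```python
import numpy as np

def nguyen_widrow(n_in, n_hidden, rng):
    """Nguyen-Widrow initialization for the input-to-hidden layer:
    start from small random weights, then rescale each hidden unit's
    weight vector to a common magnitude beta = 0.7 * n_hidden**(1/n_in)."""
    beta = 0.7 * n_hidden ** (1.0 / n_in)
    W = rng.uniform(-0.5, 0.5, (n_in, n_hidden))
    W *= beta / np.linalg.norm(W, axis=0)       # per-hidden-unit rescaling
    b = rng.uniform(-beta, beta, n_hidden)      # biases spread over [-beta, beta]
    return W, b

rng = np.random.default_rng(0)
W, b = nguyen_widrow(n_in=24, n_hidden=20, rng=rng)   # e.g. a 24-20-1 net
print(np.linalg.norm(W, axis=0))   # every column now has magnitude beta (~0.79)
```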


5.5.3 Network Training

The system was implemented to perform the network training, and its performance was evaluated based on the root mean square error (RMSE). Figure 28 shows a snapshot of the network training process.

Figure 28. A snapshot of the network training process: the total root mean squared error decreases steadily to about 0.0343 over 500 iterations, and the report lists the training pattern file (e4train1.txt), the output files, the binary sigmoid activation function and Nguyen-Widrow weight initialization.

5.5.4 Network Testing and Evaluation Scheme

In the network testing process, the system provides two testing schemes: the user can test on the training data, or supply an external test file different from the training data.

For performance analysis, the system provides two types of analysis tools, namely "correlation analysis" and "classification rate".

Correlation Analysis
This analysis measures the relationship between two data sets, scaled so as to be independent of the unit of measurement. The population correlation ρ(X,Y) is the covariance cov(X,Y) of the two data sets X and Y divided by the product of their standard deviations:

    ρ(X,Y) = cov(X,Y) / (σX · σY)                                        (1)

Classification Rate
This evaluation scheme is used for analyzing system performance after the fuzzification and neural computation processes. For example, suppose one rainfall target output is fuzzified into two fuzzy sets ('Rain' and 'No Rain'), so that there are two fuzzified outputs. The fuzzified outputs are put into the neural network as target outputs, and after training there are likewise two predicted outputs. The classification rate is then calculated as follows. First, the predicted result is resolved to a crisp result: if the predicted outputs are 0.8 and 0.23 for 'No Rain' and 'Rain' respectively, the resolved outcome is 'No Rain' and the results become 1 and 0. These values are then compared with the original target data; if the target is also categorized as 'No Rain', that test sample is considered matched successfully. The classification rate for each fuzzy set is calculated as:

    Classification rate = (Samples matched successfully / Total test samples) × 100%    (2)
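Both evaluation measures are easy to state in code. The sketch below implements Eq. (1) with population (biased) statistics and Eq. (2) by resolving each fuzzy output pair to the class with the larger membership, reproducing the 0.8/0.23 example above; the array layout (one row per sample, one column per fuzzy set) is an assumption of the sketch.

```python
import numpy as np

def correlation(x, y):
    """Population correlation of Eq. (1): cov(X, Y) / (sigma_X * sigma_Y)."""
    return np.cov(x, y, bias=True)[0, 1] / (np.std(x) * np.std(y))

def classification_rate(predicted, target):
    """Eq. (2): resolve each fuzzy output pair to the class with the largest
    membership, then report the percentage of samples matched successfully."""
    matched = np.argmax(predicted, axis=1) == np.argmax(target, axis=1)
    return 100.0 * matched.mean()

# The chapter's example: outputs (0.8, 0.23) for ('No Rain', 'Rain') resolve
# to 'No Rain'; the first sample matches its target and the second does not.
pred = np.array([[0.80, 0.23], [0.30, 0.90]])
targ = np.array([[1.0, 0.0], [1.0, 0.0]])
print(classification_rate(pred, targ))                                     # 50.0
print(correlation(np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])))   # ~1.0
```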

6 Experimental Results


The experimental tests are divided into two parts. In the first part, the four AI tools, Neuro-Forecaster, Professional II Plus, NeuralSIM and NeuroSolutions, are applied in turn. Professional II Plus ran on a SunSparc10 workstation for simulation, while a personal computer (PC) with a Pentium II 333 MHz CPU was used for the other application tools throughout the tests. Owing to the different main features of each tool, the type(s) of models selected would vary. Results are shown as follows. In the second part, experimental results using our proposed fuzzy neural network are presented.

6.1 Neuro-Forecaster

The experimental tests were divided into two separate sets, Test Set A and Test Set B. In Test Set A, meteorological data from a single weather station (Hong Kong Observatory, HKO) were compared against data from multiple weather stations for temperature (TT) and rainfall (RF) forecasting. In Test Set B, different network models were used to forecast rainfall (RF) based on data from multiple weather stations.

In Test Set A, a standard feedforward backpropagation model with sigmoid transfer function was used to forecast the 24-hour temperature and rainfall amount. Results were compared between runs involving data from a single weather station (HKO) and those from multiple stations. Since Neuro-Forecaster allows only one output node per network, four individual tests were needed to generate the experimental results. For the network training criteria, the tolerance level was preset to 0.1 and the maximum number of training cycles was fixed at 1000 epochs. In Test Set B, five different network models were compared: (1) standard sigmoid Feedforward Backpropagation (FFBP) networks; (2) Genetica (GA model); (3) Fuzzy Neuro networks; (4) FastProp Radial Basis Function (FRBF) networks; and (5) FastProp Hyperbolic Tangent Function (FHTF) networks. Results are shown in Table 2.

Some important findings can be observed. As revealed by Test Set A, the accuracy and correlation of the temperature forecasts are very promising irrespective of whether data from a single station or from multiple stations are used: an average accuracy of over 93% and a correlation over 0.9 are attained. Moreover, an overall 79% improvement in % error and over 180% improvement in correlation values are achieved in the rainfall forecast when data from multiple stations are taken into consideration.

Regarding the performance of the different network models, the best rainfall forecast accuracy of 98% was attained when Genetica Net Builder, an impressive model provided by Neuro-Forecaster, constructed the optimal network model by selecting the best combination of input data and the most predictable forecast horizon using Genetic Algorithms (GAs). A snapshot (generation 261) of the multi-station rainfall forecast using Genetica Net Builder is shown in Figure 29. In the figure, the total population size is the maximum number of chromosomes input to the system; out of the 196 input data, 137 are selected for network training. Besides allowing users to retain the best 5 networks for evaluation, the application also provides additional functions to purge under-performing networks automatically.

Table 2. Experimental results of weather prediction using NeuroForecaster.

Test  Description                          Learn               Test                Overall
                                           Abs. Err.  % Err.   Abs. Err.  % Err.   Correlation  % Err.
Test Set A: Single vs. multi-stations temperature (TT) and rainfall (RF) forecast
using the standard feedforward backpropagation network (sigmoid transfer function)
1     Multi-stations TT F/C                17.04      6.29%    16.85      6.22%    0.9103       6.19%
2     HKO TT F/C                           17.51      7.46%    17.60      7.56%    0.9005       7.89%
3     Multi-stations RF F/C                18.25      2.54%    17.05      2.37%    0.4995       2.44%
4     HKO RF F/C                           21.43      2.98%    33.07      4.59%    0.1775       4.39%
Test Set B: Multi-stations rainfall (RF) forecast using different models
5     Standard Sigmoid FFBP Networks       18.25      2.54%    17.05      2.37%    0.4995       2.44%
6     Genetica (GA)                        13.67      1.90%    12.12      1.69%    0.3128       1.79%
7     Fuzzy Neuro Networks                 40.09      5.69%    39.03      5.54%    0.3781       5.64%
8     FastProp. Radial Basis Networks      75.41      9.59%    73.51      9.95%    0.1800       9.89%
9     FastProp. Hyp. Tangent Networks      20.26      2.82%    20.40      2.84%    0.3976       2.84%

Figure 29. A snapshot of the GENETICA Net Builder window.


Among the various analysis tools provided by Neuro-Forecaster, one known as the "Distribution Pattern" is very useful for analyzing the distribution and contribution of every input node to the whole network. As shown in Figure 30, indicator 1 (mean sea-level pressure) contributes an "even" distribution for training both the "Very Negative (VN)" and "Very Positive (VP)" target nodes. In other words, users can use this indicator to judge the "quality" of their input data set.

Figure 30. Distribution pattern chart for rainfall prediction model.

6.2 Professional II Plus

Similar to Neuro-Forecaster, two Test Sets were applied in the experimental tests. In Test Set A, a comparison between a single weather station (HKO) and multiple stations was conducted using the standard Feedforward Backpropagation (FFBP) network. In Test Set B, the application was validated against four network models: (1) standard Feedforward Backpropagation Network (FFBP); (2) Adaptive Resonance Theory (ART); (3) Learning Vector Quantization (LVQ); and (4) Radial Basis Function Network (RBFN). Results are shown in Table 3.

For this experiment, since the software does not limit the number of output nodes, both the temperature (TT) and rainfall (RF) can be predicted directly, which can significantly reduce the processing time. Overall, promising results were attained in the temperature (TT) forecast, with an average of over 95% accuracy and over 0.85 correlation. In rainfall prediction, the use of multi-station data achieved over 50% improvement in % error and over 130% increase in correlation. In the second test, where the performances of different networks were considered, the Radial Basis Function Network (RBFN) attained the best results in both temperature and rainfall forecasting, with accuracies of 96.7% and 98% respectively.

Table 3. Experimental results for weather prediction using Professional II Plus.

                                                      Temperature (TT)        Rainfall (RF)
Test Set                                              % Error   Correlation   % Error   Correlation
A: Single vs. multi-stations TT/RF forecast using the feedforward backprop. network
1   Single Station (HKO)                              5.45%     0.85          5.07%     0.16
2   Multi-stations                                    4.28%     0.96          2.18%     0.37
B: Multi-station TT/RF forecast using different network models
3   Feedforward Backprop. Network (FFBP)              4.28%     0.96          2.18%     0.37
4   Adaptive Resonance Theory (ART)                   5.12%     0.91          3.84%     0.21
5   Learning Vector Quantization (LVQ)                4.87%     0.93          2.16%     0.38
6   Radial Basis Function Network (RBFN)              3.21%     0.99          2.02%     0.47

Two useful "probing" tools that can be found in Professional II Plus are the "weight histogram" and "confusion matrix". Weight histogram is used to observe the overall learning results of different network nodes, while confusion matrix is a useful vision tool to observe the correlation between target and predicted output. A typical example obtained from the Feedforward Backpropagation network (FFBP) for multi-station weather prediction is shown in Figure 31.

6.3 NeuralSIM

Two separate tests were conducted in this experiment: (1) single weather station 24-hour temperature (TT) and rainfall (RF) forecast; and (2) multi-station temperature (TT) and rainfall (RF) forecast. Since the whole application operates within the Microsoft Excel environment and input nodes can be selected by "cell-highlighting", just as in usual Excel operations, the implementation steps are tremendously simplified. For example, only one Excel spreadsheet is needed to conduct the two separate tests. Besides, experimental results can be generated simply by invoking the "Test" command. The results are produced as an Excel spreadsheet (Tables 4 and 5), which facilitates any additional graphical analysis using standard Excel tools.

Figure 31. Weight histogram (left) and confusion matrix (right) for the FFBP network model.

Similar to the other application tools, the multi-station weather forecast results are about 30% more accurate than those of the single weather station. Overall accuracies of 98% and 97% are attained for the temperature and rainfall predictions respectively, with an improvement of 130% in "net correlation" for the rainfall prediction.

Table 4. Single weather station temperature and rainfall forecast results.

Predict RF   Net-Correlation   Avg. Abs. Error   Max. Abs. Error   RMS Error   Accuracy (20%)   Conf. Interval (95%)   Records
All          0.167             11.516            715.040           43.624      0.957            84.815                 6801
Train        0.177             5.529             715.040           29.483      0.986            57.327                 4760
Test         0.199             25.478            532.885           65.683      0.941            127.766                2041
Valid        0.167             11.516            715.040           43.624      0.935            84.815                 6801

Predict TT   Net-Correlation   Avg. Abs. Error   Max. Abs. Error   RMS Error   Accuracy (20%)   Conf. Interval (95%)   Records
All          0.894             13.201            71.027            16.912      0.896            32.880                 6801
Train        0.910             13.662            68.204            17.457      0.919            33.944                 4760
Test         0.893             12.126            71.027            15.565      0.881            30.277                 2041
Valid        0.899             13.201            71.027            16.912      0.896            32.880                 6801


Table 5. Multi-station temperature and rainfall forecast results.

Predict RF   Net-Correlation   Avg. Abs. Error   Max. Abs. Error   RMS Error   Accuracy (20%)   Conf. Interval (95%)   Records
All          0.383             11.823            416.920           23.961      0.978            85.470                 6801
Train        0.375             5.654             316.920           19.307      0.998            56.985                 4760
Test         0.337             26.212            433.173           32.607      0.969            129.565                2041
Valid        0.383             11.823            316.920           27.961      0.989            85.470                 6801

Predict TT   Net-Correlation   Avg. Abs. Error   Max. Abs. Error   RMS Error   Accuracy (20%)   Conf. Interval (95%)   Records
All          0.957             10.641            77.771            14.205      0.984            27.618                 6801
Train        0.951             10.835            77.659            14.435      0.983            28.068                 4760
Test         0.949             10.189            77.771            13.653      0.986            26.558                 2041
Valid        0.957             10.641            77.771            14.205      0.984            27.618                 6801

Figure 32. A snapshot of the Data Analysis and Transformation table for three meteorological elements: dry bulb temperature (DB), dew point temperature (DP) and relative humidity (RH), with candidate transforms such as linear, inverse square (InvPwr2), fuzzy left (fzlft) and fuzzy right (fzrgt).

Another impressive feature of NeuralSIM is its data pre-processing capability. As explained in the previous section, before network training all input nodes undergo different transformations in order to give the "best" selection for network training. A typical example is shown in Figure 32. For dry bulb temperature (DB), three different transformations were performed: (1) linear normalization (Linear); (2) hyperbolic tangent (Tanh); and (3) fuzzy left transformation (fzlft). In this case, linear normalization, giving the best performance, is selected for network training. Similarly, for dew point temperature (DP) and relative humidity (RH), the inverse square function (InvPwr2) and the hyperbolic tangent function (Tanh) are selected respectively.

6.4 NeuroSolutions

As in the other experiments, two Test Sets were performed. In Test Set A, a Generalized Feedforward network trained on data from a single station (HKO) for 24-hour temperature (TT) and rainfall (RF) prediction is compared with multi-station weather prediction. In Test Set B, four different network models were used for the multi-station weather forecast: (1) Multilayer Perceptron Model; (2) Generalized Feedforward Network Model; (3) Principal Component Analysis (PCA) Model; and (4) Time Lag Recurrent Network (TLRN). Results are shown in Table 6.

Table 6. Experimental results for weather prediction using NeuroSolutions.

                                                      Temperature (TT)        Rainfall (RF)
Test Set                                              % Error   Correlation   % Error   Correlation
A: Single vs. multi-stations TT/RF forecast using Generalized Feedforward networks
1   Single Station (HKO)                              10.98%    0.78          4.45%     0.14
2   Multi-stations                                    3.32%     0.96          2.45%     0.32
B: Multi-station TT/RF forecast using different network models
3   Multilayer Perceptron Model (MLPM)                6.97%     0.88          5.55%     0.21
4   Generalized Feedforward Model                     3.32%     0.96          2.45%     0.32
5   Principal Component Analysis (PCA) Model          4.88%     0.92          3.12%     0.31
6   Time Lag Recurrent Network Model (TLRN)           2.12%     0.99          1.21%     0.61

Two major findings can be observed from the experimental results: (1) when multi-station weather prediction was used, a significant improvement in both temperature and rainfall forecasts was attained; (2) among the different network models, the Time Lag Recurrent Network (TLRN) performs best while the Multilayer Perceptron Model (MLPM) performs worst, mainly because the weather prediction problem is a typical time-series dynamic forecasting problem which the static Multilayer Perceptron Model can hardly simulate with promising results.

6.5 An Integrated Fuzzy Neural Network Model

An automatic learning scheme was applied in the experiments. The learning rate was decreased progressively from 0.8 to 0.01 and a momentum of 0.9 was adopted. This setting allows the net to make dynamic weight adjustments as long as the corrections are in the same general direction across patterns, preventing large fluctuations. The activation function used was the binary sigmoid. The number of hidden nodes was chosen based on the approximate relation of Widrow (1997):

H = P / (10(m+n))

where P = number of training examples, m = number of outputs, and n = number of inputs.

The network architecture is denoted I-H-O, where I is the number of input nodes, H the number of hidden nodes and O the number of output nodes. Experimental results are presented in two main categories, "prediction of six-hourly dry-bulb temperature" and "prediction of 24-hour accumulated rainfall". In each category, various network architectures and degrees of fuzzification are applied. Results are shown as follows.
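As a check on this rule of thumb, plugging in the figures for Test 1.1 below (5083 training examples, 84 inputs, 1 output) recovers the 84-6-1 architecture reported in Table 7:

```python
def hidden_nodes(P, m, n):
    """Widrow's approximation quoted in the text: H = P / (10 * (m + n))."""
    return round(P / (10 * (m + n)))

# Test 1.1 in Table 7: 5083 training examples, 84 inputs, 1 output.
print(hidden_nodes(P=5083, m=1, n=84))   # 5083 / 850 ~ 5.98 -> 6 hidden nodes
```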

6.5.1 Prediction of Six-Hourly Dry-Bulb Temperature in Region 3

This part aimed to predict the next six-hourly dry-bulb temperature using neural networks only, and to compare performance based on input data from multiple stations and from a single station respectively. Exactly 6283 sets of records were used for the experiments; the number of training sets was 5083 and the number of testing sets was 1200. The following table shows the settings and results of the different tests. The first two tests applied data from multiple regions as input variables. The difference between them is that the first (Test 1.1) uses four consecutive 6-hour periods of data from multiple regions as input to predict the temperature in the next 6-hour period, while the second gives a prediction based only on the previous 6-hour period of data. Tests 1.3 and 1.4 are similar to Tests 1.1 and 1.2, except that they use data from a single region.

As shown in Table 7, Test 1.1 gives the best performance: the correlation is 0.967 and the absolute percentage error is 5.01%. The five-region data are found to be quite useful for the network in learning the weather phenomenon, perhaps because all the regions are located near one another; there may be relationships between them from which the neural network can generalize. However, the results also indicate that the improvement from adding data from more regions is not so significant in predicting temperature.

Table 7. Experimental results for prediction of 6-hourly dry-bulb temperature in Region 3.

Test  Input Variable No.                                  Network        Correlation  Absolute
                                                          Architecture                % Error
1.1   [1-3, 5-14, 16-23 in Table 1] x 4                   84-6-1         * 0.967      * 5.01%
      (four time periods before) (multi-region)
1.2   [1-3, 5-14, 16-23 in Table 1] (multi-region)        24-20-1        0.935        5.66%
1.3   [10-15 in Table 1] x 4                              24-20-1        0.956        5.02%
      (four time periods before) (single region)
1.4   [10-15 in Table 1] (single region)                  6-72-1         0.928        11.58%
* Optimal performance

Figure 33. Dry-bulb temperature prediction performance: normalized dry-bulb temperature plotted over the test samples.


6.5.2 Prediction of 24 Hours Accumulated Rainfall in Region 3

This part includes three different types of tests:

• Test 2.1: 24-hour accumulated rainfall prediction in Region 3 without fuzzification.

• Test 2.2: 24-hour accumulated rainfall prediction in Region 3 with fuzzification.

• Test 2.3: 24-hour accumulated rainfall prediction in Region 3 with fuzzification (with a different fuzzy set definition).

In each of the above tests, data from multiple regions and from a single region are used. In the experiment, we defined only two fuzzy sets, 'No Rain' and 'Rain'. The experiment aimed to compare the performance of tests with and without fuzzification, and with data from a single region or multiple regions.

Exactly 3715 sets of records were used for the experiments; the number of training sets was 2425 and the number of testing sets was 1290. Tables 8-10 show the settings and results of the different tests.

Test 2.1
The first test predicted the next 24-hour accumulated rainfall in Region 3 without fuzzification. The target outputs and predicted results were then classified into two classes (Rain, > 0.05 mm, and No Rain, <= 0.05 mm) by the classification method, after which the classification rate was calculated. Experimental results are shown in Table 8.

Test 2.2
The second test predicted the next 24-hour accumulated rainfall in Region 3 with fuzzification. The rainfall target outputs from the training and testing sets were first fuzzified into two fuzzified outputs (Rain and No Rain); the membership function setting is shown in Figure 34. The fuzzified outputs acted as target outputs for neural network training. After training, the predicted results and target outputs were classified into the two classes (Rain and No Rain, Figure 34) according to the larger value of the two fuzzy sets, after which the classification rate was calculated. Results are shown in Table 9.


Figure 34. Membership functions used in Test 2.2: 'No Rain' and 'Rain' cross over between 0.05 mm and 0.06 mm of rainfall.

Test 2.3
The third test predicted the next 24-hour accumulated rainfall in Region 3 with fuzzification, using fuzzy sets defined differently from those of the second test. Here the membership functions were defined with more fuzziness (more data exhibit some degree of fuzziness); the main difference between the two tests was the fuzzification parameter setting. In this test, the fuzzy set of rain ranged from 0 to 1 mm and the fuzzy set of no rain from 0.05 to 60 mm. The membership function setting is shown in Figure 35, and experimental results in Table 10.

Figure 35. Membership functions used in Test 2.3: 'No Rain' and 'Rain' cross over between 0.05 mm and 1 mm of rainfall.

First, comparing the first test (without fuzzification) with the second test (with fuzzification), it was found that after fuzzification the performance of the system was clearly better: the neural network could learn more efficiently and predict more accurately. In the first test (Test 2.1), many sample test data were wrongly classified from no rain to rain; the neural net overestimated the data as rain, making the classification rate of the no-rain fuzzy set very low. In sum, the prediction of the neural network was dominated


Table 8. Experimental results of Test Set 2.1.

Test   Input Variable No.                      Network        Classif. rate   Classif. rate   Average
                                               Architecture   (Rain)          (No Rain)       classification rate
2.1.1  [1-3, 5-14, 16-23 in Table 1] x 4       84-20-1        * 68.5%         * 5.61%         * 50.8%
       (four consecutive time periods) (multiple regions)
2.1.2  [10-14 in Table 1] x 4                  20-11-1        60.1%           0.57%           42.3%
       (four consecutive time periods) (single region)
* Optimal performance

Table 9. Experimental results of Test Set 2.2.

Test   Input Variable No.                      Network        Classif. rate of   Classif. rate of   Average
                                               Architecture   fuzzy set (Rain)   fuzzy set (No Rain) classification rate
2.2.1  [1-3, 5-14, 16-23 in Table 1] x 4       84-20-2        * 77.4%            * 52.7%            * 70.1%
       (four consecutive time periods) (multiple regions)
2.2.2  [10-14 in Table 1] x 4                  20-11-2        66.1%              37.6%              51.9%
       (four consecutive time periods) (single region)
* Optimal performance

Table 10. Experimental results of Test Set 2.3.

Test   Input Variable No.                      Network        Classif. rate of   Classif. rate of   Average
                                               Architecture   fuzzy set (Rain)   fuzzy set (No Rain) classification rate
2.3.1  [1-3, 5-14, 16-23 in Table 1] x 4       84-20-2        * 88.5%            * 66.7%            * 82.1%
       (four consecutive time periods) (multiple regions)
2.3.2  [10-14 in Table 1] x 4                  20-11-2        68.4%              53.5%              62.0%
       (four consecutive time periods) (single region)
* Optimal performance


by one fuzzy set (rain). For Test Set 2.2, there was a significant improvement in the rainfall classification rate, especially for the "No Rain" category, which was poorly classified in Test Set 2.1, the one without fuzzification.

Test Set 2.3 actually gave the best performance. Its fuzzy sets were defined with more fuzziness (more sample data exhibit some degree of fuzziness), and the peak performance was an average classification rate of 82.1%. From the second and third tests, it was found that the definition of the membership functions for the fuzzy sets affects performance considerably. This suggests that integrating a certain degree of fuzzification into the FFBP model has a promising effect on prediction results.

Comparing the multi-region data with the single-region data across the three tests, the multi-region data actually performed better, and the improvement was much more significant than for temperature prediction.

7 Conclusion

In this chapter, we have presented four major application tools for teaching and research studies, in order to illustrate the academic concepts of neural nets, fuzzy systems and genetic algorithms. As pointed out in Section 1, the main aim of this study is not to compare the strengths and weaknesses of the different application tools, but to demonstrate their usefulness in teaching these new computing concepts and to provide a useful indication and recommendation to counterparts, colleagues and collaborators for selecting the teaching or application tools that best meet individual needs. Nevertheless, the following guidelines may be useful for selecting application tools: (1) in the case of noisy or low-quality input data, where extensive data preprocessing or an objective data selection scheme is needed, NeuralSIM is the most appropriate choice; (2) if external modules (e.g. tailor-made pre-processing or transfer functions) need to be integrated into the application tool, NeuroSolutions provides the most flexible and modular (object-oriented) scheme, together with fancy probing features; (3) if the main purpose of using the application tools is simply to illustrate and compare the features of different neural network architectures, Professional II Plus can be the most handy solution; and (4) if the user wants to implement a time-series forecasting system and would like to apply other techniques such as Genetic Algorithms (GAs), Neuro-Forecaster is the best option, as it provides a network construction option known as "Genetica Net Builder" which makes use of GAs for network construction and optimization.

In order to illustrate the concepts of AI modeling and differentiate the main features of the different application tools, case problems on weather forecasting in the Hong Kong region were discussed in Section 4. We demonstrated that different tools can be applied appropriately to fulfil the needs of integrating AI models for solving various problems. In terms of system practicality, we note that a significant improvement in temperature and rainfall forecasts was achieved by using the multi-station approach in comparison with the single-station one [19].

Besides, with the integration of a fuzzy system into the feedforward backpropagation neural network, a significant improvement in rainfall and temperature prediction was attained.

Acknowledgments

We are grateful to the Hong Kong Observatory of the Hong Kong Special Administrative Region of the People's Republic of China for the provision of the necessary weather data for the case study in this research. We are also grateful to the Hong Kong Polytechnic University for its partial support through RGC Grant #B-Q092 to complete these case studies.


References

[1] Chung, C.C. and Kumar, V.R. (1993), "Knowledge Acquisition Using a Neural Network for a Weather Forecasting Knowledge-Based System," Neural Computing and Applications, No. 1, pp. 215-223.

[2] Davies, E.R. (1990), Machine Vision: Theory, Algorithms, Practicalities, Academic Press Ltd.

[3] Dubois, D. and Prade, H. (1981), Fuzzy Sets and Systems: Theory and Applications, Academic Press.

[4] Enbutsu, I., Baba, K., and Hara, N. (1991), "Fuzzy Rule Extraction from a Multilayered Neural Network," Int. Joint Conference on Neural Networks (IJCNN'91), Singapore.

[5] Fausett, L. (1994), Fundamentals of Neural Networks: Architectures, Algorithms, and Applications, Prentice Hall Inc.

[6] Goldberg, D.E. (1989), Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley.

[7] Janikow, C.Z. (1994), "Learning Fuzzy Controllers by Genetic Algorithms," Proceedings 1994 ACM Symp. On Applied Computing, ACM Press, New York.

[8] Klir, G.J. and Folger, T.A. (1982), Fuzzy Sets, Uncertainty, and Information, Prentice Hall.

[9] Kosko, B. (1992), Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence, Prentice Hall Inc.

[10] Lee, R.S.T. and Liu, J.N.K. (1999), "An Automatic Satellite Interpretation of Tropical Cyclone Patterns Using an Elastic Graph Dynamic Link Model," International Journal of Pattern Recognition and Artificial Intelligence (to appear).

[11] Lee, R.S.T. and Liu, J.N.K. (1999), "An Oscillatory Elastic Graph Matching Model for Scene Analysis," Proceedings of the International Conference on Imaging Science, Systems, and Technology (CISST'99), June 28-July 1, Monte Carlo Resort, Las Vegas, Nevada, U.S.A.

[12] Li, B., Liu, J. and Dai, H. (1998), "Forecasting from Low Quality Data with Applications in Weather Forecasting," Int. Journal of Computing and Informatics, Vol. 22, No. 3, pp. 351-358.

[13] Liu, N.K. (1988), Computational Aspects of a Fine-mesh Sea Breeze Model, M.Phil. Dissertation, Department of Mathematics, Murdoch University, Western Australia.

[14] Liu, N.K. and Lee, K.K. (1997), "An Intelligent Business Advisor for Stock Investment," Expert Systems, Vol. 14, No. 3, Blackwell Publishers, U.K., pp. 129-139.

[15] Liu, J.N.K. and Lee, R.S.T. (1999), "Rainfall Forecasting from Multiple Point Sources Using Neural Networks," Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC'99), October 12-15, Tokyo, Japan (to appear).

[16] Liu, J.N.K. and Lee, R.S.T. (1998), "Invariant Handwritten Chinese Character Recognition by Dynamic Link Architecture," Proceedings of ICONIP/JNNS'98, Vol. 1, pp. 275-278, Kitakyushu, Japan.

[17] Liu, J.N.K. and Sin, K.Y. (1997), "Fuzzy Neural Networks for Machine Maintenance in Mass Transit Railway System," IEEE Trans. on Neural Networks, Vol. 8, No. 4, pp. 932-941.

[18] Liu, J.N.K. and Tang, T.I. (1996), "An Intelligent System for Financial Market Prediction," Proceedings of the 2nd South China International Business Symposium, pp. 199-209, Macau.

[19] Liu, J. and Wong, L. (1996), "A Case Study for Hong Kong Weather Forecasting," Proceedings of Int. Conference on Neural Information Processing, pp. 787-792, Hong Kong.


[20] Luger, G.F. and Stubblefield, W.A. (1993), Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 2nd Edition, Benjamin Cummings Publishing Company Inc.

[21] McCulloch, W.S. and Pitts, W. (1943), "A Logical Calculus of the Ideas Immanent in Nervous Activity," Bulletin of Mathematical Biophysics, No.5, pp. 115-133.

[22] McGregor, J.L., Walsh, KJ. and Katzfey, J.J. (1993), "Climate Simulations for Tasmania," Fourth Int. Conference on Southern Hemisphere Meteorological and Oceanography, American Meteorological Society, pp. 514-515.

[23] Pal, S.K and Wang, P.P. (1996), Genetic Algorithm for Pattern Recognition, CRC Press.

[24] Silvermann, R. and Bernstein, R. (1971), "Digital Techniques for Earth Resource Image Data Processing," Proceedings of the American Institute of Aeronautics and Astronautics, Vol. 21, pp. 455-468.

[25] Slezak, E., Bijaoui, A. and Mars, G. (1990), "Structures Identification from Galaxy Counts: Use of the Wavelet Transform," Astronomy and Astrophysics, Vol. 227, pp. 301-316.

[26] Widrow, B. and Smith, F.W. (1963), "Pattern-Recognition Control Systems," Proceedings of Computer and Information Science Symposium, Spartan Books, Washington, DC, U.S.A.

[27] Wong, F., Wang. P.Z. and Goh. T.R. (1991), "Fuzzy Neural Systems for Decision Making," Proceedings of the Int. Joint Conference on Neural Networks (IJCNN'91), Singapore.

[28] You, J., Liu, J. and Lee, R. (1999), "An Image Matching Approach to Tropical Cyclone Pattern Recognition," Proceedings of International Conference on Imaging Science, Systems, and Technology (CISST'99), June 28-July 1, Monte Carlo Resort, Las Vegas, Nevada, U.S.A.

Page 98: [Studies in Fuzziness and Soft Computing] Innovative Teaching and Learning Volume 36 ||

CHAPTER 3

ARTIFICIAL INTELLIGENCE TECHNIQUES FOR AN INTERDISCIPLINARY SCIENCE COURSE

C.L. Karr¹, C. Sunal², and C. Smith²

¹ Department of Aerospace Engineering and Mechanics
² Area of Teacher Education
The University of Alabama
Tuscaloosa, AL 35487-0280
USA

This chapter describes an innovative course developed and taught at The University of Alabama in which students in the College of Education are given an overview of artificial intelligence (AI) techniques including expert systems, fuzzy systems, neural networks, and genetic algorithms. The class, ESM 130: Artificial Intelligence Systems in Science, was developed and is team-taught by professors from the Colleges of Engineering and Education. When artificial intelligence techniques are taught in an engineering or computer science curriculum, the focus is generally on the mathematical or algorithmic details of the various techniques. In ESM 130, however, the focus is on the biological systems upon which most artificial intelligence techniques are based, and the subsequent modeling of the biological paradigm. The goal of the class is to provide future educators with enough information about the science of the twenty-first century to effectively educate, motivate, and stoke the fires of inquiry burning in their future students. To do this, they must have at least a fundamental understanding of artificial intelligence: what it is, where it comes from, and what it can be used for.

1 Introduction

Artificial intelligence (AI) techniques are becoming more and more prominent in the fields of science, engineering, medicine, business, transportation, manufacturing, and others. Thus, it is extremely important for today's students to gain a true appreciation for what these techniques can do. To achieve this goal, it is imperative that future teachers (1) gain a general understanding of the mechanics of AI techniques, (2) understand what the techniques can be used to do, and (3) know how, in principle, AI techniques are developed. To this end, researchers from the Colleges of Engineering and Education at The University of Alabama developed a college course offered to undergraduate teacher education majors. The course, ESM 130: Artificial Intelligence Systems in Science, is a four semester-hour credit course that serves as an education major's interdisciplinary science elective. This course combines laboratory activities, tutorials, lectures, simulations, and several other strategies for effective learning.

This innovative, interdisciplinary course, ESM 130: Artificial Intelligence Systems in Science, was developed under a grant from the National Aeronautics and Space Administration's NOVA project. Project NOVA is a NASA-sponsored program created to develop and disseminate a national framework for enhancing science, mathematics, and technology literacy for pre-service teachers in the twenty-first century. Currently, NOVA consists of a network of over fifty member institutions working to produce enhanced scientific literacy for pre-service teachers. This effort is being accomplished through the demonstration of an undergraduate science and mathematics course framework, examples of successful course models, and a mentoring support system for faculty wishing to implement new courses or modify existing courses at their universities. The framework uses interactive learning and integrates science, mathematics, and technology as a means of developing a new paradigm for educating pre-service teachers.

Teaching the fundamentals of artificial intelligence to pre-service teachers tends to create some interesting challenges. First, these students rarely have a solid background in mathematics. The concepts of linear algebra, integration, and even probability cannot be used in the presentation. Second, these students often lack the computer programming skills needed to implement the algorithms that form the AI techniques considered. Finally, these students generally are not interested in the details and complexities of the problems these techniques often are used to solve. These considerations can be overcome in teaching the techniques to education majors because the goals of such courses for these students differ considerably from those of engineering or computer science students.

To begin to understand the goals of this course, consider a scenario that, in part, helped motivate its creation. A middle school student goes to her eighth grade science teacher and asks the question: "Ms. Jones, the character Data on the television show Star Trek is a neural network, what is that?" At this point, most eighth grade science teachers are forced into an answer something like the following: "Well, Sue, that is just science fiction; we really can't do that." Unfortunately, this answer is not correct since we can "do that." However, this is not the worst outcome of this scenario. The worst outcome of such a response from a teacher is that it may cause a young student to lose interest in a scientific field. A much more appropriate and nurturing answer might be: "Sue, neural networks are loose computer models of the human brain. They are different from conventional computer programs in that, like humans, they can improve their performance over time via their experience in problem solving. Although we have not yet constructed neural networks as sophisticated as Data on Star Trek, in theory we certainly could. Why don't you let me dig out some reference material from the World Wide Web that you might find interesting." Developing teachers who can nurture students' scientific interests in artificial intelligence with responses such as this is the focus of ESM 130: Artificial Intelligence Systems in Science.

The intent of the course is to introduce students to artificial intelligence systems by relating them to systems found in nature with which the students already are familiar. Many AI techniques can be presented as computer models of systems in nature. The course presents a progression of such techniques, involving students in the modeling of natural systems. Further, it teaches the "process" of doing science, which is fundamental to scientific literacy. There are four specific goals for the course:

(1) to help develop students' understanding of how systems in nature are modeled,

(2) to make students aware of the types of problems that routinely are solved in the natural sciences using AI techniques,

(3) to provide students with an introduction to AI techniques, and

(4) to provide students with the opportunity to solve scientific problems in a laboratory setting using AI tools they acquire from the Internet or that are provided by the instructor.

The course is a hands-on, minds-on experience for students. Each student is provided the opportunity to solve interesting problems from the natural sciences in a laboratory setting using modern computer software, as is done by most scientists and engineers. In solving their respective problems, students are expected to apply the scientific method and draw conclusions based on their work. In addition, they form conclusions and impressions about the relative strengths and weaknesses of the various AI techniques as they are used to solve problems in several scientific disciplines. Thus, they have hands-on experiences that they are required to think about. This is hands-on, minds-on learning where problem solving occurs through many trials.

This chapter provides an overview of ESM 130: Artificial Intelligence Systems in Science as it is taught at The University of Alabama. After providing an overview of the course, an example of how a specific AI technique, genetic algorithms, is presented in the class is given. Finally, conclusions are drawn and the future of the course is discussed.

2 Course Description

This course provides students with an overview of several AI systems typically used to solve scientific problems. Students have opportunities to solve interesting problems using AI techniques. The AI techniques that are applied in laboratory activities include expert systems, neural networks, fuzzy logic, and genetic algorithms. Each of these techniques can be thought of as a model of a system in nature. In this course, emphasis is placed on natural systems such as the human nervous system, the immune system, and the genetic system and how each relates to an AI technique. Further, it teaches the "process" of doing science that is fundamental to scientific literacy.

The systems and corresponding AI techniques are explored by students using a structured investigative process. Initially, a system in nature is presented. Then, a corresponding AI technique is introduced as a computer model of the natural system. Next, methods of modeling the system on a computer are explored. Students acquire computer models of the natural systems from the Internet or the instructor, and apply the model to a scientific problem in a laboratory setting. They collect and analyze resulting data and decide whether initial hypotheses regarding the system are supported. Finally, closure is achieved through the discussion of problems that have been solved in the natural sciences using the various AI techniques addressed.

2.1 Course Focus

It is imperative that teachers of science at the elementary through high school levels, engineers, and scientists from all disciplines have at least a rudimentary knowledge of how to model natural systems. This includes some knowledge of the mechanics and capabilities of AI techniques and the natural systems on which they are based.

Many of the most popular AI techniques used today are based on systems found in nature. For instance, neural networks are models of the human neurological system, genetic algorithms are models of the human genetic system, and there are emerging AI techniques that are models of immune systems. Students receive several distinct benefits when common AI techniques are introduced as computer models of natural systems. First, they gain a better appreciation for the details of the natural systems. Second, they better understand the modeling of a system for which they already have gained a fundamental understanding. Third, they better understand the AI technique. Fourth, they gain some understanding of the synergy of systems required to produce intelligence in nature. Fifth, they gain some understanding of the role the AI technique eventually may have in the development of truly intelligent machines of the future that will rely on a synergy of several AI techniques, just as intelligence in nature relies on a synergy of several systems.

One of the most important aspects of learning about a new technique is to understand its capabilities and limitations. This is a crucial and fundamental component in modern science and engineering. Students in ESM 130 have opportunities to use computer programs employing AI techniques to solve scientific problems in a laboratory setting from a variety of natural science fields. For instance, students determine the decay rate of nuclear material, track the trajectory of a planet, and determine the growth rate of a virus introduced into a population of organisms by using neural network software. This hands-on and reflective minds-on experience in using the AI techniques is invaluable in the attempt to develop students' understanding of the systems investigated and the strengths and weaknesses of the various modeling methods introduced.

An appreciation of the wide variety of scientific disciplines in which AI techniques can be used for modeling is useful to pre-service education students because it enables them to communicate to others the ways in which current science is done. Too often pre-service teachers are exposed only to classic scientific studies. While these are important, emerging scientific techniques for the investigation of today's scientific problems often are not studied. Thus, an important facet of this course is to provide students with general information on problems in the natural sciences solved using AI techniques as well as other means.

2.2 Topics

ESM 130: Artificial Intelligence Systems in Science is taught in a 15-week semester period. After introducing the idea of systems and computer modeling, the course material focuses on specific AI techniques. The course culminates with the presentation of student projects. These student projects involve developing a presentation to a group of students in which one of the AI techniques presented in the course is explained. Eight major topics organize the course. These are described below.

The first topic is an introduction to the course and to technological tools used in the course such as electronic mail and the Internet. This occurs during week one of the course. All assignments and information dissemination in this course are done electronically. Thus, the first week of class is focused on providing students with information needed to use electronic mail to acquire and turn in assignments, and to use the Internet to obtain copies of course lectures that are made available on a course Web page³ and to locate and acquire computer software implementing the AI techniques to be introduced later.

³ The Web page address is: http://galabl.mh.ua.edu/classes/esm130.shtml

The second topic is an introduction to basic theory of systems. This topic is addressed during week two of the course. The concept of a system is such an important theme that it pervades all of the sciences. Students develop their own definitions of a system, which they compare and contrast with "standard" definitions. Generally, in terms of this course and more widely in the sciences, a system is thought of as any collection of things that have some influence on one another [1]. Additionally, students participate in group-learning activities in which they explore the relationship between various human systems. They develop a preliminary working model identifying which of these systems are necessary for intelligence.

The third topic addressed in this course is an overview of systems in science that have been simulated by artificial intelligence techniques. During weeks three and four this topic is investigated. Students are familiarized with some common instances of scientific systems that have been simulated by AI techniques. They also explore a classic test for AI called the Turing Test. Here, the focus is on the wide range of scientific disciplines and systems in which AI has been applied with special emphasis on the natural sciences including geology, chemistry, physics, astronomy, and biology.

The remainder of the course is devoted to the introduction and use of specific systems in science and the related AI technique heavily used to solve problems related to each system. These are presented in units. Each unit consists of the following four parts:

(1) an introduction to a system in nature,

(2) a presentation of a method for modeling the system,

(3) a laboratory session presenting students with a problem related to the system, which they investigate and then solve by applying a computer model, and

(4) a discussion of complex scientific problems that have been solved using the AI technique.

Five topics are investigated using this structure. They constitute topics four through eight of the course. Each is described below.

The fourth topic is animal classification using tree diagrams. It is explored during weeks five and six of the course. The hierarchical structure by which animals are classified is studied using tree diagrams. The students construct their own hierarchical diagrams representing the classification of animals. These hierarchical diagrams are converted into tree diagrams. Mechanisms for modeling and searching hierarchical diagrams are discussed. At this point, the students solve a classification problem using a computer model of their tree diagrams in conjunction with algorithms for searching both depth and breadth first. Aside from discussions of scientific problems requiring rapid searching of hierarchical structures as in medical diagnosis, game-playing computer programs are presented as a popular instance of systems in which computers are used to search tree diagrams to make effective decisions rapidly.
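
To make the search mechanics concrete, the sketch below walks a small classification tree both depth first and breadth first. It is a minimal illustration, assuming an invented tree; the node names are not taken from the course materials.

```python
from collections import deque

# A hypothetical animal-classification tree of the kind students build;
# the structure and labels are invented for illustration.
tree = {
    "animal": ["vertebrate", "invertebrate"],
    "vertebrate": ["mammal", "bird"],
    "invertebrate": ["insect", "mollusc"],
    "mammal": [], "bird": [], "insect": [], "mollusc": [],
}

def depth_first(root, target):
    """Visit nodes depth first; return the visit order up to the target."""
    stack, visited = [root], []
    while stack:
        node = stack.pop()
        visited.append(node)
        if node == target:
            return visited
        stack.extend(reversed(tree[node]))  # keep left-to-right order
    return visited

def breadth_first(root, target):
    """Visit nodes level by level; return the visit order up to the target."""
    queue, visited = deque([root]), []
    while queue:
        node = queue.popleft()
        visited.append(node)
        if node == target:
            return visited
        queue.extend(tree[node])
    return visited

print(depth_first("animal", "bird"))    # ['animal', 'vertebrate', 'mammal', 'bird']
print(breadth_first("animal", "bird"))  # ['animal', 'vertebrate', 'invertebrate', 'mammal', 'bird']
```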

The fifth course topic is developing rules for modeling systems in science: specifically, expert systems. This topic is addressed during weeks seven and eight. Students draw on their experience in classifying animals to develop a set of rules for accomplishing this goal. Methods for modeling these rules are discussed. Students expand this idea by working with another system, using computer software in which an expert system is able to successfully classify mineral samples. The students complete a laboratory experience in which they perform mineral classification, and then use the computer software to check their thought processes against that of the expert system. Finally, additional examples of modeling rules for systems from chemistry, medicine, geology, and engineering are presented.
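
The flavor of such a rule-based classifier can be captured in a few lines. The sketch below is an assumed illustration: the minerals, properties, and rules are invented, whereas a real expert system would encode a geologist's actual knowledge.

```python
# A minimal forward-chaining rule base in the spirit of the mineral lab;
# every rule and property here is hypothetical.
rules = [
    (lambda m: m["hardness"] >= 7 and m["luster"] == "glassy", "quartz"),
    (lambda m: m["hardness"] <= 3 and m["reacts_with_acid"], "calcite"),
    (lambda m: m["luster"] == "metallic" and m["streak"] == "black", "magnetite"),
]

def classify(mineral):
    """Fire the first rule whose conditions match the observed properties."""
    for condition, label in rules:
        if condition(mineral):
            return label
    return "unknown"

sample = {"hardness": 7, "luster": "glassy", "reacts_with_acid": False, "streak": "white"}
print(classify(sample))  # quartz
```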

The sixth topic addressed in the course is manipulating scientific concepts to solve complex scientific problems: fuzzy logic. This topic is studied during weeks nine and ten of the course. Humans effectively manipulate abstract concepts to solve complex problems through fuzzy logic. Computers also use fuzzy logic to solve problems. In this course fuzzy logic is presented as an extension to expert systems. There are numerous classification tasks in which the evidence is not as concrete as in the above example of mineral classification. Fortunately, nature provides mechanisms for dealing with imperfect information, and for manipulating abstract concepts. For instance, in diagnosing illnesses physicians often must utilize subjective information such as "my head hurts." The students discover the need for such fuzzy systems as they solve a simulated problem from chemistry in a laboratory session. Specifically, the students solve a titration problem in a computer-simulated environment. They work with an imperfect pH sensor and determine the need for subjective descriptions as they develop rules for neutralizing a solution. Fuzzy systems developed in chemistry, engineering, and physics are presented to provide the students with a feel for the wide-ranging applicability of fuzzy logic.
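
One way to picture the machinery behind such rules is with triangular membership functions over pH readings. The breakpoints and the dosing rule in the sketch below are assumptions for illustration, not values from the course software.

```python
# A sketch of the fuzzy reasoning behind the titration lab: membership
# functions turn a noisy pH reading into graded subjective labels.
def triangular(x, left, peak, right):
    """Degree of membership in a triangular fuzzy set (0.0 to 1.0)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def describe_ph(ph):
    # Hypothetical breakpoints for "acidic", "neutral", and "basic".
    return {
        "acidic":  triangular(ph, 0.0, 3.5, 7.0),
        "neutral": triangular(ph, 5.5, 7.0, 8.5),
        "basic":   triangular(ph, 7.0, 10.5, 14.0),
    }

memberships = describe_ph(6.2)       # a noisy sensor reading
print(memberships)
# A rule such as "if acidic then add base" fires in proportion to the
# truth degree rather than as an all-or-nothing decision.
dose = memberships["acidic"] * 10.0  # e.g., mL of base, scaled by truth degree
print(round(dose, 2))
```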

The seventh topic investigated in the course is the human brain as a system: modeling it using neural networks. This topic is investigated during weeks eleven and twelve of the course. Models of the human brain are found in the form of neural networks. This unit begins with an exercise in which students describe their impression of how the human brain works. Next, they diagram a simple vision of how they think the human brain makes a decision. Then, they develop an initial concept of what a neural network is and how it works. They receive a description of the general way in which a neural network functions, and computer software for implementing a backpropagation neural network. Students solve one of several problems from natural science including: determining the decay rate of nuclear material, tracking the trajectory of a planet, and determining the growth rate of a virus introduced into a population of organisms. For closure, examples of problem solving and simulation using neural network applications in chemistry, geology, astronomy, and engineering are investigated.
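
As a rough illustration of what the provided backpropagation software does internally, the sketch below fits synthetic radioactive-decay data with a one-hidden-layer network. The network size, learning rate, and decay constant are all invented; this is a minimal sketch, not the course's actual program.

```python
import numpy as np

# Toy decay-rate lab: fit noisy samples of S(t) = S0 * exp(-lambda * t)
# with a small network trained by full-batch backpropagation.
rng = np.random.default_rng(0)
t = np.linspace(0, 5, 50).reshape(-1, 1)
S = np.exp(-0.8 * t) + rng.normal(0, 0.01, t.shape)  # assumed decay constant 0.8

W1 = rng.normal(0, 1, (1, 8)); b1 = np.zeros((1, 8))
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros((1, 1))
lr = 0.05

for epoch in range(5000):
    h = np.tanh(t @ W1 + b1)                 # forward pass
    y = h @ W2 + b2
    err = y - S
    # backward pass: gradients of the mean-squared error
    dW2 = h.T @ err / len(t); db2 = err.mean(0, keepdims=True)
    dh = err @ W2.T * (1 - h**2)
    dW1 = t.T @ dh / len(t); db1 = dh.mean(0, keepdims=True)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

pred = np.tanh(t @ W1 + b1) @ W2 + b2
print(float(((pred - S) ** 2).mean()))       # final mean-squared fitting error
```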

The eighth and final course topic is the human genetic system: modeling it using genetic algorithms. This topic is studied during weeks thirteen and fourteen of the course. Human genetic systems are modeled using genetic algorithms. In this unit, the students' understanding of genetics is explored and developed. The steps necessary for modeling the genetic system are discussed. Computer software for implementing a genetic algorithm is provided and used to solve one of several scientific problems including performing spectral analysis of chemical compounds. Finally, several examples are presented of systems where problems are solved using genetic algorithms.

During the final week of the course, week fifteen, the focus is on student projects demonstrating their conceptualization of a major course topic. The students present the results of a course project assigned early in the course. The projects require them to demonstrate a fundamental understanding of at least one of the major topics presented in the class. Students develop a presentation consisting of exercises and discussion material designed to describe a system in science, a problem related to the system that has not been presented in class, and an AI technique that might be used to solve a problem related to the system.

3 Using a Research-Based Innovative Teaching Methodology: The Learning Cycle

This class uses a research-based teaching methodology, the learning cycle. The learning cycle is a three-phase method of teaching based on what cognitive theorists, such as Jean Piaget, found were the best conditions for learning and constructing concepts [3], [5]. The three phases of the learning cycle have been given various names, but will be referred to here as Exploration, Invention, and Expansion [4]. Each phase is described below.

The first stage of the learning cycle is the Exploration. It is in this stage that students participate in "hands-on" activities that allow them to explore materials and manipulate objects. These activities may include discrepant events, which are experiments, activities, etc., that do not have the results expected by students [2]. This phase of the learning cycle asks students to access their prior knowledge and relate it to the new information they have witnessed, challenging them to reform and reconstruct their existing inaccurate concepts.

The teacher's role in the Exploration phase is to observe, assist, and facilitate student interaction. The teacher also may ask guided questions such as "What did you discover?" and "What can you try?". This phase enables teachers to ascertain what prior knowledge students are working with. It also enables teachers to confront students' knowledge with experiences motivating them to seek new ideas.

The second phase of the learning cycle is the Invention. It is in this phase that students use the experiences and ideas gathered in the Exploration to actually invent for themselves a more scientifically accurate idea of the concept being studied [6]. Through teacher assistance, the observations made and data collected in the Exploration are interpreted. The teacher also uses questions to help guide the students in the formation of the concept studied. Examples of questions that may be used are "What does this mean?", "What do you think?", etc. It is important to note that the lesson concept is not formally introduced until the students have had opportunity for "hands-on" exploration related to the concept, which occurred in the first phase of the cycle.

The teacher's role in the Invention phase is to clarify and formally introduce the students to the concept of the lesson. The students are provided with terminology and other relevant information. This distribution of information may sometimes include direct instruction. Misconceptions must be revised for the students as well. A range of examples of the concept is investigated. More examples of the concept also may be presented so students can eliminate similar ideas that may confuse them.

The final phase of the learning cycle is the Expansion. In this phase, the students' new ideas are applied to situations that are different, but related to those explored in earlier parts of the learning cycle. This allows for the reinforcing and cementing of their newly formed concepts. The more widely students apply an idea, the more useful it becomes.

The teacher's role in the Expansion phase is to provide students with problem-solving situations that use information from the previous stages in order to expand the students' new concepts. As students find more and more applications for the concept, they expand its potential usage; thus, this is called the Expansion phase.

4 An Example: Genetic Algorithms

This section describes an example of the use of the learning cycle as employed to teach students in ESM 130: Artificial Intelligence Systems in Science about genetic algorithms. Genetic algorithms are search algorithms based on the mechanics of natural genetics. They represent a computer-based technique rich in complex probability-based theory regarding their convergence characteristics. However, since the typical student in ESM 130 has neither the mathematical nor the computer background to fully appreciate the intricate details of genetic algorithms, the focus in the class is on providing the students with an overall viewpoint of how and why genetic algorithms work. This approach is consistent with the goals of ESM 130: Artificial Intelligence Systems in Science.

What follows in this section is a synopsis of a "lesson plan" for the presentation of genetic algorithms. This approach is followed closely in the course and represents approximately six hours of combined lecture/laboratory class time.

4.1 Exploration Phase

A. Objective: Students begin to recognize and state that there are many attributes in organisms that are pre-determined through genetics.

B. Procedure:

1. Students begin by completing the following group exercise:

Please list characteristics or attributes (desirable and undesirable) you believe the following might have (or not have): (a) a scientist, (b) a President, (c) an AI student, (d) a decathlete, and (e) a soldier.

2. After discussing these attributes or characteristics briefly in class, the students go back to their respective groups and identify which of these attributes are genetically determined, or are hereditary. Again, we discuss and expand their ideas in class. We make sure to note that this passing of information from one generation to the next is prevalent in all organisms. Then, the instance of the peppered moth in the midst of the Industrial Revolution is discussed. This is a classic example of an organism having to adapt to its environment for survival.

Given that this phenomenon does occur, the students need to discuss mechanisms they might employ to take advantage of it to achieve some sort of improvement or adaptation from one generation to the next. A group discussion takes place about Darwin's idea related to competitive environments and "survival of the fittest." This also is a good opportunity to discuss the Baldwin effect: the idea that we can learn things that we then pass on to our offspring. The students discuss this phenomenon, whether they think it is feasible, and try to come up with examples that might be appropriate.

4.2 Invention Phase

A. Objective: Students describe their understanding of the basic mechanics of the human genetic system.

B. Procedure:

1. The invention phase begins with groups of students writing a short paragraph on how they believe the human genetic system works. This paragraph can contain sketches if appropriate.

2. Once the groups complete the above task, the instructor leads the class in a discussion in which each group presents their ideas about the human genetic system and how it works. Following the discussion the instructor asks the key question, "What does a chromosome look like, and what is its purpose?" If the students did not specifically mention genes and chromosomes, then the instructor introduces to the class exactly what genes and chromosomes are, and the roles they play in the action of genetics. If the class mentioned genes and chromosomes, then the instructor has the groups draw a chromosome, label it, and describe what it does.

C. Closure: The instructor asks the key question, "How do you think we might use something like this to solve meaningful problems?"

1. At this point, a "black box optimization problem" is introduced. We talk about a situation in which we have a box that has a definite number of knobs to turn which effect some action on the world via a signal that is passed through the black box. Both the parameters (the knobs) and the effects (the output) can be multi-faceted, or vector quantities.

2. The students sketch some representation of this problem: they may get a table, a semantic net, a figure, or something else. After they present their ideas, they are shown a surface that represents a function of several variables.

They now work with a piece of black-box code constructed for them in Microsoft Excel. The code has some parameters that can be changed at the top of the spreadsheet. When altered, the spreadsheet produces five different plots of varying complexities. The functions that are plotted represent the output signal coming from their black-box. Some groups are asked to alter the parameters such that a given plot has the highest possible peak; some are asked to alter the parameters such that a given plot goes to zero at a specified time; some are asked to alter the parameters such that a given plot mirrors a curve that is superposed on their figure. All of the groups are given problems that in some way represent an optimization.
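
A minimal stand-in for such a black box, with the function and knob names invented for illustration rather than taken from the actual spreadsheet, might look like this:

```python
import numpy as np

# Three "knobs" shape an output curve; students adjust them to meet a goal
# such as raising the peak. The function itself is a hypothetical example.
def black_box(knobs, t):
    a, b, c = knobs
    return a * np.sin(b * t) * np.exp(-c * t)

t = np.linspace(0, 10, 200)
signal = black_box((2.0, 1.5, 0.3), t)
print(float(signal.max()))  # "how high is the peak?" -- one possible lab goal
```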

4.3 Expansion Phase

A. Objective: Students develop genetic operators and solve a scientific problem using genetic algorithm software supplied by the instructor.

B. Procedure:

1. The instructor reviews the concept of a black-box optimization problem by putting some information and sketches on the board. The instructor provides some examples of this type of problem from various scientific disciplines including chemistry, geology, and engineering. The examples generally are curve-fitting problems, examples of which include:

(a) Given data forming a parabola (x, y pairs from the equation y = c₁x² + c₂x + c₃), which incidentally can be thought of as the trajectory of a satellite about the earth, can we compute the values of the three constants?

(b) Given some radioactive decay data (S, t pairs according to the equation S(t) = S₀ exp(-λt)), which incidentally is the decay rate of a radioactive isotope, can we compute S₀ and λ?

(c) Given some data for the oscillatory motion of a spring-mass-dashpot system, which is governed by the equation y = A exp(-ωt) sin(ωt), can we compute the values of A and ω?

2. Now, the students are introduced to the idea that basically each parameter can be thought of as being analogous to a gene. They are presented with the idea that each problem solution can be represented as an artificial chromosome - as a real-valued vector.

3. At this point, the class is divided into groups, and the groups are asked to develop genetic operators that might simulate evolutionary properties occurring in nature.

4. The class is re-assembled and the standard genetic operators used in genetic algorithm applications are discussed. These include selection, crossover, and mutation. Mention is made that practitioners sometimes use more advanced genetic operations, including niching and recessive genes, though these are less common.

5. The students run genetic-algorithm based curve-fitting software provided by the instructor. The software uses the least median squares criterion to solve the optimization problems. It is fairly easy for the genetic algorithm to solve the problems presented to the class. However, the problems are virtually impossible for the students to solve by hand, using their trial-and-error process, in the short time allowed in class. Some examples the students worked on when neural networks were presented earlier in the class also are employed.
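
A bare-bones version of such a genetic algorithm, here fitting the parabola example from the list above, might look like the sketch below. The population size, the averaging crossover, and the use of plain mean-squared error (rather than the least median squares criterion of the actual software) are simplifications made for illustration.

```python
import numpy as np

# Fit y = c1*x^2 + c2*x + c3 with a real-valued genetic algorithm:
# each chromosome is a real-valued vector (c1, c2, c3).
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 40)
y = 1.5 * x**2 - 2.0 * x + 0.5           # data generated from known constants

def fitness(chrom):
    c1, c2, c3 = chrom
    return -np.mean((c1 * x**2 + c2 * x + c3 - y) ** 2)  # higher is better

pop = rng.uniform(-5, 5, (60, 3))
for generation in range(200):
    scores = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(scores)[-30:]]          # selection: keep better half
    i, j = rng.integers(0, 30, (2, 60))
    # crossover: average two random parents; mutation: small Gaussian noise
    pop = (parents[i] + parents[j]) / 2 + rng.normal(0, 0.1, (60, 3))
    pop[0] = parents[-1]                             # elitism: keep the best so far

best = max(pop, key=fitness)
print(np.round(best, 2))                             # should approach [1.5, -2.0, 0.5]
```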

C. Closure:

1. The students are brought back together to discuss their impressions of the genetic algorithm.

2. Some examples are discussed in which genetic algorithms are used to solve real-world scientific problems.

If time is available, mention is made of learning classifier systems. Learning classifier systems are machine learning algorithms that employ genetic algorithms as their adaptive component. This discussion dovetails with prior discussions of artificial intelligence systems such as computer chess games. An example of a learning classifier system learning to do novel maneuvers in a fighter plane is presented.

5 Conclusions

ESM 130: Artificial Intelligence Systems in Science is a hands-on, minds-on interdisciplinary science course for future teachers. It is offered as a course in the College of Engineering at the University of Alabama. As well as exposing students to developing technologies in the field of artificial intelligence, the course offers students an opportunity to practice the scientific method, to see how systems in nature are modeled using computers, and to embrace the learning cycle.

The goal of this course is to provide future teachers with a general understanding of some of the techniques that will play a role in developing technologies of the twenty-first century. It also aims to enable them to fuel the fires of interest that exist in many of their future students as these technologies continue to develop. The course attempts to achieve its goals by building on the students' knowledge of natural systems. Each artificial intelligence technique is introduced as a computational model of a system in nature. The students explore what they currently know about the natural system, discover how a natural system might be modeled using a computer, and expand their knowledge by using a computer model of a natural system to solve a meaningful problem.

Initially, ESM 130: Artificial Intelligence Systems in Science was taken exclusively by education majors. However, in the most recent offering of the course, two engineering majors enrolled in the course. At first, there was a concern that this might create a problem since these two students had superior mathematical backgrounds. However, these two students interacted well with the education majors. In fact, the engineering majors served as tutors for the education majors in the area of mathematics; the education majors, on the other hand, served as tutors for the engineering majors in the area of the natural sciences. The result was some highly competitive/cooperative teams. Some extraordinary projects resulted when the "mix was just right."

Like the natural systems studied in the course, ESM 130: Artificial Intelligence Systems in Science is an ever-changing entity. The course is part of a responsive curriculum at the University of Alabama meant to constantly meet the needs of the students, industry, and the university itself. Thus, refinements are constantly made in the way the material is presented. As stated by one of the instructors, "if given enough time, we will eventually get it right." This is an interesting comment since the course consistently gets very positive feedback from students, faculty, and the administration.

Acknowledgments

This effort was supported by the National Aeronautics and Space Administration Headquarters' NOVA project.

References

[1] American Association for the Advancement of Science (1993), Benchmarks for Science Literacy, Oxford University Press, New York.

[2] Driver, R., Leach, J., Millar, R., and Scott, P. (1996), Young People's Images of Science, Open University Press, Philadelphia, PA.

[3] Karplus, R. (1979), "Teaching for the Development of Reasoning," in A.E. Lawson (Ed.), 1980 AETS Yearbook: The Psychology of Teaching for Thinking and Creativity, ERIC/SMEAC, Columbus, OH.

[4] Lawson, A.E., Abraham, M.R., and Renner, J.W. (1989), "A Theory of Instruction: Using the Learning Cycle to Teach Concepts and Thinking Skills," National Association for Research in Science Teaching, Monograph #1, Atlanta, GA.

[5] Furth, H.G. (1969), Piaget and Knowledge: Theoretical Foundations (foreword by J. Piaget), Prentice Hall, Englewood Cliffs, NJ.

[6] Sunal, D. and Sunal, C. (1991), "Young Children Learn to Restructure Personal Ideas About Growth in Trees," School Science and Mathematics, Vol. 91, No. 7, pp. 18-25.

CHAPTER 4

ON THE ARCHITECTURE OF INTELLIGENT TUTORING SYSTEMS AND ITS APPLICATION TO A NEURAL NETWORKS COURSE

J.F. Vega-Riveros

Department of Electronic Engineering
Javeriana University, Santafé de Bogotá, D.C., Colombia

This chapter presents the architecture of an Intelligent Tutoring System for a Neural Networks course that consists of four basic modules. The subject knowledge-base stores an object-oriented representation of the theme knowledge. The student modeling module traces the user interaction with the system and assesses goal accomplishment and motivation towards achievement. The instructional strategy module in this implementation consists of a knowledge navigation tool based on a concept-space metaphor and an automatic multiple choice question generator. The user interface, based on the concept-space metaphor, provides the means for the student to access hypermedia information that includes theory and examples, the question generator and the neural networks simulators. The description of the architecture is followed by a presentation and analysis of a learning model, on the basis of which a new Intelligent Tutoring System architecture using collaborating agents is proposed.

1 Introduction

As the need for a wider coverage of high quality education at all levels becomes a strategic development issue, technology is regarded as instrumental towards making this goal feasible. The inclusion of Artificial Intelligence (AI) techniques in the development of computerized tutoring systems may provide means to achieve individual tutoring, while the greater availability of communication services can be used to promote another important aspect of education, namely collaborative work. In this chapter we present an implementation of an Intelligent Tutoring System (ITS) for a Neural Networks course which consists of four basic modules: the subject knowledge-base, a student modeling expert system, an instructional model and the user interface.

The subject knowledge-base contains the course material structured in an object-oriented fashion. This structure provides a basis for the student modeling expert system to assemble both the maps of goals to accomplish and goals accomplished. Additionally, the student model includes a measure of the level of concept acquisition and motivation. The instructional model considers two aspects: automatic question generation and knowledge navigation. The automatic question generation exploits the types of relation embedded in the subject knowledge-base to produce multiple choice questions. The knowledge navigation is based on a space metaphor to present the user options to move through the course material. The user interface has a structure parallel to the subject knowledge-base, and through hypertext links associated with the knowledge navigation presents the course material to the student.

While the first implementation emphasized individualization, the analysis of the experiments with some parts of the ITS and other types of tutoring systems developed at the Department of Electronic Engineering at Javeriana University showed other aspects that should be considered in developing Computer Aided Teaching and Computer Assisted Learning (CAT/CAL) systems. Those aspects are related to the processes of learning at different levels, from the early stages of information reception to advanced stages where creativity is a central process, passing through simple problem solving, articulation, and argumentation of what has been learned.

Feedback is another major aspect to be considered. Traditionally, feedback is given through right/wrong replies to problem answers. Nevertheless, discussion groups may provide additional means to self-assess learning and promote advanced knowledge communication processes among students. Based on the analysis of the teaching-learning process, we propose the development of a new and more comprehensive architecture that includes additional information and processes that can be used by both teacher and student.

2 The Basic Architecture of the Intelligent Tutoring System

An ITS is composed basically of four modules as shown in Figure 1. The subject knowledge-base contains a representation of the knowledge related to the course material. This knowledge-base serves two purposes: on one hand it may be part of an expert system that assists the student in solving problems, and on the other hand, it is the basis for building the model of the cognitive structure of the student. The student modeling expert system provides information for the instructional model to guide the student through the course material and automatically generate questions. Finally, the user interface is the communication module between the student and the system. These modules are described in detail below.

Figure 1. Basic structure of an ITS.

2.1 The Subject Knowledge-Base

The knowledge-base, depending on the application, can use any of the representation schemes utilized in expert systems. In the ITS on Neural Networks, the knowledge-base uses the object-orientation paradigm and is structured as a hierarchical array of classes and objects, with their associated properties and methods together with the inheritance mechanisms [1]. We have defined four types of relations that can exist between the different entities in the knowledge-base:

• category relations which are present between a class and its subclasses and/or its objects;

• membership relations which refer to the ones existing between an object and its sub-objects or components;

• functional relations that are represented by the methods in the object-oriented paradigm and represent the processes that relate entities in the object structure; and

• combinatorial relations that refer to structural similarity between parts of the knowledge-graph representation.

The first three types of relations can be easily represented in the object-oriented structure. The combinatorial relations require a fuzzy similarity measure, which is a very complex computational problem, and thus this type of relation is not used in this ITS. However, it is an interesting AI problem which would help bring about examples of knowledge-base sub-graphs different from the one being studied. For example, since neural networks are inspired by the nervous system, this type of relation finding may help clarify concepts of one by using examples of the other. Notice that even though there may be similarities between the knowledge sub-graphs, they may not have an identical structure.

Since we have not used the combinatorial relation in the ITS, we will not discuss it any further, and will concentrate on the other three.

2.1.1 Category Relations

These can be described as an is_a_type_of relationship. An example of this type of relation is presented in Figure 2. In all the figures in this chapter, classes will be represented by circles, objects by triangles, properties by squares and methods by diamonds. From this figure the following relations can be found:

feedback_neural_networks is_a_type_of neural_networks

feedforward_neural_networks is_a_type_of neural_networks

These two cases refer to subclass-class relations. The relations

Adaline is_a_type_of feedforward_neural_networks

multilayer_perceptron is_a_type_of feedforward_neural_networks

Hopfield_networks is_a_type_of feedback_networks

Boltzmann_machine is_a_type_of feedback_networks

are object-class relations.

Figure 2. Category relations.

2.1.2 Membership Relations

These refer specifically to the relations between sub-objects and objects and can be enunciated as is_a_component_of. An example of this type is shown in Figure 3. From this figure the following relation can be extracted:

inputs is_a_component_of neuron.

Notice that each component of a neuron has associated properties. In the example of Figure 3, output has an associated property value, while input has two properties: value and weight. The object input can be an array, thus representing the multiple inputs of a neuron.

Figure 3. Membership relation.

2.1.3 Functional Relations

In Figure 4, an example of this type of relation is shown. In this example, the method supervised_learning takes the property value of the object error and returns a value for the property weight of the object input. This relation involves three elements, an input, an output and the function itself. In the example of Figure 4, the following relation can be found:

supervised_learning uses error to_compute weight.

In this example, the method supervised_learning acts on properties of objects but could also create or delete instances of classes and objects, thus making the knowledge-base dynamic. In this case, properties for the newly created object are inherited from the classes it is attached to.

Figure 4. Example of a functional relation.
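
The three relation types map naturally onto ordinary object-oriented code. The sketch below mirrors the chapter's examples; the code itself is an assumed illustration, not the ITS implementation.

```python
# Category relations: subclass -> class, and object -> class.
class NeuralNetworks:
    pass

class FeedforwardNeuralNetworks(NeuralNetworks):
    pass

# Membership relation: input is_a_component_of neuron.
class Input:
    def __init__(self, value=0.0, weight=0.0):
        self.value = value
        self.weight = weight

class Neuron:
    def __init__(self):
        self.inputs = [Input(), Input()]   # components with their own properties
        self.output_value = 0.0

    def supervised_learning(self, error, rate=0.1):
        """Functional relation: uses error to_compute the input weights."""
        for inp in self.inputs:
            inp.weight -= rate * error * inp.value

adaline = FeedforwardNeuralNetworks()      # Adaline is_a_type_of feedforward net
print(isinstance(adaline, NeuralNetworks)) # the category relation holds: True
```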

2.2 The Student Modeling Expert System

Student modeling makes it possible to describe the student's behavior and level of concept acquisition in order to adapt a teaching strategy [2], [3]. Student modeling is a very complex task since it is a spontaneous, dynamic and non-monotonic process plagued with uncertainty. Students show a variety of behaviors during learning, depending on their environment, previous experience and knowledge. Additionally, there are multiple error sources in the student actions [4]. In our case, student modeling is based on the meaningful learning theory [5].

As mentioned above, the student model uses the subject knowledge-base and builds another hierarchical structure to store the tutoring goals. The information about the student is composed of:

• knowledge-domain;

• concept-map;

• evaluation information;

• tutoring goals;

• motivation towards achievement; and

• actions.

These components will be explained below.

2.2.1 Student Knowledge-Domain

The group of concepts through which the student has navigated was named student knowledge-domain. The user interface module continuously monitors the student's actions and sends this information to the Student Modeling Expert System. Each concept has associated information which is used in the student model to show his/her behavior and knowledge acquisition level. To represent this domain, a class knowledge_domain was created. Some objects can be initially created within this class to represent the previously acquired knowledge. When a concept is being studied a dynamic object with the same name is created within the knowledge_domain class. To differentiate between the subject knowledge-base and the student model, a prefix con_ is added to the object name. When created, the object inherits the properties of knowledge_domain that are explained below:

• count: the value of this property corresponds to the number of times the student has read the material related to the concept;

• examples: the value of this property counts the number of examples related to the given concept that have been studied;

• acquisition_level: this is a qualitative variable which measures the level of acquisition of concepts;

• in_map: this variable indicates whether a concept is in the student concept-map;

• f_problems: this integer variable counts the number of questions related to functional relations that the student has correctly solved;

• s_problems: this integer variable counts the number of questions involving either category or membership relations that the student has correctly solved; and

• time: the value of this property corresponds to the time used by the user to study a given concept; if the student covers the concept several times an average time is used in this case.
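
As a sketch of how such dynamic con_ objects might be realized in code, the following uses the property list above; the class and field spellings are assumptions for illustration, not the ITS's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class ConceptRecord:
    count: int = 0                  # times the related material was read
    examples: int = 0               # examples studied for this concept
    acquisition_level: str = "rem"  # rem -> know1/know2/know3 -> use
    in_map: bool = False            # already placed in the student concept-map?
    f_problems: int = 0             # correctly solved functional-relation questions
    s_problems: int = 0             # correctly solved category/membership questions
    time: float = 0.0               # average study time for the concept

knowledge_domain = {}
knowledge_domain["con_perceptron"] = ConceptRecord(count=2, time=14.5)
print(knowledge_domain["con_perceptron"].acquisition_level)  # 'rem'
```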

All the descriptions of the properties given above are self-explanatory, except for the property acquisition_level that needs some further clarification, especially since the value of this property will be used for assessing when a tutoring goal has been accomplished. The acquisition_level can take on any of three possible values: rem, know and use. The lowest level corresponds to rem, which indicates that the student has read the material related to a given concept. The next level is know, which is further subdivided in three, i.e., know1, which is reached when the concept can be incorporated in the model of the student cognitive structure; know2, which is acquired when questions of low difficulty have been satisfactorily solved; and know3, which indicates that questions of intermediate difficulty level have been correctly answered. The highest level is use, which indicates that questions of high difficulty have been correctly answered by the student.

Since we have based the student model on the meaningful-learning theory [5], the incorporation of a concept in the model of the cognitive structure is carried out when all the related concepts are already in the structure or they have all reached at least the acquisition_level of know1. We will need to refer to some of these properties in the sections that follow.

2.2.2 Student Concept-Map

Similarly to the knowledge-domain, this structure may include permanent or dynamic objects to store a model of the student cognitive structure. Dynamic classes and objects that are attached to the concept-map as the study of the subject progresses replicate their hierarchical arrangement in the subject knowledge-base. Static objects represent previously acquired knowledge. It is important to note that concept maps [6] were adapted to object-orientation as explained in [7].

In this structure, a prefix cm_ is attached to the names of the objects. The dynamic objects are created by rules in the student modeling expert system and can be attached to permanent classes or to other dynamic objects previously created in the map. All these classes and objects are attached to the class concept_map.

2.2.3 Evaluation Information

The evaluation information is central to updating the student's acquisition_level of concepts. This module also stores the time used by the student to select an answer to a question and its associated difficulty level.

The number of goals accomplished by the student is an important parameter to evaluate his/her learning performance. Even though in this ITS the student has a certain degree of liberty to elaborate a study plan, we cannot lose sight of the fact that at the end of the learning process, the student must have acquired a minimum of knowledge represented by the accomplishment of specific tutoring or course goals. Thus, goals can be established within the ITS that specify the minimum number of concepts to be learned.

In the beginning, all these goals are attached to the class goals_to_accomplish, which belongs to the general class tutoring_goals. Another subclass of the latter one is goals_accomplished. An individual goal is an object which contains as sub-objects a set of concepts that need to be studied and used to accomplish this goal. These concepts are the same as in the student concept-map. An example of this structure is shown in Figure 5.

Figure 5. Goals in the ITS.

In order for a goal to be considered accomplished, every one of its individual concepts has to reach an acquisition_level of use. Thus, in Figure 5, goal_1 can only be transferred from goals_to_accomplish to goals_accomplished when all the concepts represented by its sub-objects, i.e., cm_1, cm_6, cm_12, cm_13, cm_14, cm_16, cm_17 and cm_18, reach the level use. Although not shown in full detail, for goal_2 to have been assigned to goals_accomplished, all the concepts in this goal must likewise have reached the level use.

Finally, it is important to note that it is also possible to transfer a given goal from goals_accomplished back to goals_to_accomplish if the acquisition_level of at least one of its concepts falls below use, as a result of the student failing to correctly solve a question involving the concepts in that goal. This action can be considered a correction of a previously erroneous assumption by the ITS, signaling the need for the student to review the material.
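A minimal sketch of this goal-transfer rule, reusing the hypothetical Concept and AcquisitionLevel classes from the earlier sketch; the Goal class and function name are likewise invented for illustration.

```python
class Goal:
    def __init__(self, name, concepts):
        self.name = name
        self.concepts = concepts   # sub-objects taken from the student concept-map

def update_goal_sets(goal, goals_to_accomplish, goals_accomplished):
    """A goal is accomplished when every concept reaches use; it is
    demoted back to goals_to_accomplish when any concept falls below use."""
    done = all(c.level == AcquisitionLevel.USE for c in goal.concepts)
    if done and goal in goals_to_accomplish:
        goals_to_accomplish.remove(goal)
        goals_accomplished.add(goal)
    elif not done and goal in goals_accomplished:
        goals_accomplished.remove(goal)
        goals_to_accomplish.add(goal)
```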

2.2.4 Motivation Towards Achievement

Motivation, being an important factor in human performance, is in general difficult to model. In this ITS we made a first approach to this problem by defining four motivation categories based on the experiential learning theory [8]. Taking into consideration the learning and evaluation environment in the ITS, two parameters were chosen to evaluate the student's motivation towards achievement: the number of goals accomplished and the number of problems correctly solved in a session.

The time that a session lasts and the complexity of the material studied also have to be taken into account. It is also important to establish average or typical time reference values for the student population against which to compare the information about a particular student.

The structure used is composed of a class achievement_motivation, which consists of two subclasses: motivation_practice and motivation_theory. The first evaluates the student's level of motivation towards solving questions, while the second pertains to motivation towards theoretical types of activity. The achievement_motivation class also has an object called criteria to represent the information of the parameters measured, i.e., n_goals_accomplished and n_problems_solved. Each of these objects has three properties: number, value and average. The first is an integer that stores the number of goals accomplished or problems solved by the student for each criterion. The second is a Boolean variable that takes on the value true when the property number for the respective criterion is greater than the reference value for the population. The third property, average, is computed from the data collected from the use of the system by the student population and serves as the reference value previously mentioned. The described structure is shown in Figure 6.

Figure 6. Representation of the motivation towards achievement.

Notice also that achievement_motivation has two properties that are inherited by its two subclasses: value, which assesses whether the student is motivated towards theory or practice, and level, a qualitative variable that measures the degree of motivation. The assignment of values for these properties is carried out by the rules explained below.

If the value properties of n_problems_solved and n_goals_accomplished are both true, then it is considered that the student has a notable motivation towards practice, in which case the level of motivation_practice is set to high and the value of motivation_theory to false.

If the value of n_goals_accomplished is true and the value of n_problems_solved is false, then the level of motivation_theory is set to high, its value to true and the value of motivation_practice to false.

If the value of n_goals_accomplished is false and the value of n_problems_solved is true, it is considered that there is a weak motivation towards practice, so the value of motivation_practice is set to true and its level to low. In this case, the value of motivation_theory is set to false.

The last case occurs when the values of both motivation_theory and motivation_practice are false, in which case it is considered that the student does not show motivation towards achievement.
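The four rules above form a small decision table. The sketch below shows one hedged way to encode them; the function and dictionary fields are hypothetical, and the original system expressed such rules in an expert-system shell rather than in a general-purpose language.

```python
def assess_motivation(goals_value, problems_value):
    """Map the Boolean value properties of n_goals_accomplished and
    n_problems_solved to the two motivation subclasses."""
    practice = {"value": False, "level": None}
    theory = {"value": False, "level": None}
    if goals_value and problems_value:   # notable motivation towards practice
        practice = {"value": True, "level": "high"}
    elif goals_value:                    # motivation towards theory
        theory = {"value": True, "level": "high"}
    elif problems_value:                 # weak motivation towards practice
        practice = {"value": True, "level": "low"}
    # both false: no motivation towards achievement is recorded
    return {"motivation_practice": practice, "motivation_theory": theory}
```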

2.2.5 Student Actions

The last structure within the student model records the student's activities during a session, so that information about what the student is doing is available at any time. The types of actions contemplated in this module are:

• studying a concept;

• studying an example; and

• solving a problem.

We define a class called actions, which has three objects: concept_explanation, problem_solving and example_study. The class actions has two properties that are inherited by its subclasses: number and value.

The property number counts the number of concepts or examples studied, or the number of problems solved during a session, depending on the object that inherits it. The property value indicates the student's current action and thus can be true for only one object at any time.
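A brief sketch of this record, under the same hypothetical naming as the previous sketches; the invariant is that value is true for exactly one action at any time.

```python
class Actions:
    """Records the student's activities during a session."""
    NAMES = ("concept_explanation", "problem_solving", "example_study")

    def __init__(self):
        self.number = {n: 0 for n in self.NAMES}     # counts per action type
        self.value = {n: False for n in self.NAMES}  # current-action flags

    def start(self, name):
        """Mark `name` as the current action; only one may be active."""
        for n in self.NAMES:
            self.value[n] = (n == name)
        self.number[name] += 1
```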

2.3 Instructional Model

The instructional model refers to the part of the ITS where the teaching strategies are handled. In this regard, there is still no clear or optimal solution to the dilemma between guided learning and learning by discovery. Many of the traditional and commercial tutors fall into the first category, where the user has to follow a particular order when studying the material, as determined by the system designer. At the other extreme are systems where no guide is provided to the user. Fully guided learning may force the user to study only in the order, amount and depth that have been predetermined in the system. The second strategy, learning only by discovery, may lead to failure to accomplish significant goals within a reasonable amount of time, thus leading to frustration. We have implemented an intermediate solution, as explained in Section 2.3.1.

Another aspect of the teaching-learning process, particularly in engineering, is problem solving. The first version of the ITS provides two ways for the student to solve problems. One is by generating theoretical questions for the student to answer, and the other is by using the Neural Networks simulators available in the system. Since the second approach, i.e. the use of simulators, has been widely employed in this type of course, whether with computerized tutors or not, I will not discuss it here any further; suffice it to say that the simulators can be accessed during the study of a given subject, whenever relevant.

For the theoretical questions, two alternatives were readily evident. One was to develop a data-base of questions and answers, and the other was to investigate the automatic generation of questions from the subject knowledge-base. The first has been extensively used, while the second posed a challenging AI research problem. We opted for the second, which will be discussed in Section 2.3.2.

2.3.1 Knowledge Navigation

We designed a concept-space metaphor based on the structure of the subject knowledge-base, which allowed an intermediate strategy between guided and discovery learning. As can be seen from the structure of this knowledge-base (see Section 2.1), concepts form a hierarchical array in which related concepts are directly connected. This direct connection implies that those concepts are close to each other, in either a father-child or child-father relation. We chose to present this relationship through images of space objects (planets), where only the closer concepts can be visualized in the navigation window, thus being the only ones that can be visited, although with no predetermined order. This strategy allowed a certain freedom in navigation but did not permit wandering completely unguided. The student can also visualize the part of the concept-map structure where he/she is currently located and, by the use of sensitive maps in both types of display, move around the topics to study.

Although at first sight these navigation methods may seem equivalent to a partial display of a table of contents, the metaphor may be extended to a game in which, for instance, the gravity of a given concept can be escaped only after gaining enough energy (reaching a given acquisition_level). As additional help, navigation instruments could be added to let the user know where in the concept-space he/she is located. Such a game would make studying entertaining while still providing guidance.
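A minimal sketch of the navigation constraint and the hypothetical "gravity" rule, again reusing the Concept and AcquisitionLevel classes assumed earlier; the escape_level threshold is an illustration, not a value given in the chapter.

```python
def visitable_concepts(current, escape_level=AcquisitionLevel.KNOW1):
    """Only concepts directly connected to the current one may be visited,
    in no predetermined order. In the game extension, the current planet's
    gravity is escaped only once the student's acquisition level for it
    reaches escape_level (a hypothetical rule)."""
    if current.level < escape_level:
        return set()              # not enough 'energy' to leave yet
    return set(current.related)
```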

2.3.2 Question Generation

Several alternatives for generating questions and problems were contemplated. The first was to generate analysis or design problems. This, however, is a very complex problem: there may exist multiple solution paths and multiple solutions, and it would probably require natural- and formal-language processing, which adds still more complexity. Thus, we decided to concentrate, for the time being, on question generation. Two approaches were contemplated: the first was to design a problem-answer data-base and the second was to automatically generate questions from the subject knowledge-base. It was decided to investigate the second approach [9].

Two basic types of questions were designed: descriptive and explicative. It was also decided to generate multiple-choice questions instead of open ones, since, again, natural-language processing would have made the problem still more complex.

The descriptive questions refer to relations of the category or membership type that were described in Section 2.1. These types of relations permit the direct generation of questions from the following types of assertions:

{ class name} can be classified in {subclass list}

{object name} is composed of {sub-object list}

Thus, multiple-choice questions could easily be generated from the knowledge-base by leaving a blank in one of the bracketed fields in the assertions above and offering a list of options among which only one is correct. An additional simplification is that templates can be used for question generation from these assertions. For example, we could ask for the class name in the first assertion by leaving that field blank and filling in the subclass list. Another possibility is to replace one of the elements in the subclass list by a blank. Similar questions can be generated from the second type of assertion. The list of possible answers can be extracted randomly from the elements, classes and objects in the knowledge-base. The difficulty level can be adjusted depending on the complexity associated with a given set of related concepts, or by choosing very similar concept options for the answer-choice list.
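A hedged sketch of template-based generation for the descriptive questions. The chapter specifies only the assertion templates; the data structures, the random choice of distractors and the function name below are assumptions.

```python
import random

def descriptive_question(class_name, subclasses, all_names, n_options=4):
    """Blank out the class name in the assertion
    '{class name} can be classified in {subclass list}'
    and build an option list in which only one answer is correct."""
    stem = "___ can be classified in {" + ", ".join(subclasses) + "}"
    distractors = random.sample([n for n in all_names if n != class_name],
                                n_options - 1)
    options = distractors + [class_name]
    random.shuffle(options)
    return stem, options, class_name

# Hypothetical knowledge-base fragment:
# stem, options, answer = descriptive_question(
#     "neural_network", ["perceptron", "hopfield_net", "kohonen_map"], kb_names)
```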

The explicative questions refer to functional relations in the knowledge-base, i.e., relations involving methods in the object structure. In this type of relation, the method relates its input to its output. To generate this type of question, the following assertion was used:

{ entity} is related to {entity} by means of {method name}

From this assertion three question templates were generated:

{ __ } is related to {entity} by means of {method name}

{entity} is related to { __ } by means of {method name}

What relates {entity} and {entity}? __

As with the first type of relation, the difficulty of the questions was adjusted based on the difficulty level associated with the concepts involved or on the similarity among the answer options.
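The explicative templates can be instantiated in the same way. A minimal sketch under the same assumptions as the previous one:

```python
def explicative_questions(entity_a, entity_b, method_name):
    """Instantiate the three templates derived from
    '{entity} is related to {entity} by means of {method name}'.
    Each tuple is (question stem, correct answer)."""
    return [
        (f"___ is related to {entity_b} by means of {method_name}", entity_a),
        (f"{entity_a} is related to ___ by means of {method_name}", entity_b),
        (f"What relates {entity_a} and {entity_b}?", method_name),
    ]
```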

2.3.3 The User Interface

The user interface was designed around the metaphor chosen to navigate through the subject material. As presented in Section 2.3.1, we chose a concept-space metaphor in which the current study material was represented as the planet being visited, while the related concepts, i.e., the concepts connected to the current one, were presented as neighboring planets. Once a given concept was studied, the user could visit any one that appeared in the window.

In order to accommodate the navigation metaphor and all the course material, the display was divided into three areas, as shown in Figure 7. The upper-left area, or main screen, presented the text description of the material being studied; in the example shown, the perceptron appears in this window. The lower portion of the display was the navigation screen, where the concept space could be seen; the planets appearing there are the only ones that can be visited. These two areas were always displayed. The third area, or auxiliary screen, was presented only for displaying figures, examples and, in a few cases, further explanations about the subject on the main screen. To open this area, text hyperlinks were inserted in the main screen for the user to click when desired. In Figure 7, the auxiliary screen is shown in the right frame.

The separation of the main screen from the auxiliary one has several advantages [10]. The first is that the student can skip examples or figures at will, as may be the case when reviewing the material about a concept. The second is that the main and auxiliary screens have separate scrolling bars, so figures remain within sight for long portions of the text on the main screen; this avoids either repeating graphs within the text or scrolling back and forth to observe a particular figure referred to in several parts of the text. In Figure 7, the independent scrolling bars can be observed for the main screen, presenting the textual explanation about the perceptron, and for the auxiliary one, showing the topology graphs. The auxiliary frame has a button in its lower part, not seen in the figure, to close it at will.

2.4 Remarks about the Architecture of the ITS

Our experience during the design and development of the system has revealed a number of questions that need to be addressed in future developments. Individualized education presents important advantages from a short-term point of view, since each student can cover the course material at his/her own pace. Nevertheless, it may present disadvantages with respect to current, and probably future, requirements of education. We live in a society where minimum goal accomplishment is set within time bounds; thus, time frames are a necessary component of education. With this last requirement in mind, what is important is to set minimum goals associated with adequate time bounds and to provide the student with the motivation and tools to deepen his/her study of the course material at will.

Figure 7. Organization of the information display.

Another issue that we started to explore in this approach is automatic question or problem generation. What has become clear from our experience is that this is indeed a challenge to AI, involving research on complex problem-solving strategies. The process involves the analysis of the problem and of solution alternatives. The traditional use of expert systems is oriented to solving specific problems based on heuristics that a designer incorporates within the system. In the case of problem generation, one heuristic would not suffice, since humans use a variety of problem-solving strategies that would need to be analyzed by the system. If, in addition to verifying whether a student's solution is correct or not, we want to provide a more elaborate form of feedback, the problem additionally becomes one of synthesis, with at least two still unresolved AI problems:

1. The system would have to process natural and formal language to analyze the student's solution process.

2. The system would have to evaluate the optimality of the student's solution, suggest a better solution or recommend further analysis of a problem.

Italics have been purposely used for the words "optimality" and "better", since criteria for defining these terms involve a great deal of subjectivity. Even once those criteria are defined, other problems remain, i.e., how to define an optimality measure and how to apply it.

The case presented here is a coarse simplification of the generation of just one type of question, namely multiple choice. On the one hand, the way difficulty is graduated is still subjective and depends on the designer's or teacher's criteria. On the other hand, the choice of answer options, which, as explained in Section 2.3.2, is part of the difficulty of a question, is also unresolved, awaiting the definition of measures of similarity between the correct and the dummy answers. This makes the difficulty level erratic and unpredictable.

It is my view that problem and question generation and solution analysis demand a serious and continued research effort, one that affects not only the proposal of AI solutions to these problems but that must also generate serious reflection upon our own student-evaluation processes. This is by all means a task that demands an interdisciplinary approach involving specialists in education, psychology, cognitive science and engineering, among others, in addition to the teachers.

Educational strategy design is another open research problem, even when AI is not involved. There is no agreement as to the optimum strategy for engineering courses, much less for other disciplines and sciences. There will probably never be agreement in this regard, and what will probably be needed is the use of multiple strategies. Some students may perform better with a game-like strategy, yet others may need a more formal and traditional approach; some will need guidance and supervision, but others may demand full freedom. I believe that there could be at least as many strategies as cognitive styles, so the problem remains one of developing a basic set of them and ways to combine them.

Two issues of utmost importance are student-student interaction and student-teacher interaction. I view interaction as an extremely rich and powerful feedback strategy. The feedback in the student-teacher interaction is evident, but peer collaboration may become a yet richer source of feedback and stimulate advanced learning stages involving argument construction and communication. While discussion in the classroom develops oral communication skills, discussion groups, using for instance the Internet, help, in my own experience, to develop the written communication skills so necessary in the modern working environment. The difference between these two types of communication is that while the first requires an agile exchange of arguments, the second allows time to mature arguments and weigh the multiple possibilities that may arise when consulting information sources. Although the traditional essay type of homework presents a similar possibility, discussion groups provide a continuous and growing flow of ideas, which does not necessarily stop at the end of the course term.

Self-evaluation is one aspect that traditional approaches do not always explicitly consider. However, I have no doubt that this is the final and most important step in the learning feedback chain. Carrying out this process leads not only to the mere admission of mistakes but to the formation of criteria from which sound and innovative problem solutions may arise.

Finally, the classroom cannot be ignored. Even if technology provides ever richer learning tools, probably reducing the number of hours spent in a classroom, direct human interaction remains necessary. Although not fully understood, the affective processes that occur during personal meetings have a direct effect on learning. Nothing, or very little, will be gained by inserting technology into the teaching-learning environment if our teaching practices continue unchanged.

The reflections just summarized have led to the elaboration of a teaching-learning model with definite implications for the design of tutoring tools, whether artificially intelligent or not. This model is presented in the next section.

3 The Steps in the Teaching-Learning Process

Before presenting a new architecture for an Intelligent Tutoring System, it is important to describe a teaching-learning model on which this design is based. In this section I propose a model which is the result of the discussions in our research group at Javeriana University, my reflections upon my own experience as a student and teacher, and some of the readings on the topic, a selection of which are cited here [5], [11], [12], [13], [14].

The proposed learning model can be divided into four main stages: perceptive, applicative, communicative and productive. These stages do not necessarily take place in that strict order and in fact overlap over the evolution of learning. Moreover, many forward and backward paths exist between the stages as a consequence of the many processes that take place within each one. A brief description of these stages follows.

3.1 Perceptive Stage

In this stage a number of information-reception and appropriation processes take place. The stage starts with the sensation of the environment, which is complemented by filtering and discrimination processes. The result of this combination is an inquiry about the subject of interest. Although it is very difficult to draw boundaries between these four processes, the first refers to the pure reception of sensorial information. Filtering is related to certain processes that extract features of the sensorial information, making it possible for discrimination to take place. This filtering is evidently time-varying and non-linear and, although not yet fully understood, is, in the context of this chapter, responsible for the necessary signal preprocessing and adaptation in the early stages of the sensory paths. Discrimination refers to the process of obtaining a figure of interest from its background. It depends on the characteristics of the external stimuli as well as on the adaptation due to filtering. This discrimination is a complex task that involves different levels of pattern recognition in the early perceptive stages. An associated process is perceptual constancy, which permits the identification of an object of interest under varied conditions. Therefore, it implies more sophisticated forms of pattern recognition, since it may involve mental reconstitution of the known object as it is assumed to be, rather than the actual stimulus [15], [16].

The focus of much of the work on the development of computerized material for education has been the perceptive stage. Multimedia information is intended to enrich the educational environment to facilitate the formation of percepts. Nevertheless, the development of material for higher education has in general maintained old forms to which visual animation and sound have been added. In my view, while much of the material developed for children pays a great deal of attention to these processes, in the case of higher education the fostering of more sophisticated manifestations of these same perceptive processes is lacking, leaving no room for the student to formulate his/her own inquiries. In courses on emerging fields such as Neural Networks, Fuzzy Systems and Genetic Algorithms, where theoretical work is still incipient, there is plenty of room and an excellent opportunity to develop educational material that stimulates the student to investigate cause-effect relationships and to formulate hypotheses and theories to be experimentally verified. I will make some additional comments about this in Section 3.3 and Section 3.4.

Up to this point, the processes are focused mostly on the reception of information. For the information to be appropriated specifically as knowledge, two concurrent and complementary processes must take place. One is the analysis of the received information, which aims to divide it into manageable and elementary components. This analysis is intimately related to the process of integrating the information into the cognitive structure, where the main task is one of associating the new material with previous knowledge. Thus, analysis requires integration as a concurrent process, and vice-versa, to accomplish meaningful learning. These last two processes cannot, strictly speaking, be classified as purely perceptive, and they are essential for the applicative stage to set in.

Page 138: [Studies in Fuzziness and Soft Computing] Innovative Teaching and Learning Volume 36 ||

Intelligent Tutoring Systems and Its Application to a Neural Networks Course 127

3.2 Applicative Stage

The analysis and integration processes are accompanied by experimentation. It is important to clarify that experimentation here does not necessarily involve physical (tangible) objects; it may also involve mental entities and images, or a combination of both. The learner tests small and probably very elementary hypotheses, through whose testing new concepts are more formally integrated into theoretical frames consistent with previous experiences and knowledge. Thus, this stage may be assisted with examples and theoretical questions to help the student refine the analysis and integration processes. It is here that feedback starts to gain importance and may lead back to the processes of the perceptive stage, but with more acute discrimination and filtering capabilities. This is one process where the perceptive and applicative stages overlap.

Experimentation leads to a reflection process in which heuristics and rules for problem solving and for applying the new material start to emerge. It is probably appropriate at this point to pose theoretical questions, problems and cases for the student to solve. Inner conflicts may arise as a result of the necessary self-questioning that those problems bring about. Thus, the problems must be oriented to promote this inner conflict, helping the development of decision-making and problem-solving strategies that coherently integrate those heuristics and rules. The problems must seek not only to present some practical applications of the new knowledge but also to promote questioning and the search for possible contradictions and inconsistencies between old and new concepts. Although it may be considered premature, material for this stage should start to stimulate the formulation of hypotheses, theories and examples aimed at consolidating the new knowledge. This more formal formulation of hypotheses, theories and examples will be the emphasis of the communicative and productive stages discussed below.

3.3 Communicative Stage

The stages described so far concern mostly the interaction of the individual with the environment and the teacher. A desirable next step would be a more social one that helps develop argumentation and knowledge-articulation skills. This can be promoted in a discussion environment centered on reaching consensus among the course participants. An example of a classroom experience of this sort is found in [7].

The discussion necessary to reach consensus promotes the development of the complex inference mechanisms necessary for the argumentation and articulation processes. Several strategies may be used to strengthen the communicative and intellectual skills of each student. The most naive is the articulation of an argument from one's own point of view. Nevertheless, three more strategies of higher complexity need to be explicitly encouraged. One is to present arguments in favor of one's point of view from the others' perspective; this is normally accompanied by the realization of one's weaknesses from the others' frame of reference. Yet another strategy is to examine the weaknesses of the arguments from one's own point of view, which implies the careful review of one's arguments. This last process is sometimes not even encouraged, although it should occur early in the argumentation.

Articulation is a consequence of adequately expressing an argument and carefully listening to the others'. Activities in this stage should provide opportunities for the presentation of the students' own arguments and for the explicit cross-explanation of their peers'. To complement these two types of activity, it is important to promote the examination and explicit presentation of counter-arguments, both from the others' perspective and from one's own. The effort of finding counter-arguments leads to deeper analysis and review of one's theories, which in turn leads either to reformulating them or to finding even stronger arguments, promoting increasingly complex inference mechanisms. At this stage, more elaborate theories should start to emerge, which are fundamental for a sound scientific education.

I believe that much emphasis has been placed on only the first type of argumentation, i.e., from one's own point of view, and very little, if any, on the other strategies. Moreover, in many cases this stage is not reached and the student is not even provided with opportunities for discussion, the focus being mainly on numerical answers to problems. This is particularly notorious in computer-based materials where, due to the current limitations of technology, only right/wrong feedback is provided. The emerging communication facilities of computer networks provide clear opportunities for conducting these activities without space or time boundaries; language is the last boundary to be overcome. Another advantage of discussion in computer-mediated environments is that a record of the process is left for the teacher and the students to review over time, revealing the evolution of the students' communication skills and providing valuable review material. Writing is also fostered as part of the scientific discipline.

3.4 Productive Stage

An advanced learning stage is one where the students can produce the well-articulated arguments necessary for the elaboration of sound problem solutions and their accompanying technical documents, where resourcefulness and creativity are essential. Productivity refers here to a usable problem solution, be it theoretical, hardware or software, and the supporting documents that make it possible to replicate or extend the work, by the same authors or by other individuals or teams, if and when necessary. In the particular case of courses on Neural Networks, Fuzzy Systems, Genetic Algorithms and similar topics, where procedures for problem solution are of an empirical nature, the only way for a student to adequately optimize and justify a solution is to pass through the four learning stages. However, innovative solutions to any type of problem demand the same processes to achieve high professional competence. The result should be that students acquire a comprehensive vision of problems together with the multiple solution alternatives, and a systemic vision that helps them choose the solution that takes into consideration all the aspects and multiple constraints involved in real-life problem solving.

If this stage fully develops, the individual will probably return to all the previous stages in search of undiscovered information, thus closing the loop that leads to a permanent learning attitude.


4 The Intelligent Tutoring System Architecture Revisited

The experience gained so far has led us to redesign the ITS. The basic modules and functions remain, but their internal workings change substantially. Besides, the last two stages of the proposed learning model additionally demand communication services, many of which are already available in computer networks but were not included in the original system.

The processes that must take place in learning, and the multitude of cognitive styles, demand a versatility that cannot easily be obtained by centralized, fixed modules with only state-of-the-art AI. Also, for the reasons described in Section 2.4 and Section 3, the teaching-learning process demands the intervention of the teacher and of classmates. On this basis, a distributed collaborative-agent-based architecture is proposed, in which the functions of each module are carried out by several such agents. The advantage of this proposal is twofold. On the one hand, each agent can be specialized in a given task or strategy, thus simplifying its design and learning algorithms. On the other hand, the collaboration among AI agents and humans may produce emergent behaviors that make the system much more versatile and effective [17]. An interesting proposal in this direction can be found in [18].
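As a very rough illustration of the proposed idea, the sketch below shows specialized agents exchanging messages through a shared dispatcher. All names are hypothetical: the chapter proposes the architecture without prescribing an implementation, and a real system would involve learning, negotiation and human participants.

```python
class Agent:
    def __init__(self, name, specialty):
        self.name = name
        self.specialty = specialty      # e.g., "student_modeling"

    def handle(self, message):
        print(f"{self.name} ({self.specialty}) handling: {message}")

class Dispatcher:
    """Routes each task to every agent specialized in it."""
    def __init__(self):
        self.agents = {}

    def register(self, agent):
        self.agents.setdefault(agent.specialty, []).append(agent)

    def send(self, specialty, message):
        for agent in self.agents.get(specialty, []):
            agent.handle(message)

bus = Dispatcher()
bus.register(Agent("strategist_1", "instructional_strategy"))
bus.register(Agent("modeler_1", "student_modeling"))
bus.send("student_modeling", "update acquisition level for cm_12")
```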

Multiple knowledge-bases can be constructed from the multiple knowledge sources, or experts, in the subject. These may include arbitration mechanisms incorporated in each knowledge-base, or external arbitrators that act when a solution or decision is required. This scheme would also facilitate multiple strategies that could intervene in the analysis of the students' problem solutions, or that could be presented to the student as part of the teaching strategy.

Similar arguments can be made with regard to student modeling. Not only can multiple agents become specialized in the different modeling tasks, but different modeling strategies could also be implemented. We could envision different agents with different strategies modeling the variety of possible system users; beyond the mere action of modeling for computerized tutoring, these agents may act as assistants to the teachers in devising instructional methodologies for each individual or group of individuals. These functions could be grouped in what could be called the teaching-management subsystem, which would be part of the Computer Aided Teaching (CAT) component. The student models can also be a source of feedback information for the students' self-evaluation, thus becoming a learning-management subsystem that would likewise be part of the Computer Aided Learning (CAL) component.

Instructional-strategy agents can not only handle different strategies but, through learning and collaboration with other strategists and with student-modeling agents, can lead to emergent instructional methodologies not necessarily contemplated in the initial system design. Question and problem generation and solution analysis is another area where collaboration may bring us closer to feasible solutions to this very complex problem. In this case, collaboration would take place between the agents in this module and the subject-expert agents.

In the user-interface module, agents can become learning brokers that customize the system to the needs of each user (including the teachers) and negotiate learning/teaching opportunities for them. In one case they will collaborate with strategy agents to assist the student's interaction with the ITS during the learning process; in another, they will assist the teacher in designing or redesigning teaching strategies and instructional material. This customization evidently requires the cooperation of the brokers with the user-modeling agents.

If we intend to help students reach the advanced stages of learning, i.e., the communicative and productive stages, the tutoring system must provide the communication tools necessary for the processes involved in these stages. An interesting possibility is the inclusion of computational tools in the ITS, such as expert system shells, for the students to build, compile and test their own knowledge-bases. The value of this option as a knowledge laboratory and feedback mechanism may be very significant. In fact, team homework could involve the design and development of cooperating agents as well.

However, many communication tools do not necessarily need AI, and many such tools are already available through the Internet. Such is the case for discussion groups, video-conferencing and, in general, integrated collaborative-work environments [19]. It is through communication between teachers and students that the important peer feedback results. This feedback does not necessarily require an explicit evaluation, and may also trigger self-evaluation processes through the analysis of one's own arguments against the others' views.

Technology may help improve the teaching-learning process, but only if it is accompanied by changes in the real classroom1 environment. The real classroom should be the place where direct human interaction, with all the affective processes relevant to learning, takes place. Intelligent tutoring will arise not only from the presence of Artificial Intelligence components in the CAT/CAL system but from the Naturally Intelligent design of all the teaching-learning activities.

Acknowledgments

I have had the pleasure of learning from many people who have dedicated their time, effort and patience to educating an engineer about the complexity of the teaching-learning process. I wish to thank Gloria Marciales for sharing with me so many gratifying and so many difficult times of this research. Thanks are due to Francisco Viveros, the Chairman of the Electronics Department, for giving me so much room for experimenting in my classes. Many of the ideas exposed in this chapter are the result of enriching and enjoyable discussions with professors Mauricio Martinez and Jorge Sanchez, and the students in our Computer Assisted Education workshops. My gratitude is extended to Ana Cristina Miranda and Andres Sicard, who have so frequently accepted invitations to my office to rescue me from my ignorance in education and human communication. I want to highlight the immense collaboration of my undergraduate students, who have given so much of themselves in their final-year projects to make many of my ideas, and theirs, a reality. As always, family has had to deal with a sometimes taciturn husband and father whose mind may be wandering in the twilight zone. Thanks Rochis, Ana María and Carolina.

1 Here I explicitly use the term real environment as opposed to the virtual environment provided by the ITS.

Page 144: [Studies in Fuzziness and Soft Computing] Innovative Teaching and Learning Volume 36 ||

Intelligent Tutoring Systems and Its Application to a Neural Networks Course 133

References

[1] Martin, J.J. and Odell, J. (1992), Principles of Object-Oriented Analysis and Design, Prentice Hall.

[2] Ikeda, M. (1993), "Non-monotonic inference: a formalization of student modeling," Proc. of the 13th Intl. Joint Conference on Artificial Intelligence, Vol. 1, pp. 467-473.

[3] Fernandez-Ossa, M. del P. (1997), Modelamiento de Estudiantes para Aplicación en un Sistema Tutorial Inteligente, undergraduate final project, Javeriana University, Santafé de Bogotá, DC, Colombia.

[4] Verdejo, M.F. (1994), "Building a student model for an intelligent tutoring system. Student modeling: the key to individualized knowledge-based instruction," Proc. of the NATO Advanced Research Workshop, pp. 127-146.

[5] Ausubel, D., Novak, J.D. and Hanesian, H. (1989), Psicología Educativa, Ed. Trillas, Mexico.

[6] Novak, J.D. and Gowin, D.B. (1988), Aprendiendo a Aprender, Ed. Martinez-Roca, Barcelona, Spain.

[7] Vega-Riveros, J.F., Marciales-Vivas, G.P. and Martinez-Melo, M. (1998), "Concept maps in engineering education: a case study," UICEE Global J. on Engineering Education, Vol. 2, No. 1, pp. 21-27.

[8] Kolb, D.A. (1984), Experiential Learning: Experience as the Source of Learning and Development, Prentice Hall, Englewood Cliffs, NJ.

[9] Acero-Barrera, L.E. and Sanchez-Angel, H. (1997), Investigación de Modelo de Conocimiento para la Generación de Preguntas, undergraduate final project, Javeriana University, Santafé de Bogotá, DC, Colombia.


[10] Vega-Riveros, J.F., Borda-Medina, R.A. and Marciales-Vivas, G.P. (1995), Sistema Experto para Instrucción Asistida por Computador en un Área de la Obstetricia, final technical report to Colciencias, Javeriana University, Santafé de Bogotá, DC, Colombia.

[11] Wertsch, J.W. (1988), Vigotski y la Formación Social de la Mente, Ed. Paidós, Barcelona, Spain.

[12] Gimeno, J. (1996), Comprender y Transformar la Enseñanza, Ed. Morata, Madrid, Spain.

[13] Bandura, A. (1987), Pensamiento y Acción: Fundamentos Sociales, Ed. Martinez-Roca, Barcelona, Spain.

[14] Bruner, J. (1978), Acción, Pensamiento y Lenguaje, Ed. Alianza, Madrid, Spain.

[15] Acosta-Vasquez, C.P. (1998), Sistema Tutor en Lecto-Escritura, undergraduate final project, Javeriana University, Santafé de Bogotá, DC, Colombia.

[16] Condemarin, M., Chadwick, M. and Milicic, N. (1981), Madurez Escolar, Ed. Andres Bello, U.K.

[17] Minar, N.M., Kramer, K.H. and Maes, P. (1998), "Cooperating Mobile Agents for Mapping Networks," Proc. of the 1st Hungarian National Conference on Agent Based Computing, John von Neumann Computer Society, Budapest, Hungary.

[18] Daily, M., Payton, D., Clifton, T., Weghorst, S. and Loftin, B. (1997), Human Computer Symbiotes: Cyberspace Entities for Active and Indirect Collaboration, Hughes Research Laboratories, project summary, http://www.darpa.mil/ito/Summaries97/E359_0.html.

[19] Albornoz-Reina, R. and Casas, L.A. (1998), Ambiente Hipermedial Unificado para Aulas Virtuales, undergraduate final project, Javeriana University, Santafé de Bogotá, DC, Colombia.


CHAPTER 5

TEACHING KNOWLEDGE MODELING AT THE GRADUATE LEVEL - A CASE STUDY

V. Devedzic

FON - School of Business Administration, University of Belgrade, Belgrade, Yugoslavia

A major characteristic of developments in the broad field of Artificial Intelligence (AI) during the 1990s has been an increasing integration of AI with other disciplines. A number of other computer-science fields and technologies have been used in developing intelligent systems, from traditional information systems and databases to modern distributed systems and the Internet. That fact is certainly reflected in the curricula of different courses and tutorials on AI offered at universities, conferences, and research & development institutions.

This chapter surveys knowledge modeling techniques that are taught within three graduate-level university courses1. The techniques presented here are a union of the techniques from all three curricula. They have been chosen for teaching because they have received the most attention in recent years among developers of intelligent systems, AI practitioners and researchers. Moreover, most of these techniques have been successfully taught in several tutorials presented at recent international conferences.

1 The courses are taught at the University of Belgrade, Yugoslavia. The course names are "Intelligent Information Systems" and "Expert Systems" (taught at FON - School of Business Administration), and "AI and Expert Systems", taught at the School of Electrical Engineering, Department of Computer Engineering.


The knowledge modeling techniques surveyed here are presented as parts of the corresponding courses and tutorials from two perspectives, theoretical and practical. Hence the first part of the chapter presents major theoretical and architectural concepts, design approaches, and research issues. The second part deals with several practical systems, applications, and ongoing projects that use and implement the techniques described in the first part. Through intensive use of the Internet, the students are required to experiment with the techniques and systems that are publicly available. The laboratory assignments, exercises and projects the students have to do clearly reflect the techniques and tools they use, and also stress the advantages of using them.

1 Introduction

There were several major research, development, and technological streams in computer science and engineering during the last decade. Some of them have had a deep impact on the development of intelligent systems. Together, they form a context within which modern intelligent systems should be discussed. Such streams include object-oriented software design, layered software architectures, the development of hybrid systems, multimedia systems, and, of course, distributed systems and the Internet. The students who enroll in the courses mentioned above are always required to pass the corresponding courses in these areas in order to acquire the appropriate prerequisites. Of course, at least one undergraduate-level course on AI is required as well. The next three subsections give a brief summary of the introductory lectures that review such prerequisites.

1.1 Advances in Software Design and Architectures

Most of today's applications are object-oriented, and so are intelligent systems. Object-orientation has been implicitly present in AI since its early days, i.e., since the introduction of frames by Marvin Minsky in the early 1970s, and in a certain sense it preceded the popularity of object-oriented systems in other fields. However, explicit object-oriented design of intelligent systems, in the software-engineering sense, became popular only in the late eighties [40], [49], [52], [53], [68].


For the sake of completeness, this paragraph presents a brief overview of the basic concepts of object-oriented software systems in general2. Objects are structured units of code that have state and behavior. A software object maintains its state in variables, or attributes, and implements its behavior with methods, or method procedures. Typically, the object's variables make up the center, or nucleus, of the object. Methods (at least some of them) surround and hide the object's nucleus from other objects in the program. Such methods make up the object's interface: objects in the program communicate only through their interfaces, i.e., by calling each other's interface methods. This communication is often interpreted as sending messages between objects. Packaging an object's variables within the protective custody of its methods is called encapsulation. Typically, encapsulation is used to hide unimportant implementation details from other objects. Each object is an instance of its class. An object's class is its abstract data type, i.e., the prototype that defines the variables and methods common to all objects of a certain kind.

In any object-oriented program, classes are organized in class hierarchies. In a class hierarchy, a class inherits state and behavior from its superclass, and usually has some additional attributes and methods of its own. Inheritance provides a powerful and natural mechanism for organizing and structuring software programs. The practice of software engineering shows that such an organization of object-oriented programs increases their modularity and reusability. Finally, polymorphism refers to a class's ability to implement a method it inherits from a superclass in a different way. This makes it possible to have several subclasses with their own versions of a common method declared in their superclass.
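A compact sketch illustrating encapsulation, inheritance and polymorphism; the classes are invented purely for illustration. (Python is used here for continuity with the sketches in the previous chapter, although the lectures themselves draw on Java [5].)

```python
class Shape:
    """Superclass: the abstract data type shared by all shapes."""
    def __init__(self, name):
        self._name = name              # encapsulated state

    def area(self):
        raise NotImplementedError      # subclasses supply their own version

class Rectangle(Shape):
    def __init__(self, w, h):
        super().__init__("rectangle")  # inherits state and behavior
        self._w, self._h = w, h

    def area(self):                    # polymorphism: overrides area()
        return self._w * self._h

class Circle(Shape):
    def __init__(self, r):
        super().__init__("circle")
        self._r = r

    def area(self):
        return 3.14159 * self._r ** 2

# Clients 'send the message' area() without knowing the concrete class:
for shape in (Rectangle(3, 4), Circle(1)):
    print(shape.area())
```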

Architectures of object-oriented software systems gradually evolved during the 1990s from two-tier to three-tier and multi-tier ones [2], [11], [67], [71], [72]. Generally, in a layered architecture there are sets of classes on the same level of abstraction, called layers. Layers are "horizontal," in the sense that there is no dependency relationship among the classes in the same layer. Layered architectures can also have "vertical" threads across their layers. Threads consist of classes at different layers implementing the same functionality, related to each other by the dependency relationship [67].

2 We assume the reader's familiarity with object-oriented systems. They are covered in literally hundreds of books and other publications. The brief coverage in this section is based on a recent book on the object-oriented programming language Java [5].

Due to the fast-growing importance of distributed computing, component-based software has become a major trend in software design and architecture [65], [76], [78]. Component-based software systems are designed from application elements constructed independently by different developers, using different languages, tools, and computing platforms [76]. Such a design process is based on the assembly of pretested, reusable, interoperable, independent software components. Interoperability and integration of such components within distributed heterogeneous environments are provided by designing components with standard communication interfaces, infrastructures, and architectures, such as CORBA [65], [78]. Communication of such compliant components with non-compliant legacy code and systems is provided through wrappers, which implement mechanisms for launching the non-compliant applications and for translating the different representations of the data passed between them. Thus, component-based software systems enable the integration of diverse object-based and other applications within distributed heterogeneous environments.

Speaking of recent advances, developments and trends in software design and architecture, one must not skip the phenomenon of Java [1], [5]. Its platform-independent bytecodes, which are interpreted by any Java interpreter (an implementation of the Java virtual machine), along with its distributed nature, portability, and the possibility of easy migration of programs (applets) from servers to client machines, have made it extremely attractive for software development. It is supported by the Java API, a large collection of ready-made software components that provide many useful capabilities, including graphical user-interface widgets, networking, uniform access to a wide range of relational databases, and many more.

Finally, two important issues related to the design of software systems characterize the late 1990s: design patterns and the Unified Modeling Language. Design patterns are descriptions of successful solutions to common problems that occur over and over again in software design when producing applications in a particular context [11], [24], [27]. It is important to understand that design patterns are not invented; they are discovered from experience in building practical systems. There are catalogues of design patterns, in which all the patterns are described using a prescribed template. Using design patterns increases the efficiency of the design process and makes the resulting systems more reusable, more flexible, and more robust.

The Unified Modeling Language (UML) is nowadays accepted as a language for specifying, constructing, visualizing, and documenting the artifacts of a software-intensive system [4], [27]. It is a visually expressive language for modeling object-oriented systems regardless of any specific programming language and development process. It contains a rich notation for expressing software design issues through class diagrams, use cases, state transition diagrams, component diagrams, and many more.

1.2 Hybrid Intelligent Systems and Multimedia

Hybrid intelligent systems combine two or more individual technologies for building intelligent systems [23], [56]. The initial work in this area, started in the late 1980s, addressed the integration of neural networks and expert systems, or the use of fuzzy logic with expert systems (see [56] for examples). More recently, other intelligent technologies, like case-based reasoning and genetic algorithms, have also become attractive for use in hybrid systems. The individual technologies represent the various aspects of human intelligence that are necessary for enhancing decision making in computing systems. However, all individual technologies have their constraints and limitations. The possibility of putting two or more of them together in a hybrid system increases the system's capabilities and performance, and also leads to a better understanding of human cognition.

Several models are used for integrating intelligent systems. The one used in [23] is architecture-centered. It classifies hybrid architectures into the following four categories:

• Combination - a typical hybrid architecture of this kind is a sequential combination of neural networks and rule- or fuzzy-rule-based systems.


• Integration - this architecture usually uses three or more individual technologies and introduces some hierarchy among the individual subsystems. For example, one subsystem may be dominant and may distribute tasks to other subsystems.

• Fusion - a tight-coupling and merging architecture, usually based on the strong mathematical optimization capability of genetic algorithms and neural networks. When other techniques incorporate these features, the learning efficiency of the resulting system is increased.

• Association - the architecture that includes different individual technologies, interchanging knowledge and facts on a pair-wise basis.

Interactive, computerized multimedia applications grew rapidly during the 1990s. Multimedia technology has brought a new dimension to all computer systems, and intelligent systems are no exception. By integrating multimedia, intelligent technologies have managed to create a new generation of intelligent interfaces [66]. Also, the use of multimedia as a vehicle for knowledge presentation provides a powerful interface with a full range of audio/visual features to express knowledge more completely [25]. This greatly enhances the capabilities of traditional intelligent systems, which force users to translate their models of problem attributes and characteristics into text form and then to convert the systems' results back into their own conceptualizations. Intelligent multimedia systems are applied in many domains, including simulation, data storage and information retrieval, feature identification, fault diagnosis and troubleshooting, education, computer-based instruction and training, entertainment, etc. (see [66] for details).

1.3 AI on the Internet

The increase in information technology capabilities due to the development of the Internet and the World Wide Web has made it possible to develop more powerful and often widely dispersed intelligent systems. Although these new systems often merely implement some well-established AI techniques, such as planning, search, and problem solving, in a new way, they definitely have their own identity. Technologies such as intelligent agents, virtual organizations, and knowledge management (to name but a few) are tightly coupled with the Internet, and have opened new fields for the application and practical use of AI during the last decade.

Intelligent agents are computer programs that function autonomously or semi-autonomously in communication with other computational agents, programs, or human agents [45], [58], [69]. Agents typically communicate by exchanging messages represented in a standard format and expressed in a standard communication language. They have the ability to identify, search for, and collect resources on the Internet, to optimize the use of those resources, and to perform independently and rapidly under changing conditions.
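As a minimal sketch of such message exchange, the following Java record models a KQML-flavored message; the performative names, field layout, and content are illustrative assumptions, not a normative agent communication language.

/** A hypothetical KQML-style agent message; field names and values are illustrative only. */
public record AgentMessage(
        String performative,   // e.g., "ask-one", "tell"
        String sender,
        String receiver,
        String language,       // content language, e.g., "KIF"
        String ontology,       // shared vocabulary the content commits to
        String content) {

    public static void main(String[] args) {
        AgentMessage m = new AgentMessage(
                "ask-one", "buyer-agent", "seller-agent",
                "KIF", "parts-ontology", "(costs part-117 ?price)");
        System.out.println(m);
    }
}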

In a virtual organization, complementary resources existing in a number of cooperating companies are left in place, but are integrated to support a particular product effort [45]. Such organizations use the Internet selectively in order to create or assemble productive resources quickly, frequently, and concurrently. These productive resources include research, manufacturing, design, business, and learning and training. AI has been successfully used in a number of virtual organizations, including virtual laboratories, virtual office systems, concurrent engineering projects, virtual classrooms, and virtual environments for training.

Knowledge management is the process of converting knowledge from the sources available to an organization and connecting people with that knowledge [46]. It facilitates the creation, access, and reuse of knowledge, typically using advanced technology such as the World Wide Web, Lotus Notes, the Internet, and intranets. Knowledge-management systems contain numerous knowledge bases, with both numeric and qualitative data and knowledge (e.g., searchable Web pages). Key AI developments, such as intelligent agents, knowledge discovery in databases, and ontologies, are also important parts of knowledge-management systems.


2 Concepts, Theory, Approaches, and Research

Along with the general trends in computer engineering covered above, research and modeling efforts in some specific fields have helped lay the ground for the development of advanced practical intelligent systems. These fields include knowledge representation, knowledge processing, intelligent databases, and distributed AI. Teaching the most important topics from these fields within graduate-level courses on AI has the common objectives of introducing the students to advanced modeling techniques and preparing them to use the tools that enable the development of practical systems. At the same time, the interdisciplinary nature of these fields helps reveal to the students how AI is becoming increasingly integrated with other disciplines.

2.1 Knowledge Representation

The specific objective of teaching knowledge representation in the graduate-level courses mentioned above is to show the students how to apply their background knowledge in programming, software design, software architectures, distributed systems, databases, and information systems to building intelligent systems. This objective was chosen because of the professional profile the students acquire during their studies at the corresponding departments. From that perspective, it was decided to stress object orientation and knowledge-sharing efforts in teaching advanced techniques of knowledge representation.

The object-oriented approach to modeling and representing knowledge in intelligent systems supports the organization and use of knowledge in the form of objects of different kinds. It also makes it possible to treat knowledge and data structures in a unified way. Moreover, treating knowledge and data in a unified way is an important step towards providing knowledge sharing, reuse, and interoperability between different knowledge-based systems, databases, and other systems. Again, our graduate students are required to have sufficient background in the fields of databases, programming, and distributed systems.


2.1.1 Object-Oriented Knowledge Modeling

Conceptually, we can think of the knowledge base of an intelligent system as a large, complex, aggregated object [17]. Its constituent parts can contain knowledge of different kinds; some of them are represented in Figure 1. D_knowledge stands for domain knowledge, and refers to the application domain facts, theories, and heuristics. C_knowledge stands for control knowledge; it describes the system's problem-solving strategies and its functional model, and is more or less domain independent. E_knowledge denotes explanatory knowledge; it defines the contents of explanations and justifications of the system's reasoning process, as well as the way they are generated. System knowledge (the S_knowledge part) describes the contents and structure of the knowledge base, as well as pointers to useful programs that should be "at hand" during the knowledge base building process, since they can provide valuable information; examples of such programs are various application and simulation programs, encyclopedias, etc. In some intelligent systems, system knowledge also defines user models and strategies for the system's communication with its users.

Figure 1. Knowledge base contents.

Apart from these four kinds of knowledge there can also be some other specific kinds of knowledge in the knowledge base (e.g., knowledge specific to truth maintenance, or knowledge specific to the capabilities of communication and integration with other systems).

All kinds of knowledge in the knowledge base are represented using one or more knowledge representation techniques. These techniques use different and interrelated knowledge elements. The knowledge elements range from primitives (including different forms of O-A-V triplets, frames, rules, logical expressions, and procedures) to complex elements. Complex knowledge elements are represented using either simple aggregations of knowledge primitives, or conceptually different techniques based on knowledge primitives and their combinations. Representing complex knowledge elements by means of primitive ones is often done by applying well-known design patterns. In designing the entire knowledge base, all knowledge elements can be classified according to their type (e.g., rules, frames, or procedures) and grouped into several homogeneous collections. Each such collection contains knowledge elements of the same type.
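A minimal Java sketch of this organization, with hypothetical element types and content, shows how type-keyed homogeneous collections can make up a knowledge base:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a knowledge base as homogeneous collections of typed elements. */
public class KnowledgeBaseSketch {

    interface KnowledgeElement { String name(); }

    record Rule(String name, String ifPart, String thenPart) implements KnowledgeElement { }
    record Frame(String name, Map<String, Object> slots) implements KnowledgeElement { }

    // One homogeneous collection per element type, as in the classification above.
    private final Map<Class<? extends KnowledgeElement>, List<KnowledgeElement>> collections =
            new HashMap<>();

    void add(KnowledgeElement e) {
        collections.computeIfAbsent(e.getClass(), k -> new ArrayList<>()).add(e);
    }

    List<KnowledgeElement> allOfType(Class<? extends KnowledgeElement> type) {
        return collections.getOrDefault(type, List.of());
    }

    public static void main(String[] args) {
        KnowledgeBaseSketch kb = new KnowledgeBaseSketch();
        kb.add(new Rule("r1", "temperature > 1450", "open-damper"));
        kb.add(new Frame("kiln", Map.<String, Object>of("zone-count", 4)));
        System.out.println(kb.allOfType(Rule.class));
    }
}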

There are numerous examples of using such an approach in modeling, developing and maintaining knowledge bases. An example from the manufacturing domain is described in [17]. The example shows how object-oriented knowledge modeling is used in a real-time expert system that controls an important part of the cement production process. Huarng and Chen have proposed the Object Knowledge Canonical Form, a translation scheme that facilitates knowledge translation among different knowledge representation formats [34]. A comprehensive framework for object-oriented fuzzy knowledge representation is proposed in [47].

Object-oriented knowledge modeling is used extensively throughout all of our courses. The students are required to prepare and complete small projects involving building knowledge bases of different kinds. In all of the projects, explicit object-oriented design using UML is required. The projects may include development and programming of specific knowledge elements. The multilayered orthogonal scheme of knowledge elements defined in [17] is used as a common reference in such projects.

2.1.2 Unifying Knowledge, Information and Data

An important recent theoretical contribution to the field of knowledge modeling and representation is a formalism for modeling knowledge-based systems developed by John Debenham [13], [14], [15]. Debenham's formalism lets us unify contents of different knowledge bases and databases and treat all data, information, and knowledge "things" in them in much the same way.


The formalism is object-oriented, in the sense that it defines knowledge, information, and data items as objects. Roughly, we may think of data objects as records in a database. Information objects are like relations in a relational database, while knowledge objects are much like If-Then rules. Generally, an object is a quadruple A(K,E,F,G), where A is the object's name, K is the set of the object's components (component objects), E denotes the object's semantics, F refers to the object's value constraints, and G represents the object's set constraints. The meaning of these terms becomes clear if we turn to the representation used in this formalism, which is that of Horn clause logic, lambda calculus, and item schemata.

[The schemata list each object's components, semantics, value constraints, and set constraints: part has no components, semantics is-a[x:part-no.], value constraint 100 < x < 300, and set constraint > 500; part/cost-price has components part and cost-price, semantics costs(x,y), and value constraint x < 199 ⇒ y < 300.]

Figure 2. (a) Simple data object, (b) Compound information object [14].

For example, consider the item schemata in Figure 2. Simple data objects have no components, i.e., for such an object D we have D(∅,E,F,G). The simple data object part in Figure 2a has no components, and its semantics says that it must have a part number. Its value constraint says that the part number must be between 100 and 300, and its only set constraint requires that there be more than 500 pieces of that kind. Compound objects inherit the properties (including constraints) of their component objects. The part/cost-price compound information object in Figure 2b inherits properties and constraints from its components, part and cost-price. Its specific set constraint says that, knowing the part number, one can always determine its cost-price.

Figure 3 represents the item schema of the compound [part/sale-price, part/cost-price, mark-up] knowledge object. The object's semantics (which is essentially a rule) shows how to compute the part's sale price, given its cost price and the mark-up factor. The value constraint says that the sale price must be higher than the cost price. The set constraints mean that it is possible to compute the value of any of the three components given the other two values.

[The schema lists the components part/sale-price, part/cost-price, and mark-up with value tuples (x,w), (x,y), and z; the semantics is w = z × y, and the value constraint is w > y.]

Figure 3. Compound knowledge object [14].

A single rule of composition is defined for objects, regardless of their type. Suppose we have:

Objects: A(KA, EA, FA, GA) and B(KB, EB, FB, GB)
Components: KA = {A1, A2, ..., An}, KB = {B1, B2, ..., Bm}
C: the set of k components common to both A and B (C can be empty)
A': {Ai} rearranged so that the last k components are those in C
B': {Bi} rearranged so that the first k components are those in C
A'': {Ai | Ai ∉ C} (A'' has n−k components)
B'': {Bi | Bi ∉ C} (B'' has m−k components)
Components of A: a permutation of the ordered set of components (A'', C)
Components of B: a permutation of the ordered set of components (C, B'')
π: the permutation (denoting both of the above permutations)
D: the set of components (A'', C, B'')

The rule of composition for the above objects is defined as the join of A and B on C:

A ⊗C B = (D, λxyz•[EA(π(x,y)) ∧ EB(π(y,z))],
             λxyz•[FA(π(x,y)) ∧ FB(π(y,z))],
             λxyz•[GA(π(x,y)) ∧ GB(π(y,z))])

For example:

[cost-price, tax] ⊗{cost-price} part/cost-price = part/cost-price/tax


This rule of composition is fairly general, and includes many well-known special cases. For example, if A and B are two information objects with a shared domain C, then A ⊗C B is their conventional join on that domain. If C is empty, A ⊗C B is the Cartesian product of A and B. Table 1 shows the types of objects obtained as results of applying the rule of composition to objects of different original types.

Table 1. Results of applying the rule of composition.

⊗            |  data         |  information  |  knowledge
data         |  data         |  information  |  knowledge
information  |  information  |  information  |  information
knowledge    |  knowledge    |  information  |  knowledge
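To make the formalism concrete, the following Java sketch implements only the component-set and type aspects of the rule of composition, looking the result type up according to Table 1; the semantics and constraint parts (the lambda expressions above) are deliberately omitted, and all names are illustrative.

import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of Debenham-style items and the type part of the rule of composition. */
public class CompositionSketch {

    enum Kind { DATA, INFORMATION, KNOWLEDGE }

    record Item(String name, Set<String> components, Kind kind) { }

    /** Result type of A ⊗ B, following Table 1 (symmetric in its arguments). */
    static Kind resultKind(Kind a, Kind b) {
        if (a == Kind.INFORMATION || b == Kind.INFORMATION) return Kind.INFORMATION;
        if (a == Kind.KNOWLEDGE || b == Kind.KNOWLEDGE) return Kind.KNOWLEDGE;
        return Kind.DATA;
    }

    /** Join of A and B on their shared components C: result components are A'' ∪ C ∪ B''. */
    static Item compose(Item a, Item b) {
        Set<String> joined = new LinkedHashSet<>(a.components());
        joined.addAll(b.components());                       // shared components kept once
        return new Item(a.name() + "/" + b.name(),           // naming simplified
                joined, resultKind(a.kind(), b.kind()));
    }

    public static void main(String[] args) {
        Item costPriceTax = new Item("[cost-price, tax]",
                new LinkedHashSet<>(List.of("cost-price", "tax")), Kind.KNOWLEDGE);
        Item partCostPrice = new Item("part/cost-price",
                new LinkedHashSet<>(List.of("part", "cost-price")), Kind.INFORMATION);
        System.out.println(compose(costPriceTax, partCostPrice));
    }
}

Running main joins a knowledge object and an information object on their shared cost-price component; as in Table 1, the result is an information object.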

We use Debenham's formalism in our courses primarily for the purpose of revealing to the students the common nature of knowledge, information, and data. As lab exercises, the students are required to transform parts of existing databases and knowledge bases and combine them according to the rule of composition and Table 1. An interesting observation from their work is that they are reluctant to use the formalism and are even slightly confused at the beginning. A common complaint is that it looks rather artificial and constrained. However, over time they gradually learn to appreciate the approach, and most of them then voluntarily use it in other projects as well.

2.2 Ontologies and Knowledge Sharing

In building knowledge-based systems, developers usually construct new knowledge bases from scratch. It is a difficult and time-consuming process. Moreover, it is usually hard to share knowledge encoded in such knowledge bases among different knowledge-based systems. There are several reasons for that. First, there is a large diversity and heterogeneity of knowledge representation formalisms. Even within a single family of knowledge representation formalisms, it can be difficult to share knowledge across systems [62]. Also, in order to provide knowledge sharing and reuse across multiple knowledge bases, we need standard protocols that provide interoperability between different knowledge-based systems and other, conventional software, such as databases. Finally, even if the other problems are eliminated, there is still an important barrier to knowledge sharing at a higher, knowledge level. That is, there is often a higher-level modeling, taxonomical, and terminological mismatch between different systems, even if they belong to the same application domain.

Research in the growing field of ontological engineering [10], [22], [57], [64] offers a firm basis for solving such problems. The main idea is to establish standard models, taxonomies, vocabularies and domain terminologies, and use them to develop appropriate knowledge and reasoning modules. Such modules would then act as reusable components that can be used for assembling knowledge-based systems (instead of building them from scratch). The new systems would interoperate with existing ones, sharing their declarative knowledge, reasoning services, and problem-solving techniques [8].

An ontology is a specification of a conceptualization [28]. In other words, it is an explicit specification of some topic, or a formal and declarative representation of some subject area [20], [29]. It specifies concepts to be used for expressing knowledge in that subject area. This knowledge encompasses types of entities, attributes and properties, relations and functions, as well as various constraints. The ontology provides vocabulary (or names) for referring to the terms in that subject area, and the logical statements that describe what the terms are, how they are related to each other, and how they can or cannot be related to each other. Ontologies also provide rules for combining terms and relations to define extensions to the vocabulary, as well as the problem semantics independent of reader and context.

The purpose of ontologies is to enable knowledge sharing and reuse among knowledge-based systems and agents. Ontologies describe the concepts and relationships that can exist for an agent or a community of agents. Each such description is like a formal specification of a program. An ontological commitment is an agreement to use the vocabulary established in an ontology in a way that is consistent (but not complete) with respect to the theory specified by that ontology [28]. It is also an agreement to build agents that commit to ontologies and to design ontologies so that we can share knowledge with and among the agents we build. In fact, a common ontology defines the vocabulary with which queries and assertions are exchanged among agents. Ontologies state axioms that constrain the possible interpretations of the defined terms.

An example of an ontology is shown in Figure 4 (adapted from [28]). It shows a partial vocabulary of the Frame ontology, and is specified using the language called KIF (Knowledge Interchange Format) [26]. KIF is an extended version of first-order predicate calculus, and essentially an intermediary language for translating between different knowledge representation languages.

class relation (?relation)
class function (?function)
class class (?class)
relation instance-of (?individual ?class)
function all-instances (?class) :-> ?set-of-instances
function one-of (@instances) :-> ?class
relation subclass-of (?child-class ?parent-class)
relation superclass-of (?parent-class ?child-class)
relation subrelation-of (?child-relation ?parent-relation)
relation direct-instance-of (?individual ?class)
relation direct-subclass-of (?child-class ?parent-class)

Figure 4. A part of the Frame ontology.

The set of objects that can be represented when the knowledge of a domain is represented in an ontology using a declarative formalism is called the universe of discourse [29]. For example, classes, relations, functions, and other objects represent the universe of discourse in the ontology of a computer program. If we think of ontologies in an object-oriented way, then one possible interpretation is that they provide taxonomic hierarchies of classes and the subsumption relation. For example, in the hierarchy of the Lesson ontology, we may have the Topic ontology, the Objective ontology, and the Pedagogical-point ontology.
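As a toy illustration of subsumption, assuming a taxonomy reduced to bare subclass-of links (a real ontology carries far more), one might write:

import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a taxonomic hierarchy with a subsumption test. */
public class TaxonomySketch {

    private final Map<String, String> parentOf = new HashMap<>(); // subclass-of links

    void addSubclassOf(String child, String parent) { parentOf.put(child, parent); }

    /** Does `ancestor` subsume `concept`, i.e., lie on its subclass-of chain? */
    boolean subsumes(String ancestor, String concept) {
        for (String c = concept; c != null; c = parentOf.get(c))
            if (c.equals(ancestor)) return true;
        return false;
    }

    public static void main(String[] args) {
        TaxonomySketch t = new TaxonomySketch();   // concept names are illustrative only
        t.addSubclassOf("pedagogical-point", "topic");
        t.addSubclassOf("topic", "lesson");
        System.out.println(t.subsumes("lesson", "pedagogical-point")); // true
    }
}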

Ontologies make it possible to define an infrastructure for integrating intelligent systems at the knowledge level, which is independent of particular implementations. Ontologies are especially useful in the following broad areas [20]:


• collaboration - ontologies provide knowledge sharing among the members of interdisciplinary teams and agent-to-agent communication;

• interoperation - ontologies facilitate information integration, especially in distributed applications;

• education - ontologies can be used as a publication medium and a source of reference;

• modeling - ontologies represent reusable building blocks in modeling systems at the knowledge level.

Ontologies themselves offer plenty of possibilities for use in teaching knowledge modeling. Graduate students grasp the notion of ontologies easily and show a lot of interest in them. They quickly review examples of ontologies available on the Web and feel ready to build other example ontologies under supervised guidance. We have found countless examples of concepts that the students would like to build ontologies for. The most interesting one is the student model ontology, which has become the major theme of one Ph.D. thesis [64]. Once completed, it is intended to be used for developing and sharing student models across different intelligent tutoring systems. The student model ontology is rather complex, and several graduate students already participate in its development.

2.3 Knowledge Processing

We can define a knowledge processor as an abstract mechanism which, starting from a set of given facts and a set of given knowledge elements, produces some changes in the set of facts. Concrete examples of knowledge processors include (but are not limited to) blackboard control mechanisms, heuristic classifiers, rule-based inference engines, pattern matchers, and, at the lowest level, even a single neuron of a neural network.

The main goal of teaching knowledge processing within our graduate-level courses is to discuss how different kinds of knowledge processors can be designed starting from well-established software design practices, by which we mean multilayered architectures and design patterns in the first place. Another goal is to exemplify an important line of development in AI: integrating traditional programming languages with knowledge-processing capabilities.

2.3.1 The Knowledge Processor Pattern

Many knowledge-processing mechanisms have a common global architecture. That architecture is the basis of the Knowledge processor pattern, whose structure is shown in Figure 5, using the UML notation [17].

[The diagram relates the Knowledge processor class to the Interface, Facts, and Knowledge classes it uses, with Concrete knowledge processor derived from Knowledge processor.]

Figure 5. The Knowledge processor pattern.

The participants in the Knowledge processor pattern have the following meanings. Knowledge processor defines an interface for using the knowledge from the Knowledge object, as well as for examining and updating facts in the Facts object. Knowledge and Facts objects are generally aggregates of different collections of knowledge elements. By parameterizing collections of knowledge elements, we can put collections of rules, frames, etc. into the Knowledge object, thus making it represent a knowledge base. By analogy, we can also make the Facts object represent a working memory, containing collections of working-memory elements, rule and frame instantiations, etc. Knowledge processor also contains a pointer to an instantiation of the abstract Interface class. Developers can subclass Interface in order to implement an application-specific interface to a particular knowledge processor. Concrete Knowledge Processor is either a knowledge processor of a specific well-known type (see the example below), or can be defined specifically by the application designer.
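One possible Java rendering of this structure, assuming each participant is a plain class, is the following skeleton (a structural sketch, not the design from [17]):

import java.util.List;

/** A minimal Java skeleton of the Knowledge processor pattern (names follow Figure 5). */
abstract class KnowledgeProcessor {
    protected Knowledge knowledge;   // aggregate of knowledge-element collections
    protected Facts facts;           // aggregate of working-memory elements
    protected Interface iface;       // application-specific interface object

    abstract void run();             // each concrete processor defines its own cycle
}

class Knowledge { List<Object> collections; }
class Facts { List<Object> collections; }
abstract class Interface { abstract void report(String message); }

/** Developers derive a concrete processor for a specific processing mechanism. */
class HeuristicClassifier extends KnowledgeProcessor {
    @Override void run() { /* match data abstractions to solution classes ... */ }
}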


As an example of using the Knowledge processor design pattern, Figure 6 shows how a forward-chaining inference engine for rule-based expert systems can be designed. The forward-chaining reasoning process is composed of three distinct operations: pattern matching, conflict resolution, and rule firing. Three corresponding classes represent the agents composing the Forward-chaining inference engine class. The Interface, Knowledge, and Facts classes are omitted from Figure 6 for simplicity.

[The Forward-chaining inference engine class, a concrete Knowledge processor, aggregates a Pattern matcher, a Conflict resolver, and a Rule-firing processor.]

Figure 6. Forward-chaining using the Knowledge processor pattern.
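To show the three operations at work, here is a self-contained Java sketch of a naive forward-chaining cycle over propositional rules; the rules are invented, and conflict resolution is reduced to picking the highest-priority rule.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of one forward-chaining cycle: match, resolve conflicts, fire. */
public class ForwardChainingSketch {

    record Rule(String name, int priority, Set<String> conditions, String conclusion) { }

    public static void main(String[] args) {
        Set<String> facts = new HashSet<>(Set.of("kiln-hot", "damper-closed"));
        List<Rule> rules = List.of(
                new Rule("r1", 1, Set.of("kiln-hot", "damper-closed"), "open-damper"),
                new Rule("r2", 2, Set.of("kiln-hot"), "raise-alarm"));

        while (true) {
            // 1. Pattern matching: collect rules whose conditions hold and have not fired yet.
            List<Rule> conflictSet = new ArrayList<>();
            for (Rule r : rules)
                if (facts.containsAll(r.conditions()) && !facts.contains(r.conclusion()))
                    conflictSet.add(r);
            if (conflictSet.isEmpty()) break;
            // 2. Conflict resolution: here, simply pick the highest-priority rule.
            conflictSet.sort(Comparator.comparingInt(Rule::priority).reversed());
            Rule chosen = conflictSet.get(0);
            // 3. Rule firing: assert the conclusion into working memory.
            facts.add(chosen.conclusion());
            System.out.println("fired " + chosen.name() + " -> " + chosen.conclusion());
        }
    }
}

With these rules, r2 fires first on priority and asserts raise-alarm; r1 then fires and asserts open-damper, after which the conflict set is empty and the cycle stops.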

There are numerous other successful designs in which it is possible to recognize the use of the Knowledge processor pattern. Some examples can be found in [13], [33], [36], [42], [49], and [79]. The students enrolled in our courses are required to experiment with the Knowledge processor pattern in the context of the examples provided in the literature. The exercises have two levels. At the first level, the students are required to match the architecture of a given specific knowledge processor from the literature to the components of the Knowledge processor pattern. At that level, they are also required to analyze the specific example in the context of the applicability of other design patterns from the catalogues available on the Web. At the second level, the students are required to apply the Knowledge processor pattern in actual design exercises.


2.3.2 Embedding Intelligent Reasoning into Traditional Applications

There are several ways of embedding knowledge and intelligent reasoning into traditional applications. In the lectures we give to graduate students, we present a few such ways interchangeably, depending on the tools that we need to promote and use in that particular course. The approach shown here is based on extending traditional object-oriented languages and integrating sets, rules, and data in an object-oriented environment. Such an environment then acquires the features of a knowledge-base management system (KBMS).

Figure 7 shows the multilayered architecture of the Tanguy KBMS [12]. Tanguy extends the C++ language to cope with permanent object storage, production rules (data-driven programming), and uniform set-oriented interfaces.

[The architecture stacks the following layers: the programmer's interface (Tanguy-extended C++); the run-time system; application-specific classes, operations, and methods; the operation base; the inference engine; the knowledge-base manager; and the permanent knowledge base, described by a conceptual schema.]

Figure 7. The Tanguy architecture.

The permanent knowledge base stores general objects, their properties, and their relationships; they are shared by multiple applications. The conceptual schema specifies the classes of objects in the knowledge base. When an application stops working with the knowledge base, the modified operation, data, and rule bases are saved permanently. In Tanguy's terms, the set of current instances of some class is called the extension of the class. The extensions of permanent classes are stored in the permanent knowledge base; extensions of temporary classes are not stored. Application programs automatically inherit operations to access objects in the knowledge base.

The application programmer sees Tanguy as a C++ programming environment extended by a rule-processing inference engine, a DBMS, and a KBMS. The Tanguy run-time system uses the inference engine in order to provide intelligent reasoning and communicate with the KBMS.

The rules in the knowledge base are described in an extended C++ syntax as follows:

RULE <rule name>
    [MATCH-VARIABLES: <C++ variable definitions>]
    CONTEXT: <context clause>
    CONDITION: <boolean C++ message>
    ACTION: <sequence of C++ messages>
    RULE-TYPE: [BEFORE] or AFTER
               [CONTINUE] or RETURN [<result>]
    [PRIORITY: <integer>]

Each rule is fired only in a particular context (defined by the rule's special-purpose context clause). The context clause restricts the rule's applicability to specific messages (messages sent to instances of a particular class, messages calling a particular operation, and messages using a particular parameter). The rule syntax also generalizes classical forward-chaining rule processing. A rule is fired before or after a message is executed. If an application program intends to execute a particular C++ message, Tanguy first computes and processes the set of active BEFORE-rules in the forward-chaining manner. Then it executes the message, unless a RETURN-rule has fired. RETURN-rules return control to the sender of the message, while CONTINUE-rules continue rule-based inference. Finally, Tanguy computes and processes the set of active AFTER-rules, also using forward chaining. Only active rules participate in the inference process; the operators activate and passivate change a rule's status.
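The following self-contained Java sketch emulates just the control flow of this protocol around a single message send; it is a rough approximation (rules fire once, in list order, rather than by forward chaining) and not Tanguy's actual extended-C++ machinery.

import java.util.List;
import java.util.function.Predicate;

/** Emulates the BEFORE/AFTER rule protocol around a message send (illustrative only). */
public class BeforeAfterRules {

    enum Phase { BEFORE, AFTER }
    enum Flow { CONTINUE, RETURN }

    record Rule(String name, Phase phase, Flow flow,
                Predicate<String> condition, Runnable action) { }

    static void send(String message, Runnable body, List<Rule> activeRules) {
        for (Rule r : activeRules)                       // process active BEFORE-rules
            if (r.phase() == Phase.BEFORE && r.condition().test(message)) {
                r.action().run();
                if (r.flow() == Flow.RETURN) return;     // a RETURN-rule preempts the message
            }
        body.run();                                      // execute the message itself
        for (Rule r : activeRules)                       // then process active AFTER-rules
            if (r.phase() == Phase.AFTER && r.condition().test(message))
                r.action().run();
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule("log", Phase.BEFORE, Flow.CONTINUE,
                        m -> true, () -> System.out.println("before: logging")),
                new Rule("audit", Phase.AFTER, Flow.CONTINUE,
                        m -> m.startsWith("update"), () -> System.out.println("after: audit")));
        send("update-price", () -> System.out.println("executing update-price"), rules);
    }
}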


Another well-known example of integrating C++ with reasoning capabilities is Rete++, an environment that embeds pattern matching based on the famous Rete algorithm into C++ [30]. Note that both Tanguy and Rete++ are true extensions of C++. They are different from products that merely implement some knowledge-processing capability in the form of a class library. For example, the well-known MLC++ library, developed at Stanford University, is a library of C++ classes and tools for supervised machine learning, but not an extension of C++ as a language [41].

The projects that our graduate students work on as parts of the courses involve different kinds of embedding of intelligent reasoning into traditional (and working) applications. So far, we have designed small exercise projects concerning embedding planning modules into traditional information systems, embedding rule-based reasoning into conventional databases (combined with using Debenham's formalism described above), and embedding fuzzy logic into decision-making applications.

2.4 Intelligent Databases

Due to the strong background of our students in the field of databases and information systems, one of the central themes in our graduate-level courses on AI is the extension of traditional databases using intelligent techniques. One way to make data access and manipulation in large, complex databases simpler and more efficient is to integrate database management systems with knowledge-processing capabilities. In that way, database designers create intelligent databases. Their features include query optimization, intelligent search, knowledge-based navigation through huge amounts of raw data, automatic translation of higher-level (natural language) queries into sequences of SQL queries, and the possibility of making automatic discoveries [6], [63], [77].

Intelligent databases have evolved through the merging of several technologies, as shown in Figure 8. The resulting top-level, three-tier architecture of an intelligent database (Figure 9) has three levels: high-level tools, a high-level user interface, and the intelligent database engine. High-level tools perform automated knowledge discovery from data, intelligent search, and data quality and integrity control [63]. The users directly interact with the high-level user interface, which creates the model of the task and database environment. The intelligent database engine is the system's base level. It incorporates a model for a deductive object-oriented representation of multimedia information that can be expressed and operated on in several ways.

[Intelligent databases arise at the intersection of expert systems, hypermedia, and object orientation.]

Figure 8. The notion of intelligent databases.

[From top to bottom: high-level tools, the high-level user interface, and the intelligent database engine.]

Figure 9. Top-level architecture of an intelligent database.

All three levels in this architecture are of interest to our students. At the topmost level, we teach knowledge discovery in databases as a major topic within one of our graduate-level courses. The topic is trendy, and it also attracts the interest of managers from industrial organizations. After learning about knowledge discovery as a process [19], the students are required to master a public-domain tool for knowledge discovery in databases of their own choice and apply it to a simple real-world problem that we design. At the middle level, the students get the notion of generic tasks for intelligent reasoning and their application to interfaces with DBMSs, as well as the basics of user modeling. Finally, at the bottom level, the students learn about intelligent query optimization techniques and about combining DBMSs with specific intelligent technologies. The following paragraphs illustrate these points.

A typical concrete way of merging database technology with intelligent systems is coupling or integration of DBMSs and intelligent systems [35], [70], [75]. Coupling is the weaker merger. It does not guarantee consistency of rules and data (a database update may not go through the intelligent system), and it raises difficulties when trying to bring the database into conformity with new rules in the knowledge base. In spite of that, there are many successful commercial applications that use coupling of intelligent systems and databases; in such applications the intelligent systems play the role of intelligent front ends to databases. On the other hand, integration of DBMSs and intelligent systems guarantees consistency of rules and data, because a single DBMS administers both kinds of objects. Moreover, integration usually brings better performance than mere coupling. There are well-known examples of integration of rules and data in commercial DBMSs, e.g., INGRES and Sybase [63].

As an illustration of merging intelligent systems with DBMSs, consider the integration of expert systems and database systems. Instances of such an integration fall roughly into four categories [81]:

• Enhanced database systems. This can be achieved through organizing deductive databases, incorporating more semantic integrity constraints, adding rules into databases or mining rules from them, and putting an expert system component into a database system.

• Enhanced expert systems. Systems that integrate logic programming and reasoning of expert systems with databases belong to this category.

• Coupling of existing expert systems and database systems. Processing and control can be more or less equally distributed to both systems, or concentrated in one of the two systems, thus making that system dominant. Finally, distributed processing can be controlled by an independent subsystem (a supervisor).


• Expert database systems. The main issue in developing such systems is modeling, representing, and integrating knowledge and data in a uniform way, using an object-oriented approach (see Section 2.1.2).

Finally, how do we actually model and represent knowledge in today's intelligent databases? In most cases, object-oriented technology is used, and rules are incorporated into intelligent databases as objects. This is sometimes called rule subsumption in object bases [59]. Rules as objects can also be added to an object-oriented DBMS [7]. A rule as a composite object is an aggregate of rule-part objects (If- and Then-clauses). For a detailed design of classes for representing rules as objects, see [17].

2.5 Distributed Intelligent Systems

Distributed AI systems are concerned with the interactions of groups of intelligent agents that cooperate in solving complex problems [16]. Distributed problem solving, as a subfield of AI, deals with strategies by which the decomposition and coordination of computation in a distributed system are matched to the structural demands of the task domain [9]. Distributed intelligent systems model a number of information processing phenomena that occur in the natural world [43], [45], [55], [73]. Such phenomena are a source of useful metaphors for distributed processing and distributed problem solving [21], [44]. Recent developments in distributed systems in general, and the Internet in particular, have further contributed to the importance of distributed intelligent systems [65].

This section briefly reviews recent efforts in knowledge modeling and knowledge processing in distributed intelligent systems in the way they are adopted within the well-known I3 program (Intelligent Integration of Information). The reasons are as follows. I3 is a long-term, DARPA-sponsored research and development program concerned with the development of large-scale, distributed intelligent applications [48]. It draws on several specific science areas, such as knowledge-base management, object-oriented software development, and meta-language descriptions, and it builds upon earlier DARPA work on knowledge representation and communication standards. All these features of the I3 program make it a quite suitable real-world example for teaching distributed AI at the graduate level. Moreover, in our courses we have adopted the popular agent-oriented approach to representing AI problems and teaching about them [69], and I3 perfectly illustrates the use of agent-oriented technology in developing AI systems. Finally, the I3 program is used as a background by many developers of virtual organizations [45], [48]. Our students are encouraged to learn about virtual organizations and to access them through the Web.

The main idea of the I3 program is to set standards for transforming dispersed collections of heterogeneous data sources into virtual knowledge bases. The sources of data include databases, knowledge bases, sensor-based subsystems, and simulation systems, scattered over different servers on the Internet and intranets. Virtual knowledge bases integrate the semantic content of such disparate sources and provide integrated information to end-user applications at the right level of abstraction (Figure 10). A whole family of intelligent agents is in charge of the mediation process. The agents are hierarchically organized and coordinated as a group of different facilitators, mediators, and translators. As a result, the user can access the data sources at a high, user-oriented level, and get the desired abstracted information without worrying at all about the underlying translation and mediation processes. Moreover, the distributed nature of the family of agents eliminates the need for a number of human and machine resources acting as intermediary information translators and processors.

Turning Figure 10 "upside down" and looking inside the I3 services reveals the details of this multilayered architecture (Figure 11). I3 service layers isolate applications from dynamic changes in the real-world environment and from the rigidity of legacy data. The services are intended to be used by both research and application communities to design and integrate I3 components. The I3 framework defines a set of essential I3 objects that are used in the protocol of these services (e.g., Information request object, Repository object, and Ontology object). Information Services form the base level of information processing; they are responsible for interaction with client components. Repository Services manage storage and representation of objects, while Mediation Services handle queries (translation, brokering, and decomposition), data access through protocols specific to data sources, and integration of objects from multiple information sources in possibly domain-dependent ways. Examples of this kind of integration include unit conversions, data calculations, name correlation, and the like.

[Mediation turns unprocessed, unintegrated details from heterogeneous data sources (text, video, and images; legacy databases; relational databases; object and knowledge bases) into abstracted information. Information integration agents serve human and computer users through user services (query, monitor, update, ...) and integration services: coordination, semantic integration, and translation and wrapping.]

Figure 10. The I3 process.

[From top to bottom: Integration and Transformation services; Repository Services alongside Mediation Services; and Information Services (Request, Update, Subscribe) at the base.]

Figure 11. I3 service layers.

I3 also defines Ontology Services, specific to knowledge modeling. These services include an ontology viewer/editor, a domain model viewer/editor, merging of concepts, inference, and classification. A dedicated working group within the program is concerned with the development of an ontology meta-language (an "ontology ontology") for use in constructing sentences about ontology components.

3 Applications and Systems

Based on the concepts and models discussed above, this section surveys several working applications and systems. All of them illustrate one or more typical knowledge modeling techniques that are used today in different kinds of intelligent systems. Also, all of them are used as examples of applying the techniques we teach within our graduate-level courses.

3.1 Programming Examples

As an introduction to the practical side of knowledge modeling in different applications and systems, consider how some classes for knowledge representation are designed and programmed. The two examples shown here stem from the work on knowledge modeling for industrial applications described in [17]. It is mandatory for our students to do some practical programming of this kind themselves during their lab exercises. The students are required to use C++ or Java, and they do the exercises using the MS Visual C++ and MS J++ environments.

Figure 12 shows the C++ class for representing an abstract knowledge element, CKnowledgeElement. The idea is that all other specific classes for knowledge representation (e.g., rules, frames, fuzzy sets, and neural networks) have something in common and can be derived from the common CKnowledgeElement. An analogy to this idea can be found in the Java language, where the Object class sits on top of the hierarchy of all Java classes [1], [5]. All identifiers used in Figure 12 have self-explanatory names. Although several details are omitted for simplicity, the class interface clearly shows what common services are provided by all knowledge elements, regardless of their type and nature.


class CKnowledgeElement : public CObject
{
public:
    CKnowledgeElement( );
    virtual ~CKnowledgeElement( ) { }                       // virtual: polymorphic base class
    const CString &GetName( ) const { return m_name; }
    void SetName( const CString &newString ) { m_name = newString; }
    unsigned long int GetID( ) const { return m_ID; }
    virtual const CKnowledgeElement *InUse
        ( const unsigned long int ID ) const { return 0; }  // = 0
    void Serialize( CArchive &ar );
    static void SetStartID
        ( unsigned long int newNextID ) { m_nextID = newNextID; }
    static unsigned long int GetFinishID( ) { return m_nextID; }

    DECLARE_SERIAL( CKnowledgeElement )

protected:
    unsigned long int m_ID;

private:
    CString m_name;
    static unsigned long int m_nextID;
};

Figure 12. Programming example in C++.

public class CRule extends CKnowledgeElement {
    CClause m_ifClause;
    CClause m_thenClause;

    public CRule() { /* ... */ }

    public void AddIfClauseChunk( CChunk pNewChunk ) { /* ... */ }
    public void EditIfClauseChunk( int index, CDonDoc pDonDoc ) { /* ... */ }
    public void DeleteIfClauseChunk( int index ) { /* ... */ }
    public void AddThenClauseChunk( CChunk pNewChunk ) { /* ... */ }
    public void EditThenClauseChunk( int index, CDonDoc pDonDoc ) { /* ... */ }
    public void DeleteThenClauseChunk( int index ) { /* ... */ }
    public String GetIfClauseChunkAsString( int index ) { /* ... */ }
    public String GetThenClauseChunkAsString( int index ) { /* ... */ }
    public int IfClauseChunksNumber() { /* ... */ }
    public int ThenClauseChunksNumber() { /* ... */ }
    public String GetRuleAsString() { /* ... */ }
    public void Serialize( CArchive ar ) { /* ... */ }
    public void UpdatePointers( CDonDoc pDonDoc ) { /* ... */ }
    public CKnowledgeElement InUse( long ID ) { /* ... */ }
    public CKnowledgeElement SlotInUse( long frameID, long attributeID ) { /* ... */ }
}

Figure 13. Programming example in Java.


Figure 13 shows how a class for representing production rules, CRule, can be programmed in Java using the MS J++ environment. It is a direct implementation of the idea of representing rules as objects, as discussed in a previous section. Each rule's main parts are its If- and Then-clauses (m_ifClause and m_thenClause). Apart from the services provided by the base CKnowledgeElement class, common to all kinds of knowledge elements (such as setting and getting names and IDs, and serialization), CRule provides other services, mostly for accessing, updating, and showing the rule's parts (different versions of the Add, Edit, Delete, and Get functions).

Before doing programming exercises of their own, we suggest that our students refer to relevant literature where they can find many other programming examples of classes and methods for knowledge representation (e.g., [33], [36], [54], and [80]). We have found this practice very stimulating for the students' creativity. The examples they read about illustrate the techniques the students learn about during the courses, but they also provoke some criticism from the students and an urge to program the same things better!

3.2 Coupling Expert Systems with Genetic Algorithms

We put strong emphasis on teaching hybrid techniques for building intelligent systems. Along with the basic theory from [56], we always describe a few working examples of hybrid systems during the lectures. The students are then encouraged to prepare reports on other known hybrid systems themselves. Each student gets a separate assignment of this kind, and is required to stress the architectural details and the details of hybridization in the report.

As an illustration, consider an interesting hybrid intelligent system that we often discuss with the students during the corresponding parts of the courses, and that is used in the steelmaking industry in Japan [31]. It is actually an in-house environment for developing applications based on the technologies of expert systems and genetic algorithms. Its main parts are:


• K1, an object-oriented expert system development tool;

• C1, a genetic algorithm development kit;

• libraries of optimization algorithms and utility classes.

In the steelmaking industry, scheduling, planning, and control problems are rather complex. Naturally, they are solved more easily if divided into several subproblems. The problem domain also requires a heuristic approach and an optimization problem solver. Using commercial tools for developing expert systems in this domain does not provide good results, for practical reasons: when such tools are applied to real problems in manufacturing, they usually have slow inference speed, are difficult to interface with other systems, and lack portability. These reasons led to the in-house development of K1. The main features of K1 include fast inference with less memory, object-oriented knowledge representation and reasoning (classes and methods for rules, working-memory elements, and the inference engine), and the capability of importing externally defined C++ classes. K1 also provides C++ direct coding to represent procedural knowledge easily; an example is shown in Figure 14.

Rule E {
    SO (W status == active) :
    ?? {                                  // Start of direct coding
        for (int i = 0; i < SO.n; i++)
            ?make(X value = i);
    } ??                                  // End of direct coding
    remove(SO);
}

Figure 14. C++ direct coding in the K1 tool.

Figure 15 shows the internal structure of the K1 environment, which is reminiscent of the Tanguy architecture (compare Figures 7 and 15). Using K1 to develop an expert system includes the following steps:

• editing a rule base, using classes from the libraries as needed;

• translating the entire rule base to C++ source code;

• compiling the C++ source code;

• linking the compiled code with K1's inference engine and external programs;

• using the browser to debug the expert system (tracing inference steps, viewing the rules' conflict list, and viewing the working memory).

[The environment comprises a rule base with its editor; a translator (parser and code generator) producing C++ source code; the inference engine; libraries and existing systems linked into the application; and a browser.]

Figure 15. The architecture of K1.

The C1 tool is a GA developers' kit that makes it possible to solve optimization problems. When genetic algorithms are used for that purpose, the problem must be represented as an evaluation (fitness) function, and solutions are obtained gradually and approximately. In each step, the candidate solutions are called chromosomes. Better candidates are obtained by applying operators (such as crossover and mutation) to the existing chromosomes. In C1, a portable C++ class library supports various gene types (bit, integer, floating-point, case, and order). Its top-level design is shown in Figure 16, using the UML notation. It also provides problem-representation classes as a skeleton for coding the problem-dependent part of a GA (the right-hand part of Figure 16).
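The generic loop that such a kit packages behind its class library can be sketched in a few lines of self-contained Java; the sketch solves the toy one-max problem with bit-string chromosomes, single-point crossover, and point mutation, and none of it is C1's actual API.

import java.util.Arrays;
import java.util.Random;

/** A minimal bit-string GA in the spirit of a BitChromosome (hypothetical, not C1's API). */
public class MiniGA {
    static final Random RNG = new Random(42);

    /** Fitness: number of 1-bits (the "one-max" toy problem). */
    static int fitness(boolean[] c) {
        int f = 0;
        for (boolean b : c) if (b) f++;
        return f;
    }

    static boolean[] crossover(boolean[] a, boolean[] b) {
        int cut = RNG.nextInt(a.length);                 // single-point crossover
        boolean[] child = Arrays.copyOf(a, a.length);
        System.arraycopy(b, cut, child, cut, a.length - cut);
        return child;
    }

    static void mutate(boolean[] c) {
        int i = RNG.nextInt(c.length);                   // flip one random bit
        c[i] = !c[i];
    }

    public static void main(String[] args) {
        int popSize = 20, genes = 16, generations = 50;
        boolean[][] pool = new boolean[popSize][genes];
        for (boolean[] c : pool)
            for (int i = 0; i < genes; i++) c[i] = RNG.nextBoolean();

        for (int g = 0; g < generations; g++) {
            Arrays.sort(pool, (a, b) -> fitness(b) - fitness(a));  // better candidates first
            for (int i = popSize / 2; i < popSize; i++) {          // replace the worse half
                pool[i] = crossover(pool[RNG.nextInt(popSize / 2)],
                                    pool[RNG.nextInt(popSize / 2)]);
                mutate(pool[i]);
            }
        }
        Arrays.sort(pool, (a, b) -> fitness(b) - fitness(a));
        System.out.println("best fitness: " + fitness(pool[0]) + " of " + genes);
    }
}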

3.3 Loom and PowerLoom

The Loom system has been developed as a language and an environment for constructing intelligent applications [82], [83], [51].


PowerLoom is its successor. It is currently used in many research projects. Some of them are: Computational Linguistics, Groupware (Freiburg University, Germany), Linguistic Context in Image Understanding (University of Buffalo, USA), Natural Language Generation (Technical University Berlin, Germany), Medical Ontologies (Italian National Research Council, Italy), and Generic Knowledge Base Editor (SRI International, USA).

[The problem-independent part comprises the GA, Pool, and Chromosome classes, with Chromosome specialized into BitChromosome, IntChromosome, DoubleChromosome, CaseChromosome, and OrderChromosome; the problem-dependent part comprises the Problem class and application-specific problem subclasses.]

Figure 16. The C1 tool.

We use Loom/PowerLoom (and also Rete++ and PARKA/PARKA-DB; see below) in our teaching primarily as examples of tools that increase the efficiency of knowledge processing by compiling the original knowledge bases into internal network structures. It is the compiled network that is actually used by the knowledge processor at runtime, not the original knowledge base. Loom/PowerLoom's original knowledge modeling and representation is based on definitions, rules, facts, and default rules. The system is object-oriented and allows for polymorphism in its rules. This can be achieved by replacing a rule's action with a generic operation, which states the function of the rule's action. When a knowledge base is developed using Loom/PowerLoom, the knowledge objects are compiled into a network, for faster execution by the run-time system.

Another reason for teaching Loom/PowerLoom is its link with mathematical logic and the Prolog language, which are taught in undergraduate courses. Knowledge processing in Loom/PowerLoom is defined through a Prolog-technology deductive engine called a classifier. The classifier supports semantic unification and deductive query processing, backward and forward chaining, as well as object-oriented truth maintenance. The tool itself has been implemented using the language called STELLA (Strongly TypEd, Lisp-like LAnguage), which can be translated into Lisp, C++, and Java.

3.4 Generic Frame Protocol

In teaching about knowledge sharing and ontologies, we have found it useful to have a pedagogically appropriate topic in the introductory lecture that bridges the topics concentrated on isolated knowledge bases and the topics on shared knowledge bases. As the most suitable candidate topic in that sense, we have selected the Generic Frame Protocol (GFP).

GFP has been developed by SRI International and Stanford University as a generic interface from frame-based intelligent applications to underlying frame representation systems (FRSs) [38]. In fact, it is a generic model of different FRSs that makes applications independent of a specific FRS. The model assumes the existence of a translation layer between the generic knowledge-base functions and an existing FRS-specific functional interface; this central idea of the GFP is illustrated in Figure 17. A library of object-oriented methods does the translation. It is up to the FRS developers to provide translation from their representation language to the language of the GFP. The GFP itself provides generic access functions for interacting with FRSs.
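The translation-layer idea can be sketched in Java as an interface of generic access functions with one implementation per FRS; the names below merely echo GFP's get-slot-value flavor and are hypothetical, as is the in-memory stand-in for an FRS.

import java.util.HashMap;
import java.util.Map;

/** Sketch of a GFP-style translation layer; interface and class names are hypothetical. */
public class FrameProtocolSketch {

    /** Generic access functions an application programs against. */
    interface GenericFrameProtocol {
        Object getSlotValue(String frame, String slot);
        void putSlotValue(String frame, String slot, Object value);
    }

    /** One translator per underlying FRS maps generic calls to FRS-specific ones. */
    static class InMemoryFrsTranslator implements GenericFrameProtocol {
        private final Map<String, Map<String, Object>> frames = new HashMap<>();

        public Object getSlotValue(String frame, String slot) {
            return frames.getOrDefault(frame, Map.of()).get(slot);
        }
        public void putSlotValue(String frame, String slot, Object value) {
            frames.computeIfAbsent(frame, f -> new HashMap<>()).put(slot, value);
        }
    }

    public static void main(String[] args) {
        GenericFrameProtocol kb = new InMemoryFrsTranslator(); // swap in another FRS freely
        kb.putSlotValue("part-117", "cost-price", 250);
        System.out.println(kb.getSlotValue("part-117", "cost-price"));
    }
}

An application written against GenericFrameProtocol keeps working when the in-memory translator is swapped for one backed by a different FRS, which is exactly the portability the GFP aims at.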


[An application issues generic calls such as (get-slot-value ...); a layer of FRS-specific methods translates them for knowledge bases held in different FRSs, Ontolingua among them.]

Figure 17. Using the Generic Frame Protocol.

The main result of the GFP project is the development of generic tools that operate on many FRSs. As a consequence, applications using the GFP are portable over a variety of systems and knowledge bases. Moreover, GFP enables knowledge sharing among different FRSs. It becomes comparatively easy to automatically translate the knowledge encoded in one FRS into another FRS.

3.5 GKB-Editor and Ontolingua

GFP has been used in two important applications: GKB-Editor (Generic Knowledge Base Editor) [38] and Ontolingua (a tool for describing ontologies) [28]. In our courses, we teach about GKB-Editor because it clearly illustrates how GFP works. GKB-Editor is a tool for browsing and editing knowledge bases graphically, across multiple FRSs, in a uniform manner. Using the GFP, it masks the representational details of the different underlying knowledge representation systems and presents to the user a common look-and-feel for all FRSs. This is depicted in Figure 18, and is just another instance of the general idea of multilayered software architecture. The modules of the GKB-Editor include a graphical interactive display, a library of generic knowledge-base functions (GFP), and libraries of frame-representation-specific methods (translators).


[The user works with the GKB-Editor's graphical user interface, which calls the GFP layer, which in turn accesses knowledge bases held in different FRSs.]

Figure 18. Using the GKB-Editor.

The Ontolingua tool uses both GFP and a similar idea of translating different representations of ontologies from and to an interlingua, an intermediary language for ontologies. The interlingua used in Ontolingua is actually KIF. Thus, we can briefly describe Ontolingua as a KIF-based interlingua for ontologies plus a library of ontologies. A part of our course material on ontologies describes how Ontolingua works, includes exercises with it, and shows the importance of using first-order predicate calculus in developing ontologies.

The underlying assumptions of Ontolingua are that ontologies are generally heterogeneous (specialized for specific tasks and methods), but they still can have many concepts in common. It is useful to achieve reusability when developing new ontologies. This is possible if libraries of standard foundation ontologies are available, and if the developers have appropriate tools for composing and specializing foundation ontologies. Ontolingua provides both.

[Figure omitted. It shows remote collaborators reaching the Ontolingua server and its ontology library over the WWW, remote applications connecting through NGFP, and stand-alone applications using translators (LOOM, CLIPS, ...).]

Figure 19. Using Ontolingua.


Figure 19 shows how Ontolingua uses GFP. Stand-alone applications access Ontolingua through GFP using a library of GFP translators. Remote applications access the Ontolingua server using the Network GFP (NGFP). Finally, Ontolingua can be accessed from another site through the Internet and WWW, which extends its use to distributed intelligent systems and virtual organizations. Ontolingua has client-side stubs for NGFP in Java, C, and LISP.

3.6 PARKA and PARKA-DB

Along with Loom/PowerLoom and Ontolingua, another popular AI language/tool is PARKA [18], [32], [39], [74]. Its specific feature is its capability to scale to extremely large applications. Also, PARKA allows for massively parallel knowledge representation.

PARKA is a frame-based language/tool. In a PARKA-based knowledge base, class, subclass, and property links are used to encode the ontology. Property values can be frames, strings, numeric values, or specialized data structures. Thus browsing a PARKA-based knowledge base on the screen is like accessing a huge semantic network. PARKA enables inferencing on knowledge bases containing millions of assertions.
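A minimal sketch of such a frame network, written for classroom use (it is not PARKA code, and all names in it are hypothetical), shows how property lookup can follow the subclass links - exactly what makes browsing the knowledge base feel like walking a semantic network:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Frames connected by subclass and property links.
    class Frame {
        final String name;
        final List<Frame> superclasses = new ArrayList<>();
        final Map<String, Object> properties = new HashMap<>();
        Frame(String name) { this.name = name; }

        // Inherited property lookup: own value first, then the superclasses.
        Object property(String key) {
            if (properties.containsKey(key)) return properties.get(key);
            for (Frame s : superclasses) {
                Object v = s.property(key);
                if (v != null) return v;
            }
            return null;
        }
    }

    public class ParkaStyleSketch {
        public static void main(String[] args) {
            Frame mammal = new Frame("Mammal");
            mammal.properties.put("blooded", "warm");
            Frame elephant = new Frame("Elephant");
            elephant.superclasses.add(mammal);      // a subclass link in the network
            System.out.println(elephant.property("blooded"));  // warm, via the link
        }
    }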

PARKA itself is only of illustrative importance for our courses. We put more emphasis on PARKA-DB, because of its link with databases. PARKA-DB is another tool that uses DBMS technologies to support inferencing and data management [32]. It has significantly decreased primary memory requirements with respect to traditional knowledge representation systems, while retaining inference capabilities. This is due to the integration of DBMS and KBMS technologies in PARKA-DB: DBMSs use external storage (disk) at runtime, while KBMSs enable inferencing and complex query evaluation. As a result, PARKA-DB relies primarily on the cheaper disk memory, consuming less of the more expensive internal memory at run-time.

PARKA-DB was developed to run on generic, single processor (or parallel) systems. It can process extremely complex conjunctive queries against a database. It also supports automatic knowledge discovery and data mining, in two ways:


• verifying hypotheses against data;

• finding relevant relationships in the database using taxonomical and other knowledge from the knowledge base.

Among the best-known applications of PARKA and PARKA-DB systems are CaPER (a case-based reasoning system), ForMAT (a case-based logistics planning system), and a set of medical information systems [39], [74].

3.7 Simulation-Based Design - SBD System

A domain-independent, distributed intelligent system/tool for simulation-based design of engineering products has been developed by Lockheed Martin [50]. The system is called SBD (for Simulation-Based Design). It is an implementation of a domain-independent concurrent engineering framework focusing on fundamental engineering processes [43]. These processes include product and process representation, collaboration and design process assembly and activation, visualization and interaction with product data, and integration of external applications.

We teach about the SBD system because it practically and clearly illustrates many general concepts of distributed intelligent systems. First of all, the agent orientation of our courses fits the SBD system well - it provides agent-based support for all of the fundamental engineering processes mentioned above. Second, SBD is a concrete example of how the I3 framework, architecture, and services are used. Finally, it also has many attributes of a virtual organization.

As a multi-agent, distributed, collaborative, virtual development environment with knowledge-based communications among its agents, the SBD system is applicable throughout the product lifecycle. Due to its full-scale interoperability, it can be used in a distributed heterogeneous computing environment. In fact, SBD supports development of virtual prototypes of products and processes, as well as evaluation of these prototypes in synthetic environments that represent the entire lifecycle of the products. It is adaptable to many specific product domains.


The overall architecture of the SBD system is shown in Figure 20. It provides different kinds of services that can be matched to the general I3 services. Individual services comprise multiple agents.

[Figure omitted. It shows Data Services, Interaction Services, Integration Services, and Application Services on top of an Intelligent Information Bus with Information Sharing and Object Management layers.]

Figure 20. Architecture of the SBD system.

Data Services deal with the product and process representation and manipulation, object-oriented approach to modeling data, and linking of component descriptions and behaviors. Interaction Services are in charge of advanced visualization of products and processes. They provide collaborative means for spatially manipulating products and processes, as well as for operator-in-the-loop simulation. Integration Services support using the system as a tool framework, collaboration and interoperation of tools, assembling collections of tools into integrated "megaprograms," and human communication through a shared electronic notebook. Finally, Application Services manage more or less static applications that perform specific roles for SBD users (e.g., animation and spreadsheet applications). These applications are mostly commercial-off-the-shelf (COTS) products.

In the lower part of Figure 20, the Intelligent Information Bus provides support for the communication needs of the higher-level agents. In its layered architecture, the Information Sharing Layer supports higher-level communication needs between entities in the system. Its duties also include publication and subscription of interest in specific objects, attributes, or events. The Object Management Layer hides the complexity of communication from users and applications. High Performance Computing and Communications is the network interface layer. It isolates the underlying hardware and communications details from the Object Management Layer and higher-level agents.
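The publish/subscribe duty of the Information Sharing Layer can also be sketched briefly. The Java fragment below is our simplified illustration, not the actual SBD bus API (InfoBus and its methods are hypothetical names): agents register interest under a key naming an object, attribute, or event, and are notified whenever something is published under that key.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.Consumer;

    // A toy information-sharing layer: interest registration and delivery.
    class InfoBus {
        private final Map<String, List<Consumer<Object>>> interests = new HashMap<>();

        void subscribe(String key, Consumer<Object> agent) {
            interests.computeIfAbsent(key, k -> new ArrayList<>()).add(agent);
        }
        void publish(String key, Object value) {
            for (Consumer<Object> agent : interests.getOrDefault(key, List.of()))
                agent.accept(value);  // deliver to every interested agent
        }
    }

    public class BusSketch {
        public static void main(String[] args) {
            InfoBus bus = new InfoBus();
            bus.subscribe("wing.span", v -> System.out.println("design agent sees: " + v));
            bus.publish("wing.span", 32.5);  // prints: design agent sees: 32.5
        }
    }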

Incorporation of legacy codes into the SBD environment is enabled by means of wrappers. They input and output so-called "smart data objects," which have associated translator methods. Smart data objects can determine the appropriate translation that is required. SBD works with a small set of interchange formats, with a library of translators to convert between them. The notions of smart data objects, megaprograms, and Application Services are of primary pedagogical importance for our students, because we are trying to promote the idea of using component-based software design in developing AI systems.
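A smart data object can be thought of as a value that carries its current format together with a table of translator methods, so that it can decide which translation a receiving tool requires. The Java sketch below is ours (the format names and the translator table are hypothetical) and is intended only to make the mechanism tangible:

    import java.util.Map;
    import java.util.function.Function;

    // A value that knows its format and carries its own translators.
    class SmartDataObject {
        final String format;     // e.g., a legacy format produced by a wrapper
        final String payload;
        final Map<String, Function<String, String>> translators;

        SmartDataObject(String format, String payload,
                        Map<String, Function<String, String>> translators) {
            this.format = format; this.payload = payload; this.translators = translators;
        }

        // Determine and apply the translation only when the target format differs.
        String as(String targetFormat) {
            if (format.equals(targetFormat)) return payload;
            return translators.get(format + "->" + targetFormat).apply(payload);
        }
    }

    public class SmartDataSketch {
        public static void main(String[] args) {
            SmartDataObject o = new SmartDataObject("legacy-csv", "1;2;3",
                Map.of("legacy-csv->std-csv", s -> s.replace(';', ',')));
            System.out.println(o.as("std-csv"));  // prints: 1,2,3
        }
    }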

3.8 Eon Tools for Building Knowledge-Based Tutors

In teaching AI to graduate students it is worthwhile to "teach about teaching" as well, using the AI way. Moreover, there is a growing interest in AI-based systems for learning, particularly in the context of Web-based learning. Finally, a course on almost any subject is often additionally beneficial for the course participants if it includes a lecture or two of the form "Putting it all together" in the end. Thus we have incorporated some material about knowledge-based tutors in our courses. As an illustration, the Eon tools are described here [60], [61]. They encompass knowledge modeling, knowledge processing, architectural considerations, ontologies, agents, tool design, student modeling, and many more AI-relevant issues.

Eon tools have been developed as a set of domain-independent tools for authoring all aspects of a knowledge-based tutor [60], [61]. The tools define a minimum underlying object-oriented framework for developing intelligent tutors. The framework is neutral with respect to application domain and instructional theory. It is possible to use these tools, including their "ontology objects" (see below), as a meta-authoring tool - a tool for designing special-purpose authoring tools for specific domain types.


Aspects of a knowledge-based tutor include:

• domain knowledge - topics and contents that define the curriculum;

• teaching strategies - pedagogical knowledge of how to present and explain the material defined in the curriculum and how to guide the student through the problem-solving process;

• student model - knowledge of the student's current level of mastering the contents defined in the curriculum, as well as a set of indicators of the student's current progress in solving a problem posed by the system;

• user interface and learning environment - usually a graphically rich environment through which the authors and the students communicate with the system.

The set of Eon tools that covers these aspects of a knowledge-based tutor is shown in Figure 21. The author uses a set of domain-knowledge editors in order to define curriculum contents and store them in the knowledge base in modular, declarative units (as Topic and Content objects). There are also two other sets of dedicated editors, for defining teaching strategies and elements of the user interface.

Beneath the surface shown in Figure 21, the domain model is represented as a semantic network of units of knowledge called Topics. The topic network defines the mapping of the learning goals to topics and their relationships. The important concept regarding topics and the topic network is the Topic ontology. It specifies:

• topic types - the types of nodes allowed (e.g., concept, fact, principle)

• topic link types - the types of links allowed (e.g., is-a, part-of, prerequisite, context-for)

• topic properties - the types of properties topics can have (e.g., importance or difficulty)

The topic network editor has different icons for representing different topic types, topic link types, and topic properties. They facilitate the authoring process, making the resulting topic network visually comprehensible and easy to browse and maintain.
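The Topic ontology is essentially a typing discipline for the topic network, and a few lines of Java make that concrete. The sketch below is our reading of the description above, not Eon source code (the class and enum names are hypothetical):

    import java.util.ArrayList;
    import java.util.EnumMap;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // The Topic ontology fixes the allowed node types, link types, and properties.
    enum TopicType { CONCEPT, FACT, PRINCIPLE }
    enum LinkType  { IS_A, PART_OF, PREREQUISITE, CONTEXT_FOR }

    class Topic {
        final String name;
        final TopicType type;
        final Map<String, Double> properties = new HashMap<>();  // e.g., importance, difficulty
        final Map<LinkType, List<Topic>> links = new EnumMap<>(LinkType.class);

        Topic(String name, TopicType type) { this.name = name; this.type = type; }

        void link(LinkType kind, Topic target) {
            links.computeIfAbsent(kind, k -> new ArrayList<>()).add(target);
        }
    }

    public class TopicNetSketch {
        public static void main(String[] args) {
            Topic recursion = new Topic("Recursion", TopicType.CONCEPT);
            Topic functions = new Topic("Functions", TopicType.CONCEPT);
            recursion.link(LinkType.PREREQUISITE, functions);  // must know functions first
            recursion.properties.put("difficulty", 0.8);
            System.out.println(recursion.links.get(LinkType.PREREQUISITE).get(0).name);
        }
    }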


[Figure omitted. It shows the Eon editors - Topic Network Editor, Topic Contents Editor, Presentation Contents Editor, and Student Model Editor - operating on the tutor's knowledge base.]

Figure 21. Eon tools.

Topics have different levels, which are also defined in the Topic ontology object. Topic levels represent different aspects or uses for the topic (e.g., introduction, summary, teach, test, beginning, difficult, etc.). For each topic, the topic contents are associated with topic levels and can be different at different topic levels. The topic contents can be a sequence or a set of applicable content objects (selection and sequencing can be left to the teaching strategy).

The student model defines mastery parameters associated with the topics from the curriculum. Eon tools assume that the values of the mastery parameters are determined by applying a set of rules, hence the Student Model Editor lets the author define the rules. Likewise, the dedicated editors for defining teaching strategies support defining rules for intelligent selection and sequencing of topics and tasks, presentation of feedback, hints, and explanations, and biasing the learning environment to maximize learning.


A layered control architecture underlies the knowledge model used in Eon tools (Figure 22). The figure shows how the lessons in the curriculum are composed of topic networks, how each topic is associated with topic levels, and how different presentation contents correspond to different topic levels. The presentation contents at each level are treated as a series of events that occur during the topic presentation. This control architecture also affects the student model and the teaching strategies. In the student model, the values of mastery parameters are assigned to objects at each layer, and the values of objects at any level are determined by the student model rules written for that level. The rules of the student model specify how the value of an object depends on the values of the objects at the next lower level. Likewise, the teaching strategies refer to the knowledge objects and their relationships at each layer.
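The bottom-up computation of mastery values can be sketched directly. In the Java fragment below (ours, not Eon code; the averaging rule is only a placeholder for the rules an author would actually write in the Student Model Editor), the value of an object at one layer is derived from the values of the objects at the next lower layer:

    import java.util.List;

    // A knowledge object at some layer, linked to the next lower layer.
    class KnowledgeObject {
        final String name;
        final List<KnowledgeObject> lowerLayer;  // empty for Events, the bottom layer
        double observedMastery;                  // used only at the bottom layer

        KnowledgeObject(String name, List<KnowledgeObject> lowerLayer) {
            this.name = name; this.lowerLayer = lowerLayer;
        }

        // One possible per-level "student model rule": the average of the
        // next lower level's values.
        double mastery() {
            if (lowerLayer.isEmpty()) return observedMastery;
            return lowerLayer.stream()
                             .mapToDouble(KnowledgeObject::mastery)
                             .average().orElse(0.0);
        }
    }

    public class LayeredModelSketch {
        public static void main(String[] args) {
            KnowledgeObject e1 = new KnowledgeObject("event1", List.of());
            e1.observedMastery = 1.0;
            KnowledgeObject e2 = new KnowledgeObject("event2", List.of());
            e2.observedMastery = 0.5;
            KnowledgeObject level = new KnowledgeObject("teach-level", List.of(e1, e2));
            KnowledgeObject topic = new KnowledgeObject("topic", List.of(level));
            System.out.println(topic.mastery());  // 0.75, propagated up the layers
        }
    }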

[Figure omitted. It shows the layers of the architecture from top to bottom: Lessons, Topics, Topic Levels, Presentation Contents, and Events.]

Figure 22. Layered control architecture.


4 Conclusions

As conclusions of this chapter, let us compile the major points from the closing lectures of our three courses. The closing lectures are designed to provide appropriate summaries of the material covered during the corresponding course.

Apart from the obvious diversity of knowledge modeling techniques in current intelligent systems, it is also possible to point out several common issues. First, most knowledge modeling techniques today are being adapted to the general object-oriented approach of the software engineering disciplines. Regardless of the specific kind of intelligent system and of the application domain, most of today's intelligent systems model and represent knowledge in the object-oriented way. Also, knowledge modeling and processing are object-oriented regardless of whether the actual system is distributed or not. All of the concepts, theories, and practical systems covered in this chapter illustrate these facts.

Second, we note an increasing integration of AI with other software disciplines. Many issues discussed here, like unification of data, information, and knowledge objects, integration of knowledge-based systems with traditional databases, as well as embedding of intelligent reasoning into traditional applications, distributed systems, and simulation-based systems, prove this observation. Unlike the situation in the 1980s and before, when knowledge modeling was largely treated as something far removed from the interests of major software disciplines, it nowadays has its own notable place in most software disciplines.

Next, many intelligent systems today use a hybrid approach to knowledge modeling and representation. The reason is simple - all individual techniques have their own good sides, but also some limitations and shortcomings. Together, they increase the modeling options and the resulting system's performance. Systems like KI/C1, and to an extent PARKA, Loom/PowerLoom, Ontolingua, and SBD, are good representatives of this trend.


Also, many practical systems use a layered approach to knowledge modeling. If we consider design of ontologies and knowledge sharing, intelligent databases, distributed intelligent systems, the Tanguy architecture, and the specific architectures of systems like Eon tools, SBD, GFP/GKB-Editor, Ontolingua, and KI/C1, we easily note that they are all layered. Again, the benefits of using layered architectures are well known in the more general field of software design.

General trends of designing, developing, and using distributed systems, client-server architectures, and Internet computing do not bypass knowledge-based systems. Distribution and sharing of knowledge and knowledge processing across the Internet and using virtual knowledge bases are among the main goals of the I3 project.

Who benefits from such a state of the art and such trends in knowledge modeling? The simplest answer is: designers, system and tool developers, end-users, and researchers - all of them. Also, specialists and professionals from other fields, and members of interdisciplinary teams as well. One area that has received rapidly increasing interest in recent years, and in which modern knowledge modeling and knowledge sharing are of extreme importance, is that of virtual organizations. See [45] for a comprehensive treatment of this topic.

But several accomplishments are still lacking, and some future research and development trends in the field of knowledge modeling have already started to emerge. Among the missing things are standards in knowledge modeling. There are some working groups that have already put a lot of effort in this direction, but they have not completed their work yet. In the meantime (and in parallel), researchers in many fields that overlap with knowledge modeling show an increasing interest in interoperable software components. Again, a minimum consensus is necessary on the question "What exactly should be the interoperable software components for knowledge modeling and knowledge representation?" The promising and growing area that may give an appropriate answer (or answers) to questions like that is the area of ontologies and ontology engineering. Finally, using already known design patterns for knowledge representation, as well as discovering new ones, will definitely contribute to the field of knowledge modeling and will further its penetration into other software disciplines. An ultimate goal of such


efforts and activities should be the development of dedicated pattern languages for knowledge modeling. Similar work of researchers in other disciplines (e.g., [11]) may be a good starting point in this direction.

References

[1] Arnold, K. and Gosling, J. (1996), The Java Programming Language, Addison-Wesley, Reading, MA.

[2] Batory, D. and O'Malley, S. (1992), "The Design and Implementation of Hierarchical Software Systems with Reusable Components," ACM Transactions on Software Engineering and Methodology, Vol. 1, No. 4, pp. 355-398.

[3] Booch, G. (1994), Object-Oriented Analysis and Design with Applications, 2nd Edition, Benjamin/Cummings Publishing Company, Inc., Redwood City, CA.

[4] Booch, G., Rumbaugh, J. and Jacobson, I. (1998), Unified Modelling Language User's Guide, Addison-Wesley, Reading, MA.

[5] Campione, M. and Walrath, K. (1998), The Java Tutorial - Object-Oriented Programming for the Internet, Second Ed., Addison-Wesley, Reading, MA.

[6] Cardenas, A.F., Ieong, I.T., Taira, R.K., Barker, R. and Breant, C.M. (1993), "The Knowledge-Based Object-Oriented PICQUERY+ Language," IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 4, pp. 644-657.

[7] Cattell, R.G.G. (Ed.) (1994), The Object Database Standard: ODMG-93, Release 1.1, Morgan Kaufmann Publishers, San Francisco, CA.

[8] Chandrasekaran, B. and Josephson, J.R. (1997), "The Ontology of Tasks and Methods," Proceedings of The AAAI 1997 Spring Symposium on Ontological Engineering, Stanford University, CA, pp. 231-238.


[9] Chandrasekaran, B. (1981), "Natural and Social Systems Metaphors for Distributed Problem Solving," IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-11, No. 1, pp. 1-5.

[10] Chen, W., Hayashi, Y., Kin, L., Ikeda, M., and Mizoguchi, R. (1998), "Ontological Issues on an Intelligent Authoring Tool," Proceedings of The ECAI'98 Workshop on Model-Based Reasoning for Intelligent Education Environments, Brighton, England.

[11] Coplien, J. and Schmidt, D. (1995), Pattern Languages of Program Design, Addison-Wesley, Reading, MA.

[12] Czejdo, B., Eick, C.F. and Taylor, M. (1993), "Integrating Sets, Rules, and Data in an Object-Oriented Environment," IEEE Expert, pp. 59-66, February.

[13] Debenham, J. (1998), Knowledge Engineering - Unifying Knowledge Base and Database Design, Springer, Berlin.

[14] Debenham, J.K. (1994), "Objects for Knowledge Modelling," Proceedings of The Second World Congress on Expert Systems, Lisbon, Portugal, pp. 979-985.

[15] Debenham, J. and Devedžić, V. (1996), "Knowledge Analysis in KBS Design," in Ramsay, A.M. (Ed.): Artificial Intelligence: Methodology, Systems, Applications, IOS Press, Amsterdam / Ohmsha, Tokyo, pp. 178-187.

[16] Decker, K.S. (1987), "Distributed Problem Solving Techniques: a Survey," IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-17, No.5, pp. 729-740.

[17] Devedžić, V. and Radović, D. (1999), "A Framework for Building Intelligent Manufacturing Systems," IEEE Transactions on Systems, Man, and Cybernetics (to appear in August 1999).

[18] Evett, M.P. (1994), PARKA: A System for Massively Parallel Knowledge Representation, Ph.D. dissertation, University of Maryland, College Park, U.S.A.


[19] Fayyad, U. et al., (Eds.) (1996), Advances in Knowledge Discovery and Data Mining, MIT Press, Cambridge, MA.

[20] Fikes, R. (1997), "Reusable Ontologies: A Key Enabler for Electronic Commerce," http://ksl-web.stanford.edu/Reusable-ontol/index.html.

[21] Finin, T. et al. (1994), "KQML as an Agent Communication Language," Proceedings of The Third CIKM Conference, Gaithersburg, Maryland, U.S.A., December.

[22] Fridman-Noy, N. and Hafner, C.D. (1997), "The State of the Art in Ontology Design," AI Magazine, Fall '97, pp. 53-74.

[23] Funabashi, M., Maeda, A., Morooka, Y. and Mori, K. (1995), "Fuzzy and Neural Hybrid Expert Systems: Synergetic AI," IEEE Expert, pp. 32-40, August.

[24] Gamma, E., Helm, R., Johnson, R. and Vlissides, J. (1994), Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, Reading, MA.

[25] Garrity, E.J. and Sipior, J.C. (1994), "Multimedia as a Vehicle for Knowledge Modeling in Expert Systems," Expert Systems with Applications, Vol. 7, No.3, pp. 397-406.

[26] Genesereth, M.R. and Fikes, R.E. (1993), "Knowledge Interchange Format, Version 3.0, Reference Manual," Technical Report Logic-92-1, Computer Science Department, Stanford University.

[27] Grand, M. (1998), Patterns in Java - A Catalog of Reusable Design Patterns Illustrated with UML, John Wiley & Sons, New York.

[28] Gruber, T. (1993), "A Translation Approach to Portable Ontology Specifications," Knowledge Acquisition, Vol. 5, No.2, pp. 199-220.


[29] Gruber, T. and Olsen, G. (1994), "An Ontology for Engineering Mathematics," Proceedings of The Fourth International Conference on Principles of Knowledge Representation and Reasoning, Bonn, Germany, pp. 137-144.

[30] Haley Enterprise (1999), "Reasoning about Rete++," White paper available at http://www.haley.com.

[31] Hamada, K., et al. (1995), "Hybridizing a Genetic Algorithm with Rule-Based Reasoning for Production Planning," IEEE Expert, pp. 60-67, October.

[32] Hendler, J., Stoffel, K., Taylor, M., Rager, D. and Kettler, B. (1997), "PARKA-DB: A Scalable Knowledge Representation System - Database PARKA," http://www.cs.umd.edu/parka-db.html.

[33] Hu, D. (1989), C/C++ for Expert Systems, MIS Press, Portland, Oregon.

[34] Huang, K. and Chen, M.-C. (1996), "OKCFTR: Translators for Knowledge Reuse," Proceedings of The Ninth International Conference on Industrial and Engineering Applications of Artificial Intelligence, Fukuoka, Japan, pp. 333-338.

[35] Ito, H. and Fukumura, T. (1996), "Integrating Rules and a Database by the Loose-Coupling System in Frames," Proceedings of The Third World Congress on Expert Systems, Seoul, Korea, pp. 1090-1097.

[36] Janzen, T.E. (1993), "C++ Classes for Fuzzy Logic," The C Users Journal, pp. 55-71, November.

[37] Jerinić, L. and Devedžić, V. (1997), "OBOA Model of Explanation in an Intelligent Tutoring Shell," ACM SIGCSE Bulletin, Vol. 29, No. 3, pp. 133-135.

[38] Karp, P.D., Myers, K. and Gruber, T. (1995), "The Generic Frame Protocol," Proceedings of the 1995 International Joint Conference on Artificial Intelligence, pp. 768-774.


[39] Kettler, B.P., Hendler, J.A., Andersen, W.A. and Evett, M.P. (1994), "Massively Parallel Support for a Case-based Planning System," IEEE Expert, pp. 8-14, February.

[40] Knaus, R. (1990), "Object-Oriented Shells," AI Expert, pp. 19-25, September.

[41] Kohavi, R., John, G., Long, R., Manley, D. and Pfleger, K. (1996), "MLC++: A Machine Learning Library in C++," Proceedings of The IEEE Conference on Tools with Artificial Intelligence, pp. 38-46.

[42] Kowalski, B. and Stipp, L. (1990), "Object Processing for Knowledge-Based Systems," AI Expert, pp. 34-41, October.

[43] Kuokka, D.R. and Harada, L.T. (1995), "A Communication Infrastructure for Concurrent Engineering," Journal of Artificial Intelligence in Engineering, Design, Analysis and Manufacturing, Vol. 3, No. 2, pp. 78-90.

[44] Kuokka, D. and Livezey, B. (1994), "A Collaborative Parametric Design Agent," Proceedings of The 12th National Conference on AI, pp. 387-393.

[45] O'Leary, D., Kuokka, D. and Plant, R. (1997), "Artificial Intelligence and Virtual Organizations," Communications of The ACM, Vol. 40, No. 1, pp. 52-59.

[46] O'Leary, D. (1998), "Knowledge-Management Systems," IEEE Intelligent Systems, pp. 30-33, May/June.

[47] Lee, Z. and Lee, J. (1996) "A Framework for Fuzzy Knowledge Representation as a Perspective of Object-Oriented Paradigm," Proceedings of The Third World Congress on Expert Systems, Vol. II, Seoul, Korea, pp. 1211-1216.

[48] Lehrer, N. et al. (1996), "Key I3 Services (KIS) Working Draft," Proceedings of The I3 Workshop, Miami, http://web-ext2.darpa.mil/iso/i3/.


[49] Leung, K.S. and Wong, M.H. (1990), "An Expert-System Shell Using Structured Knowledge - An Object-Oriented Approach," IEEE Computer, pp. 38-47, March.

[50] Lockheed Martin Artificial Intelligence Center (1997), "SBD Systems Design Paper," http://sbdhost.parl.com/sbd-paper.html.

[51] MacGregor, R.M. (1994), "A Description Classifier for the Predicate Calculus," Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI 94), pp. 213-220.

[52] Manola, F. (1990), "Object-Oriented Knowledge Bases, Part 1," AI Expert, pp. 26-36, March.

[53] Manola, F. (1990), "Object-Oriented Knowledge Bases, Part 2," AI Expert, pp. 46-57, April.

[54] Masters, T. (1994), Practical Neural Network Recipes in C++, Academic Press, New York.

[55] McGuire, J.G., Kuokka, D.R., Weber, J.C., Tenenbaum, J.M., Gruber, T.R. and Olsen, G.R. (1993), "SHADE: Technology for Knowledge-Based Collaborative Engineering," Concurrent Engineering: Applications and Research (CERA), Vol. 1, No. 3, pp. 17-31.

[56] Medsker, L.R. (1994), Hybrid Intelligent Systems, Kluwer Academic Publishers, Amsterdam.

[57] Mizoguchi, R. and Ikeda, M. (1996), "Towards Ontology Engineering," Technical Report AI-TR-96-1, ISIR, Osaka University, Japan.

[58] Muller, J.P., Wooldridge, M.J. and Jennings, N.R. (1994-1996), Intelligent Agents, 3 Volumes, Springer-Verlag, NY.

[59] Mulvenna, M.D., Murphy, M. and Hughes, J.G. (1996), "Rule Subsumption in Object-Bases," Proceedings of The Third World Congress on Expert Systems, Seoul, Korea, Vol. II, pp. 1106-1113.


[60] Murray, T. (1997), "Authoring Knowledge Based Tutors: Tools for Content, Instructional Strategy, Student Model, and Interface Design," submitted to the Journal of the Learning Sciences, http://www.cs.umass.edu/~tmurray/.

[61] Murray, T. (1996), "Toward a Conceptual Vocabulary for Intelligent Tutoring Systems," working paper available at http://www.cs.umass.edu/~tmurray/papers.html.

[62] Neches, R., Fikes, R., Finin, T., Gruber, T., Patil, R., Senator, T. and Swartout, W.R. (1991), "Enabling Technology for Knowledge Sharing," AI Magazine, pp. 36-56, Fall 1991.

[63] Parsaye, K. and Chignell, M. (1993), Intelligent Databases: Object-Oriented, Deductive Hypermedia Technologies, John Wiley & Sons, New York.

[64] Radović, D. and Devedžić, V. (1998), "Towards Reusable Ontologies in Intelligent Tutoring Systems," Proceedings of the CONTI'98 Conference, Timisoara, Romania, pp. 138-145.

[65] Pope, A. (1997), The CORBA Reference Guide: Understanding the Common Object Request Broker Architecture, Addison-Wesley, Reading, MA.

[66] Ragusa, J.M. (1994), "Models and Applications of Multimedia, Hypermedia, and Intellimedia Integration with Expert Systems," Expert Systems with Applications, Vol. 7, No.3, pp. 407-426.

[67] Rajlich, V. and Silva, J.H. (1996), "Evolution and Reuse of Orthogonal Architecture," IEEE Transactions on Software Engineering, Vol. 22, No.2, pp. 153-157.

[68] Ramamoorthy, C.V. and Sheu, P.C. (1988), "Object-Oriented Systems," IEEE Expert, pp. 9-15, Fall 1988.

[69] Russell, S. and Norvig, P. (1995), Artificial Intelligence - A Modern Approach, Prentice-Hall, Englewood Cliffs, NJ.


[70] Sen, A. and Choobineh, J. (1990), "Deductive Data Modeling: A New Trend in Database Management for Decision Support Systems," Decision Support Systems, Vol. 6, No.1, pp. 45-57.

[71] Shaw, M. (1995), "Making Choices: A Comparison of Styles for Software Architecture," IEEE Software, Special issue on software architecture, Vol. 12, No.6, pp. 27-41.

[72] Shaw, M. and Garlan, D. (1996), Software Architecture: Perspectives on an Emerging Discipline, Prentice-Hall, Englewood Cliffs, NJ.

[73] Singh, M.P., Huhns, M.N. and Stephens, L.M. (1993), "Declarative Representations of Multiagent Systems," IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No.5, pp. 721-739.

[74] Stoffel, K., Taylor, M. and Hendler, J. (1997), "Efficient Management of Very Large Ontologies," Proceedings of The American Association for Artificial Intelligence Conference (AAAI-97), AAAI/MIT Press, pp. 313-320.

[75] Stonebraker, M. (1992), "The Integration of Rule Systems and Database Systems," IEEE Transactions on Knowledge and Data Engineering, Vol. 4, No.5, pp. 415-423.

[76] Szyperski, C. (1998), Component Software: Beyond Object-Oriented Programming, ACM Press/Addison-Wesley, NY/Reading, MA.

[77] Turban, E. and Aronson, J.E. (1998), Decision Support Systems and Intelligent Systems, Fifth ed., Prentice-Hall, Englewood Cliffs, NJ.

[78] Vinoski, S. (1997), "CORBA: Integrating Diverse Applications Within Distributed Heterogeneous Environments," IEEE Communications Magazine, Vol. 14, No.2, pp. 28-40.


[79] Watson, I., Haydon, G., Basden, A., Picton, M. and Brandon, P. (1994), "A Common Object-Oriented Inferencing System for Three Construction Knowledge-Based Systems," Proceedings of The Second World Congress on Expert Systems, Lisbon, Portugal, pp. 966-976.

[80] Welstead, S.T. (1994), Neural Networks and Fuzzy Logic Applications in C/C++, IEEE Computer Society Press, Los Alamitos, CA.

[81] Yang, H.-L. (1997), "A Simple Coupler to Link Expert Systems with Database Systems," Expert Systems with Applications, Vol. 12, No. 2, pp. 179-188.

[82] Yen, J., Neches, R. and MacGregor, R. (1991), "CLASP: Integrating Term Subsumption Systems and Production Systems," IEEE Transactions on Knowledge and Data Engineering, Vol. 3, No.1, pp. 25-32.

[83] Yen, J., Juang, H.-L. and MacGregor, R. (1991), "Using Polymorphism to Improve Expert Systems Maintainability," IEEE Expert, Vol. 6, No.2, pp. 48-55.


Selected Webliography - Starting Points

OMG/CORBA www.omg.org

Design patterns st-www.cs.uiuc.edu/users/patterns/patterns.html

Knowledge representation www.medg.lcs.mit.edu/doyle/kr/

Knowledge sharing WWW-KSL.Stanford.EDU:80/knowledge-sharing/

Ontologies www.csi.uottawa.ca/dept/Ontology/ www.medg.lcs.mit.edu/doyle/top/

Knowledge processing mnemosyne.itc.it:1024/kr-links.html

Intelligent databases www.kdnuggets.com

Distributed intelligent systems web-ext2.darpa.mil/iso/i3/ sbdhost.parl.com/sbd-paper.html

Expert systems www.abdn.ac.uk/~acc025/otherai.html

Intelligent agents www.agent.org drogo.cselt.it/fipa/

Intelligent tutoring systems www.manta.ieee.org/p1484/links.htm advlearn.lrdc.pitt.edu/its-arch/

Loom/PowerLoom www.isi.edu/index.html www.isi.edu/LOOM-HOME.html

PARKA www.cs.umd.edu/parka-cm.html www.cs.umd.edu/parka-db.html

GFP, GKB www.ai.sri.com/~gfp/

AI bibliographies ai.iit.nrc.ca/ai_bib.html liinwww.ira.uka.de/bibliography/Ai/index.html

AI resources www.computer.org/pubs/expert/ai-www/ai-www.htm


CHAPTER 6

INNOVATIVE MODELING TECHNIQUES FOR INTELLIGENT TUTORING SYSTEMS

V. Devedžić
FON - School of Business Administration, University of Belgrade, Belgrade, Yugoslavia

D. Radović
Technical Faculty Čačak, University of Kragujevac, Čačak, Yugoslavia

L. Jerinić
Institute of Mathematics, University of Novi Sad, Novi Sad, Yugoslavia

This chapter describes three modeling techniques that have recently started to attract the attention of researchers and developers in the domain of intelligent tutoring systems (ITSs). These techniques are: hierarchical modeling, interoperable and reusable software components, and ontologies. All three of them have been used in developing a model of ITSs called GET-BITS (GEneric Tools for Building ITSs). The GET-BITS model has been used throughout the chapter in order to illustrate the techniques. The major goal of the chapter is to show how these three techniques can be used to make the internal organization of ITSs more natural, more flexible, and more robust, to enhance their design, and to improve their performance. Important modules of any intelligent tutoring system, like domain knowledge, pedagogical knowledge, student model, and explanation strategies, are discussed extensively in the context of the three modeling techniques and the GET-BITS model. Experience with using GET-BITS as the basis for building practical applications shows that the processes of computer-based tutoring and learning based on the GET-BITS model come much closer to human-based instruction. From the design perspective, major advantages of using hierarchical modeling, software components, and ontologies in developing practical ITSs include enhanced modularity, easy extensions, and important steps towards knowledge sharing and reuse.

1 Introduction

Hierarchical modeling, interoperable and reusable software components, and ontologies are modeling techniques that have only recently penetrated into the ITS domain. In order to understand these three modeling techniques properly, it is useful to briefly survey the major issues of the domain of ITSs and provide an appropriate context for presenting the three techniques. This section gives an overview of applying AI techniques in education and sets the background for the major parts of the chapter. It first briefly presents traditional views, architectures, design solutions, and goals of developing AI-based software for education. Then it discusses the use of ITS shells and authoring tools for building tutoring systems. It also introduces some state-of-the-art topics in the field of ITSs. Some shortcomings of today's ITSs are discussed next. Finally, the problem of this chapter is defined more precisely.

1.1 Traditional ITSs

Instructional computer programs have been developed since the early 1970s [51]. Rapid development of computer technologies and AI methods, introduction of computers into schools, and daily use of computers by people of different vocations, education, and age have made education a very important field to AI researchers. Their main goals have been to develop programs that can teach humans and to achieve individualization of the educational process. That has been the dawn of the field of ITSs.1

Traditionally, ITSs are computer-based instructional systems that have separate data and knowledge bases for [53]:

• instructional content (specifying what to teach);

• teaching strategies (specifying how to teach); and

• modeling the student's mastery of the topics being taught, in order to dynamically adapt the process of instruction to the student.

These three data/knowledge bases are often referred to as the expert module, the pedagogical module, and the student model. Figure 1 shows the corresponding classical architecture of ITSs.

[Figure omitted. It shows the expert module, the pedagogical module, and the student model interacting with the student.]

Figure 1. Traditional architecture of ITSs.

1 Intelligent Tutoring Systems (ITSs), Intelligent Learning Environments (ILEs), Knowledge-Based Tutors (KBTs), Intelligent Computer Assisted Instruction (ICAI), and Intelligent Educational Systems (IESs) are all more or less synonyms for using methods and techniques of Artificial Intelligence to improve the processes of computer-based teaching and learning.

Design of traditional ITSs is based on two fundamental assumptions about learning. First, that individualized instruction by a competent


tutor is far superior to classroom style learning because both the content and the style of the instruction can be continuously adapted to best meet the needs of each individual student. Second, that students learn better in situations which more closely approximate the situations in which they will use their knowledge, i.e., they "learn by doing," learn from their mistakes, and learn by constructing knowledge in a very individualized way.

ITSs use techniques that allow automated instruction to come closer to the ideal, by more closely simulating realistic situations, and by incorporating computational models (knowledge bases) of the content, the teaching process, and the student's learning state. In fact, ITSs belong to the intersection of three more general disciplines, as shown in Figure 2: artificial intelligence (AI), computer science (CS), and theory of instruction (TI).

[Figure omitted. It shows the three overlapping disciplines and their contributions: AI (knowledge representation, reasoning, machine learning, expert systems, etc.), CS (programming techniques, graphics, human-computer interaction, simulation, etc.), and TI (cognitive sciences, pedagogy, psychology, instruction sciences, etc.).]

Figure 2. ITSs at the intersection of AI, CS, and TI.

1.2 ITS Shells and Authoring Tools

More recent ITSs pay more attention to generic problems and concepts of the tutoring process, trying to separate architectural, methodological, and control issues from the domain knowledge as much as possible


(see, for example, [27], [28], [48], [52]). In other words, there are interactive and integrated development tools for building ITSs, i.e., tools that help the developer plug in some domain knowledge and test the prototype system, and then gradually and incrementally develop the final system. Such integrated tools are often referred to as shells (e.g., ITS shells, ILE shells, IES shells, etc.), which usually require a knowledge engineer in order to be fully used, or authoring tools, which can also be used by human instructors who do not necessarily have knowledge engineering experience. In their essence, ITS shells and authoring tools are much like expert system shells.

1.3 Recent Advancements

A lot of current research and development effort in the domain of ITSs is focused on ITSs for collaborative learning, on using Internet/WWW technology in order to provide comfortable, user-oriented, distributed learning and teaching facilities, and on employing intelligent agents to play important teaching and learning roles in ITSs.

Collaborative learning is the learning paradigm in which a problem is not solvable by an individual student, but forms an adequate challenge for, and can be solved by, a group of students [22], [38], [45]. Collaborative learning systems capture the aspects that identify the group of students as a whole (group beliefs, group actions, group goals, group misconceptions, differential aspects, and conflicts), as opposed to simply having a set of independent elements of individual learners. Some interesting problems in ITSs for collaborative learning include the effects of sharing student models [8], the study of the types of collaborative activities that occur when a pair of students solves a problem [22], the issue of picking the right members of a team for solving the problem collaboratively [21], and the study of artificial learning actors [1].

In distributed intelligent tutoring, two or more systems are combined in order to teach the same subject [6]. The problem of sharing student models of different systems is the key to increasing the effects of learning. The so-called open distributed learning environments integrate the ideas of traditional ITSs, collaborative learning, distributed learning systems, and open systems [34].


In Web-based ITSs, the goals are classroom independence and platform independence of the teaching and learning process, as well as interactivity and adaptivity [7], [44]. The idea is that an ITS application installed at one server can be used by thousands of learners all over the world who are equipped with any kind of Internet-connected computer. This is, however, far more difficult than just putting a Web-based course on a server and letting remote learners play with it. In order to be useful to individual learners, Web-based ITSs must be adaptive, since when learning from a Web-tutor there is often no colleague or teacher around to provide assistance as in a normal classroom situation.

One of the main ideas of integrating active intelligent agents into ITSs is to let an intelligent agent act as a tutor to each individual student. Such pedagogical agents take into account the current progress of the student, the specific learning goal of the student, the specific needs of the student to communicate with other students, teaching strategies, roles of learning companions and other actors, and the perceived mental model of the student in charting a personalized course of learning [1], [15], [29], [50]. It has been argued that reusability and generality of student models can be increased if they are decoupled from the other modules of an ITS and given the autonomy of software agents [37].

1.4 Shortcomings of Current ITSs

Traditional ITSs concentrate on the domain knowledge they are supposed to present and teach, hence their control mechanisms are often domain-dependent [2], [28], [51], [53]. Moreover, all traditional models of ITSs, as well as the corresponding knowledge models, differ only to an extent. They still have much in common regarding the system architecture and design. However, the design methodologies employed vary a lot, and sometimes even remain blurred for the sake of the system functionality alone.

On the other hand, using a shell or an authoring tool for developing ITSs brings more systematic design, but can also become a limiting factor. The shell may not support a certain knowledge representation technique or design strategy that may be needed in a particular system. Sometimes the shell has a number of options which are seldom or never actually used in developing practical systems.


Even in the most recent ITS trends, like collaborative learning environments, pedagogical agents, and Web-based ITSs, there are important design issues that have either been skipped in current systems or left in their infancy. These issues include more regular internal organization and architecture of the system modules, knowledge sharing and reuse for ITSs, and "plug-and-play" software design for ITSs. All these issues have already been seriously treated in many other intelligent systems. However, in the domain of ITS modeling and design these issues have just recently started to emerge as important ones.

1.5 Problem Statement

A carefully chosen design methodology, combined with powerful modeling techniques, usually results in a significant improvement of the system performance, reduces development time, and facilitates maintenance. In that sense, it is important to specify the design methodology as explicitly as possible and to use modeling techniques that will enable developers to represent selected aspects of the system effectively. From that perspective, the goals of this chapter are:

• to describe how development of ITSs can be facilitated using hierarchical modeling, software components, and ontologies;

• to explain how performance of ITSs can be improved using these three modeling techniques;

• to illustrate how these three techniques are included into an existing, recently developed practical model of ITSs, the GET-BITS model;

• to show examples of using the three techniques in developing practical ITSs.

2 Hierarchical Modeling

In the general domain of object-oriented software engineering, hierarchical modeling refers to layered software architectures [4], in which:


• each component in a system belongs at a certain conceptual layer (layers are sets of classes on the same level of abstraction);

• more complex components are designed starting from simpler components from the same layer or from the lower layers;

• drawing a hierarchically organized tree of components that spans across multiple layers can represent the architecture of the system.

One particularly important extension of the concept of layered software architecture is the orthogonal architecture [41]. In the orthogonal architecture, classes (objects) are organized into layers and threads. Threads consist of classes implementing the same functionality, related to each other by the using relationship [5]. Threads are "vertical," in the sense that their classes belong to different layers. Layers are "horizontal," and there is no using relationship among the classes in the same layer. Hence modifications within a thread do not affect other threads. Layers and threads together form a grid. By the position of a class in the architecture, it is easy to understand what level of abstraction and what functionality it implements. The architecture itself is highly reusable, since it is shared by all programs in a certain domain which have the same layers, but may have different threads.
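To make the layer/thread grid tangible, the following Java sketch (ours; the enum values anticipate the levels and roles used later in this chapter, and mayUse merely encodes the architectural rule as a predicate rather than anything a compiler would enforce) places each component at one layer and one thread:

    // Each class sits at one layer (level of abstraction) and in one
    // thread (functionality). Per the orthogonal architecture, a
    // component may use another only within its own thread and only
    // at a strictly lower layer.
    enum Layer { PRIMITIVES, UNITS, BLOCKS, SYSTEM, INTEGRATION }
    enum ThreadDim { DOMAIN_KNOWLEDGE, PEDAGOGY, EXPLANATION, STUDENT_MODEL }

    class Component {
        final String name;
        final Layer layer;
        final ThreadDim thread;
        Component(String name, Layer layer, ThreadDim thread) {
            this.name = name; this.layer = layer; this.thread = thread;
        }
        // The architectural rule encoded as a predicate.
        boolean mayUse(Component other) {
            return thread == other.thread && layer.ordinal() > other.layer.ordinal();
        }
    }

    public class OrthogonalSketch {
        public static void main(String[] args) {
            Component lesson = new Component("Lesson", Layer.BLOCKS, ThreadDim.DOMAIN_KNOWLEDGE);
            Component rule = new Component("Rule", Layer.UNITS, ThreadDim.DOMAIN_KNOWLEDGE);
            Component hint = new Component("Hint", Layer.BLOCKS, ThreadDim.PEDAGOGY);
            System.out.println(lesson.mayUse(rule));  // true: same thread, lower layer
            System.out.println(lesson.mayUse(hint));  // false: different thread
        }
    }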

2.1 Hierarchy of Components for ITSs

In designing an ITS, an ITS shell, or an authoring tool, it is useful to have a set of domain independent components and tools as building blocks for all parts of the system [24], [35]. Such components and tools and their relationships are the basis for a framework for ITS development. Any such framework should be neutral regarding domain or instructional theory.

There are two ways of defining hierarchies in such a framework. First, it is possible to think of a certain concept at different levels of detail. For example, a topic can be considered at introduction, summary, teach, test, beginning, difficult, and similar levels [35]. The contents of each topic can be associated with the topic level, and can be different at different topic levels. Likewise, teaching strategies and parameters of the student model can also be associated with the topic levels. Second, it is possible to design ITS components in such a way that they form a


hierarchical architecture, such as orthogonal architecture [11], [25]. For example, lessons can be designed as components at one layer of the architecture. Lessons are composed from topics, which may be defined at the same layer or at a lower layer. Topics consist of objectives and presentation contents, which may be defined at the adjacent lower layer. Finally, objectives and presentation contents can be defined as text, graphics, audio, and other elements at the lowest, most primitive layer.

2.2 Semantics and Hierarchies in the GET-BITS Model

This section illustrates how hierarchical modeling has been included into an existing model for ITS design and development, the GET-BITS model [11], [14]. The model has been derived from a more general hierarchical model of intelligent systems that was first applied in the manufacturing domain [10].

The GET-BITS model defines five levels of abstraction for designing ITSs, Table 1a. If necessary, it is also possible to define fine-grained sublevels at each level of abstraction. Each level has associated concepts, operations, knowledge representation techniques, inference methods, knowledge acquisition tools and techniques, and development tools. They are all considered as dimensions along which the levels can be analyzed, Table 1b. The concepts of the levels of abstraction and dimensions have been derived starting from the orthogonal architecture.

The semantics of the levels of abstraction is easy to understand. In designing an ITS, there are primitives, which are used to compose units, which in turn are parts of blocks. Blocks themselves are used to build self-contained agents or systems, which can be further integrated into more complex systems. To get a feeling for how the GET-BITS levels of abstraction correspond to some well-known concepts from the ITS domain, consider the following examples. Primitives like plain text, logical expressions, attributes, and numerical values are used to compose units like rules, frames, and different utility functions. These are then used as parts of certain building blocks that exist in every ITS, e.g., topics, lessons, and teaching strategies. At the system level, we have self-contained systems or agents like explanation planners, student


modeling agents, and learning actors, all composed using different building blocks. Finally, at the integration level there are collaborative learning systems, distributed learning environments, and Web-based tutoring systems.

Table 1. The GET-BITS model: (a) The levels of abstraction (b) Dimensions.

(a)

Level of abstraction   Objective     Semantics
Level 1                Integration   Multiple agents or systems
Level 2                System        Single agent or system
Level 3                Blocks        System building blocks
Level 4                Units         Units of blocks
Level 5                Primitives    Parts of units

(b)

Level of abstraction   Dimensions
                       D1   D2   ...   Dn
Level 1
Level 2
Level 3
Level 4
Level 5

It should also be noted that the borders between any two adjacent levels are not strict; they are rather approximate and "fuzzy." For example, a single ITS can be put at the system level, as a self-contained system. However, there are equally valid arguments for putting it at the integration level, since it integrates domain knowledge, a student model, and a pedagogical module. These three modules can be developed by different tools and made to interact at a higher level, as in [35] and [43]. Several other concepts can also be treated at different levels of abstraction.


The concepts, operations, methods, etc. at each level of abstraction can be directly mapped onto sets of corresponding components and tools used in ITS design. Table 2 shows some of these components and tools identified in the GET-BITS model, classified according to their corresponding level of abstraction and role in the ITS architecture.

The complexity and the number of these components and tools grow from the lower levels to the higher ones. Consequently, it is quite reasonable to expect further horizontal and vertical subdivisions at higher levels of abstraction in practical applications of the GET-BITS model for ITS design and development. Appropriate identification of such subdivisions for some particular issues of ITS design, such as collaborative learning and pedagogical agents, is the topic of our current research [11].

From the software design point of view, components and tools in Table 2 can be considered as classes of objects. It is easy to derive more specific classes from them in order to tune them to a particular application. The classes are designed in such a way that their semantics is defined horizontally by the corresponding level of abstraction and its sublevels (if any), and vertically by the appropriate key abstractions specified mostly along the concepts and knowledge representation dimensions. Class interface functions and method procedures are defined mostly from the operations and inference methods dimensions at each level. The knowledge acquisition and development tools dimensions are used to specify additional classes and methods at each level, used for the important ITS development tasks of knowledge elicitation, learning, and knowledge management. At each level of abstraction, any class is defined using only the classes from that level and the lower ones. For example, the Lesson class at level 3 in Table 2 is defined using the Topic, Objective, Pedagogical point, Goal, Plan, Question, Exercise, and Quiz classes, as well as primitive data types, such as strings and numbers.
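In Java, this composition rule can be sketched as follows; the class names come from Table 2, while the bodies are our minimal, hypothetical stand-ins:

    import java.util.ArrayList;
    import java.util.List;

    // Minimal stand-ins for some level-3/level-4 classes from Table 2.
    class Topic     { final String name; Topic(String n) { name = n; } }
    class Objective { final String text; Objective(String t) { text = t; } }
    class Question  { final String text; Question(String t) { text = t; } }

    // The level-3 Lesson block, composed only of classes from its own
    // and lower levels, plus primitive data such as strings.
    class Lesson {
        final String title;                                 // primitive data
        final List<Topic> topics = new ArrayList<>();
        final List<Objective> objectives = new ArrayList<>();
        final List<Question> quiz = new ArrayList<>();
        Lesson(String title) { this.title = title; }
    }

    public class LessonSketch {
        public static void main(String[] args) {
            Lesson l = new Lesson("Search Strategies");
            l.topics.add(new Topic("Breadth-first search"));
            l.objectives.add(new Objective("Trace BFS on a small graph"));
            l.quiz.add(new Question("What is the frontier after two steps?"));
            System.out.println(l.title + ": " + l.topics.size() + " topic(s)");
        }
    }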


Table 2. The GET-BITS model: some components and tools for ITS design.

Level of abstraction   Role                    Components and tools

1 - Integration        Domain knowledge        Curriculum composers, ontology editors
                       Pedagogical knowledge   Communities of pedagogical agents, theories of instruction
                       Explanation             Explanation composing tools for distributed learning environments
                       Student model           Multiple student models, group models, cooperative student models, shared student models

2 - System             Domain knowledge        Curriculum, pedagogical structure of the domain
                       Pedagogical knowledge   Pedagogical agents, teaching planners, learning actors, learning companions, troublemakers
                       Explanation             Explanation planners, simulators, hint generators, example generators
                       Student model           Student modeling agents and tools

3 - Blocks             Domain knowledge        Lesson, topic, objective, pedagogical point, goal, plan, question, exercise, quiz
                       Pedagogical knowledge   Teaching and learning strategies, hints, errors
                       Explanation             Explanations (explanations of the knowledge elements, explanations of the learning process, explanations of the teaching strategies), examples, simulations
                       Student model           Overlay, enumerative, reconstructive, generative

4 - Units              Domain knowledge        Rule, frame, picture
                       Pedagogical knowledge   Problem/question templates, quiz templates, result checkers
                       Explanation             Explanation templates, explanation presentation functions
                       Student model           State, operator, transition, problem space, path, temporal belief, misconception, conflict detector

5 - Primitives         Domain knowledge        Slot, logical expression, clause
                       Pedagogical knowledge   Exercise/problem difficulty, example suitability
                       Explanation             Canned text, explanation criterion (what element to include in the explanation and what to skip), explanation detail (degree of details in the explanation), explanation type
                       Student model           State parameters, state transition codes, learning speed, knowledge level, current progress, level of concentration, level of performance, student's capacity

2.3 Discussion

The GET-BITS model is supported by a number of design patterns [17] and class libraries, developed in order to support building intelligent systems in general and ITSs in particular. In fact, designing and developing an ITS based on the GET-BITS model is a matter of first developing an ITS shell, and then using it for development of the ITS itself. Although this means starting the project without an ITS shell, it is a relatively easy design and development process, because of the precisely defined hierarchy among the tools and components, as well as the strong software engineering support of the design patterns and class libraries.

Along with the high modularity and reusability provided by the class libraries, potential design flexibility is another important advantage of using the GET-BITS model. Development of a GET-BITS-based ITS shell means putting together only those pieces of software from the class libraries that are really needed for a given application. If any additional class is needed, it must be designed and developed by the shell developer. Fortunately, the class hierarchies and design patterns of GET-BITS provide a firm ground to start such an additional development. Most additional subclasses can be derived directly from some of the already existing classes. The classes of the GET-BITS model are designed in such a way as to specify "concept families" using the least commitment principle: each class specifies only the minimum of attributes and inheritance links. This ensures a minimum of constraints for designers of new classes.

As an example, consider the job of adding a new knowledge representation technique when needed. This task doesn't require significant changes in the corresponding module of the system (or the shell). It is rather a matter of finding an appropriate place for the new class along the levels of abstraction and in the class hierarchies, and specifying a few additional attributes and links.

Finally, when developing an ITS shell and then using it for development of the ITS itself, the shell contains only the options that are actually necessary. Modifications and extensions are made easily and only in accordance with the application's needs.

3 Interoperable and Reusable Software Components for ITS Design

The concept of interoperable and reusable software components has been widely used in the area of software engineering during the last decade (see, for example, [46]). However, only recently has it drawn significant attention in the community of researchers working in the area of ITSs (see, for example, [26] and [42]). Hence the purpose of this section is threefold:

1. It describes, from different viewpoints (architectural, design, software engineering, and utility), the concept of software components that may be useful for the development of ITSs.


2. It is also intended to be a survey of important problems, questions and issues related to such components.

3. It should draw the reader's attention to the possibilities that component-software technology can offer to the field of ITSs.

3.1 Software Components from the ITS Design Perspective

Informally, a software component is a piece of a larger system that can be put in, taken out, and used together with other components to contribute to the global system's behaviour.² The following subsections describe the motivation and the need for software components in the ITS design process.

² This was the conclusion reached by a working group of ITS researchers during the workshop "Issues in Achieving Cost-Effective and Reusable ITSs," held at the AIED'97 conference in Kobe, Japan.

3.1.1 How Do We Usually Develop ITSs?

Many researchers have noted that current ITSs are usually built from scratch (see, for example, [23]). Moreover, knowledge embedded in ITSs does not accumulate well, and specifying the functionalities of the software modules of current ITSs often involves a lot of difficulties.

If an ITS shell or an authoring tool is used, the developer does have some software to start with, so strictly speaking it is not development from scratch. However, the developer is often constrained by the options offered by the shell or tool: there are usually many unnecessary options, options that would be more useful if they could be modified in one way or another, and some needed options that are missing.

3.1.2 What Would Be Nice When We Develop ITSs?

It would be very nice:

1. if we could easily assemble our ITSs, shells, authoring shells, agents, etc., from existing and pretested pieces of software, without the need to develop and implement them from scratch;



2. if we could have our shells and toolkits offering us only the tools and options that we really need; we don't want our shells and toolkits to lack a tool or an option that we really need in a given project, but we also do not need a whole bunch of unnecessary tools and options from them;

3. if we could easily replace any piece of software in an ITS by a similar (and possibly new) one, without any serious harm to the rest of the system; this would allow us, for example, to experiment with several versions of our system, each one having a certain functionality implemented in a different way (i.e., by a different piece of software);

4. if, in order to develop a new piece of software that turns out to be necessary in our project (and this happens frequently), we could have some other piece of software to start with; the other piece of software should, of course, be logically and functionally similar to the desired one whenever possible;

5. if the existing pieces of software could be logically organized and catalogued in a repository (or repositories), such that we can easily access and check the pieces by means of a DBMS;

6. if we could easily enlarge the repository with new software we develop during our project, for later use in another similar project;

7. if we could automatically refresh and update the repository from time to time, putting in some new pieces and deleting pieces of software that are no longer needed, based on functionality and use-statistics information;

8. if access to the repository could be as easy as possible, e.g., through the Internet or an intranet; in other words, if we could easily get, put, organize, and update software in a remote repository;

9. if pieces of software in such repositories were fully interoperable, i.e., able to execute on multiple platforms and to be easily combined with software developed by somebody else, in another programming language, etc.; in this way, when assembling an ITS, a shell, an agent, or another system, we wouldn't feel constrained by the hardware we use, the operating system installed on it, and so on.


In short, it would be very nice if we could concentrate more on design of ITSs, and automate their implementation and maintenance as much as possible. It would be very nice if we could dedicate our work mostly to cognitive aspects, learning and teaching issues, and effectiveness of ITSs, and have most of the software changes in them done quickly.

3.1.3 Some Answers and Further Practical Questions

Component-based software design offers some answers and possibilities regarding the issues considered in the previous two subsections. Such design enables building systems from application elements (components) that were built independently by different developers using different languages, tools, and computing platforms. Once a sufficient number of software components are developed, the components can be put into a repository and catalogued. The repository could then be easily accessed from another site, enlarged by newly developed components, and updated by new versions of already existing components. Due to the interoperability of components, ITS developers could use the repository for building practical systems on a variety of existing hardware platforms. The choice of the operating system and programming language is also up to the developer.
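As an illustration only (the repository API below is a hypothetical sketch, not an existing system), a catalogued repository can be as simple as a keyed map from descriptive component names to component objects; a production repository would of course sit behind a DBMS or a network service:

import java.util.HashMap;
import java.util.Map;

public class ComponentRepository {
    // Catalogue of components, keyed by descriptive names.
    private final Map<String, Object> catalogue = new HashMap<>();

    public void register(String key, Object component) { catalogue.put(key, component); }
    public Object lookup(String key) { return catalogue.get(key); }
    public void retire(String key) { catalogue.remove(key); } // e.g., driven by use statistics

    public static void main(String[] args) {
        ComponentRepository repo = new ComponentRepository();
        repo.register("pedagogical/hint-generator", new Object()); // placeholder component
        System.out.println(repo.lookup("pedagogical/hint-generator") != null); // true
    }
}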

While development of appropriate repositories of software components for building ITSs is still underway, the above description of components immediately raises several other questions, like:

• Are, for example, a lesson, a topic, an objective, an exercise, a didactic tool, and a pedagogical point all components?

• Is an agent a component?

• Can a data object be a component?

• Are components services or ... ?

• Can we buy a software component in a software shop? If so, how are the components classified there?

In order to try to answer questions like these, we have to consider the features of components first.


3.1.4 Features of Software Components

The issues considered here are not the only ones that can be associated with the notion of components. However, these are considered to be necessary for a component specification.

Functionality. Any software component must first be considered from its functionality aspect: what does it do as a component of a larger (global) system, and how does it contribute to the overall system behavior? In other words, regarded as a standalone piece of software, what does it expect at its input, what output does it produce, and what conditions must be met in order to produce the expected output? For example, if a lesson is a component, what is it supposed to do in an ITS, and what are its I/O aspects as a software component?

However, as already mentioned, specifying the functionalities of components is not easy. The opinion of Mizoguchi et al. is that the functionality of components should be described in terms of a common vocabulary [30]. However, the work on building such a vocabulary in the area of ITSs is still underway.

Granularity. Each component is also characterized by its granularity; that is, there are smaller and larger components, simple and complex components, atomic and aggregate components. In other words, simple components can be combined into a larger one in order to get a distinctly new (more complex) functionality. For example, if there were components describing topics, they could be combined (together with some other components) into a lesson component.

Generality. There are more general and more specific components. Some components can be derived from more general ones by specifying additional features and/or functionalities. There are components that can be used in many different ITSs, while other components can be used only in a specific kind of ITS. For example, quite informally and intuitively, if a lesson is a component, we can think of an easy lesson as being derived from the lesson component.
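A minimal sketch of this kind of derivation in Java (hypothetical names; not the actual GET-BITS classes):

// A general lesson component with a default difficulty level.
class GenericLesson {
    protected int difficulty = 5;                  // on a 1 (easy) .. 10 (hard) scale
    public int difficulty() { return difficulty; }
}

// A more specific component derived from the general one.
class EasyLesson extends GenericLesson {
    public EasyLesson() { difficulty = 2; }        // specialization: lower difficulty
}

public class GeneralityDemo {
    public static void main(String[] args) {
        System.out.println(new EasyLesson().difficulty()); // prints 2
    }
}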

Interaction and interoperability. Although each component has its standalone functionality, it usually communicates with other components of a larger system. It is therefore important to agree upon an appropriate communication protocol for information exchange between components. This includes specification of data formats for input/output, timing and synchronization, conditions for interaction (if any), etc., and is of particular importance in distributed heterogeneous environments [49].

Reusability. This is one of the most important issues related to components for ITSs. In order to achieve full reusability of such components, generic, context-free components should be developed first, and then they can be tuned to a particular application [23]. For example, if lesson is a generic component, the component lesson with theorems could be easily derived from it [14].

Specification of components. Components are defined by their functionality in the first place. For example, if we think of a certain exercise in a particular ITS, then its functionality (assessment of the student's knowledge of a specific topic) defines it as a component well enough for it to be easily distinguished from other kinds of components, like lessons, topics, objectives, etc.

However, components also have their properties, i.e., their attributes and their functions. For the exercise component, obvious attributes are its difficulty, its result, and its prerequisites. Its important functions are the possibilities to show the true result (on request from the student or from the pedagogical module) and to show a hint (if available). A full specification of any such component must include complete lists of its externally accessible attributes and of the functions that can be used by other components in order to achieve the component's desired functionality.
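A minimal sketch of such a specification in Java (hypothetical names and canned values, used here only to make the attribute/function distinction concrete):

import java.util.List;

public class ExerciseComponent {
    // Externally accessible attributes.
    private final int difficulty;
    private final List<String> prerequisites;
    private String result;                          // the student's answer, once given

    public ExerciseComponent(int difficulty, List<String> prerequisites) {
        this.difficulty = difficulty;
        this.prerequisites = prerequisites;
    }

    public int getDifficulty() { return difficulty; }
    public List<String> getPrerequisites() { return prerequisites; }
    public String getResult() { return result; }
    public void setResult(String result) { this.result = result; }

    // Functions other components (student, pedagogical module) may call.
    public String showTrueResult() { return "b*a*"; }   // canned for the sketch
    public String showHint() { return "Consider the shortest accepted word first."; }

    public static void main(String[] args) {
        ExerciseComponent ex = new ExerciseComponent(3, List.of("Regular expressions"));
        ex.setResult("a*b*");
        System.out.println(ex.showHint());
    }
}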

Types of components. Types of components can be defined based on the types of their functionalities. The purpose of some components is to help assemble an ITS, regardless of the domain of knowledge that the students are supposed to learn from that ITS. These components are system components. They can be further divided into knowledge base components, student model components, and pedagogical components (see the next section).

We can also think of components that have domain-dependent functionalities which are essential for building an ITS in a given domain, but can be useless for ITSs in other domains. Such components are domain components (e.g., an algebraic equation component).

Finally, there are components whose primary purpose is to help communicate appropriate data and knowledge contents between ITSs and their users. They are called interface components (e.g., chart and diagram components).

3.1.5 How to Put Components Together?

In order to assemble an ITS from them, components must have functions that support the integration of the other (sub)components. There are two basic types of such functions:

1. functions providing means for data exchange between components;

2. functions providing means for aggregation of components.

In order to be able to specify such functions for any particular component, two kinds of considerations are necessary: architectural and communication aspects.
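Before turning to these two aspects, the following Java sketch (a hypothetical interface; any real system would flesh out the data types) shows the two kinds of integration functions side by side:

import java.util.ArrayList;
import java.util.List;

interface ItsComponent {
    void receive(Object data);          // (1) means for data exchange
    void add(ItsComponent sub);         // (2) means for aggregation
}

// An aggregate component that forwards exchanged data to its subcomponents.
class CompositeComponent implements ItsComponent {
    private final List<ItsComponent> parts = new ArrayList<>();
    public void receive(Object data) { for (ItsComponent p : parts) p.receive(data); }
    public void add(ItsComponent sub) { parts.add(sub); }
}

// A leaf component that simply reports what it receives.
class LoggingComponent implements ItsComponent {
    public void receive(Object data) { System.out.println("received: " + data); }
    public void add(ItsComponent sub) { throw new UnsupportedOperationException("leaf"); }
}

public class IntegrationDemo {
    public static void main(String[] args) {
        CompositeComponent lesson = new CompositeComponent();
        lesson.add(new LoggingComponent());
        lesson.receive("current topic changed");
    }
}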

Architecture. This is still an open question. In general, a significant amount of work has been done on software architectures for component-based systems (e.g., [49]). However, this kind of work in the context of ITSs has just begun (some examples can be found in [6], [12], [45], and [47]). One approach to this question is to use a layered software architecture for building component-based ITSs, such as the one used by the GET-BITS model [12]. Hierarchical modeling of component-based ITSs lets designers define a number of generic components at different levels of abstraction. Designers can then use these components for fine-tuning in practical applications, for defining more specific components by means of derivation and contents filling, and for defining more complex components at appropriate levels of abstraction by means of aggregation. The next section shows some examples of generic components identified in the GET-BITS model.

Communication. How do these components communicate? The idea of having a component-based ITS implies that its components will be developed independently, in (possibly) different languages, on (possibly) different machines, and using (most probably) different software development tools. Yet we have to put them all together, to make them communicate not only among themselves, but also with possibly quite different applications (like traditional databases, for example). If this is to be provided efficiently, some standards must be conformed to. Fortunately, such standards already exist. They specify standard interfaces for transparent object (component) communication, both on a single machine and in heterogeneous distributed environments. The most widely accepted standard in this regard to date is the CORBA standard developed by the Object Management Group (OMG) [49]. In the area of ITSs, one of the first successful implementations in this sense has been the recently developed architecture for intelligent collaborative educational systems, proposed by Suthers and Jones [45].

3.2 Software Components in the GET-BITS Model

One of the goals of the GET-BITS model is to support design of component-based ITSs. An elaborate discussion on how software components are treated in GET-BITS is presented in [12]. A brief overview of it is given here.

Although repositories of software components for building ITSs are not widely available at the moment, GET-BITS identifies a number of generic components that are useful for developing a range of practical ITSs. Some of them are shown in Table 3. Note that they only roughly correspond to some items listed in Table 2, since a given class of objects does not necessarily evolve into a software component. In the context of GET-BITS, components for ITS user interfaces have not been considered yet.

3.3 Discussion

Two important facts come from the above subsections:

• specification of components for ITSs must be preceded by an agreement on a common vocabulary in the domain;

• components must be organized around a certain taxonomy.


Table 3. Partial lists of software components for ITSs (by ITS modules) in the GET-BITS model.

Domain knowledge components: Lesson, Topic, Exercise, Question, Goal
Pedagogical components: Teaching strategy, Teaching operator, Teaching planner, Path selector, Model of task knowledge
Explanation components: Explanation, Example, Simulation, Hint generator, Template
Student model components: Motivation, Concentration, Capacity, Misconception, Current difficulty

These facts bring us to the important question of the relation between components and ontologies. Since ontologies are discussed extensively in the next section, the discussion of these facts is postponed until the end of that section.

There are two other important open questions that need to be investigated in more detail in future research and development efforts. One of them is related to the contents of components for ITS design. The question could be put simply as: in order for software components for ITS design to be really useful to designers, how specific should they be? Obviously, more general components are useful for a wider range of ITSs, but must also be further elaborated in any particular project. On the other hand, narrowing the contents of components in order to fine-tune them for specific kinds of ITSs may result in components that are widely ignored by designers. In both cases, an agreement on the criteria of component usefulness is still lacking in the ITS community.

The other question is related to the problems of adhering to one standard for component development or another. Three standards are widely used today - CORBA components [49], JavaBeans [3], and Microsoft's COM/DCOM/OLE controls. In spite of the fact that, in theory, components should be fully interoperable and language-independent, there are still practical problems in putting together components developed according to different standards [46], [49].


4 Ontologies

An important general problem of ITS development is that of knowledge sharing and reuse [9], [23], [31], [33], [39], [45]. In spite of the fact that many useful ITSs have been developed so far, it is still a big problem to reuse the knowledge, control structures, and problem-solving methods developed within a particular ITS project in other (possibly similar) ITSs.

In order to provide knowledge sharing and reuse among different ITSs, explicit ontologies for the domain of ITSs should be defined. Such ontologies should provide a set of definitions of a common vocabulary that would be used and shared among the community of agents participating in an ITS and its environment. All agents constituting an ITS, as well as other agents collaborating with them, should commit to the same ontology. Since ontological commitment is an agreement to use a vocabulary in a way that is consistent (but not complete) with respect to the theory specified by an ontology, it makes it possible for the agents to share knowledge among themselves. This approach leads to cost-effective development of ITSs and also reduces the diversity of concepts used in building learning systems.

4.1 Basic Concepts

In AI, an ontology is defined as a specification of a conceptualization [20]. In other words, an ontology defines all the concepts and their relations that exist for some agent or a community of agents for some area of a problem domain. When the knowledge of the domain is represented in a declarative formalism, the set of objects that can be represented is called the universe of discourse. An ontology is often defined as a common vocabulary of formally represented knowledge, but it is much more than that. A vocabulary is language-dependent, so it lacks universality, and a vocabulary alone is also weak in describing the relations among its terms [30].

Introducing an ontology to a potential user means describing its taxonomy and its design in the first place. Formally, an ontology consists of terms, their definitions, and axioms relating them; terms are normally organized in a taxonomy. After defining ontologies for a specific domain of discourse, we build agents that commit to the defined ontologies. This way we share knowledge with and among these agents.
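The formal view above can be sketched in a few lines of Java (a deliberately naive representation, with hypothetical names, just to fix the ingredients: terms, definitions, taxonomy links, and axioms):

import java.util.ArrayList;
import java.util.List;

class Term {
    final String name, definition;
    final Term parent;                       // taxonomy link (null for a root term)
    Term(String name, String definition, Term parent) {
        this.name = name; this.definition = definition; this.parent = parent;
    }
}

public class OntologySketch {
    final List<Term> terms = new ArrayList<>();
    final List<String> axioms = new ArrayList<>();  // axioms relating the terms

    public static void main(String[] args) {
        OntologySketch o = new OntologySketch();
        Term studentModel = new Term("Student model", "representation of the learner", null);
        o.terms.add(studentModel);
        o.terms.add(new Term("Misconception", "erroneous belief held by the student", studentModel));
        o.axioms.add("every Misconception belongs to exactly one Student model");
        System.out.println(o.terms.size() + " terms, " + o.axioms.size() + " axiom(s)");
    }
}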

Taxonomy is a very important issue, since it represents the contents of ontologies. There are three major approaches to the taxonomy issue [19]:

a. having the contents in a single tree-like concept hierarchy with multiple inheritance;

b. having several parallel dimensions along which one or more top-level categories are sub-categorized; and

c. having a large number of small, local taxonomies that may be linked together via relations or axioms.

As for the design of ontologies, there are general ontologies (like Cyc, Sowa's, WordNet, KIF, etc.; see [16] for an overview), and domain-specific ontologies (like UMLS, GENSIM, TOVE, Task ontology, EngMath, etc.; again, see [16] for an overview). It can be noticed that even the general ontologies don't model the "world" in the same way. There are three approaches to the design of ontologies [16], [20]:

• bottom-up, starting from the most specific concepts and then grouping them in categories;

• top-down, starting from the most general concepts and creating categories; and

• middle-out, where development goes in both directions.

Development is usually based on text dictionaries, picking up all the nouns, verbs, etc. [30].

Once a powerful ontology for some domain of discourse has been defined, many reusable projects can arise from it. For example, the SHADE project has committed to the EngMath ontology, and more general or domain-specific ontologies can also be created based on the EngMath ontology [19]. Other important research and practical issues in ontology design are [16]:


• Formal evaluation. This assumes evaluation of the practical usefulness of the ontology, and is usually done in the form of a prototype to prove and test design ideas.

• Making ontologies sharable by developing common formalisms and tools.

• Developing contents of ontologies.

• Comparing, gathering, translating, and decomposing different ontologies.

• Integration of different ontologies, so that ontologies can use each other's concept definitions or domain definitions. It would be very beneficial if some standards emerged regarding content and representation specifications.

Ontology research is not only pure theory in fundamental AI; rather, it is becoming a research field of its own, called "ontology engineering" [32].

4.2 ITS Ontologies

Several researchers in the ITS domain have already made significant contributions to the field of ontologies [9], [23], [36], [37], [45]. As Mizoguchi et al. point out, the ITS ontology as a whole consists of the domain ontology, which characterizes the domain knowledge, and the task ontology, which characterizes the computational architecture of knowledge-based systems [31], [33]. Chen et al. have made an important contribution to the hierarchy of ontologies [9]. Ikeda et al. [23] and Suthers and Jones [45] have studied how the use of ontologies can contribute to the architecture of ITSs, ITS shells, and authoring tools. Murray has defined the important topic ontology, based on topic types (e.g., concept, fact, principle, ...), topic link types (e.g., is-a, part-of, prerequisite, context-for, ...), and topic properties (e.g., importance, difficulty, ...) [35], [36].

Some general implications for ITS design follow from all this work. First, in order to provide knowledge interchange and reuse between the agents constituting and cooperating in an ITS (pedagogical agent, student-modeling agent, user-interface agent, collaborative agents, etc.), it is necessary to represent the universe of discourse in a consistent and coherent way. If the agents are to reuse a shared body of formally represented knowledge, there must be some agreement about a universe of discourse.

Second, it is often assumed that the ITS ontology consists of a number of separate ontologies, and that separate ontologies should be designed for each agent constituent [23], [36], [37]. Such a design principle is already known and accepted in the ontology community. It is also suggested that ontologies be available in small composable modules, so that the needed knowledge can be assembled [20]. All agents will commit to this ontology so that they can communicate about a universe of discourse, and their observable actions should be consistent with the definitions in the ontology.

Furthermore, it is often advocated that all ontologies should have the same starting point [40]. This means that the first step in representing such meta-knowledge concepts assumes having a unique formalism for representing all possible concepts. Such a design provides knowledge interchange between agents inside an ITS and also between other agents wishing to communicate with the ITS. It also enables easy design of any new agents, according to the concept definitions and restrictions discussed above.

4.3 Ontologies in the GET-BITS Model

Designing ITSs using the GET-BITS model assumes adhering to the five levels of abstraction defined in Table 1. Consequently, in GET-BITS an appropriate ontology level is assigned to each level of abstraction, leading to a higher-level reusability of the model.

Furthermore, the ITS ontology is split into a number of separate ontologies for all constituent parts of an ITS (Pedagogical knowledge ontology, Student model ontology, Domain knowledge ontology, System ontology), as shown in Figure 3. Such a design is approved and suggested in [16], in order to assemble each ontology's knowledge and integrate those ontologies into more general ones, which is also one of the latest research topics in ontologies (see Section 4.1). So far, the work on ontologies in the GET-BITS model has been concentrated mostly on the Student model ontology. Hence that ontology is shown in more detail in Figure 3.

[Figure 3 is a diagram: the ITS ontology comprises the Pedagogical knowledge, Domain knowledge, System, and Student model ontologies. The Student model ontology includes the Student's personal data and Student's history ontologies, with concepts such as psychological performance, motivation, future performance, intellectual capacity, misconception, and training history.]

Figure 3. The ITS ontology in the GET-BITS model.

The Student model ontology in the GET-BITS model includes two other ontologies, Student's personal data and Student's history. This design enables, for example, the Student's personal data ontology to be integrated with some more general ontologies. Also, a fact of extreme interest is that any system that commits to this ontology has to agree with the corresponding ontological commitment. So, if an ITS doesn't need to include the student's personal data, it is simply not committed to that part, due to the separate ontology design and ontology inclusion. If everything were designed in a single ontology, it would be difficult to obey any ontological commitments. All ontologies have a unique building formalism, so that they can exchange knowledge with each other (i.e., the agents committed to these ontologies can exchange knowledge).

Ontologies in GET-BITS are designed using the top-down approach - the most general concepts are defined first, and their sub-categories are derived from them. Also, there are a number of small local taxonomies that may be linked together via some concepts.

Concepts are defined in the form of formal sentences. They actually represent sentences extracted from the real world while modeling it. The major concepts from the GET-BITS model, expressed in natural language, are used for this purpose. These concepts are the following:

• nouns - represented as classes of objects (e.g., Intellectual capacity, Concentration, etc.);

• verbs - represented as method procedures, either assigned to a class or stand-alone (Choose, Test, Suggest, etc.);

• relations - represented as independent concepts (Less than, More, Is, Bigger than, etc.);

• attributes - represented as independent concepts as well (Significant, Beautiful, etc.);

• adjectives - represented as independent concepts as well (Unusual, Frequent, Fast, etc.);

• rules - built from the above concepts and used to represent the relations between concepts and the restrictions upon concepts.

These concepts are used to create various sentences for representing ontology definitions in the universe of discourse for each area of a problem domain. The rules are used for defining the ontology restrictions, which are obtained by analyzing real-world situations. Each agent commits to its ontology or to inclusions of ontologies. This ensures compatibility among the agents and uniformity across different ITSs. For example, the knowledge about the student's motivation, with its restrictions, will be represented in the same way in different ITSs committed to this ontology, and knowledge sharing between different systems will be easy.

Ontologies are defined at each level of abstraction. In this way, each ontology assigned to a higher level of abstraction includes the taxonomies of the lower levels, and forms a set of inclusion lattices of ontologies.


4.4 Discussion

The main goal of ITS ontology design is to provide a theory of all the vocabularies necessary for building a model of an ITS. Furthermore, those vocabularies should be easy to understand and use, and they should also be easy to integrate into some more general ones. For instance, it should be possible to incorporate the ontology of the psychological model of the student into any system based on psychology or any other intelligent system where such an ontology is needed. Also, vocabularies should correspond to emerging standards in their area or domain (e.g., the Learner Model Standard evaluated by Brad Goodman's group, which will specify the syntax and semantics of the learner model and the learner's knowledge/abilities [18]). Applying this standard to some ontology should result in having only one well-developed ontology that is widely accepted in the ITS community, so that others can use it and integrate it in their projects.

Returning to the facts mentioned at the beginning of Section 3.3, further discussion on the relation between software components and ontologies is needed. There is a significant commonality between these two concepts, although they are not the same. Questions that must be answered precisely are:

1. What is the correspondence between components and ontologies?

2. Can ontologies be components and vice versa?

As for the first question, both components and ontologies require a common vocabulary and a certain organizational structure. On the other hand, ontologies are conceptually more abstract, in the sense that they define abstract concepts and the relations between them in a problem domain, such as ITSs. Components are more "down-to-Earth" things, being real software implementations of concepts and their functionalities at a certain level of abstraction and at a certain place in the overall software architecture. In GET-BITS, ontologies are, in a sense, a basis for component development, since it is ontologies that define a certain conceptual relationship between components, i.e., the kind of relations and communication between software components [39]. For similar research ideas, see also [9], [23], [31], [33], [45].


The second question, in our opinion, requires more elaboration. For now, it looks more or less obvious that components can be parts of ontologies. This is the only way the relation between components and ontologies has been treated in GET-BITS so far [39]. Ontologies are formalized structures (e.g., hierarchies and grids), and usually the nodes or intersections of such structures represent concepts that can have more or less precisely defined functionalities in terms of the vocabulary of the problem domain. It is also possible to develop a component that fully corresponds to a certain ontology. For example, in the Eon system [35], there are "ontology objects." They are data objects, each of which defines a conceptual vocabulary for a part of the system. Topic Ontology objects are concrete examples of ontology objects for which corresponding software components can be developed. We also envision development of other software components corresponding to certain ontologies as a whole. In the context of GET-BITS, our efforts in this sense have just been initiated, towards development of the Student Model ontology [40]. It should also be noted that our experience shows that at a certain level of abstraction components need not necessarily fully correspond to ontologies or parts of ontologies. There are components shared by different domains and different ontologies.

5 Applications

This section illustrates how the three modeling techniques discussed above are used in practical applications. All the applications described are ITSs and ITS building tools based on the GET-BITS model.

5.1 The GET-BITS Tools

The GET-BITS model is supported by a number of practical tools for building ITSs. These tools are not integrated into a single shell or authoring tool. They are rather a collection of simple tools, collectively called the GET-BITS tools. They are used for building ITSs, ITS shells, and authoring tools. They include the COSMO tool for student modeling [39], the DON tool for ontology design (see Section 5.2) [40], and a number of other simple tools, like specialized editors, class libraries, and software components. In order to illustrate some important details of these tools, the design of some types of knowledge elements is shown here in detail.

One of the key types of knowledge elements is the one for representing the lessons that students have to learn in a certain domain. It is assumed that each lesson is composed of several topics that the student must understand and adopt. Attributes of each lesson include its title, the topic being taught at a given moment (CurrentTopic), the current goal of the learning that has to be achieved according to a certain tutoring strategy (CurrentGoal), the student's prerequisite knowledge (StudentLevel), etc. They are all included in the Lesson class, which is designed as in Figure 4 (less important details are omitted).

Name: Lesson
Visibility: Exported ; visible outside the enclosing class category
Cardinality: n ; there can be more than one such object
Base class: Frame ; in general, a list of base classes
Derived classes: ... ; in general, a list of derived classes
Interface
  Operations: SetTopic, GetTopic, UpdateTopic, DeleteTopic, CreateTopicCollection, GetTopicCollection, ...
Implementation
  Uses: Topic, Goal, ... ; a list of classes, used by this one
  Fields: Title, CurrentTopic, CurrentGoal, StudentLevel, TopicCollection_Ptr [ ], ...
  Persistency: Static ; disk files

Figure 4. Design of the Lesson class.

Another important type of knowledge is explanations, generated by the system or required from the user. GET-BITS distinguishes between several kinds of explanations (those presented to end-users - EndUserExplanation, those presented to ITS developers - DeveloperExplanation, those required from students when checking their knowledge - StudentExplanation, those concerned with explaining the system's functioning - SystemExplanation, those explaining various concepts or topics - ConceptExplanation and TopicExplanation, etc.). In generating explanations, dedicated GET-BITS tools can use knowledge from various kinds of knowledge elements (rules, frames, knowledge chunks, etc.). The corresponding Explanation class is designed as in Figure 5.

Name: Explanation
Visibility: Exported
Cardinality: n ; there can be more than one such object
Base class: Frame ; in general, a list of base classes
Derived classes: EndUserExplanation, DeveloperExplanation, StudentExplanation, SystemExplanation, PQExplanation, TopicExplanation, ...
Interface
  Operations: SetExplanation, GetExplanation, UpdateExplanation, DeleteExplanation, ...
Implementation
  Uses: Rule, Frame, K_chunk, Goal, Topic, ...
  Fields: CannedText, TopicCollection_Ptr [ ], RuleCollection_Ptr [ ], ...
  Persistency: Static/Dynamic ; disk files for some parts only

Figure 5. Design of the Explanation class.

Name: Rule
Visibility: Exported ; visible outside the enclosing class category
Cardinality: n ; there can be more than one such object
Base class: K_element ; in general, a list of base classes
Derived classes: RuleCf, FuzzyRule, ActionRule, ...
Interface
  Operations: SetRule, GetRule, UpdateRule, DeleteRule, CreateRuleCollection, GetRuleCollection, AttachRuleToFrame, ...
Implementation
  Uses: K_chunk ; for If-clauses and Then-clauses
  Fields: RuleName, IfPart, ThenPart
  Persistency: Static ; disk files

Figure 6. Design of the Rule class.


The Rule class represents heuristic rules. That class is fairly general, and a number of more specific classes are derived from it (e.g., RuleCf, for representing rules with certainty factors). Figure 6 shows the design of the Rule class.
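In Java, the derivation could be sketched as follows (a hypothetical rendering; GET-BITS represents the If- and Then-clauses with K_chunk objects, replaced here by plain strings):

// General heuristic rule, as in Figure 6 (clauses simplified to strings).
class Rule {
    String ruleName;
    String ifPart, thenPart;                // stand-ins for K_chunk clauses
    Rule(String ruleName, String ifPart, String thenPart) {
        this.ruleName = ruleName; this.ifPart = ifPart; this.thenPart = thenPart;
    }
}

// Derived class for rules with certainty factors.
class RuleCf extends Rule {
    double certaintyFactor;                 // e.g., a value in [-1, 1]
    RuleCf(String ruleName, String ifPart, String thenPart, double cf) {
        super(ruleName, ifPart, thenPart);
        this.certaintyFactor = cf;
    }
}

public class RuleDemo {
    public static void main(String[] args) {
        RuleCf r = new RuleCf("R1", "student is confused", "offer an example", 0.8);
        System.out.println(r.ruleName + " (cf = " + r.certaintyFactor + ")");
    }
}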

5.2 DON

In support of the process of building reusable ontologies in a cost-effective way, and in order to evaluate the design principles discussed in Sections 4.2 and 4.3, a PC/Windows tool named DON (Designer of ONtologies) has been developed [40]. DON is one of the GET-BITS tools. It is currently implemented in C++ and is now being translated into Java. DON supports all three ontology design approaches (bottom-up, top-down, and middle-out) and either a single tree-like concept taxonomy or a large number of small local taxonomies that may be linked together (Figure 7).

[Figure 7 is a screenshot of the DON tool; visible labels include "repeattest," "Personal," and "S_Knowledg."]

Figure 7. Ontology design using the DON tool.


In DON, nouns are represented by frames; verbs are represented as method procedures, either assigned to a frame or stand-alone; and relation and attribute concepts are represented as primitives (Level 5 concepts; see Tables 1 and 2). These simple concepts are used for building more complex sentences, like object-attribute-relation-value triplets (OARV). Such sentences are meant to formally represent real-world sentences, with semantics understood by the application. The rule concept is also built from the above concepts and is used to represent the relations between concepts and the restrictions upon concepts. Rules can be either stand-alone or attached to some object frame (noun).
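A minimal sketch of an OARV sentence and a rule built over such sentences (a hypothetical Java rendering; DON itself is implemented in C++):

import java.util.List;

// One object-attribute-relation-value sentence.
record Oarv(String object, String attribute, String relation, String value) {}

public class OarvDemo {
    public static void main(String[] args) {
        Oarv condition = new Oarv("Student", "Concentration", "Less than", "average");
        Oarv conclusion = new Oarv("Lesson", "Pace", "Is", "slow");
        // A rule: a restriction expressed over OARV sentences.
        List<Oarv> ifPart = List.of(condition);
        System.out.println("IF " + ifPart + " THEN " + conclusion);
    }
}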

Such an approach to building ontology concepts lets the user define different concepts at any level of abstraction, and also makes the concept definitions reusable. The concepts from the lower, domain-independent levels can be used as building blocks for many upper-level, domain-dependent ontologies.

5.3 FLUTE

GET-BITS tools are used in developing FLUTE, an ITS in the domain of formal languages and automata. The idea of the FLUTE project is to develop software that supports systematic introduction of students into the system's domain, in accordance with both the logical structure of the domain and the individual background knowledge and learning capabilities of each student. The system is discussed here only from the perspective of the three modeling techniques and the GET-BITS model. It is described in detail elsewhere [13].

The architecture of the FLUTE system is shown in Figure 8. The Expert module contains all of the domain-dependent knowledge:

1. the concepts, topics, facts and domain heuristics the student has to learn;

2. a database of examples used to illustrate the domain concepts, topics, etc.; and

3. the pedagogical structure of the domain.


[Figure 8 is a block diagram connecting the student to the system's modules; visible label: "Student."]

Figure 8. Architecture of the FLUTE system.

The pedagogical structure of the domain is considered a part of the domain knowledge rather than a part of the pedagogical module, as in [48]. In FLUTE, the pedagogical structure of the domain is defined as a set of directed graphs showing explicitly the precedence relationships of knowledge units within each lesson and among the topics of different lessons.
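A minimal sketch of such a precedence graph (a hypothetical Java rendering; FLUTE's actual implementation is not shown in this chapter):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PrecedenceGraph {
    // For each knowledge unit, the set of units that must precede it.
    private final Map<String, Set<String>> prerequisites = new HashMap<>();

    public void addPrecedence(String before, String after) {
        prerequisites.computeIfAbsent(after, k -> new HashSet<>()).add(before);
    }

    // A unit may be presented once all of its prerequisites have been mastered.
    public boolean mayPresent(String unit, Set<String> mastered) {
        return mastered.containsAll(prerequisites.getOrDefault(unit, Set.of()));
    }

    public static void main(String[] args) {
        PrecedenceGraph g = new PrecedenceGraph();
        g.addPrecedence("Regular expressions", "Finite automata");
        System.out.println(g.mayPresent("Finite automata", Set.of("Regular expressions"))); // true
    }
}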

FLUTE always operates in one of the following three modes of operation: teaching, examination, and consulting. It is actually the Pedagogical module from Figure 8 that operates in one of these three modes. Note that the Explanation module is given a special place in FLUTE's architecture. In other ITSs, explanation generation is usually a part of the pedagogical module. However, early experiments with FLUTE have shown that most of the time students require explanations, i.e., work in the consulting mode. In fact, explanations play the major role and are a major constituent part of the whole system. That's why they have been given a dedicated module. FLUTE's Explanation module cooperates tightly with the Pedagogical module in the consulting mode, in order to answer the student's questions and provide the desired explanations [25]. The student model in FLUTE is an object of a class derived from the corresponding GET-BITS class.

All modules in FLUTE have been developed using the tools and components specified in Tables 1 and 2. The student model has been designed according to the Student model ontology, which has been developed using DON. An experimental version of the Formal language ontology is also used in FLUTE.

The following example illustrates how GET-BITS tools have been used in designing a shell to support the development of FLUTE. A lesson in FLUTE is a meaningful subset of concepts, topics, facts, and domain heuristics. These items in a lesson are closely coupled, but they can refer to items in other lessons. Some important attributes of each FLUTE lesson are sets of objectives and goals; sets of topics, concepts, facts, theorems, etc. taught in that lesson; a set of the corresponding teaching rules; and a set of associated problems (tests, questions, and exercises). The Lesson class, as it is specified in GET-BITS and included in the current version of the GET-BITS tools, supports most of the above attributes. However, when structuring the domain knowledge for implementation in FLUTE, it turned out that many lessons could be better organized if the Lesson class had some additional features. Therefore a new class, T-Lesson, has been designed and included in the shell that is used for the development of FLUTE. The T-Lesson class supports using theorems in presenting a lesson and fine-tuning the presentation by showing/hiding theorem proofs, lemmas, and corollaries (dedicated Boolean flags control this). Its design is shown in Figure 9.

This example simultaneously illustrates how computer-based tutoring and learning based on the GET-BITS model can easily be adapted to closely reflect the way human-based instruction is done in a given domain, given the student's background knowledge and goals. It is possible to control the setting of SkipProofs_Flag and SkipLC_Flag from the rules of the Pedagogical module. Among other conditions and heuristics, pedagogical rules use the values of the relevant attributes of the student model in order to adapt the lesson presentation to each individual user.
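A minimal sketch of such a pedagogical rule in Java (the threshold and field names are hypothetical; only the two flags come from the T-Lesson design):

// Relevant slice of the student model (hypothetical attribute).
class StudentModel {
    int knowledgeLevel;                     // e.g., on a 1..10 scale
    StudentModel(int knowledgeLevel) { this.knowledgeLevel = knowledgeLevel; }
}

// Relevant slice of T-Lesson: the presentation flags from Figure 9.
class TLessonFlags {
    boolean skipProofsFlag;
    boolean skipLCFlag;                     // lemmas and corollaries
}

public class PedagogicalRuleDemo {
    // IF the student's knowledge level is low THEN hide proofs, lemmas, corollaries.
    static void applyRule(StudentModel student, TLessonFlags lesson) {
        boolean beginner = student.knowledgeLevel < 3;
        lesson.skipProofsFlag = beginner;
        lesson.skipLCFlag = beginner;
    }

    public static void main(String[] args) {
        TLessonFlags flags = new TLessonFlags();
        applyRule(new StudentModel(2), flags);
        System.out.println(flags.skipProofsFlag);   // true: proofs hidden for a beginner
    }
}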


Name: T-Lesson
Base class: Lesson
Derived classes: -
Interface
  Operations: SetTheorem, GetTheorem, DeleteTheorem, CreateTheoremCollection, GetTheoremCollection, ..., SetSkipProofs_Flag, SetSkipLC_Flag
Implementation
  Uses: Theorem
  Fields: SkipProofs_Flag, SkipLC_Flag
  Persistency: Static ; disk files

Figure 9. Design of the T-Lesson class.

6 Conclusions

The three modeling techniques for intelligent tutoring systems presented in this chapter - hierarchical modeling, reusable and interoperable software components, and ontologies - allow for easy and natural conceptualization and design of a wide range of ITS applications. They have been used in the domain of ITSs only recently, and still require further elaboration. All three techniques suggest only general guidelines for developing ITSs, and are open for fine-tuning and adaptation to particular applications. ITSs developed using these techniques are easier to maintain and extend, and are much more reusable than ITSs developed with other similar techniques and tools.

All three modeling techniques are illustrated in the chapter by presenting their use within the GET-BITS model of ITSs. The model starts from general object-oriented design principles, and is fully reusable and extensible. It has already been used for building several successful applications.

Hierarchical modeling is particularly suitable for use by ITS shell developers. Starting from a library of classes for knowledge representation and control needed in the majority of ITSs, it is a straightforward task to design additional classes needed for a particular shell. Hierarchical modeling also supports development of component-based ITSs, which have started to attract increasing attention among researchers in the field.

Component-based design of ITSs can bring a number of benefits to developers, including enhanced reusability; ease of development, modification, and maintenance; enhanced interoperability; and further practical support for knowledge sharing (together with ontologies). There is a large degree of correspondence between components and ontologies, and both require agreement on the vocabulary of the ITS domain - work that is already underway by several researchers and research groups.

Recent research and development efforts have managed to bring some results regarding ontologies for ITSs and initial taxonomies of the domain. Proposals of ontology standards will soon be available on the Web, and should be a guideline for any ITS builder. Furthermore, ITS ontologies will enable knowledge interchange and reuse between different ITSs. Since many ITSs have much domain-independent knowledge that should be reused, the process of building component-based ITSs that commit to the specified ontologies would become simpler and more standardized.

Further development of these modeling techniques should be concentrated on development of appropriate fine-grained hierarchical levels of abstraction, as well as classes and components supporting a number of more detailed concepts. This requires a lot of further study and development of taxonomies and ontologies at different levels. Again, a mandatory prerequisite is the adoption and standardization of a common vocabulary in the field.

Another objective of further research and development of the three modeling techniques is the question of the contents of components for ITS design. It is tightly coupled with the development of ontologies for different aspects of ITSs. In spite of considerable research efforts in that area, many elaborated and practical solutions are still to come. Another interesting open question concerns the relationship between software components and ontologies, which still needs to be precisely defined.


References

[1] Aimeur, E., et al. (1997), "Some Justifications for the Learning by Disturbing Strategy," in: du Boulay, B., Mizoguchi, R. (eds.): Artificial Intelligence in Education, IOS Press, Amsterdam / OHM Ohmsha, Tokyo, pp. 119-126.

[2] Anderson, J.R., Boyle, C.F., Corbett, A.T., and Lewis, M.W. (1990), "Cognitive Modelling and Intelligent Tutoring," Artificial Intelligence, Vol. 42, No. 1, pp. 7-49.

[3] Arnold, K. and Gosling, J. (1996), The Java Programming Language, Addison-Wesley, Reading, MA.

[4] Batory, D. and O'Malley, S. (1992), "The Design and Implementation of Hierarchical Software Systems with Reusable Components," ACM Transactions on Software Engineering and Methodology, Vol. 1, No. 4, pp. 355-398.

[5] Booch, G. (1994), Object-Oriented Analysis and Design with Applications, 2nd Edition, Benjamin/Cummings Publishing Company, Inc., Redwood City, CA.

[6] Brusilovsky, P., Ritter, S., and Schwarz, E. (1997), "Distributed Intelligent Tutoring on the Web," in: du Boulay, B., Mizoguchi, R. (eds.): Artificial Intelligence in Education, IOS Press, Amsterdam / OHM Ohmsha, Tokyo, pp. 482-489.

[7] Brusilovsky, P. (1998), "Adaptive Educational Systems on the World-Wide-Web: A Review of Available Technologies," Proceedings of the 1998 Workshop on Intelligent Tutoring Systems on the Web, San Antonio, Texas, USA.

[8] Bull, S. and Broady, E. (1997), "Spontaneous Peer Tutoring from Sharing Student Models," in: du Boulay, B., Mizoguchi, R. (eds.): Artificial Intelligence in Education, IOS Press, Amsterdam / OHM Ohmsha, Tokyo, pp. 143-150.


[9] Chen, W., Hayashi, Y., Kin, L., Ikeda, M., and Mizoguchi, R. (1998), "Ontological Issues on an Intelligent Authoring Tool," Proceedings of the ECAI'98 Workshop on Model-Based Reasoning for Intelligent Education Environments, Brighton, England.

[10] Devedzic, V. and Radovic, D. (1999), "A Framework for Building Intelligent Manufacturing Systems," accepted for publication in IEEE Transactions on Systems, Man, and Cybernetics (to appear).

[11] Devedzic, V. (1998), "Components of Pedagogical Knowledge," Proceedings of The Fourth World Congress on Expert Systems, WCES4, Vol. 2, Mexico City, pp. 715-722.

[12] Devedzic, V., Radovic, D., and Jerinic, Lj. (1998), "On the Notion of Components for Intelligent Tutoring Systems," in: Goettl, B.R., Halff, H.M., Redfield, C.L., Shute, V.J. (eds.): "Lecture Notes in Computer Science, 1452," Proceedings of The Fourth International Conference on Intelligent Tutoring Systems, ITS'98, San Antonio, Texas, USA, Springer-Verlag, NY, pp. 504-513.

[13] Devedzic, V. and Debenham, J. (1998), "An Intelligent Tutoring System for Teaching Formal Languages," in: Goettl, B.R., Halff, H.M., Redfield, C.L., Shute, V.J. (eds.): "Lecture Notes in Computer Science, 1452," Proceedings of The Fourth International Conference on Intelligent Tutoring Systems, ITS'98, San Antonio, Texas, USA, Springer-Verlag, NY, pp. 514-523.

[14] Devedzic, V. and Jerinic, Lj. (1997), "Knowledge Representation for Intelligent Tutoring Systems: The GET-BITS Model," in: du Boulay, B., Mizoguchi, R. (eds.): Artificial Intelligence in Education, IOS Press, Amsterdam / OHM Ohmsha, Tokyo, pp. 63-70.

[15] Frasson, C., Mengelle, T., and Aimeur, E. (1997), "Using Pedagogical Agents in a Multi-Strategic Intelligent Tutoring System," Proceedings of the AIED'97 Workshop on Pedagogical Agents, Kobe, Japan, pp. 40-47.

Page 241: [Studies in Fuzziness and Soft Computing] Innovative Teaching and Learning Volume 36 ||

230 v. Devedzic , D. Radovic, and L. Jerinic

[16] Fridman-Noy, N. and Hafner, C.D. (1997), "The State of the Art in Ontology Design," AI Magazine, Fall 97, pp. 53-74.

[17] Gamma, E., Helm, R, Johnson, R, and Vlissides, J. (1994), Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, Reading, MA.

[18] Goodman, B., et aI., "Encouraging Student Reflection and Articulation Using a Learning Companion," in: du Boulay, B., Mizoguchi, R (eds.): Artificial Intelligence in Education, lOS Press, Amsterdam / OHM Ohmsha, Tokyo, pp. 151-158.

[19] Gruber, T. and Olsen, G. (1994), "An Ontology for Engineering Mathematics," Proceedings of The Fourth International Conference on Principles of Knowledge Representation and Reasoning, Bonn, Germany, pp. 137-144.

[20] Gruber, T. (1993), "Toward Principles for the Design of Ontologies Used for Knowledge Sharing," in: N. Guarino, R Po Ii (eds.), Formal Ontology in Conceptual Analysis and Knowledge Representation, Kluwer Academic Publishers, Amsterdam.

[21] Hietala, P. and Niemirepo, T. (1997), "Collaboration with Software Agents: What if the Learning Companion Agent Makes Errors?," in: du Boulay, B., Mizoguchi, R (eds.): Artificial Intelligence in Education, lOS Press, Amsterdam / OHM Ohmsha, Tokyo,pp.159-166.

[22] Hoppe, H.U. (1995), "The Use of Multiple Student Modelling to Parameterize Group Learning," Proceedings of the i h World Conference on Artificial Intelligence in Education, Washington D.C., USA, pp. 421-428.

[23] Ikeda, M., Kazuhisa, S., Mizoguchi, R (1997), "Task Ontology Makes It Easier To Use Authoring Tools," Proceedings of The Fifteenth International Joint Conference on Artificial Intelligence, Nagoya, Japan.

Page 242: [Studies in Fuzziness and Soft Computing] Innovative Teaching and Learning Volume 36 ||

Innovative Modeling Techniques for Intelligent Tutoring Systems 231

[24] Ikeda, M. and Mizoguchi, R (1994), "FITS: A Framework for ITS - A Computational Model of Tutoring," Journal of Artificial Intelligence in Education, Vo1.5, No.3, pp. 319-348.

[25] Jerini6, Lj. and Devedzi6, V. (1997), "OBOA Model of Explanation in an Intelligent Tutoring Shell," ACM SIGCSE Bulletin, Vo1.29, No.3, pp. 133-135.

[26] Koedinger, K.R, Suthers, D.D., and Forbus, K.D. (1998), "Component-Based Construction of a Science Learning Space," in: Goettl, B.R, Halff, H.M., Redfield, C.L., Shute, V.J. (eds.): "Lecture Notes in Computer Science, 1452," Proceedings of The Fourth International Conference on Intelligent Tutoring Systems, ITS'98, San Antonio, Texas, USA, Springer-Verlag, NY, pp. 166-175.

[27] Kong, H.P. (1994), "An Intelligent, Multimedia-Supported Instructional System," Expert Systems with Applications, Vol. 7 , No.3, pp. 451-465.

[28] Lajoie, S. and Derry, S., eds. (1993), Computers as Cognitive Tools, Lawrence Erlbaum Associates, Hillsdale, NJ.

[29] Mengelle, T., de Lean, c., and Frasson, C. (1998), "Teaching and Learning with Intelligent Agents," in: Goettl, B.R, Halff, H.M., Redfield, c.L., Shute, V.J. (eds.): "Lecture Notes in Computer Science, 1452," Proceedings of The Fourth International Conference on Intelligent Tutoring Systems, ITS '98, San Antonio, Texas, USA, Springer-Verlag, NY, pp. 284-293.

[30] Mizoguchi, R, Sinitsa, K., and Ikeda, M. (1996a), "Task Ontology Design for Intelligent EducationallTraining Systems," Proceedings of the Workshop "Architectures and Methods for Designing Cost­Effective and Reusable ITSs, " Montreal, Canada.

[31] Mizoguchi, R, Sinitsa, K., and Ikeda, M. (1996b), "Knowledge Engineering of Educational Systems for Authoring System Design - Preliminary Results of Task Ontology Design," Proceedings of

Page 243: [Studies in Fuzziness and Soft Computing] Innovative Teaching and Learning Volume 36 ||

232 V. Devedzic , D. Radovic, and L. Jerinic

The European Conference on Artificial Intelligence in Education, Lisbon, Portugal.

[32] Mizoguchi, R and Ikeda, M. (1996), "Towards Ontology Engineering," Technical Report AI-TR-96-1, ISIR, Osaka University, Japan, 1996.

[33] Mizoguchi, R, Tijerino, Y., and Ikeda, M. (1995), "Task Analysis Interview Based on Task Ontology," Expert Systems with Applications, Vol.9, No.1, pp.15-25.

[34] Muhlenbrock, M., Tewissen, F., and Hoppe, H.U. (1997), "A Framework for Intelligent Support in Open Distributed Learning Environments," in: du Boulay, B., Mizoguchi, R (eds.): Artificial Intelligence in Education, lOS Press, Amsterdam / OHM Ohmsha, Tokyo, pp. 191-198.

[35] Murray, T. (1997), "Authoring Knowledge Based Tutors: Tools for Content, Instructional Strategy, Student Model, and Interface Design," Submitted to the Journal of the Learning Sciences, http:// www.cs.umass.eduJ-tmurray/.

[36] Murray, T. (1996), "Toward a conceptual vocabulary for intelligent tutoring systems," Working paper available at http://www.cs.umass.edul-tmurray/papers.html.

[37] Paiva, A. (1996), "Learner Modelling Agents," Proceedings of the European Conference on Artificial Intelligence in Education, Lisbon, Portugal.

[38] Paiva, A. (1997), "Learner Modelling for Collaborative Learning Environments," in: du Boulay, B., Mizoguchi, R. (eds.): Artificial Intelligence in Education, lOS Press, Amsterdam / OHM Ohmsha, Tokyo, pp. 215-222.

[39] Radovi6, D., Devedzi6, V., and Jerini6, Lj. (1998), "Component­Based Student Modeling," Proceedings of the Workshop on Current Trends and Applications of Artificial Intelligence in Education, Mexico City, pp. 73-82.

Page 244: [Studies in Fuzziness and Soft Computing] Innovative Teaching and Learning Volume 36 ||

Innovative Modeling Techniques for Intelligent Tutoring Systems 233

[40] Radovi6, D. and Devedzi6, V. (1998), "Towards Reusable Ontologies in Intelligent Tutoring Systems," Proceedings of the CONTI'98 Conference, Timisoara, Romania (to appear).

[41] Rajlich, V. and Silva, J.H. (1996), "Evolution and Reuse of Orthogonal Architecture," IEEE Transactions on Software Engineering, Vo1.22, No.2, pp. 153-157.

[42] Ritter, S., Brusilovsky, P., and Medvedeva, O. (1998), "Creating More Versatile Intelligent Learning Environments with a Component-Based Architecture,". in: Goettl, B.R., Halff, H.M., Redfield, c.L., Shute, V.J. (eds.): "Lecture Notes in Computer Science, 1452," Proceedings of The Fourth International Conference on Intelligent Tutoring Systems, ITS'98, San Antonio, Texas, USA, Springer-Verlag, NY, pp. 554-563.

[43] Shute, V. (1995), "SMART: Student Modeling Approach for Responsive Tutoring," User Modeling and User-Adapted Interaction, Vo1.5, No.1, pp. 1-44.

[44] Stern, M.K. and Woolf, B.P. (1998), "Curriculum Sequencing in a Web-Based Tutor," in Goettl, B.R., Halff, H.M., Redfield, c.L., Shute, V.J. (Eds.), "Lecture Notes in Computer Science, 1452," Proceedings of The Fourth International Conference on Intelligent Tutoring Systems, ITS '98, San Antonio, Texas, USA, Springer­Verlag, NY, pp. 574-583.

[45] Suthers, D. and Jones, D. (1997), "An Architecture for Intelligent Collaborative Educational Systems," in du Boulay, B., Mizoguchi, R. (Eds.), Artificial Intelligence in Education, lOS Press, Amsterdam / OHM Ohmsha, Tokyo, pp. 55-62.

[46] Szyperski, C. (1998), Component Software: Beyond Object-Oriented Programming, ACM Press/Addison-Wesley, NY /Reading, MA.

. [47] Van Joolingen, W., King, S., and De Jong, T. (1997), "The SimQuest Authoring System for Simulation-Based Discovery Learning," ," in du Boulay, B., Mizoguchi, R. (Eds.), Artificial

Page 245: [Studies in Fuzziness and Soft Computing] Innovative Teaching and Learning Volume 36 ||

234 V. Devedzic , D. Radovic, and L. Jerinic

Intelligence in Education, lOS Press, Amsterdam / OHM Ohmsha, Tokyo, pp. 79-86.

[48] Vassileva, J. (1990), "An Architecture and Methodology for Creating a Domain-Independent, Plan-Based Intelligent Tutoring System," Educational and Training Technology International, Vo1.27, No.4, pp. 386-397.

[49] Vinoski, S. (1997), "CORBA: Integrating Diverse Applications Within Distributed Heterogeneous Environments," IEEE Communications Magazine, Vol. 14, No.2, pp. 28-40.

[50] Wang, H. (1997), "LeamOOP: An Active Agent-Based Educational System," Expert Systems with Applications, Vol. 12, No.2, pp. 153-162.

[51] Wenger, E. (1987), Artificial intelligence and tutoring systems: Computational approaches to the communication of knowledge, Morgan/Kaufmann Publishing Co., Los Altos, CA.

[52] Wong, L.-H., Looi, c.-K., and Quek, H.-C. (1996), "Design of an ITS for Inquiry Teaching, Proceedings of The Third World Congress on Expert Systems, Seoul, Korea, pp. 1263-1270.

[53] Woolf, B.P. (1992), "AI in Education," in: Encyclopedia of Artificial Intelligence, 2nd Edition, John Wiley & Sons, NY, pp. 434-444.

Page 246: [Studies in Fuzziness and Soft Computing] Innovative Teaching and Learning Volume 36 ||

CHAPTER 7

TEACHING COURSE ON ARTIFICIAL NEURAL NETWORKS

J. Fulcher
School of Information Technology and Computer Science
University of Wollongong, NSW 2522, Australia

[email protected]

The more commonly used Artificial Neural Network models are first characterized. These characteristics - training parameters and the like - are related to high-level language constructs (C/C++). The necessity of Graphical User Interfaces, from an educational perspective, is highlighted. Experiences gained from a decade of teaching a graduate-level course on ANNs are then recounted. Representative public domain and commercial ANN software simulators are covered (some of the former types accompanying ANN textbooks). Particular emphasis is placed on BackPropagation/Multi-Layered Perceptrons using NeuralWare software.

1 Background Theory

Artificial Neural Networks (ANNs) are simplified models based on (inspired by) biological neural networks, i.e., brains. Despite their rudimentary and simplistic nature, they nevertheless exhibit characteristics in common with their more elaborate biological counterparts, such as associativity, self-organization, generalizability, and noise- and fault-tolerance. Significantly, another important characteristic they share is that no one really understands their inner workings. Despite this, many useful results have been produced to date using such simplistic models. The simplified nature of ANN models, whilst not actually observed in nature, can nevertheless lead to effective devices in their own right.


Fundamentally, ANNs comprise a large number of (analog) neurons, connected together by an even larger number of interconnections (weights). In computer architecture terms, they can be regarded as massively fine-grained parallel systems. Unlike conventional computers, however, they are not programmed in a high-level language in order to realize algorithmic solutions to problems of interest. Rather, they "learn" associations between input and output patterns (vectors) presented to the network over time. In supervised ANNs, the error differences between actual and desired outputs are used to adjust the internal weights. "Learning" corresponds to this error difference falling to an acceptable level (which ideally indicates system convergence to a local - preferably the global - minimum). ANNs typically take a long time to train, but once trained respond virtually instantaneously to new inputs presented to them.

Not all applications lend themselves to solutions involving ANNs. Where they do, success often depends on obtaining a sufficient number of (representative) input-output training pattern pairs (exemplars).

Preprocessing of this labeled data into an appropriate form prior to training is also critical. Such preprocessing could range from scaling (to exploit the full range of the neuron activation function), filtering/thresholding (to remove background noise), to input fuzzification (into overlapping fuzzy sets), through to the application of sophisticated techniques like Kalman Filtering of speech data, or Fourier Transforms in the case of images. Indeed the latter could involve even more complex preprocessing, such as edge detection and/or feature extraction [1].

With some applications, only input data might be available - in such cases unsupervised (self-organizing) ANNs can yield useful results.

Generally speaking, however, the types of problems at which ANNs are known to excel are pattern classification and/or association.

Figure 1 shows the simplified McCulloch and Pitts model of an individual neuron [2]. The neuron output (axon) "fires" or changes state (assuming binary-level outputs) whenever the weighted sum of the inputs (dendrites) exceeds a certain preset threshold (bias) level. Note that the neuron activation function f is a non-linear one, a common choice being the sigmoid $1/(1+e^{-x})$, which maps all inputs in the range $-\infty..+\infty$ into the interval 0..1 (if a bipolar representation, ±1, is preferred, then the hyperbolic tangent is more appropriate).

[Figure: a single neuron with inputs x1..xn (dendrites), adaptive weights w1..wn, a summing cell (neuron) body with threshold (bias) and activation function f, and output (axon) yi.]

Figure 1. McCulloch and Pitts Neuron Model.
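
As a minimal illustration of the model just described, the following C++ sketch computes a single neuron's output as a weighted sum of inputs, offset by the threshold (bias), passed through a sigmoid activation. The code is not from the original text; the input values and weights are hypothetical.

    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Sigmoid activation: maps (-inf, +inf) into (0, 1).
    double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

    // One McCulloch-and-Pitts-style neuron: weighted sum of the inputs
    // (dendrites), minus the threshold (bias), through the activation f.
    double neuron(const std::vector<double>& x,
                  const std::vector<double>& w, double bias) {
        double sum = -bias;
        for (std::size_t i = 0; i < x.size(); ++i) sum += w[i] * x[i];
        return sigmoid(sum);
    }

    int main() {
        // Hypothetical inputs and weights, for illustration only.
        std::vector<double> x = {1.0, 0.0, 1.0};
        std::vector<double> w = {0.5, -0.3, 0.8};
        std::printf("output = %f\n", neuron(x, w, 0.4));
    }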

1.1 Multiple Layer Networks

Entire networks of such neurons are easily constructed by connecting the outputs o_j directly to (multiple) inputs x_1..x_i. Usually such networks are not fully connected. Moreover, the neurons are arranged in layers, as indicated in Figure 2. The particular configuration shown here uses three layers: input, "hidden" and output (M:N:P). Moreover, all of the interconnections are in a forward direction only. Such multi-layering is not observed in biological neural networks, by the way.

Supervised learning is appropriate for this type of multi-layered, feedforward network or Multi-Layer Perceptron (the latter being an historical reference to an earlier 2-layer model [3,4]). Training consists of repeatedly presenting input-output pattern pairs, and allowing the internal weights to be adjusted as a function of the error signal (= difference between actual and desired outputs). More specifically, the weight change from one iteration to the next is chosen to be proportional to the (negative) gradient of the total error:

$w_{n+1} = w_n - \eta \nabla E$   (1)

where $\eta$ = learning rate parameter (0..1).


[Figure: input vector feeding through connections to a hidden layer and on to the output vector; the errors (actual - desired) are fed back from the output.]

Figure 2. Multi-Layered (feedforward) Perceptron.

Unfortunately, by the time we have presented all training exemplars (one epoch), the internal weights will have been pulled in quite different directions from their initial values. However, if we repeat this process, the training procedure is eventually guaranteed to converge (perhaps not within an "acceptable" time frame though!). Actually, the weight adjustment is a two-stage process - firstly the weights joining neurons in the hidden layer to the output layer are adjusted, then the weights between the input and hidden layers are altered. The following weight update rule applies for (error) BackPropagation - specifically, for the weight connecting neuron #i to neuron #j, for pattern pair #p [5]:

$\Delta_p w_{ij} = \eta\, \delta_j^p\, y_i^p$   (2)

For weights connecting nodes (neurons) in the hidden layer to nodes in the output layer, $\delta_j^p = (d_j^p - y_j^p)\,F'(x_j^p)$, where $F'$ is the derivative of the sigmoid activation (transfer) function, namely $y_j^p(1-y_j^p)$ (indeed, one reason for choosing a sigmoid activation function is that it possesses an easily computed first derivative).


For weights connecting nodes in the input layer to nodes in the hidden layer, this delta (difference) term is determined recursively, as follows:

$\delta_j^p = F'(x_j^p) \sum_k \delta_k^p\, w_{jk}$   (3)

Minimization of the error difference corresponds to gradient descent in weight space. Moreover, in common with global optimization problems generally, the BackPropagation algorithm takes a long time to converge in practice. Step size is therefore a compromise - too large and we could first overshoot then oscillate about the minimum; too small would lead to unacceptably long convergence times. Furthermore, we could become stuck in a local, rather than the global minimum. In some cases this is not critical - convergence to a nearby local minimum can yield the desired network performance.
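
To make equations (1)-(3) concrete, here is a hedged C++ sketch of a single BackPropagation step for a tiny 2:2:1 network. It is a minimal illustration, not the book's code: bias terms are omitted, the initial weights are fixed arbitrary values, and a real simulator would of course loop over many exemplars and epochs.

    #include <cmath>
    #include <cstdio>

    double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

    int main() {
        double w_ih[2][2] = {{0.1, -0.2}, {0.3, 0.4}};  // input -> hidden
        double w_ho[2]    = {0.2, -0.1};                // hidden -> output
        double eta = 0.5;                               // learning rate
        double x[2] = {1.0, 0.0};                       // one training input
        double d    = 1.0;                              // desired output

        // Forward pass.
        double h[2];
        for (int j = 0; j < 2; ++j)
            h[j] = sigmoid(w_ih[0][j] * x[0] + w_ih[1][j] * x[1]);
        double y = sigmoid(w_ho[0] * h[0] + w_ho[1] * h[1]);

        // Output-layer delta: (d - y) * F'(net), with F' = y(1 - y).
        double delta_o = (d - y) * y * (1.0 - y);

        // Hidden-layer deltas, computed recursively as in equation (3).
        double delta_h[2];
        for (int j = 0; j < 2; ++j)
            delta_h[j] = h[j] * (1.0 - h[j]) * delta_o * w_ho[j];

        // Two-stage update: hidden->output weights first (equation (2)),
        // then input->hidden weights.
        for (int j = 0; j < 2; ++j)
            w_ho[j] += eta * delta_o * h[j];
        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j)
                w_ih[i][j] += eta * delta_h[j] * x[i];

        std::printf("output before this update: y = %f\n", y);
    }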

It can be instructive to visualize such gradient descent in weight space much as a spherical ball would roll over a 3-dimensional physical surface. Of course this only applies to networks comprising two weights, with the squared-error plotted along the z-axis, and the weights along the x- and y-axes, as indicated. In some applications, this error surface would correspond to a simple paraboloid and comprise a single (global) minimum. In other applications it could comprise several minima. Taking the "ball" analogy a stage further, notice that a large ball (step size) would reach the minimum faster than a small one; by the same token if it approached the minimum with too much momentum, it could overshoot and indeed continue to oscillate about that point.

In order to speed up convergence (training), there are several modifications that can be made to the basic BP algorithm described earlier. For instance, we could add a momentum term (α) proportional to the previous weight change. Alternatively, we could incorporate knowledge about the second derivative - in other words, how fast (or slow) the weights are changing. We could also resort to numerical approximations in order to speed up calculations. Indeed, numerous improvements to BP have been proposed over the years.
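
In the standard momentum formulation (a textbook form, not quoted from this chapter), the weight change at step $t$ becomes

$$\Delta w_t = -\eta\,\nabla E + \alpha\,\Delta w_{t-1}, \qquad 0 \le \alpha < 1,$$

so that a fraction $\alpha$ of the previous change is carried forward, smoothing the descent trajectory.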

We conclude from the above discussion that the important parameters in gradient descent are step size and learning rate. These are readily related to High-Level (programming) Language constructs, most notably C (procedural) and C++ (object-oriented). For example, Blum defines four fundamental C++ classes: vector, matrix, vector pair and neural network [6]. The BackPropagation network class is a subclass of the latter, and contains the following parameters (in bp.def): inputs, hidden, outputs, rate, momentum and tolerance. Three other files are needed - training facts (*.fct) and test files (*.in and *.out). Masters and Welstead, by contrast, focus on C-code implementations, although the latter's Graphical User Interface is implemented in C++ [7], [8]. Other, more general textbooks often incorporate source code listings of the more popular ANN algorithms (typically BP).
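
Purely to illustrate the parameters just listed, a bp.def file for Blum's package might plausibly look as follows; the exact syntax shown here is hypothetical, and only the six parameter names come from the text.

    inputs:    4
    hidden:    3
    outputs:   3
    rate:      0.5
    momentum:  0.9
    tolerance: 0.1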

Many other ANN models besides MLP/BP have been successfully applied to a wide range of applications. ANNs can be classified along a number of different dimensions. Network topologies can employ feedforward, lateral and/or feedback (recurrent) connections (weights). It should be pointed out that in biological systems, localized rather than fully connected networks are more usual. Learning paradigms can be supervised, unsupervised or reinforcement. The popular MLP/BP described earlier is a supervised, feedforward network. Unsupervised networks produce their own classifications, without the assistance of a teacher (supervisor). In practice, such classifications may or may not be meaningful. Reinforcement networks adjust their weights in order to maximize some scalar performance index.

Some of the more popular ANN models are summarized in Table 1 [9]. We see that General Regression Networks, as their name suggests, perform well at prediction, yet are quite poor classifiers. Conversely, Kohonen's (supervised) Learning Vector Quantization is good for classification, yet not very good at prediction. The overall "general-purpose" ANN model, according to Table 1, turns out to be MLP/BP - in other words, a good classifier and a good predictor.

2 Lectures

A Computer Science honors/graduate subject - CSCI964 Neural Networks - has been taught for the past decade at the University of Wollongong. This single-semester subject comprises two hours of lectures per week, together with several laboratory assignments. Students are allotted two weeks in which to complete each of the latter.


Table 1. ANN Model Comparison.

ANN Model                                   Classification   Prediction
Multi-Layered Perceptron/BackPropagation    Good             V.Good
CounterPropagation                          Good             Poor
Fuzzy ARTMap                                Good             Poor
General Regression                          V.Poor           V.Good
Learning Vector Quantization                V.Good           V.Poor
Radial Basis Function                       Good             Good
Reinforcement                               Fair             Fair
Self-Organizing Map                         Good             Fair

Conventional lecturing techniques were employed up until very recently (white/blackboard + overhead projector slides). In 1996, electronic copies of each weekly lecture were placed on the lecturer's web site - http://www.itacs.uow.edu.au/people/john. Late in 1997, the School of Information Technology and Computer Science was relocated into a new building - one housing lecture rooms with in-built computer-controlled projectors and video players. Accordingly, for 1998, PowerPoint versions of the slides used in lectures were placed on servers accessible to students, who were able to read them at their leisure (since Microsoft Corp. allows PowerPoint Viewer to be freely distributed).

The syllabus covered in the CSCI964 subject is as follows:

• Biological neurons (cell, synapses, threshold, firing)
• Origins of Neural Computing (Hebbian Learning, McCulloch and Pitts Model, Rosenblatt's Perceptron, Widrow and Hoff Adaline, delta/Least Mean Squared learning)
• Supervised Networks (Multi-Layered Feedforward, MLP, (error) BackPropagation, gradient descent in weight space, convergence, learning rate, improvements to BP)
• Recurrent Networks (Hopfield, Boltzmann Machine, simulated annealing, Jordan, Elman)
• Unsupervised Networks (competitive learning, Kohonen's Self-Organizing Map, Grossberg's Adaptive Resonance Theory, Kosko's Bidirectional Associative Memory)
• Other ANN models (Time Delay, CounterPropagation, Radial Basis Function, Neocognitron, Wisard, reinforcement)
• ANN applications (image processing, character recognition, stock market prediction)
• Fuzzy Logic, NeuroFuzzy Systems, Genetic Algorithms
• Connectionism versus traditional (heuristic, rule-based) Artificial Intelligence, hybrid neural expert systems
• Hardware realization of ANNs (analog/digital VLSI, optical)

Student feedback on this subject has been consistently positive; students especially appreciate the opportunity to work on representative ANN applications.

3 Textbooks

Many textbooks have been evaluated over the years. Early on, Beale and Jackson was selected as the prescribed text [10]. Other suitable alternatives were identified as Aleksander and Morton [11], Dayhoff [12], Pao [13] and Wasserman [14]; see [15]. Potential textbooks have continued to be evaluated ever since, and two comparative reviews were published in ACM Computing Reviews as a result [16], [17]. Although Beale and Jackson is still recommended, in more recent times the books by Wasserman [18] and Haykin [19] have been adopted (the latter, whilst comprehensive, is a little expensive from a student's perspective). Other suitable books can be gleaned from the network newsgroup comp.ai.neural-nets - FAQ, Part 4 (alternatively ftp://ftp.sas.com/pub/neural/FAQ.html).

4 Laboratory Component

Now while ANNs are fundamentally massively parallel (analog) hardware devices, they have to date been implemented primarily by way of software simulations on general-purpose (digital) computers.

In common with other Computer Science subjects taught at the University of Wollongong, CSCI964 Neural Networks is laboratory-based. We have believed for some time that students learn best by doing (including learning from their mistakes). Such an approach is consistent with the situated learning model [20]-[22].


4.1 Pedagogical Context

In the constructionist view of learning, focus is placed on the learner's stages of development, from childhood through to adolescence [23]. The mode of learning is thought to be one of perturbation of existing conceptual understanding. In so doing, the learner develops conceptual networks and/or schemata which serve as the basis for further extension of their existing knowledge; they also serve as a foundation for the development of ever more sophisticated cognitive skills.

In his post-modernist critique of modern rationalism, Oakeshott [24] points out that there are in fact two aspects to knowledge, namely information and judgement; the former comprises facts and rules (theory), whereas the latter consists of practical knowledge. Moreover, such knowledge cannot be taught, but is imparted via apprenticeship - in other words, learning by doing. Situations are seen as co-producing knowledge through activity [20]. This leads naturally to the basic tenets of situated learning, or as Lave and Wenger prefer, "legitimate peripheral participation" [21].

The situated learning model comprises eight key components, namely [20]:

1. stories
2. reflection
3. cognitive apprenticeship
4. collaboration
5. coaching
6. multiple practice
7. articulation of learning skills
8. technology

In the context of this chapter, multiple practice is singled out as being the central tenet of this model. Skills are honed through practice, where the student moves toward "flying solo," without the support of a teacher or coach.

In situated learning, knowledge is viewed as a product of the activity, context and culture in which it is developed and used. The key mechanism of transformation in this model is the gradual assimilation of the learner into a "culture of expert practice". In other words, learning involves embodied, historically-rooted communities of practice.

In the context of this chapter, this "culture of a knowledge domain" is embedded within the specific ANN software simulator used to support the laboratory component of the CSCI964 subject. More specifically, this community of practice is encoded within the defaults of NeuralWorks Professional-II+ (Section 4.3). Time spent in undertaking assignments has the added benefit of familiarizing students with these defaults (or culture of expert practice).

In keeping with the above philosophy, students are required to undertake four laboratory assignments, which constitute 40% of the subject assessment (the remaining 60% comes from a final written examination).

Here we have a fundamental problem, because of the varying backgrounds of students enrolled in CSCI964. Our own honors students have a good background in Unix and C/C++; by contrast, some of our full fee-paying overseas coursework Masters students don't. The subject is also offered to non-CS students within the Faculty of Informatics (such as Mathematics/Statistics and Computer Engineering). It is therefore unreasonable to expect all students enrolled in the subject to be able to write their own ANN simulator software. In any case, we desire to keep the focus on ANNs, rather than focusing on this as a programming exercise per se.

On occasion we have set such a programming task as a semester assignment; however, this has proved less than satisfactory. Other semester assignments have been attempted over the years, including a written essay on the application of ANNs to a problem area of interest, or alternatively the evaluation of a public domain ANN software simulator. The end results in each case have been uniformly unsatisfactory. We find that students in the main tend to be deadline-driven - in other words, if the assignment is not due till the end of semester, then leave it for the time being! The end results accordingly reflect two weeks', rather than 14 weeks', worth of effort.


4.2 ANN Software Simulators

One area that has proved reasonably successful, however, has been utilizing public domain training data as part of the laboratory assignment work [ref. the comp.ai.neural-nets network newsgroup] (also ftp://ftp.sas.com/pub/neural/FAQ.html, FAQ, Part 4).

For these reasons, we decided early on that ANN software simulators needed to be an integral part of the CSCI964 subject. The next obvious consideration was which particular ANN simulator(s) to use. Budgetary constraints prevented our using commercial simulators in the early years. We were thus left with two choices - either specifying a textbook which comes bundled with a software simulator, or use a public domain one.

Unfortunately, a critical evaluation of available ANN textbooks led us to conclude that the better books did not come bundled with simulators. Moreover, of those that did, the good simulators were packaged with sub-standard books! [17], [25].

Over the years we have had varying degrees of success using the Unix/X-Windows platforms PlaNet [26], Aspirin for Migraines [27] and the Stuttgart Neural Network Simulator (SNNS) [28]; see [29]. From an educational perspective SNNS lacks an introductory tutorial, which necessitates a substantial learning curve on the part of students. The underlying engine is nevertheless quite powerful. In fact, we have found SNNS to be useful as a research tool as well, especially due to its incorporation of the Resilient backPROPagation learning algorithm [30], [31]. Figure 3 shows a comparison of BP (with momentum), QuickProp and Rprop resulting from these studies.

QuickProp is a local, rather than global, adaptation technique. It uses both first- and second-order partial derivatives (with the latter corresponding to the rate of weight change). It also employs an approximation to Newton's (numerical) method for calculating weight changes, in order to reduce computational overhead. QuickProp attempts to move each weight all the way to the error minimum, rather than using a small step size (dictated by η), which should in principle lead to faster convergence. In practice, however, we have often observed large oscillations during training, and even at times failure to converge altogether.
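
Fahlman's QuickProp step can be summarized in its standard published form (not quoted from this chapter) as

$$\Delta w_t = \frac{S_t}{S_{t-1} - S_t}\,\Delta w_{t-1}, \qquad S_t = \left.\frac{\partial E}{\partial w}\right|_t,$$

i.e., the two most recent gradient measurements act as a secant approximation to the second derivative, and the weight jumps toward the minimum of the fitted parabola.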


[Figure: SNNS graph panel (displaying SSE per output unit) comparing the three learning algorithms.]

Figure 3. Learning Rate Comparison (SNNS).

Resilient backPROPagation - Rprop - is a local adaptation technique which employs (batch) learning, rather than learning-by-example. It eliminates the harmful influence of the size of the partial derivative on the weight step; only the sign of the derivative is used to determine the direction of the weight update. Rprop also uses a weight-specific update value, which varies over time.
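
The essence of Rprop for a single weight can be sketched in a few lines of C++; this is a simplified illustration (the published algorithm also backtracks the weight on a sign change), with the commonly quoted defaults of 1.2 and 0.5 for the increase and decrease factors.

    #include <algorithm>
    #include <cstdio>

    // One Rprop update for a single weight: only the SIGN of the (batch)
    // gradient is used, and each weight carries its own step size.
    void rprop_update(double grad, double& prev_grad,
                      double& step, double& weight) {
        const double eta_plus = 1.2, eta_minus = 0.5;
        const double step_max = 50.0, step_min = 1e-6;
        if (grad * prev_grad > 0)        // same sign: accelerate
            step = std::min(step * eta_plus, step_max);
        else if (grad * prev_grad < 0)   // sign change: overshot, back off
            step = std::max(step * eta_minus, step_min);
        if (grad > 0) weight -= step;    // move against the gradient
        else if (grad < 0) weight += step;
        prev_grad = grad;
    }

    int main() {
        double w = 0.0, prev = 0.0, step = 0.1;
        rprop_update(0.3, prev, step, w);   // first (illustrative) gradient
        rprop_update(0.2, prev, step, w);   // same sign: step grows
        std::printf("w = %f, step = %f\n", w, step);
    }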

Since each of the above simulators incorporates an X-Windows Graphical User Interface, students are able to quickly gain an appreciation of network architecture, dynamic weight change, variation of error versus time (convergence) and the like. Such visualization proves to be a key factor in students coming to understand basic ANN principles.

To quote from the preface of my earlier textbook on microcomputer interfacing [32]:

"I have always been a firm believer in the proverb 'one picture is worth more than a thousand words' . Accordingly, I have deliberately made extensive use of

Page 258: [Studies in Fuzziness and Soft Computing] Innovative Teaching and Learning Volume 36 ||

Teaching Course on Artificial Neural Networks

diagrams throughout this text. It is my belief that seeing how a particular peripheral device works is half the battle in interfacing it to a computer."

247

We have subsequently found that being able to visualize what is taking place within a neural network during training greatly assists students in learning about ANNs.

Realistically speaking, public domain software tends to be limited in terms of functionality, and suffers from a lack of testing and debugging (leading to unexpected crashes). Such packages also incorporate rudimentary user interfaces and lack adequate documentation. Unfortunately, some of these limitations do not manifest themselves until well into the semester - in other words, the extent of their limitations doesn't become clear until one attempts to use them for real-world tasks.

4.3 NeuralWorks Professional-II+

In more recent times, we have been fortunate to obtain sufficient funding to acquire a laboratory set of commercial ANN simulators. The first incarnation used NeuralWorks Professional-II+ running on a centralized SUN server, accessed via a laboratory of X-Terminals [33]. Figure 4 shows a 4:4:3 MLP part way through being trained in NeuralWorks to discriminate between three different iris (flower) patterns (outputs) - setosa, versicolor and virginica - on the basis of sepal and petal length and width measurements (i.e., 4 inputs).

More recently, thanks to a University Teaching Grant from Aspen Technology Inc., we have been able to equip a laboratory of PCs with not only NeuralWorks Professional-II+, but also the Designer Pack, User-Defined NeuroDynamics and Predict add-ons.

Predict automates the manipulation, selection and pruning of data, from within either MS Visual Basic or Excel. Options are also available which allow the user to build a Case Based Reasoning network or perform an Explain analysis of a selected model.

A typical application is time series prediction, in which we compare the performance of multiple regression with ANNs (trained using either Kalman Filtering or Adaptive Gradient).


[Figure: NeuralWorks screen shot of the 4:4:3 network, showing the input, hidden and output layers during training.]

Figure 4. Iris Classification (NeuralWorks Professional-II+).

When predicting the exchange rate between the Canadian and US dollars, the predicted value followed the actual exchange rate very closely (with only 0.0219 r.m.s. error, in fact, achieved after 10,000 epochs). The total cumulative values were 147.37 (actual) and 147.25 (predicted), respectively. Moreover, the largest deviation was only 4 cents over a 125-day period, which further indicates that ANNs perform well at this particular task.

Designer Pack analyzes the data flow within a nominated model (*.nnd), then translates this into three C source files (*.c, *.h and *.dat). UDND allows users to modify existing ANN paradigms, or alternatively to create their own, in terms of C math functions (e.g., summation, transfer, output, error, learn, noise, checkpoint processing, etc.).


4.4 Laboratory Assignments

Students attempt the following four laboratory assignments:

(I) Familiarization with the NeuralWorks package, MLP/BPN and Iris pattern recognition
(II) MLP/BPN and printed character recognition, exclusive-or, encoder
(III) Boltzmann, SOM and LVQ comparison (Iris pattern classification)
(IV) RBF and Fuzzy ARTmap comparison (pattern classification), Genetic Reinforcement Learning (Iris classification, inverted pendulum)

Before discussing these individual laboratory assignments in detail, we need to make some general observations regarding ANN training. First and foremost, ANNs comprise large numbers of nodes (neurons/units/Processing Elements), and even larger numbers of interconnecting weights, as outlined in Section 1. In supervised ANNs, training involves the modification of weights so that the network converges (eventually) to a (local, hopefully global) minimum. Not surprisingly, the number of weight adjustments increases exponentially as a function of the number of network interconnections.

Any reduction in this number before we commence training is therefore essential, in order to reduce training times. This leads us naturally to a consideration of preprocessing, which attempts to reduce the dimensionality of the input data, thus converting the original problem into one which is much more readily solved using ANNs. Preprocessing is so important that the solution to many real-world problems is primarily a matter of applying appropriate preprocessing; training proper is then a relatively straightforward exercise.

At the other end of the spectrum, appropriate preprocessing can involve as little as simply inverting one of the network inputs - which can significantly improve convergence times with XOR, for example (Assignment-II). Alternatively, converting the training data from unipolar (0/1) to bipolar (±1) representation, and selecting a tanh rather than sigmoid activation function, can yield significant improvements (e.g., Iris Classification in Assignment-I; XOR in Assignment-II).
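
Both transformations are trivial to express in code. The following C++ fragment (illustrative names and values, not from the text) maps unipolar training values onto a bipolar representation and, more generally, rescales a raw input into the ±1 span of a tanh activation:

    #include <vector>

    // Map a unipolar (0/1) value onto the bipolar (-1/+1) representation.
    double to_bipolar(double v01) { return 2.0 * v01 - 1.0; }

    // Linearly rescale a raw input from [lo, hi] into [-1, +1], so that it
    // exploits the full output range of a tanh activation function.
    double scale(double v, double lo, double hi) {
        return 2.0 * (v - lo) / (hi - lo) - 1.0;
    }

    int main() {
        std::vector<double> pattern = {0, 1, 1, 0};
        for (double& v : pattern) v = to_bipolar(v);  // now -1,+1,+1,-1
        double petal = scale(4.35, 1.0, 6.9);         // hypothetical range
        (void)petal;
    }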


Prior to presenting training patterns (exemplars), ANN weights are set to small, random values (so that the individual neurons do not saturate early in the training process). By re-initializing network parameters and re-training, students can observe a particular ANN taking different routes (from different starting points) yet arriving at the same end point - using the InstaNet display tools within the NeuralWorks Professional-II+ simulator.

Termination criteria for network convergence can be specified within NeuralWorks in terms of either overall r.m.s. error, or total number of training epochs. Using the appropriate InstaNet display tools, the training/convergence of different ANN models can be readily compared.

The emphasis in the laboratory assignments is on actually trying out various permutations of network parameters and the like. This tends to make the assignments a little time-consuming; however, their degree-of-difficulty is only moderate. The obvious network parameters to adjust are learning rate (η) and momentum (α), in the case of MLP/BP. For example, in the encoder problem of Assignment-II, high values of both (~0.8; range 0..1.0) are found to be optimum.

ANN architectures can also be readily compared. For example, in the XOR part of Assignment-II, students are asked to compare the performance of the 2:1:1 and 2:2:1 MLPs of Figure 5. The 2:1:1 configuration performs a logical-AND of two separate linear discriminants (corresponding to the two halves of the XOR Truth Table), whereas the 2:2:1 architecture behaves more like a "true" MLP - in other words, one incorporating hidden layer neurons. Not surprisingly, the 2:2:1 MLP significantly outperforms the 2:1:1 architecture (typical epochs to convergence being 27 and 40, respectively).

Exclusive-OR is a difficult classification task, due to its linear inseparability; in other words, no straight-line discriminant can be drawn in the 2D solution space which separates (0,0) and (1,1) {i.e., class#1} from (0,1) and (1,0) {i.e., class#2}. The original (2-layer) Rosenblatt Perceptron was incapable of solving such linearly inseparable problems; the addition of a third (hidden) layer becomes necessary.


Figure 5. XOR using MLPs: (a) 2:1:1; (b) 2:2:1.

Remaining with the XOR problem, students learn another interesting feature, this time related to the activation function: the usual sigmoid function fails to yield 100% recognition; however, reverting to a step function does.
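
The 2:2:1 step-function solution is easy to verify by hand. The sketch below hard-wires one well-known weight assignment (not taken from the assignment itself): hidden unit 1 computes OR, hidden unit 2 computes AND, and the output fires when OR is true but AND is not.

    #include <cstdio>

    // Hard-limiting (step) activation.
    int step(double x) { return x > 0 ? 1 : 0; }

    // 2:2:1 XOR network with hand-chosen weights and thresholds:
    // h1 = OR(x1, x2), h2 = AND(x1, x2), y = h1 AND NOT h2.
    int xor_net(int x1, int x2) {
        int h1 = step(x1 + x2 - 0.5);   // OR
        int h2 = step(x1 + x2 - 1.5);   // AND
        return step(h1 - h2 - 0.5);     // h1 and not h2
    }

    int main() {
        for (int a = 0; a <= 1; ++a)
            for (int b = 0; b <= 1; ++b)
                std::printf("%d XOR %d = %d\n", a, b, xor_net(a, b));
    }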

Assignment-II also involves students applying the MLP/BP model to printed black-and-white character recognition. More specifically, the ANN learns to classify 24*24 pixel patterns into 1 of 75 characters (pattern classes). Two versions of each character are contained within the training set, one "clean" and the other "noisy". Thus the available data can be split in two, with one half used for training and the other for testing the network once trained (this being a common approach when training ANNs).

Note that we are dealing with an unrealistically small number of patterns per class here, especially given the large ANN configuration (576:H:75). The resulting generalization ability is thus not great. Figure 6 illustrates the effect of noise on lower resolution (5 column * 7 row) printed characters. In this example, a noisy pixel at the lower left of the 'T' would be misclassified as a 'J', whereas the same pixel acting on the 'I' would most likely still be recognized as an 'I'. Such a single inverted pixel corresponds to 2.857% noise acting on the system. By contrast, for the high resolution characters used in Assignment-II, a single inverted pixel corresponds to only 0.01736% noise.

Figure 6. (5*7 pixel) Printed Characters ['I' 'T' 'J'].

Students are further asked to compare the performance of the BackPropagation, BP+momentum and QuickProp learning algorithms on this printed character recognition task.

If the output layer of the MLP in Figure 2 is connected back to the input layer, then it can be trained to act as an autoassociator - in other words, associating patterns with themselves, rather than with other patterns (heteroassociativity). Moreover, if the number of "hidden" nodes is less than the number of input (output) nodes, then the ANN acts as a data compressor - 8:3 compression is used in Assignment-II (16:4 could alternatively be used). Note that unlike with conventional (discrete combinatorial logic) compression, the number of hidden nodes is not restricted to log₂N (where N = number of input/output neurons).
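
For the 8:3 case the training set is simply the eight 1-of-8 patterns, each used as both input and target of the 8:3:8 autoassociator; the sketch below (illustrative only) enumerates them. With three hidden units the trained network typically discovers a roughly binary code for the eight patterns, although, as just noted, the hidden-layer size need not equal log₂N.

    #include <cstdio>

    int main() {
        // Each exemplar is a 1-of-8 vector, serving as both the input and
        // the desired output of an 8:3:8 autoassociator.
        for (int p = 0; p < 8; ++p) {
            for (int i = 0; i < 8; ++i)
                std::printf("%d ", i == p ? 1 : 0);
            std::printf("\n");
        }
    }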

The SoftMax output function is used on the output layer in this exercise, in preference to sigmoid, since it yields both a superior (1-of-n) classification rate and faster convergence.

Having become familiar with the MLP/BP model in the first two assignments, students are exposed to alternative ANN models in Assignment-III, whilst remaining with the same pattern classification task (i.e., iris recognition). Unlike the MLP, the Hopfield Network is a feedback ANN. Like the MLP though, Hopfield is also supervised. The Boltzmann Machine - BM - is a variant of the Hopfield Network which uses a technique called "simulated annealing" to escape from local minima in the energy landscape. During training, "thermal noise" is added to energize the "ball" sufficiently to leap out of the local "valley" in the energy landscape.

The unsupervised Self-Organizing Map ANN - SOM - utilizes a single 2D (Kohonen) layer, which forms regions or "neighborhoods" corresponding to the various input data classes (categories). SOMs are often used as preprocessors, performing rudimentary pattern classification prior to feeding into a supervised ANN. Actually, a SOM variant (provided within NeuralWorks Professional-II+) is used in Assignment-III, which comprises 4 input nodes and 12 nodes (arranged as 4 rows * 3 columns) in the output (Kohonen) layer. The latter feeds into a "coordinate layer," which produces the x-y coordinates of the winning node. This is then used as the input to a one-hidden-layer BP network. The composite ANN switches from Kohonen to MLP learning after 2,250 iterations.
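
The core Kohonen step itself is compact. A hedged C++ sketch follows, sized to match the assignment's 4-input, 12-node map; for brevity only the winning node is updated, whereas a full SOM would also update a shrinking neighborhood around it.

    #include <cstdio>

    const int INPUTS = 4, NODES = 12;   // 4 inputs; 4x3 Kohonen layer
    double w[NODES][INPUTS];            // map weights (zero-initialized)

    // Find the winning (best-matching) node for input x.
    int winner(const double* x) {
        int best = 0; double bestDist = 1e30;
        for (int n = 0; n < NODES; ++n) {
            double d = 0;
            for (int i = 0; i < INPUTS; ++i)
                d += (x[i] - w[n][i]) * (x[i] - w[n][i]);
            if (d < bestDist) { bestDist = d; best = n; }
        }
        return best;
    }

    // Move the winner's weight vector toward x by the learning rate lr.
    void update(const double* x, double lr) {
        int win = winner(x);
        for (int i = 0; i < INPUTS; ++i)
            w[win][i] += lr * (x[i] - w[win][i]);
    }

    int main() {
        double x[INPUTS] = {0.2, 0.5, 0.1, 0.7};   // hypothetical input
        update(x, 0.3);
        std::printf("winner = node %d\n", winner(x));
    }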

The Learning Vector Quantization ANN comprises an input layer, a Kohonen layer (which learns and performs classifications), and an output layer. The input layer contains one neuron per input parameter; the output layer contains one neuron per class. LVQ1 learning is used for the first 2,250 iterations, LVQ2 for the next 750 (the latter being used to perform fine tuning of the network).
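
LVQ1 differs from the SOM update only in its use of class labels; a hedged sketch (names illustrative):

    #include <cstdio>

    // LVQ1: move the winning prototype toward the input if their class
    // labels match, and away from it otherwise.
    void lvq1_update(double* proto, const double* x, int n,
                     bool classMatches, double lr) {
        for (int i = 0; i < n; ++i) {
            double delta = lr * (x[i] - proto[i]);
            proto[i] += classMatches ? delta : -delta;
        }
    }

    int main() {
        double proto[2] = {0.0, 0.0};
        double x[2] = {1.0, 1.0};
        lvq1_update(proto, x, 2, true, 0.1);   // matching class: attract
        std::printf("proto = (%f, %f)\n", proto[0], proto[1]);
    }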

In Assignment-III, these three ANN models - BM, the supervised SOM variant and LVQ - are compared on the iris classification problem introduced earlier, in terms of r.m.s. error, convergence and classification rate. Students observe better classification with LVQ than with the SOM variant, and experience the long training times inherent in BM/simulated annealing.

Two further ANN models are introduced in Assignment-IV, namely RBF and FuzzyARTmap. Radial Basis Function networks use the same architecture as MLPs, but the neurons employ different activations - radial basis functions, rather than sigmoids. Both are good universal approximators, but RBF excels at interpolation and/or mathematical function approximation. Now while MLPs take a long time to train yet respond to new inputs almost instantaneously, RBFs train much faster but their recognition times are considerably longer.


FuzzyARTmap comprises two (digital, unsupervised) ART1 networks, whose F2 layers are connected together by a fuzzy match tracking subsystem. The resulting supervised ANN is able to handle analog inputs and outputs.

In Assignment-IV, RBF, FuzzyARTmap and MLP networks are applied to a simple pattern classification task - on this occasion, rather than iris classification, the Leonard-Kramer simple diagnostic problem is investigated. Two process parameters are used as inputs, together with three outputs - normal (class#1), fault1 (class#2) and fault2 (class#3). Training exemplars are provided as part of the NeuralWorks Professional-II+ package. RBF and FuzzyARTmap are found to outperform MLP on this particular classification task.

Genetic algorithms (evolutionary computation) are also investigated in Assignment-IV, in the form of Genetic Reinforcement Networks. Students are immediately made aware of the long times associated with evolving a solution to a given problem - in the first instance, iris pattern classification.

GRNs are also applied to a second problem, namely the cart-pole (inverted-pendulum or broomstick balancer) system. This is a classically hard control problem. As indicated in Figure 7, the relevant control parameters for this system are linear position and velocity (x, x′), and angular position and velocity (θ, θ′) - θ being limited to ±35° here. The objective is to apply corrections in order to keep the system in a balanced, steady-state condition following a sudden transient disturbance (F). Even longer training times are found to be necessary in order to train this GRN cart-pole controller.
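
A hedged sketch of the state representation and failure test used when evaluating such a controller follows; the ±35° band comes from the description above, while the structure and names are illustrative (the cart-pole dynamics themselves are omitted).

    #include <cmath>
    #include <cstdio>

    const double PI = 3.14159265358979;

    // Cart-pole state: linear position/velocity, angular position/velocity.
    struct CartPoleState {
        double x, x_dot;          // cart position and velocity
        double theta, theta_dot;  // pole angle (rad) and angular velocity
    };

    // A trial fails once the pole leaves the allowed +/-35 degree band.
    bool failed(const CartPoleState& s) {
        const double limit = 35.0 * PI / 180.0;
        return std::fabs(s.theta) > limit;
    }

    int main() {
        CartPoleState s = {0.0, 0.0, 0.7, 0.0};   // ~40 degrees: outside band
        std::printf("failed = %d\n", failed(s) ? 1 : 0);
    }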

4.5 MatLab Neural Network Toolbox

A viable alternative to NeuralWorks Professional-II+ is the Neural Network Toolbox add-on to MatLab [34]. An earlier comparative review of these commercial ANN software simulators (and others) is presented in [35]. Figure 8 illustrates gradient descent in 3D weight space, with the "ball" moving in small steps from the upper "plateau" to the global minimum over time (courtesy of regular "snapshots" taken within MatLab). Actually one of the weights here is the threshold or bias term.


[Figure: the cart (position x, applied force F) with the pole at angle θ.]

Figure 7. Inverted Pendulum (Cart-Pole).

[Figure: MatLab plots - the error surface (sum-squared error versus weight W and bias B), and the sum-squared network error over 49 epochs.]

Figure 8. Gradient Descent in Weight Space (MatLab).

By contrast, Figure 9 illustrates the effect of incorporating a momentum term into BP, this time on a 2D surface. More specifically, momentum is able to "energize" the ball sufficiently to move out of the (higher-energy) local minimum, and over the surrounding "hill" to the global minimum.

Both these examples, whilst simplistic and a little contrived, provide students with a good feel for gradient descent in weight space (only two weights can be visualized in Figure 8, for example - in practice, of course, thousands of weights are more usual). The 3D surface modeling facility within MatLab is a good visualization tool generally, and helps students consolidate basic principles.

Another advantage of the MatLab Neural Network Toolbox is the ability for users to add their own C-code neural network descriptions.

Figure 9. Effect of Momentum (MatLab).

5 Summary

Based on our experiences at the University of Wollongong, we would definitely recommend offering a neural network subject along the lines described above. The CSCI964 subject has proved popular with experienced programmers and novices alike. A key component of this subject is the use of ANN simulator software to undertake laboratory assignments. The visualization of key network parameters via the simulator GUI has been found to significantly aid the students' learning process.

Acknowledgment

This work was funded in part by an Aspen Technology Inc. University Teaching Grant.

References

[1] Fulcher, J. (1997), "Image Processing," Chapter F1.6 in Fiesler, E. and Beale, R. (Eds.), Handbook of Neural Computation, Oxford University Press, New York.

[2] McCulloch, W. and Pitts, W. (1943), "A Logical Calculus of the Ideas Immanent in Nervous Activity," Bulletin of Mathematical Biophysics, Vol. 5, pp. 115-133.

[3] Rosenblatt, F. (1958), "The Perceptron: a Probabilistic Model for Information Storage and Organization in the Brain," Psychological Review, Vol. 65, pp. 386-408.

[4] Minsky, M. and Papert, S. (1969/1988), Perceptrons (expanded ed.), MIT Press, Cambridge, MA.

[5] Werbos, P. (1994), The Roots of Backpropagation, John Wiley and Sons, New York.

[6] Blum, A. (1992), Neural Networks in C++: an Object-Oriented Framework for Building Connectionist Systems, John Wiley and Sons, New York.

[7] Masters, T. (1993), Practical Neural Network Recipes in C++, Academic Press, San Diego, CA.

[8] Welstead, S. (1994), Neural Network and Fuzzy Logic Applications in C/C++, John Wiley and Sons, New York.

Page 269: [Studies in Fuzziness and Soft Computing] Innovative Teaching and Learning Volume 36 ||

258 J. Fulcher

[9] NeuralWare Inc. (1995), Supplement for Professional II/Plus V5.1, Technical Publications Group, NeuralWare Inc., Pittsburgh, PA.

[10] Beale, R. and Jackson, T. (1990), Neural Computing: an Introduction, Adam Hilger, Bristol, UK.

[11] Aleksander, I. and Morton, H. (1990), An Introduction to Neural Computing, Chapman and Hall, London, UK.

[12] Dayhoff, J. (1990), Neural Networks: an Introduction, Van Nostrand Reinhold, New York.

[13] Pao, Y. (1989), Adaptive Pattern Recognition and Neural Networks, Addison-Wesley, Reading, MA.

[14] Wasserman, P. (1989), Neural Computing: Theory and Practice, Van Nostrand Reinhold, New York.

[15] Fulcher, J. (1992), "Experience with Teaching a Graduate Neural Networks Course," Computer Science Education, Vol.3, No.3, pp. 297-314.

[16] Fulcher, J. (1993), "Comparative Neural Network Book Review - I," ACM Computing Reviews, Vol.34, No.10, pp. 54-56 [9301-0009].

[17] Fulcher, J. (1993), "Comparative Neural Network Book Review -II," ACM Computing Reviews, Vol.34, No.5, pp. 230-233 [9305-0266].

[18] Wasserman, P. (1993), Advanced Methods in Neural Networks, Van Nostrand Reinhold, New York.

[19] Haykin, S. (1999), Neural Networks: a Comprehensive Foundation (2nd ed.), Prentice Hall, Upper Saddle River, NJ.

[20] Brown, J.S., Collins, A. and Duguid, P. (1989), "Situated Cognition and the Culture of Learning," Educational Researcher, Vol.18, No.1, pp. 32-42.

Page 270: [Studies in Fuzziness and Soft Computing] Innovative Teaching and Learning Volume 36 ||

Teaching Course on Artificial Neural Networks 259

[21] Lave, J. and Wenger, E. (1991), Situated Learning: Legitimate Peripheral Participation, Cambridge University Press, Cambridge, UK.

[22] McLellan, H. (Ed.) (1996), Situated Learning Perspectives, Prentice Hall, Englewood Cliffs, NJ.

[23] Harel, I. and Papert, S. (1992), Constructionism, Ablex, Norwood, NJ.

[24] Oakeshott, M. (1962), Rationalism in Politics, Methuen and Co., London.

[25] Fulcher, J. (1992), "McClelland, J. and Rumelhart, D., Explorations in Parallel Distributed Processing: a Handbook of Models, Programs and Exercises (PC/Macintosh), MIT Press, Cambridge, MA, 1988(9)," ACM Computing Reviews, Vol.33, No.11, pp. 593-594 [9211-0841].

[26] Miyata, Y. (1991), A User's Guide to PLANet V5.7, Dept Computer Science, University of Colorado, http://boulder.colorado.edu.

[27] Leighton, R. and Weiland, A. (1998), The Aspirin/Migraines Software Tool Users' Manual V6.0, Mitre Corp, http://taylor.digex.net/am6.

[28] Zell, A. et al. (1995), Stuttgart Neural Network Simulator User Manual V4.1, Institute for Parallel and Distributed High Performance Systems, University of Stuttgart, Germany, http://www-ra.informatik.uni-tuebingen.de/SNNS/.

[29] Fulcher, J. (1998), "Laboratory Support for the Teaching of Neural Networks," Intl. J. Electrical Engineering Education, Vol.35, No.1, pp. 29-36.

[30] Hagenbuchner, M. and Fulcher, J. (1997), "Noise Removal in Ionograms by Neural Network," Neural Computing and Applications, Vol.6, pp. 165-172.

Page 271: [Studies in Fuzziness and Soft Computing] Innovative Teaching and Learning Volume 36 ||

260 J. Fulcher

[31] Fisher, R. and Fulcher, J. (1998), "Improving the Inversion of Ionograms by Combining Neural Networks and Data Fusion Techniques," Neural Computing and Applications, Vol.7, pp. 3-16.

[32] Fulcher, J. (1989), An Introduction to Microcomputer Systems: Architecture and Interfacing, Addison-Wesley, Reading, MA.

[33] NeuralWare (1996), Using NeuralWorks, NeuralWare Inc., Technical Publications Group, Pittsburgh, PA.

[34] MatLab (1996), Neural Network Toolbox for use with MatLab, The MathWorks Inc., Natick, MA.

[35] Fulcher, J. (1994), "A Comparative Review of Commercial ANN Simulators," Computer Standards and Interfaces, Vol.16, No.3, pp. 241-251.

Page 272: [Studies in Fuzziness and Soft Computing] Innovative Teaching and Learning Volume 36 ||

CHAPTER 8

INNOVATIVE EDUCATION FOR FUZZY LOGIC STABILIZATION OF ELECTRIC POWER SYSTEMS IN A MATLAB/SIMULINK ENVIRONMENT

T. Hiyama
Department of Electrical and Computer Engineering
Kumamoto University, Kumamoto 860-8555, Japan

Matlab/Simulink-based transient stability simulation programs for multi-machine power systems are introduced. The programs can be used for teaching the concept of transient stability of electric power systems and the fundamental functions of generator controllers such as excitation systems and speed governing systems, and also for research work at laboratories, especially to develop generator controllers using advanced technologies such as fuzzy logic, neural networks, and so on. Simulink is utilized for the modeling of the entire power system, including the generating units with their excitation control and speed governing control systems, and the power transmission networks. The real time operation of the developed transient stability simulation programs requires the RTW (Real Time Workshop) environment together with a DSP (Digital Signal Processor) board. For real time operation, the AD and DA conversion interfaces on the DSP board play very important roles as the interfaces between the developed real time simulator and personal computer based external generator controllers. The control performance of generator controllers using advanced technologies such as fuzzy logic can easily be tested on the real time simulator at laboratories.


1 Introduction

A quick glance at what we teach today and what we were taught in the past reveals a vast difference. This illustrates the rapidity of the progress being made in the Electrical Engineering fields. We have introduced simulation packages developed by ourselves into our teaching in order to narrow the gap between current industry practice and what we teach in the classroom. An example of this is the variety of Matlab/Simulink programs that became widely used in industry before their widespread introduction in the universities. An extension of Matlab/Simulink is the set of analysis and design programs for various dynamic systems that have schematic capture capability. These tools are being increasingly used in industry and should be introduced into the curriculum. In addition, in order to bring promising technologies that will have wide application in industry into the curriculum, efficient and innovative teaching methods must be developed. The objective is not only to present and allude to new applications of new technologies, but also to reinforce fundamental concepts imparted to students during their previous university years. For example, when teaching fuzzy logic control systems, the basics of filters, the sampling theorem, the Z-transform, etc., are reinforced.

Matlab/Simulink-based transient stability simulation programs have been developed for teaching the concept of the transient stability of electric power systems and the roles of various generator controllers, such as the automatic voltage regulator (AVR), power system stabilizer (PSS), and governor (GOV), and also for laboratory research to develop generator controllers using new technologies such as fuzzy logic and neural networks. For their real time operation, the Real Time Workshop (RTW) is utilized together with an additional digital signal processor (DSP) board. Transient stability simulations of multi-machine power systems are available with the proposed simulation programs. The proposed simulation program consists of several blocks. The basic configuration of the system under study is set up by using Simulink blocks. The initial condition of the system is specified using the power flow calculations given by a Matlab program. During the numerical integration, additional nonlinear equations must be solved to determine the d-q components of both the terminal voltage and the generator current. The Simulink block includes these equations as a Matlab function block. Typical excitation systems and typical speed governing systems are provided as ready-to-use tool components, so users can easily modify the generating units or replace them with alternative ones. Graphical interfaces are also prepared in the main Matlab program, so users can easily check the simulation results on the display.

In addition, a real time power system simulator has been developed by using a personal computer and a DSP board with AD and DA conversion interfaces. For the real time operation of the developed simulator, Real Time Workshop (RTW) is necessary.

To demonstrate the efficiency of the proposed transient stability simulation programs, the control performance of an integrated fuzzy logic generator controller has been investigated for a longitudinal four-machine study system. Comparative studies have also been performed between the fuzzy logic controller and conventional ones. Furthermore, to demonstrate the efficiency of the proposed real time power system simulator, the control performance of the fuzzy logic generator controller has been tested on the developed simulator. The fuzzy logic controller is set up separately by using a personal computer with AD and DA conversion interfaces. We believe that the developed simulation programs and the developed real time power system simulator improve the learning of the transient stability of electric power systems. Furthermore, the teaching of the fundamentals of experimental studies is improved by using these tools. These tools are also useful for laboratory research to design generator controllers using new technologies such as fuzzy logic, neural networks, H-infinity, and so on. Newly designed controllers are tested on the real time power system simulator for the evaluation of their control performance in the laboratory before testing them on actual generators.

2 Fuzzy Logic Control Using Polar Information

In a conventional controller, what is modeled is the system or process being controlled, whereas in a fuzzy logic controller, the focus is on the human operator's behavior. In the first case, the system is modeled analytically by a set of differential equations, from which the control parameters are adjusted to satisfy the controller specification. In the fuzzy logic controller, these adjustments are handled by a fuzzy rule-based expert system.

2.1 Fundamentals

After choosing proper variables as the input and output of the fuzzy controller, the linguistic variables must be decided on. These variables transform the numerical values of the input of the fuzzy controller into fuzzy quantities. The number of these linguistic variables specifies the quality of the control which can be achieved with the fuzzy logic controller. As the number of linguistic variables increases, so do the computational time and the required memory. Therefore, a compromise between the quality of control and the computational time is needed when choosing the number of linguistic variables. When considering constant speed control, seven linguistic variables, namely LP (large positive), MP (medium positive), SP (small positive), Z (zero), SN (small negative), MN (medium negative), and LN (large negative), are required for each of the speed deviation X, the acceleration Y, and the control signal U from the controller to achieve better performance. The system state is given by p(k) as:

p(k) = [ X(k), As·Y(k) ]    (1)

Table 1 illustrates a decision table for the controller, where a positive control signal is for the deceleration control and a negative control signal is for the acceleration control. There are 49 rules in total. One of them is as follows: if the speed deviation X is LP (large positive) and the acceleration As·Y is LN (large negative), then the controller output is Z (zero).

2.2 Fuzzy Logic Control Scheme

Observation of Table 1 shows that the diagonal elements indicate the switching line which separates positive and negative stabilizing signals, the upper right triangular region gives a positive control signal for the deceleration control, and the lower left triangular region gives a negative control signal for the acceleration control. From this observation, simple fuzzy logic control rules are introduced. The rules are simplified by using the polar variables, i.e., the radius D(k) and the angle θ(k), instead of the rectangular information X(k) and As·Y(k).

Table 1. Fuzzy decision table for constant speed controller.

As·Y \ X    LN    MN    SN    Z     SP    MP    LP
LP          Z     SP    MP    LP    LP    LP    LP
MP          SN    Z     SP    MP    MP    LP    LP
SP          MN    SN    Z     SP    SP    MP    LP
Z           MN    MN    SN    Z     SP    MP    MP
SN          LN    MN    SN    SN    Z     SP    MP
MN          LN    LN    MN    MN    SN    Z     SP
LN          LN    LN    LN    LN    MN    SN    Z

As: scaling factor for the acceleration Y
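As an illustration, Table 1 can be coded directly as a matrix lookup. The following is a minimal sketch, not from the chapter; it assumes the linguistic values are coded as the integers -3 to 3 (LN to LP) and that X and As·Y have already been quantized to those levels.

    % Table 1 as a 7x7 matrix; rows: As*Y from LP down to LN,
    % columns: X from LN to LP; entries coded LN..LP as -3..3.
    rules = [ 0  1  2  3  3  3  3;    % As*Y = LP
             -1  0  1  2  2  3  3;    % As*Y = MP
             -2 -1  0  1  1  2  3;    % As*Y = SP
             -2 -2 -1  0  1  2  2;    % As*Y = Z
             -3 -2 -1 -1  0  1  2;    % As*Y = SN
             -3 -3 -2 -2 -1  0  1;    % As*Y = MN
             -3 -3 -3 -3 -2 -1  0 ];  % As*Y = LN
    x  =  3;                  % X is LP
    ay = -3;                  % As*Y is LN
    u  = rules(4 - ay, x + 4) % offsets map -3..3 to rows/columns 1..7; u = 0 (Z)

The result reproduces the example rule quoted above: X = LP and As·Y = LN give the controller output Z.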

D(k) = √( X(k)² + (As·Y(k))² )    (2)

θ(k) = tan⁻¹( As·Y(k) / X(k) )    (3)
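A minimal Matlab sketch of equations (2) and (3) follows; the numerical values are made up for illustration, and atan2 is used so that θ is resolved into the correct quadrant, which matters for the quadrant-based rules below.

    X = 0.02;  Y = -0.5;  As = 0.1;             % example state and scaling factor
    D     = sqrt(X^2 + (As*Y)^2);               % radius, equation (2)
    theta = mod(atan2(As*Y, X)*180/pi, 360);    % angle in degrees, equation (3)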

Figure 1 shows the state where the radius D is constant and also the state where the angle θ is constant. By overlaying these figures on Table 1, we can easily find alternative simple rules as follows.

Rule 1: In the first quadrant, only the deceleration control is required to reduce the positive speed deviation; namely, the stabilizing signal U(k) should be positive. On the contrary, only the acceleration control is required in the third quadrant, and the stabilizing signal should be negative. In both the second and the fourth quadrants, either the deceleration or the acceleration control is applied according to the location of the operating point. In the second quadrant, a gradual switching from negative to positive is necessary, and in the fourth quadrant, a gradual switching from positive to negative is performed, because of the clockwise movement of the trajectory. Figure 2 gives the two angle membership functions for the state p(k) where the radius D(k) is constant. The function N(θ(k)) gives the angle member of the deceleration control, and the function P(θ(k)) gives the angle member of the acceleration control. Moreover, the angle α gives the overlap angle between these two angle members.

Figure 1. Polar notation: (a) D = constant, (b) θ = constant. The axes are the speed deviation X and the acceleration As·Y; the switching line separates the positive and negative control signal regions.

Rule 2: When the angle θ(k) is constant, the absolute magnitude of the stabilizing signal should increase with increasing radius D(k). In case (b) of Figure 1, positive stabilizing signals are required for both the states p1(k) and p2(k); however, the stabilizing signal at the point p1(k) is larger than the one at the point p2(k), because the point p1(k) is farther from the equilibrium, the origin O. Figure 3 and equation (4) give the radius member G(D(k)), which is related to the gain factor of the control.

G(D(k)) = D(k)/Dr   for D(k) ≤ Dr
G(D(k)) = 1.0       for D(k) ≥ Dr    (4)

Rule 3: The location of the switching line is modified by different settings of the scaling factor As for the acceleration Y.

These rules are simple enough not to require heavy computation on a microcomputer-based fuzzy logic controller. This is one of the major advantages of the proposed fuzzy logic control scheme when considering its real time implementation.


Figure 2. Two angle membership functions N(θ(k)) and P(θ(k)): grade versus θ in degrees (breakpoints at 90, 135, 180, 270, 315, and 360), with overlap angle α = 90°.

Figure 3. Radius membership function G(D(k)): grade versus distance D(k), rising linearly to 1 at D(k) = Dr.
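These membership functions can be coded compactly. The following is a minimal Matlab sketch, assuming the piecewise-linear shapes and the α = 90° overlap suggested by Figure 2 and the saturation of Figure 3; the file names Nmem.m, Pmem.m, and Gmem.m are assumed for illustration.

    function g = Nmem(theta)
      % Grade of the deceleration control (Figure 2), assuming a
      % piecewise-linear shape with overlap angle alpha = 90 degrees.
      theta = mod(theta, 360);
      if theta <= 90
        g = 1;
      elseif theta <= 180
        g = (180 - theta)/90;   % gradual switching, second quadrant
      elseif theta <= 270
        g = 0;
      else
        g = (theta - 270)/90;   % gradual switching, fourth quadrant
      end
    end

    function g = Pmem(theta)
      % Grade of the acceleration control; chosen so that N + P = 1.
      g = 1 - Nmem(theta);
    end

    function g = Gmem(D, Dr)
      % Radius member, equation (4): linear up to Dr, then saturated.
      g = min(D/Dr, 1.0);
    end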

3 Fuzzy Logic Stabilizing Controller

3.1 Configuration of Generator Controller in Matlab/Simulink Environment

The basic configuration of an integrated fuzzy logic generator controller is shown in Figure 4. The proposed controller consists of three blocks: voltage control loop, damping control loop, and speed control loop. The input signals are the terminal voltage, the generator real power output, and the generator speed for the voltage, the damping, and the speed governing control loops, respectively. The proposed controller is set up by using a personal computer with AD and DA conversion interfaces.

3.1.1 Voltage Control Loop

The detailed configuration of the voltage control loop is shown in Figure 5. The PD information of the voltage error signal e, which gives the difference between the reference voltage Vr and the actual terminal voltage Vt, is utilized to get the voltage state and to determine the voltage control signal Uv*. In addition, a PI control loop is also included to shift the excitation voltage to its new steady state value according to changes of the reference voltage Vr.

Figure 4. Integrated fuzzy logic controller, consisting of the voltage control loop, the damping control loop, and the speed control loop (Uv: voltage control signal, Ud: damping control signal, Ue: excitation control signal, Ug: speed governing control signal).

Figure 5. Simulink block for the fuzzy logic voltage control loop (FAVR), including a PI control loop (Vr: reference voltage, Vt: terminal voltage, dVt: voltage deviation, dVt/dt: derivative of the voltage deviation, Uv: voltage control signal).

3.1.2 Damping Control Loop

Figure 6 shows the configuration of the damping control loop. The damping control signal Ud is derived from the generator real power output. Here, Za is a measure of the acceleration of the generator, and Zs is a measure of the speed deviation. Za and Zs are derived from the sampled generator output Pe through reset filters and an integrator.

The sum of the voltage control signal Uv and the damping control signal Ud, i.e., the excitation control signal Ue, is fed back to the excitation system.


Figure 6. Simulink block for the fuzzy logic damping control loop (FPSS): a filtering block built from reset filters sTr/(1+sTr) derives the inputs to the fuzzy logic control block (Pe: real power output, Pto: initial output setting, Ud: damping control signal).

3.1.3 Speed Control Loop

The configuration of the speed governing system is shown in Figure 7. The speed control signal Ug is added to the steam valve servo system of a thermal plant. The PD information of the generator speed is utilized to determine the speed control signal Ug.

Figure 7. Simulink block for the fuzzy logic governor (FGOV): the speed deviation dw and its derivative, scaled by As, feed the fuzzy logic control block that produces Ug.

3.2 Fuzzy Logic Control Block

The same fuzzy logic control rules are applied to all the control loops for the excitation and speed governing systems. Here, it must be noted that the corresponding system states are given by

Voltage control loop:  X(k) = e(k) = Vr − Vt(k),  Y(k) = (e(k) − e(k−1))/ΔT
Damping control loop:  X(k) = Zs(k),  Y(k) = Za(k)
Speed control loop:    X(k) = Δω(k),  Y(k) = (Δω(k) − Δω(k−1))/ΔT


Here, As, which is the scaling factor for Y(k), is one of the adjustable control parameters. The origin O is the equilibrium; therefore, all the control action should be directed to shift the point p(k) to the origin.

In this study, the generator state is given by the polar information, i.e., the radius D(k) and the phase θ(k), instead of the rectangular information, to simplify the control rules.

D(k) = √( X(k)² + (As·Y(k))² )    (5)

θ(k) = tan⁻¹( As·Y(k) / X(k) )    (6)

To derive the control scheme, the phase plane is divided into two sectors, Sector A and Sector B; here, α is the overlap angle between these two sectors. When considering the excitation control, Sector A, especially the first quadrant, gives the region where the excitation should be increased to raise the terminal voltage and also to achieve the deceleration control for damping oscillations. On the contrary, Sector B, especially the third quadrant, gives the region where the excitation should be decreased to reduce the terminal voltage and also to achieve the acceleration control. When considering the speed governing system, Sector A gives the region where an increase of the turbine output is required for the acceleration of the generator speed, and Sector B gives the region where a decrease of the turbine output is required for the deceleration of the generator speed. These two sectors are defined by using the two angle membership functions N(θ(k)) and P(θ(k)). For the excitation control system, the function N(θ(k)) gives the grade of increasing the excitation voltage, and P(θ(k)) gives the grade of decreasing the excitation voltage. In addition, these functions also give the grades to increase or to decrease the turbine output for the speed governing control. By using these two membership functions, the control signal U(k) from each fuzzy logic control loop is given by

U(k) = [N(θ(k)) − P(θ(k))] / [N(θ(k)) + P(θ(k))] · G(D(k)) · Umax
     = [1 − 2·P(θ(k))] · G(D(k)) · Umax    (7)

G(D(k)) = D(k)/Dr   for D(k) ≤ Dr
G(D(k)) = 1.0       for D(k) ≥ Dr    (8)

where G(D(k)) is the radius membership function, which is determined by the radius D(k) and the distance parameter Dr, and Umax gives the maximum size of the output signal U(k) from each control loop. By using these equations, the control signals from all the control loops are determined as follows:

Excitation Control System (FEX: Fuzzy Logic Excitation)
  Voltage Control Loop: U(k) = Uv(k)*
  Damping Control Loop: U(k) = Ud(k)

Governor Control System (FGOV: Fuzzy Logic Governor)
  Speed Control Loop: U(k) = Ug(k)

Here, it must be noted that all the control parameters As, Dr, and α should be tuned separately for each control loop; namely, the settings of these adjustable control parameters depend on the control loop. The angle and radius membership functions are shown in Section 2.

The proposed control scheme has three basic parameters: the scaling factor As for Y(k), the overlap angle α of the angle membership functions, and the fuzzy distance level Dr for the radius member. Two other factors, the maximum control effort Umax and the sampling interval ΔT, are also involved; these are often determined by external criteria.

The adjustable parameters As, Dr, and α are tuned at a specific operating point subject to a specific disturbance, and these parameters are then kept fixed throughout the simulations and experiments to demonstrate the robustness of the proposed fuzzy logic control scheme.

Figure 8 illustrates the Simulink block for the fuzzy logic control block. All of its components are standard blocks from the Simulink library. The adjustable control parameters can easily be modified from the main Matlab program to evaluate the controller performance.


Figure 8. Simulink block for the fuzzy logic control block (angle membership and gain blocks built from standard Simulink components).
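For reference, the computation performed by this block, equations (5) to (8), fits in a few lines of Matlab. The following is a minimal sketch with assumed names, reusing the Pmem function sketched in Section 2; it is an illustration of the control law, not the chapter's actual block implementation.

    function U = fuzzy_loop(X, Y, As, Dr, Umax)
      % One fuzzy logic control loop using polar information.
      D     = sqrt(X^2 + (As*Y)^2);       % radius, equation (5)
      theta = atan2(As*Y, X)*180/pi;      % phase in degrees, equation (6)
      G     = min(D/Dr, 1.0);             % radius member, equation (8)
      P     = Pmem(theta);                % angle member P (Figure 2)
      U     = (1 - 2*P) * G * Umax;       % control signal, equation (7)
    end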

4 Configuration of Matlab/Simulink-Based Transient Stability Simulation Program

4.1 Case Study

A longitudinal four machine infinite bus system is selected as a case study to demonstrate the efficiency of the proposed transient stability simulation program. The proposed system is illustrated in Figure 9.

Figure 9. Longitudinal four machine case study system (Units 1 to 4; transmission line sections of 0.3 pu, 0.6 pu, and 0.6 pu).

Each unit is a thermal unit; Units 1 and 4 have self-excited excitation control systems, and Units 2 and 3 have separately excited excitation control systems. Each unit has a full governor-turbine system: governor, steam valve servo system, high pressure turbine, intermediate pressure turbine, and low pressure turbine. In the transient stability simulations, a three-phase to ground fault is considered as the disturbance, and the faulted line is isolated from the system after 0.07 s.

4.2 Transient Stability Simulation Program

The main part of the proposed transient simulation program consists of Simulink blocks for solving the differential equations which represent the dynamics of the generators together with their associated generator controllers.

The main Simulink block for the longitudinal four machine infinite bus system is illustrated in Figure 10. The main block has four generating units. It also has a block for specifying the network admittance matrices required for the several conditions of the power transmission network: before, during, and after a specified fault. These admittance matrices are calculated by an associated Matlab program before starting the transient stability simulations, and fault locations can be specified arbitrarily. Before starting the transient stability simulations, it is also required to specify the initial conditions of various quantities for each generating unit; therefore, a power flow calculation is performed by the same associated Matlab program. In this program, the fault duration time is also specified.

During the simulations, the network conditions must be updated at every integration step of the differential equations; therefore, a Matlab function block is set in the main Simulink block, where the network equations are solved by using the phase angles δi and the induced transient voltages in the d-q axes, Edi' and Eqi', of all the units i (i = 1 to 4) to obtain the d-q axes components of the generator terminal voltages, vd and vq, and the generator terminal currents, id and iq. This Matlab function block is also described by a Matlab program. The main Simulink block is easily modified for multi-machine power systems with different numbers of generators and different network configurations. In addition, the Mux block makes a vector from its input scalar variables, and the Demux block decomposes a vector into scalar variables.
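As an illustration of what such a Matlab function block computes, the following is a minimal sketch with assumed variable names and a classical-model sign convention (conventions vary with the machine model, so treat this as a shape, not the chapter's actual code): the machine d-q internal voltages are rotated into the common network frame, the linear network equation is solved, and the currents are rotated back.

    function [id, iq] = network_solution(delta, Edp, Eqp, Y)
      % delta, Edp, Eqp: column vectors, one entry per unit;
      % Y: reduced network admittance matrix for the current period
      % (before, during, or after the fault).
      E  = (Edp + 1j*Eqp) .* exp(1j*delta);   % internal voltages, network frame
      I  = Y * E;                             % network equation I = Y*E
      Im = I .* exp(-1j*delta);               % back to each machine's d-q frame
      id = real(Im);
      iq = imag(Im);
    end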


Figure 10. Main Simulink block for the case study system: the four generating unit blocks send delta, Ed', and Eq' to the network equation block (a Matlab function block) which, together with the admittance matrices and the infinite bus, determines vd, vq, id, and iq; Mux and Demux blocks route the signals and the outputs of all units.

Figure 11 shows the block of a generating unit. The block of the generating unit has several sub-blocks: a generator block associated with various generator controllers, including a complete turbine system. In the case of Figure 11, an integrated fuzzy logic generator controller is attached to the generating unit. The integrated fuzzy logic generator controller consists of a fuzzy logic AVR (FAVR), a fuzzy logic PSS (FPSS), and a fuzzy logic governor (FGOV). The input signals to the generator block are the d-q axes voltages, vd and vq, the d-q axes currents, id and iq, the excitation voltage Efd, and the turbine output Pt; the output signals are the real and imaginary power outputs, Pe and Qe, the terminal voltage Vt, the speed deviation dw (Δω), the phase angle delta (δ), and the induced transient voltages in the d-q axes, Ed' and Eq'. The last three signals, delta, Ed', and Eq', are sent to the Matlab function block, which solves the network equations to determine the generator voltages vd and vq and the currents id and iq.

Figure 11. Simulink sub-block for a generating unit with fuzzy logic AVR (FAVR), FPSS, and fuzzy logic governor (FGOV): the damping control block, excitation system, and turbine system feed the generator block, and a Mux block vectorizes delta, Ed', and Eq' for the network equation block.

Figure 12. Thyristor excitation system and its Simulink block (input Vt, output Efd).

Page 287: [Studies in Fuzziness and Soft Computing] Innovative Teaching and Learning Volume 36 ||

276 T. Hiyama

Figure 12 shows the associated thyristor exciter. Any other exciter can be used by replacing the excitation system block with an alternative one.

Figure 13 illustrates the steam valve servo system and the turbine system, where the valve opening speed is restricted by U1 and the closing speed is restricted by L1.

Figure 13. Simulink block for the steam valve servo system and the reheat-type turbine system (with the setting of the turbine output as input).

The associated generator controllers, such as the AVR, PSS, and governor, can be replaced with various types of controllers, including the conventional ones shown in Figures 14 to 16.

Figure 14. Simulink block for the conventional AVR (CAVR): the voltage deviation dVt passes through gain K1 and the first-order lag K2/(1+sT2) to produce Uv.


Figure 15. Simulink block for the conventional PSS (CPSS) (Pe: real power output, Pto: initial setting of the generator output, Ud: PSS signal).

Figure 16. Simulink block for the conventional speed governing system (CGOV): the gain 1/(2π·fo·R) and the phase lead compensator (0.1s+1)/(0.2s+1) produce Ug.

In the proposed simulation program, various combinations of controllers are available, for example:

• Generating unit with FAVR, FLPSS, and FGOV
• Generating unit with FAVR, FLPSS, and CGOV
• Generating unit with CAVR, FLPSS, and CGOV
• Generating unit with CAVR, CPSS, and CGOV
• Generating unit with CAVR and CGOV
• Generating unit with CAVR

where CAVR and CGOV denote the conventional AVR and governor.

Newly developed generator controllers using neural networks, H-infinity, and other advanced technologies can easily be installed in the simulation program just by replacing the controller blocks with the new blocks.

4.3 Typical Transient Stability Simulation Results

Transient stability simulations can be performed by using the proposed Matlab/Simulink based transient simulation program, for example, to investigate the advantages of the newly proposed integrated fuzzy logic generator controller. Typical simulation results are shown in Figure 17. In this simulation, Units 2, 3, and 4 are equipped with only the conventional excitation system (CAVR), and Unit 1 has the newly proposed integrated fuzzy logic generator controller (FAVR + FLPSS + FGOV). The figure illustrates the response of Unit 1: the real power output (pu), the generator speed deviation (rad/s), the generator terminal voltage (pu), the excitation voltage (pu), the excitation control signal (pu), and the speed governing control signal (pu) are illustrated from top to bottom.

Figure 17. Simulation results after applying the integrated fuzzy logic generator controller (FAVR + FLPSS + FGOV).

The profiles of the variations of the various quantities shown in the figure are quite similar to those monitored on the analog simulator using the same four machine system, where the PC based integrated fuzzy logic generator controller was tested experimentally.


4.4 Graphical User Interface

Figure 18 shows an overview of a typical display on the CRT. Users can replace the controllers with alternative ones, and they can evaluate the controller performance from the time responses on the display. In addition, the tuning of the controller parameters is also performed by checking the simulation results and/or the values of several performance indices specified as measures of the transient stability.

Figure 18. Graphical user interface.

4.5 Concluding Remarks

The efficiency of the proposed Matlab/Simulink-based transient simulation program has been demonstrated through the transient stability simulations for the four machine system. The proposed program is a very powerful tool for learning the transient stability of electric power systems, and also for studying the function of various generator controllers, both from the viewpoint of the entire system and in the details of each component, such as the filters and compensators. The program is also useful for testing newly developed controllers using advanced technologies.

5 Real Time Power System Simulator

A simple one machine infinite bus system is selected as the system for developing the real time power system simulator. The system under study is illustrated in Figure 19. The study unit is a thermal unit, and the unit has a self-excited excitation system. The study unit also has a full governor-turbine system: governor, steam valve servo system, high pressure turbine, intermediate pressure turbine, and low pressure turbine. In the real time transient stability simulations, a three-phase to ground fault is selected as the disturbance, and the faulted line is isolated from the system after 0.07 s.

Figure 19. One machine infinite bus study system.

5.1 Configuration of Real Time Simulator

The main part of the proposed real time transient simulation program consists of Simulink blocks for solving the differential equations which represent the dynamics of the generator together with the associated network equations.

The main Simulink block for the study system is illustrated in Figure 20. For the real time simulations, AD and DA conversion interface blocks are also set in the Simulink block. Through the DA conversion block, the signals required to generate supplementary control signals are sent to an external personal computer (PC) based controller, which generates the control signals. The control signals generated on the external PC based controller are fed back to the proposed real time simulator through the AD conversion interface block.


Figure 20. Main Simulink block for real time simulation: the generating unit and network equation blocks are combined with an AD conversion block (inputs Trig, Pto, Ue, and Ug) and DA conversion blocks (DAC #1: Pe, DAC #2: Vt, DAC #3: dw); Mux and Demux blocks route the signals.

In Figure 20, the signals Ue and Ug denote the excitation and the speed control signals, respectively. The signal Trig gives the triggering signal to start the three-phase to ground fault sequence. The signal Pto gives the setting of the generator output. In addition, the Mux block makes a vector from its input scalar variables, and the Demux block decomposes a vector into scalar variables.

The main Simulink block can be extended to multi-machine power systems with different numbers of generators and different network configurations, as shown in Section 4.

Figure 21 shows the block of the generating unit. The block of the generating unit has several sub-blocks, as shown in Section 4: a generator block associated with an excitation system and a complete turbine system. The input signals to the generator block are the d-q axes voltages, vd and vq, the d-q axes currents, id and iq, the excitation voltage Efd, and the turbine output Pt; the output signals are the real power output Pe, the terminal voltage Vt, the speed deviation dw (Δω), the phase angle delta (δ), and the induced transient voltages in the d-q axes, Ed' and Eq'. The last three signals, delta (δ), Ed', and Eq', are sent to the Network Equation block, which solves the network equations to determine the generator voltages vd and vq and the currents id and iq.


The excitation control signal Ue and the speed control signal Ug are fed back from external PC based controllers.

Figure 21. Simulink sub-block for the generating unit: the excitation system (inputs Vt and Ue, output Efd) and the governor-turbine system (inputs Pto and the control signal Ug, output Pt) feed the generator block, whose outputs include Pe, Vt, dw, delta, Ed', and Eq'.

Newly developed generator controllers using fuzzy logic, neural networks, H-infinity, and other advanced technologies can be tested on the developed real time power system simulator.

5.2 Matlab/Simulink-Based Real-Time Controller

Similarly, the generator controllers are also set up in the Matlab/Simulink environment. Figure 22 and Figure 23 illustrate examples of controller blocks. An integrated fuzzy logic generator controller is set up in the Simulink environment together with AD and DA conversion interface blocks. The controllers can be operated in real time by using Real Time Workshop (RTW) and a DSP board with AD and DA conversion interfaces. The generator controller can easily be replaced by other types of controllers. In addition, the control parameters can be modified in real time without resetting the simulation program, and the variations of the various generator quantities can be displayed on the CRT in real time; therefore, the evaluation of the generator controllers becomes quite efficient.

The control parameters can be modified in real time without resetting the control programs. Figure 24 shows the user interface on the display.


The present settings of the control parameters, together with the input signals to the controller and the output signal from the controller, are also displayed on the CRT to allow the controller to be checked.

Figure 22. Real-time controllers in the Matlab/Simulink environment (I): a conventional PSS block and an integrated fuzzy logic generator controller block, connected through AD and DA conversion blocks.

Figure 23. Real-time controller in the Matlab/Simulink environment (II): an integrated fuzzy logic excitation control system (FEX) with three-dimensional fuzzy logic control rules and compensation of the voltage measurement delay; Real Time Workshop (RTW) and a DSP board are required for its real time operation.

5.3 Testing of Real-Time Power System Simulator

The real time transient stability simulations have been performed on the developed real time power system simulator to investigate the simulator performance.


Figure 24. Control parameters and signals to/from controller displayed on CRT.

Figure 25. Setting of the real-time simulator tests: the real time simulator and the PC based controller exchange the signals Pe, Vt, and dw and the control signals Ue and Ug through AD and DA conversion interfaces.


The setting of the real time power system simulator tests is shown in Figure 25, where a PC based generator controller is tested. Throughout the simulator tests, the sampling interval ΔT is set to 0.01 s.

Typical real time simulation results are shown in Figure 26. The figure shows an overview of the CRT display during the real time simulation. The time responses of the generator output, the terminal voltage, the external control signal, and the setting of the generator output are shown on the display in real time. The performance of the external controller can easily be evaluated through the real time simulation.

Figure 26. Overview of simulator CRT display.

5.4 Concluding Remarks

The efficiency of the proposed Matlab/Simulink based real time power system simulator has been demonstrated through real time transient stability simulations using the one machine study system. The proposed simulator is a very powerful tool for learning the transient stability of electric power systems, and also for learning the function of various generator controllers in the classroom. The simulator is also very useful in the laboratory for testing newly developed controllers using advanced technologies: before testing on actual generators, the controller performance can be evaluated and then improved on the basis of the tests. By modifying the generator parameters and by replacing the excitation and governor-turbine systems, various types of generating units can be considered on the real time simulator. Users can gain a great deal of practical experience from testing on the simulator, which is very important in industry for improving engineers' abilities.

6 Conclusion

The developed simulation programs and the developed real time power system simulator greatly improve the learning of the transient stability of electric power systems, and they make more efficient classroom education on electric power applications of new technologies possible. Furthermore, the teaching of the fundamentals of experimental research is improved by using the developed real time power system simulator. These tools are also useful for laboratory research to design and set up generator controllers using new technologies. Newly designed controllers can be tested on the real time power system simulator for the evaluation of their control performance in the laboratory before testing them on actual generators.


References

[1] Hiyama, T. and Sameshima, T. (1991), "Fuzzy logic control scheme for on-line stabilization of multi-machine power system," Fuzzy Sets and Systems, Vol. 39, pp. 181-194.

[2] Hiyama, T., Oniki, S. and Nagashima, H. (1996), "Evaluation of advanced fuzzy logic PSS on analog network simulator and actual installation on hydro generators," IEEE Trans. on Energy Conversion, Vol. 11, No. 1, pp. 125-131, March.

[3] Hiyama, T., Miyazaki, K. and Satoh, H. (1996), "A fuzzy logic excitation system for stability enhancement of power systems with multi-mode oscillations," IEEE Trans. on Energy Conversion, Vol. 11, No. 2, pp. 449-454, June.

[4] Hiyama, T. and Ueki, Y. (1996), "Fuzzy logic excitation and speed governing control system for stability enhancement of power systems," Australian Journal of Intelligent Information Processing Systems, Vol. 3, No. 1, pp. 32-38.

[5] Hiyama, T., Ueki, Y. and Andou, H. (1997), "Integrated fuzzy logic generator controller for stability improvement," IEEE Trans. on Energy Conversion, Vol. 12, No. 4, pp. 400-406, Dec.

[6] Hiyama, T., Miyake, T., Kita, T. and Andou, H. (1998), "Evaluation of integrated fuzzy logic generator controller on analog simulator," IEEJ Trans., Vol. 118-B, No. 1, pp. 37-43, Jan.


CHAPTER 9

A NEURAL NETWORK WORKBENCH FOR TEACHING AND LEARNING

W.L. Goh and S.K. Amarasinghe
Division of Information Engineering
School of Electrical and Electronic Engineering
Nanyang Technological University
Nanyang Avenue, Singapore 639798
wlgoh@ntu.edu.sg, [email protected]

Understanding artificial neural network (ANN) theories and their applications would not be complete without hands-on experience with neural network problems. As such, there is a growing need for a teaching and learning environment that allows users to create, train and test various neural network algorithms without spending time rewriting programs. The proposed neural network workbench addresses this problem suitably. The workbench was developed using Visual C++ version 5.0 and is able to run on either the Windows 95/98 or the Windows NT 4.0 platform. It provides a collection of graphical user interfaces, each dedicated to the training and testing of specific ANN algorithms. One unique feature of this workbench is the use of real time displays for tracking progress when training a neural network. The successful implementation of the workbench is demonstrated by its ability to be applied to real world applications such as pattern classification, function modelling and digital logic gates. In addition, each algorithm can be evaluated in terms of its efficiency, accuracy and suitable applications through the use of the workbench.

1 Introduction

Interest in ANN [1]-[4] has led to the need for an efficient and reliable means whereby ANN models can be built, trained and tested. These ANN models could potentially be implemented using either hardware or software. For hardware implementation, field programmable gate arrays (FPGA) and other digital circuits could be used. Another method, which has been proven to be equally important and to perform just as well, is the use of software. Software has proven indispensable, as models can be modified at no extra cost and with minimal delay. As such, many software programs have surfaced over the years to facilitate the implementation of ANN. Most of these share a common objective: to aid in the learning of neural networks. Each is equipped with its own unique features that give it an added advantage over the others. However, the lack of a well written program which features a user friendly interface and provides an easy comprehension of neural networks has hindered the learning progress of many newcomers to the field. The proposed ANN workbench attempts to fill this void.

The two features mentioned above (to model, train and test neural network algorithms; and to function as a teaching/learning aid in the field of neural networks) form the two main objectives in the design of the proposed ANN workbench. Much emphasis has also been placed on using a graphical interface to help enhance user friendliness. This is achieved through the use of a real time display that shows how training progresses.

The proposed workbench attempts to explore, to a certain depth, the various algorithms available and their possible implementations. Not all available ANN algorithms are incorporated into the workbench. Only those that form the basic building blocks of ANN and the more commonly used ones are chosen. Classical algorithms are the obvious choices as many of their non-classical counterparts tend to originate from these.

2 Program Specification

The ANN workbench has been written with the following specifications:

Graphical workspace - The first step in modelling any neural network is to create the network itself. The program should allow users to create and modify neural networks visually, and also to see how neurons are interconnected. The various layers of the network should be distinguishable, preferably by the use of a colour code. Zoom functions should be provided to focus on particular aspects of neural networks. Most of these tasks are to be carried out using the mouse.

Training neural network - The program must be able to train a neural network based on a certain algorithm [5]-[8]. This is essential as training forms the backbone of ANN. Users should be able to choose from a list of available algorithms. Training progress must be displayed in real time. This will allow users to study and observe the behaviour of an algorithm at any stage of the training process.

Testing the neural network - After a network has been trained, it is necessary to check whether it has learned correctly. The workbench is to provide two methods of testing: general testing and application specific testing. General testing simply generates a set of outputs by applying the trained network on input values. The outputs are compared with the expected results to gauge how successfully the network has been trained. Application specific testing makes use of some real world applications to test the network. It also helps users to understand how parameters can be adjusted with respect to specific applications. The applications to be implemented include pattern classification, function modelling, logic gates and the "traveling salesman problem".

Efficient storage and retrieval of data - This is an essential feature that the program must provide as keying in the same data repeatedly is impractical. In addition, the trained information should be stored so that the same training needs to be carried out only once. Data files will need to follow a certain format such that users will have additional means of entering data. This will enhance the program's flexibility.

User friendliness - The program must be designed such that the user is always in command. Messages should be abundant to inform the user when he/she has made an incorrect selection. Data validation must be sufficient such that the program does not crash when fed with unexpected data. The detailed mechanisms of how the program works, their interactions with users and the applications to real world examples are shown in Figure 1.


Figure 1. Overall block diagram of implementation and application: determine the problem and formulate a solution; determine a suitable network topology; choose an algorithm (Hebbian, Single Layer Perceptron, Back Propagation, Kohonen, or ART1); specify suitable parameters and start training; obtain the results from the network and verify them; use the package to test with real world applications; once satisfaction is achieved, proceed to the next problem.


3 Basic Components of Neural Network Models

A complete neural network model can be defined in terms of its physical parts and its algorithms. The physical components consist of nodes, or processing units, and edges, defining its topology, or connectivity. Figure 2 shows the diagram of a processing unit, better known as a neuron.

Figure 2. General scheme of a neuron: inputs x with weights w, a bias input fixed at 1, an output function, local feedback, and outputs.

The basic components of a neural network include the processing units (or neurons), the network topology, learning rule and learning type.

3.1 A Single Processing Element

The basic building block of all neural networks is the neuron. It serves not only as part of the structure of the network, but also as a functional unit capable of performing computations and processing signals. That explains why it is often called the processing element (PE) or the processing unit (PU).

The physical structure of a neuron is shown in Figure 2. It consists of:

• Input connections from other units, from itself or from the environment.

• Output connections from which it sends signals to other processing units or to the environment.

• The internal state of the unit.

• The rule for computing the next state from the current state.

• The rule for computing the output from the new state.

• A bias connection.

• A feedback connection (for some specific neural network paradigm).

Each input, x, has a weight, w, attached to its connection. Thus, every input signal is multiplied by its corresponding weight before being channeled into the neuron. The sum of the weighted inputs gives the internal state of a processing unit and is referred to as the unit's activation, y, where y is defined as:

yj = Σi xi·wij + bj

The term bj in the above equation is known as the bias term. This is the weight of an extra input included to simulate external influence; this extra input always has a fixed value of 1. Next, an activation function, followed by a threshold function, is applied to the activation, y, to generate an output signal. The purpose of the activation function is to allow the neuron to produce the next state with respect to time. In most cases, the activation function used is the identity function, where the next state is the same as the current state. The threshold function is normally a non-linear function used to produce the output; examples include the step function, ramp function and sigmoid function. Each neuron can have many input signals, but only one output signal.
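As an illustration (not from the workbench, which is written in Visual C++), the computation of a single neuron with an identity activation and a step threshold can be sketched in a few lines of Matlab; the numbers are made up.

    x = [0.5; -1.0; 0.8];      % input signals
    w = [0.2;  0.4; -0.1];     % weights attached to the input connections
    b = 0.3;                   % bias term (weight of a fixed input of 1)
    y = sum(w .* x) + b;       % the unit's activation
    out = double(y >= 0);      % step threshold function: output 0 or 1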


In general, all neurons perform the same basic tasks. They collect inputs, assign strength to each input, sum them together, compare the result with some threshold level, and determine the type of output to produce.

3.2 Network Topology

Neurons do not exist alone. They are often connected and linked together to form a network. For instance, neurons can be placed adjacent to each other to form a layer. In a layer, all neurons receive the same signal at the same time. Lateral connections may or may not exist between neurons in the same layer.

Usually, in a network, there are at least two layers: the input layer and the output layer. The input layer's function is simply to load the current input data and hold them there (a form of buffering) for the output layer to process. Since the input layer does not perform any computation, it is not considered a legitimate layer. As such, the network is called a single-layer neural network.

If there are layers between the input and output layers, then the network is called a multi-layer neural network. Those layers in between are known as hidden layers.

The entire interconnectivity of layers and neurons and their interactions determine what is called the network topology. There are three main types of topology in existence:

1. Feedforward
2. Feedback
3. Competitive or feedlateral

In feedforward networks, data is passed forward from input to output; this can be seen as mapping an n-dimensional input to an m-dimensional output. In feedback networks, the output of a layer can be passed back to itself or to the previous layer as inputs; feedback networks with closed loops are called recurrent networks. Last but not least, feedlateral networks have connections between neurons in the same layer. The signals in feedlateral networks are normally inhibitory or excitatory in nature; as a result, the neurons tend to compete with each other.


The concepts of layered networks and types of connectivity are often combined to describe a network; examples include single-layer feedback networks and multi-layer feedforward networks. Figure 3 illustrates the concept of input, hidden and output units in a layered feedforward network. It should be noted that a layered feedforward network can have bi-directional connections, thereby giving rise to a layered feedback network.

Figure 3. Layered feedforward network with input, hidden and output units.
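As a small illustration of a layered feedforward pass (again a Matlab sketch with made-up weights, not workbench code), one hidden layer of three sigmoid units maps a 2-dimensional input to a 1-dimensional output:

    x  = [1; 0];                          % input layer simply buffers the data
    W1 = [0.5 -0.2; 0.1 0.3; -0.4 0.7];   % weights, input -> hidden (3 units)
    b1 = [0.1; -0.1; 0.0];                % hidden biases
    W2 = [0.3 -0.5 0.2];                  % weights, hidden -> output (1 unit)
    b2 = 0.05;                            % output bias
    h  = 1 ./ (1 + exp(-(W1*x + b1)));    % hidden layer, sigmoid threshold
    y  = 1 ./ (1 + exp(-(W2*h + b2)))     % network output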

There are other ways of describing a network, such as the density of interconnections. The simplest case is the completely interconnected network. This is the most general case since the non-existence of a particular connection can always be emulated by omitting a connection from the activation function of the processing unit where the connection enters, or by setting the corresponding weight to zero. It also allows for the use of feedback structures. Figure 4(a) shows an example of a completely interconnected network, as compared to a feedforward network (Figure 4(b)).

3.3 Learning Rule and Paradigm

One of the strengths of neural networks is their ability to learn. When presented with a problem, they are able to generate their own functions and methods without any human intervention or programming. The most obvious question every neural network beginner would ask is how a neural network learns. The answer lies in the weights of each connection. Neural networks learn by adjusting the strength of these weights. The learning rule simply specifies how the weights should be updated. Some common rules include:

• Hebb Rule
• Delta Rule or Widrow-Hoff Rule
• Perceptron Rule
• Kohonen's Rule
• Backpropagation Rule

Figure 4. Two kinds of network topology: (a) completely interconnected network, (b) feedforward network.

3.4 Learning Types

The types of learning can be categorized into three major classes:

• Supervised learning
• Unsupervised learning
• Reinforcement learning

Supervised Learning is an extreme case of learning as the quantity of information supplied is the largest among the three types. It is characterized by knowing exactly what response has to be associated with each pattern. For classification purposes, the response will be the exact class of each pattern. For functional mapping purposes, it is the function value. For forecasting purposes, it is the forecast value. The presence of the supplied information gives rise to the possibility of comparing the performance with the predefined responses, allowing changes to the learning system in the direction in which the error diminishes.


Unsupervised Learning is exactly the opposite of supervised learning, as no information is supplied. Since the system is given no information about the goal of learning, all that is learned is a consequence of the selected learning rule, together with the individual training data. As such, this form of learning is often referred to as self-organization.

Reinforcement Learning is a combination of supervised and unsupervised learning. In reinforcement learning, each pattern is provided with information (in a supervised manner), but this information is in a very restricted form: it consists merely of a statement as to whether the response associated with a particular pattern is "good" or "bad". The learning algorithm has to make the best of this information, typically by simply making good associations more probable.

4 Implementation

4.1 The Workspace

The workspace consists of 2500 cells arranged in a 50x50 two­dimensional format. Each cell contains one neuron. This means that the maximum number of layers in any network is 50, and each layer can have up to 50 neurons. The interconnecting links and the labeling of each neuron are created automatically. Users need only to activate the feature from the command menu. Parameters of the network such as weights, bias, inputs and outputs can be entered through the use of various dialog boxes associated with each neuron. Once created, users can have a complete picture of the network topology, interconnectivity and network parameters.

4.2 Algorithms

4.2.1 Hebbian

The form of Hebbian learning [9] implemented in this workbench is the forced (supervised) Hebb rule. This version of the Hebb rule is commonly used in pattern classification and pattern association applications. It is chosen as it provides a clear and simple demonstration of the learning process.


In Hebbian learning, one is concerned with how the weights change. Thus, the display focuses solely on providing the weight status at every instant of the training process. Users can choose to monitor the weights of up to 5 output neurons in real time. For each chosen output neuron, all the weights associated with it are shown in the form of a bar chart. These bar charts are updated after each set has been presented to the network and trained. The new weight and bias values are obtained from the following equations:

wij(new) = wij(old) + xi yj

bj(new) = bj(old) + yj

where wij are the weights, bj is the bias, xi is the input value and yj is the associated target output.
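The update is simple enough to verify by hand. The following Python sketch applies the rule to a small bipolar association task; the function and variable names are ours for illustration, not taken from the workbench.

def hebb_update(weights, biases, x, y):
    # One training step: weights[i][j] += x[i] * y[j]; biases[j] += y[j].
    for j in range(len(biases)):
        for i in range(len(x)):
            weights[i][j] += x[i] * y[j]
        biases[j] += y[j]
    return weights, biases

# Example: associate bipolar input patterns with an AND-like target.
weights = [[0.0], [0.0]]          # two input neurons, one output neuron
biases = [0.0]
training_sets = [([1, 1], [1]), ([1, -1], [-1]), ([-1, 1], [-1]), ([-1, -1], [-1])]
for x, y in training_sets:
    weights, biases = hebb_update(weights, biases, x, y)
print(weights, biases)            # [[2.0], [2.0]] [-2.0]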

In the display of Figure 5, red indicates that a weight is negative and light blue that it is positive; a purely white bar means that the weight's value is zero. In every case, the actual value of the weight is printed to further enhance readability.

For each output neuron, the weight with the maximum magnitude is used as a reference for determining the heights of the other bars: the height of each bar is in direct proportion to this maximum weight value. All past weights in the network can be viewed at any time by pressing the button captioned "View All Weights"; however, this display is not shown in real time.

4.2.2 Single Layer Perceptron

Single-layer perceptrons [10] attempt to linearly separate the output into various classes. In the simplest form, the output consists of two basic classes that are separable in a 2-dimensional plane. A typical example of such an application is the modeling of logic gates. Figure 6 shows the training progress for an OR logic function, and Figure 7 shows the corresponding truth table of the OR function.

The line attempting to separate the two classes of output is updated after every cycle. This means that, in the case of the OR logic function, the line is updated after the fourth set of input data has been presented. The equation of the line is obtained from the following formula:

w0x0 + w1x1 + b = 0


Figure 5. Hebbian training display. The display comprises a legend, the actual values of the weights, the list of output neurons whose weights will be monitored, the list of selected output neurons, and bar charts showing the weight values, color-coded as specified by the legend to indicate the sign of each value.

The term w0x0 refers to the product of the weight and the input value of the first input neuron. Similarly, w1x1 refers to the product of the weight and the input value of the second input neuron. The term b refers to the bias attached to the output neuron. From the above equation, we can extract the information needed to draw the separating line as follows:

Gradient = -w0 / w1

y-intercept = -b / w1
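As a quick numerical illustration, the sketch below (Python; the weight values are invented, not workbench output) extracts the gradient and y-intercept from a trained weight/bias triple:

def separating_line(w0, w1, b):
    # The line w0*x0 + w1*x1 + b = 0, solved for x1 as a function of x0.
    if w1 == 0:
        raise ValueError("w1 = 0 gives a vertical line x0 = -b / w0")
    return -w0 / w1, -b / w1

# Example: weights a perceptron might settle on for the OR function.
gradient, intercept = separating_line(w0=1.0, w1=1.0, b=-0.5)
print(gradient, intercept)        # -1.0 0.5, i.e. x1 = -x0 + 0.5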


Figure 6. Training progress for linearly separable output classes using the single-layer perceptron approach. The display shows the two types of output to be separated and the line used to separate them.

Input        Output
 1  1           1
 1  0           1
 0  1           1
 0  0          -1

Figure 7. Truth table for the OR logic gate.

If the training is successful, the line will linearly separate the two classes of output (see Figure 6). Although this method is very effective, it can be used only if the input values can be represented in a 2-dimensional environment. As the number of input neurons increases, so does the dimensionality, and it becomes very difficult, if not impossible, to show the separability in such a multi-dimensional environment. Thus, an additional way of representing the training is required.



The second method of showing the training progress is to give an indication of how the weights are converging (see Figure 8). Weight convergence is important as it is one of the basic indicators of the training process. Also, from the statistics obtained, it is possible to tell whether or not the training is approaching the desired solution.

Figure 8. Training progress for a multiple-input single-layer perceptron. Bar charts indicate the percentage of convergence for the various weights; the printed number gives the actual percentage.

It should be noted that the bar charts are not an accurate indicator of convergence. They are only an approximation, derived by averaging the convergence of each individual weight associated with an output neuron.

4.2.3 Backpropagation

The learning mechanism of backpropagation [10] reduces the squared error between the target and the actual output. With that in mind, the training interface is designed to show the average squared error (y-axis) after a certain number of cycles (x-axis). The algorithm uses the average error as one of the terminating conditions, as shown by the green line in Figure 9.

Figure 9. Training interface showing the average error for backpropagation. A line traces the progress of the average error; a second marker shows the minimum average error required to stop the training.

For each set of training data, an error value,

Error = Target - Actual Output,

is generated at each output neuron. The squared value of this error is summed across all the output neurons on the output layer to produce a positive scalar known as Error_per_Pattern, which is defined as:

Error_per_Pattern = Σj (Targetj - Actual Outputj)²

For each set of training data, there will be one value of Error_per_Pattern; in other words, there will be as many values of Error_per_Pattern as there are data sets. A summation of Error_per_Pattern is calculated at the end of one epoch (one epoch consists of one cycle through the full set of training data):



Total_Error = Σ (Error_per_Pattern)

Finally, the average value of the error, known as the Average Error (or Average Squared Error), is calculated by dividing Total_Error by the total number of data sets, N:

Average Error = Total_Error / N

This value of the average error is displayed as a red line to show how the backpropagation learning rule reduces the error between the actual and the desired results. In short, it shows the convergence of the training process towards a desired state.
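This bookkeeping is easy to express in code. The Python sketch below assumes the network's outputs for each training set are already available; the names are illustrative, not the workbench's.

def average_error(targets, outputs):
    # Average of Error_per_Pattern = sum_j (target_j - output_j)**2 over N sets.
    total_error = 0.0
    for target, output in zip(targets, outputs):
        total_error += sum((t - o) ** 2 for t, o in zip(target, output))
    return total_error / len(targets)

# Example: two training sets, two output neurons each.
targets = [[1.0, 0.0], [0.0, 1.0]]
outputs = [[0.8, 0.1], [0.2, 0.7]]
print(average_error(targets, outputs))   # 0.09; training stops once this
                                         # falls below the chosen minimum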

4.2.4 Kohonen Network

Unlike the previous three algorithms, Kohonen's [11] rule performs unsupervised learning, which involves competition between neurons on the output layer. As such, it is important to show which neuron has won the competition to learn about a particular set of inputs. The training interface uses squares to represent the output neurons' status. Blue squares represent a winner while yellow ones show the winner's chosen neighbors. Both winner and neighbors will participate in learning by adjusting their weights. The remaining white squares represent neurons that do not learn for this particular input set.

The weights associated with the winner neuron are also displayed, in the form of a vertical bar chart. To improve the accuracy of the display, the numerical value of each weight is included underneath each bar chart.

As can be seen from Figure 10, the bottom left portion of the training dialogue box shows an alternative form of displaying the training progress. However, it is only applicable to the specific case of two-component input vectors (networks using only two neurons in the input layer). Each input can be plotted as a single point on a two-dimensional plane. Similarly, each weight vector can be represented as a point, or dot, on the 2-D plane. As such, it is very easy to see how Kohonen learning shifts the weights (interconnected green dots) with respect to the inputs (red dots). This feature allows users to comprehend and analyze the real-time behavior of the network graphically. The lines connecting the green dots highlight each output neuron's neighbors, or lateral relationships.

Figure 10. An instant of the training interface for a Kohonen network.

Kohonen's method uses a calculated variable, the Euclidean distance, to check its convergence status. After each set of inputs is presented, the squared Euclidean distance, D², of every output unit is calculated, and the unit with the smallest distance is chosen as the winner. The winner's squared Euclidean distance, determined for every training set, is given by:

D²winner = Σi (weightij - inputi)²

The total accumulated at the end of one cycle (a single iteration through the full set of training data) can be expressed as:

DTOTAL = Σsets D²winner



The average distance is simply the square root of DTOTAL divided by the number of sets (training data):

DAVERAGE = √(DTOTAL / total_sets)

The average distance displayed on the interface at the end of each cycle is the DAVERAGE calculated above. As mentioned earlier, it is used as one of the terminating conditions to measure the convergence of the weights.
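A minimal Python sketch of this winner selection and of the DAVERAGE computation follows; the weight-update step itself (learning rate and neighborhood adjustment) is deliberately omitted, and the numbers are invented.

import math

def squared_distance(weights_j, x):
    # D_j^2 = sum_i (weight_ij - input_i)^2 for one output unit j.
    return sum((w - xi) ** 2 for w, xi in zip(weights_j, x))

def average_distance(weights, inputs):
    # Accumulate the winner's D^2 over one cycle, then take the square root
    # of D_TOTAL divided by the number of training sets.
    d_total = sum(min(squared_distance(w_j, x) for w_j in weights) for x in inputs)
    return math.sqrt(d_total / len(inputs))

# Example: three output units with two-component weight vectors.
weights = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]]
inputs = [[0.0, 1.0], [1.0, 0.0]]
print(average_distance(weights, inputs))   # ~0.141, one of the stopping criteria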

4.2.5 ART1 Network

The training display for ART1 [12] shows three things: the neuron selected as the winner, the top-down weights of the winner neuron, and the current input pattern. The top-down weights are shown to allow a comparison to be made with the input pattern. The tendency of a cluster's weights to follow the same pattern as its inputs is a unique feature of ART1. In Figure 11, the second row of squares represents the current input pattern. Those in blue indicate that the input value associated with that neuron is equal to '1', and those in white represent a value of '0'; this is because ART1 uses binary input. The last row of squares represents the top-down weights from the output; similarly, the blue squares indicate that the weight has a value of '1'. The training display is updated as the training progresses from one set to another. Once the weights have stabilized, for each output neuron highlighted in the first row, the third row of squares shows the common pattern of all the input patterns clustered to it.
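The chapter does not give the weight-update equation, but in standard ART1 fast learning the winner's top-down vector is replaced by its elementwise AND with the current binary input, which is why a cluster's weights come to show the common pattern of the inputs assigned to it. The sketch below illustrates that textbook behavior and is not code from the workbench.

def update_top_down(top_down, pattern):
    # Fast-learning update: keep only the bits shared by the weights and input.
    return [t & p for t, p in zip(top_down, pattern)]

top_down = [1, 1, 1, 1, 1]                    # a fresh cluster accepts everything
for pattern in ([1, 1, 0, 1, 0], [1, 0, 0, 1, 0]):
    top_down = update_top_down(top_down, pattern)
print(top_down)                               # [1, 0, 0, 1, 0], the common pattern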

5 Strengths of the Workbench over Available Programs

What exactly are the qualities that make the workbench different from other neural network programs? The strengths of the neural network workbench were analyzed by comparison with other well-known software able to handle neural network problems, such as MATLAB [13].



It has been found that the neural network workbench offers a number of advantages over the others. First and foremost, it offers a real-time display of the training progress: at every fixed interval, the program updates the user on the status of the training, enabling the user to take early action if the training gives no indication of heading towards the desired solution.

Another very powerful feature of the workbench is that training can be stopped temporarily and changes made to the network before restarting.

The workbench is also able to give an algorithm-specific display of the training progress. The displays are kept simple, yet meaningful enough to convey their message correctly. Such a customized user interface enables users to relate immediately to the progress of training for the various algorithms.

One unique feature of the workbench, not available in most other programs, is that it allows users to view the network topology as they create the network. It also gives the user the option of viewing the interconnections between neurons. These tasks are just a mouse click away.

Another plus point of the workbench is its ability to handle up to 48 hidden layers. This is a very large number of hidden layers, which implies that the network is able to handle more complex tasks.

The workbench comes with additional features that are not found in most other programs, such as functions to zoom in and out of the network. The workbench also provides a special toolbar for easy access to the various functions, which is not found in any of the other software. To facilitate the real-time display of the training progress, the workbench provides a timer function to control the rate of training; this is especially useful if it is necessary to follow the training progress closely. Also, to enhance the user interface, the workbench provides a time pane at the bottom right corner of the window. Although this does not have any direct effect on the accuracy of the program, it does improve its appearance.



6 Summary

A neural network workbench has been successfully implemented. One unique feature of this workbench is its ability to display, in real time, the progress of training of a neural network. It can be applied to real world applications such as pattern classification, function modeling and digital logic gates. In addition, each algorithm can be evaluated in terms of its efficiency, accuracy and suitable application through the use of the workbench. In terms of performance, it compares favorably with other well known neural network programs such as MATLAB.

References

[1] Fausett, L. (1994), Fundamentals of Neural Networks: Architectures, Algorithms, and Applications, Prentice-Hall.

[2] Davalo, E. and Naïm, P. (1991), Neural Networks, The Macmillan Press Ltd.

[3] McCord Nelson, M. and Illingworth, W.T. (1991), A Practical Guide to Neural Nets, Addison-Wesley Publishing Company, Inc.

[4] Mehra, P. and Wah, B.W. (1992), Artificial Neural Networks: Concepts and Theory, IEEE Computer Society Press, Los Alamitos, California.

[5] Rao, V., and Rao, H. (1995), C++ Neural Networks and Fuzzy Logic, 2nd Edition, MIS:Press.

[6] Chester, M. (1993), Neural Networks: A Tutorial, PTR Prentice Hall, Englewood Cliffs, New Jersey.

[7] Diamantaras, K.I. and Kung, S.Y. (1996), Principal Component Neural Networks, Theory and Applications, a Wiley-Interscience Publication, John Wiley & Sons, Inc.

[8] Hrycej, T. (1992), Modular Learning in Neural Networks: A Modularized Approach to Neural Network Classification, a Wiley-Interscience Publication, John Wiley & Sons, Inc.



[9] Hebb, D.O. (1949), The Organisation of Behaviour, John Wiley, New York.

[10] Eberhart, R.C. and Dobbins, R.W. (1990), Neural Network PC Tools - A Practical Guide, Academic Press Inc.

[11] Kohonen, T. (1989), "Tutorial on Self-organising Feature Maps," International Joint Conference on Neural Networks, Washington, D.C.

[12] Grossberg, S.A. (1988), Neural Networks and Natural Intelligence, MIT Press, Cambridge, MA.

[13] Demuth, H. and Beale, M. (1994), Neural Network Toolbox for Use with MATLAB: User's Guide, The MathWorks, Inc.


CHAPTER 10

PRAM: A COURSEWARE SYSTEM FOR THE AUTOMATIC ASSESSMENT OF AI PROGRAMS

C.A. Higgins and F.Z. Mansouri
School of Computer Science and IT
University of Nottingham
Nottingham, NG7 2RD
U.K.

In industry, metrics are extremely important; they are used, for instance, to anticipate errors and problems, which frequently arise at a later stage during the use of products developed by teams of programmers and designers. Applying metrics can thus save costs, particularly for "maintenance". However, metrics are also useful in academia. For example, they can be used in tools that measure students' programs, improving learning and allowing the marking and assessment of students' progress while they learn a particular programming language.

The focal point of this chapter is the utilization of metrics in automatic marking tools, particularly for Prolog. For this purpose, PRAM (PRolog Automatic Marker) was developed at the University of Nottingham as part of the Ceilidh¹ system (a courseware management system that also marks students' programs in a variety of other languages such as C and C++). PRAM is a system for marking student programs written in Prolog. The system presents the student with a mark covering the style, complexity and correctness of a program, along with some comments on his/her code and a breakdown of how the mark was formed.

Students find this extremely useful for correcting their mistakes and as feedback during the learning process. Lecturers find it useful for monitoring the progress of students and identifying students with problems at an early stage.

¹ Ceilidh has since been renamed CourseMaster.

There are many other advantages to an automatic marker. For example, the teacher is relieved of the burden of marking, which consumes time that could be better spent on other activities such as assisting the students. Additionally, students receive immediate marks and feedback, helping them to move on to the next assignment quickly and with confidence. The perfect tool to assist in teaching programming languages like Prolog would be a synergism of an automatic marker and an interactive debugger; this is, however, beyond the scope of the current work.

We first present the motivation behind our work and give an overview of existing metrics for software and, in particular, Prolog. The next section presents a full description of PRAM, in which the different metrics used by the system are detailed. The penultimate section presents the first evaluation of PRAM after its use by students during the first semester of the academic year 1997/1998, and is followed by a conclusion.

1 Metrics for Prolog

In order to highlight the usefulness of a system like PRAM, a brief survey of the state of Prolog in U.K. universities was undertaken. A summary of the results follows. Among the 29 universities that participated in the survey, 65% taught Prolog at an undergraduate level and used Prolog for research purposes. Another 21% taught Prolog at undergraduate level without using it for research. Finally, 14% of the universities surveyed neither taught Prolog nor applied it in their research. Prolog is mainly taught in the 2nd and 3rd undergraduate years, generally from an AI (Artificial Intelligence) perspective.

1.1 Existing Software and Prolog Measures

To develop metrics for our marking system, various well-known existing metrics, such as lines of code and the Halstead measure (software science), were considered. Despite the many drawbacks of the lines-of-code metric [5], it can be a good indicator of the complexity of Prolog programs. However, it is not considered the sole indicator of the complexity of programs; instead, it is combined with measures of other attributes, such as the presence or absence of recursive constructs. Software science is the most widely known complexity measure. It is a function of the number of operators and operands in the code. It was found that this metric is not applicable to Prolog programs; for instance, it is hard to distinguish between operators and operands. McCabe [28] developed another metric, based on graph theory, which is centered on the control flow of a program. McCabe considered the program as a directed graph in which the edges are lines of control flow and the nodes are segments of code. Using an AND/OR graph [7], this metric was tried on Prolog programs with limited success.

Henry and Kafura [16] produced a metric meant for large systems that relies on the information flow between the modules of a program. They considered the complexity of a procedure to depend on the code in the procedure and the connections between the procedure's components. For the code complexity, a simple length measure was used, and for the connection complexity, the fan-in and fan-out were considered. We do not think this metric is adequate for students' Prolog programs. In fact, some programs might contain only one procedure with no fan-in; in this case the complexity would equal zero even if the procedure contained recursion or other features that make Prolog programs complex.

From the above, it can be concluded that the metrics developed so far for procedural languages are inadequate for a declarative language such as Prolog.

Not many metrics have been applied to or developed especially for Prolog. Kaplan [20] attempted to discuss ways in which pleas for a readable Prolog programming style may be written so that they can be clearly understood and applied. He presented many examples where he applied his pleas, focusing on what makes a Prolog program readable or unreadable.



Among the limited work on metrics for Prolog, Markusz's work is the best known. It focuses on the psychological complexity of logic programming, defined as "the measure of the difficulty of the process of design, comprehension and modification of a program."

In their attempt to determine reliability for Prolog programs, Azem and Belli [1] defined two complexity measures: one relates to the static part of a program and the other reflects dynamic characteristics. According to them, Prolog programs are composed of segments, or procedures, and the structural complexity of a segment was defined as the sum of the structural complexities of all its clauses. They found that the more complex a program is, the less reliable it will be. It should be noted that it is hard to automate the computation of this complexity metric for students' Prolog programs. Others, such as Myers [29] and Matsumoto [26], have also looked at metrics for Prolog with varying degrees of success.

1.2 Existing Marking Systems

AUTOMARK [31] is an experimental system developed in Canada for marking students' programs in FORTRAN. The grading program relies on factors, norms, tolerances and interpretation-pairs. The marking process is similar to that used in Ceilidh (see below) and eventually in PRAM, but it adds tolerances defined by the teacher. First, some properties (factors) are measured from a student's program; norms are derived from the model solution, and the teacher defines tolerances and interpretation-pairs. The system marks the following: programming style, meaningful comments, correct and well laid-out output, and adequate testing.

APROPOS, or APROPOS2 (a later version) [23], is mainly a debugging system for Prolog programs developed at the University of Edinburgh, UK. Its main application is in tutoring, via detecting errors in students' programs and proposing solutions. APROPOS assumes that the students have enough basic knowledge of Prolog to write syntactically correct programs and have an idea of Prolog's control flow. It is also concerned with exercises dealing with list and number manipulations. It detects errors such as non-termination, infinite loops, wrong argument types and misspelling errors. Prolog programs written by students were used as samples for a study. It was found that APROPOS was good at detecting the algorithms used by the students, one of the main reasons being the nature of the programming task. APROPOS was also good at spotting bugs.

2 The CourseMaster (Ceilidh) System


Figure 1. The CourseMaster main window.

Ceilidh (now called CourseMaster) is a marking system that has been used at the University of Nottingham since 1988 (see Figure 1). It is a courseware management system that initially collected students' work against strict deadlines and progressed to comprehensively mark their computer programs. Advanced Ceilidh facilities pertain to auditing students' work and monitoring in-depth progress reports. Currently, Ceilidh marks exercises for the compiled programming languages C, C++ and, very recently, Java, amongst many others. Prolog and some other courses are being developed along with their marking sub-systems and their collections of exercises. The Prolog sub-system is called the PRolog Automatic Marker (PRAM).

Within Ceilidh, students have a wide range of facilities available to them, such as viewing and printing course notes, asking questions, commenting to the teacher through electronic mail and resubmitting their work for remarking in order to improve its quality. Various metrics are involved in the marking process, and these can be classified into two categories: static (style and complexity) and dynamic (complexity and correctness).

2.1 Metrics in PRAM

2.1.1 Style Metrics

Baker [2] says, "A bad program is one whose programming style is so poor that its opacity forces the reader to rewrite it from scratch, rather than going to the trouble to understand it and/or debug it." Style is perceived as an intuitive concept that is difficult to define or quantify because it is very subjective and depends on the programmer's taste. Many style rules have been defined as a result of indirect agreement among experienced programmers; however, the notion of style itself is hard to define, not to mention measure. Table 1 shows some style rules or guidelines derived from experience and from the works of [6], [7], [20] and [30]. These are the rules presented to students when they are learning Prolog programming in the AIP course at Nottingham.

These rules were chosen because they make a program readable. The only disadvantage found after applying these rules of style to Prolog programs is that the code tends to become longer.

The topic of style and Prolog could have been left at this point, with no further analysis or quantification, in the hope that the above guidelines would be observed by students while writing their code. However, our experience with students showed that they do not follow guidelines, even ones that have been around for a long time. On the other hand, a mark resulting from a typographical analysis of their program can implicitly help them to follow those rules or guidelines.



Table 1. Style rules used in PRAM.

• Write comments while writing the program.

• Comments should contain information such as what the program is about, examples of using the program, the user-defined predicates, the program's creator, the date and time, the program's name, the execution time of the program, the program's limitations and a brief algorithmic description of the code.

• Short comments should be written between % and the end of the line.

• Short comments are put at the right side of the code or at the beginning of the procedure definition.

• Long comments should be written between /* and */.

• Separate comments from code by white space (one or two blank lines before and after the comment).

• Separate the code from long passages of comments.

• Use descriptive comments before every procedure.

• Put comments near cuts to specify red and green cuts and to explain why they were used.

• Do not over-comment by adding meaningless comments.

• Add a space between the head of a procedure and the notation :-.

• Put a space after each comma, opening parenthesis, opening square bracket and opening curly bracket, and a space before each closing square bracket, closing parenthesis and closing curly bracket.

• Separate clauses with the same head name by a single blank line.

• Separate two procedures with different names by two or more blank lines.

• Do not use too many blank lines, as this will increase the length of the code.

• Use an uppercase letter to start the name of a variable.

• Use anonymous variables, denoted by "_", when necessary.

• Variable names should be meaningful. They should be more than one letter long.

• Predicate names should bear a relationship to the relation they describe.

• Predicate names should not be more than about 20 characters long.

• Avoid semicolons.

• Avoid the built-in predicates assert and retract.

• Use green cuts rather than red ones.

• Align the heads of clauses at the left margin of a line.

• Indent the body of a clause.

• Put each condition in the body on a separate line, except for cuts and the new line predicate "nl".

Relying on the idea of Berry and Meekings' [6] style grader for C programs, a style grader for Prolog programs was developed. The process of measuring style consists of performing a static analysis on the student's code and gathering a set of measures. These style measures are based on how well a program conforms to the above set of style rules. The maximum marks are awarded when the metric falls within a certain 'perfect' range (i.e., between S and F, as shown in Figure 2).

Figure 2. Marking trapezium. The horizontal axis is the metric value, marked at the breakpoints L, S, F and H; the vertical axis is the percentage contribution, which reaches its maximum between S and F.

The weights of the characteristics described above and the values of L, S, F and H were a result of experience. L is the minimum value below which a score of zero is obtained. Values between S and F are the exemplary range for the metric, where the student is awarded the maximum mark. H is the maximum value above which no score is awarded. Values between [L, S] and [F, H] are scored via interpolation. This model for marking programs is very useful and flexible; for instance, the teacher has the freedom to alter these values according to the exercise or the level of the students.
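The trapezium is straightforward to implement. The Python sketch below follows the description above; the breakpoints in the example are invented, not PRAM's actual values.

def trapezium_mark(value, L, S, F, H, maximum=100.0):
    # Score a metric value against the breakpoints L <= S <= F <= H.
    if value <= L or value >= H:
        return 0.0                                # outside the trapezium
    if S <= value <= F:
        return maximum                            # the exemplary range
    if value < S:
        return maximum * (value - L) / (S - L)    # rising edge, interpolated
    return maximum * (H - value) / (H - F)        # falling edge, interpolated

# Example: an average clause length scored against L=1, S=3, F=6, H=10.
print(trapezium_mark(4, 1, 3, 6, 10))   # 100.0, inside the exemplary range
print(trapezium_mark(8, 1, 3, 6, 10))   # 50.0, halfway down the falling edge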

Regarding the rest of the metrics we use (i.e., the number of semicolons, percentage of user-defined operators, average clause length, percentage of cuts, percentage of grammar definitions, percentage of built-in predicates and percentage of built-in operators), a maximum mark is fixed, but the values of L, S, F and H are relative to the model solution presented by the teacher. It is quite difficult to find good values of L, S, F and H, as this depends on experience and the type of the exercise.
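As a rough illustration of the kind of static analysis involved, the sketch below gathers a few of the simpler measures from Prolog source text. PRAM's real analyzer is necessarily more careful (it would, for instance, skip comments and quoted atoms), so treat this only as a sketch.

import re

def style_measures(source):
    # Naive token and clause counts over raw Prolog source text.
    clauses = [c for c in source.split(".") if c.strip()]
    tokens = re.findall(r"\w+|[^\s\w]", source)
    return {
        "semicolons": source.count(";"),
        "cuts": source.count("!"),
        "average_clause_length": sum(len(c.split()) for c in clauses) / len(clauses),
        "total_tokens": len(tokens),
    }

program = """
max(X, Y, X) :- X >= Y, !.
max(_, Y, Y).
"""
print(style_measures(program))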



2.1.2 Complexity

Despite a variety of complexity measures, very few have been directed toward measuring the complexity of Prolog programs. Relying on the definition of complexity proposed by Basili in [18], we consider the complexity of a Prolog program to be a synergism of the resources expended by a system in interacting with the program and the effort required for a person to read and understand it. The effort expended by the machine is designated the machine complexity, or dynamic complexity, and the human effort to understand the code the human complexity.

It is difficult to isolate each factor affecting complexity. Invariably, the modification of one factor results in changes to others. For instance, reducing the length of a clause will reduce the lines-of-code measure, while removing some cuts will require altering the code to preserve the correct logic, and thus reduce or increase its size. It is difficult to present a definitive complexity measure without first trying to understand the behavior of the programmer and the program. In our case the programmer is a student who has no knowledge of Prolog but knows some procedural and object-oriented programming languages. It is believed that complexity is affected by the programmer, the program (programming task) and the environment used. The measures gathered so far relating to Prolog programs are categorized in Table 2.

Table 2. Complexity categories.

• Built-in identifier measures, which involve the percentage of user-defined operators in total tokens, the number of semicolons, the percentage of "assert" in total built-in predicates, the percentage of "retract" in total built-in predicates, the percentage of "retractall" in total built-in predicates and the percentage of cuts in total built-in predicates.

• User-defined identifier measures, which consist of the average number of arguments in predicates and the percentage of user-defined predicates in total tokens.

• Clause measures, which entail the average length of clauses, the average amount of recursion in number of clauses and the total lines of code.

• Profiling measures, which involve the average number of calls, the percentage of backtracking in total calls and the percentage of choice-points in total calls.



Before starting the complexity analysis of the student's code there is one proviso: the code must be syntactically correct. The correctness test is then performed by running the program against test data. At this stage, some profiling results, such as the number of calls and the amount of backtracking, are gathered.

The marking process for complexity is similar to the marking process for style. The above measures are gathered from the model solution presented by the teacher, and the values of L, S, F and H are computed as for style. For instance, if a non-zero measure M is gathered from the model, the values of L, S, F and H would be M/3, 2M/3, 4M/3 and 5M/3 respectively. Where M is equal to zero, the values of L and S would be zero, and the values of F and H would be M+1 and M+2 respectively if M is an average measure, or M+100 and M+200 if M is a percentage measure.
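These rules are mechanical and can be sketched directly; the function below merely restates them in Python and is not PRAM's code.

def bounds_from_model(M, is_percentage=False):
    # Derive (L, S, F, H) from a measure M taken from the model solution.
    if M != 0:
        return (M / 3, 2 * M / 3, 4 * M / 3, 5 * M / 3)
    step = 100 if is_percentage else 1
    return (0, 0, M + step, M + 2 * step)

print(bounds_from_model(6))          # (2.0, 4.0, 8.0, 10.0)
print(bounds_from_model(0))          # (0, 0, 1, 2) for an average measure
print(bounds_from_model(0, True))    # (0, 0, 100, 200) for a percentage measure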

2.1.3 Correctness

For testing correctness, the student's program is run against test data provided by the teacher. A comparison between the output of the student's program and the expected output from the model solution is then made via regular expression matching. This simple method was found to be both very practical and very effective.

Dynamic correctness attempts to measure a program's conformance to its specification. Test data provided by the teacher are passed to the student's program and the output is assessed according to what the marking tool expects it to be (via regular expression matching). After the comparison of a program's output with the output of the model program, the following counts are made: the number of data expected and not found (edn), the number of data not expected but found (ned), and the number of data expected and found (edf). For a program to be correct, the measures edn and ned should equal zero and the measure edf should be greater than or equal to one.
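A minimal sketch of this matching-and-counting scheme follows; the regular expressions and program output here are invented for illustration.

import re

def correctness_counts(expected_patterns, actual_lines):
    # Count expected-and-found (edf), expected-and-not-found (edn) and
    # not-expected-but-found (ned) items of output.
    matched = [False] * len(actual_lines)
    edf = edn = 0
    for pattern in expected_patterns:
        hits = [i for i, line in enumerate(actual_lines) if re.search(pattern, line)]
        if hits:
            edf += 1
            matched[hits[0]] = True
        else:
            edn += 1
    ned = matched.count(False)
    return edf, edn, ned

expected = [r"sorted\(\[1, ?2, ?3\]\)", r"^yes$"]
actual = ["sorted([1,2,3])", "yes"]
print(correctness_counts(expected, actual))   # (2, 0, 0): the program is correct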

We have discovered that the set of test data must be very carefully chosen to cover as many and as wide a range of test candidates as possible.



3 Results

3.1 Subjects and Data

Two sets of results are analyzed and an evaluation of the system is given. The subjects of this research were 2nd and 3rd year undergraduate students in computer science who had no previous knowledge of logic programming or Prolog. Most of them had a procedural or an object-oriented view of programming, but not a declarative one.

The results spanned two successive academic years. In the year 1996/1997, around fifty students attended the course. They were given approximately two assignments per unit, and a mark was awarded manually. Students had access to the specification, but not to test data or to skeleton solutions. Help was provided indirectly through e-mail or directly through the course assistant during lab sessions. The students were given the general style guidelines along with some comments on how to avoid writing too complex a Prolog program. The marking focused on style, complexity and correctness.

In the year 1997/98, thirty-five students attended the course. They were given approximately two assignments per unit, which were marked automatically and immediately by PRAM. The students had access to the specifications, a program skeleton and the test data. They were encouraged to request additional help through e-mail. A general style guideline, a help facility on most Prolog built-in predicates and operators, and a brief description of the main metrics considered were provided on-line. As in the previous year, marking focused on style, complexity and correctness.

For both academic years, students were presented with around seventeen exercises and a large project at the end of the course. The difficulty of the exercises increased during the course with the introduction of new concepts and language constructs. The final project was to write a check parser using the Definite Clause Grammar (DCG) facility in Prolog. The solutions to this final problem ranged from about 100 to 270 lines of code.



3.2 Interaction with Prolog

Figure 3 summarizes the results of a student survey. The main difficulty the students faced was moving from a procedural view to a declarative one. The students who found Prolog easy liked the simplicity with which Prolog expresses complex reasoning and the logical approach used to solve most problems. Other students liked grammars (DCG) in Prolog and the power of recursion.

Figure 3. Difficulty of learning Prolog. The bar chart compares learning Prolog to learning other languages (96/97 and 97/98); the vertical axis shows the percentage of students, and the categories are 'easier', 'harder' and 'different'.

Students were also asked to rank different aspects of Prolog. In both years, students found that backtracking was one of the hardest concepts to understand and recursion was one of the easiest.

3.3 Interaction with PRAM

3.3.1 Indirect Interaction 1996/1997

PRAM was applied retrospectively to the 1996/1997 students' code in order to gather some preliminary results. PRAM was evaluated on over three hundred exercises; a comparison was then made between the subjective (human) and objective (computer) marks for each exercise. The results were encouraging. For example, 53% of PRAM's (objective) marks were similar to those marked manually (subjective), with an error margin of five points between them; 37% of the exercises had an error margin greater than five points. On close inspection of the programs, it appeared PRAM was outperforming manual marking in areas such as style. We attribute this to the attention to detail that automatic markers can bring to static analysis and that human marking invariably overlooks. PRAM was also good at detecting errors, such as the wrong redefinition of some built-in predicates. Additionally, PRAM discovered some infinite loops that were sometimes disregarded by manual marking.

With traditional marking, the teacher tries to understand the techniques the students use; this is for the moment beyond the scope of PRAM. Another disadvantage of PRAM is that it puts some constraints on the student's exercise. For instance, if students change the main running predicate name, this will be detected as an existence error and they could get a bad mark when in fact their code is nearly correct. Finally, on 10% of the exercises we found the marks awarded by PRAM to be inappropriate; by way of illustration, sometimes PRAM found errors that did not exist in the code or was too severe in its analysis of correctness. Another main area where PRAM needs improvement is the help it presents to students: PRAM gives very general error messages, but does not specifically indicate where the trouble is or how to fix it.

3.3.2 Direct Interaction 1997/1998

During the year 1997/98, PRAM was used by students throughout the first semester. We conducted a small survey to elicit students' thoughts on our marking system and how useful they found it for learning Prolog. Additionally, students' e-mails and inquiries were analyzed in order to reveal PRAM's behavior in dealing with their problems and queries.

Comments on PRAM. The students were asked whether they found PRAM a useful tool and whether the marking was accurate. 23% of them thought PRAM was useful for learning, but most (54%) agreed that it needs more feedback and help facilities. Conversely, 23% of the students found the marking inaccurate; they complained that the automatic marks did not correspond to their expectations. To the question "If you think PRAM needs to be changed, what are the things that need to be altered?", the majority of students (59%) thought that PRAM needed a detailed feedback mechanism that points to the exact problem in their code and proposes a method to alter it. Another 8% of students wanted more explanation of the complexity measure, and 25% thought that solutions should be made available after the deadline for submissions. A further quite interesting point that some students (8%) raised is that PRAM tried to force them towards a desired solution, not allowing them to use their own methods. Despite some expected imperfections of PRAM, most of the students (64%) preferred automatic marking immediately after submission to manual marking with more feedback many days after submission.

Comments on the course. Students were asked to comment on the whole AI programming course (AIP). Most students (82%) found the level of difficulty of the course just right, and only 18% found AIP a hard course; no one found the course too easy. They thought that too much coursework was required, though some of them felt that having a great number of exercises was good revision for the exams. They complained that too much help was provided in the skeleton, which was sometimes misleading, especially if they chose a method different from the one proposed in the skeleton.

Errors (bugs) and problems. The most difficult errors some students encountered were forgetting to add cuts or adding them in the wrong place. Another major problem area was complexity: the students did not fully grasp this measure, which meant they did not know how to reduce the complexity of their code. The students also had trouble understanding the tracing process in SICStus [32], especially when it got quite long.

Analysis of electronic mail. Students addressed ninety e-mails to the teacher during the AIP course (year 1997/98). These addressed such diverse areas as administrative matters (e.g., postponing deadlines, extending the number of submissions), operator precedence, dissatisfaction with the marks given by PRAM, requests for more detail in the specification of the exercise, issues of complexity, test data, infinite loops, missing tests, existence errors, expected output and typographic specifications. 77% of the answers to the e-mails on complexity suggested a modification to the students' code in one way or another; only 23% of the students were asked to resubmit their code as it was, because it was PRAM that had made a wrong evaluation of their solution.

Comments on marking. In order to investigate PRAM's marking in detail, seventy-seven samples were randomly gathered from the 97/98 students' solutions and analyzed thoroughly by hand. Complexity and style metrics were looked at in detail. Marks varied from zero to one hundred per cent.

In the area of style, 82% of the samples were classified as 'good', meaning that the manual and automatic marking were in concordance. Of these, 16% were samples classified as 'good' in terms of marking but for which the feedback or remarks PRAM presented were misleading and did not provide much help. 18% of the seventy-seven samples were classified as 'bad' because the automatic marking was inappropriate and a gap was noticed between the automatic and manual marking. Of these, 6% of the marks should have been higher than PRAM's marking and 8% should have been lower.

In the area of complexity, 78% of the marking was classified subjectively by the authors as 'good', of which 9% had inappropriate feedback or remarks from PRAM. 22% of the marking was classified as 'bad'; here, 5% of the marks should have been higher and 10% should have been lower, with the remainder consisting of 'bad' marking with misleading or ambiguous comments from PRAM.

4 Conclusion

Given the above results, an important question is whether metrics were useful in addressing student difficulties and whether PRAM was beneficial.

An objective answer is difficult to formulate. However, after analyzing the marking results, students' questions and student questionnaires, it is apparent that the system performed well in helping students learn Prolog.

Furthermore, while PRAM lacked the ability to suggest good modifications to a buggy solution, it was good at sensing errors in the student programs, and most students preferred an automatic, instant marking system to slower human marking, even if the latter had slightly more feedback. Metrics such as complexity were among the most difficult to quantify and the most ambiguous for students to comprehend. We believe this is because the samples studied were relatively simple and short, and thus very hard to measure and comprehend in terms of complexity.

Despite some limitations, PRAM achieved its primary goal, which is to mark and assess students' Prolog programs. Additionally, PRAM provides a guide for future research and a basis for developing an automatic tutor for teaching Prolog. Such a tutor would be used not only for marking and assessment, but would also provide more detailed and helpful feedback to students; for instance, it might point the student to the exact position of a coding error and suggest several different solutions or methods to solve the student's problem. Furthermore, PRAM could be extended to industrial programming, and its metrics could be developed for large projects in AI.

References

[1] Azem, A., Belli, F. and Jedrzejowicz, P. (1994), "Reliability prediction and estimation of Prolog programs," IEEE Trans. on Reliability, Vol. 43, No. 4, December.

[2] Baker, H.G. (1997), "When bad programs happen to good people," ACM SIGPLAN Notices, Vol. 32, No. 3, March.

[3] Basili, V.R. and Perricone, B.T. (1984), "Software errors and complexity: an empirical investigation," Communications of the ACM, Vol. 27, pp. 42-52.

[4] Bental, D. (1993), "Why doesn't my program work? Requirements for automated analysis of novices' computer programs," Workshop on Automated Program Understanding, AI&ED 93, World Conference on AI in Education.



[5] Beizer, B. (1990), Software Testing Techniques, 2nd Edition, International Thomson Computer Press.

[6] Berry and Meekings (1985), "A style analysis of C programs," Communications of the ACM, Vol. 28.

[7] Bratko, I. (1990), Prolog programming for Artificial Intelligence, 2nd Ed. Addison-Wesley.

[8] Bronowski, J. (1973), The Ascent of Man, Little, Brown & Co., Boston/Toronto.

[9] Calani Baranauskas, M.C. (1995), "Observational studies about novices interacting in a Prolog environment based on tools," Instructional Science, Vol. 23, pp. 89-109.

[10] Collins English Dictionary, HarperCollins Publishers.

[11] Covington, M.A. (1985), "Eliminating loops in Prolog," ACM SIGPLAN Notices, Vol. 20, No. 1.

[12] Curtis (1979), "In search of software complexity," Workshop on quantitative software models for reliability, pp. 95-106.

[13] Evangelist, W.M. (1983), "Software complexity metrics sensitivity to program structuring rules," Journal of Systems and Software, Vol. 3, pp. 231-243.

[14] Fenton, N. (1991), Software Metrics: A Rigorous Approach, Chapman & Hall, London.

[15] Foxley, E., Higgins, C.A. and Burke, E. (1996), "The Ceilidh system: A general overview 1996," Monitor, CTI Computing Newsletter, Vol. 7.

[16] Henry and Kafura (1981), "Software structure metrics based on information flow," IEEE Transactions on Software Engineering, Vol. SE-7, No. 5, pp. 510-518.



[17] Darby-Dowman, K. and Little, K. (1997), "Critical factors in the evolution of logic programming and Prolog," European Journal of Information Systems, Vol. 6, No. 1, pp. 67-75.

[18] Joseph, K. et al. (1986), "Software complexity measurement," Communications of the ACM, Vol. 29, pp. 1044-1050.

[19] Kaposi, A., Kassovitz, L. and Markusz, Z. (1979), "PRIMLOG, a case for augmented Prolog programming," Proc. Informatica, Bled, Yugoslavia.

[20] Kaplan, M. (1991), "A plea for readable pleas for readable Prolog programming style," SIGPLAN Notices, Vol. 26:2, pp. 41-50, Feb.

[21] Kearney, J.K., Sedlmeyer, R.L., Thompson, W.B., Gray, M.A. and Adler, M.A. (1986), "Software complexity measurement," Communications of the ACM, Vol. 29, pp. 1044-1050.

[22] Kernighan, B.W. (1981), Software Tools in Pascal, Prentice Hall.

[23] Looi, C.-K. (1991), "Automatic debugging of Prolog programs in a Prolog intelligent tutoring system," Instructional Science, Vol. 20, pp. 215-263.

[24] Mansouri, F.Z. and Higgins, C.A. (1997), "Prolog: An annotated bibliography," ACM SIGPLAN Notices, Vol. 32, No. 9, pp. 47-53.

[25] Markusz, Z. and Kaposi, A.A. (1985), "Control in logic-based programming," Computer Journal, Vol. 28, pp. 487-495.

[26] Matsumoto, H.A. (1985), "Static analysis of Prolog programs," SIGPLAN Notices. Vol. 20:10, pp. 48-59, Oct.

[27] McCauley, R.A. (1992), Conceptual complexity analysis of logic programs, PhD thesis.

[28] McCabe, T.J. (1976), "A complexity measure," IEEE Transactions on Software Engineering, Vol. SE-2, No. 4, Dec.



[29] Myers, M. (1989), "Structural modelling of Prolog for metrication," Proceedings of the 2nd European Software Engineering Conference (ESEC), Springer, Coventry, UK, pp. 351-375, May.

[30] O'Keefe, R. (1990), The Craft of Prolog, MIT Press.

[31] Redish, K.A., Smyth, W.P. and Sutherland, P.G. (1984), "AUTOMARK - An experimental system for marking student programs," Proceedings of CIPS, Calgary, Alberta, Canada, pp. 43-46, Canadian Information Processing Society, May.

[32] (1995), SICStus Prolog User's Manual, Swedish Institute of Computer Science, Release 3#0, June.

[33] Wohlin, C. (1996), "Revisiting measurement of software complexity," Proceedings Asia Pacific Software Engineering Conference, Seoul, South Korea, pp. 4-7, Dec.


INDEX

- A-

analytical tools, 46
applicative stage, 126
ART1 network, 306
artificial intelligence (AI), 140
    fundamentals, 55
    modeling, 31-83
    programs, 311-326
    techniques, 87-103
artificial neural networks, see neural networks
authoring tools, 192
automatic assessment, 311-326

- B-

backpropagation, 302

- C -

category relations, 108
communicative stage, 127
constructivism, 19
control,
    damping control, 268
    fuzzy logic control, 263, 264, 269
    speed control, 269
    voltage control, 267
CourseMaster (Ceilidh), 315
courses, 14, 105-132, 235-257
    interdisciplinary, 87-103
courseware system, 311-326
crossover, 42

- D-

damping control, 268
data analysis scheme, 50
data collection, 56
data preprocessing, 57
development tools, 44
distributed intelligent systems, 158
DON, 222

-E-

education, 13
electric power systems, 261-286
empiricist-inductivist model, 5
Eon tools, 173
evaluation, 68
    information, 113
evolutionary computing, 18
expansion phase, 100
expert systems, 17, 111, 163
exploration phase, 98

-F-

falsificationism, 6
fitness evaluation, 42
flexible learning, 21
FLUTE, 223
functional relations, 110
fuzzification, 66
fuzzy expert systems, 38
fuzzy logic, 19, 261-286
    control, 263, 264, 269
    stabilizing controller, 267
fuzzy neural system modeling, 60
    integrated FNN model, 77
    network architecture, 63
    network evaluation, 68
    network testing, 68
    network training, 68
fuzzy systems, 35, 37, 61
fuzzy transformations, 51



- G-

generic frame protocol, 167
genetic algorithms, 35, 40, 52, 59, 98, 163
    hybridization, 43
    implementation, 43
GENETICA Net Builder, 45
GET-BITS model, 197, 210
    tools, 219
GKB-Editor, 168
graduate level, 135-179
graphical user interface, 279

- H-

Hebbian rule, 298
hierarchical modeling, 195
hybrid intelligent systems, 139
hybridization, 39, 43

- I -

innovative modeling, 189-227
instructional model, 117
integrated fuzzy neural network, 77
intelligent databases, 155
intelligent reasoning, 153
intelligent tutoring systems (ITSs), 105-132, 189-227
    basic architecture, 107
    design, 203
    ontologies, 214
    shells, 192
    traditional ITSs, 190
interdisciplinary science course, 87
Internet, 140
interoperable software components, 203
invention phase, 99

- K-

knowledge base, 108

knowledge-based intelligent
    paradigms, 16
    techniques, 19, 21, 23
knowledge-based tutors, 173
knowledge modeling, 135-179
    object-orientated, 143
knowledge navigation, 118
knowledge processing, 150
knowledge processor pattern, 151
knowledge representation, 142
knowledge sharing, 147
Kohonen network, 304

- L-

laboratory assignments, 242
laboratory component, 242
learning, 1-24, 31-83, 289-308
    flexible learning, 21
    learning cycle, 96
    learning rule, 296
    learning types, 297
    problem-based learning, 23
lectures, 240
logical positivism, 6
Loom, 165

-M -

marking systems, 314
MatLab, 261-286
    NN toolbox, 254
membership relations, 110
multilayer, 237
multimedia, 139
mutation, 42

- N-

network training, 46
neural networks, 17, 35, 43, 56, 105-132, 235-257
    architecture, 35, 48
    basic components, 293
    models, 60, 293
    network topology, 295


    simulators, 245
    training, 48
    workbench, 289-308
NeuralSIM, 50, 73
NeuralWorks, 247
Neuro-Forecaster, 44, 70
neuro-fuzzy network, 45
NeuroSolutions, 54, 76

- O -

object-orientation, 143
observation, 10
Ontolingua, 168
ontologies, 147, 212
    basic concepts, 212
    GET-BITS model, 215
    ITS ontologies, 214

- P -

parent selection scheme, 42
PARKA, 170
PARKA-DB, 170
patterns, 151
pedagogical context, 243
perceptive stage, 125
perceptron, 299
polar information, 263
population initialization, 41
positivism,
    logical positivism, 6
PowerLoom, 165
PRAM, 311-326
    interaction, 322, 323
    metrics, 316
prediction, 77, 79
problem-based learning, 23
processing element, 293
productive stage, 129
Professional II Plus, 48, 72, 247
programming examples, 161
Prolog,
    interaction, 322
    metrics, 312

- Q-

question generation, 119

- R-

real-time simulator, 280
    configuration, 281
    Matlab/Simulink based, 282
    testing, 284
research-based innovative teaching, 96
reusable software components, 203
revision, 14

- S-

science, 3, 9, 12, 13, 14
    interdisciplinary course, 87-103
    education, 13
scientific method, 5
simulation-based design, 171
simulators, 245, see also real-time simulator
Simulink, 261-286
social construction, 13
software,
    components, 203, 204
        features, 207
        in GET-BITS, 210
    design and architectures, 136
speed control, 269
stabilization, 261-286
student actions, 117
student concept map, 113
student knowledge-domain, 112
student modeling expert system, 111
style metrics, 316
subject knowledge base, 108
system monitoring, 48

-T-

teaching, 1-24, 31-83, 289-308
    course, 235-257
    knowledge modeling, 135-179
    research-based, 96


teaching-learning process, 125
technology, 3, 14
    education, 13
textbooks, 242
theory dependency, 10
transient stability simulation, 272, 273
    typical results, 277
tutoring, 105-132, 173, 189-227, see also intelligent tutoring systems

- U -

unification, 144
user interface, 120

- V-

variable selection scheme, 59
voltage control, 267

- W -

weather forecasting, 55
weather prediction, 56
windowing feature, 46


Studies in Fuzziness and Soft Computing

Vol. 25. J. Buckley and Th. Feuring Fuzzy and Neural: Interactions and Applications, 1999 ISBN 3-7908-1170-X

Vol. 26. A. Yazici and R. George Fuzzy Database Modeling, 1999 ISBN 3-7908-1171-8

Vol. 27. M. Zaus Crisp and Soft Computing with Hypercubical Calculus, 1999 ISBN 3-7908-1172-6

Vol. 28. R. A. Ribeiro, H.-J. Zimmermann, R. R. Yager and J. Kacprzyk (Eds.) Soft Computing in Financial Engineering, 1999 ISBN 3-7908-1173-4

Vol. 29. H. Tanaka and P. Guo Possibilistic Data Analysis for Operations Research, 1999 ISBN 3-7908-1183-1

Vol. 30. N. Kasabov and R. Kozma (Eds.) Neuro-Fuzzy Techniques for Intelligent Information Systems, 1999 ISBN 3-7908-1187-4

Vol. 31. B. Kostek Soft Computing in Acoustics, 1999 ISBN 3-7908-1190-4

Vol. 32. K. Hirota and T. Fukuda (Eds.) Soft Computing in Mechatronics, 1999 ISBN 3-7908-1212-9

Vol. 33. L. A. Zadeh and J. Kacprzyk (Eds.) Computing with Words in Information/Intelligent Systems 1, 1999 ISBN 3-7908-1217-X

Vol. 34. L. A. Zadeh and J. Kacprzyk (Eds.) Computing with Words in Information/Intelligent Systems 2, 1999 ISBN 3-7908-1218-8

Vol. 35. K. T. Atanassov Intuitionistic Fuzzy Sets, 1999 ISBN 3-7908-1228-5