Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Upload: lesley-marsh

Post on 17-Dec-2015



Computer Systems Lab TJHSST

Current Projects 2004-2005 Second Period


Current Projects, 2nd Period

• Casey Barrett: The Design and Implementation of Decision Trees for Career Guidance

• Andy DeSoto: Modeling of Evacuation Centers using NetLogo

• Keith Diggs: Optimizing Genetic Algorithms for Use in Cyphers


• John Fitzsimmons: Creating a 3D Game With Textures and Lighting

• Christopher Goss: Paintball Frenzy: Graphical Turn-Based Game With a Minimax AI Agent

• Stephen J. Hilber: Modeling Evolving Social Behavior

• Rachel Miller: Use of Machine Learning to Develop a Better Artificial Intelligence for Playing Tic-Tac-Toe


• Kyle Moffett: The Analysis and Maintenance of a Robust, Highly-Available Computer Systems Laboratory

• Jason Pan: Developing an AI Player for "Guess Who?"

• Madeleine E. R. Pitsch: Study of Creating Computational Models of Traffic

• Timothy Wismer: Development of a Deadlock-Detecting Resource Locking Algorithm for a Kernel Debugging User-space API Library (KDUAL)

• Dan Wright: Developing Algorithms for Computational Comparative Diachronic Historical Linguistics


Decision Trees for Career Guidance

This research project will be an investigation into the design and implementation of various decision trees for career guidance. A decision tree takes into account some sort of situation outlined by a group of parameters and outputs a Boolean decision to the situation. This project will take into account many aspects associated with decision trees including database building, searching and sorting, and algorithms for accessing data.

My project utilizes numerous decision trees in an effort to serve as a tool for career guidance for young adults. A user will fill out a form of specified fields that will then be analyzed by the group of decision trees until a field of study/occupation is given to the user as the outcome. This group of decision trees will be built through database building techniques.


The Design and Implementation of Decision Trees for Career Guidance

Casey Barrett

Abstract

This research project will be an investigation into the design and implementation of various decision trees for career guidance. A decision tree takes into account some sort of situation outlined by a group of parameters and outputs a Boolean decision to the situation. This project will take into account many aspects associated with decision trees including database building, searching and sorting, and algorithms for accessing data. My project utilizes numerous decision trees in an effort to serve as a tool for career guidance for young adults. A user will fill out a form of specified fields that will then be analyzed by the group of decision trees until a field of study/occupation is given to the user as the outcome.


This group of decision trees will be built through database building techniques. I have utilized extensive research from various websites that offer expertise in a number of areas. I used tutorial websites that had examples and explanations of decision trees generated by the C4.5 program. I have also tried to read various research papers that deal with decision trees. Although I have yet to find one that remotely relates to career guidance, I feel that my understanding of decision trees has increased as a result of these research papers.

Introduction

This project will utilize numerous decision trees to assist young people by helping them focus on their interests and what career paths may coincide with those interests.

Background/Implementation

A decision tree is a graphical representation of the decision analysis process. This type of tool takes some sort of input, whether it is a situation or an object. The input is sent through a set of parameters, or rules, and eventually the tree gives a Boolean output. Many different types of parameter cases can be used: numerical data, simple yes/no answers, or word answers, such as hair color (black, brown, or blonde). Each parameter has a specified set of cases that correspond to it.
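The idea of sending an input through a set of parameters until a Boolean output is reached can be sketched in a few lines. This is only an illustration in Python, not the project's C++ code; the tree contents and the names `TREE` and `decide` are hypothetical, and numerical cases (which would need a threshold test) are left out.

```python
# Hypothetical tree: each internal node tests one parameter and has one
# branch per case; each leaf is the Boolean decision.
TREE = {
    "parameter": "in_science_club",          # yes/no parameter
    "cases": {
        "yes": True,
        "no": {
            "parameter": "hair_color",       # word-valued parameter from the text
            "cases": {"black": False, "brown": True, "blonde": False},
        },
    },
}


def decide(tree, answers):
    """Walk from the root to a leaf, following the case that matches each answer."""
    node = tree
    while isinstance(node, dict):
        case = answers[node["parameter"]]
        node = node["cases"][case]
    return node


print(decide(TREE, {"in_science_club": "no", "hair_color": "brown"}))
```

Each parameter here has a fixed set of cases, so an answer outside that set raises a `KeyError` rather than silently producing a decision.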


To help me better understand decision trees and how to build them, I have utilized Professor Ross Quinlan's revolutionary decision tree generator programs, C4.5. These programs take in large sets of data (from properly written database files) and look for correlations within the sets. They then use these correlations to build a decision tree that follows the 'rules' outlined by the correlations. An auxiliary program, C4.5rules, can display the 'rules' for the generated tree. This program gave me a better understanding of how decision trees are built. I wrote my own database files that could be read by the C4.5 program, and in doing so learned the proper syntax for these files, which I would later utilize for career guidance.


For each database, two separate kinds of files are needed: the .names file, which outlines each parameter and its appropriate cases as well as the end cases, and the .data file, which consists of individual entries. Each entry in the database supplies a value for every parameter outlined in the .names file.

I also had to research techniques for career guidance. The most rudimentary of these techniques was to have a user fill out a list of field data and then compare the user's answers to those of a highly comprehensive database. Some of these questionnaire-type devices covered many different occupational fields, such as art/music, engineering, writing/journalism, and social services. For my project, I decided to start out with two very distinct fields that would be a good way to acclimate myself to making a career guidance program.
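As a rough illustration of the two file kinds, the sketch below builds the text of a .names file and a .data file in Python. The layout follows the C4.5 conventions as I understand them (end cases on the first line, one `parameter: cases.` line per parameter, comma-separated entries with the end case last); the parameter names and entries are invented for the example and are not the project's actual database.

```python
def names_file(classes, parameters):
    """Return .names text: the end cases first, then one line per parameter."""
    lines = [", ".join(classes) + ".", ""]
    for name, cases in parameters:
        # A parameter is either a list of word cases or the keyword "continuous".
        value = cases if isinstance(cases, str) else ", ".join(cases)
        lines.append(f"{name}: {value}.")
    return "\n".join(lines) + "\n"


def data_file(entries):
    """Return .data text: one comma-separated entry per line, end case last."""
    return "".join(",".join(str(v) for v in entry) + "\n" for entry in entries)


# Hypothetical parameters loosely based on the questions described in the text.
CLASSES = ["sciences", "liberal_arts"]
PARAMETERS = [
    ("english_grade", ["A", "B", "C", "D", "F"]),
    ("math_grade", ["A", "B", "C", "D", "F"]),
    ("science_club", ["yes", "no"]),
    ("computers_owned", "continuous"),
    ("plays_last_year", "continuous"),
]
ENTRIES = [
    ("B", "A", "yes", 3, 0, "sciences"),
    ("A", "C", "no", 1, 4, "liberal_arts"),
]

print(names_file(CLASSES, PARAMETERS))
print(data_file(ENTRIES))
```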


These two fields would be the liberal arts field and the sciences field. For career guidance, I separated the decision trees that I would need into three distinct trees. The first is a tree designed to help a user decide whether they should focus on the sciences or the liberal arts, the two broad intellectual categories. This decision is based on a series of fields the user fills out in a separate C++ program. Some of these fields include the user's grades in their current English and Math classes, whether the user is in a science club, the number of computers that the user owns, and the number of plays the user has participated in during the last year. These starting questions are somewhat broad because this is the first preset tree that the user will be compared to.


The next step was to develop a user input program that prompts the user with questions related to career guidance. The program then takes the answers and writes them to a .data file. I wrote this program in C++, using the fstream.h library to gain access to the classes ifstream and ofstream. I then created the questions test file that became the ifstream that the program reads. This program contains four functions, better outlined in the Iteration Report: Third Quarter. Currently, the only acceptable inputs for this program are strings; however, during the fourth quarter, I will expand the inputs to include the types char, int, and double. The user input is first compared to a database of other people's answers and the decision that each of those people reached.
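The flow of the input program (read questions from a file, prompt for each answer, write one entry to a .data file) might look roughly like the following. This is a Python sketch of the idea only; the original program is in C++ and its four functions are not reproduced here. The name `collect_answers`, the sample questions, and the canned answers are assumptions made for the example.

```python
import io


def collect_answers(question_stream, ask, out_stream):
    """Ask each non-blank question and write the answers as one .data entry."""
    answers = []
    for line in question_stream:
        question = line.strip()
        if question:
            answers.append(str(ask(question)).strip())
    out_stream.write(",".join(answers) + "\n")
    return answers


# Demonstration with canned answers; pass the built-in input as `ask`
# to prompt a real user instead.
questions = io.StringIO("Grade in English?\nGrade in Math?\nIn a science club?\n")
canned = iter(["A", "B", "yes"])
entry = io.StringIO()
collect_answers(questions, lambda q: next(canned), entry)
print(entry.getvalue(), end="")
```

In-memory streams stand in for the question file and the .data file so the sketch runs without touching the disk; real files opened with `open()` can be passed in the same positions.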


Since I have not had enough users, the databases contain fictitious data that I have engineered to fit my ideal decision tree for this progression.

The data from this database helps comprise a decision tree that the user's answers will be run through. Since there are three different decision trees, three separate ideal databases will be made, one for the broad first test, and then one each for the science fields and the liberal arts field. After a first decision is made, it will be relayed back to the user. If the user wishes to continue, he or she will be given another set of parameters that will be more in depth in either the liberal arts or science fields.

These trees, which are still in progress, will help the user compare their answers against another database corresponding to their broad interests (liberal arts or sciences).

When the user's answers are compared to these more specialized databases, a more focused decision is sent back to the user. This is because the user's responses are matched up with the decision tree that the databases helped generate and with the decisions that the databases output. Right now, I am working on the algorithm that compares the user's answers to the decision tree. This algorithm will produce some sort of numerical correlation between the sets of answers. The higher the correlation with a specific path of the decision tree, the more likely the program is to output the answer associated with that path.
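Since the comparison algorithm is still being written, the following is only one possible reading of "numerical correlation": the fraction of a path's tests that the user's answers satisfy. It is a Python sketch under that assumption; the function names, the paths, and the outcomes are all hypothetical.

```python
def path_correlation(answers, tests):
    """Fraction of a path's tested parameters whose case matches the user's answer."""
    if not tests:
        return 0.0
    matches = sum(1 for parameter, case in tests.items()
                  if answers.get(parameter) == case)
    return matches / len(tests)


def best_outcome(answers, paths):
    """Return the outcome of the path with the highest correlation."""
    best = max(paths, key=lambda path: path_correlation(answers, path["tests"]))
    return best["outcome"]


# Hypothetical paths through a sciences-versus-liberal-arts tree.
PATHS = [
    {"tests": {"science_club": "yes", "math_grade": "A"}, "outcome": "sciences"},
    {"tests": {"science_club": "no", "plays_last_year": "many"}, "outcome": "liberal arts"},
]

user = {"science_club": "yes", "math_grade": "B"}
print(best_outcome(user, PATHS))
```

A partial match still scores above zero, which is what lets the program suggest the closest path even when no path fits the answers exactly.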


In the end, a type of occupation or a field of study will be output to the user based on the correlations. Also, I have trained extensively in the ways of the Reverse Game. Without possessing a shred of natural Reverse Game ability, I have worked my way up from the ranks of novice to a somewhat respectable player capable of defeating each and every one of the 20 progressively challenging levels.

Conclusions

References

Decision Trees: A subtopic of machine learning. http://www.aaai.org/AITopics/html/trees.html

Quinlan, Ross. (n.d.) Ross Quinlan Personal Page. http://www.rulequest.com/Personal/

C4.5 Tutorial. http://www2.cs.uregina.ca/~hamilton/courses/831/notes/ml/dtrees/c4.5/tutorial.html

Career Key: Job Interests. http://www.careerkey.org/cgi-bin/ck.pl?action=choices

Hettich, Scott. (n.d.) UCI Machine Learning Repository. http://www.ics.uci.edu/~mlearn/MLRepository.html

Nilsson, Nils. (May 10, 2004). Introduction to Machine Learning. http://robotics.stanford.edu/people/nilsson/mlbook.html


Modeling of Evacuation Centers Using NetLogo

Modeling is a powerful tool that allows a programmer or social engineer to observe cause-and-effect relationships in occurrences that a) happen too slowly or quickly to see, b) involve danger or safety concerns, c) occur on a scale too large or too small for study, or d) are not common. Using NetLogo, a multi-agent programmable modeling environment, the socio- and psychological factors affecting decision-making in these situations can be effectively simulated.


Modeling Evacuation of Population Centers Using NetLogo

Kurt Andrew DeSoto

Abstract

The actions of citizens during terror and evacuation events are oftentimes hard to predict. Using NetLogo, a 'cross-platform multi-agent programmable modeling environment' from The Center for Connected Learning and Computer-Based Modeling (CCL), the socio- and psychological factors affecting decision-making in these situations can be effectively simulated. Through appropriate research in categories of modeling and sociology, human behavior can be studied to help urban developers and social engineers protect the nation's interest: its citizens. Citizens that follow certain algorithms, or behave in certain ways, have a much greater chance of survival.


I. Introduction to NetLogo

NetLogo is a cross-platform multi-agent programmable modeling environment created by Northwestern University's Center for Connected Learning and Computer-Based Modeling (hereafter CCL). It was designed specifically for simulation of social and natural phenomena. Originally StarLogoT, this language and graphical user interface hybrid offers a multitude of possibilities for researchers and students, allowing users to examine interactions and behaviors on both micro and macro scales.


NetLogo is designed with accessibility in mind: the program allows for quick Java applet exportation and interactivity with CCL's HubNet interface. HubNet lets the user run participatory simulations in which a class or group of testers enacts the behavior of a system, with each user controlling a part of the system from an individual device (such as a TI-83+ calculator or a networked computer). NetLogo comes packaged with an expansive models library and an assortment of code samples for easy access.

A. NetLogo Graphic User Interface

NetLogo's main advantage is that it allows a quick and relatively simple way for the programmer to implement a convenient graphical user interface (hereafter GUI).


A NetLogo GUI can contain the following components: buttons, sliders, switches, choices, monitors, plots, texts, and outputs.

1) Buttons: Buttons activate on the user's command. There are two types of buttons, standard buttons and forever buttons. Standard buttons run the contained code once per click, but forever buttons execute the contained commands repeatedly until the button is clicked a second time, mimicking a while loop.


2) Sliders: Sliders allow the user to manually select the value of a certain variable. The slider fixes the minimum and maximum values, as well as the increment of increase/decrease, while the user chooses the value.

3) Switches: Switches allow a user to directly set a boolean value.

4) Choices: Through the use of choices, the user can select among the values, Strings, and other options that the programmer allows.

5) Monitors: Monitors allow the user to examine the value of a variable as the simulation runs.

6) Texts and Outputs: Texts and Outputs display messages that the programmer encodes. These can be displayed at any time during the simulation.


B. HubNet and Other Extensions

NetLogo's versatility allows programs to be accessed and used in environments outside the computer laboratory setting. HubNet, for instance, allows a multitude of users to interact with a single model through the use of Texas Instruments TI-83+ calculators or networked computers. This is ideal for classroom settings because it allows each tester involved to affect the model in some way. Also, NetLogo offers easy exportation to Java applet format, and with several keystrokes, an applet embedded in an HTML page can be created.


II. Introduction to Modeling

Modeling is a powerful tool that allows a programmer or social engineer to observe cause-and-effect relationships in occurrences that a) happen too slowly or quickly to see, b) involve danger or safety concerns, c) occur on a scale too large or too small for study, or d) are not common. Armed with this knowledge, a scientist can use the material learned to help a community, understand a cause, or solve a problem. A perfect use for the modeling technology of today is a subject that meets all of the above criteria.


The modeling of population centers during evacuation situations is perfect for such a study, because evacuation frequently occurs too quickly, is quite dangerous, happens rarely, and happens on such a wide scale that observing patterns is nearly impossible. In order to accurately code such an event, certain features must be researched.

A. The Environment

Areas to be evacuated generally contain a large number of citizens in a small area; the population density in such regions tends to be quite high. In the metropolises and cities of today, high-rise apartments, parking garages, and subway stations all provide unique, but crowded, environments.


These situations provide the greatest trouble to researchers and social engineers.

B. The Citizens

In times of terror or enormous stress, citizens often behave irrationally, differently than they normally would. Thus, citizens' behavior in the simulation must reflect this. The intelligences programmed into version 1.2 follow.

1) Speedy, Normal, Slow: The Speedy, Normal, and Slow citizens differ in the number of times they move per timestep. Speedy, Normal, and Slow citizens move 3, 2, and 1 squares, respectively.


When they collide with a wall (or another citizen, if Collision? is turned on), they turn in a random direction and continue their movement.

2) Righty: Righty citizens react to collisions by turning ninety degrees. They move two squares per timestep.

3) Retreater: Retreaters react to collisions by turning 180 degrees, taking a step forward, and choosing a direction in their 90 degree front arc.
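The collision rules for the three intelligences can be summarised in a short sketch. It is written in Python rather than NetLogo and is not the model's code: headings are assumed to be in degrees measured clockwise (as in NetLogo), the Righty turn is assumed to be to the right, the Retreater's forward step is omitted, and the names `MOVES_PER_TIMESTEP` and `heading_after_collision` are hypothetical. The text does not give a move count for Retreaters, so none is listed.

```python
import random

# Squares moved per timestep, as stated in the text.
MOVES_PER_TIMESTEP = {"speedy": 3, "normal": 2, "slow": 1, "righty": 2}


def heading_after_collision(kind, heading, rng=random):
    """Return the new heading in degrees, in [0, 360), after a collision."""
    if kind in ("speedy", "normal", "slow"):
        return rng.randrange(360)              # turn in a random direction
    if kind == "righty":
        return (heading + 90) % 360            # assumption: the turn is to the right
    if kind == "retreater":
        reversed_heading = (heading + 180) % 360
        # Pick a direction inside the 90-degree arc centred on the new front.
        return (reversed_heading + rng.uniform(-45, 45)) % 360
    raise ValueError(f"unknown citizen kind: {kind}")


print(heading_after_collision("righty", 300))
```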


III. Running the Simulation

Utilizing the Simulation consists of three steps: initializing the Basic Environment Setup, or BES; running the model; and analyzing the model with the Graphical (Statistical) Interface, or GSI.

A. Basic Environment Setup (BES)

The user controls the following sliders, choices, and other components to set the parameters for the environment.

1) 'walldistancey' Slider: sets the distance between the horizontal walls. When a citizen's movement would take it into a wall, it runs its collision program detailed above. The two horizontal walls each contain a passable gap separating the danger area from the safety area.


2) 'walldistancex' Slider: sets the distance between the vertical walls. When a citizen's movement would take it into a wall, it runs its collision program detailed above.

3) 'evacuationtime' Slider: sets how many timesteps the model runs for.

4) 'collide?' Switch: selects whether or not citizens collide with one another, activating collision code upon impact.

5) 'areasize' Selector: selects how many citizens begin in your model.


The selector has six options: a) Metropolis (225 citizens) b) City (150 citizens) c) Village d) Town (90 citizens) e) Thorpe (15 citizens) f) Abode (3 citizens). There are (citizens / 3) citizens of each color (red, white, blue) in the model.

6) 'explosiontime' Slider: selects at what time the explosion occurs. The explosion has the radius selected below.


7) 'explosionrad' Slider: chooses the radius of the explosion. Explosions kill citizens contained within them and create an impenetrable wall.

8) 'Draw Walls' Button: while depressed, lets the user draw black walls 1 pixel in diameter.

IV. The Project

This project was set up to determine the most effective ways to get large numbers of citizens to safety in times of danger. To do this, two things must be determined. First, the parameters that add to the danger of the environment should be measured. Second, once the situations of greatest danger are found, the most effective intelligences for escaping this danger must be pinpointed.


Using the environment instantiated in the setup (see subsection "The Environment"), different situations are tested. The tests were run in environments of four different dimensions: large x large, short x large, large x short, and short x short.

V. The Test

As described above (see "Project" section), the first part of the test is to determine the dangerous situation. The results of this test follow.


These tests were run with build version 1.2, located on the web: www.tjhsst.edu/ kdesoto/techlab.

Large x large = 15m x 15m

Short x large = 5m x 15m

Large x short = 15m x 5m

Short x short = 5m x 5m

A set of 225 "control" citizens was given a period of 120 minutes to escape each danger situation. This was performed ten times, and escaped citizens were graphed. The graphs can be found in the appendix. The l x l area, on average, trapped almost 111 citizens per 120 minutes. This is almost 50 percent. The s x l area, on average, trapped almost 22 citizens per 120 minutes. This is almost 10 percent.


The l x s area, on average, trapped almost 27 citizens per 120 minutes. This is almost 12 percent. The s x s area, on average, trapped 0 citizens per 120 minutes, 0 percent.

VI. Conclusion

The conclusion goes here.

VII. Acknowledgement

The authors would like to thank Mr. Randy Latimer, Computer Systems Laboratory, and Mr. Uri Wilensky, CCL.

References

[1] H. Kopka and P. W. Daly, A Guide to LaTeX, 3rd ed. Addison-Wesley, 1999.


Optimizing Genetic Algorithms for Cypher Decoding

Over the past several years, genetic algorithms have come into wide use because of their ability to find good solutions to computing problems very quickly. They imitate nature by crossing over strings of information represented as chromosomes, with preference given to the more fit solutions produced. They hold great promise in the field of cryptology, where they may be used to quickly find good partial solutions, thus eliminating much of the intense manual labor that goes into identifying initial coding schemes.


Optimizing Genetic Algorithms for Use in Cyphers

Keith Diggs

Abstract

Over the past several years, genetic algorithms have come into wide use because of their ability to find good solutions to computing problems very quickly. They imitate nature by crossing over strings of information represented as chromosomes, with preference given to the more fit solutions produced. They hold great promise in the field of cryptology, where they may be used to quickly find good partial solutions, thus eliminating much of the intense manual labor that goes into identifying initial coding schemes.


Introduction

This document is the written history of the author's work on genetic algorithms over the course of the past year. The hope is to reach a better understanding of genetic algorithms through the results of intensive work on the subject in several different applications.

The Problem

Genetic algorithms have become very useful tools in the field of computer science; however, they are rather open-ended and can be applied in wildly different ways.[1] The problem here is to optimize a genetic algorithm for a cryptological problem.


Statement of Objective

The ultimate objective is to use a genetic algorithm to efficiently solve different problems such as the Beale cypher and the evolution of pi. Regardless of whether that objective is reached, the project should foster a better understanding of how to apply genetic algorithms to cryptology problems. The purpose of this report is to document the development and effectiveness of the project.


Scope

The project deals mainly with the Beale Cyphers as a tool with which to determine which properties of a genetic algorithm may be modified so that the algorithm may operate more smoothly. The project also includes an initial foray into evolving the value of pi.

Background

Historical Summary

The Beale Cypher is one of the most famous unsolved puzzles in cryptography. Basically, 100 years ago, Beale buried treasure in a lake in Bedford County, near Roanoke.


Three letters were written, encoded and given to a friend. Beale subsequently travelled west and never returned. Later, one of the 3 letters was deciphered. Beale had used a simple encoding algorithm. His letters consisted of a large list of numbers. These numbers corresponded to the first 480 words in the American Declaration of Independence. For example, if the document starts "The quick brown fox..." then 3 would represent the letter 'b' and so forth.
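The encoding described in this passage (each number points at a word in the key document and stands for that word's first letter) can be shown with the passage's own toy example. The Python sketch below is an illustration only; the function name `decode` and the simple word-splitting rule are assumptions, and a real attempt would use the full text of the key document.

```python
import re


def decode(numbers, key_text):
    """Map each number n to the first letter of the n-th word (1-based) of key_text."""
    words = re.findall(r"[A-Za-z']+", key_text)
    return "".join(words[n - 1][0].lower() for n in numbers)


print(decode([3], "The quick brown fox"))  # b
```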

[1] Duray


So, one of the letters was successfully deciphered, but to this day the other two remain unbroken. While there is a fair amount of evidence that the Beale treasure is a hoax, there is no reason to believe that the cyphers themselves are. It is also probable that the other cyphers either use the same document with different encoding methods, or use similar documents with the same encoding.[2]

Preceding Work

I am not sure how much work has been done trying to decode the two as-yet undeciphered letters.


Throughout the 1970s, one Dr. Carl Hammer used computers to analyze those two letters and found cyclic patterns in the numbers that suggested an actual coding mechanism as opposed to randomly chosen numbers.[3] There is some evidence that the cypher is nothing more than a hoax. Kenneth Dobyns concludes after a lengthy statistical analysis that "there is no content to [the] cyphers" and that the numbers in the two undeciphered texts were chosen "substantially at random, limited only by the restriction that no numerical value would be repeated immediately adjacent or semi-adjacent to a similar value."


Theory

It is one of the great ironies of computer science that a machine built to outperform a living organism works best at times when it imitates the living organism itself. Charles Darwin's 1859 book On the Origin of Species laid the foundation not only for anthropology but for a great number of largely economic theories such as "Social Darwinism." The idea that the members most fit to survive would be the most likely to reproduce resounded in the scientific community. It should come as no surprise, then, that computers might be able to produce end results as well as nature itself if they would follow this model.


A genetic algorithm is a method of computing that attempts to replicate nature's mechanism for evolution by organizing computer data into formats that are akin to DNA chromosomes. The idea is to have multiple manifestations of these computer data so that they might "cross over" the way that real-life chromosomes do. The "crossing over" does not happen randomly; the programmer has to code the program to do this, but he can choose to do it in a manner that takes two promising sets of data and combines them into what would hopefully be an even better set of data. We need these "promising sets of data" because the data are supposed to be the solution to some sort of problem.
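The "crossing over" of two promising sets of data can be sketched as below. This Python example uses single-point crossover and picks the two fittest chromosomes as parents; both are common textbook choices assumed here for illustration, not necessarily what decode01.c does. The names `crossover` and `two_fittest` and the toy fitness function are hypothetical.

```python
import random


def crossover(parent_a, parent_b, rng=random):
    """Single-point crossover: the head of one parent joined to the tail of the other."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have the same length, at least 2")
    point = rng.randrange(1, len(parent_a))    # cut strictly inside the chromosome
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def two_fittest(population, fitness):
    """Return the two chromosomes with the highest fitness scores."""
    return sorted(population, key=fitness, reverse=True)[:2]


# Toy run: chromosomes are strings, and fitness counts the letter 'e'.
population = ["aaaa", "eeaa", "aaee", "abcd"]
mother, father = two_fittest(population, lambda c: c.count("e"))
print(crossover(mother, father))
```

Because the cut point is random, the two children differ from run to run; every gene in them still comes from one of the two parents.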


Genetic algorithms are most effective in quickly finding good solutions to problems in which there may be several acceptable solutions.

The problem of decoding the Beale Cypher, then, would seem to be an excellent candidate for a genetic algorithm as described above. The search space is essentially infinite, yet it would be possible to program the computer to recognize good solutions as those with a high incidence of actual English words coming out of the translation. While there is one "correct" solution, if I can get my program to come up with a solution that a human viewer could identify as reasonably close to this correct solution, then it will have been a success.
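A heuristic that rewards "a high incidence of actual English words" could be as simple as counting how many letters of a candidate decoding are covered by known words. The Python sketch below does that with a tiny, hypothetical word list; it is not the project's heuristic, and a usable version would need a much larger dictionary and some care about overlapping matches.

```python
# Tiny hypothetical word list; a real heuristic would load a full dictionary.
WORDS = {"the", "and", "of", "to", "in", "gold", "silver", "have", "deposited"}


def fitness(candidate, words=WORDS):
    """Score a candidate by summing the lengths of dictionary words found in it."""
    text = candidate.lower()
    return sum(len(word) * text.count(word) for word in words)


english_like = "ihavedepositedthegold"
gibberish = "xqzjkwvbpmxqzjkwvbpmx"
print(fitness(english_like) > fitness(gibberish))
```

Candidates are scored as unbroken letter strings because the decoded numbers carry no word boundaries.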

2 Matthews 3 Krystek

Design Criteria

Primary Criteria

· The project will be able to take the code of the second letter in the Beale Cypher and independently come up with a string in which an observing human can recognize the English message of the cypher.

· The project will use a genetic algorithm to do so, crossing over possible solutions to produce hopefully better ones.

Secondary Criteria

· The project may independently produce the exact answer to the second letter in the Beale Cypher, character for character.

· The genetic algorithm may be both time-efficient and space-efficient, allowing for easy application to other similar cyphers (even other letters).

· The project may produce a viable solution to the first and third letters of the Beale Cypher from which a human would be able to do meaningful and independent work.

· A genetic algorithm similar to the one used to satisfy the Primary Criteria may be adapted to other areas in cryptology and (this is a long shot) into other research areas such as artificial life.

Procedure

The project largely dictated its own future: since there was no initial goal at the outset of the year, the project was largely defined as whatever might be successful.

At the time the author applied for the Computer Systems Research Lab, his idea was to create an intelligent computer agent to play the Japanese game Go. By September, the idea of exploring genetic algorithms had become fairly entrenched in the author's mind, but a search for applications was still necessary. After spending two months tinkering with a program that evolved the value of , it was decided to pursue the Beale Cypher in depth. The program decode01.c was written to the point where it would begin crossing multiple chromosomes over each other and producing results that clearly followed progressive patterns (thanks to substantive debugging). Once the functionality for crossing over was written into the program, work on the heuristic could begin.

The heuristic is based on the relative frequencies of each alphabetical character in the English language [4] and the most commonly occurring words in the language. [5] It analyzes the translations produced by a chromosome and calculates how far the relative character frequencies of the translations deviate from those of general English. Priority values are assigned to the translations that deviate the least in terms of character frequency and also have the most identifiable English words.

The word-search function was the most difficult part of the program to write, not because the goal was uncertain, but because it introduced an entirely new problem to the project: that of organizing an extensive list of data into the program's infrastructure in such a manner that it could then be used to analyze the program's translations. I finally settled on a linked-list structure after a long period of flailing with a very error-prone multi-dimensional array.

Results

Results at this point are very basic; I have just recently completed the word-search function and have yet to use it heavily in tandem with the genetic algorithm. Appendix 2 shows some sample results. The first part demonstrates the character-frequency analysis function: it shows five false translations of the Beale Cypher and gives their variances. The variance is defined in the program as the sum of the percentage points by which the frequency of each letter, relative to the complete translation, varies from the ideal percentage given in Singh; a lower number is better. The second part shows the word-search function, only recently completed.
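The variance defined above can be written down directly. The following is a sketch rather than the author's code: `frequencyVariance` is a hypothetical name, and the table of ideal percentages is passed in by the caller because the values from Singh are not reproduced in this text.

```cpp
#include <array>
#include <cctype>
#include <cmath>
#include <string>

// Sum, over the 26 letters, of the absolute difference in percentage points
// between each letter's share of the translation and its ideal share.
// `idealPercent[0]` is the ideal percentage for 'a', and so on.
// Non-letters are ignored and case is folded. Lower is better.
double frequencyVariance(const std::string& text,
                         const std::array<double, 26>& idealPercent) {
    std::array<int, 26> counts{};
    int letters = 0;
    for (unsigned char ch : text) {
        if (std::isalpha(ch)) {
            ++counts[std::tolower(ch) - 'a'];
            ++letters;
        }
    }
    double total = 0.0;
    for (int i = 0; i < 26; ++i) {
        // With no letters at all, every observed share is treated as 0%.
        double observed = letters > 0 ? 100.0 * counts[i] / letters : 0.0;
        total += std::fabs(observed - idealPercent[i]);
    }
    return total;
}
```

A translation whose letter distribution matches the ideal table exactly scores 0; each percentage point of mismatch in any letter adds one point to the score.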

The results shown include output from the debugging code I planted in the program. They show the program going through the sample translation two characters at a time. Every exclamation point means that the program has identified the most recently displayed two-character combination as possibly beginning a word. An actual word showing up after an exclamation point indicates that the word has been identified in the text and that the program has added a point to the translation's heuristic score.
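The two-character scan described above might look like the sketch below. It is an illustration under assumptions: the author stored the word list in a linked list, whereas this version swaps in a `std::unordered_map` keyed by each word's first two letters; `PrefixIndex`, `buildIndex`, and `wordScore` are hypothetical names, and one-letter words are left out for simplicity.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Maps a two-letter prefix to every listed word that starts with it.
using PrefixIndex = std::unordered_map<std::string, std::vector<std::string>>;

PrefixIndex buildIndex(const std::vector<std::string>& words) {
    PrefixIndex index;
    for (const std::string& w : words) {
        if (w.size() >= 2) index[w.substr(0, 2)].push_back(w);
    }
    return index;
}

// Walks the translation two characters at a time. When a pair could begin a
// word, every candidate with that prefix is compared against the text at
// that position, and each full match adds one point to the score.
int wordScore(const std::string& text, const PrefixIndex& index) {
    int score = 0;
    for (std::size_t i = 0; i + 1 < text.size(); ++i) {
        auto hit = index.find(text.substr(i, 2));
        if (hit == index.end()) continue;  // pair cannot begin a word
        for (const std::string& w : hit->second) {
            if (text.compare(i, w.size(), w) == 0) ++score;
        }
    }
    return score;
}
```

Note that overlapping words each count here, so "there" also scores a point for "the" when both are in the list; whether the original program counted overlaps is not stated.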

4 Singh, accessed at Southampton

References

Dobyns, Kenneth. "Beale Codes Were they a Hoax?" Genealogy, History, and Miscellaneous Material. 1984. 25 Jan. 2005 <www.myoutbox.net/bealhome.htm>

Duray, Naranker. "Genetic Algorithms." Imperial College of London: Department of Computing. 1996. 25 Jan. 2005 <www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/tcw2/report.html>

Gantovnik, V.B., Z. Gurdal, and L.T. Watson. "Genetic Algorithm with Memory for Optimal Design of Laminated Sandwich Composite Site Panels." Technical Report, 2002. Computer Science @ Virginia Tech. 21 May 2002. Virginia Polytechnic Institute and State University. 8 Feb. 2005 <eprints.cs.vt.edu:8000/archive/00000555/01/gaCS02a.ps>

Huss, Eric. "The C Library Reference Guide." Webmonkeys: A Special Interest Group at the University of Illinois. 1997. 14 Apr. 2005 <www.acm.uiuc.edu/webmonkeys/book/c_guide/>

Krystek, Lee. "The Beale Cryptograms." The Museum of Unnatural History. 2000. 13 Jan. 2005 <www.unmuseum.org/beal.htm>

Matthews, James. "The Beale Cypher." Generation5. 2003. 13 Jan. 2005 <www.generation5.org/content/2003/beale.asp>

Singh, Simon. The Code Book: The Secret History of Codes and Code-breaking. Unknown: Fourth Estate, 1999.

University of Southampton, ed. Teachers' Notes to Accompany the Lesson Packs. National Cipher Challenge. Southampton, UK: University of Southampton, 2003. National Cipher Challenge. 1 Oct. 2003. University of Southampton. 25 Jan. 2005 <www.cipher.maths.soton.ac.uk>

"The 500 Most Commonly Used Words in English." World English: Test, Learn, and Study the English Language Online. 22 Aug. 2001. 25 Jan. 2005 <www.world-english.org/english500.htm>

Creating a 3D Game With a Study of OpenGL Textures and Lighting Techniques

The goal is to create a first-person 3D game using OpenGL. The program uses models, textures, lighting, and polygons to create a 3D world in OpenGL. Various equations are used to calculate camera angles, movement, and physics. For example, to move the camera, “eye movements” are controlled by gluLookAt, which takes an eye position (x, y, z), a center point that the eye looks toward, and an up vector.

Creating a 3D Game With Textures and Lighting

John Fitzsimmons

Abstract

The objective is to create a simple 3D first-person shooter using OpenGL and C++. To achieve this goal, I will need to implement textures, lighting, 3D shapes, polygons, and basic C++ structures.

II. Introduction

The problem of creating a 3D first-person shooter from scratch is a large task. To accomplish it, I am breaking the problem down into simple blocks, such as getting a camera to work so that I can view the world from a person's perspective and move around.

I chose this project because computer games interest me immensely, and creating one of my own would give me some experience and pleasure, even if the result is very primitive. My project will probably be finished very near the end of the year, as there will always be another upgrade to add. The maximum scope of this program would be to have multiple users playing on the same computer or over a network, with models of the users and of different objects. Multiple weapons would be available to pick up, and there would be different levels to play. However, neither the multiple-user aspect nor the models will be achievable by the end of the year.

III. Theory, Discussion, and Workplan

I will go over the two main features that I have completed so far: the camera and collisions. The camera is the user's point of view. Using keyboard input to change velocity, the user moves around. The largest task was to create a realistic system that uses mouse input to rotate the camera up, down, and side to side. To do this, horizontal and vertical mouse movement is measured in pixels on each iteration and used to modify two angles, one for up and down and the other for side to side.
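The mouse-to-angle step might be sketched as follows. This is a guess at the mechanism, not the author's code: `CameraAngles` and `applyMouseDelta` are hypothetical names, the sign conventions (moving the mouse right decreases the leftward angle, moving it down increases the downward angle) are assumptions, and the clamp that stops the view from flipping over at straight up or straight down is an addition that the source does not mention.

```cpp
#include <algorithm>
#include <cmath>

struct CameraAngles {
    double left = 0.0;  // radians; rotation to the left about the vertical axis
    double down = 0.0;  // radians; positive when looking down
};

// Converts one frame's mouse movement, in pixels, into changes in the two
// view angles. `radiansPerPixel` is a sensitivity chosen by the caller.
void applyMouseDelta(CameraAngles& cam, int dxPixels, int dyPixels,
                     double radiansPerPixel) {
    const double pi = std::acos(-1.0);
    const double maxPitch = pi / 2.0 - 0.01;  // stay just short of vertical
    // Wrap the horizontal angle into [-pi, pi] so it never grows unbounded.
    cam.left = std::remainder(cam.left - dxPixels * radiansPerPixel, 2.0 * pi);
    // Clamp the vertical angle so the camera cannot roll over backwards.
    cam.down = std::min(maxPitch,
                        std::max(-maxPitch, cam.down + dyPixels * radiansPerPixel));
}
```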

Using both of these angles, I calculate a unit vector in the opposite direction the user is looking to pass into an OpenGL function. The distance the user looks up is the sine of the vertical angle. The maximum distance forward is the cosine of this angle. The forward distance is further broken up into x and z components. The x component is the maximum distance multiplied by the sine of the horizontal angle, and the z component is the maximum distance multiplied by the cosine of that angle.

As a result, my code looks like this, with the unit vector scaled to a length of 2:

    temp=2*cos(angleDown);
    centerX=eyeX-temp*sin(angleLeft); // puts center point 2 units
    centerZ=eyeZ-temp*cos(angleLeft); // in front of the eye
    centerY=eyeY-2*sin(angleDown);

An up vector is generated in a very similar way to keep the camera right side up.

The other large chunk of code is collisions. My collisions are very simple. When a projectile hits an object, it stops and moves back a little (so it doesn't get stuck in the object). The acceleration due to gravity then pulls the projectile down. If it hits the ground or the top of an object, it just stops and stays.

After a certain amount of time, the projectile will be deleted. That way, there won't be hundreds of projectiles bogging down the processor. To detect collisions, I check whether the projectile is inside the bounds of the object plus the radius of the projectile. This currently only works for rectangular prisms or flat rectangles. Circles and spheres are also workable with this algorithm. To handle more complicated objects, multiple smaller, undrawn objects will have to be used to approximate the larger shape and achieve accurate collisions.
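The bounds-plus-radius check can be sketched as below. This is a minimal illustration assuming axis-aligned objects; `Box` and `hitsBox` are hypothetical names, not identifiers from the author's program.

```cpp
// An axis-aligned rectangular prism. A flat rectangle is the special case
// where the minimum and maximum agree on one axis.
struct Box {
    double minX, minY, minZ;
    double maxX, maxY, maxZ;
};

// Treats the projectile as a sphere of the given radius centred at
// (px, py, pz). The box is grown by the radius on every side, and the test
// reduces to asking whether the centre point lies inside the grown box.
bool hitsBox(double px, double py, double pz, double radius, const Box& b) {
    return px >= b.minX - radius && px <= b.maxX + radius &&
           py >= b.minY - radius && py <= b.maxY + radius &&
           pz >= b.minZ - radius && pz <= b.maxZ + radius;
}
```

Growing the box this way is slightly generous near edges and corners, where it reports a hit a little before the sphere actually touches, which is usually acceptable for a simple game.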

Another large piece of the program will be textures. I will have to load in a texture for each different type of surface I am going to use. To do this I must read a bitmap image into memory. I do this by first reading in the bitmap header which contains information about the file, and then I read in 4 values for every pixel. Each value is the size of a byte, so I read these values into an array of unsigned chars. This array is then plugged into a texture function (glTexImage2D) which stores the image in video memory. I then delete the array from memory.
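As a rough sketch of the loading step, the code below parses an uncompressed bitmap that is already in memory and copies out its pixel bytes. It rests on several assumptions: the file uses the common 14-byte file header followed by a 40-byte info header, pixels are stored at 32 bits each (matching the four values per pixel mentioned above), and the height field is positive. `Bitmap`, `readU32`, and `parseBitmap` are hypothetical names, and the 16384-pixel size limit is an arbitrary safety bound. The hand-off to glTexImage2D is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Bitmap {
    std::uint32_t width = 0;
    std::uint32_t height = 0;
    std::vector<unsigned char> pixels;  // 4 bytes per pixel, in file order
};

// Reads a little-endian 32-bit value starting at byte `at`.
std::uint32_t readU32(const std::vector<unsigned char>& d, std::size_t at) {
    return static_cast<std::uint32_t>(d[at]) |
           (static_cast<std::uint32_t>(d[at + 1]) << 8) |
           (static_cast<std::uint32_t>(d[at + 2]) << 16) |
           (static_cast<std::uint32_t>(d[at + 3]) << 24);
}

// Returns true and fills `out` if `file` holds a 32-bit bitmap.
bool parseBitmap(const std::vector<unsigned char>& file, Bitmap& out) {
    const std::size_t headerBytes = 54;  // 14-byte file header + 40-byte info header
    if (file.size() < headerBytes || file[0] != 'B' || file[1] != 'M') return false;
    const std::uint32_t dataOffset = readU32(file, 10);  // where pixel data starts
    const std::uint32_t width = readU32(file, 18);
    const std::uint32_t height = readU32(file, 22);
    const unsigned bitsPerPixel = file[28] | (file[29] << 8);
    if (bitsPerPixel != 32 || width == 0 || height == 0 ||
        width > 16384 || height > 16384) {
        return false;
    }
    const std::uint64_t bytes = static_cast<std::uint64_t>(width) * height * 4;
    if (dataOffset > file.size() || bytes > file.size() - dataOffset) return false;
    const auto first = file.begin() + static_cast<std::ptrdiff_t>(dataOffset);
    out.width = width;
    out.height = height;
    out.pixels.assign(first, first + static_cast<std::ptrdiff_t>(bytes));
    return true;
}
```

Checking the declared size against the actual file length before copying is one way to rule out a misread header when a texture fails to appear.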

For some reason, the textures I read in are not staying in video memory, even though I have debugged the loader and found that I interpreted the bitmap correctly. Because I have decided to push back fixing this problem until later, textures will only be completed if time permits. The other main aspect of collisions is the actual objects. I have created an object "class" and I keep a list of pointers to these objects. When a destroyable object gets hit enough, it will be destroyed and I will take it out of the list so it isn't drawn and doesn't use processing power. Currently, I am working on a system to read in objects and put them in the list.

When I finish, I will be able to load an environment from a text file, allowing for multiple levels. After accomplishing multiple text-input levels, I will work on a HUD (heads-up display) which will show life, time, and score. I will also add different weapons, each causing a different amount of damage, that the player can pick up in the level. Lighting will be added last, including a muzzle flash effect and lighting throughout the level. My basic system of operations is to code, test, and debug. I do this for every single aspect of the project. When I added the camera code, I had to fix a lot of errors.

For example, I found out on the opengl.org site that the "center" values determine a point a unit away from the camera in the direction it is looking rather than a unit vector. The "up" values actually do determine a unit vector in the "up" direction. Only I am working on this project. I need no money and therefore have no budget. The only equipment I am using is a computer, with OpenGL and C++. I may also use a graphical modeler later on. I have mainly used the OpenGL website to get all of the information for my project.

However, as I am now getting into some more complicated aspects (not just the camera and drawing basic objects), I will need more and more sources. I have begun to research texturing using sources outside the opengl.org site. I will have to research elsewhere for most of the remaining tasks as well. Tasks to be completed include adding textures, lighting, additional objects, and a more complicated environment.

Sources:

OpenGL - The Industry Standard for High Performance Graphics. http://www.opengl.org/. January 27, 2005.

Spacesimulator.net - OpenGL Texture mapping. http://www.spacesimulator.net/tut3_texturemapping.htm. January 26, 2005.

OpenGL Texture Mapping : An Introduction. http://www.gmonline.demon.co.uk/cscene/CS8/CS802.html. January 23, 2005.

Paintball Frenzy!

Minimax Agent AI

The purpose of this project is to create an innovative and enjoyable graphical game and program a minimax AI agent that performs optimally.

Paintball Frenzy: Graphical Turn-Based Game With a Minimax AI Agent

Christopher Goss

0.1 Abstract

Paintball Frenzy is a unique turn-based game of my creation. The game is played on a four-by-four grid that is initially a neutral grey color. Two to four players, each either human or computer controlled, each possess one colored paintball that originates at one of the board's four corners. The players take turns moving about the grid one horizontal or vertical space at a time.

Players also have the option to jump over one player's paintball in an adjacent space. Each player must choose a move within a time limit. Whenever a paintball moves into a space on the grid, that space becomes the color of that paintball. Whichever player has the most colored spaces at the end of the turn limit wins the game. My final project was the creation of this game, Paintball Frenzy. First, I had to invent the game and all of its particularities and balance issues.
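The basic rules above (a four-by-four grid, single-space moves, painting on arrival, and counting spaces to decide the winner) can be sketched as follows. This is an illustration, not the author's code: `Board`, `movePaintball`, and `countSpaces` are hypothetical names, jumping over another paintball is left out, and two paintballs are allowed to share a space because the source does not say how that case is handled.

```cpp
#include <array>

constexpr int kSize = 4;  // the grid is four by four

struct Board {
    // 0 means neutral grey; any other value is the number of the player
    // whose color the space currently shows.
    std::array<std::array<int, kSize>, kSize> paint{};
};

// Moves a paintball by (dRow, dCol) and paints the destination space.
// Returns false, changing nothing, unless the move is exactly one
// horizontal or vertical step that stays on the grid.
bool movePaintball(Board& board, int player, int& row, int& col,
                   int dRow, int dCol) {
    const bool oneStep = ((dRow == 0) != (dCol == 0)) &&
                         dRow >= -1 && dRow <= 1 && dCol >= -1 && dCol <= 1;
    const int r = row + dRow;
    const int c = col + dCol;
    if (!oneStep || r < 0 || r >= kSize || c < 0 || c >= kSize) return false;
    row = r;
    col = c;
    board.paint[r][c] = player;
    return true;
}

// Counts the spaces showing the given value; the player with the most
// spaces when the turn limit is reached wins.
int countSpaces(const Board& board, int player) {
    int total = 0;
    for (const auto& line : board.paint)
        for (int cell : line)
            if (cell == player) ++total;
    return total;
}
```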

Of course, I had to program this game's framework, first with a standard terminal display and keyboard input, and later with graphical gameboard display and mouse controls. Additionally, I had to program several artificial intelligence agents for my game including an agent that moves randomly and an agent that performs optimally. To design the agent that moves optimally, I researched the minimax algorithm and used it to search through all the possible game-choice options and choose the best one. Combining all of these aspects together yielded a fully enjoyable and informative senior research project.

0.2 Introduction and Background

I am pleased to introduce Paintball Frenzy in all its glory. Originally, I had a massively ambitious project proposal: to design a computer programming language specifically for the creation of video games. I was quick to dismiss this proposal towards the end of my junior year. Also in my junior year, I took the Artificial Intelligence class and discovered my deep interest in the Artificial Intelligence field and in the difficult minimax algorithm in particular (I was never able to master it in the class).

Therefore, I was quick to consider the minimax algorithm for my final project. But I wanted to do more than just Artificial Intelligence; I also wanted to design my own game. So I did, and Paintball Frenzy was born. Background on game design.

Background on graphics.

Background on game AI.

0.3 Theory

My research was a three-pronged process. First, I chose to formulate an innovative turn-based game.

Game Design Theory. 1 Graphics Theory

Initially, I delved into my artificial intelligence research with impressive haste. I acquired a green artificial intelligence book from Mr. Torbert and I've been researching the minimax algorithm ever since. The turn-based, deterministic style of my program makes the minimax algorithm ideal. However, the multiplayer aspect of the game will greatly slow down the algorithm and will most likely force it to use a cut-off test before the entire tree has been searched. This means, unfortunately, that at the faster time limits, my AI agent won't operate optimally.

More AI Theory.

0.4 Design Criteria

The progress of my project is three-pronged. In the first quarter, I researched game design and programmed a working version of Paintball Frenzy without graphics or an AI agent. I used C++ to program. My Main function first initializes global variables like the board matrix. It then calls functions to display the title and generate the menus. After that, Main runs a do-while loop until the number of turns has expired. In this loop, Main calls functions to display the board and prompt for the current player's move. Then Main repeats the loop with the next player, incrementing the turn counter as required.

After the loop, Main calls a function to evaluate and display the results of the game. In the second quarter, I used OpenGL graphics to make my program more aesthetically pleasing. I needed to make several design modifications to allow for a graphical interface. My main function is now only responsible for initializing variables and initiating the OpenGL display cycle. My display function (RunGame) regulates the game loop with a series of if-else statements that assess the current game progress via the mouse clicking history.

In my main game loop, I added a second set of if statements to track the state of each particular turn. First, I start a timer, then I get mouse input until the timer has expired. At the end of the time, I execute the player's move and signal the change in player or turn. I store the mouse clicking history by saving phases of the game as integers in a global variable. I also designed DispBoard to display the gameboard in the left part of the window. The upper right side displays the current turn and player.

Below that is a directional pad that records mouse input. I created three auxiliary functions to handle mouse clicking, the mouse menu, and text output. I also designed the menu screen layout to allow for alterable data and buttons. The second semester is consumed with my attempts to add an AI agent to my program. This includes research into game AI and, more specifically, the minimax algorithm. Techniques such as alpha-beta pruning were also researched. In the end, an optimized AI agent was produced. I've redesigned my RunGame function to call the AgentMove function at the start of its time limit to allow for the most time.

I designed the first version of a recursive minimax function that I call maximax. My AgentMove function now simply sets some variables and calls maximax to get the new board position. First, maximax tests the turn limit and time limit constraints. If the time is up or the simulated game is over, the program calls the terminal function entitled EvalFunc. Next, I increment the simulated player or turn to delve further into the maximax tree. Then, I determine if the four directional moves are possible, and recurse in each of the legal directions. Then, I decide which one is the best move, and return the changed board. I also programmed the EvalFunc.
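The recursion described above might be sketched as follows. This is an interpretation, not the author's maximax: it assumes that each simulated player picks the move that maximizes that player's own count of painted spaces, it uses a fixed search depth in place of the turn and time limit tests, it ignores jumps and shared spaces, and the `State` type and the `evaluate` function (standing in for EvalFunc) are hypothetical.

```cpp
#include <array>
#include <vector>

constexpr int kSize = 4;

struct State {
    std::array<std::array<int, kSize>, kSize> paint{};  // 0 = grey, else player number
    std::vector<std::array<int, 2>> pos;  // pos[p] = {row, col} of player p's paintball
    int toMove = 0;                       // index of the player whose turn it is
};

// Stand-in for the terminal evaluation: spaces painted by each player.
std::vector<int> evaluate(const State& s) {
    std::vector<int> score(s.pos.size(), 0);
    for (const auto& line : s.paint)
        for (int cell : line)
            if (cell > 0 && cell <= static_cast<int>(score.size())) ++score[cell - 1];
    return score;
}

// Looks `depth` moves ahead and returns the scores that result when every
// player chooses the direction that is best for themselves. If `bestDir` is
// given, it receives the chosen direction at this level (0 = up, 1 = down,
// 2 = left, 3 = right), or -1 when the search stops here.
std::vector<int> maximax(const State& s, int depth, int* bestDir = nullptr) {
    static const int dRow[4] = {-1, 1, 0, 0};
    static const int dCol[4] = {0, 0, -1, 1};
    if (bestDir) *bestDir = -1;
    if (depth == 0) return evaluate(s);
    std::vector<int> best;
    for (int d = 0; d < 4; ++d) {
        const int r = s.pos[s.toMove][0] + dRow[d];
        const int c = s.pos[s.toMove][1] + dCol[d];
        if (r < 0 || r >= kSize || c < 0 || c >= kSize) continue;  // off the grid
        State next = s;
        next.pos[s.toMove] = {r, c};
        next.paint[r][c] = s.toMove + 1;
        next.toMove = (s.toMove + 1) % static_cast<int>(s.pos.size());
        std::vector<int> result = maximax(next, depth - 1);
        if (best.empty() || result[s.toMove] > best[s.toMove]) {
            best = result;
            if (bestDir) *bestDir = d;
        }
    }
    return best.empty() ? evaluate(s) : best;
}
```

Because every level tries up to four moves, the work grows roughly as four to the power of the depth, which is why a cut-off is needed at short time limits.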

0.5 References

(this is merely a reference list right now, not a bibliography)

http://www.latex-project.org/

http://www.vancouver.wsu.edu/fac/peabody/game-book/Coverpage.html

http://www.msoe.edu/eecs/cese/resources/stl/string.htm

http://www.tjhsst.edu/~rlatimer/

http://www.opengl.org

http://www.tjhsst.edu/~rlatimer/assignments2004/independenceday.txt

The AI book

http://ai-depot.com/LogicGames/MiniMax.html

Modeling Evolutionary Behavior

The purpose of this project is to attempt to model evolutionary behavior in agents in an environment by introducing traits and characteristics that change with the different generations of agents. I hope to create an environment where certain agents will prosper and reproduce while others will have traits that negatively affect their performance. In the end, a single basic agent will evolve into numerous subspecies of the original agent and demonstrate evolutionary behavior.

Modeling Evolving Social Behavior

Stephen J. Hilber

1 Abstract

With the creation of Epstein and Axtell's Sugarscape environment, increasing emphasis has been placed on the creation of "root" agents - agents that can each independently act and interact to establish patterns identifiable in our everyday world. Models created for traffic patterns and flocking patterns confirm that these conditions are caused by each participating agent trying to achieve the best possible outcome for itself.

The purpose of this project is to attempt to model evolutionary behavior in agents in an environment by introducing traits and characteristics that change with the different generations of agents. Using the modeling package MASON, programmed in Java, I will be able to create an environment where agents will pass down their genetic traits through different generations. By adding certain behavioral traits and a common resource to the agents, I hope to create an environment where certain agents will prosper and reproduce while others will have traits that negatively affect their performance.

In the end, a single basic agent will evolve into numerous subspecies of the original agent and demonstrate evolutionary behavior. This project will show that agents which possess the capability to change will change to better fit their environment.

2.1 Body Matter Introduction

Computer modeling, simply defined, is the process of programming the conditions of an environment into a computer and adjusting parameters of the model to see how they affect the results of the model. Today, though, many computer models are merely used to verify behavior that we already suspect is accurate. Many models created today focus on topics such as disease, population growth, and traffic, where the results of the models were already well known.

The models are used primarily to see how adding, subtracting, or otherwise changing any parameters in the environment affects the outcome of the groups in the environment or the environment as a whole. This opens the doors for many scientists and researchers to simple, cost-efficient experimentation. For example, analyzing traffic patterns requires a large commitment to observation and analysis over months and even years. Such research is not undertaken lightly. With computer modeling, this information can be programmed into a model and used to estimate how the real-world system will react to changes in its stimuli.

Also, computer modeling allows us to perform "experiments" in areas of science where experimentation is not normally possible. This is especially true in astronomy, where the sheer vastness of space and our relative insignificance to the universe mean that experimentation is simply not possible. You can't reset the universe and watch over five billion years of history. However, computer models of planetary orbits and systems allow us to try to find explanations for phenomena we have observed.

In short, the majority of computer modeling is used to verify existing theories. Social behavioral patterns, although well-researched in general, have thus far been used to demonstrate how one individual interacts with his social environment as a whole. This project, based on such personality research projects as Dr. John A. Johnson's IPIP-NEO and using evolutionary behaviors first established in Epstein and Axtell's famous Sugarscape models, attempts to use computer modeling as a form of research in and of itself into the evolutionary aspects of social behavior.

In particular, my project attempts to analyze the behavioral patterns of introverted agents and extroverted agents over a long period of time. With any luck, insights will be made into this exciting, riveting, and brilliant field of the human psyche.

2.2 Background

Conway's Game of Life was the first prominent agent-based model. Each cell was an "agent" that contained either a 0 or a 1 (dead or alive) depending on how many living neighbors it had, and acted independently of the environment. Conway's Game of Life didn't lead to any profound insights, but it did pave the way for future agent-based modeling.
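For readers unfamiliar with the Game of Life, one generation can be computed as in the sketch below, which uses the standard rules: a live cell survives with two or three live neighbors, and a dead cell comes alive with exactly three. The `Grid` type and `lifeStep` function are names chosen for this illustration, and treating everything beyond the edge of the grid as dead is an assumption made to keep the example small.

```cpp
#include <vector>

using Grid = std::vector<std::vector<int>>;  // 1 = alive, 0 = dead

// Computes the next generation. Every cell looks only at its eight
// neighbors; nothing coordinates the cells from above.
Grid lifeStep(const Grid& g) {
    const int rows = static_cast<int>(g.size());
    const int cols = rows > 0 ? static_cast<int>(g[0].size()) : 0;
    Grid next(rows, std::vector<int>(cols, 0));
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            int neighbors = 0;
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    if (dr == 0 && dc == 0) continue;  // skip the cell itself
                    const int nr = r + dr;
                    const int nc = c + dc;
                    if (nr >= 0 && nr < rows && nc >= 0 && nc < cols)
                        neighbors += g[nr][nc];
                }
            }
            const bool alive = g[r][c] == 1;
            next[r][c] = (neighbors == 3 || (alive && neighbors == 2)) ? 1 : 0;
        }
    }
    return next;
}
```

A horizontal row of three live cells turns into a vertical column of three on the next step and back again on the step after, a simple example of a pattern emerging from purely local rules.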

The advantage of agent-based modeling, as many found out, is that it did not assume prior conditions. It was a method of building worlds "from the bottom up", where independent agents were able to create complex worlds without any overseers. One popular psychological game, Prisoner's Dilemma, spawned a series of games where agents tried to maximize their outcome, often at the expense of other agents. Eventually, these agent-based models were incorporated in studies of flocking. The models created to show flocking behavior in birds did not incorporate flock leaders, as many presumed. Instead, the birds all acted for their own best interests, and directions and resting points were chosen as compromises of sorts.

Using the theory that independent agents can create organized structures such as flocks, Epstein and Axtell created the Sugarscape world in an effort to discover if social behaviors and human characteristics could emerge through independent actions. The Sugarscape model had agents able to breed, fight, trade, and die, and the core of the model was the resource sugar. Each agent had a metabolism rate which burned off its sugar; if it ran out of sugar, the agent died. Instead of isolated behavior, however, the agents soon used their resources to work together.

Agents shared sugar, sent "scouts" to gather sugar for the benefit of all, engaged in wars, and in general exhibited a startlingly large number of human behavioral characteristics. When spice, a second resource with its own metabolism rate, was introduced into the world of Sugarscape, trade emerged as agents tried to meet their needs as best they could - and tried to get the best deal as a result. These behaviors are surprising, but ultimately show the value of agent-based modeling and the useful insight it can provide.

2.3 Theory

In order to simulate evolutionary behavior in an agent-based system, the agents need to simulate the real world as much as possible. In actual evolution (as described in computer science terms), two agents of opposite sex combine their genetic information at random to generate offspring with half of each parent's traits. Genetic mutations also happen at random, causing new traits that neither parent had in their genetic code. This gradual evolution creates swarms of different agents, and those agents that are best suited for their environments will be best able to survive and reproduce.
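The combination-and-mutation step can be sketched as follows. This is a hypothetical illustration rather than the project's actual code: traits are assumed to be numbers between 0 and 1 stored in a dictionary, and the function name `make_offspring` and the default `mutation_rate` are invented for the example. Each trait is drawn from one parent at random, so a child inherits about half of its traits from each parent on average.

```python
import random


def make_offspring(parent_a, parent_b, mutation_rate=0.01, rng=random):
    """Combine two parents' traits at random, then mutate occasionally."""
    child = {}
    for trait in parent_a:
        # Inherit this trait from one parent, chosen with equal probability.
        child[trait] = rng.choice((parent_a[trait], parent_b[trait]))
        # Rarely, replace it with a fresh value that neither parent carried.
        if rng.random() < mutation_rate:
            child[trait] = rng.random()
    return child
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run repeatable, which helps when comparing runs.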

Of course, several different portions of the environment could be home to different "breeds" of agents, and these different breeds could live alongside each other in separate societies. It is this phenomenon that this project is trying to recreate. By closely following the rules of genetics, the project should be able to show several different breeds of agents thriving, all descended from a single agent type. Instead of passing on dominant and recessive genes, however, this project opts for a higher-level approach by using characteristics such as extraversion as the "genetic currency". While the human genome has tens of thousands of genes to determine these characteristics, such attention to detail is impossible and unnecessary for this project.

By changing characteristics such as cooperation and extraversion on a slider, agents' traits will gradually change as they are passed down from generation to generation. This effectively simulates actual genetic activity, and is thus effective for this project. Agents will breed, die, and interact, eventually changing the genetic code of their societies to suit their needs. The main change in the program will revolve around the gradual changes between extraversion and introversion. Over time, each agent type will breed with other agents. The population will then evolve to identify the most effective agent type.

Extraverted agents can band together and reproduce en masse, while introverted agents can focus on preserving self above any community good. In the end, the program will randomly create worlds of extraverted and introverted agents attempting to establish control over their environment.

2.4 Design Criteria and Procedures

The program is built according to a three-level structural system: the environment, the agent, and the graphical interface. The environment creates the world that the agents live and interact in, tells the agents when to act, and takes care of all background processes along the way.

The agents are able to interact with each other and move around the environment, and form the basis of the research of this project. The graphical interface takes the environment and the agents and uses that information to display everything graphically so that users can observe the simulation. In the early phases of this project, the levels of the program were slowly implemented. Eventually, a functional interface emerged. In the middle and end phases, new characteristics are added to the agents one at a time.

Each new characteristic (such as extraversion) goes through a week of design and a week of rigorous testing. Such a structured and precise system allows me to identify aspects of my program that had previously eluded me; I found out, for instance, that the program lets agents share spaces, so extraverted agents will appear to be fewer in number to the untrained eye. After careful research, development, and debugging, I have restructured my program so that every new feature is implemented in its own method. This allows new features to be implemented and debugged without worrying about corrupting other sections of code.

At first, the MASON interface was reluctant to incorporate such a structure; however, it is now well established and able to handle any aspects of my program that I wish to develop. In my current project, much of my time is spent analyzing the various runs of the program. The current program now shows the various interactions between extraverted and introverted agents. Already, extraverted agents have been observed to crush the introverted agents inside their groups without any external prompting. This demonstrates the previously mentioned control battles between these two agent types, and is already leading to some keen insights into the relationships between introverts as a group and extraverts as a group. As fourth quarter approaches, I hope to update the graphical interface, complete debugging, and finally create a simulation capable of generating information that will help us understand the relationships between two different social groups.

3 End Matter

3.1 Sources

Credit goes to Conway for The Game of Life, Epstein and Axtell for Sugarscape, the MASON team for developing MASON, and Dr. John A. Johnson for the IPIP-NEO.

3.2 Code

...

Techniques of Asymmetric File Encryption

Encryption programs have been created to protect privacy during a transfer of files and to make sure that sensitive files will be protected. My project is to create an asymmetric file encryption program. This means that encrypted files will need a pass-key to open that will be different from the key used to encrypt. This program could be applied practically to protect files during transfers.

Techniques of Asymmetric File Encryption

Alvin Li

Abstract

As more and more people are linking to the Internet, threats to the security and privacy of information continue to grow. To solve this growing problem, encryption programs have been created to protect privacy during a transfer of files and to make sure that sensitive files will be protected. My project is to create an asymmetric file encryption program. This means that encrypted files will need a pass-key to open that will be different from the key used to encrypt. This program could be applied practically to protect files during transfers.

Introduction and Background

Over the summer, I read a book that explained the history of encryption. Since before the times of Caesar and the Roman empire, encryption has been used to keep secrets secret. In the modern world, file encryption is now used almost everywhere. Whether you are transferring files, compressing files, or formatting databases, you must use modern file encryption techniques. There is so much private information, such as social security numbers, credit card numbers, bank-account information, or private correspondence, that all needs to be protected. In order to fully understand my project, you must know a little about encryption first.

There are many categories of encryption, such as key encryption, block encryption, and stream encryption. The one that I will focus on, since it is the category of my algorithm, is key encryption. The basic concept behind key encryption is simple. The user runs the algorithm on the desired file and the file becomes encrypted with the key. In order to decrypt the file, the user must input the correct password, which translates to the correct key. This method of encryption is very powerful because the algorithm is dependent on the key, which is arbitrarily defined. Since each key is unique, each encryption is unique as well.

If we used a set, standard algorithm, as soon as a hacker figured out the algorithm, he could break every code created by that algorithm. But with a good key encryption program, knowing the algorithm does not help. You must have the correct key. The ultimate goal of any key encryption program is to make the encrypted file impossible to decode without the key. This will force the intruder to use the brute force method, which is to try every single possible pass-key. This is highly difficult to do because, even with a supercomputer, the number of possible keys is so great that it would take years to finish. So, with a good key encryption program, you can create practically unbreakable codes. Key encryption algorithms are separated into two groups as well. There are public key and symmetric key encryption programs. There are also two categories of keys, asymmetric and symmetric keys.
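A quick calculation shows why brute force becomes impractical as keys grow. The sketch below is illustrative only: the guess rate of one trillion keys per second is an assumed figure, and the function name `years_to_exhaust` is invented for the example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_exhaust(key_bits: int, guesses_per_second: float) -> float:
    """Years needed to try every key of the given length at a fixed guess rate."""
    return (2 ** key_bits) / guesses_per_second / SECONDS_PER_YEAR


# Compare a few key lengths at the assumed rate of 1e12 guesses per second.
for bits in (40, 56, 128):
    print(f"{bits}-bit key: {years_to_exhaust(bits, 1e12):.3g} years")
```

Each additional bit doubles the work. At the assumed rate a 56-bit key space is exhausted in under a day, while a 128-bit key space would take on the order of 10^19 years.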

When using symmetric keys, the key used to encrypt is the same key used to decrypt the file, so there is only one key. Symmetric key encryption uses this form of key. Public key encryption uses different keys to encrypt and decrypt. The key used to encrypt is called the public key, and it can be shared freely. The private key, held by the user or the computer the file is being transferred to, is kept secret and is the only key that can decrypt the file. The keys are said to be asymmetric. File encryption is commercially used in almost every widely used application. Research into this subject is very extensive. Programmers are constantly inventing stronger, faster, and more effective algorithms.

Some of these are very efficient, able to not only encrypt, but also compress files. The military also has a large interest in this field. Although their algorithms are generally much safer, they are slower and harder to implement. Some of the popular encryption algorithms I have studied include RSA, DES, and AES. RSA is a classic public key encryption algorithm created in 1977. Its public and private keys are derived from two very large prime numbers. Since factoring the product of two large primes is computationally infeasible, this method is very safe and easy to use. It is one of the most popular forms of key encryption used today. AES, the Advanced Encryption Standard, also known as Rijndael, is considered one of the strongest algorithms to date. So far, no one has found a way to easily crack it.
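The idea behind RSA can be shown with deliberately tiny numbers. The sketch below is a textbook toy, not a secure implementation and not the author's program: real keys use primes hundreds of digits long plus padding, and the function names `make_keys` and `apply_key` are invented for the example.

```python
def make_keys(p: int, q: int, e: int):
    """Build a toy RSA key pair from two primes p and q and a public exponent e."""
    n = p * q                    # the modulus, shared by both keys
    phi = (p - 1) * (q - 1)      # easy to compute only if p and q are known
    d = pow(e, -1, phi)          # private exponent: modular inverse of e (Python 3.8+)
    return (e, n), (d, n)


def apply_key(message: int, key) -> int:
    """Encrypt with the public key, or decrypt with the private key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


public, private = make_keys(61, 53, 17)
cipher = apply_key(65, public)       # anyone holding the public key can encrypt
plain = apply_key(cipher, private)   # only the private key recovers the message
```

An attacker who sees only the public key `(e, n)` must factor `n` into `p` and `q` to recover `d`, which is trivial for 3233 but impractical for the key sizes used in practice.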

The main task of my project is to create an encryption program. Specifically, the program will take a file as an input and create an encrypted copy of the file in an executable form. When the encrypted file is run, there will be a prompt asking for a pass-key. If the user inputs the correct pass-key, the file will self-decrypt and transform back to the original file. My program will use a block cipher method of encryption. This means that data will be encrypted in 16, 32, or 128 bit blocks. This method is faster than taking every bit, as in a stream cipher. I expect to get a version of this program running and put my finished product on the Internet for download.
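Before a block cipher can run, the input has to be cut into equal-sized blocks, with the last block padded to full length. The sketch below shows only that splitting step, assuming 128-bit (16-byte) blocks and a padding rule in the style of PKCS#7; it performs no encryption, and the function names are invented for the example.

```python
BLOCK_BYTES = 16   # 128 bits


def pad(data: bytes, block_size: int = BLOCK_BYTES) -> bytes:
    """Append N bytes, each of value N, so the length is a multiple of block_size."""
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


def unpad(data: bytes) -> bytes:
    """Remove the padding added by pad()."""
    return data[:-data[-1]]


def split_blocks(data: bytes, block_size: int = BLOCK_BYTES):
    """Pad the data and split it into blocks of block_size bytes."""
    padded = pad(data, block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]
```

A cipher would then transform each block with the key. Because the padding always records its own length, decryption can strip it and recover the file byte for byte.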

In writing the program, I will use aspects of some popular existing algorithms. By using a combination of methods, I hope to create an algorithm that is stronger than any single one of them.

Procedure and Development

I plan to work on this project in a series of iterations. Each iteration will build off the previous one by adding a function or revising an algorithm. After each iteration is complete, I will thoroughly test the program to see if everything is working. This way, I can easily pinpoint bugs in the program. There will be two goals for the final version of the program.

First, the program must be able to encrypt and decrypt a file. The decrypted file must be an exact copy of the file before encryption. This will be easily determined by comparing both files. Second, the encrypted file must be very difficult to crack. This could be determined by letting hackers try to decipher the code. If it is very difficult to decode, then this goal will be achieved. The only tool I will need for this project will be a computer. The program will be written in C++. If the two criteria listed above are met, then the project can be considered a success. I plan to post a free version of the program on my website for download.
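The first criterion can be checked automatically rather than by eye. One possible approach, sketched here in Python rather than the project's C++, is to compare a cryptographic hash of the original file with a hash of the decrypted file; the function names and the choice of SHA-256 are assumptions for the example.

```python
import hashlib


def sha256_of(path: str) -> str:
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_exact_copy(original_path: str, decrypted_path: str) -> bool:
    """True when the two files have identical contents."""
    return sha256_of(original_path) == sha256_of(decrypted_path)
```

Running this check after every iteration would catch a round-trip bug as soon as it is introduced.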

This way, people can use the program to securely transfer files over the Internet without interference from outside parties.

Results and Conclusion

I am currently still working on the project, so I have not completed this section in this draft.

References

For more current, up-to-date information on my project, visit my website at www.tjhsst.edu/ ali

The following are website URLs of sources I used in researching file encryption:

www.mycrypto.net/encryption/cryptoa lg orithms.html (a good website for details about existing algorithms)
http://catalog.com/sft/encrypt.html

Machine Learning Techniques for Game Playing

Machine learning allows the computer to create its own logical rules, and learn from its past experiences. Machine learning allows an AI to increase its abilities over time, even without additional direct programmer input. My project hopes to develop a proficiency at Tic-Tac-Toe by creating a new algorithm for this relatively simple game. Ideally, this algorithm will be modified according to its results to create better algorithms.

Research Paper for Use of Machine Learning to Develop a Better Artificial Intelligence for Playing Tic-Tac-Toe

Rachel Miller

Abstract

Machine learning is a tool that allows an Artificial Intelligence to learn from past successes and failures, increasing its ability over time. Though there are many different types of machine learning AIs, there is no consensus on the 'best' type of machine learning algorithm; different algorithms have different strengths, such as speed of learning, adaptability, or eventual skill. My project hopes to create several algorithms for a relatively simple game, Tic-Tac-Toe, and then have them compete to discover their relative advantages. Ideally, these algorithms would be modified according to these results to create better algorithms.

1 Introduction

Computers have enormous amounts of memory and processing power available, but programs are only as useful as the logic they follow. This logic is given to the computer in the form of code. Artificial Intelligence aims to make computers better at difficult tasks that typically require a human. Machine learning goes further: it allows the computer to create its own logical rules and learn from its past experiences. Machine learning allows an AI to get smarter, even without additional direct programmer input. My project hopes to play effectively in several two-player games, primarily Tic-Tac-Toe.

1.1 Machine Learning

This project will collect data over time, as the program creates its own library of board position values. It will run a game to its finish between two computers, two humans, or a computer and a human. Each position on the board will be considered for its proximity to the final outcome. For example, a position right before a win would be considered valuable, while a position right before a loss would be considered undesirable. Multiple algorithms will have to be developed for judging the relative merits of a position and assigning a value. These must be tested against each other, or against a third-party algorithm, to judge their relative effectiveness. If the position already exists in the library, the value of the position in this game will be added to the total value for the position.
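One possible valuation rule is sketched below. It is a hypothetical example, not the author's algorithm: the exponential `decay` weighting, the +1/-1/0 outcome coding, and the function name `record_game` are all assumptions. Positions closer to the end of the game receive a larger share of the final outcome, and values for positions already in the library are added to their running totals.

```python
def record_game(library, positions, outcome, decay=0.5):
    """Add one finished game's position values to the library.

    positions: board states in the order they occurred (hashable, e.g. tuples)
    outcome:   +1 for a win, -1 for a loss, 0 for a draw
    """
    last = len(positions) - 1
    for index, board in enumerate(positions):
        # Positions nearer the final move count for more.
        value = outcome * decay ** (last - index)
        # New positions start at zero; known positions accumulate.
        library[board] = library.get(board, 0.0) + value
    return library
```

Changing `decay`, or replacing the weighting altogether, yields the kind of competing valuation algorithms the paragraph describes.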

If the position does not already exist, the algorithm will add it to the library. Board positions are represented as a one-dimensional array of values of 0, 1, and 2. Board positions in the library will be sorted, with primary priority given to the first array value, secondary to the second, and so on. This will make the library easy to search with a binary search algorithm, looking first in the middle of the library, and then going higher or lower accordingly. As the computer moves, it considers each possible move. It looks for each move in the library, and chooses the one it thinks has the greatest value.
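The sorted library and one-ply lookup might look like the following sketch. It is an illustration under assumptions, not the author's C++ program: boards are Python tuples of 0, 1, and 2 (which compare element by element, matching the ordering described above), the library is kept as two parallel lists, unknown positions default to a value of 0.0, and the function names are invented.

```python
from bisect import bisect_left


def find_value(keys, values, board, default=0.0):
    """Binary-search the sorted key list for a board and return its value."""
    i = bisect_left(keys, board)
    if i < len(keys) and keys[i] == board:
        return values[i]
    return default


def insert_position(keys, values, board, value):
    """Add value to an existing board, or insert a new one, keeping keys sorted."""
    i = bisect_left(keys, board)
    if i < len(keys) and keys[i] == board:
        values[i] += value
    else:
        keys.insert(i, board)
        values.insert(i, value)


def best_move(keys, values, board, player):
    """Try every empty square and return the one leading to the highest-valued board."""
    candidates = []
    for square, mark in enumerate(board):
        if mark == 0:
            after = board[:square] + (player,) + board[square + 1:]
            candidates.append((find_value(keys, values, after), square))
    return max(candidates)[1] if candidates else None
```

With only a few thousand reachable Tic-Tac-Toe positions, a hash table would serve equally well; the sorted list mirrors the approach described in the text.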

2 Background

There are multiple standard AI learning algorithms. These include decision trees, neural networks, and genetic learning, among others. Many AIs are based upon a one-ply search of a library of board positions. These algorithms are most traditionally applied to two-player games, such as checkers or chess. I therefore found this library and search technique most applicable to my program, which is also designed to play a two-player game. Its relative simplicity was also desirable, since multiple algorithms could easily be created and tested against each other.

3 Theory

An algorithm should be able to increase its ability at a given task by learning through experience. Even without initial code related to strategy, over time, it should be able to predict which strategies lead to the greatest win rate. It should even be able to adapt to changing situations. My game playing algorithm hopes to fulfill all of these goals by gathering information about past games, and organizing it to find the most winning board positions.

4 Design Criteria

My project hopes to develop a new algorithm for machine learning. Machine learning comprises some of the most recent AI developments. Concurrently with my coding, I hope to research machine learning as part of the more general field of AI. Hopefully, my new algorithm will lead to competitive AIs that show distinct playing progress over time. Though these algorithms will be developed through common games, the results are applicable to computer science, particularly to current AI research.

5 Materials used

My program is coded in C++, due to the language's ease of use for loading and storing libraries, and using large matrices. My program does not require any extra external hardware, as its inputs and outputs are all digital information.

6 Procedure and Workplan

I plan to finish the Tic-Tac-Toe programming in one week. This will include programming the board, visuals, and the basic rules. I plan to create the library and sorting infrastructure in six weeks. This will create a value for each board position, and add it to the library.

The library must be sorted for easy searching, as the number of possible board combinations is great. Sorting should minimize run time for the algorithm, making sure it can run in real time. I plan to create several algorithms in six weeks. This may include modifying a single algorithm with various weights, or even whole new methods of calculation. I will test the algorithms for two weeks. This will include more research, and developing values for board positions. I will adapt the algorithms for additional uses for the next eight weeks. This might include adaptation to chess, checkers, a sliding puzzle, or other games or applications.

7 Results

Though my code and program are not complete, and thus no results have yet been produced, I hope that my program will show increased win rates (or at least decreased loss rates) against humans or other algorithms over time.

8 Discussion
9 Conclusion
10 Further Recommendations

Machine learning algorithms are very powerful, and are applicable to much more than games. As well as being employed in similar game situations, my algorithm could be used for other situations where strategies need to be numerically ranked. These could include organization tasks or modeling tasks.

http://www-2.cs.cmu.edu/satirist.org/learn-game http://theory.lcs.mit.edu/ mona/lectures.htm

Contents

1 Introduction
1.1 Machine Learning
2 Background
3 Theory
4 Design Criteria
5 Materials used
6 Procedure and Workplan
7 Results
8 Discussion
9 Conclusion
10 Further Recommendations

Acknowledgements

I would like to acknowledge the Thomas Jefferson Computer Systems Lab, for providing the hardware, software, and support to make this project possible. I would particularly like to thank Randy Latimer for his ongoing mentorship.

Analysis and Maintenance of a Robust, Highly-Available Computer Systems Laboratory

This project is an exploration of one possible environment that meets the criteria for a "robust" and "highly-available" laboratory, while still providing the students who work in the lab with all of the required facilities. The first goal was to determine exactly what those criteria are. This analysis must be executed for each lab that is created in order to best fit the systems design to the needs of the students and staff.

The Analysis and Maintenance of a Robust, Highly-Available Computer Systems Laboratory

Kyle Moffett

A well-designed computer systems environment requires the simultaneous solution of a large variety of conflicting problems. One of the best examples is the balance between security and functionality. If users were allowed to run an unlimited number of processes and use all available RAM, then those users who need the maximum available resources would be more efficient; however, those who abuse their privileges would severely degrade general service availability.
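On a Linux system, one common way to strike this balance is per-user resource limits applied at login by PAM's pam_limits module. The fragment below is a hypothetical sketch, not the lab's actual configuration; the group name `students` and all of the numbers are assumptions chosen for illustration.

```
# /etc/security/limits.conf (hypothetical values)
# <domain>   <type>  <item>  <value>

# Default cap on processes per user; a user may raise it up to the hard limit.
@students    soft    nproc   100

# Absolute cap on processes per user.
@students    hard    nproc   150

# Cap on each process's address space, in KB (here 1 GB).
@students    hard    as      1048576
```

Limits like these let demanding jobs run while preventing a single runaway or abusive account from exhausting a shared machine.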

This project is an exploration of one possible environment that meets the criteria for a "robust" and "highly-available" laboratory, while still providing the students who work in the lab with all of the required facilities. The first goal was to determine exactly what those criteria are, and exactly what "required facilities" entails. This is perhaps the most difficult portion of the project, due to the complex issues involved, and it must be executed for each lab that is created in order to best fit the systems design to the needs of the students and staff.

Overview: The TJHSST Computer Systems Laboratory's requirements for robustness and stability, as well as the requirement for maximum software support and the desire to minimize costs where possible, are all direct factors in the decision to utilize Linux. Linux is a completely free and Open-Source operating system that has more software support than any other POSIX-compliant system available. It is extremely robust and requires little day-to-day maintenance other than software updates.

The kernel and applications are constantly being maintained and improved on a day-to-day basis. Of the available Linux distributions, the Computer Systems Lab uses Debian <http://www.debian.org/>, primarily due to the ease and flexibility of configuration, as well as the constant software updates available. The Debian distribution is one of the largest available distributions which, in addition to being non-profit, is dedicated to ensuring that all software distributed is free of legal complications.

Software: New Computer Systems Lab Service Layout

Authentication: Kerberos
User/System Data: OpenLDAP
File Services: OpenAFS on DRBD
Web Authentication: WebKDC/WebAuth
Domain Name Resolution: BIND 9
Client Network Autoconf: DHCP 3
Student Intranet: Apache 2 & PHP 5
Mail Transfer (SMTP): Postfix
Mail Access (IMAP/POP): Courier
Mail Storage: Ext3 on DRBD
Student websites: Apache 2 & PHP 4
Student remote access: Secure Shell

Servers: king, robustus, emperor, macaroni, chinstrap, cronos, adelie, rockhopper

Requirements: The requirements determined for the TJHSST Computer Systems Laboratory are as follows, in order of decreasing priority:

· The lab should provide sufficient hardware for each student to have his or her own computer during his or her classes in the lab, as well as providing a certain number of spares should any particular computer be in need of repair.
· The lab should provide sufficient computational resources on each machine, as well as on the network as a whole, for students in classes such as Computational Physics and Super-Computer Applications to learn how to write software in a true high-speed networked multiprocessor environment.

Page 136: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

· The lab should provide software, such as compilers and interpreters, for as many different languages as is practical, with the express purpose of assisting students in Artificial Intelligence and Comparative Computer Languages, as well as providing necessary services for seniors working on their technology projects.
· The lab should enable lab administrators to easily and efficiently manage users, software, servers, roles, and activities without undue and error-prone repetition. The administrators should also be able to delegate such privileges as they see fit to responsible assistants, and the systems should be simple enough for each rising class of admins to understand and operate.

Page 137: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

· The lab should provide a simple, secure, networked environment such that each student can use their own personal environment independent of which system they are connected to. This also includes providing access to other information-technology services within the school.
· The lab should minimize its equipment and software costs, both initial purchases and maintenance over time.

Page 138: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

One of the primary requirements for the lab is that each student has his or her own computer with sufficient resources to run graphics-intensive and computationally intensive tasks, and yet it is on this requirement that the lab most frequently falls short. The lab possesses 19 2.8GHz Pentium 4 workstations with 256MB RAM each and 11 2.4GHz Pentium 4 workstations with 512MB RAM each. The workstations all have aging GeForce2 graphics cards, which are barely sufficient for the 3D work that many students wish to undertake, and the small quantities of RAM are insufficient for any but the smallest data sets for the Computational Physics classes.

Page 139: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

As 30 workstations are insufficient to support more than one class per period, there are an additional 12 ancient workstations with 1.8GHz Athlon CPUs or dual 800MHz Celeron CPUs. The primary cause of the resource shortages is the lack of a hardware replacement and renewal line item in the school budget.

Computer Systems Lab Workstation Administration

Debian Software
· Perl: A powerful all-purpose scripting language.
· SystemImager: Simplistic console-based ghosting tools that use an rsync backend.
· SNMP: An advanced remote system-monitoring tool.

Custom Software
· sysconf/syspref: Workstation autoconfiguration tools.
· sysimage: SystemImager-based software that provides beta-testing and remote ghosting capabilities.
· tj-kpkg: Kernel source tree management tools.

Page 140: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Workstations:

The Computer Systems Laboratory does not rely on other portions of the school for its services; instead, it houses more than ten servers, two clusters, and a Cray supercomputer. The converse, however, is not true. Due to the available server space and CPU time, as well as the high stability of the servers, the CSL is frequently used to host services for other portions of the school's IT network. The best examples of this are the school's web server and the student Intranet, both of which operate out of the Computer Systems Lab. They are designed and operated by the student administrators at a high level of efficiency.

Page 141: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Recently, however, the student Intranet has become too old and outmoded to properly serve the school, and so a new system, called Iodine, is being designed to complement a major services upgrade within the Computer Systems Lab. One of the largest problems in the Computer Systems Lab is the amalgamation of large quantities of services on the same few pieces of hardware. This creates an increased probability that a single system failure will cripple a huge proportion of the lab, or even of the school itself.

Figure: Graph of CSL-hosted services over time

Page 142: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Services: Servers

The Computer Systems Lab workstations are a collection of systems of widely varying construction, especially the older ones that have been repaired time and time again. Due to the heterogeneous composition and the multitude of requirements, a secure, efficient method of simultaneously managing more than 50 independent systems was needed. To this end, we used Perl, a powerful scripting language, to write a set of scripts totaling over 4000 lines of code that run on both clients and servers to automate the process of managing the clients.

Page 143: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

The software includes "sysconf", a tool which automatically detects the hardware in a workstation and reconfigures all the software to run optimally, "sysimage", a complex wrapper around SystemImager that allows us to remotely ghost clients, optionally with a beta version of the workstation image, and "tj-kpkg", which quickly and efficiently recompiles new kernel binaries and associated modules for multiple architectures simultaneously.

Page 144: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Administration:

While the Computer Systems Laboratory is well maintained by its staff of student administrators, many of whom work more hours than some part-time jobs require, it receives insufficient funding to properly handle all of the challenges that it must meet. Due to the lack of hardware, the ratio of services to servers is climbing at an unsustainable rate, despite the fact that many of those services are being redesigned to better accommodate the additional limitations. The upcoming renovation of the CSL's authentication and authorization services, in combination with the new student Intranet (Iodine), is a much-needed upgrade to core school services, and it should be accompanied by a corresponding upgrade to the servers that will host it.

Page 145: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Developing an AI Player for "Guess Who?"

My project is to create a computerized version of the game "Guess Who?" complete with an AI player. This involves two research areas: Game AI and Data Mining. Data mining is the analysis of data and the use of software techniques for finding patterns and regularities in sets of data. My AI's strategy algorithm will formulate questions that eliminate 50% of the suspects, which is the optimal percentage.

Page 146: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Developing an AI Player for "Guess Who?"

Jason Pan

Abstract

My project is to create a computerized version of the popular Milton Bradley game "Guess Who?" complete with an AI player. This involves two research areas: Game AI and Data Mining. Data mining is the analysis of data and the use of software techniques for finding patterns and regularities in sets of data. My project can be divided into two major phases: Developing the Game Interface and Developing the AI Player. My game interface consists of a matrix of buttons with pictures of the suspects and an input text field where questions can be entered.

Page 147: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

My AI's strategy algorithm will formulate questions that eliminate 50% of the suspects, which is the optimal percentage. If my program can beat an opponent at least half of the time, then I can deem it successful.

Introduction and Background

One of my favorite childhood pastimes during really long road trips was Milton Bradley's "Guess Who?", a simple two-player game. In it, each player has a board with pictures of twenty different people labeled with their names. To begin, the opponents each choose a mystery person from the list of twenty.

Page 148: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

In subsequent turns, each player asks his opponent a yes-or-no question about the mystery person and then uses the clues to narrow the twenty suspects down to the answer. A player can guess the opponent's mystery person only once, and if he succeeds he wins the game. My project is to create a computerized version of Guess Who complete with an AI player. This involves two research areas: Game AI and Data Mining.

Page 149: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Computer game AIs can be divided into two categories: current and classic. Current games are complicated, animated games such as RPGs and simulations. Classic games, on the other hand, are the simple strategy games that existed before computers, like Poker, Hearts, Twenty Questions, Othello, and Go. Guess Who is one of the few classic games for which an AI has not been done. Thus my project is original and will require new algorithms to accommodate the game's unique features. Features like interpreting typed questions and specialized file reading will prove useful for future computer games. Since my product will have great entertainment worth, it will be valued by programmers and non-programmers alike.

Page 150: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Data mining is the analysis of data and the use of software techniques for finding patterns and regularities in sets of data. My AI's strategy algorithm will formulate questions that eliminate 50% of the suspects, which is the optimal percentage. This type of critical thinking involves data mining techniques that enable the agent to form conclusions after analyzing input data. If my automated player can defeat its opponent at least half of the time, then I can deem it successful. Data mining, also known as knowledge discovery in databases (KDD), uses computational techniques like statistics and pattern recognition.

Page 151: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Although it is usually used in relation to the analysis of data, data mining, like artificial intelligence, is an umbrella term and is used with varied meaning in a wide range of contexts. Its official definition is the "nontrivial extraction of implicit, previously unknown, and potentially useful information from data." This encompasses a number of technical approaches, such as clustering, data summarization, learning classification rules, finding dependency networks, analyzing changes, and detecting anomalies. My project, however, implements an elementary form of KDD, one that does not require complicated equations or extremely sophisticated search algorithms.

Page 152: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Developing data mining software often involves corporate funding in the millions and teams of programmers working in cooperation, with project timelines measured in years. Because I work solo and have only a year, my project can at best produce a crude but effective data mining function. Data mining has many practical applications in the fields of science and mathematics and especially in the business world. Retail companies make extensive use of KDD so they can identify patterns among customer purchases and thereby reorient their marketing strategy. Developed in the 1960s, data mining really gained ground in the 90s but is still a relatively new field. New purposes are being identified every year as more sophisticated algorithms applying those principles are programmed.

Page 153: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

I hope to make a minor contribution by applying it to turn-based computer games, which, from my research, has largely not been done. Research on game structures was limited because most games' code was closed source. Thus I had to develop the two-player version largely on my own from scratch. My game format is largely inspired by Battleship, which I programmed during my summer computer science class. I utilized several matrices to store the suspects' images and attributes, which the user can access via a grid of buttons. The two games also have in common that they are both turn-based games where the player tries to locate his opponent's secret target. Privacy must be maintained, and only one player can look at the screen at a time.

Page 154: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

The guessing game closest in functionality to Guess Who would be Twenty Questions. Like GW, 20Q is a two-player, turn-based guessing game that uses probability and deduction. However, the AI agent functions differently. 20Q's AI involves a machine learning technique called the expert system. During a game, the AI maintains a list of questions, and as it iterates over that list, it eliminates possible suspects based on the user's responses. After each game, the AI learns from its mistakes and expands its knowledge database. However, my game AI will actually formulate questions instead of merely regurgitating them. I hope to develop a data mining technique that searches for a shared pattern among the suspects and then generates a yes-or-no question that, no matter how the user answers, will eliminate half of the list.
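The halving idea can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the project's Java, and it is not the project's own algorithm, since the AI player had not been written at the time of these slides. Apart from Susan, the suspect names are hypothetical, as are the attribute names and the helpers `best_split_attribute` and `apply_answer`. The sketch picks the attribute whose yes/no answer divides the remaining suspects most evenly.

```python
def best_split_attribute(suspects):
    """Return the attribute whose yes/no answer splits the suspects most evenly.

    `suspects` maps a name to the set of attributes that person has.
    """
    attributes = set().union(*suspects.values())
    half = len(suspects) / 2

    def imbalance(attribute):
        yes_count = sum(1 for traits in suspects.values() if attribute in traits)
        return abs(yes_count - half)

    # Sorting first makes the choice deterministic when several attributes tie.
    return min(sorted(attributes), key=imbalance)


def apply_answer(suspects, attribute, answer_is_yes):
    """Keep only the suspects consistent with the opponent's answer."""
    return {name: traits for name, traits in suspects.items()
            if (attribute in traits) == answer_is_yes}


# Hypothetical data for illustration only.
suspects = {
    "Susan": {"female", "blonde"},
    "Tom": {"male", "glasses"},
    "Anne": {"female", "glasses"},
    "Bill": {"male", "bald"},
}
question = best_split_attribute(suspects)
remaining = apply_answer(suspects, question, answer_is_yes=True)
```

With four suspects, any attribute held by exactly two of them is a perfect split, so either answer leaves two candidates.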

Page 155: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

In-depth Explanation of Game Structure and Functionality:

The goal of Guess Who is to narrow down a group of 20 suspects through yes-or-no questions to identify the mystery person. Each suspect is of the Person class, which means he or she has a name, picture, and physical attributes. Based on the pictures, the player can enter a question about the mystery person's physical attributes. The user can interact with the game via buttons and a text field. I divided my game interface into three panels. The top panel's purpose is to provide information. It consists solely of labels. There are brief instructions on the question format.

Page 156: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Also displayed is the opponent's question from the previous turn. The middle panel consists of a 5x4 grid of buttons. Each button has a picture of a suspect. The player clicks on the grid at the beginning of the game to choose his mystery person, and again at the end of the game to guess his opponent's. Corresponding to the grid are several 5x4 matrices. Each suspect has a unique 2D coordinate, which links the grid and the matrices. For example, coordinate [1,4] on the grid is the button through which suspect Susan can be selected. Coordinate [1,4] on the Person matrix contains Susan's Person object, coordinate [1,4] on the ImageIcon matrix contains Susan's picture, and so on.

Page 157: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

There are two additional integer matrices, memory1 and memory2. Memory1 keeps track of which suspects player1 has eliminated, and memory2 does likewise for player2. The grid and the matrices are all created at the start of the game based on input from the file suspects.txt, where each suspect's unique attributes are stored. The bottom panel consists of an input text field and three buttons. The text field states the turn number and which player's turn it currently is. At the beginning of the game the user enters his name into the field. For the rest of the game the user enters his question, which must be only one or two attributes long. Afterwards the user clicks the Done button. If the question is valid, the appropriate suspects are eliminated, which is illustrated by the corresponding buttons being disabled.
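The parallel-matrix layout described above can be sketched as follows, again in Python rather than the project's Java. The placeholder names, the attribute sets, and the 0/1 encoding of the memory matrix are assumptions made for illustration; the format of suspects.txt is not given in the source, so the data is built in memory instead of being read from a file.

```python
ROWS, COLS = 5, 4  # the 5x4 grid described above

# Parallel matrices indexed by the same (row, col) coordinate.
# Names and attributes are placeholders, not the contents of suspects.txt.
people = [[f"suspect_{r}_{c}" for c in range(COLS)] for r in range(ROWS)]
attributes = [[{"glasses"} if (r + c) % 2 == 0 else set() for c in range(COLS)]
              for r in range(ROWS)]
# Assumed encoding: 0 = still a suspect, 1 = eliminated.
memory1 = [[0] * COLS for _ in range(ROWS)]


def eliminate(memory, attributes, attribute, answer_is_yes):
    """Mark every suspect that contradicts the answer as eliminated."""
    for r in range(ROWS):
        for c in range(COLS):
            if (attribute in attributes[r][c]) != answer_is_yes:
                memory[r][c] = 1


def remaining(memory, people):
    """List the suspects a player has not yet eliminated."""
    return [people[r][c] for r in range(ROWS) for c in range(COLS)
            if memory[r][c] == 0]


eliminate(memory1, attributes, "glasses", answer_is_yes=True)
still_in_play = remaining(memory1, people)
```

Because every matrix shares the same coordinate, disabling a button in the interface would only require looking up the same (row, col) position in the memory matrix.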

Page 158: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

The screen then turns yellow and informs the next player that it's his turn. This transition screen is a vital feature since it protects each player's privacy from the other. The next player then clicks the Ready button to indicate his readiness to begin his turn. The game continues like this until one player has identified his opponent's mystery person. To reset the game, a player clicks the third button, the Reset button, which activates the setBoard function.

Procedure

I decided to code my game in Java because of my substantial experience with it and the language's conveniently built-in GUI support. My game is very low budget, and the only resources I needed were a Java compiler and internet access.

Page 159: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

My project can be divided into three iterations, each of which required one school year quarter:

Phase 1: General Research and Organization

Here I planned out my project. I studied the structures of similar computer games and existing data mining techniques to design my program. I also researched GUI methods to develop the game interface. I decided to model my game's structure after Battleship and its functionality after Twenty Questions. Most of my work during phase 1 was mentioned in Introduction and Background.

Page 160: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Phase 2: Game Structure and Logistics

I developed the actual game infrastructure and made a functional two-player version. The most complicated task of this phase was coding the input text field. The user should be able to enter a really complicated question that the game mechanism then interprets and reacts to accordingly. I had to create a special jargon that included parentheses, "and", "or", "not", and the attribute names. The program read this as a stack and notified the player if there was an error.

Phase 3: AI Player Research and Development

The most innovative and difficult part of my project will be developing the AI player. I haven't reached this stage yet.

Page 161: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Results and Conclusion

I am currently behind schedule. I haven't actually started programming the AI, which is my project's core, but I have learned tons from developing the game interface itself. I had definitely underestimated the effort necessary for this phase. I had ambitiously first planned to develop an interpreter where the user can enter a complex question and the program parses it into usable components, to which it then reacts. However, this turned out to be a whole other project by itself.

Page 162: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

I didn't have the time to develop a sophisticated search function that could navigate its way through a maze of parentheses and prioritize which segment of the question to process first. Calculators use such immensely complex algorithms to process equations. I therefore watered down my interpreter so that only two attributes can be entered. However, this is a topic I would like to study further, and it will be the subject matter of any future individual research projects I pursue next year at the university.
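For reference, nested questions of this kind can be handled with a short recursive-descent evaluator. The sketch below is in Python, not the project's Java, and it is not the stack-based reader the project used. The function name `evaluate`, the attribute names, and the precedence rule (not binds tighter than and, which binds tighter than or) are assumptions made for illustration.

```python
import re

# Tokens are parentheses or words; anything else is ignored by this sketch.
TOKEN = re.compile(r"\(|\)|[A-Za-z_]+")


def evaluate(question, traits):
    """Evaluate a question such as "female and (glasses or not hat)".

    `traits` is the set of attribute names that one suspect has.
    """
    tokens = TOKEN.findall(question.lower())
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of question")
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "or":
            take()
            right = parse_and()  # parse first so no tokens are skipped
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() == "and":
            take()
            right = parse_not()
            value = value and right
        return value

    def parse_not():
        if peek() == "not":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if token in (")", "and", "or", "not"):
            raise ValueError(f"unexpected token: {token}")
        return token in traits

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()}")
    return result
```

Each grammar level calls the next tighter level, so parentheses are resolved innermost first without an explicit search. A malformed question raises `ValueError`, which a game could report back to the player.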

Page 163: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

References and Appendix

For additional information, visit my website: www.tjhsst.edu/ jpan

My research sources are mostly websites:

http://www.the-data-mine.com/bin/view/Misc/IntroductionToDataMining
www.guessmaster.com/aifaq.asp
http://www.gameai.com/clagames.html

Page 164: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Computational Models of Traffic

The goal of my project is to make an accurate simulation of traffic in a multi-lane intersection world that will be easily mutable for work in studies on the effects of construction work and accidents on traffic flow.

Traffic simulations are used in a variety of ways. One of the most prominent and original uses was to evaluate alternate treatments.

Page 165: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

A Study of Creating Computational Models of Traffic

Madeleine E. R. Pitsch

Abstract

The goal of my project is to make an accurate simulation of traffic in a multi-lane intersection world that will be easily mutable for work in studies on the effects of construction work and accidents on traffic flow.

1 Introduction

Traffic simulation is a fairly complex computational model because of the many interactions between the agents and the world, plus the duality of the agents themselves. There has been much to study in approaching and understanding traffic simulations and why they are useful tools.

Page 166: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

2.1 Background Information and Theory, Uses

Traffic simulations are used in a variety of ways. One of the most prominent and original uses was to evaluate alternate treatments. Since engineers were in control of all the variables, they could evaluate such things as signal control strategies and speed limit management. Traffic simulations are often used to test new designs. Because roadwork is so expensive, traffic simulations can help quantify the improvement of traffic flow with different geometric arrangements. Moreover, traffic simulations can be an element of the design process itself.

Page 167: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Traffic simulations are also used to test traffic center personnel, and they can be used to recreate a traffic accident and then design a safer environment in response to that accident.

2.2 Why simulations

Simulations describe a non-static environment in a statistical or pictorial way. They can be used whenever there is a system undergoing mathematical changes over a long period of time or there is a need to view an animation of the system to understand what is causing the final results. They are also used when a mathematical equation cannot accurately factor in all the agents of a system. Traffic simulations are used to support optimization models and new theories in management.

Page 168: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

They are an efficient way to understand calculated data, particularly if the simulation produces animated output.

2.3 Classification

Most traffic simulations treat time as the independent variable, meaning they are dynamical systems. Discrete simulations represent real-world systems by having changes occur at specific points in time. There are two kinds of discrete model: discrete time and discrete event. Discrete event models update the system each time a change (an event) occurs, while discrete time models simply advance in fixed time increments and apply whatever changes are due at each step. Discrete event models are a lot harder to create and model but are useful in specific cases. A traffic simulation called NETFLO was a discrete event model that considered a single facility.
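The difference between the two discrete approaches can be seen in the shape of their main loops. The following is a minimal sketch in Python under stated assumptions: the function names, the event tuples, and the 30-second signal example are all hypothetical and are not taken from NETFLO or from this project.

```python
import heapq


def run_discrete_time(duration, step, update):
    """Advance the clock in fixed increments, updating the system every step."""
    updates = 0
    t = 0.0
    while t < duration:
        update(t)
        updates += 1
        t += step
    return updates


def run_discrete_event(duration, initial_events, handle):
    """Jump straight from one scheduled event to the next.

    `initial_events` is a list of (time, name) pairs; `handle` may return
    further (time, name) pairs to schedule.
    """
    queue = list(initial_events)
    heapq.heapify(queue)  # earliest event is always at the front
    processed = 0
    while queue and queue[0][0] < duration:
        time, name = heapq.heappop(queue)
        for event in handle(time, name) or []:
            heapq.heappush(queue, event)
        processed += 1
    return processed


# Hypothetical example: a signal that changes every 30 s over 2 minutes.
def on_event(time, name):
    return [(time + 30.0, "signal_change")]


time_steps = run_discrete_time(120.0, 1.0, lambda t: None)
events = run_discrete_event(120.0, [(0.0, "signal_change")], on_event)
```

In this example the discrete time loop performs one update per simulated second, while the discrete event loop only does work when the signal actually changes, which is why event-driven models suit systems with long idle periods.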

Page 170: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Discrete event models are often more efficient for systems that have lots of down time, but since traffic simulation is a study of the continuous flow of traffic, discrete time models are more prevalent. Simulations are also classified by their level of detail and/or fidelity to a real-life system. There are three levels: macroscopic, mesoscopic, and microscopic. Macroscopic models are the least accurate and are usually used when only a simple understanding of the interactions between the vehicles is necessary. Mesoscopic models are used where the entities must be represented with high accuracy but their interactions can be described at a simpler level.

Page 171: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Microscopic models are the most detailed, keeping a high level of detail in both the entities and their interactions. For example, a microscopic model would have lane changing or left turning, while the other two levels might not. Microscopic, however, refers only to the possibility of a more accurate simulation; such models may not be so accurate, depending on the complexity of the system. They are also very hard to create and maintain. Lower-fidelity models are much easier to create and maintain and may be detailed enough for the purpose for which they are created. Often the designer must have a well-developed focus to choose the correct level of fidelity for the project. The last form of classification refers to the processes of the model.

Page 172: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

There are two types: deterministic and stochastic. Deterministic models are models where every decision made relies on previously gathered data from a real-life situation. Stochastic models instead use random-number generators and statistics to have the entities make decisions. While the deterministic model should be more accurate, situations will very often arise for which no data has been gathered. Unless the system is very focused on one type, a deterministic model would be inefficient.

Page 173: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

2.4 Approach to simulations

A certain approach should be taken when building a traffic simulation. The first step is to define the problem and the model objectives. A builder must know the primary focus of the simulation so that appropriate model choices can be made to produce the most efficient model for that purpose. The builder must also know what data the simulation should output. The next step is to define, using the previously found data, the situation the model is evaluating. The builder must understand the major components and the major interactions of those components. He must also identify the information that needs to be acquired.

Page 174: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

The third step is the most complicated and is where the actual modeling occurs. The complexity of the model must be established in order to identify which of the three classifications (macroscopic, mesoscopic, microscopic) will be used. Next, the builder must determine the major functions of all of the components of the model and how the data will flow through the components. For that to be efficient, an appropriate hierarchy must be chosen as well. The modeling language must be chosen, and all the logic in the program must be documented.

Page 175: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

The final part is the hardest: the actual coding of the program, followed soon after by debugging. After the model has been created, it must be tested with input data, followed by the acquisition of output data. The model must be validated and evaluated on pre-specified criteria relating to the actual purpose of the model.

Page 176: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

2.5 Specified Functions

Part three of the development model refers mainly to the creation of the model, in particular the creation of the functions that interact with the different components. One of the most important is the car-following function. This function, which determines how one car reacts to the car in front of it, is probably the most important function in a traffic simulation. The car has to decide whether to keep its speed, accelerate, or decelerate depending on what the car in front of it does.

Page 177: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

In the real world, the driver sees how the car in front proceeds and makes judgment calls dependent on his or her own attitude. This is hard to program and can also be one of the messiest methods to deal with. It is the primary method that requires outside information about other cars, and it is often one of the defining factors in approaching the hierarchy of the entire model. The most customary parameters for this method are the speeds of self and leader, the separation distance, the projected deceleration/acceleration methods for self and leader, and the reaction time of the follower. Another important method, if the model is stochastic, is the random number generator.

Page 178: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Because random number generators are not actually random, they are properly referred to as pseudo-random number generators.
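The car-following decision described earlier (keep speed, accelerate, or decelerate) can be sketched with the customary parameters listed above. This is a deliberately simple rule in Python, not the project's model or any published car-following model. The function name `follow`, the thresholds, and the acceleration constants are assumptions made for illustration.

```python
def follow(own_speed, leader_speed, gap, reaction_time,
           max_accel=2.0, max_decel=4.0, dt=1.0):
    """Return the follower's new speed (m/s) after one time step of `dt` seconds.

    The follower aims for a gap of at least `own_speed * reaction_time` metres,
    the distance it covers before it can react. All constants are illustrative.
    """
    safe_gap = own_speed * reaction_time
    if gap < safe_gap or (leader_speed < own_speed and gap < 2 * safe_gap):
        change = -max_decel * dt   # too close, or closing in: decelerate
    elif gap > 2 * safe_gap:
        change = max_accel * dt    # plenty of room: accelerate
    else:
        change = 0.0               # comfortable: keep the current speed
    return max(0.0, own_speed + change)
```

For example, `follow(20.0, 20.0, 20.0, 1.5)` slows the follower because the 20 m gap is below the 30 m it needs at that speed, while a 100 m gap would let it speed up. A stochastic model could vary `reaction_time` per driver to represent differing attitudes.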

For a stochastic model to work well, the pseudo-random number generators have to be fairly random. An often forgotten function when planning is the vehicle generation method. If any real-world data has to be found to make a model, most of it will be required for this method. In particular, the volume for a particular road has to be accurate or the entire model will be fairly inaccurate. Other points have to be taken into account as well, such as whether the mean volume of cars should vary over time (e.g. to show traffic over an entire day with two rush-hour periods).
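A vehicle generation method with a time-varying mean volume might look like the sketch below. It is an assumed design, not taken from the project: the volume curve, its peak hours (8:00 and 17:00), and the numbers in it are invented for illustration and would have to be replaced by measured data for a real road.

```python
import math
import random

def mean_volume(hour):
    """Hypothetical mean volume (vehicles/hour) with peaks near 8:00 and 17:00."""
    base = 300.0
    morning = 900.0 * math.exp(-((hour - 8.0) ** 2) / 2.0)
    evening = 900.0 * math.exp(-((hour - 17.0) ** 2) / 2.0)
    return base + morning + evening

def generate_arrivals(seed, start_hour=0.0, end_hour=24.0):
    """Return arrival times in seconds since midnight, one random trial per second."""
    rng = random.Random(seed)  # seeded pseudo-random generator, so runs are repeatable
    arrivals = []
    for second in range(int(start_hour * 3600), int(end_hour * 3600)):
        p = mean_volume(second / 3600.0) / 3600.0  # chance that a car appears this second
        if rng.random() < p:
            arrivals.append(second)
    return arrivals
```

Because the generator is seeded, two runs with the same seed produce the same traffic, which makes it possible to compare model changes on identical input.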

2.6 Picking a language

Picking a language is a very important decision to make when considering a simulation. There are two types of languages to consider: general-purpose languages and simulation languages. Simulation languages are exactly what their name implies: languages created entirely for the purpose of making simulations. They are often easier to use and can be very efficient. They incorporate many features that compile statistics as well as other functions common to modeling. General-purpose languages fit into two categories: procedural or object-oriented. Object-oriented languages support the concept of defining an object that can then be put into a world, process data, and react to the environment.

Although it creates better analysis, object-oriented programming is much harder to program. Some of the factors that should be considered when picking a language are the expected life of the simulation model, the skills of the user community, the budget, and an assessment of the developers' skills.

2.7 Representative Model Component

In simulation, particularly in object-oriented planning, an important aspect is defining the agent for which the world is created. For traffic simulation, it is important to realize that the agent has two parts: the driver and the car. The car has attributes that are fairly easy to quantify, such as size, acceleration limit, deceleration limit, and maximum turn radius. These attributes are easy to define and pose no problem. It is the human aspect of the agent that proves to be difficult to model.

The driver has an aggression factor (whether the driver will take riskier turns or drive defensively), a response time to stimuli, and a destination point. The destination can often be ignored in a simpler program, but the other two pose a problem. The simplest approach is to place the mode of the population in the center, with average aggression and response time. The mass of the people can then form a standard bell curve, with the fewest people being very risky or very cautious. The response time will have a much smaller range because the model is very sensitive to a fairly long response time.
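The bell-curve approach can be sketched as follows. The 0-to-1 aggression scale, the means and standard deviations, and the clipping bounds are assumptions chosen for the example, not values from this project.

```python
import random
from dataclasses import dataclass

@dataclass
class Driver:
    aggression: float     # 0 = very cautious, 1 = very risky (hypothetical scale)
    response_time: float  # seconds

def make_driver(rng):
    # Bell curve centered on average aggression, clipped to the [0, 1] scale.
    aggression = min(1.0, max(0.0, rng.gauss(0.5, 0.15)))
    # Much narrower spread for response time, clipped to assumed bounds.
    response_time = min(1.5, max(0.7, rng.gauss(1.0, 0.1)))
    return Driver(aggression, response_time)

rng = random.Random(42)
drivers = [make_driver(rng) for _ in range(1000)]
```

Most of the generated drivers end up near the average, with only a few at the very risky or very cautious ends, which is the shape described above.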

This component has to use its own attributes to give accurate, logical responses to the environment's characteristics, such as roadway geometrics, intersection configurations, nearby driver-vehicle entities, control devices, lane channelization, and conflicting vehicle movements. Some of these aspects of the environment can be ignored depending on the level of fidelity of the program (micro-, meso-, or macro-) and the focus of the simulation. Others could be vital to the model and have to be taken seriously into account when programming the entity and its reactions to vehicles and the environment. Typically, this is the most important activity in the creation of the model because this is where the analyst decides whether the program is an accurate and reasonable traffic simulation.

Because traffic simulations tend to be very complex, certain possibilities should be taken into account: some of the model's features may not accurately simulate a process, the input data may not be valid, the results may not be detailed enough for this model, the statistical analysis may be incorrect, or the model may simply have bugs or incorrect algorithms. Animation displays are one of the best ways to analyze whether or not the model is an accurate representation of traffic. The analysis has to be thorough, but it can help identify cause-effect relationships and anomalous results.

Particular cause-effect relationships to look out for are where congestion starts and whether traffic jams start there consistently or only every so often. Anomalous results (when traffic congestion starts at an illogical point) should be observed until they can be traced to a particular behavioral or model deficiency. When there is no animation, similar methods can be used to check the reasonableness of the model. One is to execute the model on a real-world application to see if it stands up to the real-world data. Another technique is to perform sensitivity tests, in which certain key variables, such as random-number seeds, maximum speeds, and volume of cars per area, are changed and the model's responses are analyzed. Plotting that data can also give a good indication of how reasonable the data is.

2.8 Statistical Analysis of the Data

More often than not, statistically analyzing the data is allotted the least amount of time in comparison to the actual coding of the program. Many simulations are programmed just to show a situation, and therefore no analysis is needed. For the others, however, it has to be understood that without statistical analysis of the data, no confidence can be placed in the simulation. Simulation is a sampling experiment on the computer, and the data gathered has to be analyzed with appropriate statistical methods. One of the most popular ways to analyze it is to pick point estimates of the measures of effectiveness. By comparing these point estimates across different arrangements of a system, one arrangement can be shown to be more efficient or reasonable than the others. These estimates can be found from a single simulation run or from a set of runs.
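As a small illustration of point estimates from a set of runs, the sketch below computes a mean and an approximate 95% confidence half-width for two arrangements. The travel-time numbers are made up, and the normal approximation (z = 1.96) is an assumption; with only a handful of runs a t-distribution would be the more careful choice.

```python
import math
import statistics

def point_estimate(samples, z=1.96):
    """Return the mean and an approximate 95% confidence half-width."""
    mean = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width

# Hypothetical mean travel times (seconds) from five runs of each arrangement.
arrangement_a = [118.0, 122.0, 120.0, 121.0, 119.0]
arrangement_b = [131.0, 127.0, 129.0, 133.0, 130.0]

mean_a, hw_a = point_estimate(arrangement_a)
mean_b, hw_b = point_estimate(arrangement_b)

# If the intervals do not overlap, arrangement A is credibly faster than B.
a_is_faster = mean_a + hw_a < mean_b - hw_b
```

With these invented numbers the two intervals do not overlap, so the difference between the arrangements would not be attributed to random variation alone.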

2.9 Predicting and Understanding Traffic Congestion

There are two types of roadways to consider when predicting traffic: urban streets and freeways. Urban traffic flow is defined by the intersections or, more importantly, by the timing of the traffic lights. The idea that traffic lights only hold traffic back has been debunked, however: traffic flow can actually be helped by the insertion of traffic lights. Traffic flow has also been known to increase with the shutting down of certain roads. Freeway traffic is even harder to evaluate. The theory that there would be no traffic jams if there were no traffic lights is obviously debunked by the mile-long traffic jams often seen on highways.

Although many traffic jams are started by accidents, some are created because of one underlying principle: when a car slows down, the car behind it will always slow down more. When one person slows down by just tapping the brakes, the following car will see the brake lights and slow down more. This interaction can actually lead to a traffic jam. In the NetLogo traffic simulation, it was found that unless every car started at the same distance from the next, a traffic jam would result. It would be impossible to separate real cars by an equal distance, and this program also ignores a very important issue: people all have their own aggression factor when it comes to driving. Although the concept of a fast left lane does help counteract that, there is very little that can be done to work with the human aspect of the traffic simulation.
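The principle that each following car slows down more than the car ahead can be shown with a toy calculation. The function below is a deliberately simplified assumption (a fixed overreaction factor applied car by car); it is not the NetLogo model mentioned above.

```python
def propagate_slowdown(cruise_speed, initial_drop, overreaction, n_cars):
    """Speeds of n_cars in a line after the first car slows by initial_drop.

    Each follower is assumed to slow by `overreaction` times as much as the
    car ahead of it (overreaction > 1); a speed cannot go below zero.
    """
    speeds = []
    drop = initial_drop
    for _ in range(n_cars):
        speeds.append(max(0.0, cruise_speed - drop))
        drop *= overreaction
    return speeds

# A single tap of the brakes (1 m/s) at the front of ten cars cruising at 30 m/s.
speeds = propagate_slowdown(30.0, 1.0, 1.5, 10)
```

Under these assumed numbers the first car barely slows, while the tenth car comes to a complete stop, which is how a small disturbance can grow into a jam.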

3 Design Criteria and Procedure

My project was made in MASON, a multi-agent simulation library for Java that is conducive to making simulations. It requires that I have not only a World.java file but also a WorldWithUI file. In MASON, simulations can be run without any type of visual representation by just running the World file. The WorldWithUI file is like a wrapper for the World file and converts everything into an animation that can be seen and re-animated. I have chosen to work primarily in animation mode because it helps me see if the cars are actually doing anything. I have two other class files: Car and Street. In the World there are two data members: Land and World.

The Land is the actual sparse grid of the world where the streets are placed. The World is a double grid holding the layout of how the streets are actually lined up together. This was done because one of my original problems was that the streets have a length, but that length could not be shown in the sparse grid because a street is only one object, which can therefore take up only one unit (like the cars). WorldWithUI uses the double grid to project the screen in the animation because World is full of 1's and 0's, where 1's are areas of street and 0's are areas of grass. The Land contains two kinds of objects: the streets and the token objects for cars. Each street has a data member called myArea, which is where the cars are actually located. Whenever a car moves, it finds its token object in World.land and moves it to the same spot.
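The relationship between the two grids and myArea can be sketched in plain Python. This is not MASON code and does not use the MASON API; the dictionary standing in for the sparse grid, the list of lists standing in for the double grid, and the helper move_car are all hypothetical stand-ins for illustration.

```python
WIDTH, HEIGHT = 8, 5

# Dense grid ("world"): 1.0 marks street cells, 0.0 marks grass.
world = [[0.0] * WIDTH for _ in range(HEIGHT)]

# Sparse grid ("land"): only occupied cells are stored, one object per cell.
land = {}

class Street:
    def __init__(self, row, length, lanes=1):
        self.row, self.length, self.lanes = row, length, lanes
        # my_area[lane][pos] holds the car at that spot, or None.
        self.my_area = [[None] * length for _ in range(lanes)]
        # Paint the full length of the street into the dense grid.
        for lane in range(lanes):
            for x in range(length):
                world[row + lane][x] = 1.0

def move_car(street, car, lane, old_pos, new_pos):
    """Move the car inside the street, then move its token in land to match."""
    street.my_area[lane][old_pos] = None
    street.my_area[lane][new_pos] = car
    land.pop((street.row + lane, old_pos), None)
    land[(street.row + lane, new_pos)] = car

street = Street(row=2, length=8, lanes=2)
street.my_area[0][0] = "car-1"
land[(2, 0)] = "car-1"
move_car(street, "car-1", lane=0, old_pos=0, new_pos=3)
```

The street owns the authoritative car positions, while the token in the sparse structure exists only so that the display can find the car.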

This is done so that the streets can cycle through the cars; since the cars cannot be in two spots at one time, there are tokens for the cars in World.land. WorldWithUI uses the token objects to project the cars on the animation. Each street runs through the cars that are located in its area, starting from the end of the street and working back to the beginning. Each car checks certain criteria (its current speed, the maximum speed of the road, and the arrangement of the cars around it) to decide whether it should change either its lane or its current speed. This is where most of the work needs to be done to create a logical and accurate simulation.
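The end-to-beginning update order can be sketched for a single lane as shown below. The rule used (speed up by one cell per tick, limited by the road's maximum speed and by the free cells ahead) is an assumption for illustration and is not the decision logic of this project.

```python
def step_street(positions, speeds, length, max_speed):
    """Advance cars on one lane by one tick; positions are cell indices.

    Cars are processed from the end of the street back to the beginning, so
    each car sees the already-updated position of the car ahead of it.
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i], reverse=True)
    ahead = length  # the first car processed is limited only by the end of the road
    for i in order:
        free_cells = ahead - positions[i] - 1
        # Speed up by one if possible, but never exceed the limit or the gap.
        speeds[i] = max(0, min(speeds[i] + 1, max_speed, free_cells))
        positions[i] += speeds[i]
        ahead = positions[i]
    return positions, speeds

positions, speeds = step_street([0, 3, 9], [2, 1, 1], length=10, max_speed=3)
```

In this example the car in the last cell stops at the end of the road, and the two cars behind it speed up only as far as the gap ahead allows.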

4 Results

My approach to this project has always been to work systematically, each time adding a more complex idea. I will first work on programming single-lane interaction, followed by multiple-lane interaction. After that is successfully done, I will work on intersection coding if it can be done in the environment. Using these building blocks, I can then make a world in which to test certain traffic principles. I am particularly interested in the effects of traffic blockage or construction on a system. As of now, I can make cars change lanes, stop at the end of the road, and change speed according to the arrangement of the cars around them. I need to make a method that creates cars at the beginning of the road so a constant flow can be monitored. The cars need to be updated so that their understanding of the area around them is more specific and actually looks at the speeds of the cars around them and not just the arrangement.

End Matter

This project involves a lot of work and understanding of the interactions. I have designed my project so that if I run out of time, I can simply use the building blocks I have made to make a world to study.

6 References

The Monograph of Traffic, Chapter 10: Traffic Simulation, by Edward Lieberman and Ajay Rathi

Articles from The Physics of Traffic

Resource Locking and Synchronization in the Linux Kernel

The goal of the KDUAL project is to create a C library which implements the kernel Application Programming Interface (API) in user-space and performs automatic debugging. Sections of kernel code can then be compiled against this library and run as ordinary programs for convenient testing. This particular section of the project aims to implement the kernel's resource locking API with automatic detection of deadlock situations. Locking will be implemented in two parts: the core algorithms, with their own API designed to be convenient for the developers, and simple glue code bridging that API to the kernel API.

Development of a Deadlock-Detecting Resource Locking Algorithm for a Kernel Debugging User-space API Library (KDUAL)

Timothy Wismer

Abstract

The kernel is the heart of an operating system; it is the program that is run when the computer first boots up, and it is responsible for accessing the computer hardware and performing other management tasks on behalf of all other programs. Because of its importance, the Linux Kernel must perform perfectly; any vulnerability, instability, or inefficiency will slow down or threaten the entire system. Unfortunately, due to the nature of the program, kernel code cannot be easily debugged. The kernel runs in an environment called kernel-space, which is significantly different from the user-space environment in which ordinary programs run. The kernel provides user-space as an abstraction to the running programs, masking process scheduling, disk I/O, and so forth.

Because the kernel expects to run in kernel-space, it cannot run in user-space. In addition, if there is an error while the kernel is running, it is difficult for the code to provide the tester with useful data about the error (because the kernel itself is responsible for access to the hard drives and monitor). The goal of the KDUAL project is to create a C library which implements the kernel Application Programming Interface (API) in user-space and performs automatic debugging. Sections of kernel code can then be compiled against this library and run as ordinary programs for convenient testing.

This particular section of the project aims to implement the kernel's resource locking API, providing the core resource-locking algorithms with debugging code which will provide automatic detection, reporting, and resolution of deadlock situations, thus catching subtle locking errors in early testing and production and easing later debugging. Locking will be implemented in two parts: the core algorithms, with their own API designed to be most convenient for use in development, and simple wrapper code bridging that API to the kernel API. The core algorithms will also be suitable for applications other than kernel programming, e.g. as a generic lock-testing toolkit.

1 Background and Introduction

1.1 The Linux Kernel

The Linux Kernel performs a number of functions: device access, process management, memory management, and file systems, for example. Thus, it is a huge program (millions of lines of code). Moreover, because of the essential functionality it provides, the kernel must perform perfectly; any vulnerability, instability, or inefficiency can slow down or threaten the entire system--in the worst case scenario, bad kernel code can actually damage system hardware. Unfortunately, the very nature of kernel code also makes it difficult to debug.

Kernels cannot be run as ordinary processes; testing a kernel requires compiling the new kernel and then reconfiguring and rebooting the test machine. In addition, if there is an error in the kernel code, it is difficult for the tester to obtain useful information about the error: first, because the kernel is responsible for all I/O operations, it is often impossible to interact with the system at all, and second, if the kernel code faults it will halt the whole system, leaving no way to analyze the crash. It is possible to run a kernel in a debugger, but only in a very limited fashion. The KDUAL project intends to simplify kernel coding by creating a C library which implements the kernel's Application Programming Interface (API) in user-space and provides an extensive debugging framework (for example, automatic deadlock checking in the locking implementation).

Because this library will have an API identical to the real kernel, kernel programmers will be able to compile code against the KDUAL library for testing purposes without making any modifications. The resulting binary will run as an ordinary user-space program, so it can be run with minimal effort, will pose no threat to the system stability, and can be debugged using common tools such as the GNU Debugger (gdb). In addition, the library code will automatically provide the user with valuable debugging information. Projects of this type have been produced before. The Daytona [1] project is a user-space implementation of the TCP stack (which is responsible for analyzing packets and passing them to the appropriate application).

The Daytona TCP stack provides a valuable tool for studying and extending the TCP protocol, analyzing networks, or creating specialized user-level applications. Alpine [2] is a similar but more expansive project that provides a network stack and virtual network driver in user-space (the driver stops short of actually communicating with the network device only because the Linux system requires such access to go through the kernel). Like KDUAL, both of these projects emphasize the need for transparency--providing a complete system which can be used as a substitute for the ordinary kernel code with few or no configuration changes to any of the affected applications or to the base system (e.g. it should be possible to make an existing program work with the new code simply by recompiling it). (FIXME: this is true for Alpine but not quite for Daytona? (overlays and stuff))

Figure 1: Operating Systems Layers

They also share the KDUAL goal of providing a system that is primarily a tool for debugging and development rather than a production system designed to perform at or above the level of the kernel (which is generally impossible when porting kernel code to user-space, because of the overhead incurred in user-space). However, because these projects operate as a functional, integral part of a working system (the user-level network stack is used for actual transmission of packets in a running system), they must also deal with issues of synchronization with the running kernel that do not occur in KDUAL (because the ersatz kernel provided by the KDUAL library is used only as a test harness for other programs and not to actually provide the kernel functionality).

1.2 Resource Locking

1.2.1 Basics of Locking

This particular part of the KDUAL project is focused on implementing the kernel resource-locking API. Resource-locking is essential for successful concurrency: having multiple separate programs running simultaneously and working with the same resources--shared data objects, I/O handles (e.g. sockets), physical devices (reading data from a floppy disk), or any other resource which the processes cannot all use at once. In the absence of locking, such programs would simply "stomp" on each other, accessing the data simultaneously and potentially causing data corruption and program failure.

Consider a simple example: a number i shared by two programs, A and B. At some time during execution, i is 0, and each program wants to access it: A wants to set it to 3, and B wants to read its value for later use. If both processes access i without coordination, A will succeed in setting the value, but the value B returns is unpredictable. It may be 0 (if the read is completed before the write), 3 (if the write is completed before the read), or some other garbage value (if the read is performed while the write is occurring). To prevent an error situation, a process cannot safely operate on the data until it ensures that no other process is operating on the data. To achieve this, each resource is associated with a "lock" object. Generically, the lock has two states: locked and unlocked. To use a resource, a process must first obtain the associated lock. If the lock is currently in the unlocked state, the process can take the lock and then proceed to manipulate the data as it pleases, returning the lock to the unlocked state when it is done. If the lock is taken (because some other process is using the resource), the current process must wait on the lock until it is returned to the unlocked state (i.e. the resource is no longer in use).
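The example of programs A and B sharing the number i can be written as a short sketch. It uses Python threads and the standard threading.Lock purely for illustration; it is not kernel code and not the KDUAL API, and the names program_a, program_b, and i_lock are made up for the example.

```python
import threading

i = 0
i_lock = threading.Lock()  # the lock object associated with the resource i
seen = []

def program_a():
    global i
    with i_lock:           # wait until the lock is unlocked, then take it
        i = 3              # the lock is returned when the block exits

def program_b():
    with i_lock:
        seen.append(i)     # reads either 0 or 3, never a half-written value

threads = [threading.Thread(target=program_a), threading.Thread(target=program_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Which of the two values B sees still depends on timing, but the lock rules out the third outcome, a read performed in the middle of the write.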

The process can then take the lock as before and proceed to use the resource. The use of locks is illustrated in the classic "Dining Philosophers" thinking puzzle. Consider a group of philosophers seated around a table, with a plate and a chopstick for each philosopher. A philosopher's life consists of two things: thinking and eating. Eating requires two chopsticks, so the philosopher must take both of the chopsticks next to him (one on either side). While thinking, a philosopher needs no chopstick. As long as the philosophers take turns thinking and eating, they should all be able to eat.

However, a problem arises if all the philosophers attempt to eat at the same time. Each philosopher will grab one of the adjacent chopsticks, so that there are no chopsticks left on the table. Each philosopher will then wait for the philosopher next to him to give up his chopstick. Since no philosopher has two chopsticks, no philosopher will finish eating; therefore no philosopher will ever give up a chopstick, and no philosopher can ever have two chopsticks.

Thus, the philosophers will starve to death. This can be avoided only by coordinating their actions; for example, access to the chopsticks could be controlled by a single lock, perhaps a bottle of hot sauce in the center of the table, with all the philosophers agreeing that they must take the lock before they can take chopsticks. When a philosopher is hungry and neither of his neighbors is eating, he takes the bottle, takes both chopsticks, puts the bottle back, and eats.

When he is done he takes the bottle, yields the chopsticks, and puts the bottle back again. A philosopher will not attempt to get the bottle unless both chopsticks are available, and when he holds the bottle the other philosophers are prohibited by their agreement from interfering with his taking of the chopsticks. Since a philosopher cannot end up with only one chopstick, the starvation situation above has been avoided; in programming terminology, taking both chopsticks has become an atomic operation: it will either fail or succeed completely, and is guaranteed not to result in a partially altered state or to produce any intermediate states visible to other processes.
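The "bottle of hot sauce" scheme can be sketched with Python threads. This is an illustrative assumption, not code from the KDUAL project: the bottle is modeled with the standard threading.Condition so that a philosopher who finds a chopstick missing puts the bottle back and waits instead of checking repeatedly, and the number of philosophers and meals is arbitrary.

```python
import threading

N = 5
chopstick_free = [True] * N
bottle = threading.Condition()  # the single lock guarding all chopsticks
meals = [0] * N

def philosopher(k, rounds=3):
    left, right = k, (k + 1) % N
    for _ in range(rounds):
        with bottle:                                   # take the bottle
            while not (chopstick_free[left] and chopstick_free[right]):
                bottle.wait()                          # put it back and wait
            chopstick_free[left] = chopstick_free[right] = False
        meals[k] += 1                                  # eat; the bottle is back on the table
        with bottle:                                   # take the bottle again
            chopstick_free[left] = chopstick_free[right] = True
            bottle.notify_all()                        # neighbors may now try

threads = [threading.Thread(target=philosopher, args=(k,)) for k in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because both chopsticks are only ever taken while holding the bottle, no philosopher can end up holding just one, so every philosopher finishes all of his meals.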

When the overall set of actions that needs to be accomplished is not atomic (e.g. taking the chopsticks), resources can be safely accessed by binding that access to a single, atomic operation (e.g. taking the bottle).

1.2.2 Deadlock

Problems can arise in locking when one or more processes end up in deadlock: the processes "spin" forever while trying to take a lock, and are never able to take it, thus bringing the system to a halt. To illustrate this with the dining philosophers problem, suppose that there were two locks involved; perhaps one for the chopsticks and one for the food. To be able to eat, a philosopher must therefore hold both locks.

The same problem occurs as with needing to hold both chopsticks: if two philosophers each get one lock, each will wait forever to get the other lock, again resulting in starvation. In computer programming, this condition is known as the deadly embrace. While the classic example involves only two processes and two locks, the same principle can be extended to any number of locks sought by any number of processes; deadlock occurs anytime a process cannot obtain the lock it is waiting for without giving up a lock it already holds. For example, the most trivial case of deadlock occurs when a process attempts to take a lock it already holds; each time the process checks the lock, it is in the locked state, so the process will keep waiting.

However, it will never unlock the lock (since it is busy trying to lock it) and so will wait forever. Deadlock causes all processes involved to hang and makes the resource controlled by the lock unavailable. Deadlock in the kernel is an especially severe problem, because it will hang the entire system--if the kernel locks up, user-space programs will also be unable to run, and the entire system becomes useless and requires a reboot. In addition, deadlock can be a very difficult problem to identify and resolve, since it can involve multiple different processes and usually is not reliably reproducible (since it requires the processes to take their locks with very specific timing).
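One common way to detect the situations described above is to follow the chain of "who holds the lock I want, and what is that holder waiting for" before blocking. The sketch below assumes that approach; it is not the KDUAL algorithm itself, and the function name, the two dictionaries, and the rule that a process waits on at most one lock at a time are assumptions made for the example.

```python
def would_deadlock(owner, waiting_for, process, lock):
    """Return True if `process` blocking on `lock` would close a wait cycle.

    owner:       dict mapping each held lock to the process holding it
    waiting_for: dict mapping each blocked process to the lock it waits on
    """
    seen = set()
    current = lock
    while current in owner:
        holder = owner[current]
        if holder == process:
            return True                    # the chain leads back to the requester
        if holder in seen:
            return False                   # a cycle that does not involve `process`
        seen.add(holder)
        current = waiting_for.get(holder)  # follow what the holder is waiting on
    return False                           # chain ends at a free lock or a running process

# Trivial case: A tries to take a lock it already holds.
self_lock = would_deadlock({"L1": "A"}, {}, "A", "L1")

# Deadly embrace: A holds chopsticks, B holds food and waits for chopsticks;
# A now asks for food.
embrace = would_deadlock({"chopsticks": "A", "food": "B"},
                         {"B": "chopsticks"}, "A", "food")

# Ordinary waiting: B asks for a lock that A holds while A is not blocked.
plain_wait = would_deadlock({"chopsticks": "A"}, {}, "B", "chopsticks")
```

A check of this kind runs at the moment a process is about to wait, so the error can be reported with the exact locks and processes involved instead of appearing later as an unexplained hang.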

2 Theory

2.1 Kernel-space User-space Transition

A computer running the Linux Operating System can be viewed as a series of "layers". Each layer is intended to be dependent only on the layers immediately adjacent to it. At the bottom is the machine hardware: processors, hard drives, video cards, RAM, and so on. Immediately above this is a layer of code called drivers. Each driver is an independent code module responsible for interacting with a single specific piece of hardware. Above the drivers is a layer of abstractions intended to mask the device-specific implementations of the drivers.

For example, regardless of the type of device files are stored on (e.g. a local IDE or SCSI hard drive or a networked fileserver) and regardless of the type of filesystem present on the device (e.g. ext2, ext3, ReiserFS), processes will see the standardized system presented by the Virtual File System (VFS) code. This allows a uniform method for operating on any files used by the system; the VFS dispatches instructions to the driver responsible for the particular device, which can then take whatever device-specific actions are appropriate.

Finally, above this are the syscalls (for System Calls), which are the hooks intended to be invoked by regular user processes. These are the calls found in the second section of manpages as listed by man syscalls. As the manpage states, "The system call is the fundamental interface between an application and the Linux kernel." This exemplifies the nature of the kernel layers. To provide a simple example: a user program wants to write data to a file. It invokes the write syscall and knows that upon the call's completion the write has been accomplished. The write call tells the Virtual File System to write the data. The VFS in turn communicates with the appropriate filesystem driver; in the case of the TJCSL, the Andrew File System driver.
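The uniformity of the syscall layer can be illustrated with a minimal sketch in C. The helper name pipe_roundtrip is hypothetical, and a pipe is used here instead of a file on disk only so that the example has no dependence on any particular filesystem; the point is that the calling code uses the same write and read calls whatever lies beneath the file descriptor.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Sends msg through an anonymous pipe and reads it back into out.
 * Returns the number of bytes read, or -1 on any failure.
 * Intended for short messages that fit in the pipe buffer. */
long pipe_roundtrip(const char *msg, char *out, size_t cap)
{
    int fds[2];
    assert(msg != NULL && out != NULL);
    if (pipe(fds) != 0)
        return -1;

    size_t len = strlen(msg);
    ssize_t written = write(fds[1], msg, len);  /* the write syscall */
    close(fds[1]);
    if (written != (ssize_t)len) {
        close(fds[0]);
        return -1;
    }

    ssize_t got = read(fds[0], out, cap);       /* the read syscall */
    close(fds[0]);
    return (long)got;
}
```

The caller never learns which kernel code handled the data, which is exactly the isolation between layers described above.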

This driver then communicates with the driver for the actual physical device where the data will be written; this last driver is responsible for actually committing the data to disk. However, this is all invisible to the original program. That program is completely unaware of any layer beneath the write call and the various other syscalls it uses to manipulate files; for all it knows, the system is pulling data from the ether. Unfortunately, this structure makes debugging the kernel very difficult; since it is the bottom layer, all other layers ultimately depend on it, and any failure will have serious ramifications. Also, because the entire OS depends on the kernel, buggy code in the kernel can and in all likelihood will make the system unstable and/or insecure. This has a two-fold effect: first, it introduces the possibility of data corruption or other system damage, and second, it prevents active debugging analysis after a crash. Moreover, it is difficult and somewhat unreliable to try to run the kernel in the kdb debugger. Typically, a kernel testing setup involves a separate test machine with a serial console (another machine connected to it via serial cable that will allow access to the machine in the event of problems, e.g. a failure of the keyboard drivers). If there is a problem with the kernel, a few cryptic error messages will appear on the serial console and the system will become completely inoperable.

This system of debugging is slow, requires additional hardware (a second computer and serial cable), and provides very little usable information. The goal of KDUAL is to simplify this process by moving the code to be tested up from its ordinary layer into the top layer--user-space. Running in user-space, the code would not pose the same threat to system integrity and stability that it ordinarily would. The running kernel is unaffected by a code failure, so it remains stable, preserving the system and allowing post-crash debugging. In addition, code running as a normal process can be debugged using powerful, already-existing debuggers like gdb. This eliminates the need for additional hardware and provides vastly more information about the error.

However, kernel code is dependent upon the adjacent layers to function: the write and read syscalls can hardly function without the filesystem provided by the VFS. As is the nature of the layered system, these are not available to user-space programs. Thus, the needed functionality must be re-implemented in user-space, preserving the existing kernel API so that switching from compilation against the user-space testing library to the actual kernel will be seamless. The KDUAL project will produce a user-space library of C code which implements the essential parts of the kernel API and which will provide the tester with built-in debugging checks and information that will further simplify the testing process.

2.2 Locking Implementation

2.2.1 Principles of Locking

One of the key kernel components which must be implemented is resource locking: ensuring safe access to shared resources through the use of special objects (locks, semaphores, mutexes), as explained in the previous section. The focus of this project is completing the KDUAL implementation of the kernel's locking API, providing both the core locking algorithm and a powerful debugging infrastructure including automated deadlock detection. Atomicity is the key feature of a locking system.

If, in the example of the dining philosophers, the lock was two bottles of hot sauce, then two philosophers could each obtain one bottle, and end up fighting over the lock in the same manner they fought over the chopsticks. The value of a lock is that the operation of taking it is necessarily atomic; thus, when the lock is used to control more complicated actions (e.g. taking two chopsticks), the entire set of actions becomes atomic. If the lock is not taken, the entire sequence is immediately aborted, and the operation thus fails without changing the state. If the lock is taken, the other actions can be performed safely, because no one else can interfere with them until the lock is released; thus the change in state is not visible until the entire process completes.

In the philosophers example, taking a single object is inherently atomic; we assume that the philosophers have the dignity to avoid fighting over the hot sauce. In programming, certain simple operations are guaranteed to be atomic because of the CPU architecture. The exact operations available differ among architectures, but all architectures provide some atomic operation or operations which make it possible to alter a value if and only if it matches another value; e.g. take a lock only if it is unlocked.
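A minimal sketch of "take a lock only if it is unlocked", using the portable compare-and-exchange operation from C11's stdatomic.h rather than any architecture-specific instruction. The type demo_lock and its functions are hypothetical names introduced only for this illustration; they are not part of the kernel or of KDUAL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 means unlocked, 1 means locked. */
typedef struct {
    atomic_int state;
} demo_lock;

void demo_lock_init(demo_lock *l)
{
    atomic_init(&l->state, 0);
}

/* Takes the lock only if it is currently unlocked. The comparison and the
 * store happen as one atomic step, so two callers can never both succeed. */
bool demo_try_lock(demo_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->state, &expected, 1);
}

void demo_unlock(demo_lock *l)
{
    assert(atomic_load(&l->state) == 1);  /* unlocking a free lock is a bug */
    atomic_store(&l->state, 0);
}
```

A spinning lock could be built from this by calling demo_try_lock in a loop until it returns true, which is also how the trivial self-deadlock arises if the caller already holds the lock.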

Thus, the heart of any locking system is a simple numerical value that can be atomically altered by individual CPU instructions. Building a locking implementation from scratch would be tremendously difficult, and is unnecessary given the number of existing locking implementations. The kernel code itself provides a very simple locking implementation; this could theoretically be ported to user-space, but building on it to provide the additional functionality desired for the KDUAL locking implementation would be very difficult. The POSIX Threads (pthreads) library provides a complete, powerful user-space locking implementation, and is thus suitable for use in KDUAL.

Naturally, however, it does not conform to the kernel API. The KDUAL locking implementation thus consists of a lock structure and a set of methods based around the pthreads library, providing more powerful debugging features with the Linux Kernel API. The KDUAL code also helps debugging by providing built-in checking for deadlock conditions where processes wait forever trying to obtain a lock. This checking will be accomplished by a dedicated deadlock-detection thread that monitors the dependencies of all the threads and responds to deadlock conditions.
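The idea of a lock structure wrapped around pthreads can be sketched as follows. This is an illustration under stated assumptions, not the actual KDUAL structure: the names dbg_lock, dbg_lock_init, dbg_lock_acquire and dbg_lock_release are hypothetical, and a real version would presumably expose the kernel's own names (such as spin_lock and spin_unlock) instead. The sketch records the owning thread so that the trivial case of a thread re-taking its own lock is reported instead of hanging.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical debugging lock; not the actual KDUAL structure. */
typedef struct {
    pthread_mutex_t mutex;  /* the real lock, supplied by pthreads */
    pthread_mutex_t meta;   /* protects the bookkeeping fields below */
    pthread_t owner;        /* valid only while held is true */
    bool held;
} dbg_lock;

void dbg_lock_init(dbg_lock *l)
{
    pthread_mutex_init(&l->mutex, NULL);
    pthread_mutex_init(&l->meta, NULL);
    l->held = false;
}

/* Returns 0 on success, or -1 without blocking if the calling thread
 * already holds the lock (the trivial self-deadlock case). */
int dbg_lock_acquire(dbg_lock *l)
{
    pthread_mutex_lock(&l->meta);
    bool self = l->held && pthread_equal(l->owner, pthread_self());
    pthread_mutex_unlock(&l->meta);
    if (self)
        return -1;

    pthread_mutex_lock(&l->mutex);
    pthread_mutex_lock(&l->meta);
    l->owner = pthread_self();
    l->held = true;
    pthread_mutex_unlock(&l->meta);
    return 0;
}

void dbg_lock_release(dbg_lock *l)
{
    pthread_mutex_lock(&l->meta);
    assert(l->held && pthread_equal(l->owner, pthread_self()));
    l->held = false;
    pthread_mutex_unlock(&l->meta);
    pthread_mutex_unlock(&l->mutex);
}
```

On most toolchains this is compiled with the -pthread flag. Detecting deadlocks that involve more than one thread needs the dependency information a dedicated detection thread would collect; the owner field here is only the first piece of that bookkeeping.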

2.2.2 Deadlock Detection

The detection thread relies on a structure called a Wait-For-Graph (WFG), a common concept in deadlock detection. The nodes of the WFG represent processes in the system, and the (directed) edges of the graph are dependencies between processes, where an edge i → j indicates that process i is waiting for a resource currently held by j. If there is a deadlock, it will be detectable in the WFG as a cycle: a complete path from any node back to itself.

The definition of deadlock is that the process is waiting on a resource that it cannot obtain without releasing a resource it already holds; this will necessarily result in a cycle in the WFG, because the process must (through some number of intermediaries) depend on itself. Thus there must be some series of edges leaving from a node and pointing back to the node. The reverse is also true: a cycle in the WFG always indicates a deadlock situation, because the cycle indicates that the process ultimately depends on itself, meaning that it is necessarily involved in deadlock and requires outside intervention. This means that checking for cycles in a WFG will detect all deadlocks without identifying non-deadlock situations as deadlocks (false positives); thus it is a reliable method for detection.
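Cycle checking on a Wait-For-Graph can be sketched with a depth-first search. The representation below (a small fixed-size adjacency matrix) and the names wfg, wfg_init, wfg_add_wait and wfg_has_deadlock are assumptions made for this illustration; the source does not say how the WFG will actually be stored.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define WFG_MAX 8  /* hypothetical cap on tracked processes */

/* waits_for[i][j] is true when process i waits for a resource held by j. */
typedef struct {
    int n;
    bool waits_for[WFG_MAX][WFG_MAX];
} wfg;

void wfg_init(wfg *g, int n)
{
    assert(n >= 0 && n <= WFG_MAX);
    g->n = n;
    memset(g->waits_for, 0, sizeof g->waits_for);
}

void wfg_add_wait(wfg *g, int waiter, int holder)
{
    assert(waiter >= 0 && waiter < g->n);
    assert(holder >= 0 && holder < g->n);
    g->waits_for[waiter][holder] = true;
}

/* Depth-first search with three node states: 0 = unvisited,
 * 1 = on the current path, 2 = finished. Reaching a node that is
 * still on the current path closes a cycle. */
static bool wfg_visit(const wfg *g, int node, int *state)
{
    state[node] = 1;
    for (int next = 0; next < g->n; next++) {
        if (!g->waits_for[node][next])
            continue;
        if (state[next] == 1)
            return true;
        if (state[next] == 0 && wfg_visit(g, next, state))
            return true;
    }
    state[node] = 2;
    return false;
}

bool wfg_has_deadlock(const wfg *g)
{
    int state[WFG_MAX] = {0};
    for (int i = 0; i < g->n; i++)
        if (state[i] == 0 && wfg_visit(g, i, state))
            return true;
    return false;
}
```

With three processes numbered 0 to 2, adding the edges 0 → 1 and 1 → 2 gives a chain with no deadlock; adding 2 → 1 closes a cycle between processes 1 and 2, while process 0 merely waits on one of them.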

//TODO: Talk about how the WFG will actually be implemented, potential for false positives?

3 Design Criteria

Some sections of the kernel source can be used with little or no modification, particularly data structures such as doubly-circularly-linked lists and red-black trees. A few others are beyond the scope of the project, in particular the scheduler, block devices, and the TCP/IP stack. The primary focus is on implementing the memory allocation algorithm, resource access controls such as spinlocks and semaphores, and the Virtual File System (VFS). All code will be written in C, following the kernel style of taking advantage of special extensions for the GNU's Not UNIX (GNU) C compiler (gcc).

A few basic development tools will also be used, such as the Vi IMproved (VIM) text editor, the Concurrent Versions System (CVS), and the GNU make utility. The ultimate test of the system is simply to see if kernel code can successfully compile and run. Testing will require using code to test each function of the provided API, and running under various conditions (for example, low memory).

Figure 2: Wait-For-Graph. The Wait-For-Graph at top clearly indicates a deadlock via the cycle (in red) between processes 2 and 3. The lock situation from which the WFG is constructed is shown below: 2 and 3 wait on each other's locks, while process 1 is merely waiting on one of the locks and is not involved in the actual deadlock.

4 Results and Future Development

The first stage of testing has been to confirm the viability of the basic methods (locking, unlocking, status checking, etc). So far, these tests have successfully completed, demonstrating the capability of the locking system in a one-thread environment. The next tests will confirm the viability of the locking methods in a multiple-thread environment.

The final testing phase will be to check the abilities of the deadlock-determination algorithm when it is complete. The KDUAL library will greatly simplify and speed up the kernel development process. This will be an immediate benefit to the kernel development community. It will also have much more far-reaching effects, because a better kernel development process will benefit all users of the kernel--a significant group including dedicated hackers, more casual users experimenting with non-Windows systems, schools, and even important businesses (including Microsoft's web hosts).

In addition, the library and information about its development will provide a valuable basis for anyone attempting to undertake a similar project in kernel implementation--for example, producing a binary and library focused more on providing a viable production environment for "virtual servers" (like the User Mode Linux project).

References

[1] P. Pradhan, S. Kandula, W. Xu, A. Shaikh, E. Nahum, "Daytona - A User-Level TCP Stack," Available HTTP: http://nms.lcs.mit.edu/~kandula/data/daytona.pdf

[2] D. Ely, S. Savage, D. Wetherall, "Alpine: A User-Level Infrastructure for Network Protocol Development," Available HTTP: http://alpine.cs.washington.edu/alpineUsits01.pdf

[3] N. Krivokapić, A. Kemper, E. Gudes, "Deadlock Detection Agents: A Distributed Deadlock Detection Scheme," Available HTTP: http://www.db.fmi.uni-passau.de/publications/techreports/MIP9617.ps.gz

[4] J. Holliday, A. El Abbadi, "Distributed Deadlock Detection," Available HTTP: http://www.cse.scu.edu/~jholliday/dd 9 16.htm

Page 239: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Algorithms for Computational Comparative Historical Linguistics

Over time, languages change by regular, systematic processes. It is possible, by looking at the state of a language now and in the past, to deduce the exact changes that occurred, and the order in which they occurred. These changes also split languages; it is therefore also possible, using modern languages as input, to induce the probable structure of their parent language. My goal is to develop algorithms by which computers may efficiently analyze the historical structure of languages and language families.

Page 240: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Developing Algorithms for Computational Comparative Diachronic Historical Linguistics

Dan Wright

Abstract

The purpose of my research was to design algorithms and techniques which would aid in using computers to deal with languages, their relationships to each other, and their changes over time. My method of research consists mostly of devising ways to organize linguistic data and to use it to build information about the subject, for whatever purpose a user of my algorithms might have. My results are several algorithms, data storage methods, and general insights reached about the difficulties involved in dealing with historical linguistics on an algorithmic basis.

Introduction

Historical linguistics is a relatively new study. It only achieved an even slightly scientific status in the 19th century, and its methods are typically unsystematic, and often reliant on intuition. If it is to be used with any great degree of accuracy and reliability, a systematic approach must be taken to historical linguistic analysis, and the best way to do this is to develop algorithms for it. The development of algorithms not only allows computers to do much of the rote labor that historical linguistic analysis involves, but also leads to a greater understanding of the methods used.

I am aiming to develop algorithms for diachronic historical linguistics, meaning that they will reconstruct the changes that have occurred in languages at various points in time. This has no material benefit to anybody, but perhaps leads us to a better understanding of history, literature, culture, and to some degree language itself. I cannot predict what the algorithms I am attempting to create might lead to, but my purpose is simply to allow computational analysis of language change. My specific and immediate aim is to create a program that will, given a group of languages in some form reflecting nothing beyond their phonetic data (with semantic connections to organize it), form a chronological and familial hierarchy among them, discovering which grew out of which, and their relations to each other.

Ideally, this would also develop hypothetical ancestor languages for each of these, placing these within the temporal hierarchy. Once the hierarchy is in place, the ancestor languages can be honed to be more and more likely and rigorously derived from their descendents, hopefully without modification of the hierarchy already created. I plan to program in C, without any object-oriented or input/output extravagancies.

2 Background Information and Theory

2.1 Phoneme Representation

Creating a phonetic representation system does not require a great deal of foresight as to use, and does not affect the algorithms that use it much by its design, other than in the realm of efficiency. A few integers will describe any phoneme, and there are many phoneme description schemata to choose from. I personally chose the most standard, that of the International Phonetic Alphabet. Once one has categorized and schematized phonemes, they can be arrayed into words. See Appendix A for a description of the phonetic categorization system I used.

2.2 Language Representation

Once one can store words, one can define a language. Morris Swadesh developed a procedure of generating lists of words that will not be borrowed, and will remain in a language only changed by phonetic phenomena. These are simple words, used frequently in daily life. If one stores a phonetic description of a Swadesh List of words, this defines a language at one point in time. This snapshot of a language can be used to directly relate any language to any other on purely phonetic grounds, escaping the traps inherent in keeping any semantic basis. This also avoids using any connection to grammar, which is a far more complicated subject and does not follow direct phonemic changes, unaffected by other things.

2.3 Language over Time

If you can define a language at one point in time, you can define various languages at various points in time and link them, to create a fully temporal as well as phonetically spatial language. It is feasible to represent dialect in this manner, and make the language geographically spatial as well, but this is outside of my intention. It has been shown, beginning with people such as Jacob Grimm in the 19th century, that languages, on some level, change by regular phonetic rules. These rules are unaffected by semantics or other languages, and function randomly.

Their randomness is probably not pure, but the multitude of factors affecting phonemic change are so complex that the result appears random, and can be treated as such in analysis. Most approaches to this have represented the various states of the language in a tree. This does a good job of showing which languages have relationships to each other, but does nothing to represent the nature of the relationships themselves. My goal is to discover the actual sound changes which occur between language states from the raw data, and using this information better determine unknown states of the language.

2.4 Problems and Simplifications

There are various problems which make determination of sound changes difficult, and simplifications which remove these problems, though also lowering the accuracy of the conclusions.

2.4.1 Borrowing

I aim to analyze the regular phonetic changes between states of a language. However, not all language change occurs because of regular phonetic changes. Therefore, I must only use words that are not subject to borrowing, and only subject to regular phonetic change. There are lists of words in a language known as Swadesh lists which are never borrowed, due to their fundamental and common use.

In only using these words, I can avoid the problem of borrowing, with no significant loss of accuracy in conclusions drawn.

2.4.2 Polymorphism

In many languages, there are various phonetic forms of the same semantic form, i.e. synonyms. With these, typically only one will be phonetically related to those in previous languages. When assembling the list of words to use, it then becomes necessary to choose the word form cognate with the other word forms I am using. I can see no way to get around manually checking every list of words for non-cognate synonyms. This can be circumvented, but at a cost in speed.

2.4.3 Dependent Evolution

The base assumption of my model of regular phonetic change is that sound changes are regular and independent. A problem is that occasionally sound changes, by changing the allocation of mouth space for various phonemes, can cause other sound changes. Initially, I will simply assume all sound changes are random. I may work in frequently-occurring dependencies in making my algorithms more efficient. If possible, I will use dependency in determining the likelihood of various sound changes, and therefore the most likely past language states.

2.4.4 Homoplasy

When a phoneme undergoes a sound change, in some cases it will be changed into a phoneme that already exists. E.g., /b/ is devoiced to /p/, but the /p/ phoneme already exists and is unchanged. I do not plan to eliminate homoplasy, and will definitely keep it as a possibility when analyzing sound changes from raw data. I will also attempt to predict homoplasy when formulating past states.
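The devoicing example can be made concrete with a short sketch in C. It assumes, purely for illustration, that each phoneme is written as a single char; the function name apply_sound_change is hypothetical. Applying the change /b/ to /p/ makes two words that were distinct become identical, which is why the original state cannot be read directly off the later one.

```c
#include <assert.h>
#include <string.h>

/* Replaces every occurrence of phoneme `from` with `to` in word, in place.
 * Returns how many positions changed. One char stands for one phoneme. */
int apply_sound_change(char *word, char from, char to)
{
    int changed = 0;
    assert(word != NULL);
    for (size_t i = 0; word[i] != '\0'; i++) {
        if (word[i] == from) {
            word[i] = to;
            changed++;
        }
    }
    return changed;
}
```

After the change, a word such as "bap" and an existing word "pap" are indistinguishable, so a reconstruction working backwards has to consider both /b/ and /p/ as possible sources for each /p/.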

3 Design Criteria and Procedure

I approached the problem of designing the complete system from both ends. My first work involved finding ways to categorize phonemes and store them, then working up from this to storage of words in Swadesh lists. This would be required for all work dealing with languages, whether over time or simply synchronic. My next attempts were to develop algorithms to find, given phonetic data of languages, the connections between them and the regular sound changes. This was successful in one way, but led to a dead end in another.

I was able to develop algorithms to find the connections, but I found these connections useless. Sound changes do not follow linearly in the paths of those preceding them, and there is no such thing as phonetic momentum. Simple similarity judgments proved more useful than sound changes. However, these algorithms may serve a purpose in some other aspect of historical linguistics, so my working on them was not a complete waste; they simply ended up being something of a tangent to my main effort. My third and most significant effort was my attempt to work from the top down.

I would simply, given languages, form a hierarchical web. I began by simply organizing them by similarity, and discovered that organizational processes, if handled correctly, would actually serve to find hypothetical ancestor languages.

3.1 Phoneme Storage

I designed a phoneme storage system described in Appendix A. To store a single phoneme, a 5-dimensional array of integers is needed. I basically used the guidelines of the International Phonetic Alphabet, but found a way to not separate vowels from consonants. I did not include clicks or other weird sounds, but my method of storage could easily be adapted to them. Essentially, there is a series of arrays within arrays describing the various dimensions through which phonemes can be classified.

However, each level means a different thing depending on which section of the array you are in. In the consonants, the second dimension is voice, but in vowels this is not needed, so rounding is used there. This provides for a maximum efficiency of space, and an ease of access, balancing well memory and processing usage. This phoneme storage system can be applied to anything. Word storage involves simply making an array, or linked list, or some other linear structure of phonemes. A Swadesh list can simply be an array of words in which each position in the array corresponds to a single meaning. The phonetic categorization and storage system was my most concrete achievement.
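The storage scheme described here might look roughly like the following in C. Appendix A is not reproduced in this section, so the layout is an assumption: the first integer is taken to select consonant or vowel, the second is voicing for consonants and rounding for vowels as the text states, and the remaining three are left uninterpreted. All type and function names, and the list length of 100, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define PHONEME_DIMS 5     /* five integers per phoneme, as in the text */
#define MAX_WORD_LEN 16    /* assumed upper bound on phonemes per word */
#define SWADESH_SIZE 100   /* assumed list length */

enum { CLASS_CONSONANT = 0, CLASS_VOWEL = 1 };

/* dim[0] selects consonant or vowel (an assumption); the meaning of dim[1]
 * depends on it: voicing for consonants, rounding for vowels.
 * dim[2..4] are left uninterpreted in this sketch. */
typedef struct {
    int dim[PHONEME_DIMS];
} phoneme;

/* A word is a linear sequence of phonemes. */
typedef struct {
    int length;
    phoneme sounds[MAX_WORD_LEN];
} word;

/* Position i always holds the word for meaning i. */
typedef struct {
    word entries[SWADESH_SIZE];
} swadesh_list;

bool phoneme_is_voiced_consonant(const phoneme *p)
{
    return p->dim[0] == CLASS_CONSONANT && p->dim[1] == 1;
}

bool phoneme_is_rounded_vowel(const phoneme *p)
{
    return p->dim[0] == CLASS_VOWEL && p->dim[1] == 1;
}
```

Reusing the same slot for voicing and rounding is what keeps every phoneme the same size, at the cost that code reading a phoneme must check its class before interpreting the other integers.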

3.2 Correspondence

3.2.1 Brute Correspondence Algorithm

This algorithm, the only one I have fully implemented, is very simple and inefficient, but is a beginning. I go through two word-lists representing states of a language and find every case in which one phoneme in one corresponds to another in the other. When a correspondence occurs in all cases, it is recorded. These correspondences are the simplest sound changes, and can be used to analyze other sound changes. This algorithm is a necessary first step to analyzing the differences between two languages, though horribly inefficient. It is unpleasant, but cannot be avoided.
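A minimal sketch of the brute-force check for a single phoneme follows. It is not the author's implementation: it assumes one char per phoneme, assumes that cognate words can be compared position by position, and skips word pairs of unequal length. The function name regular_correspondence is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Looks for a regular correspondence for phoneme `from` between two word
 * lists of n entries each. Returns the phoneme it always corresponds to,
 * or 0 if `from` never occurs or corresponds to more than one phoneme. */
char regular_correspondence(const char *const *older,
                            const char *const *newer,
                            int n, char from)
{
    char target = 0;
    assert(older != NULL && newer != NULL);
    for (int w = 0; w < n; w++) {
        size_t len = strlen(older[w]);
        if (strlen(newer[w]) != len)
            continue;                       /* cannot align; skip pair */
        for (size_t i = 0; i < len; i++) {
            if (older[w][i] != from)
                continue;
            if (target == 0)
                target = newer[w][i];       /* first occurrence seen */
            else if (target != newer[w][i])
                return 0;                   /* not regular */
        }
    }
    return target;
}
```

Running this once for every phoneme in the full inventory is the brute-force step; a phoneme that maps to itself everywhere is reported as unchanged, and one that maps to two different phonemes is rejected as irregular.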

Page 259: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Developing Algorithms for Computational Comparative Diachronic Historical Linguistics

Dan Wright

3.2.2 Limited Correspondence Algorithm

This is a refinement of the Brute Correspondence Algorithm. Here, I first limit the list of phonemes to go through. I can limit it to the phonemes that occur in the first list, to those that occur in both lists, or to some pre-existing list of phonemes for the language. This is somewhat dangerous, as it risks missing really peculiar changes, but it is much faster than going through everything.
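A minimal sketch of the three limiting options, under the same assumptions as before (pre-aligned words of equal length, single characters standing in for phonemes). The names `candidate_phonemes`, `limited_correspondences`, and the `mode` strings are hypothetical.

```python
def phonemes_in(words):
    """Set of all phonemes that occur anywhere in a word list."""
    return {p for word in words for p in word}


def candidate_phonemes(old_words, new_words, mode, known=None):
    """Build the limited phoneme list in one of the three ways described above."""
    if mode == "first":
        return phonemes_in(old_words)
    if mode == "both":
        return phonemes_in(old_words) & phonemes_in(new_words)
    if mode == "known":
        return set(known or ())
    raise ValueError("unknown mode: " + mode)


def limited_correspondences(old_words, new_words, candidates):
    """Check only the candidate phonemes instead of the full inventory."""
    aligned = [
        (a, b)
        for old, new in zip(old_words, new_words)
        if len(old) == len(new)
        for a, b in zip(old, new)
    ]
    found = {}
    for a in candidates:
        targets = {y for x, y in aligned if x == a}
        if len(targets) == 1:  # a always corresponds to the same phoneme
            found[a] = targets.pop()
    return found


old_state = ["pater", "pes"]
new_state = ["fater", "fes"]
from_first = limited_correspondences(
    old_state, new_state, candidate_phonemes(old_state, new_state, "first"))
from_both = limited_correspondences(
    old_state, new_state, candidate_phonemes(old_state, new_state, "both"))
```

In this toy data the "both" option misses the change from p to f, because p no longer occurs in the second list. That is the kind of peculiar change the limited search can overlook.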

Page 260: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Developing Algorithms for Computational Comparative Diachronic Historical Linguistics

Dan Wright

3.2.3 Abstract from Correspondences Algorithm

In this algorithm, I go through the sound changes discovered by either of the above algorithms and, by a brute-force search, find regularities, that is, constant shifts between phoneme types. Again, this is brute force, but refinements can be added later to make it more efficient.
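One way to read "constant shifts between phoneme types" is as the same change in one classification integer recurring across several correspondences. The sketch below assumes that reading, assumes phonemes are five-integer tuples as in Appendix A, and uses a hypothetical `min_support` threshold; it is an illustration, not the method from the project.

```python
from collections import Counter


def regular_shifts(correspondences, min_support=2):
    """Count feature-level changes shared by several phoneme correspondences."""
    counts = Counter()
    for old, new in correspondences.items():
        for index, (x, y) in enumerate(zip(old, new)):
            if x != y:
                # (which integer changed, old value, new value)
                counts[(index, x, y)] += 1
    return {shift: n for shift, n in counts.items() if n >= min_support}


# Toy correspondences: two voiceless plosives become fricatives,
# one voiced velar plosive stays unchanged.
toy = {
    (0, 0, 0, 0, 0): (0, 0, 0, 4, 0),
    (0, 0, 4, 0, 0): (0, 0, 4, 4, 0),
    (0, 1, 8, 0, 0): (0, 1, 8, 0, 0),
}
shifts = regular_shifts(toy)
```

Here the shared shift is in the fourth integer (index 3) from 0 to 4, which in the Appendix A numbering is plosive to fricative.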

Page 261: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Developing Algorithms for Computational Comparative Diachronic Historical Linguistics

Dan Wright

3.3 Web Formation

3.3.1 Web Formation Algorithm

The Web Formation Algorithm will act on a "Family." The first Family will be formed of all of the languages and a proto-language formed from all of them via an Ancestration Function. A Family has a parent, the proto-language, and any number of children, which are either languages or Families. At the beginning of the WFA, the Family has no proto-language, and the WFA forms it through an Ancestration Algorithm. The Web Formation Algorithm will take a random language, and find all of the languages which it is closer to (by some Distance Function, which I will discuss later) than it is to the proto-language.

Page 262: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Developing Algorithms for Computational Comparative Diachronic Historical Linguistics

Dan Wright

These are put into their own Family, which is made a subfamily of the original Family. This is repeated until either every remaining language is closer to the proto-language than to anything else, or all languages are in Families. Then, the Web Formation Algorithm is applied to every Family within this Family. Eventually, there will be a web of connected Families, culminating in leaf Families whose only member is a single language.
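The procedure above can be sketched on abstract number-lists. Everything here is an assumption for illustration: languages are lists of numbers, the Ancestration Function is plain averaging, the Distance Function is Euclidean, a Family is a nested dictionary, and languages with no closer neighbour are stored directly by name rather than as one-member leaf Families. The size guard is an added safety check so the recursion always shrinks.

```python
import math
import random


def distance(a, b):
    """Assumed Distance Function: Euclidean distance between number-lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ancestor(langs):
    """Assumed Ancestration Function: element-wise average."""
    return [sum(column) / len(langs) for column in zip(*langs)]


def form_web(languages, rng=None):
    """languages maps a name to a number-list; returns a nested Family dict."""
    rng = rng or random.Random(0)
    proto = ancestor(list(languages.values()))
    family = {"proto": proto, "children": []}
    remaining = dict(languages)
    while remaining:
        name = rng.choice(sorted(remaining))  # take a random language
        lang = remaining.pop(name)
        to_proto = distance(lang, proto)
        close = [other for other, vec in remaining.items()
                 if distance(lang, vec) < to_proto]
        if close and len(close) + 1 < len(languages):
            # Closer to each other than to the proto-language: make a subfamily.
            group = {name: lang}
            for other in close:
                group[other] = remaining.pop(other)
            family["children"].append(form_web(group, rng))
        else:
            family["children"].append(name)  # stays a direct child
    return family


def members(node):
    """All language names below a Family or a single language name."""
    if isinstance(node, str):
        return {node}
    return set().union(*(members(child) for child in node["children"]))


web = form_web({"A": [0, 0], "B": [1, 0], "C": [10, 0], "D": [11, 0]})
```

With these four toy languages the top Family splits into two subfamilies, one holding A and B and one holding C and D, whichever language is picked first.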

Page 263: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Developing Algorithms for Computational Comparative Diachronic Historical Linguistics

Dan Wright

3.3.2 Ancestration Algorithm

This creates a proto-language from all of the languages which descend from it. For the abstract number-languages, I've simply used averaging. This would be where the phonetic data comes in, and this would be one of the only places where abstract number-languages differ from actual phonetic data. It is hypothetically possible to need only a distance function and not a separate ancestration function, by having the ancestration function create random languages and slowly work them closer in distance to the original languages, but this would be monstrously inefficient and could be greatly helped by use of phonetic knowledge.
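Both options can be sketched for number-lists. The averaging function follows the text; the search variant is only one possible interpretation of "slowly working them closer", here a random hill climb that minimises the summed distance to the descendants. Function names, the step size, and the Euclidean distance are assumptions.

```python
import math
import random


def distance(a, b):
    """Assumed distance: Euclidean distance between number-lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def total_distance(candidate, langs):
    return sum(distance(candidate, lang) for lang in langs)


def ancestor_by_average(langs):
    """Proto-language as the element-wise average of its descendants."""
    return [sum(column) / len(langs) for column in zip(*langs)]


def ancestor_by_search(langs, start, steps=2000, step_size=0.5, seed=0):
    """Hill climb from `start`, keeping only moves that reduce total distance."""
    rng = random.Random(seed)
    best = list(start)
    best_cost = total_distance(best, langs)
    for _ in range(steps):
        candidate = [x + rng.uniform(-step_size, step_size) for x in best]
        cost = total_distance(candidate, langs)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best


descendants = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
averaged = ancestor_by_average(descendants)
searched = ancestor_by_search(descendants, start=[10.0, 10.0])
```

The search needs thousands of distance evaluations to approximate what averaging gives in one pass, which illustrates the inefficiency noted above. The two results need not coincide, since minimising summed distance targets a median-like point rather than the mean.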

Page 264: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Developing Algorithms for Computational Comparative Diachronic Historical Linguistics

Dan Wright

3.3.3 Distance Function

The Distance Function is used within the WFA. It is what makes use of the information built up through the Correspondence Algorithms. Simply by changing the Distance Function, the Web Formation process I describe here can be applied to many things. For testing, I mostly used abstract number-lists, and applied it to languages at the end.
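A small sketch of how the distance function can be swapped without touching the rest of the process. Both distance functions are assumptions: `number_distance` is Euclidean distance on abstract number-lists, and `swadesh_distance` is a deliberately simple stand-in that counts differing words and does not yet use correspondence information. The helper `closest` is hypothetical.

```python
import math


def number_distance(a, b):
    """Distance between abstract number-lists (assumed Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def swadesh_distance(a, b):
    """Fraction of meaning slots whose words differ in two Swadesh lists."""
    differing = sum(1 for word_a, word_b in zip(a, b) if word_a != word_b)
    return differing / len(a)


def closest(target, others, distance):
    """Name of the entry in `others` nearest to `target` under `distance`."""
    return min(others, key=lambda name: distance(target, others[name]))


nearest_number = closest([0, 0], {"x": [3, 4], "y": [1, 0]}, number_distance)
nearest_language = closest(
    ["pa", "ti"],
    {"u": ["fa", "ti"], "v": ["fa", "si"]},
    swadesh_distance,
)
```

The calling code is identical in both cases; only the function passed as `distance` changes.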

Page 265: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Developing Algorithms for Computational Comparative Diachronic Historical Linguistics

Dan Wright

4 Discussion

I did not succeed in unifying all of my efforts, due to the overwhelmingly large scope of historical linguistic issues, but I did make significant progress from both ends of the general problem. My phonetic classification system can be applied, and something like it will be necessary, in any effort actually using linguistic data in any way. Simply put, traditional alphabetic systems will not function, and an organized multi-dimensional method is far more efficient, in space and time, than linear storage. My correspondence algorithms were generally uninteresting and on the whole not very useful.

Page 266: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Developing Algorithms for Computational Comparative Diachronic Historical Linguistics

Dan Wright

These were certainly the weak point of my work this year. My web formation algorithms were probably the strong point this year. I examined how to deal with languages (through family relations) and developed methods that will, with further application, yield actual information about language change. By experimenting with various distance algorithms within my framework, and testing the accuracy of these with regard to actual historical data gleaned through writings and extensive manual language derivation that has been done in the past, one could see how these models would correspond to the actuality of language change. My web-formation algorithms are a useful framework for historical linguistic experimentation and theorization, as the field has in the past been limited to purely historical studies.

Page 267: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Developing Algorithms for Computational Comparative Diachronic Historical Linguistics

Dan Wright

5 Conclusion

I've yet to conclude.

6 References

Sedgewick, Robert. Algorithms in C, Part 5: Graph Algorithms. 2002. Addison-Wesley. Boston, Massachusetts.

Kannan, S., Warnow, T. A Fast Algorithm for the Computation and Enumeration of Perfect Phylogenies. 1996.

Warnow, T., Nakhleh, L., Ringe, D., Evans, S. A Comparison of Phylogenetic Reconstruction Methods on an IE Dataset.

Warnow, T., Nakhleh, L., Ringe, D., Evans, S. Stochastic Models of Language Evolution and an Application to the Indo-European Family of Languages.

Page 268: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Developing Algorithms for Computational Comparative Diachronic Historical Linguistics

Dan Wright

Appendix A

My phoneme categorization system uses five integers to define a phoneme. Each phoneme is stored as an array of five integers, but could easily be converted into a more compact binary format, due to the limits on each integer. The first integer can only be within a range of 0 to 1, the second is also 0 to 1, the third is 0 to 11, the fourth is 0 to 7, and the fifth is once more 0 to 1, only used for vowels. The first integer declares whether the phoneme is a consonant or a vowel. A value of 0 is a consonant, a value of 1 is a vowel.
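The compact binary format mentioned above could look like the following sketch. The stated ranges fit in 1, 1, 4, 3, and 1 bits, so a phoneme fits in 10 bits. The field order and the names `pack` and `unpack` are assumptions; the text does not specify a format.

```python
# Largest allowed value and bit width for each of the five integers.
MAX_VALUES = (1, 1, 11, 7, 1)
WIDTHS = (1, 1, 4, 3, 1)  # 10 bits in total


def pack(phoneme):
    """Pack five integers into one 10-bit integer, first field highest."""
    value = 0
    for field, limit, width in zip(phoneme, MAX_VALUES, WIDTHS):
        if not 0 <= field <= limit:
            raise ValueError("field out of range: %r" % (field,))
        value = (value << width) | field
    return value


def unpack(value):
    """Inverse of pack: recover the five integers from the packed value."""
    fields = []
    for width in reversed(WIDTHS):
        fields.append(value & ((1 << width) - 1))
        value >>= width
    return tuple(reversed(fields))


packed = pack((0, 1, 4, 4, 0))
restored = unpack(packed)
```

Ten bits per phoneme means a word of eight phonemes would fit in 80 bits, compared with forty separate integers in the array form.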

Page 269: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Developing Algorithms for Computational Comparative Diachronic Historical Linguistics

Dan Wright

The four other integers are used differently for consonants and vowels. The consonant system uses only three of the integers, and the vowel system has smaller ranges for the four integers it uses. The consonant system begins with a 0 or 1 for voice. If 1, the phoneme is voiced. The third integer defines Place of Articulation. 0 is bilabial, 1 labiodental, 2 spans dental, alveolar and postalveolar in situations where they are undifferentiated, 3 is dental, 4 alveolar, 5 postalveolar, 6 retroflex, 7 palatal, 8 velar, 9 uvular, 10 pharyngeal, and 11 glottal.

Page 270: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Developing Algorithms for Computational Comparative Diachronic Historical Linguistics

Dan Wright

The fourth integer defines Method of Articulation. 0 is plosive, 1 nasal, 2 trill, 3 tap (or flap), 4 fricative, 5 lateral fricative, 6 approximant, 7 lateral approximant. The vowel system also begins with a 0 or 1, but for roundedness rather than voice. Counting within the four integers the vowel system uses, the second integer defines openness of the vowel: 0 is close, 1 close-mid, 2 open-mid, and 3 open. The third integer defines Frontness: 0 is back, 1 central, 2 front. The fourth integer defines offset, which is for phonemes that are very close to another phoneme and differ from it in various directions, but not so much in any one direction as to be classified separately. The phoneme which less completely fits the phonemic classification is offset.
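The full encoding can be sketched with name tables built from the values listed in this appendix. The vowel layout assumes that the vowel integers are counted after the consonant/vowel flag, so that offset lands in the fifth position; the label "coronal (undifferentiated)" for place 2 and the function names are hypothetical.

```python
PLACES = [
    "bilabial", "labiodental", "coronal (undifferentiated)", "dental",
    "alveolar", "postalveolar", "retroflex", "palatal", "velar", "uvular",
    "pharyngeal", "glottal",
]
MANNERS = [
    "plosive", "nasal", "trill", "tap", "fricative", "lateral fricative",
    "approximant", "lateral approximant",
]
OPENNESS = ["close", "close-mid", "open-mid", "open"]
FRONTNESS = ["back", "central", "front"]


def consonant(voiced, place, manner):
    """Encode a consonant; the fifth integer is unused and left at 0."""
    return (0, int(voiced), PLACES.index(place), MANNERS.index(manner), 0)


def vowel(rounded, openness, frontness, offset=False):
    """Encode a vowel under the assumed layout described in the lead-in."""
    return (1, int(rounded), OPENNESS.index(openness),
            FRONTNESS.index(frontness), int(offset))


z = consonant(True, "alveolar", "fricative")
a = vowel(False, "open", "central")
```

Keeping the names in lists means the integer codes are just list positions, so adding clicks or other sounds later would only require extending the tables.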

Page 271: Computer Systems Lab TJHSST Current Projects 2004-2005 Second Period

Developing Algorithms for Computational Comparative Diachronic Historical Linguistics

Dan Wright

Appendix B

Here I will put conclusions about relationships between languages I actually find using my algorithms.