ics2308 artificial intelligence notes.pdf

7/27/2019 ICS2308 Artificial Intelligence Notes.pdf

ICS2308 ARTIFICIAL INTELLIGENCE

Course Outline

1. Introduction to artificial intelligence
2. Knowledge representation
3. Heuristic search
4. Natural language processing
5. Symbolic machine learning
6. Connectionism and evolutionary computation

References

Introduction to Artificial Intelligence

Intelligence is the capacity to learn; learning means the acquisition and application of knowledge.

Artificial intelligence is an area of computer science that focuses on creating machines that can engage in behaviour that humans consider intelligent.

AI is a combination of computer science, physiology and philosophy.

History of Artificial Intelligence

The invention of computers in 1941 made available the technology needed to create machine intelligence. However, it was not until the 1950s that the link between human intelligence and machines was really observed.

Norbert Wiener researched feedback theory and established that intelligent behaviour was the result of feedback mechanisms, which could possibly be simulated by machines.


In 1955, a program was developed representing each problem as a tree model. The program

would attempt to solve it by selecting the branch that would most likely result in the

correct conclusion.

In 1956, the term AI was coined at the Dartmouth Conference and since then, research has continued into developing programs or applications that could efficiently solve problems and learn by themselves.

Several applications have been developed, e.g. missile systems, voice and character recognition, engineering controllers, etc.

Motivation towards Artificial Intelligence

Computers are generally well suited to performing mechanical computations using fixed programmed rules.

This allows them to perform simple monotonous tasks efficiently and reliably, which human beings are ill suited to.

For more complex problems, computers have trouble understanding specific situations and adapting to new situations, unlike humans.

Artificial intelligence aims to improve machine behaviour in tackling such complex tasks.

AI also allows us to understand our own intelligent behaviour. Humans have an interesting approach to problem solving based on abstract thought, high-level reasoning and pattern recognition.

AI helps us to understand this approach by recreating it, and it enhances us beyond our current capabilities.

Applications of Artificial Intelligence

1. Game playing: machines that can play certain games, e.g. chess, with great mastery, mainly through brute-force computation, which gives the ability to look at hundreds of thousands of positions per second.
2. Speech and character recognition: the ability of computers to recognize voices and writing.
3. Natural language processing: providing the computer with an understanding of the domain of a natural language.


4. Expert systems: systems that are able to make decisions and perform the work of professionals (human experts), e.g. diagnostic systems in hospitals.
5. Robotics: automation of tasks performed by a mechanical device through predefined programs.
6. Information predictors: e.g. in banks, insurance companies and market surveys, whereby intelligence tools are used to detect trends and predict, e.g., customer behaviour.
7. Computer vision and pattern recognition: computer processing of images from the real world and recognition of features present in images.

Challenges to the Achievement of Total Artificial Intelligence

Test of intelligence by Turing: a machine is said to be intelligent if it can successfully deceive a human being into believing that it is also a human being.

- Limitation of sensory organs (which assist humans to learn) in computers
- Intelligence is a complex idea
- Knowledge is vast
- Knowledge is of various domains:
  o Affective: feelings
  o Cognitive: analysis
  o Psychomotor: motion
  o Etc.
  Some domains are hard to store in a machine, e.g. affective and psychomotor
- Lack of a well understood model to represent reality and thus induce artificial intelligence
- It is expensive to acquire the tools for, develop and research artificial intelligence

Knowledge Representation

Knowledge is the symbolic representation of some named universe of discourse.

The universe of discourse may be actual activities, or fictional ones in the future or in some belief.

In AI systems, we may need to represent objects, events and performance or behaviour as

kinds of knowledge.


Knowledge representation is an area of artificial intelligence that is concerned with how to use a symbol system to represent a domain of discourse.

Its goal is to organize knowledge in a manner that facilitates drawing of conclusions.

Components of a Representation

A representation has four components:

i. A represented world: the domain that the representations are mapped from.
ii. A representing world: the domain that contains the representation.
iii. Representing rules: the set of rules that map elements of the represented world to those of the representing world.
iv. The representation system: the procedure for extracting information from a knowledge representation; its choice determines the ease or difficulty of finding the information.

Uses of a Representation

After representing the knowledge, we use it for:

i. Inference/reasoning: inferring facts from the existing data.
ii. Learning: acquiring knowledge, whereby new data has to be classified prior to storage for easy retrieval and has to interact with existing facts to avoid duplication.

Types of Knowledge

There are two main types of knowledge:

i. Declarative/descriptive/propositional knowledge: the factual information stored in memory, known to be static in nature. It is the part of knowledge that describes how things are. Its domain is defined by things, events or processes, their attributes and the relations between them.
ii. Procedural/imperative/know-how knowledge: the knowledge of how to perform a task or how to operate. It is mainly applied in problem solving.


Properties of Knowledge

Good representations of Knowledge

i. They make the important objects and relations explicit.
ii. They expose natural constraints, i.e. one can express the way one object or relation influences another.
iii. They bring objects and relations together.
iv. They suppress irrelevant detail.
v. They are transparent, i.e. the meaning can be understood clearly.
vi. They are complete, i.e. they consist of all that needs to be contained.
vii. They are concise, i.e. they communicate the information efficiently.
viii. They are fast, i.e. retrieval of information is fast.
ix. They are computable, i.e. they have been created based on a known procedure.

Properties of Good Knowledge Representation Systems

These characteristics can be summarised into the following four properties for knowledge

representation systems:

i. Representational adequacy: the ability to represent the required knowledge.
ii. Inferential efficiency: the ability to direct the inferential mechanisms into the most productive directions by storing appropriate guides.
iii. Inferential adequacy: the ability to manipulate the knowledge represented to produce new knowledge corresponding to that inferred from the original.
iv. Acquisitional efficiency: the ability to acquire new knowledge using automatic methods wherever possible rather than relying on human intervention.

Fundamental components of a knowledge representation system

i. Lexical component: the part that determines which symbols are allowed in a representation's vocabulary.
ii. Structural component: the part that describes constraints on how the symbols can be arranged.
iii. Procedural component: the part that specifies access procedures that enable one to create descriptions, modify them and answer questions using them.
iv. Semantic component: the part that establishes a way of associating meaning with the descriptions created from the procedural part.

Knowledge Representation Techniques

i. Logic representation

a. Propositional logic

This is logic at the sentence level where we consider sentences or statements that are

either true or false. If a proposition is true then it has a truth value of true and if it is

false then its truth value is false

Example

Proposition: Saturday is the last day of the week.

Non-proposition: Walk out.

Simple sentences which are true or false are basic propositions.

Larger and more complex sentences can be constructed from the basic propositions by

combining them with connectives.

Therefore, the basic elements of propositional logic are propositions and connectives.

Examples of connectives are:

- NOT ($\neg$)
- AND ($\land$)
- OR ($\lor$)
- IMPLY/IF-THEN ($\rightarrow$)
- IF-AND-ONLY-IF ($\leftrightarrow$)

Truth tables are used to map the relations of propositions when they are combined with connectives.

Let P and Q be propositions:

i. Not

P   NOT P
T   F
F   T

ii. And

P   Q   P AND Q
T   T   T
T   F   F
F   T   F
F   F   F

iii. Or

P   Q   P OR Q
T   T   T
T   F   T
F   T   T
F   F   F

iv. Imply

P   Q   P IMPLIES Q
T   T   T
T   F   F
F   T   T
F   F   T

v. If and only if

P   Q   P IF-AND-ONLY-IF Q
T   T   T
T   F   F
F   T   F
F   F   T
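The truth tables above can also be generated mechanically. The following is a minimal Python sketch; the names `CONNECTIVES` and `truth_table` are illustrative choices and do not come from these notes.

```python
from itertools import product

# Each binary connective maps two truth values to one truth value.
CONNECTIVES = {
    "AND": lambda p, q: p and q,
    "OR": lambda p, q: p or q,
    "IMPLY": lambda p, q: (not p) or q,
    "IFF": lambda p, q: p == q,
}


def truth_table(name):
    """Return rows (p, q, result) in the order TT, TF, FT, FF."""
    op = CONNECTIVES[name]
    return [(p, q, op(p, q)) for p, q in product([True, False], repeat=2)]


def show(value):
    """Render a truth value the way the tables in the notes do."""
    return "T" if value else "F"


for name in CONNECTIVES:
    print(name)
    for p, q, result in truth_table(name):
        print("  ", show(p), show(q), show(result))
```

Note that IMPLY is false only in the row where the first proposition is true and the second is false, which matches table iv.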

b. Predicate Logic

Propositional logic is not powerful enough to represent all types of assertions.


To cope with the deficiencies of propositional logic, we introduce predicates and quantifiers to form predicate logic.

A predicate is a verb phrase that describes properties of objects or the relationships among objects, e.g. in

The sky is blue;

"is blue" is the predicate, which can be written as B(x), meaning "x is blue".

Quantification is performed on formulas of predicate logic by using quantifiers on variables.

There are two types of quantifiers:

- the universal quantifier ($\forall$)
- the existential quantifier ($\exists$)

For example, $\forall x\, B(x)$ states that everything is blue, while $\exists x\, B(x)$ states that at least one thing is blue.

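Algorithm: Converting to Clause Form

1. Eliminate $\rightarrow$ using the fact that $a \rightarrow b$ is equivalent to $\neg a \lor b$.
2. Reduce the scope of each $\neg$ to a single term using the fact that $\neg(\neg p) = p$, De Morgan's laws (i.e. $\neg(a \land b) = \neg a \lor \neg b$ and $\neg(a \lor b) = \neg a \land \neg b$) and the standard correspondences between quantifiers ($\neg \forall x\, P(x) = \exists x\, \neg P(x)$ and $\neg \exists x\, P(x) = \forall x\, \neg P(x)$).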

Consider the following set of facts:

i. Maina was a man.
ii. Maina was Larilian.
iii. All Larilians were Nyandaruans.
iv. Mugo was a chief.
v. All Nyandaruans were either loyal to Mugo or hated him.
vi. Everyone is loyal to someone.
vii. People only try to stone chiefs they are not loyal to.
viii. Maina tried to stone Mugo.
ix. All men are people.

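The above facts expressed in predicate logic:

i. $man(Maina)$
ii. $Larilian(Maina)$
iii. $\forall x\, (Larilian(x) \rightarrow Nyandaruan(x))$
iv. $chief(Mugo)$
v. $\forall x\, (Nyandaruan(x) \rightarrow loyalto(x, Mugo) \lor hate(x, Mugo))$
vi. $\forall x\, \exists y\, loyalto(x, y)$
vii. $\forall x\, \forall y\, (person(x) \land chief(y) \land trystone(x, y) \rightarrow \neg loyalto(x, y))$
viii. $trystone(Maina, Mugo)$
ix. $\forall x\, (man(x) \rightarrow person(x))$

The above is a set of predicate formulas (well-formed formulas, wffs). Converting these statements to clause form leads to:

i. $man(Maina)$
ii. $Larilian(Maina)$
iii. $\neg Larilian(x_1) \lor Nyandaruan(x_1)$
iv. $chief(Mugo)$
v. $\neg Nyandaruan(x_2) \lor loyalto(x_2, Mugo) \lor hate(x_2, Mugo)$
vi. $loyalto(x_3, f(x_3))$
vii. $\neg person(x_4) \lor \neg chief(y_1) \lor \neg trystone(x_4, y_1) \lor \neg loyalto(x_4, y_1)$
viii. $trystone(Maina, Mugo)$
ix. $\neg man(x_5) \lor person(x_5)$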

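Proof of "did Maina hate Mugo?"

i. Express the question in predicate form: $hate(Maina, Mugo)$
ii. Negate the statement/predicate: $\neg hate(Maina, Mugo)$
iii. Look for relevant statements and resolve them together with the negated statement until the empty clause (NULL) is reached.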
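The proof procedure above (negate the question, then combine relevant clauses until NULL is reached) can be sketched in Python. This is only an illustration under stated assumptions: the predicate names (`man`, `Larilian`, `Nyandaruan`, `chief`, `loyalto`, `hate`, `person`, `trystone`) are labels chosen here for the facts about Maina and Mugo, the variables have been replaced by Maina and Mugo by hand so that no unification is needed, and the fact "everyone is loyal to someone" is left out because the proof does not use it.

```python
def negate(literal):
    """Return the complementary literal ('~p' <-> 'p')."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolvents(c1, c2):
    """Yield every clause obtained by resolving c1 with c2 on one literal."""
    for literal in c1:
        if negate(literal) in c2:
            yield frozenset((c1 - {literal}) | (c2 - {negate(literal)}))


def refutes(clauses):
    """Return True if the empty clause (NULL) can be derived by resolution."""
    known = set(clauses)
    while True:
        new = set()
        for a in known:
            for b in known:
                for r in resolvents(a, b):
                    if not r:  # empty clause: contradiction found
                        return True
                    new.add(r)
        if new <= known:  # nothing new can be derived: no refutation
            return False
        known |= new


# Ground (variable-free) versions of the clauses, written by hand.
KB = {
    frozenset({"man(Maina)"}),
    frozenset({"Larilian(Maina)"}),
    frozenset({"~Larilian(Maina)", "Nyandaruan(Maina)"}),
    frozenset({"chief(Mugo)"}),
    frozenset({"~Nyandaruan(Maina)", "loyalto(Maina,Mugo)", "hate(Maina,Mugo)"}),
    frozenset({"~person(Maina)", "~chief(Mugo)",
               "~trystone(Maina,Mugo)", "~loyalto(Maina,Mugo)"}),
    frozenset({"trystone(Maina,Mugo)"}),
    frozenset({"~man(Maina)", "person(Maina)"}),
}
NEGATED_GOAL = frozenset({"~hate(Maina,Mugo)"})

# Adding the negated question makes the clause set contradictory,
# so the question "did Maina hate Mugo?" is answered yes.
print(refutes(KB | {NEGATED_GOAL}))
```

Without the negated goal the clause set is consistent, so `refutes(KB)` finds no contradiction.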


ii. Rules

Rules are commonly used to represent knowledge in an inference system. The rules are

usually in the form of production rules (if-then rules).

They are used to show relationships among variables and derive actions from input to

an inference engine.

[Figure: resolution proof tree for the question above; successive resolution steps end in the empty clause, NULL.]


Each rule consists of an antecedent (the if part) and a consequent (the then part).

Interpreting an if-then rule involves two distinct parts:

- Evaluating the antecedent.
- Applying the result to the consequent.

In the case of a binary (2-valued) logic, if the premise is true, then the conclusion is true.

In the case of a multivalued logic, if the antecedent is true to some degree, then the consequent is also true to that same degree.

Binary logic takes the values 0 or 1, whereas multivalued logic takes values in the range from 0 to 1, e.g. 0.5.

The antecedent of a rule can have multiple parts, e.g. "if the sky is grey and the wind is blowing then it will rain". In such a case, all the parts of the antecedent are evaluated simultaneously and resolved into a single number/part using logical operators.
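As a small illustration of resolving a multi-part antecedent into a single number, the sketch below models AND as the minimum of the parts. That choice is an assumption made here (the notes do not say which operator to use); it reduces to ordinary AND when every part is 0 or 1. The function names are hypothetical.

```python
def evaluate_antecedent(degrees):
    """Combine the parts of an AND-ed antecedent into one number.

    Assumption: AND is modelled as the minimum of the parts.
    """
    return min(degrees)


def apply_rule(degrees):
    """The consequent is true to the same degree as the antecedent."""
    return evaluate_antecedent(degrees)


# Binary logic: the sky is grey (1) and the wind is blowing (1).
print(apply_rule([1, 1]))      # it will rain: 1

# Multivalued logic: the sky is fairly grey (0.7), the wind is moderate (0.5).
print(apply_rule([0.7, 0.5]))  # it will rain to degree 0.5
```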

iii. Natural Language

Natural language is the human spoken language. It is the most expressive knowledge representation formalism, since everything that can be expressed symbolically can also be expressed in natural language. Reasoning with it is very complex, however, and it is hard to model.

Problems with natural Language

i. It is often ambiguous
ii. There is little uniformity in the structure of sentences
iii. Syntax and semantics are not fully understood

iv. Database systems

Database systems are logical organizations of data in a form that is meaningful to the user and facilitates easy retrieval. Database systems are well suited to efficiently representing and processing large amounts of data. However, only simple aspects of some universe of discourse can be represented; hence reasoning is very simple and limited.

v. Semantic Networks

Semantic networks are capable of representing individual objects, categories of objects and relations among objects. For example:


Mary is a sister of John.

Mary and John are members of persons.

Persons have two legs.

Semantic nets make it easy to perform inheritance reasoning. They are simple and efficient compared to logic.
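Inheritance reasoning in a semantic network can be sketched with a few dictionaries. The sketch below is illustrative only: it uses the nodes and link labels of the Mary/John example (Subset of, Member of, Legs), while the dictionary layout and the function name `inherited` are assumptions made for this example.

```python
# "Subset of" links between categories.
SUBSET_OF = {
    "Persons": "Mammals",
    "Female persons": "Persons",
    "Male persons": "Persons",
}

# "Member of" links from individuals to categories.
MEMBER_OF = {"Mary": "Female persons", "John": "Male persons"}

# Properties stored directly on a node.
PROPERTIES = {"Persons": {"legs": 2}}


def inherited(name, prop):
    """Walk up Member of / Subset of links until some node defines prop."""
    node = name
    while node is not None:
        if prop in PROPERTIES.get(node, {}):
            return PROPERTIES[node][prop]
        node = MEMBER_OF.get(node) or SUBSET_OF.get(node)
    return None


# Mary has no "legs" entry of her own; she inherits it from Persons.
print(inherited("Mary", "legs"))
```

The value is stored once, on Persons, and every member of every subset picks it up by following links, which is what makes this kind of reasoning cheap.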

vi. Frames (to read)

A frame is an AI data structure used to divide knowledge into sub-structures by representing stereotyped situations.

Frames are connected together to form a complete idea.

Heuristic Search

Heuristic search uses problem-specific knowledge beyond the definition of the problem itself. It is also known as informed search. It can thus arrive at solutions more efficiently than uninformed/blind search strategies.

[Figure: semantic network with the nodes Mammals, Persons, Female persons, Male persons, Mary and John, connected by the links Subset of, Member of, Sister of, Has Mother and Legs.]


Well Defined Problems and Solutions

A problem is defined formally by four components:

i. Initial/starting state: the starting point for solving any problem.
ii. Successor function: a description of the possible actions available. The initial state and the successor function together define the set of all states reachable from the initial state.
iii. Goal test: determines whether a given state is a goal state, e.g. in the game of chess, the goal is to reach a state called checkmate, where the opponent's king is under attack and cannot escape.
iv. Path cost: a function that assigns a numeric cost to each path. A problem-solving agent chooses a cost function that reflects its own performance measure.

These components define a problem and can be put together into a single structure that is given as input to problem-solving algorithms.

A solution to a problem is a path from the initial state to the goal state. The quality of a solution is measured by the path cost, whereby an optimal solution has the lowest path cost among all solutions.

Exercise: formally formulate the problem of the eight queens on a chess board.

The output of a problem-solving algorithm is either failure or a solution. Some algorithms may get stuck in an infinite loop and never return an output.

The performance of an algorithm is evaluated using four measures:

- Completeness: is the algorithm guaranteed to find a solution when there is one?
- Optimality: does the algorithm find the optimal solution?
- Time complexity: how long does the algorithm take to find a solution?
- Space complexity: how much memory is needed to perform the algorithm?

Uninformed/Blind Search Strategies

Blind search means that the search has no additional information about states beyond that

provided in the problem definition.

Breadth First Search (BFS)

This is a simple strategy in which the root node is expanded first, then all the successors of the root node are expanded next, then their successors and so on. All the nodes at a given depth in the search tree are expanded before any nodes at the next level are expanded.

BFS can be implemented using a FIFO queue ensuring that the nodes that are visited first

will be expanded first.
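The FIFO-queue implementation can be sketched as follows. This is a minimal illustration rather than a definitive implementation; the graph `GRAPH` and its node names are made up for the example.

```python
from collections import deque

# Hypothetical search graph: each node maps to its list of successors.
GRAPH = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["G"],
    "D": ["G"],
    "G": [],
}


def bfs(start, goal, graph):
    """Return a path with the fewest steps from start to goal, or None."""
    frontier = deque([[start]])  # FIFO queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()  # oldest (shallowest) path first
        node = path[-1]
        if node == goal:
            return path
        for successor in graph[node]:
            if successor not in visited:
                visited.add(successor)
                frontier.append(path + [successor])
    return None


print(bfs("A", "G", GRAPH))  # ['A', 'C', 'G']
```

Because every node at one level leaves the queue before any node of the next level, the first path that reaches the goal is one with the fewest steps.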

Evaluation of BFS Algorithm

i. Completeness
BFS is complete: if the goal node is at finite depth d, BFS will eventually find it after expanding all shallower nodes.

ii. Optimality
The shallowest goal node is not necessarily the optimal one; hence BFS is optimal only if the path cost is a non-decreasing function of the depth of the node, e.g. when all the actions/moves have the same cost.

iii. Time complexity
Consider a state space where every state has b successors. The root of the search tree generates b nodes at level 1, each of which generates b more nodes, giving $b^2$ at level 2, $b^3$ at level 3, and so on. If the solution is at level d, then in the worst case we would expand all but the last node at level d. This results in an exponential number of generated nodes:

$$b + b^2 + b^3 + \dots + b^d + (b^{d+1} - b) = O(b^{d+1})$$

Time requirements are a major constraint in BFS.

iv. Space complexity
Every node that is generated must remain in memory; hence space complexity also grows exponentially. BFS places a very high demand on memory.

Depth First Search (DFS)

The search proceeds to the deepest level of the search tree where the nodes have no

successors. As the nodes are expanded, they are dropped off and the search backs up to

the next shallowest node that still has unexplored successors.

DFS can be implemented using stacks or LIFO queues.
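A stack-based version can be sketched as follows. As before, this is only an illustration; the graph `GRAPH` and its node names are made up for the example.

```python
# Hypothetical search graph: each node maps to its list of successors.
GRAPH = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["G"],
    "D": ["G"],
    "G": [],
}


def dfs(start, goal, graph):
    """Return the first path to goal found by going as deep as possible."""
    stack = [[start]]  # LIFO stack of paths
    visited = set()
    while stack:
        path = stack.pop()  # most recently added (deepest) path first
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        # Push in reverse so the first-listed successor is explored first.
        for successor in reversed(graph[node]):
            if successor not in visited:
                stack.append(path + [successor])
    return None


print(dfs("A", "G", GRAPH))  # ['A', 'B', 'D', 'G']
```

On this graph DFS follows A, B, D all the way down and returns a three-step path even though a two-step path through C exists, which illustrates why DFS is not optimal.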

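Comparison: BFS is complete, whereas DFS can fail to find a solution when the search tree has infinitely deep paths; neither is optimal in general, although BFS is optimal when all actions have the same cost; DFS needs far less memory than BFS, since it only stores the current path and the unexpanded siblings along it.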


Heuristic Searches

A key component of a heuristic search is the heuristic function, denoted $h(n)$. $h(n)$ is the estimated cost of the cheapest path from node n to a goal node. If n is the goal node, then $h(n) = 0$.

Greedy Best First Search (GBFS)

GBFS tries to expand the node that is closest to the goal on the grounds that it is likely to lead to a solution quickly. It evaluates nodes using only the heuristic function:

$$f(n) = h(n)$$

It resembles DFS in the way it prefers to follow a single path all the way to the end but backs up when it hits a dead end.

Just like DFS, it is neither optimal nor complete.

A* Search

It evaluates nodes by combining $g(n)$, the cost to reach the node, and $h(n)$, the estimated cost to get from the node to the goal:

$$f(n) = g(n) + h(n)$$

Since $g(n)$ gives the path cost from the start node to node n, $f(n)$ is the estimated cost of the cheapest solution through node n. Therefore, in trying to find the cheapest solution, we try the node with the lowest value of $g(n) + h(n)$ first. A* is both complete and optimal, provided that $h(n)$ never overestimates the true cost to reach the goal.

[Figure: example search graph; the numeric labels 193, 380, 172, 374, 329, 253, 366 and 0 survive from it.]

From the above algorithm, the route would be ABEH and $f(n) =$ [value missing].
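The evaluation $f(n) = g(n) + h(n)$ can be sketched with a priority queue. The graph, step costs and heuristic values below are invented for illustration (they are not the ones from the figure in these notes), and the heuristic was chosen so that it never overestimates the true remaining cost.

```python
import heapq

# Hypothetical graph: node -> {successor: step cost}.
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {"G": 3},
    "G": {},
}

# Hypothetical heuristic h(n): estimated cost from n to the goal G.
H = {"A": 6, "B": 5, "C": 3, "D": 2, "G": 0}


def a_star(start, goal, graph, h):
    """Return (path, cost) of the cheapest path found, or (None, inf)."""
    frontier = [(h[start], 0, start, [start])]  # entries are (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)  # lowest f = g + h first
        if node == goal:
            return path, g
        for successor, cost in graph[node].items():
            new_g = g + cost
            if new_g < best_g.get(successor, float("inf")):
                best_g[successor] = new_g
                heapq.heappush(
                    frontier,
                    (new_g + h[successor], new_g, successor, path + [successor]),
                )
    return None, float("inf")


print(a_star("A", "G", GRAPH, H))  # (['A', 'B', 'C', 'D', 'G'], 7)
```

Replacing the priority `new_g + h[successor]` with `h[successor]` alone would turn this sketch into greedy best first search, which ignores the cost already paid.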

Learning

Forms of Learning

The field of machine learning distinguishes three forms of learning:

a. Supervised learning
b. Unsupervised learning
c. Reinforcement learning

The type of feedback available is usually the most important factor in determining the nature of the learning the agent undertakes.

1. Supervised Learning
It involves learning a function from examples of its inputs and outputs, e.g. learning multiplication tables.

[Figure annotation: to reach goal node K, we go through ABCIK (the cheapest route).]


The correct output values are first provided after which a learning agent can get the

correct output from its perceived knowledge.

For fully observable environments, an agent can observe the effects of its actions and

hence can use supervised learning methods to learn to predict them.

2. Unsupervised Learning
It involves learning patterns in the input when no specific output values are supplied.

A purely unsupervised learning agent cannot learn what to do because it has no information as to what constitutes a correct action or desirable state.

An example is conducting research.

3. Reinforcement Learning
Rather than being told what to do, the agent learns from reinforcement, i.e. a reward or its absence. It teaches behavioural skills (e.g. potty training, promoting hard-working employees, etc.).

[Figures: in supervised learning the learning function receives both inputs and outputs; in unsupervised learning it receives inputs only.]

The design of a learning element is affected by three major concerns:

i. Which components of the performance element are to be learned?
ii. What feedback is available to learn these components?
iii. What representation is used for the components?

The components of a performance element in learning may include the following:

i. A direct mapping from conditions of the current state to actions.
ii. A means to infer relevant properties of the world being learned.
iii. Information about the way the world being learned responds and the results of possible actions.
iv. Information indicating the desirable states and actions.

Learning by Decision Trees

A decision tree takes as input an object or a situation described by a set of attributes and

returns a decision which is a predicted output value for the given input. The input values

can be discrete or continuous and so are the outputs.

Learning a discrete-valued function is called classification, while learning a continuous function is called regression.

A decision tree reaches its decision by performing a sequence of tests.

Each internal node corresponds to a test of the value of one of the properties, and the branches are labelled with the possible values of the test.

Each leaf node specifies the value to be returned if that leaf is reached.

Example:

Suppose you model a simple problem of whether to wait for a table at a restaurant or not.

The following is a list of applicable attributes:

i. Alternate: whether there is a suitable alternative restaurant nearby.
ii. Bar: whether the restaurant has a comfortable bar area to wait in.
iii. Friday/Saturday: true on Fridays and Saturdays.
iv. Hungry: whether we are hungry or not.
v. Patrons: how many people are in the restaurant (none/some/full).
vi. Price: the restaurant's price range.
vii. Raining: whether it is raining outside or not.
viii. Reservation: whether we have made a reservation or not.
ix. Type: the kind of restaurant (e.g. Italian/French).
x. WaitEstimate: the wait estimated by the host (0-10 mins, 10-30 mins, 30-60 mins, >60 mins).

Example  Alt  Bar  Fri/Sat  Hungry  Patrons  Price  Rain  Rsvn  Type     Est (mins)  Goal (will wait)
x_1      Yes  No   No       Yes     Some     ...    No    Yes   French   0-10        Yes
x_2      Yes  No   No       Yes     Full     ...    No    No    Thai     30-60       No
x_3      No   Yes  No       No      Some     ...    No    No    Burger   0-10        Yes
x_4      Yes  No   Yes      Yes     Full     ...    Yes   No    Thai     10-30       Yes
x_5      Yes  No   Yes      No      Full     ...    No    Yes   French   >60         No
x_6      No   Yes  No       Yes     Some     ...    Yes   Yes   Italian  0-10        Yes
x_7      No   Yes  No       No      None     ...    Yes   No    Burger   0-10        No
x_8      No   No   No       Yes     Some     ...    Yes   Yes   Thai     0-10        Yes
x_9      No   Yes  Yes      No      Full     ...    Yes   No    Burger   >60         No
x_10     Yes  Yes  Yes      Yes     Full     ...    No    Yes   Italian  10-30       No
x_11     No   No   No       No      None     ...    No    No    Thai     0-10        No
x_12     Yes  Yes  Yes      Yes     Full     ...    No    No    Burger   30-60       Yes

The restaurant scenario is an example of a Boolean decision tree, which consists of a vector of input attributes, X, and a single Boolean output, Y. A set of examples (x_1, y_1), ..., (x_12, y_12) is shown above.

Decision trees are fully expressive in the class of propositional languages (dealing with one variable) since any Boolean function can be written as a decision tree.

Positive examples are the ones in which the goal "will wait" is true, e.g. x_1, x_3, x_4, ..., while the negative examples are the ones in which it is false.

The complete set of examples is called the training set. The idea behind the decision tree learning algorithm is to test the most important attribute first, i.e. the attribute that makes the most difference to the classification of the training examples. This aims to reach the correct classification with a small number of tests, implying that all paths in the tree will be short and the tree as a whole will be small, e.g. starting with Patrons then Hungry as opposed to starting with Type.


Testing both:

Hungry? (on examples 2, 4, 5, 9, 10, 12)
- yes: 2, 4, 10, 12
- no: 5, 9

Type?
- French: 1, 5
- Italian: 6, 10
- Thai: 2, 4, 8, 11
- Burger: 3, 7, 9, 12


Type is a poor attribute because it leaves us with four outcomes, each with the same number of positive and negative examples.

Patrons is a fairly important attribute; if the value is none or some then we are left with example sets which we can answer definitively.
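The difference between the two attributes can be quantified with the information gain measure. The sketch below is a rough illustration: the example groups are the Patrons and Type splits listed in these notes, while the set of positive examples (`POSITIVE`) is an assumption chosen to agree with those splits (none is all negative, some is all positive, and every Type value has equally many positive and negative examples). Entropy in bits is one common way to compute the gain.

```python
from math import log2

# Assumed positive ("will wait") examples, consistent with the splits below.
POSITIVE = {1, 3, 4, 6, 8, 12}

# Example numbers grouped by attribute value.
SPLITS = {
    "Patrons": {
        "none": [7, 11],
        "some": [1, 3, 6, 8],
        "full": [2, 4, 5, 9, 10, 12],
    },
    "Type": {
        "french": [1, 5],
        "italian": [6, 10],
        "thai": [2, 4, 8, 11],
        "burger": [3, 7, 9, 12],
    },
}


def entropy(examples):
    """Entropy (in bits) of the yes/no labels of a list of examples."""
    if not examples:
        return 0.0
    p = sum(1 for e in examples if e in POSITIVE) / len(examples)
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(split):
    """Entropy before the split minus the weighted entropy after it."""
    everything = [e for group in split.values() for e in group]
    remainder = sum(
        len(group) / len(everything) * entropy(group) for group in split.values()
    )
    return entropy(everything) - remainder


for attribute, split in SPLITS.items():
    print(attribute, round(information_gain(split), 3))
```

Under these assumptions Type has a gain of zero, because every branch is still half positive and half negative, while Patrons has a gain of roughly 0.54 bits, which is why it is tested first.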

Considerations for the Recursive Algorithm include:

i. If there are some positive and some negative examples, choose the best attribute to split them.
ii. If all remaining examples are positive or all are negative, then we can answer yes or no/true or false.
iii. If there are no examples left, it means that no such example has been observed, and we return a default value calculated from the majority at the node's parent.
iv. If there are no attributes left but both positive and negative examples remain, then there is a problem. It means that the examples have the same descriptions but different classifications, as a result of incorrect data or because the attributes do not give enough information to describe the situation fully.

Patrons?
- none: 7, 11: negative (no)
- some: 1, 3, 6, 8: positive (yes)
- full: 2, 4, 5, 9, 10, 12: both yes and no examples remain


The Decision Tree Learning Algorithm

Assessing the Performance of a Learning Algorithm

A learning algorithm is good if it produces hypotheses that do a good job of predicting the classification of unseen examples.

A prediction is good if it turns out to be true; hence we can assess the quality of a hypothesis by checking its predictions against the correct classification once we know it.

This is done on a set of examples known as a test set. If we train on all our available examples, we have to go out and collect more examples to test on. We therefore usually apply the following methodology:

a. Collect a large set of examples.
b. Divide it into two disjoint sets: the training set and the test set.
c. Apply the learning algorithm to the training set, generating a hypothesis, h.
d. Measure the percentage of examples in the test set that are correctly classified by h.
e. Repeat steps a-d for different sizes of training sets and different randomly selected training sets of each size.

This results in a set of data that can be processed to give the average prediction quality as a function of the size of the training set.

When plotted on a graph, it gives the learning curve for the algorithm on a particular domain.


As the training set grows, prediction quality increases.

NB: the learning algorithm must not be allowed to see the test data before the learned

hypothesis is tested on them.
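The assessment methodology can be sketched in a few lines. Everything in this sketch is a stand-in chosen for illustration: the toy data set `EXAMPLES`, the trivial majority-vote learner `learn_majority`, and the function names. The point is only the procedure: disjoint training and test sets, repeated over several sizes and random selections, with the learner never seeing the test data.

```python
import random

# Toy data set: (input, label) pairs; the label is True for multiples of 3.
EXAMPLES = [(i, i % 3 == 0) for i in range(30)]


def split(examples, train_size, rng):
    """Randomly divide examples into disjoint training and test sets."""
    shuffled = examples[:]
    rng.shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]


def learn_majority(training):
    """Stand-in learner: always predict the most common training label."""
    labels = [label for _, label in training]
    guess = max(set(labels), key=labels.count)
    return lambda x: guess


def accuracy(hypothesis, test):
    """Proportion of test examples that the hypothesis classifies correctly."""
    return sum(hypothesis(x) == label for x, label in test) / len(test)


def learning_curve(examples, sizes, trials, learner, seed=0):
    """Average test accuracy for each training set size."""
    rng = random.Random(seed)
    curve = {}
    for size in sizes:
        scores = []
        for _ in range(trials):
            training, test = split(examples, size, rng)
            scores.append(accuracy(learner(training), test))
        curve[size] = sum(scores) / trials
    return curve


for size, score in learning_curve(EXAMPLES, [5, 10, 20], 20, learn_majority).items():
    print(size, round(score, 2))
```

Plotting the averaged scores against the training set sizes gives the learning curve; a real learner such as a decision tree would replace `learn_majority`.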

D.I.Y.: research the terms noise and overfitting with regard to the problems they cause when using decision trees for learning, and how to minimize them.

In order to extend decision trees to a wider variety of problems, the following issues must

be addressed:

- Missing data: in many domains, not all attribute values will be known for every example. The values might not have been recorded, or might be too expensive to obtain.
- Multivalued attributes: when an attribute has many possible values, the information gain measure gives an inappropriate indication of the attribute's usefulness.
- Continuous and integer-valued input attributes: they have an infinite set of possible values that would generate infinitely many branches. Typically, we find the split point that gives the highest information gain.

[Figure: learning curve. Horizontal axis: training set size (ticks from 0 to 80); vertical axis: proportion (up to 1).]
