ics2308 artificial intelligence notes.pdf

Upload: tim-njagi

Post on 14-Apr-2018


  • 7/27/2019 ICS2308 Artificial Intelligence Notes.pdf


    ICS2308 ARTIFICIAL

    INTELLIGENCE

    Course Outline

    1. Introduction to artificial intelligence
    2. Knowledge representation
    3. Heuristic search
    4. Natural language processing
    5. Symbolic machine learning
    6. Connectionism and evolutionary computation

    References

    Introduction to Artificial Intelligence

    Intelligence is the capacity to learn; learning means acquisition and application of

    knowledge.

    Artificial intelligence is an area of computer science that focuses on creating machines that can engage in behaviour that humans consider intelligent.

    AI is a combination of computer science, physiology and philosophy.

    History of Artificial Intelligence

    The invention of computers in 1941 made available the technology needed to create machine intelligence. However, it was not until the 1950s that the link between human intelligence and machines was really observed.

    Norbert Wiener researched feedback theory and established that intelligent behaviour was the result of feedback mechanisms, which could possibly be simulated by machines.


    In 1955, a program was developed representing each problem as a tree model. The program

    would attempt to solve it by selecting the branch that would most likely result in the

    correct conclusion.

    In 1956, the term AI was coined at the Dartmouth Conference and since then, research has continued into developing programs or applications that could efficiently solve problems and learn by themselves.

    Several applications have been developed, e.g. missile systems, voice and character recognition, engineering controllers, etc.

    Motivation towards Artificial Intelligence

    Computers are generally well suited to performing mechanical computations using fixed programmed rules.

    This allows them to perform simple, monotonous tasks efficiently and reliably, tasks which human beings are ill suited to.

    For more complex problems, computers, unlike humans, have trouble understanding specific situations and adapting to new ones.

    Artificial intelligence aims to improve machine behaviour in tackling such complex tasks.

    AI also allows us to understand our own intelligent behaviour. Humans have an interesting approach to problem solving based on abstract thought, high-level reasoning and pattern recognition.

    AI helps us to understand this approach by recreating it, and enhances us beyond our current capabilities.

    Applications of Artificial Intelligence

    1. Game playing: machines that can play certain games, e.g. chess, with great mastery, mainly through brute-force computation which gives the ability to look at hundreds of thousands of positions per second.
    2. Speech and character recognition: the ability of computers to recognize voices and writing.
    3. Natural language processing: providing the computer with an understanding of the domain of a natural language.


    4. Expert systems: systems that are able to make decisions and perform the work of professionals (human experts), e.g. diagnostic systems in hospitals.
    5. Robotics: automation of tasks performed by a mechanical device through predefined programs.
    6. Information predictors: e.g. in banks, insurance companies and market surveys, whereby intelligence tools are used to detect trends and predict, e.g., customer behaviour.
    7. Computer vision and pattern recognition: computer processing of images from the real world and recognition of the features present in those images.

    Challenges to the Achievement of Total Artificial Intelligence

    Test of intelligence by Turing: a machine is said to be intelligent if it can successfully deceive a human being into believing that it is a human being as well.

    - Limitation of sensory organs (which assist humans to learn) in computers
    - Intelligence is a complex idea
    - Knowledge is vast
    - Knowledge is of various domains:
        o Affective: feelings
        o Cognitive: analysis
        o Psychomotor: motion
        o Etc.
      Some domains are hard to store in a machine, e.g. affective, psychomotor.
    - Lack of a well-understood model to represent reality and thus induce artificial intelligence
    - It is expensive to acquire tools for, develop and research artificial intelligence

    Knowledge Representation

    Knowledge is the symbolic representation of some named universe of discourse.

    The universe of discourse may consist of actual activities, or of fictional ones set in the future or in some belief.

    In AI systems, we may need to represent objects, events and performance or behaviour as

    kinds of knowledge.


    Knowledge representation is an area of artificial intelligence that is concerned with how to use a symbol system to represent a domain of discourse.

    Its goal is to organize knowledge in a manner that facilitates drawing of conclusions.

    Components of a Representation

    A representation has four components:

    i. A represented world: the domain that the representations are mapped from.
    ii. A representing world: the domain that contains the representation.
    iii. Representing rules: the set of rules that map elements of the represented world to those of the representing world.
    iv. The representation system: the procedure for extracting information from a knowledge representation; its choice determines the ease or difficulty of finding the information.

    Uses of a Representation

    After representing the knowledge, we use it for:

    i. Inference/reasoning: inferring facts from the existing data.
    ii. Learning: acquiring knowledge, which means data has to be classified prior to storage for easy retrieval and has to interact with existing facts to avoid duplication.

    Types of Knowledge

    There are two main types of knowledge:

    i. Declarative/descriptive/propositional knowledge: the factual information stored in memory, known to be static in nature.
    It is the part of knowledge that describes how things are.
    Its domain is defined by things, events or processes, their attributes and the relations between them.

    ii. Procedural/imperative/know-how knowledge: the knowledge of how to perform a task or how to operate. It is mainly applied in problem solving.


    Properties of Knowledge

    Good representations of Knowledge

    i. They make the important objects and relations explicit.
    ii. They expose natural constraints, i.e. one can express the way one object or relation influences another.
    iii. They bring objects and relations together.
    iv. They suppress irrelevant detail.
    v. They are transparent, i.e. the meaning can be understood clearly.
    vi. They are complete, i.e. they contain all that needs to be contained.
    vii. They are concise, i.e. they communicate the information efficiently.
    viii. They are fast, i.e. retrieval of information is fast.
    ix. They are computable, i.e. they have been created based on a known procedure.

    Properties of Good Knowledge Representation Systems

    These characteristics can be summarised into the following four properties for knowledge

    representation systems:

    i. Representational adequacy: the ability to represent the required knowledge.
    ii. Inferential efficiency: the ability to direct the inferential mechanisms into the most productive directions by storing appropriate guides.
    iii. Inferential adequacy: the ability to manipulate the knowledge represented to produce new knowledge corresponding to that inferred from the original.
    iv. Acquisitional efficiency: the ability to acquire new knowledge using automatic methods wherever possible rather than relying on human intervention.

    Fundamental components of a knowledge representation system

    i. Lexical component: the part that determines which symbols are allowed in a representation's vocabulary.
    ii. Structural component: the part that describes constraints on how the symbols can be arranged.


    iii. Procedural component: the part that specifies access procedures that enable one to create descriptions, modify them and answer questions using them.
    iv. Semantic component: the part that establishes a way of associating meaning with the descriptions created by the procedural part.

    Knowledge Representation Techniques

    i. Logic representation

    a. Propositional logic

    This is logic at the sentence level, where we consider sentences or statements that are either true or false. If a proposition is true then it has a truth value of true, and if it is false then its truth value is false.

    Example

    Proposition: Saturday is the last day of the week.

    Non-proposition: Walk out.

    Simple sentences which are true or false are basic propositions.

    Larger and more complex sentences can be constructed from the basic propositions by

    combining them with connectives.

    Therefore, the basic elements of propositional logic are propositions and connectives.

    Examples of connectives are:

    NOT ($\neg$), AND ($\wedge$), OR ($\vee$), IMPLY/IF-THEN ($\rightarrow$), IF-AND-ONLY-IF ($\leftrightarrow$)

    Truth tables are used to map the relations of propositions when they are combined with

    connectives.

    Let $P$ and $Q$ be propositions:

    i. Not ($\neg P$)

        P   NOT P
        T   F
        F   T

    ii. And ($P \wedge Q$)

        P   Q   P AND Q
        T   T   T
        T   F   F
        F   T   F
        F   F   F

    iii. Or ($P \vee Q$)

        P   Q   P OR Q
        T   T   T
        T   F   T
        F   T   T
        F   F   F

    iv. Imply ($P \rightarrow Q$)

        P   Q   P IMPLIES Q
        T   T   T
        T   F   F
        F   T   T
        F   F   T

    v. If and only if ($P \leftrightarrow Q$)

        P   Q   P IFF Q
        T   T   T
        T   F   F
        F   T   F
        F   F   T
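    The truth tables above can be regenerated mechanically. The following is a minimal sketch in Python; the names `CONNECTIVES` and `truth_table` are illustrative choices, not part of the notes.

```python
from itertools import product

# Each binary connective maps two truth values to one truth value.
CONNECTIVES = {
    "P AND Q": lambda p, q: p and q,
    "P OR Q": lambda p, q: p or q,
    "P IMPLIES Q": lambda p, q: (not p) or q,   # false only when P is true and Q is false
    "P IFF Q": lambda p, q: p == q,
}

def truth_table(connective):
    """Return rows (P, Q, result) in the order TT, TF, FT, FF."""
    return [(p, q, connective(p, q)) for p, q in product([True, False], repeat=2)]

if __name__ == "__main__":
    for name, fn in CONNECTIVES.items():
        print(name)
        for p, q, r in truth_table(fn):
            # "TF"[False] is "T" and "TF"[True] is "F", hence the negation.
            print("  ", "TF"[not p], "TF"[not q], "TF"[not r])
```

    Writing implication as `(not p) or q` mirrors the rule used later when converting formulas to clause form.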

    b. Predicate Logic

    Propositional logic is not powerful enough to represent all types of assertions.


    To cope with the deficiencies of propositional logic, we introduce predicates and

    quantifiers to form predicate logic.

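    A predicate is a verb phrase that describes properties of objects or the relationships among objects, e.g.

    The sky is blue.
    "is blue" is the predicate.
    $B(x)$ stands for "x is blue", so $B(sky)$ states that the sky is blue.

    Quantification is performed on formulas of predicate logic by using quantifiers on variables.

    There are two types of quantifiers:

    - the universal quantifier ($\forall$), read "for all"
    - the existential quantifier ($\exists$), read "there exists"

    For example, "all men are people" is written $\forall x\; man(x) \rightarrow person(x)$.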

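    Algorithm: Converting to Clause Form

    1. Eliminate $\rightarrow$ using the fact that $a \rightarrow b$ is equivalent to $\neg a \vee b$.
    2. Reduce the scope of each $\neg$ to a single term using the fact that $\neg(\neg p) = p$, De Morgan's laws (i.e. $\neg(a \wedge b) = \neg a \vee \neg b$ and $\neg(a \vee b) = \neg a \wedge \neg b$) and the standard correspondences between quantifiers ($\neg \forall x\, P(x) = \exists x\, \neg P(x)$ and $\neg \exists x\, P(x) = \forall x\, \neg P(x)$).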

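    Consider the following set of facts:

    i. Maina was a man.
    ii. Maina was Larilian.
    iii. All Larilians were Nyandaruans.
    iv. Mugo was a chief.
    v. All Nyandaruans were either loyal to Mugo or hated him.
    vi. Everyone is loyal to someone.
    vii. People only try to stone chiefs they are not loyal to.
    viii. Maina tried to stone Mugo.
    ix. All men are people.

    The facts above are stated in natural language. In predicate logic they become:

    i. $man(Maina)$
    ii. $Larilian(Maina)$
    iii. $\forall x\; Larilian(x) \rightarrow Nyandaruan(x)$
    iv. $chief(Mugo)$
    v. $\forall x\; Nyandaruan(x) \rightarrow loyalto(x, Mugo) \vee hate(x, Mugo)$
    vi. $\forall x\, \exists y\; loyalto(x, y)$
    vii. $\forall x\, \forall y\; person(x) \wedge chief(y) \wedge trystone(x, y) \rightarrow \neg loyalto(x, y)$
    viii. $trystone(Maina, Mugo)$
    ix. $\forall x\; man(x) \rightarrow person(x)$

    The above is a set of predicate-logic formulas. Converting these well-formed formulas (wffs) to clause form leads to:

    i. $man(Maina)$
    ii. $Larilian(Maina)$
    iii. $\neg Larilian(x_1) \vee Nyandaruan(x_1)$
    iv. $chief(Mugo)$
    v. $\neg Nyandaruan(x_2) \vee loyalto(x_2, Mugo) \vee hate(x_2, Mugo)$
    vi. $loyalto(x_3, f(x_3))$
    vii. $\neg person(x_4) \vee \neg chief(y_1) \vee \neg trystone(x_4, y_1) \vee \neg loyalto(x_4, y_1)$
    viii. $trystone(Maina, Mugo)$
    ix. $\neg man(x_5) \vee person(x_5)$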

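    Proof of "did Maina hate Mugo?"

    i. Express the question in predicate form: $hate(Maina, Mugo)$
    ii. Negate the statement/predicate: $\neg hate(Maina, Mugo)$
    iii. Look for relevant statements and resolve them together, one pair at a time, until a contradiction (the empty clause) is reached.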
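    The refutation can be checked mechanically. The sketch below is a simplified illustration, not the procedure from the notes: it assumes the clauses have already been instantiated by hand with x = Maina and y = Mugo (so no unification is needed), writes negation as a leading `~`, and leaves out fact vi because the proof does not use it. The function names are hypothetical.

```python
from itertools import combinations

def negate(lit):
    """Negate a literal written as 'p' or '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return every resolvent of two clauses (frozensets of literals)."""
    return [(c1 - {lit}) | (c2 - {negate(lit)}) for lit in c1 if negate(lit) in c2]

def refutes(clauses):
    """Saturate the clause set; True if the empty clause (NULL) is derived."""
    known = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for resolvent in resolve(c1, c2):
                if not resolvent:            # empty clause: contradiction found
                    return True
                new.add(resolvent)
        if new <= known:                     # nothing new can be derived
            return False
        known |= new

# Ground instances of the clauses (assumption: instantiated by hand).
KB = [
    {"man(Maina)"},
    {"Larilian(Maina)"},
    {"~Larilian(Maina)", "Nyandaruan(Maina)"},
    {"chief(Mugo)"},
    {"~Nyandaruan(Maina)", "loyalto(Maina,Mugo)", "hate(Maina,Mugo)"},
    {"~person(Maina)", "~chief(Mugo)", "~trystone(Maina,Mugo)", "~loyalto(Maina,Mugo)"},
    {"trystone(Maina,Mugo)"},
    {"~man(Maina)", "person(Maina)"},
]
NEGATED_GOAL = {"~hate(Maina,Mugo)"}

if __name__ == "__main__":
    clauses = [frozenset(c) for c in KB + [NEGATED_GOAL]]
    print(refutes(clauses))   # a contradiction means Maina did hate Mugo
```

    Deriving the empty clause from the facts plus the negated goal shows that the negated goal is inconsistent with the facts, which is what the proof sets out to establish.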


    ii. Rules

    Rules are commonly used to represent knowledge in an inference system. The rules are

    usually in the form of production rules (if-then rules).

    They are used to show relationships among variables and derive actions from input to

    an inference engine.

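    [Figure: resolution tree for the proof of "did Maina hate Mugo?". Starting from the negated goal, each step resolves the current clause against one of the clauses listed earlier until the empty clause (NULL) is reached.]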


    Each rule consists of an antecedent (the if part) and a consequent (the then part).

    Interpreting an if-then rule involves two distinct parts:

    - Evaluating the antecedent.
    - Applying the result to the consequent.

    In the case of a binary (2-valued) logic, if the premise is true, then the conclusion is true.

    In the case of a multivalued logic, if the antecedent is true to some degree, then the consequent is also true to that same degree.

    Binary: 0 or 1. Multivalued: any value in the range from 0 to 1, e.g. 0.5.

    The antecedent of a rule can have multiple parts, e.g. "if the sky is grey and the wind is blowing then it will rain". In such a case, all the parts of the antecedent are evaluated simultaneously and resolved into a single number using logical operators.
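    As a rough illustration of how a multi-part antecedent is resolved into a single number, the sketch below combines the parts with `min`, which is one common choice for AND in multivalued logic. That choice, and the function name `fire_rule`, are assumptions made for the example; the notes do not fix a particular operator.

```python
def fire_rule(antecedent_degrees, combine=min):
    """Evaluate an if-then rule.

    The truth degrees of the antecedent parts are combined into one value,
    and that value is applied to the consequent.
    """
    return combine(antecedent_degrees)

# Binary logic: "the sky is grey" AND "the wind is blowing" are each 0 or 1.
rain_binary = fire_rule([1, 1])

# Multivalued logic: each part is true to some degree between 0 and 1.
rain_degree = fire_rule([0.8, 0.5])

if __name__ == "__main__":
    print(rain_binary)   # 1
    print(rain_degree)   # 0.5
```

    With binary inputs, `min` behaves exactly like logical AND, so the same function covers both cases.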

    iii. Natural Language

    Natural language is the language that humans speak. It is the most expressive knowledge representation formalism, since everything that can be expressed symbolically can also be expressed in natural language. Its reasoning potential is very rich, but it is hard to model.

    Problems with natural language:

    i. It is often ambiguous.
    ii. There is little uniformity in the structure of sentences.
    iii. Syntax and semantics are not fully understood.

    iv. Database systems

    Database systems are logical organizations of data in a form that is meaningful to the user and facilitates easy retrieval. Database systems are well suited to efficiently representing and processing large amounts of data. However, only simple aspects of some universe of discourse can be represented, hence reasoning is very simple and limited.

    v. Semantic Networks

    Semantic networks are capable of representing individual objects, categories of objects and relations among objects, e.g.


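    Mary is a sister of John.

    Mary and John are members of the category Persons.

    Persons have two legs.

    Semantic nets make it easy to perform inheritance reasoning: because Mary is a person and persons have two legs, we can conclude that Mary has two legs. They are simple and efficient compared to logic.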
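    Inheritance reasoning over such a network can be sketched in a few lines of Python. The link names (`member_of`, `subset_of`, `sister_of`, `legs`) and the `lookup` function are illustrative; the categories follow the Mary and John example in these notes.

```python
# Links of the network as (source, relation, target) triples.
LINKS = [
    ("Mary", "member_of", "Female persons"),
    ("John", "member_of", "Male persons"),
    ("Female persons", "subset_of", "Persons"),
    ("Male persons", "subset_of", "Persons"),
    ("Persons", "subset_of", "Mammals"),
    ("Mary", "sister_of", "John"),
    ("Persons", "legs", 2),
]

def lookup(node, relation):
    """Return the value of `relation` for `node`.

    If the value is not stored on the node itself, it is inherited from the
    categories the node belongs to (member_of / subset_of links).
    """
    for src, rel, dst in LINKS:
        if src == node and rel == relation:
            return dst
    for src, rel, dst in LINKS:
        if src == node and rel in ("member_of", "subset_of"):
            found = lookup(dst, relation)
            if found is not None:
                return found
    return None

if __name__ == "__main__":
    print(lookup("Mary", "legs"))        # inherited from Persons
    print(lookup("Mary", "sister_of"))   # stored directly on Mary
```

    The number of legs is stored once, on Persons, and every member of every subcategory inherits it.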

    vi. Frames (to read)

    A frame is an AI data structure used to divide knowledge into substructures by representing stereotyped situations.

    Frames are connected together to form a complete idea.

    Heuristic Search

    Heuristic search uses problem-specific knowledge beyond the definition of the problem itself. It is also known as informed search. It can thus arrive at solutions more efficiently than uninformed/blind search strategies.

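    [Figure: semantic network for the Mary and John example. Mary is a member of Female persons and John is a member of Male persons; both categories are subsets of Persons, which is a subset of Mammals. Mary is linked to John by "Sister of", and Persons carries the "Has Mother" and "Legs" links.]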


    Well Defined Problems and Solutions

    A problem is defined formally by four components:

    i. Initial/starting state: the starting point for solving any problem.
    ii. Successor function: a description of the possible actions available. The initial state and the successor function together define the set of all states reachable from the initial state.
    iii. Goal test: determines whether a given state is a goal state, e.g. in the game of chess, the goal is to reach a state called checkmate, where the opponent's king is under attack and cannot escape.
    iv. Path cost: a function that assigns a numeric cost to each path. A problem-solving agent chooses a cost function that reflects its own performance measure.

    These components define a problem and can be put together into a single structure that is given as input to problem-solving algorithms.

    A solution to a problem is a path from the initial state to the goal state. The quality of a solution is measured by the path cost, whereby an optimal solution has the lowest path cost among all solutions.
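    The four components can be bundled into one structure, as sketched below. The `Problem` class, the `path_cost` helper and the tiny road map (places A, B and C with made-up costs) are all hypothetical; here the step costs are attached to the successor function and summed to give the path cost.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, Tuple

@dataclass
class Problem:
    initial_state: Hashable
    # successors(state) yields (action, next_state, step_cost) triples
    successors: Callable[[Hashable], Iterable[Tuple[str, Hashable, float]]]
    is_goal: Callable[[Hashable], bool]

def path_cost(problem, actions):
    """Follow `actions` from the initial state; return (final state, total cost)."""
    state, total = problem.initial_state, 0.0
    for wanted in actions:
        for action, nxt, cost in problem.successors(state):
            if action == wanted:
                state, total = nxt, total + cost
                break
        else:
            raise ValueError(f"action {wanted!r} is not available in state {state!r}")
    return state, total

# Hypothetical road map used only for illustration.
ROADS = {
    "A": [("go B", "B", 2), ("go C", "C", 5)],
    "B": [("go C", "C", 1)],
    "C": [],
}
problem = Problem("A", lambda s: ROADS[s], lambda s: s == "C")

if __name__ == "__main__":
    print(path_cost(problem, ["go B", "go C"]))   # ('C', 3.0)
    print(path_cost(problem, ["go C"]))           # ('C', 5.0)
```

    Both action sequences are solutions because they end in a goal state; the first is the optimal one because its path cost is lower.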

    Exercise: formally formulate the problem of the eight queens on a chess board.

    The output of a problem solving algorithm is either failure or a solution. Some

    algorithms may get stuck in an infinite loop and never return an output.

    The performance of an algorithm is evaluated using four measures:

    - Completeness: is the algorithm guaranteed to find a solution when there is one?
    - Optimality: does the algorithm find the optimal solution?
    - Time complexity: how long does the algorithm take to find a solution?
    - Space complexity: how much memory is needed to perform the search?

    Uninformed/Blind Search Strategies

    Blind search means that the search has no additional information about states beyond that

    provided in the problem definition.

    Breadth First Search (BFS)

    This is a simple strategy in which the root node is expanded first, then all the successors of the root node are expanded next, then their successors and so on. All the nodes at a given depth in the search tree are expanded before any nodes at the next level are expanded.

    BFS can be implemented using a FIFO queue, ensuring that the nodes that are visited first will be expanded first.
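    A minimal BFS sketch using a FIFO queue is shown below. The small tree and the function name are illustrative, and the `visited` set is an addition that keeps the search from revisiting states when the state space is a graph rather than a tree.

```python
from collections import deque

def breadth_first_search(start, goal, successors):
    """Return the shallowest path from start to goal, or None on failure."""
    frontier = deque([[start]])          # FIFO queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()        # oldest (shallowest) path first
        node = path[-1]
        if node == goal:
            return path
        for child in successors(node):
            if child not in visited:
                visited.add(child)
                frontier.append(path + [child])
    return None

# Hypothetical search tree used only for illustration.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "D": [], "E": [], "F": []}

if __name__ == "__main__":
    print(breadth_first_search("A", "F", lambda n: TREE[n]))   # ['A', 'C', 'F']
```

    Nodes B and C (depth 1) are both taken off the queue before any of D, E or F (depth 2).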

    Evaluation of BFS Algorithm

    i. Completeness
    It is complete. If the goal node is at a finite depth $d$, BFS will eventually find it after expanding all shallower nodes.

    ii. Optimality
    The shallowest goal node is not necessarily the optimal one; hence BFS is optimal only if the path cost is a non-decreasing function of the depth of the node, e.g. when all the actions/moves have the same cost.

    iii. Time complexity
    Consider a state space where every state has $b$ successors. The root of the search tree generates $b$ nodes at level 1, each of which generates $b$ more nodes, giving $b^2$ at level 2, $b^3$ at level 3, and so on. If the solution is at level $d$ then, in the worst case, we would expand all but the last node at level $d$, generating $b^{d+1} - b$ nodes at level $d+1$. The number of generated nodes is therefore exponential:

    $b + b^2 + b^3 + \cdots + b^d + (b^{d+1} - b) = O(b^{d+1})$

    Time requirements are a major constraint in BFS.

    iv. Space complexity
    Every node that is generated must remain in memory, hence space complexity also grows exponentially. BFS places a very high demand on memory.

    Depth First Search (DFS)

    The search proceeds to the deepest level of the search tree where the nodes have no

    successors. As the nodes are expanded, they are dropped off and the search backs up to

    the next shallowest node that still has unexplored successors.

    DFS can be implemented using a stack (a LIFO queue).
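    A matching DFS sketch replaces the FIFO queue with a stack. As before, the tree and the function name are illustrative, and the `visited` set is an addition to guard against cycles.

```python
def depth_first_search(start, goal, successors):
    """Return a path from start to goal found by going deepest first, or None."""
    stack = [[start]]                    # LIFO stack of paths
    visited = set()
    while stack:
        path = stack.pop()               # most recently added path first
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        # Push children in reverse so the leftmost child is explored first.
        for child in reversed(list(successors(node))):
            if child not in visited:
                stack.append(path + [child])
    return None

# Hypothetical search tree used only for illustration.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "D": [], "E": [], "F": []}

if __name__ == "__main__":
    print(depth_first_search("A", "F", lambda n: TREE[n]))   # ['A', 'C', 'F']
```

    On this tree the search visits A, B, D and E before backing up to C, so the whole left subtree is exhausted before F is reached.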

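    Comparison: BFS is complete, whereas DFS can fail to terminate when the search tree has unbounded depth. BFS is optimal only when all step costs are equal, while DFS is not optimal. DFS needs far less memory than BFS because it stores only the current path and the unexpanded siblings along it, although its worst-case time is still exponential.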


    Heuristic Searches

    A key component of a heuristic search is the heuristic function, denoted $h(n)$. $h(n)$ is the estimated cost of the cheapest path from node $n$ to a goal node. If $n$ is the goal node, then $h(n) = 0$.

    Greedy Best First Search (GBFS)

    GBFS tries to expand the node that is closest to the goal, on the grounds that this is likely to lead to a solution quickly. It evaluates nodes using only the heuristic function:

    $f(n) = h(n)$

    It resembles DFS in the way it prefers to follow a single path all the way to the end but backs up when it hits a dead end.

    Just like DFS, it is neither optimal nor complete.
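    The sketch below orders the frontier by $h(n)$ alone using a priority queue. The graph, the heuristic values and the function name are invented for illustration; the example is built so that the greedy choice leads to a more expensive route than the best one available.

```python
import heapq

def greedy_best_first(start, goal, neighbours, h):
    """Always expand the frontier node with the smallest heuristic value h(n)."""
    frontier = [(h(start), [start])]     # priority queue ordered by h only
    visited = set()
    while frontier:
        _, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for child, _step_cost in neighbours(node):   # step costs are ignored
            if child not in visited:
                heapq.heappush(frontier, (h(child), path + [child]))
    return None

# Hypothetical graph: node -> [(neighbour, step cost)], and heuristic values.
GRAPH = {"S": [("A", 1), ("B", 4)], "A": [("G", 9)], "B": [("G", 2)], "G": []}
H = {"S": 2, "A": 1, "B": 2, "G": 0}

if __name__ == "__main__":
    print(greedy_best_first("S", "G", lambda n: GRAPH[n], lambda n: H[n]))
    # ['S', 'A', 'G'], which costs 1 + 9 = 10 although S-B-G costs only 6
```

    Because the step costs never enter the ordering, the search is drawn towards A, which looks closer to the goal, and misses the cheaper route through B.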

    A* Search

    It evaluates nodes by combining $g(n)$, the cost to reach the node, and $h(n)$, the estimated cost to get from the node to the goal:

    $f(n) = g(n) + h(n)$

    Since $g(n)$ gives the path cost from the start node to node $n$, $f(n)$ is the estimated cost of the cheapest solution through node $n$. Therefore, in trying to find the cheapest solution, we try the node with the lowest value of $g(n) + h(n)$. It is both complete and optimal, provided that $h(n)$ never overestimates the true cost of reaching the goal.

    [Figure: example search graph with heuristic values at the nodes.]

    From the example above, the route would be A-B-E-H and $f(n)$ is the total cost of that route.
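    A compact A* sketch follows. The graph and heuristic values are made up for illustration (they are not the ones from the figure in the notes), and the heuristic is chosen so that it never overestimates the remaining cost.

```python
import heapq

def a_star(start, goal, neighbours, h):
    """Expand the node with the lowest f(n) = g(n) + h(n); return (path, cost)."""
    frontier = [(h(start), 0, [start])]          # entries are (f, g, path)
    best_g = {}                                  # cheapest known cost to each expanded node
    while frontier:
        _f, g, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return path, g
        if node in best_g and best_g[node] <= g:
            continue                             # already expanded via a path at least as cheap
        best_g[node] = g
        for child, step_cost in neighbours(node):
            g_child = g + step_cost
            heapq.heappush(frontier, (g_child + h(child), g_child, path + [child]))
    return None

# Hypothetical graph: node -> [(neighbour, step cost)], and heuristic values.
GRAPH = {"S": [("A", 1), ("B", 4)], "A": [("G", 9)], "B": [("G", 2)], "G": []}
H = {"S": 2, "A": 1, "B": 2, "G": 0}

if __name__ == "__main__":
    print(a_star("S", "G", lambda n: GRAPH[n], lambda n: H[n]))   # (['S', 'B', 'G'], 6)
```

    Although A has the smaller heuristic value, the path through it reaches G with $f = 10$, so the path through B ($f = 6$) is taken off the queue first and returned.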

    Learning

    Forms of Learning

    The field of machine learning distinguishes three forms of learning:

    a. Supervised learning
    b. Unsupervised learning
    c. Reinforcement learning

    The type of feedback available is usually the most important factor in determining the nature of the learning the agent undertakes.

    1. Supervised Learning
    It involves learning a function from examples of its inputs and outputs, e.g. learning multiplication tables.

    [Figure annotation from the search example: to reach goal node K, we go through A-B-C-I-K (cheapest route).]


    The correct output values are first provided after which a learning agent can get the

    correct output from its perceived knowledge.

    For fully observable environments, an agent can observe the effects of its actions and

    hence can use supervised learning methods to learn to predict them.

    2. Unsupervised Learning
    It involves learning patterns in the input when no specific output values are supplied.

    A purely unsupervised learning agent cannot learn what to do because it has no information as to what constitutes a correct action or desirable state.

    An example is conducting research.

    3. Reinforcement Learning
    Rather than being told what to do, the agent learns from reinforcement, i.e. a reward or its absence (this teaches behavioural skills, e.g. potty training, promoting hard-working employees, etc.).

    [Figures: in supervised learning the learning function receives both inputs and outputs; in unsupervised learning it receives the input only.]

    The design of a learning element is affected by three major concerns:

    i. Which components of the performance element are to be learned?
    ii. What feedback is available to learn these components?
    iii. What representation is used for the components?

    The components of a performance element in learning may include the following:

    i. A direct mapping from conditions of the current state to actions.
    ii. A means to infer relevant properties of the world being learned.
    iii. Information about the way the world being learned responds and the results of possible actions.
    iv. Information indicating the desirable states and actions.

    Learning by Decision Trees

    A decision tree takes as input an object or a situation described by a set of attributes and returns a decision, which is a predicted output value for the given input. The input values can be discrete or continuous, and so can the outputs.

    Learning a discrete-valued function is called classification, while learning a continuous function is called regression.

    A decision tree reaches its decision by performing a sequence of tests.

    Each internal node corresponds to a test of the value of one of the properties, and the branches are labelled with the possible values of the test.

    Each leaf node specifies the value to be returned if that leaf is reached.
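    The sequence of tests can be sketched with a nested structure in which an internal node is a pair (attribute, branches) and anything else is a leaf. The attribute names and values below are hypothetical, echoing the earlier "grey sky and blowing wind" rule.

```python
# Internal node: (attribute, {value: subtree}); leaf: the decision to return.
TREE = ("sky", {
    "clear": "no rain",
    "grey": ("wind", {
        "calm": "no rain",
        "blowing": "rain",
    }),
})

def decide(tree, example):
    """Walk from the root, testing one attribute per internal node, until a leaf."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]   # follow the branch for this value
    return tree

if __name__ == "__main__":
    print(decide(TREE, {"sky": "grey", "wind": "blowing"}))   # rain
    print(decide(TREE, {"sky": "clear", "wind": "blowing"}))  # no rain
```

    The second example never tests the wind attribute: the path taken through the tree decides which tests are performed at all.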

    Example:

    Suppose you model a simple problem of whether to wait for a table at a restaurant or not.

    The following is a list of applicable attributes:

    i. Alternate: whether there is a suitable alternative restaurant nearby.
    ii. Bar: whether the restaurant has a comfortable bar area to wait in.
    iii. Friday/Saturday: true on Fridays and Saturdays.
    iv. Hungry: whether we are hungry or not.
    v. Patrons: how many people are in the restaurant (none/some/full).
    vi. Price: the restaurant's price range.
    vii. Raining: whether it is raining outside or not.
    viii. Reservation: whether we have made a reservation or not.
    ix. Type: the kind of restaurant (e.g. Italian/French).
    x. WaitEstimate: the wait estimated by the host (0-10 mins, 10-30 mins, 30-60 mins, >60 mins).

    Example  Alt  Bar  Fri/Sat  Hungry  Patrons  Price  Rain  Rsvn  Type     Est    Goal (will wait)
    x_1      Yes  No   No       Yes     Some     $$$    No    Yes   French   0-10   Yes
    x_2      Yes  No   No       Yes     Full     $      No    No    Thai     30-60  No
    x_3      No   Yes  No       No      Some     $      No    No    Burger   0-10   Yes
    x_4      Yes  No   Yes      Yes     Full     $      Yes   No    Thai     10-30  Yes
    x_5      Yes  No   Yes      No      Full     $$$    No    Yes   French   >60    No
    x_6      No   Yes  No       Yes     Some     $$     Yes   Yes   Italian  0-10   Yes
    x_7      No   Yes  No       No      None     $      Yes   No    Burger   0-10   No
    x_8      No   No   No       Yes     Some     $$     Yes   Yes   Thai     0-10   Yes
    x_9      No   Yes  Yes      No      Full     $      Yes   No    Burger   >60    No
    x_10     Yes  Yes  Yes      Yes     Full     $$$    No    Yes   Italian  10-30  No
    x_11     No   No   No       No      None     $      No    No    Thai     0-10   No
    x_12     Yes  Yes  Yes      Yes     Full     $      No    No    Burger   30-60  Yes

    The restaurant scenario is an example of a Boolean decision tree, which consists of a vector of input attributes, X, and a single Boolean output, Y. A set of examples $(x_1, y_1), \ldots, (x_{12}, y_{12})$ is shown above.

    Decision trees are fully expressive within the class of propositional languages (dealing with one variable), since any Boolean function can be written as a decision tree.

    Positive examples are the ones in which the goal "will wait" is true, e.g. $x_1, x_3, x_4, \ldots$, while the negative examples are the ones in which it is false.

    The complete set of examples is called the training set. The idea behind the decision tree learning algorithm is to test the most important attribute first, i.e. the attribute that makes the most difference to the classification of the training examples. The aim is to reach the correct classification with a small number of tests, which implies that all paths in the tree will be short and the tree as a whole will be small, e.g. starting with Patrons then Hungry as opposed to starting with Type.


    Testing both:

    hungry? (examples 2, 4, 5, 9, 10, 12)
        yes: 2, 4, 10, 12
        no: 5, 9

    type?
        french: 1, 5
        italian: 6, 10
        thai: 2, 4, 8, 11
        burger: 3, 7, 9, 12


    Type is a poor attribute because it leaves us with four outcomes, each with the same number of positive and negative examples.

    Patrons is a fairly important attribute; if the value is none or some then we are left with example sets for which we can answer definitively.
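    The difference between the two attributes can be quantified with information gain, the measure these notes refer to later. The sketch below assumes the usual entropy-based definition and uses only the Patrons, Type and goal values of the twelve restaurant examples; the function names are illustrative.

```python
from collections import Counter
from math import log2

# Example number -> (Patrons, Type, will wait?)
EXAMPLES = {
    1: ("Some", "French", True),   2: ("Full", "Thai", False),
    3: ("Some", "Burger", True),   4: ("Full", "Thai", True),
    5: ("Full", "French", False),  6: ("Some", "Italian", True),
    7: ("None", "Burger", False),  8: ("Some", "Thai", True),
    9: ("Full", "Burger", False),  10: ("Full", "Italian", False),
    11: ("None", "Thai", False),   12: ("Full", "Burger", True),
}

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, column):
    """Entropy of the whole set minus the weighted entropy after splitting on `column`."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in {row[column] for row in rows}:
        subset = [row[-1] for row in rows if row[column] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

if __name__ == "__main__":
    rows = list(EXAMPLES.values())
    print(round(information_gain(rows, 0), 3))       # Patrons: about 0.541 bits
    print(round(abs(information_gain(rows, 1)), 3))  # Type: 0.0 bits
```

    Splitting on Type leaves every subset half positive and half negative, so nothing is learned; splitting on Patrons settles six of the twelve examples immediately.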

    Considerations for the recursive algorithm include:

    i. If there are some positive and some negative examples, choose the best attribute to split them.
    ii. If all remaining examples are positive or all are negative, then we can answer yes or no (true or false).
    iii. If there are no examples left, it means that no such example has been observed, and we return a default value calculated from the majority classification at the node's parent.
    iv. If there are no attributes left but there are both positive and negative examples, then there is a problem. It means that the examples have the same descriptions but different classifications, as a result of incorrect data or of attributes that do not give enough information to describe the situation fully.
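    The four cases map directly onto a recursive function. The sketch below is a simplified illustration rather than the full algorithm: it uses only the Patrons and Hungry attributes of the restaurant examples, picks the "best" attribute with a simple stand-in (`fewest_mixed`) instead of information gain, and falls back on a majority vote in case iv. All function names are hypothetical.

```python
from collections import Counter

def majority(examples):
    """Most common label among (attributes, label) pairs."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def fewest_mixed(examples, attributes):
    """Stand-in for 'best attribute': the split leaving the fewest examples in mixed groups."""
    def mixed(attr):
        groups = {}
        for x, y in examples:
            groups.setdefault(x[attr], set()).add(y)
        return sum(len(groups[x[attr]]) > 1 for x, _ in examples)
    return min(attributes, key=mixed)

def learn_tree(examples, attributes, domains, default):
    if not examples:                              # iii. nothing observed: parent's majority
        return default
    labels = {y for _, y in examples}
    if len(labels) == 1:                          # ii. all positive or all negative
        return labels.pop()
    if not attributes:                            # iv. same description, different labels
        return majority(examples)                 #     (assumed fallback: majority vote)
    best = fewest_mixed(examples, attributes)     # i. split on the best attribute
    rest = [a for a in attributes if a != best]
    return (best, {
        value: learn_tree([(x, y) for x, y in examples if x[best] == value],
                          rest, domains, majority(examples))
        for value in domains[best]
    })

# (Patrons, Hungry, will wait?) for examples 1 to 12 of the restaurant table.
ROWS = [
    ("Some", "Yes", True), ("Full", "Yes", False), ("Some", "No", True),
    ("Full", "Yes", True), ("Full", "No", False), ("Some", "Yes", True),
    ("None", "No", False), ("Some", "Yes", True), ("Full", "No", False),
    ("Full", "Yes", False), ("None", "No", False), ("Full", "Yes", True),
]
EXAMPLES = [({"Patrons": p, "Hungry": h}, w) for p, h, w in ROWS]
DOMAINS = {"Patrons": ["None", "Some", "Full"], "Hungry": ["Yes", "No"]}

if __name__ == "__main__":
    print(learn_tree(EXAMPLES, ["Patrons", "Hungry"], DOMAINS, default=False))
```

    With only two attributes available, the examples that are full and hungry cannot be separated, which is exactly the situation described in case iv.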

    patrons?
        none: 7, 11 (negative: no)
        some: 1, 3, 6, 8 (positive: yes)
        full: 2, 4, 5, 9, 10, 12 (both yes and no)


    The considerations above make up the decision tree learning algorithm.

    Assessing the Performance of a Learning Algorithm

    A learning algorithm is good if it produces hypotheses that do a good job of predicting the classification of unseen examples.

    A prediction is good if it turns out to be true; hence we can assess the quality of a hypothesis by checking its predictions against the correct classification once we know it.

    This is done on a set of examples known as a test set. If we train on all our available examples, we would have to go out and collect more examples to test on. We usually apply the following methodology:

    a. Collect a large set of examples.
    b. Divide it into two disjoint sets: the training set and the test set.
    c. Apply the learning algorithm to the training set, generating a hypothesis, h.
    d. Measure the percentage of examples in the test set that are correctly classified by h.
    e. Repeat steps a-d for different sizes of training sets and different randomly selected training sets of each size.

    This results in a set of data that can be processed to give the average prediction quality as a function of the size of the training set.

    When plotted on a graph, it gives the learning curve for the algorithm on a particular domain.
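    The methodology can be sketched as a small experiment loop. Everything below is illustrative: the learner is a deliberately trivial one that always predicts the most common training label, and the function names, trial count and random seed are arbitrary choices.

```python
import random
from collections import Counter

def majority_learner(training):
    """Trivial learner: the hypothesis always predicts the commonest training label."""
    label = Counter(y for _, y in training).most_common(1)[0][0]
    return lambda x: label

def learning_curve(examples, learner, sizes, trials=20, seed=0):
    """Average test-set accuracy for each training-set size."""
    rng = random.Random(seed)
    curve = {}
    for size in sizes:
        scores = []
        for _ in range(trials):
            shuffled = examples[:]
            rng.shuffle(shuffled)                                # random selection
            training, test = shuffled[:size], shuffled[size:]    # disjoint sets
            h = learner(training)                                # hypothesis from training set only
            correct = sum(h(x) == y for x, y in test)
            scores.append(correct / len(test))
        curve[size] = sum(scores) / trials
    return curve

if __name__ == "__main__":
    # Hypothetical data: 40 examples, three quarters of them positive.
    data = [(i, i % 4 != 0) for i in range(40)]
    for size, accuracy in learning_curve(data, majority_learner, [5, 10, 20]).items():
        print(size, round(accuracy, 2))
```

    The test examples are never passed to the learner, so each score measures how well the hypothesis predicts examples it has not seen.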


    As the training set grows, prediction quality increases.

    NB: the learning algorithm must not be allowed to see the test data before the learned

    hypothesis is tested on them.

    D.I.Y.: research the terms noise and overfitting with regard to the problems they cause when using decision trees for training, and how to minimize them.

    In order to extend decision trees to a wider variety of problems, the following issues must

    be addressed:

    - Missing data: in many domains, not all attribute values will be known for every example. The values might not have been recorded, or might be too expensive to obtain.
    - Multivalued attributes: when an attribute has many possible values, the information gain measure gives an inappropriate indication of the attribute's usefulness.
    - Continuous and integer-valued input attributes: they have an infinite set of possible values, which would generate infinitely many branches. Typically, we find the split point that gives the highest information gain.
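    Finding the split point can be sketched as follows, assuming entropy-based information gain and candidate thresholds placed midway between adjacent values. The waiting times and labels are made up for illustration, and the function names are hypothetical.

```python
from math import log2

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(p * log2(p) for p in (labels.count(v) / n for v in set(labels)))

def best_split_point(values, labels):
    """Return (threshold, gain) for the two-way split with the highest information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy([y for _, y in pairs])
    best = (None, 0.0)
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            continue                       # no threshold fits between equal values
        threshold = (a + b) / 2
        left = [y for v, y in pairs if v <= threshold]
        right = [y for v, y in pairs if v > threshold]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if base - remainder > best[1]:
            best = (threshold, base - remainder)
    return best

if __name__ == "__main__":
    # Hypothetical waiting times in minutes and whether the customer waited.
    waits = [5, 8, 12, 25, 40, 70]
    waited = [True, True, True, False, False, False]
    print(best_split_point(waits, waited))   # (18.5, 1.0)
```

    The continuous attribute is thereby turned into a single two-way test (here, wait <= 18.5), so the tree gains one branch pair instead of one branch per value.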

    [Figure: learning curve, with training set size (0 to 80) on the horizontal axis and the proportion correct on the test set (up to 1) on the vertical axis.]