
Post on 22-Jul-2020

EWM SS2013 – Questions & Summary

1 Knowledge Acquisition

Q: For which types of Knowledge-Based-Systems (KBS) is knowledge engineering important?

Why?

A: Important for Symbolic AI Methods like Semantic Networks (Ontologies) or Rule Based

Systems (logics). For these systems it is important because ????

Q: In which step(s) of the knowledge engineering process is the domain expert involved?

A: Acquisition, Validation, Explanation and Justification

Q: How do you choose the interview method?

A: Depending on the type of knowledge the interview method is chosen.

Q: Which interview method is to be preferred when you want to interview different domain

experts one after the other?

A: The structured interview is preferred because the questions are prepared beforehand.

Therefore the preparation of every interview with the domain experts is minimized. It also

makes it possible to evaluate the differences between the answers of the experts because

the questions which everyone is asked are all the same.

Q: How do you ensure that the captured knowledge is correct?

A: You summarize the captured knowledge to the domain expert and ask him if your

understanding of the domain is correct. If it is necessary the domain expert will correct you.

2 Semantic Networks & Ontologies

Q: What is the difference between syntax and semantics?

A: Syntax describes the rules of a system (e.g. its grammar); semantics describes the meaning of an expression.

Q: Why are we concerned with modelling knowledge in semantic networks/ontologies?

A: Humans grasp the semantic meaning of content implicitly, but computers cannot. The meaning therefore has to be modelled explicitly.

Q: From the programmer’s point of view: Why are ontologies advantageous?

A: Ontologies are advantageous because they help the programmer gain knowledge about the domain and understand the relations between the concepts.

Q: What are the main components of an ontology?

A: Relations, Types and Attributes

Q: What are the differences between a semantic network and an ontology?

A: A Semantic Network allows us to relate concepts among each other. An ontology is a

concrete type of a semantic network. Ontologies can be described by concepts (concept-

term-mapping).

Q: What is understood under the term “semantic annotation”?

A: Semantic annotation means adding metadata to a title or a specific paragraph of a document. This metadata is added through tags and increases machine readability.

Q: What is understood under the term “semantic uplifting”?

A: Semantic uplifting means extending a simple semantic network into a more complex one. This is done by adding new knowledge to the already existing semantic network.

3 Semantic Web

Q: How can we make the web more intelligent?

A: Two methodologies:

• Top-Down:

Ontology Engineering (Knowledge Engineering)

• Bottom-Up

o Socially driven:

Semantic-Social-Web: Identification of Emergent Structures from

Folksonomies

o Data driven:

LINKED OPEN DATA: Publication of data together with underlying structures.

Q: How can we apply ontologies to the web?

A: Ontologies are applied to the web with the top-down approach (ontology engineering). This means the usual knowledge acquisition process has to be performed.

Q: How can we semantically enrich the web?

A: To enrich the web with semantic information, everyone could tag the content he or she publishes on the web. This makes it easier for machines to understand the content of a specific website (machine readability).

Q: Advantages and Limitations of the Semantic Web?

A: The Semantic Web should be able to take over some part of human thinking.

4 Artificial Neural Networks (ANN)

Q: Which elements does an ANN consist of?

A: Neurons and connections.

Q: Where is the knowledge within the ANN stored?

A: It is stored in the weights of the connections between the neurons.

Q: What is the knowledge representation of an ANN?

A: The knowledge of an ANN is represented through the weights and the structure of the

neural network.

Q: How do you compute the output of a single layer perceptron?

A: The inputs are multiplied with the weights of the connections and then the results are

summed up. In the next step the output function is applied.
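A minimal sketch of this computation in Python. The step output function, the bias term and the concrete weights are assumptions chosen for illustration; they do not come from the script.

```python
def step(net, threshold=0.0):
    """Output function: fire (1) if the weighted sum reaches the threshold."""
    return 1 if net >= threshold else 0


def perceptron_output(inputs, weights, bias=0.0):
    """Multiply inputs by their weights, sum the results, apply the output function."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return step(net)


# Hypothetical weights that make the unit behave like a logical AND.
and_weights, and_bias = [1.0, 1.0], -1.5
outputs = [
    perceptron_output([a, b], and_weights, and_bias)
    for a in (0, 1)
    for b in (0, 1)
]
print(outputs)  # [0, 0, 0, 1]
```

Only the input pair (1, 1) pushes the weighted sum above the threshold, so only that case produces the output 1.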

Q: How does learning within ANN take place?

A: By adjusting the weights so that the output gradually approaches the desired result. The ANN is trained with training data; in every learning step the error of the current iteration is determined and used to adjust the weights.

Q: What is the difference between the training set and a test set?

A: Both are sets of example data. Through the training set the ANN "learns" a certain behaviour. After the learning process is finished, you verify the result with a test set, so that you know whether the ANN behaves as it should.

Q: What is the computational limitation of a single layer perceptron?

A: A single layer perceptron can only model functions that are linearly separable.

(Representation Theorem) Therefore more complex operations like XOR can't be performed

by a single layer perceptron because the separation wouldn't work with a linear function.

This only would be possible with a multilayer perceptron.
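The limitation can be tried out with a small experiment. The sketch below uses the classic perceptron learning rule (an assumption, the script does not name a specific rule) with a hypothetical learning rate and epoch limit. Training on AND reaches zero errors, while training on XOR keeps making errors no matter how long it runs.

```python
def predict(weights, bias, x):
    """Weighted sum followed by a step output function."""
    net = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if net >= 0 else 0


def train(samples, epochs=50, rate=1.0):
    """Adjust weights by the iteration error; return (weights, bias, errors in last epoch)."""
    weights, bias, errors = [0.0, 0.0], 0.0, 0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error != 0:
                errors += 1
                weights = [w + rate * error * xi for w, xi in zip(weights, x)]
                bias += rate * error
        if errors == 0:  # every training example is classified correctly
            break
    return weights, bias, errors


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

_, _, and_errors = train(AND)
_, _, xor_errors = train(XOR)
print(and_errors == 0, xor_errors > 0)  # True True
```

An epoch without errors would mean that a single straight line separates the two classes, which does not exist for XOR.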

5 Rule Based Systems (RBS)

Q: What basic symbols of propositional logic do you know and what are they for?

A:

Q: What are constant and variable symbols and functional expressions?

A: Constant: e.g. Marco. Variable: a placeholder for an arbitrary object, e.g. x in Likes(x, Beer). Functional expression: e.g. Likes(Marco, Beer), a function that has two parameters.

Q: How do RBS generally work?

A: Knowledge is encoded as a large number of if-then(-else) rules, which are evaluated against the known facts.

Q: What are the main components of RBS and what do they do?

A: Knowledge Base: database of facts; Set of Rules

Inference Engine: Carries out reasoning

Explanation Facilities / Rule Tracing: Explain why a specific decision was made.

User Interface: Communication between user and Expert System

Q: What are the advantages and disadvantages of a RBS?

A: Pros:

• Can explain decisions more or less like human experts

• Provides consistent answers for repetitive decisions

• Hold and maintain high level of information

Cons:

• Can’t give creative solutions

• Difficult to update if KBS is large

• Difficult to avoid conflicting rules in large KBS

Q: How does forward chaining work?

A:

• Input set of data

• Fire rules

• Run inference engine, add newly derived facts to knowledge base

• Repeat until no more rules can fire

• Present results (solution found/not found)
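The loop above can be sketched in a few lines of Python. The rule format (a set of conditions plus one conclusion) and the example facts are assumptions made for illustration only.

```python
def forward_chain(facts, rules):
    """rules: list of (set_of_conditions, conclusion). Returns all derivable facts."""
    known = set(facts)
    changed = True
    while changed:  # repeat until no rule can add anything new
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)  # the rule fires
                changed = True
    return known


# Hypothetical rule base and input data.
rules = [
    ({"temperature_below_zero"}, "it_is_cold"),
    ({"it_is_cold", "going_outside"}, "wear_warm_coat"),
]
derived = forward_chain({"temperature_below_zero", "going_outside"}, rules)
print("wear_warm_coat" in derived)  # True
```

The reasoning is data driven: it starts from the input facts and derives whatever follows, without knowing in advance which conclusion is of interest.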

Q: How does backward chaining work?

A:

• Provide hypothesis / goal / question

• Look for rules that provide an answer to the question (loop).

• If needed, ask user questions

• Present results (solution found/not found)
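A goal-driven counterpart as a minimal sketch. The rule format, the example rule and the `ask_user` callback are hypothetical; a real expert system shell would offer its own interface for this.

```python
def backward_chain(goal, facts, rules, ask_user=None, _seen=None):
    """Try to prove goal from facts and rules; optionally ask the user for missing facts."""
    seen = _seen if _seen is not None else set()
    if goal in facts:
        return True
    if goal in seen:
        return False  # already tried; also avoids looping on circular rules
    seen.add(goal)
    for conditions, conclusion in rules:
        # A rule answers the question if all of its conditions can be proven in turn.
        if conclusion == goal and all(
            backward_chain(c, facts, rules, ask_user, seen) for c in conditions
        ):
            facts.add(goal)
            return True
    if ask_user is not None and ask_user(goal):  # no rule helped: ask the user
        facts.add(goal)
        return True
    return False


rules = [({"has_fever", "has_cough"}, "has_flu")]
answers = {"has_cough": True}  # stands in for interactive user input
result = backward_chain(
    "has_flu", {"has_fever"}, rules, ask_user=lambda q: answers.get(q, False)
)
print(result)  # True
```

Only the rules that can contribute to the hypothesis are examined, which is why this style fits diagnosis tasks.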

Q: When should which chaining process be used?

A:

• Forward: analysis, interpretation

• Backward: diagnosis

6 Information Retrieval Systems (Statistics)

What is the challenge in Information Retrieval (IR)?

Which steps have to be performed before an IR model can be used for querying? Illustrate

them on a simple example.

How is a document represented within the Vector Space Model? Illustrate it on a simple

example.

What kind of variations can you remember?

How is the similarity between the query and the documents computed? Illustrate them on a

simple example.

Which role does the analysis of search results play within the IR process?

7 Summary of the EWM Script

7.1 Data, Information and Knowledge

There are different stages of ‘knowledge’ per se.

• Data are commonly assumed to be raw facts, recorded symbols. There is only little or

no amount of semantics.

• Information is data in context. It is data presented in a form that is meaningful to the

recipient.

• Knowledge is “The explicit functional associations between items of information

and/or data” (Debenham, 1988). Or, in other words, what someone has after

understanding information. They can interpret information according to the human

understanding.

Data: It is -10° C. (simple fact)

Information: It is cold. (embedded meaning)

Knowledge: It is cold, and if it is cold you

should put on a warm coat. (linked concepts)

7.2 Knowledge Engineering

7.2.1 Steps of Knowledge Engineering

Knowledge engineering is limited to technical aspects. Typically, five steps are involved:

• Knowledge Acquisition:

This step involves obtaining knowledge from various sources including human

experts, books, videos and existing computer sources of data such as databases and

the Internet.

• Knowledge Validation:

Here, the knowledge is checked using test cases for adequate quality.

• Knowledge Representation:

In this step, a map of the knowledge is produced and then encoded into the

knowledge base.

• Inference

Inference means that new knowledge is created in the form of links between

information or knowledge. A knowledge based system can use the provided and

inferred knowledge to make decisions or provide advice to the user.

• Explanation and Justification

Explanation and justification involves additional computer program design, primarily

to help the computer answer questions posed by the user and also to show how a

conclusion was reached using knowledge in the knowledge base.

Knowledge Engineering is about developing a knowledge based system. A knowledge

engineer tries to figure out which methods and technologies are best-suited for eliciting and

storing knowledge.

Knowledge Management takes a more business related stance. Knowledge managers decide

which kind of knowledge needs to be acquired, or which kind of knowledge needs to be

elicited. They try to find out which knowledge is needed for the company to make decisions

or enable actions.

7.3 Knowledge Based Systems (KBS)

KBS are computer programs that are designed to emulate the work of experts in specific

domains of knowledge.

Expert Systems:

Expert systems are one possible realization of a knowledge based system. They model higher cognitive functions of the human brain and are often used to mimic the human decision-making process. The algorithms used in expert systems are often static, which means that they provide a specific degree of certainty. However, this also means they cannot learn from experience.

Neural Networks:

Neural Networks model the brain at the biological level. This enables them to mimic the pattern recognition abilities of the human brain. In contrast to expert systems, they can, for example, learn to read, recognize patterns from experience, or be used to try to predict the future. Neural networks will be discussed in depth in a later chapter.

Case-based reasoning (CBR):

CBR imitates the human thought process of thinking in analogies. One sector in which CBR is used is the judicial system. Here, the knowledge of the law is contained in written documents. A CBR system, however, stores in a knowledge base how the law was actually applied in real-life cases.

Genetic Algorithms:

These kinds of algorithms are inspired by natural evolution processes. Similarly to natural

evolution, genetic algorithms try to find one good solution out of all possible solutions

through various methods such as inheritance, mutation, crossover, or selection. For example,

there are many possible ways of how to schedule a meeting, but through applying genetic

algorithms it may be possible to find one of the good ones.
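The interplay of selection, crossover and mutation can be sketched on a toy problem. Everything concrete here is an assumption for illustration: the bit-string encoding, the fitness function (number of 1-bits as a stand-in for "how good is this schedule"), and all parameter values.

```python
import random


def fitness(bits):
    """Toy fitness: the more 1-bits, the better the candidate solution."""
    return sum(bits)


def evolve(pop_size=20, length=16, generations=40, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in population)
    for _ in range(generations):
        # Selection: the fitter half survives unchanged and acts as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)       # crossover point
            child = mother[:cut] + father[cut:]  # child takes genes from both parents
            # Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = parents + children
    return initial_best, max(population, key=fitness)


initial_best, best = evolve()
print(initial_best, fitness(best))
```

Because the parents survive unchanged, the best fitness never gets worse from one generation to the next. The result is a good solution, not necessarily the optimal one.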

Intelligent Agents:

Intelligent agents are computer programs. Usually, an overall goal or task is specified, but

they are given a certain degree of freedom to make their own decisions in how to reach or

complete that goal or task. Intelligent agents usually work in the background without the

user noticing them. They are often used to retrieve information from the internet when too

much potentially interesting information is available. For example, if you search for “When

was Steve Jobs born?” or “How far is the moon from earth?” on google.com, their intelligent

agent software (“rich snippets”) will find the information on the internet and return it to the

user.

Data Mining -> Text Mining:

Data mining (or knowledge extraction) is used extensively in many different business areas.

Through data mining, it is possible to identify relationships in data that were previously

undiscovered. This means that we can obtain useful information or knowledge by thoroughly

analysing (mining) all the data that is already available to us. For example, through data

mining it was found that when men bought diapers on Thursdays and Saturdays, they also

bought beer. As a result, diapers and beer were moved next to each other and sold at full

price on these days—resulting in a revenue increase.

Intelligent Tutoring Systems:

Intelligent Tutoring Systems try to provide instant and personalized instructions or feedback

to a user (e.g. a pupil). They are used as a cheaper alternative to human teachers. As is the

case with human teachers, they also need to be able to adapt their teaching strategies in

real time to conform to the different needs of the users.

In the real world, the systems mentioned above are often combined, e.g. data mining algorithms can use semantic networks or genetic algorithms to analyse data.

7.3.1 Elements of an Expert System

• Acquisition module (input by external expert and database)

• Empty KB (input specific knowledge)

• Inference engine (derive decisions from knowledge and be able to justify them)

• Explanatory interface (which enlightens the user)

7.3.2 Roles within a KBS project

• Knowledge Engineer

• Domain Expert

• Project Manager

• Project Champion

Knowledge Engineers are experts at constructing useful, simple, meaningful Knowledge Based Systems (KBS) using the information provided by the domain experts.

A Domain Expert (or subject-matter expert) is an expert in a particular area or topic.

A Project Champion works with the project team as a user representative. He must be able

to convince users that the KBS is needed and be capable of presenting the business benefits

to management.

The Project Manager needs to try and match the expectations of all parties involved in the

project.

Tasks of a Knowledge Engineer

• Extracts knowledge from domain expert(s)

• Plans and manages knowledge acquisition process and chooses acquisition method

• Represents extracted knowledge in some knowledge representation format

• Involved in the development of the KBS

• Ensures quality with the help of domain expert(s)

Skills of a Knowledge Engineer

• Knowledge representation

• Fact finding

• Human skills

• Visual skills

• Analysis

• Creativity

• Managerial

7.4 Knowledge Acquisition

Interviews are the preferred method for the knowledge acquisition process.

7.4.1 Conducting Interviews

Several steps are involved when conducting an interview.

Preparation:

It is not necessary for the knowledge engineer to become an expert in the field, but they must still research the domain beforehand so that they can keep track during the knowledge acquisition process.

Process model:

• Choose appropriate domain expert(s)

• Choose appropriate interview method

• Use appropriate stage management techniques

• Plan the interview

• Consider and use appropriate skills

• Document knowledge acquisition process

Choose appropriate domain expert(s)

Choose appropriate interview method

In order to be able to choose the best interview method(s) with respect to the domain, knowledge engineers need to

• know a wide spectrum of interview methods,

• know their advantages and disadvantages,

• assess the characteristics of the domain and the purpose for which the KBS will be

developed and match those to the appropriate interview method,

• identify who will be using the KBS later on (target audience)

• consider the type of knowledge which predominantly needs to be acquired.

Types of Knowledge

• Declarative Knowledge (Know-What): This type of knowledge is about facts. For

example, ‘Chocolate is made out of cocoa beans’ is a fact.

• Procedural Knowledge (Know-How): This type describes how to perform a certain

action, e.g. how to make chocolate yourself.

• Meta Knowledge (Knowledge about knowledge): If we know what we know, this can

help us make decisions. For example, a husband knows that his wife is allergic to

flowers, so he gifts her chocolate instead.

The following type of knowledge can be derived from the three above-mentioned ones.

• Semantic Knowledge is a memory for the knowledge of the world, of facts, meanings

of words etc. For example, we know that ‘Lindt’ is a popular chocolate brand, even

though the word ‘Lindt’ does not hint at this fact. We just ‘know’ the semantic

meaning of the word. This is something that has to be learned.

7.4.2 Interview Methods

• Unstructured interview

• Structured interview

• Event recall interview / Think aloud protocol

• Repertory grid interview

Unstructured Interview:

• Early stages of knowledge acquisition

• Free flowing dialog between KE and DE

• Requires little planning, but the bulk of the work comes afterwards

• KE acts passively -> DE could mention relevant topics that the KE might not have

been aware of.

• Advantages:

o DE can bring up new areas of subject matter which haven’t been considered

before (supports scoping activity -> enlarging the # of domain-relevant

knowledge areas)

• Disadvantages:

o DE may not stay focused

o Unstructured interviews are time consuming

Structured Interview

• Focuses on the scope of the domain

• Unstructured interviews should have been conducted and analysed beforehand for best results

• DE can answer questions clearly and in a detailed fashion

• Three steps:

1) KE and DE agree on questions which will be asked

2) DE answers questions

3) IMPORTANT: KE validates the acquired knowledge by summarizing the interview content and asking the DE for confirmation

• Advantages:

o More in depth and systematic knowledge exploration of the domain

• Disadvantages

o Too early use of structured interviews can lead to omission of knowledge

areas within the domain

o Not applicable for scoping

Event Recall Interview

• DE asked to go through one specific case study

• Extracts priorities from the DE

• Reveals the thought process of a DE (Procedural Knowledge)

• Used to check completeness of acquired knowledge

• Knowledge must have been collected through other methods before the event recall

interview

• Advantages:

o For checking completeness of knowledge acquisition process

o When validity of collected knowledge is in question

• Disadvantages:

o Requires significant interpretative skills from KE and a very good memory

from the DE

Repertory Grid

• Used to record the view of the DE on a specific problem (strong focus on one specific

part of the domain!)

• Steps to create a Repertory Grid:

o Define the domain which the grid will be used for (e.g. workflow in gaming

company)

o State the elements (e.g. different job positions in a game company)

o Define the constructs (e.g. different moments in the production pipeline)

o Rank the elements (e.g. which positions are most active at which moments?)

o Analyse the grid, identify similarities and differences

• Advantages:

o In-depth analysis

o Very structured

o Can be used to represent the opinions of multiple DEs

• Disadvantages:

o Finding constructs is not always an easy task to achieve. Repertory grids

cannot be applied to everything. Constructs and elements need to stand in a

systematic relation to each other in order for the repertory grid to work.

7.4.3 Documenting the Knowledge Acquisition Process

• Usually interviews are taped to be transcribed afterwards (1hr interview -> 10hr

transcription)

• Detailed documentation to be able to track down the source of an item of knowledge (important for verifying knowledge or asking follow-up questions)

7.4.4 Dealing with multiple DEs

• Common to have different approaches and opinions towards certain topics and

procedures for the different DEs.

7.5 Semantic Networks and Ontologies

7.5.1 Ambiguity

7.5.2 Difference between Syntax and Semantics

• Distinguish between the structure of a sentence and its meaning

• Syntax -> Structure

• Semantics -> Meaning

7.5.3 Semantic Markup (Machine Readability)

A machine can’t understand a simple grocery list. By providing semantic markup the

machine readability is increased:

<grocery> <item>Milk</item><item>Bread</item><item>Eggs</item></grocery>

Once this markup is fed to an interpreter, a machine is able to recognise that this is a grocery list.
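As a small illustration, the markup from above can be read with the XML parser from the Python standard library. The tag names are taken from the example; using Python as the "interpreter" is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

markup = "<grocery> <item>Milk</item><item>Bread</item><item>Eggs</item></grocery>"

root = ET.fromstring(markup)
# The root tag tells the program what kind of list this is,
# the item tags tell it where each entry starts and ends.
items = [item.text for item in root.findall("item")]
print(root.tag, items)  # grocery ['Milk', 'Bread', 'Eggs']
```

Without the tags the program would only see the plain string "Milk Bread Eggs" and could not tell that these are three separate grocery items.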

Advantages of semantically well-structured data:

• We can store meaning

• Makes it possible for machines to use/interpret/infer data/information/knowledge

• Computers can discover new knowledge (e.g. find new relations between data)

• Extracted knowledge can be used to create rules (see chapter Rule Systems)

• Search engines might perform better and show more relevant/exact results

• Generally builds the foundation of the Semantic Web (see chapter Semantic Web)

7.5.4 Artificial Intelligence and Intelligence Amplification

Artificial Intelligence (AI) is a research area within Computer Science. It brings together

researchers from psychology, sociology, neuroscience, computer science, and other fields. AI

aims to endow computers with human abilities. This involves research into novel ways to

represent knowledge and novel algorithms which work on these representations.

Intelligence Amplification (IA) uses information technology to augment human intelligence

(e.g. Cyborgs).

7.5.5 Knowledge Representation

Semantic Networks

• Graphical representation of knowledge that shows a network of objects and their

semantic relationship.

• Objects are shown by nodes, the links between them describe their relationship

• Visual representation to help humans to quickly get an understanding of a specific

topic.

• No computational algorithms or methods are applied to Semantic Networks; Ontologies are used for that instead.

• Semantic Networks include:

Concepts, Instances, Relations, Typed Relations, Attributes, Inheritance and Possible

Inferences

• In general: an ‘is_a’ relation defines an inheritance relation

• Possible Inferences

Inferences create new typed relations by analyzing the already existing concepts,

instances and their typed relations. In the picture above, the dotted relation

can_be_sick_with that connects the instances Mary and Mumps is one example of a

possible inference. We assumed that this relation was, at first, not present in this

Semantic Network. After analyzing the given context, though, it could be inferred.

• Advantages of Semantic Networks:

o Easy to understand

o Powerful and flexible

o Well suited for declarative knowledge

o Can be used as a communication tool between KE and DE

• Disadvantages:

o Too flexible: Too many ways to represent something

o Inference becomes a process of searching the entire net

o Combinatorial explosion

o Not so well suited for procedural knowledge
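The idea of a possible inference can be sketched with a tiny network stored as (source, relation, target) tuples. The nodes Child and ChildhoodDisease and all edges are assumptions made for illustration; only the relation can_be_sick_with and the instances Mary and Mumps are taken from the notes.

```python
# Hypothetical semantic network: two is_a links and one typed relation.
edges = {
    ("Mary", "is_a", "Child"),
    ("Mumps", "is_a", "ChildhoodDisease"),
    ("Child", "can_be_sick_with", "ChildhoodDisease"),
}


def ancestors(node, edges):
    """All concepts reachable from node via is_a links (inheritance)."""
    found, todo = set(), [node]
    while todo:
        current = todo.pop()
        for source, relation, target in edges:
            if source == current and relation == "is_a" and target not in found:
                found.add(target)
                todo.append(target)
    return found


def infer(edges):
    """Propagate typed relations between concepts down to their members."""
    nodes = {s for s, _, _ in edges} | {t for _, _, t in edges}
    inferred = set()
    for source, relation, target in edges:
        if relation == "is_a":
            continue
        for a in nodes:
            for b in nodes:
                if source in ancestors(a, edges) and target in ancestors(b, edges):
                    inferred.add((a, relation, b))
    return inferred - edges


print(infer(edges))  # {('Mary', 'can_be_sick_with', 'Mumps')}
```

The nested loops over all nodes also show the disadvantage listed above: inference means searching the entire net, which gets expensive as the network grows.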

Ontologies

• Definition:

o “An ontology is a specification of a conceptualization.” (Gruber)

o “Ontologies provide the means for establishing […] semantic structure. An ontology is a formal explicit description of concepts in a domain of discourse. It specifies the properties of each concept by describing the various features and attributes of the concept.” (Wirsing)

o “An ontology together with a set of individual instances of classes constitutes a knowledge base. In reality, there is a fine line where the ontology ends and the knowledge base begins.” (N. F. Noy)

• Formal in this context means that knowledge is saved according to a specific syntax

which can be understood by a computer

• Explicit means that the knowledge is already visible

• Implicit knowledge would be hidden and it must be inferred first

• Synonyms are words with the same meaning (e.g. house – building)

• Homonyms are words with the same spelling but different meanings (e.g. apache)

• Used for structuring and exchanging knowledge between software apps and services

• Represent networks of information with logical relations and contain inference and integrity rules to reason about and validate the knowledge.

• Part of the knowledge representation in artificial intelligence systems and the

Semantic Web

• Two types of ontologies

o Lightweight ontologies consist of concepts, the relations between concepts, and their attributes

o Heavyweight ontologies extend lightweight ontologies with axioms and restrictions, so that the intended meaning of individual statements within an ontology becomes clearer

• So there are ontologies with differing degrees of semantics:

o A taxonomy is a simple ontology with numerous is_a relations. (little

semantics)

o We may add typed relations which means the ontology will contain more

varied relations, not just is_a relations. This way the ontology will contain

more semantics.

o We may also include attributes, which adds even more semantics.

o We may add restrictions, axiom schemata and so on…

• Characteristics of an ontology:

o Controlled vocabulary

o Concepts

o Needs modelling by experts

o Formal structure

Domain of an Ontology

Ontologies must have a specified purpose. The domain / purpose of our ontology influences

the definition of super- and sub-concepts.

An Ontology together with a set of individual instances (e.g. Triples) of classes constitutes a Knowledge Base.

Example: Triple Store (RDF):
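A triple store can be approximated in a few lines of Python. This is only a sketch of the idea, not RDF itself: the triples are plain tuples, the names reuse the Georg/Student example from these notes, and the `query` helper is hypothetical.

```python
# Each fact is one (subject, predicate, object) triple.
triples = {
    ("Georg", "member_of", "Student"),
    ("Student", "is_a", "Person"),
    ("Person", "is_a", "Creature"),
    ("Professor", "teaches", "Student"),
}


def query(triples, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    )


print(query(triples, predicate="is_a"))
# [('Person', 'is_a', 'Creature'), ('Student', 'is_a', 'Person')]
```

The is_a triples carry the ontology (concept hierarchy), while the triple about Georg is an individual instance. Together they form a small knowledge base.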

Ontology Layers

Individuals, Synonyms (multilingual), Concept Formation, Concept Hierarchy, Properties

(Relations), Relation Hierarchy, Axiom Schemata, General Axioms and Restrictions

Examples & Explanations:

• Georg is an individual

• George is a synonym for Georg.

• Georg is “member of” Student, which is a

concept formation.

• Student “is a” Person and Person “is a”

Creature (concept hierarchy).

• Professor “teaches” a Student is a property

(relation).

• A relation hierarchy would be “teaches”,

“lectures” and “gives online courses” between

Professor and Student.

• Axiom Schemata:

Axioms are statements within an ontology which are always true. These statements are

used to describe knowledge which cannot be derived otherwise. For example, the

statement “there is no train-connection between America and Europe” is always true.

We differentiate between four different relational axiom properties describing the

relation(s) between two or more concepts:

o Symmetric property: The attribute relationship is valid in two directions e.g. A “is

sibling” of B would be symmetrical, because B is also sibling of A

o Inverse property e.g. “has Father” is an inverse property to “has Child”

o Transitive property e.g. “older than”. If A is older than B and B is older than C,

then A is older than C.

o Functional property: Can have one value at most e.g. “capital of country”. The

country A has only one capital B.
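The four properties can be turned into simple inference and integrity rules over (subject, predicate, object) triples. The function names, the property names and the facts below are assumptions made for illustration.

```python
def apply_axioms(facts, symmetric=(), transitive=(), inverse=None):
    """Derive new triples from symmetric, inverse and transitive properties."""
    inverse = inverse or {}
    known = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in known:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverse:
                new.add((o, inverse[p], s))
            if p in transitive:
                for s2, p2, o2 in known:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if not new <= known:  # something was derived that was not known before
            known |= new
            changed = True
    return known


def violates_functional(facts, prop):
    """Subjects that have more than one value for a functional property."""
    values = {}
    for s, p, o in facts:
        if p == prop:
            values.setdefault(s, set()).add(o)
    return {s for s, objs in values.items() if len(objs) > 1}


facts = {
    ("A", "is_sibling_of", "B"),
    ("Tom", "has_father", "Bob"),
    ("A", "older_than", "B"),
    ("B", "older_than", "C"),
    ("Austria", "has_capital", "Vienna"),
}
closed = apply_axioms(
    facts,
    symmetric={"is_sibling_of"},
    transitive={"older_than"},
    inverse={"has_father": "has_child"},
)
print(("A", "older_than", "C") in closed)        # True
print(violates_functional(closed, "has_capital"))  # set()
```

The first three properties act as inference rules that add knowledge, while the functional property acts as an integrity rule that flags contradictions.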

7.6 Semantic Web

• No explicit semantics in the current web

• Content of web pages can only be fully interpreted by humans

• No recognition of synonyms or homonyms

• No conceptual search; based on words

• Relationships between pieces of information are missing (interpretation is done by humans)

• Computers do the presentation of data and humans do the linking and interpreting

• Example:

o Planning a trip over the web with the help of software agents

o Software agent searches autonomously for

▪ Relevant flights

▪ Relevant hotels

▪ Alternatives

by using different data sources

o Creates an optimal travel plan

o Software agent can explain its choices

o E.g. TripIt (http://www.tripit.com/)

7.6.1 Definition of the Semantic Web:

The Semantic Web is an extension of the current Web in which information is given well-

defined meaning, better enabling computers and people to work in cooperation.

7.6.2 Goals of the Semantic Web:

• Opening up the web of data to artificial intelligence processes (getting the web to do

a bit of thinking for us)


• Encouraging companies, organizations and individuals to publish their data freely in

an open standard format. (Linked Open Data)

• Encouraging businesses to use data already available on the web.

7.6.3 Semantic Web Stack

• Illustrates the architecture of the Semantic Web.

• It is a list of technologies, which build up on each other, and which might enable the

Semantic Web

• Standards to guarantee the interoperability between the heterogeneous applications

URI: Uniform Resource Identifier;

Enables world-wide unique identifiers;

Similar to URLs

Unicode: Standard Character Encoding;

System-independent within Semantic Web

XML: Extensible Markup Language;

Structured Representation of data;

Defines a Syntax -> Validation;

Defines a Tree -> Parsing defines Syntax

RDF: Resource Description Framework;

Does not itself define a syntax; RDF/XML is one XML-based serialization;

Fundamental data model for facts and metadata; Defines a directed, labelled graph

Original intent: Statements about web resources -> Link everything which has a URI

Statements: Triple

Triple: Represents a statement between the things denoted by the nodes that it links;

Subject, predicate and object;

• Subject: RDF URI or blank node

• Predicate: RDF URI

• Object: RDF URI, a literal or a blank node

Triple Example:
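A minimal sketch of triples in plain Python (not a real RDF library). The namespace `http://example.org/` and the predicate names are hypothetical and chosen only to mirror the Georg/Student example from the ontology layers.

```python
# Each statement is a (subject, predicate, object) triple.
EX = "http://example.org/"  # hypothetical namespace, for illustration only

triples = [
    (EX + "Georg", EX + "memberOf", EX + "Student"),     # object is a URI
    (EX + "Georg", EX + "name", '"Georg"'),              # object is a literal
    (EX + "Professor", EX + "teaches", EX + "Student"),
]

def objects(subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

Because every resource is identified by a URI, triples from a second source can simply be appended to the same list, which is the idea behind merging data from different data models.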

Advantages of RDF:

• Enables the merging of data from different data models

• Resource referenced by URI

• Separation of data and metadata

• Well-defined standard

• Many tools available: triplestores, parsers, editors, frameworks

[Figure: Semantic Web Stack. Legend: Hypertext Web technologies; standardized Semantic Web technologies; unrealized Semantic Web technologies]

RDFS: RDF Schema; enhances RDF for categorization; allows creation of Taxonomies;

Only one hierarchy of classes and properties; Example: (Article, rdf:type, rdfs:Class)

SPARQL: Query Language for RDF;

Used to retrieve and manipulate data stored in RDF;

No logic; Example:

OWL1: Web Ontology Language; Uses RDF/XML Syntax and RDFS elements; Every OWL

ontology is an RDF graph;

OWL2: Fully compatible with OWL1; New features: Keys, Property chains, Richer datatypes, Data

ranges, Qualified cardinality restrictions, Asymmetric, reflexive and disjoint properties,

Enhanced annotation capabilities; New Syntax;

RIF: Rule Interchange Format; SWRL: Semantic Web Rule Language;

Support of rules e.g. IF ... THEN; Used for describing relations that cannot be directly described

in OWL;

Unifying logic: Logical reasoning: infer new facts and check consistency;

Proof: Explain logical reasoning steps;

Trust: Authentication of sources and trustworthiness of derived facts;

Trust to derived statements will be supported by

-> verifying that the premises come from trusted source and

-> relying on formal logic during deriving new information.

Cryptography: Ensure and verify that semantic web statements are coming from trusted

source; Protect RDF data via encryption; Validate the source of facts by digitally signing RDF

data

User Interface: User interface for semantic web application


7.6.4 Achieve the Semantic Web

• Top-Down

o Ontology Engineering (Knowledge Engineering)

• Bottom-Up

o Socially driven:

▪ Semantic-Social Web: Identification of Emergent Structures from Folksonomies

o Data driven:

▪ Linked Open Data: Publication of data together with underlying structures

7.6.4.1 Linked Open Data

• Data should be freely available

• Everyone can use and republish them as they wish

• No restrictions coming from copyright, patents or other mechanisms of control.

• No specific format – API description available

7.6.4.2 Linked Data

• Describes a method of publishing structured data

• Can be easily interlinked

• It is built on standard web technologies (HTTP and URIs)

• Linked Data is published typically in RDF -> triples

• And is linked to other data sources e.g. DBPedia

7.6.4.3 Open Data + Linked Data = Linked Open Data

• Freely available open data

• Available in RDF format

7.7 Artificial Neural Networks (ANN)

What makes a system intelligent?

• Knowledge Representation

• Knowledge Manipulation Mechanism

• Learning Mechanism

Human brain principle:

• Human brain holds massive amounts of data

• Billions of simple neurons are connected to each other

o Neuron: simple processing unit

o Connections: large variety of connection

o Weights: learning principle


• Adapting the brain’s principles → Artificial Neural Network

Basic examples of ANNs: Image Recognition or Regression Analysis

Image Recognition:

• Analyses distinct features in picture, e.g. face recognition

• Network receives a picture as input → distinguishes between different classes

• Very simple example for classification

Classification – Definition:

Classification is used if unstructured data is collected and sorted into certain classes. The

classes and their number are known to the system.

Regression Analysis:

• Searches for best approximation between data points

• Simple approximation: 2 points given → line

• More points → minimize sum of deltas between data points and function

• Outcome: function with best approximation

• General term for approximation: regression

Regression – Definition:

Regression analysis is a statistical technique for estimating the relationship among data points. The desired outcome is a function which approximates the data points. Here we consider linear regression, so the function has the form f(x) = a + b*x.
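As a sketch of how a and b can be found, the following uses ordinary least squares. This is an assumption: the notes only say "minimize sum of deltas", and least squares minimizes the sum of squared deltas. The function name `fit_line` is hypothetical.

```python
def fit_line(points):
    """Least-squares fit of f(x) = a + b*x to a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Spread of x and co-variation of x and y around their means
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b
```

With the points (0, 1), (1, 3), (2, 5), which lie exactly on a line, the fit returns a = 1 and b = 2.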

Biological Neural Network (BNN)

• Human brain = biological neural network (BNN)

• Simple neurons pass information to other neurons = Network

• Let’s build very simple neuron model

o camera system

Neural Network Principle:

Weighting of Inputs

But what if one input is more important? → Weighting

• Example: Thinking about your own bike via three different possibilities

o Read generally about bicycles (weakest impact)

o See colour of personal bicycle (more impact)

o See your bicycle (strongest impact)

• Neuron reacts if sum of inputs >= threshold


• Assumption 1: neuron’s threshold is 1.0

• Assumption 2: inputs are continuous

About Weights:

Let’s raise one weight to 1.0 → one activated input would activate the whole neuron

• changing weights would totally change the neuron’s behaviour

• Intelligence for neural networks is hidden in its weights and in its structure

7.7.1 Artificial Neuron

• x1 .. xn: Inputs, discrete x ∈ {0,1} or continuous signals e.g. x ∈ [0,1]

• w1.. wn: Weights, change impact of

certain input

• Sigma ∑: Summing up inputs

• O=f(x): Output

Activation Function:

• Linear activation function, used for regression

• Step function (or sigmoid function), used for classification

o sum of weighted inputs must be above or equal to a certain threshold

Calculation power: Boolean functions

Example AND: [figure not reproduced]

Single Layer Perceptron:

Perceptrons can only model functions that are

linearly separable (Representation Theorem).
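A minimal sketch of an artificial neuron with a step activation function, used here to model Boolean AND. The weights of 0.5 are an assumption (the original figure is not available); the threshold of 1.0 follows the assumption stated earlier in the notes.

```python
def neuron(inputs, weights, threshold=1.0):
    """Return 1 if the weighted sum of the inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

AND_WEIGHTS = [0.5, 0.5]  # assumed weights, chosen so that both inputs are needed
```

Raising one weight to 1.0, e.g. `neuron([1, 0], [1.0, 0.5])`, lets a single active input fire the neuron, which illustrates how changing weights changes the neuron's behaviour.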

7.7.2 Learning in Neural Networks

• ANNs solve dynamical problems

• can be trained with training data


o Image Recognition: some pictures / correct names

o Regression Analysis: relevant points

• weights change through learning

• after learning is finished → ANN is applied to test data

• if the computed output can be compared to the correct output → supervised learning

7.7.3 Learning in Single Layer Perceptrons

• used algorithm: Perceptron Learning Algorithm

• correlation input/output easy

• responsible weight can be found easily
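A sketch of the perceptron learning rule for a single neuron, under these assumptions: a bias term replaces the fixed threshold, the activation is a step function at 0, and the update is w ← w + rate · (d − y) · x. The names `train_perceptron` and `predict` are hypothetical.

```python
def predict(weights, bias, x):
    """Step activation on the weighted sum plus bias."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0

def train_perceptron(samples, epochs=20, rate=0.5):
    """samples: list of (inputs, desired_output) pairs."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, d in samples:
            error = d - predict(weights, bias, x)   # 0 if already correct
            # Each weight is adjusted in proportion to its own input
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
            bias += rate * error
    return weights, bias

OR_SAMPLES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
```

Training on `OR_SAMPLES` yields weights that reproduce Boolean OR. Because each weight is tied directly to one input, the responsible weight is easy to identify.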

7.7.4 Gradient Descent

• basic iterative method for adjusting the parameters of a function towards its minimum

• our goal: minimize difference between actual

output function and correct results

• introduce error function, find local

minimum (could be global, we don’t know)

• Example on bowl function, 3D, dynamical learning

rate, after several steps minimum is reached:

• Calculation:

o s … amount of training examples

presented to the network

o y … output with current weights

o d … desired (correct) output

o E = iteration error which the perceptron exhibits over ALL training examples while using one specific set of weights

• Updating the weights:

o step size (learning rate): can be fixed or dynamic

o if the step size is too big, the error oscillates in the error space

o if the step size is too small, learning is slow and the search may get stuck in “local minima”
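A sketch of gradient descent for a single linear neuron y = w · x. The error function E = 1/2 · Σ (d − y)² over all training examples is an assumption (a common choice), since the formula is not preserved in these notes. The function name and default values are hypothetical.

```python
def gradient_descent(samples, rate=0.05, steps=200):
    """Fit y = w * x by minimising E = 1/2 * sum((d - y)^2) over all samples."""
    w = 0.0
    for _ in range(steps):
        # dE/dw = -sum((d - y) * x) with y = w * x
        gradient = -sum((d - w * x) * x for x, d in samples)
        w -= rate * gradient    # step against the gradient
    return w

SAMPLES = [(1, 2), (2, 4), (3, 6)]   # generated by d = 2 * x
```

With `rate=0.05` the weight converges to 2. With `rate=0.2` every step overshoots the minimum by more than the previous error, so the weight oscillates and diverges, which is the "step size too big" problem.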

7.7.5 Multi Layer Perceptrons

• increase complexity, introduce hidden layer

• raise computation power to maximum

• responsible weight is harder to find

• weights can be indirectly responsible

• weights in two layers


• build up XOR with Multi Layer Perceptron

o numbers represent threshold

o step activation function

o checked via table
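One possible way to build XOR from step-function neurons, checked via the truth table. The concrete weights (+1, +1 into the hidden neurons; +1, −1 into the output neuron) and thresholds are assumptions, since the original figure is not available.

```python
def step(total, threshold):
    """Step activation: fire if the sum reaches the threshold."""
    return 1 if total >= threshold else 0

def xor(a, b):
    """XOR from two hidden neurons (OR, AND) and one output neuron."""
    h_or = step(a + b, 1)          # fires if at least one input is active
    h_and = step(a + b, 2)         # fires only if both inputs are active
    return step(h_or - h_and, 1)   # "OR but not AND"

TRUTH_TABLE = {(a, b): xor(a, b) for a in (0, 1) for b in (0, 1)}
```

The hidden layer is what makes this possible: XOR is not linearly separable, so no single-layer perceptron can compute it.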

7.7.5.1 Backpropagation

• learning much harder for Multi Layer

Perceptrons

• need for a new learning algorithm → Backpropagation

• iterates until weights converge (difference between changed weights becomes

insignificant)

• between every iteration calculate delta / compute new weight vector

• iteration steps:

o Forward Pass, calculate output vector

o Error Computation, calculate difference between desired and computed result

o Compute delta between hidden and output neurons

o Compute delta between input and hidden neuron

o Update weights in network

o start over if necessary

7.7.6 Other Applications of ANN

• System identification and control (vehicle control, process control)

• Game-playing and decision making (backgammon, chess, racing)

• Pattern recognition (radar systems, face identification, object recognition)

• Sequence recognition (gesture, speech, handwritten text recognition)

• Medical diagnosis

• Financial applications

• Data mining, visualization

• E-mail spam filtering


7.8 Rule Based Systems

• One way to represent an Expert System

• Symbols are used to depict entities and their values and show relationships between

multiple entities

• Symbols:

• Example statements:

o A = David eats a sandwich.

o B = David drinks tea.

o A ∧ B… David eats a sandwich and David drinks tea.

o A → B… If David eats a sandwich then David drinks tea.

• If there is a relationship between different statements, new data may be inferred

from them.

• Inference rules apply to such logical statements, as shown in the predicate-logic example below:

∀A (Human(A) → Arms(A, 2))

Human(David)

One possible inferred statement would be that David is human and therefore has two arms, which can be expressed in predicate logic as Arms(David, 2).

7.8.1 Introduction to Rule-Based Systems

• Operate on an IF-THEN level:

Form: IF <condition> THEN <outcome>

• Analyse information regarding a specific domain and apply reasoning algorithms to

draw conclusions

• These systems make it possible to solve complicated problems without a domain expert (DE)

7.8.2 Components of Rule-Based Systems

• Knowledge Base:

Contains domain-specific knowledge to solve problems.

• Database of facts:

Represents what we know about the problem that we are working at. These facts are

used for the IF [condition] parts of the rules.

• Set of rules:

The rules represent relationships between the facts. Each rule has the IF [condition]

THEN [action] structure.

7.8.3 Inference Engine

• Decides which rules to apply and in which order


• Asks user to enter new facts (backward chaining) if necessary and determines when

an adequate solution has been found; It provides the reasoning that the expert

system uses to obtain a solution. It links the rules in the Knowledge Base with the

facts in the database.

7.8.4 Explanation Facilities/Rule Tracing

• Explanation facilities trace the rules that were fired while working on a solution and

tell the user which facts and rules were used to find it

• Tell the user how the decisions were made and explain why a particular decision was made

• They try to offer explanations similarly to how a human DE would.

• Distinguish between:

o WHY traces

o HOW traces

7.8.5 Advantages and Disadvantages of RBS

• Advantages:

o Represent knowledge in near-linguistic manner that is close to how experts

explain their reasoning.

o Provide consistent answers for repetitive decisions or processes.

o Can acquire and maintain a high volume of information.

o Have a centralized decision-making process.

o Can reduce the time needed to solve problems.

o Reduce the amount of human errors.

• Disadvantages:

o Does not have human common sense (which is sometimes required during

the decision-making process).

o Cannot offer creative solutions (unlike a human domain expert).

o Not very flexible/adaptable. Difficult to update if the Knowledge Base is large.

o Requires a lot of collaboration between a domain expert and a KE.

o Difficult to avoid conflicting rules in large knowledge bases.

o Fails very quickly outside of the domain (rigid behaviour).

7.8.6 Reasoning Approaches

Two different inference approaches:

• Forward Chaining

• Backward Chaining

7.8.6.1 Forward Chaining

• Data-driven top-down approach

• Begins with a set of data


• Works towards a goal of inferring all possible information

• Changes data live (data is manipulated while program is running)

• Can be time consuming when a large set of rules is involved (One rule at a time and

all rules are checked)

• Since each rule can only fire once, forward chaining stops when there are no more

rules to fire

• Steps:

o Input a set of data

o Fire forward-chaining rules

o Run inference engine on new data and add resulting data to a database

o Repeat the previous two steps until no more new information can be inferred

o Present the solution or state that no solution was found

• Example:

Let’s assume we input the following three rules into our system:

1. If someone is a 2nd-semester student, then he/she takes the course “Introduction to Knowledge Management.”

2. If someone is a 2nd-semester student, then he/she studies at Technical University of Graz.

3. If someone is taking the course “Introduction to Knowledge Management,” then he/she can explain the concept “forward chaining.”

If we manually add a new rule

4. Michael is a 2nd-semester student.

Every rule will be inspected and our rule system will try to infer new information. For

example, if we add rule 4, the system will first infer that

5. Michael is taking the course “Introduction to Knowledge Management.”

6. Michael studies at Technical University of Graz.

After completing the first round and inferring two new rules, the system will go through all

rules again to look for new information. This time, based on rule 3 and 5, more information

can be found:

7. Michael can explain the concept “forward chaining.”

Since new information was obtained, the system will go through all rules again looking for

new information to infer. However, this time, no new information will be found and forward

chaining will stop.
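The example above can be sketched as a small forward-chaining loop. The data representation (rules as condition/outcome string pairs, facts as (person, property) pairs) and the function name are assumptions made for illustration.

```python
RULES = [
    ("is a 2nd-semester student", "takes Introduction to Knowledge Management"),
    ("is a 2nd-semester student", "studies at Technical University of Graz"),
    ("takes Introduction to Knowledge Management", "can explain forward chaining"),
]

def forward_chain(facts, rules=RULES):
    """Apply all rules repeatedly until no new fact can be inferred."""
    known = set(facts)
    changed = True
    while changed:                      # one pass = one round over all rules
        changed = False
        for condition, outcome in rules:
            for person, prop in list(known):
                if prop == condition and (person, outcome) not in known:
                    known.add((person, outcome))
                    changed = True
    return known
```

Starting from the single fact `("Michael", "is a 2nd-semester student")`, the loop ends with four facts, including that Michael can explain forward chaining, and stops after a pass that adds nothing new.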

7.8.6.2 Backward Chaining

• Originates at bottom and is termed “goal-driven”


• Starts with a conclusion (or hypothesis) to determine rules

• Works backwards from given goal to the top, which consists of a set of rules

• Only works upon request: if a hypothesis to check is given, it will try to prove it

• Steps:

o Provide a hypothesis / goal / question.

o Look for rules that provide an answer to your hypothesis.

o If need be, to fulfil the task the system asks the user questions.

o A result is obtained. The question may or may not be answered.

Example:

If we assume the same set of three rules as for forward chaining above and add the fourth rule,

4. Michael is a 2nd-semester student.

Nothing will happen immediately (assuming the backward chaining approach). Unlike forward chaining, backward chaining will not infer the 5th, 6th and 7th rules. The system will only take into account the four rules we provided. The system does not have a goal yet, and we can add another rule for the inference engine to use once we activate it. When we ask

• Is there anyone who can explain the concept “forward chaining”?

The backward chaining process begins looking for an answer by looking for facts that can

directly provide the answer or, in their absence, for rules that offer a positive outcome if

they are “true.” This process is repeated until the answer is found or all layers moved from

the bottom to the top and no answer was found. In our case, we don’t have a fact that

answers the posed question directly. We do, however, have a rule that provides a “true“

outcome for the question: if rule 3 is set to “true“ ( i.e., if we find someone taking the

course “Introduction to Knowledge Management”), we can answer the question. For rule 3

to be “true,“ rule 1 must be “true.“ Moreover, rule 4 must satisfy the condition of rule 1.

The answer to the question can be obtained by working from the bottom up to the top:
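The chain of reasoning can be sketched as a recursive goal-driven search. The data representation and the names `prove` and `who_can` are assumptions; the sketch also assumes the rules contain no cycles, otherwise the recursion would not terminate.

```python
RULES = [
    ("is a 2nd-semester student", "takes Introduction to Knowledge Management"),
    ("is a 2nd-semester student", "studies at Technical University of Graz"),
    ("takes Introduction to Knowledge Management", "can explain forward chaining"),
]
FACTS = {("Michael", "is a 2nd-semester student")}

def prove(person, goal, facts=FACTS, rules=RULES):
    """Work backwards: a goal holds if it is a fact or follows from a provable condition."""
    if (person, goal) in facts:
        return True
    return any(prove(person, condition, facts, rules)
               for condition, outcome in rules if outcome == goal)

def who_can(goal, facts=FACTS, rules=RULES):
    """Answer 'Is there anyone who ...?' for all persons mentioned in the facts."""
    people = {person for person, _ in facts}
    return sorted(p for p in people if prove(p, goal, facts, rules))
```

Asking `who_can("can explain forward chaining")` follows rule 3 back to rule 1 and finally to fact 4, so only the rules relevant to the question are inspected.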

7.8.6.3 Comparing Forward and Backward Chaining

• Forward Chaining:

o Analysis

o Interpretation

• Backward Chaining

o Diagnosis

o Less efficient than Forward Chaining when many hypotheses are involved


• Reasons for Forward Chaining:

o If we want to obtain all possible outputs of a given data set

o One set of data can produce many different outcomes

o Situations in which it is necessary to quickly inform the user of new

conclusions (e.g. , turn off machines in a factory if a serious error occurred)

o Good for answering “What is the situation?” types of questions, e.g. ,“What

kind of animal is this?”

• Reasons for Backward Chaining:

o Check whether one or more statements or hypotheses are true

o Situations in which a wide variety of questions is asked but few specific ones

suffice

o If an interactive dialogue with the user is desired

o If it is too expensive to gather the data but the execution of the rules depends

on it. In this case, the user can input the data obtained via observations.

• Summarized:

Forward Chaining: Analysis and Interpretation;

Backward Chaining: Diagnosis

• Both systems are often combined to achieve best results

• Examples:

1. A forward chaining system may be useful for production lines. Sensors can be

deployed at various stages of a production pipeline to monitor mechanical problems.

The obtained data will be analysed by a forward chaining system, which will calculate

all the possible outputs of the given data set (multiple faults can happen

simultaneously). If we were to use backward chaining in this case, we would need to

“guess“ what the faults may be and have the system check our hypothesis. If the

faults are not obvious, we may never bring them up and, as such, never get to the

core of the problem through backward chaining.

2. A client wants to know if he/she is entitled to a tax concession. This is essentially a

“yes or no” question, which backward chaining can answer efficiently. Only

questions that are related to finding the solution are asked. Although forward

chaining could also be applied to this situation, this process would be much less

efficient. All data, including unrelated data that would not contribute to the answer,

would have to be collected and fed into the system beforehand. After that, the

system would calculate all possible outcomes and present its conclusion.

7.9 Information Retrieval (Statistics)

7.9.1 Data Retrieval

• Deals with structured data:

o Data well organized in tables → Can be interpreted by computers

o Each row represents an entry for one object → Inputs must respect the constraints (type, range, ...)


o Column describes an attribute of an entry

• Searching within a database (e.g. MySQL)

• Strict rules for queries (SQL: Select * From ...)

• Data is typically “clean”

• Computers can interpret data

7.9.2 Information Retrieval

• Information not structured, just a collection of documents

• No strict rules for the query

• Example: Web search engines retrieve documents, news, articles, images, ...

• Information Retrieval deals with unstructured data

o Data is organized in „bags“ (documents, websites, …)

o Data is represented in text, pixels, …

o The range of values is not predefined

o The data is often not „clean“

o Computers cannot (easily) interpret the data

• Definitions:

o „Information Retrieval (IR) is a field concerned with the structure, analysis,

organization, storage, searching, and retrieval of information.“ [Salton 1968]

o Finding material (usually documents) of an unstructured nature (usually text)

that satisfies an information need, from within large collections (usually

stored on computers). [wikipedia]

• Imprecision:

User cannot precisely specify what she is looking for

o Imprecise query

o Iterative query formulation

• Uncertainty

System has uncertain (incomplete) knowledge of the content of the managed

objects

o Uncertain representation → wrong answers

o Incomplete representation → missing answers

• Differences to Data Retrieval:

7.9.2.1 Searching in unstructured data

• Two Approaches (mixed approaches possible)

• (1) Find a representation of the data which allows for direct comparison of two data objects

• (2) Add structured data (Metadata) to the data objects, which can then be treated in databases (→ Knowledge Objects)


• Approach (2) Knowledge Objects

Create Knowledge Objects by adding structured data (Metadata) to the data objects, which can then be treated in databases

o Content → unstructured data

o Metadata → describe content, structured

o Semantic Structures → describe relationships between metadata

o Example Word Document:

▪ Content: unstructured data

▪ Metadata: author, keywords, date of creation

▪ Semantic Structures: author writes book; book contains chapters

o Example Image:

▪ Content: pixels

▪ Metadata: photographer, keywords, date of creation

▪ Semantic Structures: photographer takes picture; picture contains visible objects

• Approach (1) Information Spaces

Create Information Objects by representing data in a way which allows for direct

comparison of two information objects

o Text: represent documents as term vectors

o Images: represent images as colour schemas

o Audio: represent audio tracks as amplitude streams

7.9.3 Google Search Engine

• Is based on Vector Space Model

• Content in Internet is crawled and preprocessed

• Preprocessed content is indexed

• User enters query

• Documents containing the words of the query are selected

• Relevance of documents in relation to query is calculated

• Documents are displayed in the order of relevance

7.9.3.1 Preprocessing of Content

Prepares the document for indexing

• Strip unwanted characters (HTML tags, punctuation...)

• Break into word sequence

• Remove common words/stopwords (to, too, the, is, it...)

• Process the words and create terms (e.g. effects → effect)

→ Document is now a set (bag) of words
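The preprocessing steps can be sketched as follows. The stopword list is a tiny illustrative sample and the plural stripping is a deliberately naive stand-in for a real stemmer; both are assumptions, as is the function name.

```python
import re

STOPWORDS = {"to", "too", "the", "is", "it", "a"}  # tiny illustrative list

def preprocess(text):
    """Strip tags and punctuation, split into words, drop stopwords, stem naively."""
    text = re.sub(r"<[^>]+>", " ", text)          # remove HTML tags
    words = re.findall(r"[a-z]+", text.lower())   # keep alphabetic tokens only
    terms = []
    for word in words:
        if word in STOPWORDS:
            continue
        if word.endswith("s") and len(word) > 3:  # naive plural stripping
            word = word[:-1]
        terms.append(word)
    return terms
```

Note that this naive rule turns "effects" into "effect" but does not map "has" to "have"; a real system would use a proper stemmer or lemmatizer for that.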

7.9.3.2 Indexing of Preprocessed Content

Incidence matrix of Shakespeare’s books


• „1“ if word contained, else „0“

• Term vector: column of a specific document within the matrix → The document is represented by its term vector

• Document vector: row of a specific term within the matrix

• Example Query: Caesar & Brutus

o Vector1: “Caesar”: 110111

o Vector2: “Brutus”: 110100

o V1 & V2: 110111 & 110100 = 110100

→ First, second and fourth document contain all query words
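The Boolean query above is a bitwise AND of the two rows of the incidence matrix. A small sketch, where the helper `matching_documents` is hypothetical and reads the leftmost bit as the first document:

```python
caesar = 0b110111   # row for the term "Caesar"
brutus = 0b110100   # row for the term "Brutus"

def matching_documents(bits, n_docs=6):
    """Return 1-based positions of the documents whose bit is set."""
    text = format(bits, f"0{n_docs}b")   # e.g. 0b110100 -> "110100"
    return [i + 1 for i, bit in enumerate(text) if bit == "1"]

both = caesar & brutus   # documents containing both terms
```

`matching_documents(both)` yields the first, second and fourth document, matching the result stated above.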

7.9.3.3 Query

• User types in search query

• Query is preprocessed the same way as the content

• Documents which contain all the words in the query are selected

• How can this easily be done?

7.9.3.4 Relevance Ranking

• Relevance of documents in relation to query is calculated

• Google uses at least 200 relevance metrics in addition to PageRank

• PageRank identifies the „most influential“ websites and ranks them higher than other

websites

• Documents are then displayed in the order of relevance

[(Google Explained: https://www.youtube.com/watch?v=KyCYyoGusqs)]

7.9.4 Vector Space Model

Text document: “The house has a nice garden.”

How to create an information object out of this text document:

• Step 1: Preprocessing

o remove all very common words (the, a, …)

o reduce to lexical base (stemming) (houses → house, has → have, …)

o remove punctuation, ...

• Step 1: Result

o “house have nice garden”

• Step 2: Create Term Vector for a Document

o Boolean

o Occurrences

o Normalized (e.g. occurrence / occurrence in overall information space)

• Step 2: Result

o


• Example Information Space is spanned by 4 vectors (4 dimensional information

space)

• The original example document

„The house has a nice garden.“ is

represented by

• Example: How would you represent the text document „I like my nice garden house.“

within the 4 dimensional information space?

o Answer -->>

• Example: How would you post a query within

the 4 dimensional information space?

o Query: “Is your house nice?”

o Answer -->>

o Query result: d1, d3

o Create term vector of query (q)

o Compute Similarity between q and

all term vectors of the documents (d1..dn)

within the information space

• Similarity Function

o Cos (q, di)

o Relates to angle between the two vectors in high dimensional space

o Cos (q, di) = 0 → query and document vectors are orthogonal (no match, i.e. the query term does not exist in the document being considered)

o Cos (q, di) = 1 → there is a document vector which is identical with the query vector (perfect match)
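A sketch of the cosine similarity for Boolean term vectors. It assumes that the 4-dimensional information space is spanned by the terms house, have, nice and garden in this order; the original figures are not available, so this ordering and the helper names are assumptions.

```python
import math

TERMS = ["house", "have", "nice", "garden"]  # assumed dimensions of the space

def term_vector(terms):
    """Boolean term vector over TERMS; words outside the space are ignored."""
    return [1 if t in terms else 0 for t in TERMS]

def cosine(q, d):
    """Cosine of the angle between two vectors (0.0 if one of them is all zeros)."""
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0

d1 = term_vector(["house", "have", "nice", "garden"])    # "The house has a nice garden."
d2 = term_vector(["like", "nice", "garden", "house"])    # "I like my nice garden house."
q = term_vector(["house", "nice"])                       # "Is your house nice?"
```

Under these assumptions `cosine(q, d2)` is larger than `cosine(q, d1)`, because the second document has no "have" component pulling its vector away from the query.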

7.9.4.1 Vector Space Model Advanced

Term Frequency

• Not all terms have the same relevance

• Simple approach: Count the number of occurrences of the term in the document

o Assign this value to the term

o Set of weights of terms can be seen as digest of the document (bag of words)

• Also called term frequency (tf_t,d)

• If a word is mentioned very often (e.g. word car, in collection about cars), it is logically less relevant

o → we need a further approach


Collection and Document Frequency

• Collection Frequency: divide the number of occurrences in the document by the overall number of occurrences of the term in the collection

o Value is now normalised (in range [0, 1])

o Denoted as cf

• Document Frequency: number of documents containing a certain term

o Makes more sense than cf (document-level statistics)

• Documents are represented as vectors of terms, and their tf-idf (term frequency – inverse document frequency) weight

o If the term is not in the document, its weight is 0

• Number of dimensions corresponds to number of terms in the collection

• Similarity between two documents can be calculated by calculating the cosine of

angle between them

o If cosine = 1, angle is 0 → documents identical

• Query is also a vector

o Retrieved documents and their relevance calculated by the cosine function

o Relevance normalized and in range [0, 1]
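A sketch of tf-idf weighting. The notes do not give the formula, so the common definition weight = tf · log(N / df) is assumed here, where N is the number of documents and df the document frequency of the term. The function name is hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    n = len(documents)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)   # term frequency within this document
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

DOCS = [["car", "engine", "car"], ["car", "road"], ["car", "garden"]]
```

In `DOCS` the term "car" occurs in every document, so its weight is 0 everywhere, while "engine" keeps a positive weight. This reflects the observation that a word used throughout the collection is less relevant.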

7.9.4.2 Pros and Cons of the Vector Space Model

• Similarity of documents can be calculated

o But if they use different terminology, they will not be considered similar

• Documents can be ranked depending on the query

• Allows partial matching

o Not all words in query must be contained in the document

7.9.5 Information Retrieval Process

A retrieval model consists of

• Set D of representations of documents

• Set Q of representations of user queries

• Ranking function R which associates a real number (the ranking) to each

query/document pair.

• According to this number the documents are then sorted.

• R: Q × D → ℝ

IR Process

• IR process deals with the processing of the query and the iterative feedback of the

user

o From the user input to the retrieved results

• Key part: Query

o Often ambiguous, e.g. word „apache“ (server, helicopter, ancient tribe...)


o Iteratively reformulated for the best results

Query Types (rough classification)

• Content-related queries (e.g. keywords)

• Query by example

• Context-extended queries

User Feedback

• Retrieval is an iterative process

• Users rate the current search results

• Refinement of the need of information

o Approximation of the „optimal“ search

o Increasing of precision and recall

o Adaption of the similarity measurement i.e. increasing the weight of

keywords in the VSM

• Relevance Feedback: the user marks which results are relevant and which are not.

o Pro: A more detailed specification of the query becomes possible

o Con: User interaction is necessary

• Pseudo-Relevance Feedback: the top-ranked results are automatically treated as relevant and used to improve the query.

o Pro: No user interaction is necessary

o Con: May lead to a large number of results
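One common way to turn feedback into adjusted keyword weights is a Rocchio-style update. The notes do not name a specific method, so the sketch below is an assumption, and the weights `alpha`, `beta` and `gamma` are illustrative values rather than course material.

```python
from collections import Counter


def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward relevant and away from non-relevant documents."""
    updated = Counter()
    for term, weight in query.items():
        updated[term] += alpha * weight
    for doc in relevant:
        for term, weight in doc.items():
            updated[term] += beta * weight / len(relevant)
    for doc in non_relevant:
        for term, weight in doc.items():
            updated[term] -= gamma * weight / len(non_relevant)
    # Terms whose weight is not positive are dropped from the new query.
    return {term: w for term, w in updated.items() if w > 0}


# The ambiguous query "apache", with one result rated relevant and one not.
query = Counter({"apache": 1})
relevant = [Counter({"apache": 1, "server": 1, "http": 1})]
non_relevant = [Counter({"apache": 1, "helicopter": 1})]

new_query = rocchio_update(query, relevant, non_relevant)
print(new_query)
```

For pseudo-relevance feedback the same function could be called with the top-ranked results as `relevant` and an empty `non_relevant` list, so that no user rating is required.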

7.9.6 Evaluating IR Systems

• Not all IR systems are equally efficient

• Some form of performance evaluation is needed (How?)

• But what should be evaluated? (What?)

o Quality of the documents in the collection

▪ Does it contain relevant information?

o Time needed from query to result set

o Presentation of output

o Effort the user needs to put in

o Recall of the system

o Precision of the system

• Standard measurements used in IR and pattern recognition:

• Precision

o Fraction of the retrieved documents that are relevant (relevant retrieved documents / all retrieved documents)

• Recall

o Fraction of the relevant documents in the collection that are retrieved (relevant retrieved documents / all relevant documents)

• E.g. Precision

o How many of the documents found are relevant?

• E.g. Recall

o How many of the relevant documents have been found?

• A trade-off can be made between precision and recall

o E.g. adjust the system to retrieve a lot of documents (higher recall), where not all will be relevant (lower precision)

o E.g. retrieve a small number of documents, but with higher relevance (higher precision, lower recall)
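Both measures follow directly from the set of retrieved documents and the set of relevant documents. The sketch below uses made-up document IDs and a hypothetical ranked result list; cutting the list at different lengths shows the trade-off.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Assumed ground truth: four relevant documents in the collection.
relevant = {"d1", "d2", "d3", "d4"}
# Hypothetical ranked output of an IR system.
ranking = ["d1", "d2", "d7", "d3", "d8", "d9", "d4", "d5"]

small = precision_recall(ranking[:2], relevant)  # short result list
large = precision_recall(ranking, relevant)      # long result list

print(small)
print(large)
```

The short list contains only relevant documents but misses half of them, while the long list finds all relevant documents at the cost of many irrelevant ones.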

7.10 Social Web and Web 2.0

7.10.1 Definition of Web 2.0

„Web 2.0 is the business revolution in the computer industry caused by the move to the

internet as a platform, and an attempt to understand the rules for success on that new

platform.” Tim O'Reilly: 'What is Web 2.0?'

An enhancement of Web 1.0, Web 2.0 has more dynamic sites and uses the wisdom of the

crowd by focusing on community and user actions.

• Web 2.0: The internet as a platform

o Web-based applications

o No software installation necessary

o Independent from OS, time, place

• The Web in our hands

o Content is created by the collective intelligence (wisdom of the crowd)

• The Web is dynamic

• The Web is simple (to extend)

o Lightweight programming models

▪ Data and services for implementation easily accessible

▪ HTTP, XML via HTTP or with the help of a web service

o There is no software release cycle anymore

▪ Constant beta version

▪ Open source

▪ Trusting users as co-developers

• Typical examples for Web 2.0

o Blogs (Technorati, Engadget)

o Microblogs (Twitter; max. 140 characters, navigation by hashtags)

o Wikis (Wikipedia, Wikispaces, collaborative editing)

o Data sharing platforms (Google Drive, Dropbox)

7.10.1.1 Social Tagging Systems / Folksonomy

• Important component of Web 2.0: (social) tagging systems

• What is tagging:

o A tagging system gives users the possibility to apply tags to resources

• What are tags:

o Lightweight keywords (free form vocabulary)

o Generated by users

o For users

• Why Web System designers like tags:

o Tags add metadata to resources for which typically only sparse metadata exists (such as pictures, movies, etc.)

o Through tags, system designers are able to provide the user with simple navigational tools that improve the system's information retrieval properties

• Why Web users like tags:

o Through tags, users are able to categorize or describe resources

▪ Can find information faster (through personal tags)

▪ Can find related content faster (through related tags)

• Term often mentioned in connection with tags and tagging systems: folksonomy

• What is a folksonomy:

o A folksonomy is a ‚taxonomy‘ generated by the ‚folks‘

o Folksonomy is the result of personal free tagging of information and objects

(anything with a URL) for one's own retrieval […] in a social environment

(usually shared and open to others).

• Pros of folksonomies (compared to taxonomies/ontologies):

o No controlled vocabulary (better search)

o Emergent: created by the wisdom of the crowd (not designed manually)

• Cons:

o Ambiguity (homonyms): the same tag can be used for different concepts (e.g. apple vs. Apple Inc. vs. Big Apple)

o Spelling errors ...
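A folksonomy can be represented as a set of (user, tag, resource) assignments. The sketch below uses hypothetical users, tags and example.org URLs. It builds the tag index that supports navigation by tag and derives related tags from co-occurrence on the same resource, which is one simple way to do this and not necessarily the one used by real tagging systems.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical tag assignments: (user, tag, resource).
assignments = [
    ("alice", "python", "http://example.org/a"),
    ("alice", "tutorial", "http://example.org/a"),
    ("bob", "python", "http://example.org/b"),
    ("bob", "programming", "http://example.org/b"),
    ("carol", "python", "http://example.org/a"),
    ("carol", "programming", "http://example.org/a"),
]

# Index in both directions: tag -> resources and resource -> tags.
resources_by_tag = defaultdict(set)
tags_by_resource = defaultdict(set)
for user, tag, resource in assignments:
    resources_by_tag[tag].add(resource)
    tags_by_resource[resource].add(tag)

# Count how often two tags are applied to the same resource.
cooccurrence = defaultdict(int)
for tags in tags_by_resource.values():
    for t1, t2 in combinations(sorted(tags), 2):
        cooccurrence[(t1, t2)] += 1


def related_tags(tag):
    """Tags sharing a resource with `tag`, most frequent co-occurrence first."""
    related = {}
    for (t1, t2), count in cooccurrence.items():
        if t1 == tag:
            related[t2] = count
        elif t2 == tag:
            related[t1] = count
    return sorted(related, key=lambda t: (-related[t], t))


print(sorted(resources_by_tag["python"]))
print(related_tags("python"))
```

Since tags are free-form strings, a misspelled or ambiguous tag simply becomes another key in the index, which is where the cons listed above show up in practice.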

7.10.1.2 Social Software

• Wikis (Wikipedia is NOT associated with collaborative tagging!)

• Weblogs

• Podcasts

• Social networking platforms

• Instant messaging (NOT part of Web 2.0!)

7.10.1.3 Social Networks / Privacy / Trust

• What are Social Networks:

o Social networks are defined as all networks that arise out of social interaction between two or more users

o This means that social networks emerge not only from social network sites such as Facebook

o But also from other communication channels such as email, telephone, etc.

• What is social network analysis

o The (scientific) discipline that deals with the analysis and interpretation of social network data in order to gain insights into how people behave or whether people can be trusted.

• Why is social network analysis important:

o world is becoming more and more social

o more and more social data available

• Privacy:

o Ability to control what information one reveals about oneself over the

Internet, and to control who can access that information

• Trust:

o Alice trusts Bob if she commits to an action based on a belief that Bob's future

actions will lead to a good outcome

• Social Trust Relationships
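As a small illustration of social network analysis, a network can be built from interaction records across several channels and then inspected. The names and interactions below are invented, and the degree (number of distinct contacts) is just one simple measure chosen for the sketch.

```python
from collections import defaultdict

# Hypothetical interaction records: (person, person, channel).
interactions = [
    ("alice", "bob", "email"),
    ("alice", "carol", "telephone"),
    ("bob", "carol", "email"),
    ("carol", "dave", "social network site"),
]

# Undirected network: an interaction on any channel creates a tie.
neighbours = defaultdict(set)
for a, b, _channel in interactions:
    neighbours[a].add(b)
    neighbours[b].add(a)

# Degree: with how many distinct people has each person interacted?
degree = {person: len(contacts) for person, contacts in neighbours.items()}

for person, d in sorted(degree.items(), key=lambda item: (-item[1], item[0])):
    print(person, d)
```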

7.10.1.4 Social Semantic Web (S2W) / Web 3.0

• Combines: Semantic Web – Social Software – Web 2.0

• Examples: DBpedia, SIOC – Semantically-Interlinked Online Communities

• What is Web 3.0:

o Web 2.0 + Semantic Web = Web 3.0

o Semantic Web = Web 3.0

o The internet of things

o Another upgrade to the Web