
Page 1: Knowledge Access Semantic technology for KM

1

Knowledge Access
Semantic technology for KM

John Davies, BT Research

[email protected]

ACAI 05 SEKT SUMMER SCHOOL ON KNOWLEDGE TECHNOLOGY

Page 2: Knowledge Access Semantic technology for KM

2

Overview

• Introduction to the Semantic Web
  – Language stack
• Semantic Search and Browse
• Knowledge Sharing
• Natural Language Generation & Summarisation
• Knowledge Delivery via Device Independence
• Quiz!

Page 3: Knowledge Access Semantic technology for KM

3

Limitations of the Web today

Machine-to-human, not machine-to-machine

Page 4: Knowledge Access Semantic technology for KM

4

The Semantic Web

• allowing information to be shared and processed
  – adding context and structure
• Tim Berners-Lee
  – “an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation”

• An open platform

Page 5: Knowledge Access Semantic technology for KM

5

Semantic Web

“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in co-operation.”

[Berners-Lee et al., 2001]

Page 6: Knowledge Access Semantic technology for KM

6

Semantic Web History

[Figure: number of web servers over time, logarithmic scale from 10,000 to 100,000,000, Dec 94 to Dec 03]

[ Source: http://www.zakon.org/robert/internet/timeline/ ]

Timeline annotations:
• “Web data transfer larger than FTP data transfer”
• “Kifer, Lausen, Wu, Logical foundations of object-oriented and frame-based languages”
• “A. Borgida, On the relative expressiveness of description logics and predicate logic”
• “W3C standardization of XML starts”
• “W3C standardization of Semantic Web starts: work on Resource Description Framework (RDF), work on RDF Schema (RDFS)”
• “Research projects on Web ontologies start: EU: On-To-Knowledge (01/00) and US (DARPA): DAML (07/00)”
• “W3C Semantic Web standardization: work on Web Ontology Language (OWL)”
• 10.2.2004: Resource Description Framework (RDF) and Web Ontology Language (OWL) become W3C recommendations

Page 7: Knowledge Access Semantic technology for KM

7

Semantic Web Layers

Data Exchange

Entailment of the Implicit

Explicit Semantics

Relational Distributed Data

Page 8: Knowledge Access Semantic technology for KM

8

Where we are Today: the Syntactic Web

[Hendler & Miller 02]

Page 9: Knowledge Access Semantic technology for KM

9

i.e. the Syntactic Web is…

• A place where
  – computers do the presentation (easy) and
  – people do the linking and interpreting (hard).

• Why not get computers to do more of the hard work?

[Goble 03]

Page 10: Knowledge Access Semantic technology for KM

10

Hard Work using the Syntactic Web…

• Complex queries involving background knowledge
  – Find information about “animals that use sonar but are not either bats, dolphins or whales” (e.g. Barn Owl)
• Locating information in data repositories
  – Travel enquiries
  – Prices of goods and services
  – Results of human genome experiments
• Delegating complex tasks to web “agents”
  – Book me a holiday next weekend somewhere warm, not too far away, and where they speak French or English

Page 11: Knowledge Access Semantic technology for KM

11

Motivation – Knowledge Management

Knowledge workers are overwhelmed with information:
• from intranets, emails, external newslines …
• but may still lack the information required

They need information identified:
• by semantics, not just keywords
• by their interests and their task context
• in a form appropriate to their current physical context
  – mobile phone, PDA, blackberry, laptop, …

Page 12: Knowledge Access Semantic technology for KM

12

Knowledge access

• context-aware tools for access to semantically-annotated knowledge
  – search, browse, share, summarise
  – integrated into day-to-day business processes
  – automatic knowledge delivery based on current context
    • activity, location, device, interests
  – support multiple end-user devices

Page 13: Knowledge Access Semantic technology for KM

13

XML is a first step

• Semantic markup
  – HTML layout
    • use bold font
    • insert an image here
  – XML content
    • this part of the document is the product price
    • this document describes a telecommunications service

Page 14: Knowledge Access Semantic technology for KM

14

XML

<play>
  <title>The Life and Death of King John</title>
  <DramatisPersonae>
    <persona>The Earl of PEMBROKE</persona>
    <persona>The Earl of ESSEX</persona>
    ……
  </DramatisPersonae>
  <Stagedir>SCENE England, the Court.</Stagedir>
  <act>Act 1
    <scene>Scene I.
      <speech>
        <speaker>John</speaker>
        <line>Now, Chatillon, what would France with us?</line>
      </speech>

Page 15: Knowledge Access Semantic technology for KM

15

QuizXML

• Standard search engine
  – WWW pages indexed
  – maps keywords to WWW pages
• QuizXML
  – A finer-grained index
  – maps keywords to documents and the XML tags in which they occur
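The finer-grained index described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the QuizXML implementation: the function name build_tag_index, the document id and the shortened play fragment are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_tag_index(docs):
    """Map each keyword to the set of (document id, XML tag) pairs it occurs in."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for element in root.iter():
            # Index only the text directly inside this element.
            for word in re.findall(r"\w+", (element.text or "").lower()):
                index[word].add((doc_id, element.tag))
    return index

# Hypothetical, shortened version of the play fragment shown earlier.
docs = {
    "king_john.xml": (
        "<play><title>The Life and Death of King John</title>"
        "<speech><speaker>John</speaker>"
        "<line>Now, Chatillon, what would France with us?</line>"
        "</speech></play>"
    ),
}
index = build_tag_index(docs)
print(sorted(index["john"]))
```

A query for "john" can then be restricted to a tag, for example only occurrences inside <speaker>, which a page-level keyword index cannot express.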

Page 16: Knowledge Access Semantic technology for KM

16

QuizXML demo

Page 17: Knowledge Access Semantic technology for KM

17

XML is a first step

• Metadata (with limitations)
  – within documents, not across documents
  – prescriptive, not descriptive
  – No commitment on vocabulary and modelling primitives (subclass, instance, etc)

<vehicle>
  <car>ford
    <engine>xyz123-4</engine>
    <model>mondeo</model>
  </car>
</vehicle>

• RDF and ontologies are the next step

Page 18: Knowledge Access Semantic technology for KM

18

What are Ontologies?

• Ontologies provide a shared and common understanding of a domain (medicine, finance, …)
  – a shared specification of a conceptualisation
  – ‘Concept map’
  – A simple example - Yahoo
    • Business&Economy > Finance > Banking
  – for WWW, defined using RDF(S) & OWL

Page 19: Knowledge Access Semantic technology for KM

19

Taxonomies

Animals
  Invertebrates
    Insects, Arachnids, …..
  Vertebrates
    Reptiles, Mammals

Page 20: Knowledge Access Semantic technology for KM

20

Ontology of People and their Roles

Employee

Manager Expert Analyst

Programme Mgr Project Mgr

funds

advises

Contractor

Page 21: Knowledge Access Semantic technology for KM

21

Structure of an Ontology

Typically two distinct components:
– Names for important concepts and relationships in the domain
  • Elephant is a concept whose members are a kind of animal
  • Herbivore is a concept whose members are those animals who eat only plants
– Background knowledge/constraints on the domain
  • Adult_Elephants weigh at least 2,000 kg
  • No individual can be both a Herbivore and a Carnivore

Page 22: Knowledge Access Semantic technology for KM

22

Why develop an ontology?
• Define web resources more precisely and make them amenable to machine processing
• Make domain assumptions explicit
  – Easier to change domain assumptions
  – Easier to understand and update legacy data
• Separate domain and operational knowledge
  – Re-use separately
• A community reference for applications
• To share a consistent understanding of what information means

Page 23: Knowledge Access Semantic technology for KM

23

Ontologies - Some Examples
• General purpose ontologies:
  – The Upper Cyc Ontology, http://www.cyc.com/cyc-2-1/index.html
  – IEEE Standard Upper Ontology, http://suo.ieee.org/
• Domain and application-specific ontologies:
  – RDF Site Summary RSS, http://groups.yahoo.com/group/rss-dev/files/schema.rdf
  – Dublin Core, http://dublincore.org/
  – UMLS, http://www.nlm.nih.gov/research/umls/
  – Open Biological Ontologies: http://obo.sourceforge.net/
  – FOAF – www.foaf.org
• Ontologies in a wider sense
  – Agrovoc, http://www.fao.org/agrovoc/
  – UNSPSC, http://eccma.org/unspsc/
• DAML.org library http://www.daml.org/

Page 24: Knowledge Access Semantic technology for KM

24

Ontology and Logic

• Reasoning over ontologies
• Inferencing capabilities

X is author of Y ⇒ Y is written by X

X co-wrote D; Y co-wrote D ⇒ X and Y collaborate

Cars are a kind of vehicle; Vehicles have 2 or more wheels ⇒ Cars have 2 or more wheels
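The three inferences on this slide can be mimicked with a very small forward-chaining loop. The sketch below is an assumption-laden illustration, not how an ontology reasoner is built: the predicate names (authorOf, writtenBy, coWrote, collaboratesWith, kindOf, has) and the sample individuals are invented for the example.

```python
def infer(facts):
    """Apply three hand-written rules until no new facts appear (forward chaining)."""
    facts = set(facts)
    while True:
        new = set()
        for (s, p, o) in facts:
            # Rule 1: X is author of Y  =>  Y is written by X
            if p == "authorOf":
                new.add((o, "writtenBy", s))
            # Rule 2: X co-wrote D and Y co-wrote D  =>  X and Y collaborate
            if p == "coWrote":
                for (s2, p2, o2) in facts:
                    if p2 == "coWrote" and o2 == o and s2 != s:
                        new.add((s, "collaboratesWith", s2))
            # Rule 3: C is a kind of D and D has F  =>  C has F
            if p == "kindOf":
                for (s2, p2, o2) in facts:
                    if s2 == o and p2 == "has":
                        new.add((s, "has", o2))
        if new <= facts:
            return facts
        facts |= new

closure = infer({
    ("twain", "authorOf", "tom_sawyer"),
    ("alice", "coWrote", "paper_d"),
    ("bob", "coWrote", "paper_d"),
    ("Car", "kindOf", "Vehicle"),
    ("Vehicle", "has", "2 or more wheels"),
})
print(("Car", "has", "2 or more wheels") in closure)
```

In practice such rules are stated declaratively in the ontology language and applied by a reasoner rather than hard-coded.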

Page 25: Knowledge Access Semantic technology for KM

25

RDF and RDF-S

• W3C standards
• RDF-S defines the ontology
  – classes and their properties and relationships
    • There are books and authors. Authors write books.
• RDF defines the instances of these classes and their properties
    • Mark Twain is an author
    • Mark Twain wrote “Adventures of Tom Sawyer”
    • “Adventures of Tom Sawyer” is a book

Page 26: Knowledge Access Semantic technology for KM

26

An example RDF Schema

[Figure: annotation of WWW resources and semantic links]

Schema (RDFS):
  hasWritten: domain Writer, range Book
  FamousWriter subClassOf Writer

Data (RDF):
  /twain.com/mark type FamousWriter
  books.com/ISBN00010475 type Book
  /twain.com/mark hasWritten books.com/ISBN00010475
  DoB “25/12/68”

Page 27: Knowledge Access Semantic technology for KM

27

RDF

hasName(‘http://www.famouswriters.org/twain/mark’, “Mark Twain”)
hasWritten(‘http://www.famouswriters.org/twain/mark’, ‘http://www.books.org/ISBN00001047582’)
title(‘http://www.books.org/ISBN00001047582’, “The Adventures of Tom Sawyer”)

XML version:
<rdf:Description rdf:about="http://www.famouswriters.org/twain/mark">
  <s:hasName>Mark Twain</s:hasName>
  <s:hasWritten rdf:resource="http://www.books.org/ISBN00001047582"/>
</rdf:Description>
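As a rough illustration of how the schema and the data fit together, the statements from the last two slides can be held as plain (subject, predicate, object) tuples. The sketch assumes the FamousWriter type assertion from the schema diagram applies to the famouswriters.org URI used here, and the helper types_of is hypothetical; a real application would use an RDF library and a proper RDFS reasoner.

```python
TWAIN = "http://www.famouswriters.org/twain/mark"
BOOK = "http://www.books.org/ISBN00001047582"

# Schema (RDFS) statements
schema = [
    ("FamousWriter", "subClassOf", "Writer"),
    ("hasWritten", "domain", "Writer"),
    ("hasWritten", "range", "Book"),
]
# Data (RDF) statements
data = [
    (TWAIN, "type", "FamousWriter"),
    (TWAIN, "hasName", "Mark Twain"),
    (TWAIN, "hasWritten", BOOK),
    (BOOK, "title", "The Adventures of Tom Sawyer"),
]

def types_of(resource, schema, data):
    """Return all classes of a resource: asserted, via domain/range, and via subClassOf."""
    domain = {p: c for (p, rel, c) in schema if rel == "domain"}
    range_ = {p: c for (p, rel, c) in schema if rel == "range"}
    found = {o for (s, p, o) in data if s == resource and p == "type"}
    found |= {domain[p] for (s, p, o) in data if s == resource and p in domain}
    found |= {range_[p] for (s, p, o) in data if o == resource and p in range_}
    # Walk up the subClassOf hierarchy until nothing new is added.
    changed = True
    while changed:
        supers = {sup for (sub, rel, sup) in schema
                  if rel == "subClassOf" and sub in found}
        changed = not supers <= found
        found |= supers
    return found

print(sorted(types_of(TWAIN, schema, data)))
print(sorted(types_of(BOOK, schema, data)))
```

Note that the book is never explicitly typed in the data; its class follows from the range of hasWritten.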

Page 28: Knowledge Access Semantic technology for KM

28

QuizRDF

• Searching RDF-annotated web resources

Page 29: Knowledge Access Semantic technology for KM

29

RDF metadata annotations

Annotation(metadata)

Data (WWW document)

RDF

Lost information

• Subjective
• One of several interpretations
• Not exhaustive

Page 30: Knowledge Access Semantic technology for KM

30

RDF as an Enrichment

Annotation

Text

RDF Text

Page 31: Knowledge Access Semantic technology for KM

31

Precision and recall - the IR dilemma
• Trade-off between precision and recall
  – recall - how many of the relevant documents were found
  – precision - how many of the documents found were relevant
• Holy grail: high precision & high recall
• QuizRDF offers both
  – separately
  – closely-coupled
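The two measures can be computed directly from the set of documents returned and the set of documents that are actually relevant. The function below is a minimal sketch; the URI names are made up for the example.

```python
def precision_recall(found, relevant):
    """Return (precision, recall) for a result set against the relevant set."""
    found, relevant = set(found), set(relevant)
    hits = found & relevant
    precision = len(hits) / len(found) if found else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Three documents returned, four actually relevant, two in common.
precision, recall = precision_recall(
    found={"URI1", "URI3", "URI7"},
    relevant={"URI1", "URI2", "URI3", "URI9"},
)
print(precision, recall)
```

Returning more documents tends to raise recall and lower precision, which is the trade-off the slide refers to.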

Page 32: Knowledge Access Semantic technology for KM

32

Indexing: data model

[Figure: QuizRDF data model]

RDF(S) schema:
  classes: Person, Employee, Project, Skill (rdfs:Resource)
  properties: works_in_project, has_skills, first_name (rdf:Literal), last_name (rdf:Literal)

RDF data:
  malta.bt.com/gm/cv: first_name “George”, last_name “Miller”

Content of Web resource:
  George Miller. Joined BT in 1997

Legend: subClassOf (isA), typeOf (instance), Property

Page 33: Knowledge Access Semantic technology for KM

33

Multidimensional Indexing
• “Traditional” search engine indexing
    term → {documents}
    “employee” → {URI1, URI3, URI9}
    “miller” → {URI3, URI7}
• QuizRDF indexing
    <literal, class, property> → {URIs}
    <“george”, Employee, first_name> → {URI2}
    <“miller”, Employee, last_name> → {URI1, URI3}
    <“miller”, Employee, > → {URI1, URI3, URI7}
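One way to picture the <literal, class, property> index is as a dictionary keyed by triples, where a missing component acts as a wildcard. The sketch below is an assumption about how such an index could be laid out, not the QuizRDF data structure; the annotation tuples are invented so that the lookups reproduce the entries on the slide.

```python
from collections import defaultdict

def build_index(annotations):
    """Index (uri, literal, class, property) annotations at three levels of detail."""
    index = defaultdict(set)
    for uri, literal, cls, prop in annotations:
        key = literal.lower()
        index[(key, cls, prop)].add(uri)   # fully qualified entry
        index[(key, cls, None)].add(uri)   # any property of that class
        index[(key, None, None)].add(uri)  # plain keyword entry
    return index

# Hypothetical annotations chosen to match the slide's example entries.
annotations = [
    ("URI1", "Miller", "Employee", "last_name"),
    ("URI3", "Miller", "Employee", "last_name"),
    ("URI7", "Miller", "Employee", "first_name"),
    ("URI2", "George", "Employee", "first_name"),
]
index = build_index(annotations)
print(sorted(index[("miller", "Employee", "last_name")]))
print(sorted(index[("miller", "Employee", None)]))
```

The plain keyword entry keeps the traditional behaviour available, so a query can start as a keyword search and be narrowed by class and property.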

Page 34: Knowledge Access Semantic technology for KM

34

QuizRDF demo

Page 35: Knowledge Access Semantic technology for KM

35

Two Retrieval Channels

RDF channel (RQL query):
• Precise
• Machine readable
• Subjective
• Incomplete
• Higher precision

Text channel (keyword query):
• Original content
• “Complete”
• Imprecise
• Higher recall

Browser interface

Page 36: Knowledge Access Semantic technology for KM

36

Contribution

• Combination of
  – User familiar keyword search
  – More precise RDF querying
• Data and metadata as complementary
• Low threshold, high ceiling
  – Works on non-RDF information
  – Exploits RDF where it exists
• Integrates browsing and querying
  – Fits users’ info seeking behavior

Page 37: Knowledge Access Semantic technology for KM

37

Conclusions about RDF(S)
• Next step up from plain XML:
  – (small) ontological commitment to modeling primitives
  – possible to define domain vocabulary
  – limited reasoning
    • subsumption, but no transitivity, symmetry, …
  – limited expressive power
    • no cardinality constraints, equality, disjointness, …

Page 38: Knowledge Access Semantic technology for KM

38

Web Ontology Language Requirements

Desirable features identified for Web Ontology Language:

• Extends existing Web standards

– Such as XML, RDF, RDFS

• Easy to understand and use

– Should be based on familiar KR idioms

• Formally specified

• Of “adequate” expressive power

• Possible to provide automated reasoning support

Page 39: Knowledge Access Semantic technology for KM

39

OWL Language

• OWL is based on Description Logics knowledge representation formalism
• OWL (DL) benefits from many years of DL research:
  – Well defined semantics
  – Formal properties well understood (complexity, decidability)
  – Known reasoning algorithms
  – Implemented systems (highly optimised)
• Three species of OWL
  – OWL Full – maximum expressivity, undecidable
  – OWL DL – based on SHIQ DL, decidable
  – OWL Lite – subset of OWL DL, most efficient reasoning

Page 40: Knowledge Access Semantic technology for KM

40

Why OWL?

• OWL = Web Ontology Language
• Owl’s superior intelligence is known throughout the Hundred Acre Wood, as are his talents for Writing, Spelling, other Educated and Special tasks.

• "My spelling is Wobbly. It's good spelling, but it Wobbles, and the letters get in the wrong places."

Page 41: Knowledge Access Semantic technology for KM

41

QuizOWL!

Page 42: Knowledge Access Semantic technology for KM

42

Re-cap

• XML, RDF, OWL language stack
• Increasingly sophisticated search
  – QuizXML
    • subdocument searching
  – QuizRDF
    • browsing by concept and across relations
    • searching on metadata and full-text
• Next steps in semantic search
  – identification of named entities within documents
  – Exploitation of world knowledge
  – KIM (Ontotext)

Page 43: Knowledge Access Semantic technology for KM

43

The KIM Platform

• A platform offering services and infrastructure for:
  – (semi-) automatic semantic annotation
  – ontology population
  – semantic indexing and retrieval of content
  – query and navigation
• Based on an Information Extraction technology
• Aim: to underpin Semantic Web applications
  - by providing a metadata generation technology
  - in a standard, consistent, and scalable framework

Page 44: Knowledge Access Semantic technology for KM

44

Ontologies

- PROTON - a light-weight upper-level ontology;

- 250 NE classes;

- 100 relations and attributes;

- covers mostly NE classes, and to a smaller degree general concepts;

http://proton.semanticweb.org/

Page 45: Knowledge Access Semantic technology for KM

45

Ontologies II

Page 46: Knowledge Access Semantic technology for KM

46

KIM World KB

• Aims to cover the most popular entities in the world
  – Entities of general importance … like the ones that appear in the news …
• KIM “knows about”:
  – Organizations, all important sorts of: business, international, political, government, sport, academic…
  – Specific people (e.g. politicians)
  – Locations: countries, regions, cities, roads, etc.

Page 47: Knowledge Access Semantic technology for KM

47

KIM World KB: Content

• Collected from various sources, like geographical and business intelligence gazetteers.
• KIM also learns from documents indexed
  – via GATE information extraction

KB scale (RDF statements)   Small KB     Full KB
- explicit                  444,086      2,248,576
- after inference           1,014,409    5,200,017

Page 48: Knowledge Access Semantic technology for KM

48

KIM Scaling on Data

• The Semantic Repository is based on Sesame/OWLIM.
• Our practical tests demonstrate a perfect performance on top of:
  – 1.2M entity descriptions;
  – about 15M explicit statements;
  – above 30M statements after forward chaining.
• Fulltext indexing with Lucene:
  – 0.5M docs, retrieval in milliseconds

Page 49: Knowledge Access Semantic technology for KM

49

Semantic Annotation

Page 50: Knowledge Access Semantic technology for KM

50

Simple Usage: Highlight, Hyperlink, and …

Page 51: Knowledge Access Semantic technology for KM

51

Simple Usage: Explore and Navigate

Page 52: Knowledge Access Semantic technology for KM

52

People search for People

A recent large-scale human interaction study on a personal content IR system, carried out by Microsoft, demonstrated that:

“The most common query types in our logs were People/places/things, Computers/internet and Health/science. In the People/places/things category, names were especially prevalent. Their importance is highlighted by the fact that 25% of the queries involved people’s names ... . In contrast, general informational queries are less prevalent.”

Page 53: Knowledge Access Semantic technology for KM

53

Semantic Queries

• The standard IR query is:
  – “give me documents that contain the words ‘company’, ‘Europe’, ‘telecommunication’…”
• KIM provides indexing & retrieval wrt NEs
  – More precise specification and satisfaction of information needs
  – specify the NEs we are interested in, and restrict them by their attributes and relations
  – “Give me documents that mention a company in Europe from the telecommunications industry sector…”

Page 54: Knowledge Access Semantic technology for KM

54

Precision in Semantic Search
• KIM can match
  – a query: Documents concerning a telecom company in Europe, John Smith, and a date in the first half of 2002.
  – With a document containing: “At its meeting on the 10th of May, the board of Vodafone appointed John G. Smith as CTO”
  – Classical IR cannot do the required reasoning:
    - Vodafone is a mobile operator, which is a kind of telecom company;
    - Vodafone is in the UK, which is a part of Europe;
    - 10th of May is a “date in first half of 2002”;
    - “John G. Smith” matches “John Smith”.
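The four reasoning steps listed above can be imitated with a toy knowledge base. Everything in the sketch is an assumption for illustration: the dictionaries, the function names and the year 2002 for the meeting date are invented, and KIM itself relies on a semantic repository and information extraction rather than hand-written lookups like these.

```python
from datetime import date

# Toy world knowledge (hypothetical): class hierarchy, geography, one entity.
SUBCLASS_OF = {"MobileOperator": "TelecomCompany", "TelecomCompany": "Company"}
PART_OF = {"UK": "Europe"}
ENTITIES = {"Vodafone": {"type": "MobileOperator", "locatedIn": "UK"}}

def ancestors(item, relation):
    """Return item plus everything reachable by following relation upwards."""
    seen = [item]
    while seen[-1] in relation:
        seen.append(relation[seen[-1]])
    return set(seen)

def is_company_in(entity, wanted_class, wanted_region):
    """True if the entity belongs to wanted_class and lies within wanted_region."""
    info = ENTITIES.get(entity)
    if info is None:
        return False
    return (wanted_class in ancestors(info["type"], SUBCLASS_OF)
            and wanted_region in ancestors(info["locatedIn"], PART_OF))

def name_matches(query_name, mention):
    """True when every token of the query name appears in the mention."""
    tokens = lambda s: {t.strip(".").lower() for t in s.split()}
    return tokens(query_name) <= tokens(mention)

def in_first_half(d, year):
    """True if the date falls between January and June of the given year."""
    return d.year == year and d.month <= 6

# The document's date is assumed to be 10 May 2002 for this example.
matched = (is_company_in("Vodafone", "TelecomCompany", "Europe")
           and name_matches("John Smith", "John G. Smith")
           and in_first_half(date(2002, 5, 10), 2002))
print(matched)
```

A keyword engine would need the literal words "telecom", "Europe" and "2002" to appear in the text; here the match follows from background knowledge about the entities mentioned.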

Page 55: Knowledge Access Semantic technology for KM

55

Entity Pattern Search

Page 56: Knowledge Access Semantic technology for KM

56

Pattern Search: Entity Results

Page 57: Knowledge Access Semantic technology for KM

57

Entity Pattern Search: KIM Explorer

Page 58: Knowledge Access Semantic technology for KM

58

Predefined Pattern Search

Page 59: Knowledge Access Semantic technology for KM

59

Pattern Search: Multiple-Entity Results

Page 60: Knowledge Access Semantic technology for KM

60

Pattern Search, Referring Documents

Page 61: Knowledge Access Semantic technology for KM

61

Document Details

Page 62: Knowledge Access Semantic technology for KM

62

KIM - summary

KIM is a platform for:
- semantic annotation,
- ontology population,
- semantic indexing and retrieval,
- providing an API for remote access and integration,
- based on Information Extraction (IE) using mature HLT (GATE),
- powered by massive world knowledge;
- http://www.ontotext.com/kim

Page 63: Knowledge Access Semantic technology for KM

63

SEKTAgent

• Periodic agent search for named entities
  – e.g. a person in an organisation
  – Returns relevant documents and metadata
  – Proactive knowledge delivery
  – Linked to device independence module (see later)
• Based upon KIM architecture
• Result-led indexing
  – Adds relevant pages to next crawl list

Page 64: Knowledge Access Semantic technology for KM

64

SEKTAgent demo

Page 65: Knowledge Access Semantic technology for KM

65

TAP

• Uses Google for “traditional” search
• Augments results with relevant data aggregated from distributed (and semantically annotated) data
• Offers distributed query interface

Page 66: Knowledge Access Semantic technology for KM

66

TAP tap.stanford.edu for more information

Page 67: Knowledge Access Semantic technology for KM

67

Swoogle

• Searching for semantic web documents and ontologies

• See swoogle.umbc.edu

Page 68: Knowledge Access Semantic technology for KM

68

Google vs. Swoogle

• How to find a popular ontology that defines the concept of person?

• Ask Google?
  – Type “Person filetype:rdf”
  – Type “Person filetype:owl”
  – More complicated query “person rdfs:Class filetype:rdf”
• Ask Swoogle?
  – Type “person” in document search
    • [1] http://xmlns.com/foaf/0.1/index.rdf

Page 69: Knowledge Access Semantic technology for KM

69

Find “Time” Ontology

We can use a set of keywords to search for an ontology. For example, “time, before, after” are basic concepts for a “Time” ontology.

Page 70: Knowledge Access Semantic technology for KM

70

Beyond search, beyond documents

• a long list of documents is rarely the ultimate information need of the end user
• “there’s too much relevant information!”
• support for the next step - the analysis of the returned information
• e.g. key points on a topic from a large document you don’t want to read
• e.g. creation of a digest of information from multiple documents about Bush’s statements on a given topic

Page 71: Knowledge Access Semantic technology for KM

71

Search Engine trends

• Seamless and integrated
  – one search engine for Web and desktop
  – implicit queries based on user activity
• Personalisation
  – based on user interaction
• Beyond document lists
  – sub-document analysis
• Taxonomies and classification
  – taxonomy / enterprise search growing at 10% p.a.
• Ontologies and semantic annotation
  – A coherent approach to all these issues

markets

Page 72: Knowledge Access Semantic technology for KM

72

Knowledge Sharing

• Sharing knowledge through an organisation
  – learning from success and failures of others
  – avoiding duplication of effort
• (Virtual) communities of practice
  – Groups with shared interests who will benefit from collaboration and sharing knowledge
  – (Using WWW technology to increase “collaborative radius”)

Page 73: Knowledge Access Semantic technology for KM

73

Communities & the Semantic Web

• Communities require a shared conceptual vocabulary

• Consensual, evolving “concept map”
  – Ontologies!
• OntoShare
  – automates sharing of knowledge in an organisation via community-based RDF(S) ontologies

Page 74: Knowledge Access Semantic technology for KM

74

OntoShare

• Sharing and Classifying resources according to an Ontology

• Informs users when a relevant document is added to the store
  – Ontology-based personalisation

• Provides knowledge store for browsing and searching

Page 75: Knowledge Access Semantic technology for KM

75

Page 76: Knowledge Access Semantic technology for KM

76

OntoShare: Sharing knowledge
• User shares knowledge
  – WWW document
  – Any textual data
  – Can supply annotation

Page 77: Knowledge Access Semantic technology for KM

77

OntoShare: Sharing knowledge

• System automatically extracts keywords & summary

• System assigns knowledge to concepts

Page 78: Knowledge Access Semantic technology for KM

78

OntoShare: Sharing knowledge
• System emails an alert to selected users based on match to user profile

Page 79: Knowledge Access Semantic technology for KM

79

OntoShare: Evolving Ontologies
• OntoShare automatically suggests changes to concept characterisation

• Concept characterisations evolve over time

Page 80: Knowledge Access Semantic technology for KM

80

OntoShare: Evolving Ontologies

• User can suggest new concepts for ontology at any time

• System emails community on suggestion (à la Usenet) and counts votes

Page 81: Knowledge Access Semantic technology for KM

81

Finding People & Collaboration
• Use of personal profiles
  – Who else is interested in this document?
  – Who else is interested in this topic?
• Encouraging exchange of tacit knowledge
• Discussion threads around shared knowledge
• Adding value to the knowledge stored

Page 82: Knowledge Access Semantic technology for KM

82

SWAP – Semantic Web and Peer-to-Peer

• Distributed Knowledge Management
  – Different participants with different conceptualizations of their domain
  – Different knowledge sources
  – Physically distributed, dynamic environment
• Peer-To-Peer Approach
  – Decentralized nature: Local control
  – Symmetry: Everyone is provider and consumer
  – P2P networks as a reflection of social networks
  – Flexible collaboration beyond hierarchical structures

Page 83: Knowledge Access Semantic technology for KM

83

Case Study: The Bibster System
• Scenario: Sharing of bibliographic metadata in a Peer-to-Peer network
  – Bibliographic metadata is created and maintained in a decentralized manner
  – Researchers are willing to share their data
  – Use of semantics is crucial in this setting
• The Bibster system allows users to:
  – Easily share bibliographic data
  – Save work in finding this data
  – Avoid re-typing this data by hand

Page 84: Knowledge Access Semantic technology for KM

84

Semantic Methods in Bibster
• Semantic representation and querying of metadata
  – Extraction and classification from e.g. BibTeX files
  – Semantic Web Research Community Ontology and ACM Topic hierarchy as light-weight ontologies
• Peer selection using semantic topologies
  – Scalability requires intelligent query routing
  – Semantic descriptions of peers’ expertise as basis for peer selection
• Semantic duplicate detection
  – Highly redundant and inconsistent representation of bibliographic metadata
  – Semantic similarity measures to detect duplicates
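To give a feel for duplicate detection over inconsistently typed bibliographic entries, the sketch below compares normalised titles and author strings. It deliberately uses plain string similarity from Python's difflib as a stand-in; it is not the semantic similarity measure used in Bibster, and the field names, threshold and sample entries are assumptions.

```python
import re
from difflib import SequenceMatcher

def normalise(text):
    """Lower-case and drop punctuation so trivial variations do not matter."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def similarity(entry_a, entry_b):
    """Average of title and author-string similarity, each in the range 0..1."""
    title = SequenceMatcher(
        None, normalise(entry_a["title"]), normalise(entry_b["title"])).ratio()
    author = SequenceMatcher(
        None, normalise(entry_a["author"]), normalise(entry_b["author"])).ratio()
    return (title + author) / 2

def is_duplicate(entry_a, entry_b, threshold=0.85):
    """Treat two entries as duplicates when their similarity reaches the threshold."""
    return similarity(entry_a, entry_b) >= threshold

a = {"title": "The Semantic Web", "author": "Berners-Lee, T."}
b = {"title": "The semantic web.", "author": "Berners-Lee, T"}
c = {"title": "Logical foundations of object-oriented and frame-based languages",
     "author": "Kifer, M."}
print(is_duplicate(a, b), is_duplicate(a, c))
```

A semantic measure would additionally exploit the ontology, for example by recognising that two differently named venues or topics denote the same concept.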

Page 85: Knowledge Access Semantic technology for KM

85

Bibster Screenshot

Open Source: http://bibster.sourceforge.net/

Semantic Search

Query Results

Integration and Export of Query Results

Page 86: Knowledge Access Semantic technology for KM

86

NLG - Summarisation

• NLG takes as input structured data in a knowledge base or ontology and produces natural language text

• Applied to provide automatic documentation of ontologies or generate textual reports from formal knowledge

• Keeps texts constantly up-to-date so they reflect changes in the ontology

• OntoSum, University of Sheffield

Page 87: Knowledge Access Semantic technology for KM

87

The Property Hierarchy

• Special linguistically-motivated properties introduced to make the NLG modules more generic:
  – active-action (e.g. works-for)
  – passive-action (e.g. published-by)
  – attribute (e.g. has-age, has-web-address)
  – part-whole (e.g. consists-of)

• All properties from the ontology were made sub-properties of one of these 4

• Attribute properties recognised using heuristics, such as property name starts with “has” (hasWebPage)
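A name-based heuristic of the kind mentioned above might look like the following sketch. Only the "starts with has" rule comes from the slide; the rules for passive-action and part-whole, and the fallback to active-action, are assumptions added so that the example covers all four super-properties.

```python
def classify_property(name):
    """Guess which of the four generic super-properties a property belongs to."""
    lowered = name.lower()
    if lowered.startswith("has"):
        return "attribute"       # heuristic named on the slide
    if lowered.endswith("by"):
        return "passive-action"  # assumed: e.g. published-by
    if lowered.startswith(("consists", "partof", "part-of")):
        return "part-whole"      # assumed: e.g. consists-of
    return "active-action"       # assumed default: e.g. works-for

for prop in ["hasWebPage", "has-age", "published-by", "consists-of", "works-for"]:
    print(prop, "->", classify_property(prop))
```

Heuristics like these only give a first guess; properties they misclassify would still need to be assigned by hand.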

Page 88: Knowledge Access Semantic technology for KM

88

Summary Structuring

• Capture regular patterns; can be applied recursively
• Describe-Instance ->
    Describe-Attributes,
    Describe-Part-Whole,
    Describe-Active-Actions,
    Describe-Passive-Actions
• Describe-Attributes ->
    [attribute(Instance, Attribute)],
    Describe-Attributes *

Collect all subproperties of Attribute property relating to Instance

Attribute(John, hasMobileNumber)…..

Page 89: Knowledge Access Semantic technology for KM

89

Ontology-Based Aggregation
• Joining attribute and part-whole properties with the same first argument to have more coherent sentences
  – ATTR(Researcher: XXX, Appellation: Dr)
    ATTR(Researcher: XXX, string: my_email@sheff)
    ATTR(Researcher: XXX, string: 012344567)
    ATTR(Researcher: XXX, string: www.mypage.ac.uk)
• Without aggregation: Kalina Bontcheva has a Dr appellation. Kalina Bontcheva has email [email protected]. Kalina Bon…
• With aggregation: Kalina Bontcheva has a Dr appellation, email [email protected] and …
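The effect of aggregation can be reproduced with a short grouping step: facts that share a first argument are merged into one sentence. This is a simplified sketch, not the OntoSum generator; the (subject, phrase) input format and the "phone" label are assumptions, and the values are the placeholders from the ATTR examples above.

```python
def aggregate(facts):
    """Group (subject, phrase) pairs by subject and emit one sentence per subject."""
    grouped = {}
    for subject, phrase in facts:
        grouped.setdefault(subject, []).append(phrase)
    sentences = []
    for subject, phrases in grouped.items():
        if len(phrases) == 1:
            body = phrases[0]
        else:
            body = ", ".join(phrases[:-1]) + " and " + phrases[-1]
        sentences.append(f"{subject} has {body}.")
    return sentences

facts = [
    ("Kalina Bontcheva", "a Dr appellation"),
    ("Kalina Bontcheva", "email my_email@sheff"),
    ("Kalina Bontcheva", "phone 012344567"),
]
sentences = aggregate(facts)
print(sentences[0])
```

Without the grouping step each fact would produce its own sentence with the subject repeated, as in the "without aggregation" example.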

Page 90: Knowledge Access Semantic technology for KM

90

Lexicalisation of Classes & Properties
• 3 options:
  – Specified by ontology engineer
  – Same as concept/property name
  – Added manually when parameterising OntoSum

Page 91: Knowledge Access Semantic technology for KM

91

Description of “HSBC”

[Figure: ontology fragment describing HSBC]

Classes: Financial Institution, Bank
Instance: HSBC
  lendsTo: Person
  lendsTo: Organisation
  market-cap: €43bn
  employees: 137000

Page 92: Knowledge Access Semantic technology for KM

92

Description of “HSBC”

Page 93: Knowledge Access Semantic technology for KM

93

Innovative aspects

• Can tailor summary to device profile
  – Apply length restriction
    • e.g. for text message for mobile phone
  – Generate HTML for web browser or plain text for email
    • See device independence (next!)
• Readability heuristics
  – introduce lists when verbalising more than 3 attributes
• Use of ontology mapping rules to run same system on multiple ontologies

Page 94: Knowledge Access Semantic technology for KM

94

Related work

• Wilcock (Helsinki)
  – Fully automatic, no lexicon
  – “Talking OWLs”, ISWC-03
• MIAKT
  – Some manual input
  – More effort, more fluency
  – OntoSum based on MIAKT
  – Bontcheva, NLDB04

Page 95: Knowledge Access Semantic technology for KM

95

OntoSum demonstration

Page 96: Knowledge Access Semantic technology for KM

96

Device Independence

• context-aware tools for access to semantically-annotated knowledge
  – search, browse, share, summarise
  – integrated into day-to-day business processes
  – automatic knowledge delivery based on current context
    • activity, location, device, interests
  – support multiple end-user devices

Page 97: Knowledge Access Semantic technology for KM

97

Device independence

3 approaches:
• Hand-craft different sites for different devices
  – Labour intensive, difficult to maintain
• Extend HTML to describe interaction, navigation and selection
  – Server software generates output in suitable format using CC/PP
  – Inflexible – difficult to control output precisely
  – No support for large volume sites
  – Unclear what extensions are necessary and sufficient
• SEKT approach
  – Use templates to format data content appropriate for each class of device
  – Fine control of output based on CC/PP profiles
  – can handle large volumes of structured data - XML; databases
  – device-dependencies coded in the templates, e.g. ± mouse capability

Page 98: Knowledge Access Semantic technology for KM

98

Device Profiles in RDF

• CC/PP - W3C RDF standard for describing device characteristics

• CC/PP vocabularies define device components and component attributes
  – UAProf is an application of CC/PP adopted by many terminal device manufacturers
  – An ontology of devices – inheritance and specialisation

• Profile references and Profile Diffs are sent with an information request

• javax.ccpp package for processing profiles

Page 99: Knowledge Access Semantic technology for KM

99

User Profiles

• Effective presentation must take user preferences & accessibility issues into account
  – Font size
  – Colour preference
  – Hi res/Lo res

• Device characteristics and preference/ accessibility requirements need to be combined

• Effective screen size depends on both physical size and user preferences (e.g. font size)

• Specialisation/extension of UAProf

Page 100: Knowledge Access Semantic technology for KM

100

Profile Engine

• The Profile engine combines device and user profiles to generate a set of conditions

• The engine can be queried by other applications

• PROLOG is being used as a prototyping language
  – Arithmetic calculations of effective screen size (for example) require more than RDF/OWL
  – DL (DIG) interface to SWI-Prolog

Page 101: Knowledge Access Semantic technology for KM

101

Content Adaptation

• The content adaptation engine uses conditions generated by profile engine queries

• Example conditions:
  – Screen size x font size → number of characters of text
  – GraphicsSupported?
  – Colour or B&W
    • Device characteristic or
    • Accessibility issue
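The "screen size x font size → number of characters" condition is simple arithmetic once a device profile and a user preference are combined. The sketch below is an illustration under stated assumptions: the function names, the pixel-based units and the average character width of half the font height are invented for the example and are not taken from the SEKT profile engine.

```python
def effective_font(device_font_px, user_min_font_px):
    """Combine the device default with the user's minimum acceptable font size."""
    return max(device_font_px, user_min_font_px)

def text_capacity(width_px, height_px, font_px, char_aspect=0.5):
    """Estimate how many characters fit on screen for a given font size.

    char_aspect is the assumed ratio of average character width to font height.
    """
    chars_per_line = int(width_px // (font_px * char_aspect))
    lines = int(height_px // font_px)
    return chars_per_line * lines

# A hypothetical 128x160 phone screen with an 8px default font.
default_capacity = text_capacity(128, 160, 8)
# The same screen for a user who needs at least a 16px font.
large_font_capacity = text_capacity(128, 160, effective_font(8, 16))
print(default_capacity, large_font_capacity)
```

The same device therefore offers a quarter of the text capacity for this user, which is the kind of condition the content adaptation engine would need to know about.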

Page 102: Knowledge Access Semantic technology for KM

102

Content Generation

• Different content must be generated for different devices

• The current context (set of conditions) will be made available to SEKT applications

• Natural Language Processing techniques are used to generate or modify information
  – Mobile phone – 400 character text message
  – PC – multimedia document

• NLG – describing ontology-based knowledge in natural language (OntoSum!)

Page 103: Knowledge Access Semantic technology for KM

103

Device Independence

• A functional presentation of a resource should be available via any suitable device

• Requirements include content selection, layout transformation and style selection

• At present, no one language can be interpreted by all clients

• It follows that content must be formatted for the target device on the server

Page 104: Knowledge Access Semantic technology for KM

104

Templates

• Declarative templates are used to format the (XML-based) data

• Context (conditions) can be used to select templates, and sections within templates
  – Template 1 – WML
    • InputEnabled?
  – Template 2 – HTML
    • GraphicsWanted?

• Separation of data storage, processing and display

• W3C working group on device independence– No standard for templates (yet)
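Template selection driven by context can be sketched with Python's standard string.Template. Since the slide notes that no template standard exists yet, everything here is an assumption: the template texts, the context keys "markup" and "graphics_wanted", and the render function are hypothetical and only show the separation of data from presentation.

```python
from string import Template

# One declarative template per class of device (hypothetical markup).
TEMPLATES = {
    "wml": Template('<wml><card title="$title"><p>$body</p></card></wml>'),
    "html": Template("<html><body><h1>$title</h1><p>$body</p>$image</body></html>"),
}

def render(data, context):
    """Pick a template from the context, then fill it with the same data."""
    if context.get("markup") == "wml":
        return TEMPLATES["wml"].substitute(title=data["title"], body=data["body"])
    # Within the HTML template, the image section is included only if wanted.
    image = ""
    if context.get("graphics_wanted") and data.get("image"):
        image = '<img src="%s"/>' % data["image"]
    return TEMPLATES["html"].substitute(
        title=data["title"], body=data["body"], image=image)

data = {"title": "HSBC", "body": "HSBC is a bank.", "image": "hsbc.png"}
wml_page = render(data, {"markup": "wml"})
html_page = render(data, {"markup": "html", "graphics_wanted": False})
print(wml_page)
print(html_page)
```

The data dictionary never changes; only the context decides which template, and which sections within it, are used.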

Page 105: Knowledge Access Semantic technology for KM

105

Overview

Device Properties

User preferences

Context

Raw Information

Repurposed Information

Profiling engine ContentAdaptation

UAProf (RDF(S))

(syntactic & semantic)

Page 106: Knowledge Access Semantic technology for KM

106

Device Independence demo

Page 107: Knowledge Access Semantic technology for KM

107

Device Independence Summary
• Device and User profiles need to be combined using a suitable ontology
• A profile reasoning engine is used to generate conditions on the format
• Content can be generated according to the context (set of conditions)
• NLP techniques can be used to generate/summarise text (semantic)
• Templates are used to transform the results to a format suitable for the device at hand (syntactic)

Page 108: Knowledge Access Semantic technology for KM

108

Conclusion

• Semantic Web technology can offer enhancements to a range of KM tools
  – Search, Share, Summarise, Deliver
• Also
  – Visualisation
    • RDF or OWL statements as a graph
  – Integration of heterogeneous information
• Outstanding Issues
  – Trade-off between reasoning and scalability
  – Where does the metadata come from?
    • Only KIM starting to address this point
    • See also SEKT project (www.sekt-project.com)
  – Who will find the killer app?!
  – Plenty of topics still on the research agenda

Page 109: Knowledge Access Semantic technology for KM

109

Acknowledgements

• Peter Haase, University of Karlsruhe
• Kalina Bontcheva, University of Sheffield
• Naso Kiryakov, Ontotext
• Ian Horrocks, University of Manchester
• Tim Glover & Alistair Duke, BT

Page 110: Knowledge Access Semantic technology for KM

110

Thank you – questions?

• Here’s a few for you:
  – What are the semantic web layers?
  – Name 3 ontologies in widespread use today
  – Name 3 semantic search tools
  – What RDF ontology is used to characterise devices?
  – Why use NLG techniques on ontological information?
  – What are the advantages of RDF over XML? And OWL over RDF?
  – Name 3 trends in search engine development
  – Describe briefly the way(s) in which metadata can improve search performance

WIN A PRIZE!!!!!

John Davies
Next Generation Web Research, BT

[email protected]