Exploiting large scale web semantics to build end user applications Enrico Motta Professor of Knowledge Technologies Knowledge Media Institute The Open University

Post on 19-Dec-2015


Exploiting large scale web semantics to build end user applications

Enrico Motta
Professor of Knowledge Technologies

Knowledge Media Institute
The Open University


Aims of the Talk

• What is the Semantic Web
  – Perspectives
    • The SW as a ‘web of data’
    • The SW as a new context in which to build semantic applications and an unprecedented opportunity in which to address some classic AI problems
  – Typical misconceptions
    • What the SW is not!

• Semantic Web for Users
  – Applications that do something interesting and useful to users, by exploiting available web semantics


The Semantic Web as a ‘Web of Data’

Making data available to SW-aware software
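
As a rough illustration of what "SW-aware software" might do with such data, the sketch below reads a small FOAF description and pulls out a person's URI, name and interests. The values are taken from the FOAF profile shown in this talk; the function name `extract_people` is hypothetical. It uses plain XML parsing from the Python standard library, which only works for this simple RDF/XML layout. A real application would more likely use a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

FOAF_NS = "http://xmlns.com/foaf/0.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Illustrative RDF/XML; a cut-down version of the FOAF profile in this talk.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://identifiers.kmi.open.ac.uk/people/enrico-motta/">
    <foaf:name>Enrico Motta</foaf:name>
    <foaf:homepage rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
    <foaf:topic_interest>Semantic Web</foaf:topic_interest>
    <foaf:topic_interest>Ontologies</foaf:topic_interest>
  </foaf:Person>
</rdf:RDF>"""


def extract_people(rdf_xml):
    """Return one dict per foaf:Person with its URI, name and interests."""
    root = ET.fromstring(rdf_xml)
    people = []
    for person in root.iter(f"{{{FOAF_NS}}}Person"):
        people.append({
            "uri": person.get(f"{{{RDF_NS}}}about"),
            "name": person.findtext(f"{{{FOAF_NS}}}name"),
            "interests": [
                e.text for e in person.findall(f"{{{FOAF_NS}}}topic_interest")
            ],
        })
    return people


people = extract_people(DOC)
for p in people:
    print(p["name"], p["uri"], p["interests"])
```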


<foaf:Person rdf:about="http://identifiers.kmi.open.ac.uk/people/enrico-motta/">
  <foaf:name>Enrico Motta</foaf:name>
  <foaf:firstName>Enrico</foaf:firstName>
  <foaf:surname>Motta</foaf:surname>
  <foaf:phone rdf:resource="tel:+44-(0)1908-653506"/>
  <foaf:homepage rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
  <foaf:workplaceHomepage rdf:resource="http://kmi.open.ac.uk/"/>
  <foaf:depiction rdf:resource="http://kmi.open.ac.uk/img/members/enrico.jpg"/>
  <foaf:topic_interest>Knowledge Technologies</foaf:topic_interest>
  <foaf:topic_interest>Semantic Web</foaf:topic_interest>
  <foaf:topic_interest>Ontologies</foaf:topic_interest>
  <foaf:topic_interest>Problem Solving Methods</foaf:topic_interest>
  <foaf:topic_interest>Knowledge Modelling</foaf:topic_interest>
  <foaf:topic_interest>Knowledge Management</foaf:topic_interest>
  <foaf:based_near>
    <geo:Point>
      <geo:lat>52.024868</geo:lat>
      <geo:long>-0.707143</geo:long>
    </geo:Point>
  </foaf:based_near>
  <contact:nearestAirport>
    <airport:name>London Luton Airport</airport:name>
    <airport:iataCode>LTN</airport:iataCode>
    <airport:location>Luton, United Kingdom</airport:location>
    <geo:lat>51.866666666667</geo:lat>
    <geo:long>-0.36666666666667</geo:long>
    <rdfs:seeAlso rdf:resource="http://www.daml.org/cgi-bin/airport?LTN"/>
  </contact:nearestAirport>
  <foaf:currentProject>
    <foaf:Project><foaf:name>AquaLog</foaf:name></foaf:Project>
  </foaf:currentProject>
</foaf:Person>


The web of SW documents


Current status of the semantic web

• 10-20 million semantic web documents
  – Expressed in RDF, OWL, DAML+OIL

• 7K-10K ontologies
  – These cover a variety of domains: music, multimedia, computing, management, bio-medical sciences, upper level concepts, etc.

• Hence:
  – To a significant extent the semantic web is already in place
  – However, domain coverage is very uneven
  – Still primarily a research enterprise; however, interest is rapidly increasing in both governmental and business organizations
    • “early adopters” phase

The above figures refer to resources that are publicly accessible on the web


[Figure: several blocks of <data data data> markup]


<rdf:Description rdf:about="http://www.ecs.soton.ac.uk/info/#person-01269">
  <ns0:family-name>Gibbins</ns0:family-name>
  <ns0:full-name>Nicholas Gibbins</ns0:full-name>
  <ns0:given-name>Nicholas</ns0:given-name>
  <ns0:has-email-address>[email protected]</ns0:has-email-address>
  <ns0:has-affiliation-to-unit rdf:resource="http://194.66.183.26/WEBSITE/GOW/ViewDepartment.aspx?Department=750"/>
</rdf:Description>
</rdf:RDF>

CS Dept Data

AKT Reference Ontology

RDF Data

Bibliographic Data

Geography


“Corporate Semantic Webs”

• A ‘corporate ontology’ is used to provide a homogeneous view over heterogeneous data sources.

• They often tackle Enterprise Information Integration scenarios.

• Hailed by Gartner as one of the key emerging strategic technology trends
  – E.g., Garlik is a multi-million startup recently set up in the UK to support personal information management, which uses an ontology to integrate data mined from the web on a large scale


AquaLog


Applications that exploit large scale semantic content


The web of data


Gateways to the SW

Application

Semantic Web


• Sophisticated quality control mechanism
  – Detects duplications
  – Fixes obvious syntax problems
    • E.g., duplicated ontology IDs, namespaces, etc.

• Structures ontologies in a network
  – Using relations such as: extends, inconsistentWith, duplicates

• Provides interfaces for both human users and software programs

• Provides efficient API
• Supports formal queries (SPARQL)
• Variety of ontology ranking mechanisms
• Modularization/Combination support
• Plug-ins for Protégé and NeOn Toolkit
• Very cool logo!


Case Study 1: Automatic Alignment of Thesauri in the Agricultural/Fishery Domain


Method

[Diagram: Scarlet takes Concept_A (e.g., Supermarket) and Concept_B (e.g., Building), anchors each (≡) to concepts found by accessing the Semantic Web, and deduces the semantic relation between them]

- SCARLET - matching by Harvesting the SW

- Automatically select and combine multiple online ontologies to derive a relation
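
The idea of deriving a relation from online ontologies can be pictured with a small sketch. It is not the SCARLET implementation: the three toy ontologies, their names and the function `derive_subsumption` are hypothetical, concept matching is by exact name, and only subclass relations are handled. The sketch first looks for a single ontology that relates both concepts, and otherwise tries to chain two ontologies through a shared intermediate concept.

```python
from collections import deque

# Hypothetical toy ontologies: each maps a concept to its direct superclasses.
ONTOLOGIES = {
    "onto_buildings": {"Supermarket": {"Shop"}, "Shop": {"Building"}},
    "onto_chem_1": {"Cholesterol": {"Steroid"}},
    "onto_chem_2": {"Steroid": {"Lipid"}, "Lipid": {"OrganicChemical"}},
}


def ancestors(ontology, concept):
    """All superclasses reachable from `concept` (transitive closure)."""
    seen, queue = set(), deque([concept])
    while queue:
        for parent in ontology.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def derive_subsumption(a, b, ontologies):
    """Return the names of the ontologies that justify `a subClassOf b`, or None."""
    # One ontology contains both concepts and relates them directly.
    for name, onto in ontologies.items():
        if b in ancestors(onto, a):
            return [name]
    # Two ontologies: the first gives a ⊆ mid, the second gives mid ⊆ b.
    for name1, onto1 in ontologies.items():
        for mid in ancestors(onto1, a):
            for name2, onto2 in ontologies.items():
                if name2 != name1 and b in ancestors(onto2, mid):
                    return [name1, name2]
    return None


print(derive_subsumption("Supermarket", "Building", ONTOLOGIES))
print(derive_subsumption("Cholesterol", "OrganicChemical", ONTOLOGIES))
```

In practice the ontologies would be found at run time through a Semantic Web gateway rather than hard-coded, and the matcher would also need to cope with label variations and with relations other than subsumption.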


Two strategies

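Deriving relations from (A) one ontology and (B) across ontologies.

[Diagram (A): Scarlet relates Supermarket to Building using a single ontology on the Semantic Web. Concepts shown: Supermarket, Shop, PublicBuilding ⊆ Building]

[Diagram (B): Scarlet relates Cholesterol to OrganicChemical by combining ontologies on the Semantic Web. Concepts shown: Cholesterol, Steroid, Lipid ⊆ OrganicChemical; Steroid appears in both ontologies]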


Experiment

Matching:
• AGROVOC
  – UN’s Food and Agriculture Organisation (FAO) thesaurus
  – 28,174 descriptor terms
  – 10,028 non-descriptor terms

• NALT
  – US National Agricultural Library Thesaurus
  – 41,577 descriptor terms
  – 24,525 non-descriptor terms


226 Used Ontologies

http://139.91.183.30:9090/RDF/VRP/Examples/tap.rdf

http://reliant.teknowledge.com/DAML/SUMO.daml

http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml

http://reliant.teknowledge.com/DAML/Economy.daml

http://gate.ac.uk/projects/htechsight/Technologies.daml


Evaluation 1 - Precision

• Manual assessment of 1000 mappings (15%)

• Evaluators:
  – Researchers in the area of the Semantic Web
  – 6 people split into two groups

• Results:
  – Comparable to best results for background-knowledge-based matchers.


Evaluation 2 – Error Analysis


Case Study 2: Folksonomy Tagspace Enrichment


Features of Web2.0 sites

• Tagging as opposed to rigid classification

• Dynamic vocabulary does not require much annotation effort and evolves easily

• Shared vocabulary emerges over time: certain tags become particularly popular


Limitations of tagging

• Different granularity of tagging
  – rome vs colosseum vs roman monument
  – flower vs tulip
  – Etc.

• Multilinguality
• Spelling errors, different terminology, plural vs singular, etc.

• This has a number of negative implications for the effective use of tagged resources
  – E.g., search exhibits very poor recall


Giving meaning to tags


What does it mean to add semantics to tags?

1. Mapping a tag to a SW element
   "japan" → <akt:Country Japan>

2. Linking two "SW tags" using semantic relations
   {japan, asia} → <japan subRegionOf asia>


Applications of the approach

• To improve recall in keyword search

• To support annotation by dynamically suggesting relevant tags or visualizing the structure of relevant tags

• To enable formal queries over a space of tags
  – Hence, going beyond keyword search

• To support new forms of intelligent navigation
  – I.e., using the 'semantic layer' to support navigation
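
The first of these applications, improving recall, can be sketched as follows. Everything here is an assumption made for illustration: the tag-to-concept mappings, the extra "tokyo" tag, the toy resources and the function names. The sketch compares plain keyword search with a search that also follows subRegionOf links, in the spirit of the {japan, asia} example.

```python
# Hypothetical tag-to-concept mappings and relations, modelled on the
# {japan, asia} example; none of these come from a real ontology.
TAG_TO_CONCEPT = {"japan": "Japan", "asia": "Asia", "tokyo": "Tokyo"}
SUB_REGION_OF = {"Japan": "Asia", "Tokyo": "Japan"}

# Toy tagged resources.
RESOURCES = {
    "photo1": {"tokyo", "night"},
    "photo2": {"japan", "temple"},
    "photo3": {"asia", "map"},
    "photo4": {"paris"},
}


def is_within(concept, region):
    """True if `concept` equals `region` or is a (transitive) sub-region of it."""
    while concept is not None:
        if concept == region:
            return True
        concept = SUB_REGION_OF.get(concept)
    return False


def keyword_search(tag):
    """Plain tag matching: only resources carrying exactly this tag."""
    return {r for r, tags in RESOURCES.items() if tag in tags}


def semantic_search(tag):
    """Also return resources whose tags map to sub-regions of the query concept."""
    region = TAG_TO_CONCEPT.get(tag)
    if region is None:
        return keyword_search(tag)
    return {
        r for r, tags in RESOURCES.items()
        if any(is_within(TAG_TO_CONCEPT.get(t), region) for t in tags)
    }


print(sorted(keyword_search("asia")))
print(sorted(semantic_search("asia")))
```

With these toy data the keyword search for "asia" finds a single resource, while the semantic search also finds those tagged "japan" and "tokyo".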


[Flowchart with three stages. Resources shown: Folksonomy, Google, Wikipedia, SW search engine]

Pre-processing
– Input: Tags
– Steps shown: Clean tags, Group similar tags, Filter infrequent tags, Concise tags

Clustering
– Analyze co-occurrence of tags, giving a Co-occurrence matrix
– Cluster tags into Cluster1, Cluster2, …, Clustern

Concept and relation identification
– Take 2 “related” tags from a cluster
– Find mappings & relation for pair of tags
– Remaining tags? Yes: continue with another pair. No: END

Output:

<concept, relation, concept>
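
The pre-processing and co-occurrence steps can be sketched in a few lines. This is a simplified reading of the pipeline, not the authors' code: the function names, thresholds and toy tag sets are assumptions, tag cleaning is reduced to trimming and lower-casing, and "related" tags are simply pairs that co-occur often enough, where the real system clusters tags.

```python
from collections import Counter
from itertools import combinations


def preprocess(posts, min_count=2):
    """Lower-case and trim tags, then drop tags used fewer than `min_count` times."""
    cleaned = [{t.strip().lower() for t in tags if t.strip()} for tags in posts]
    freq = Counter(t for tags in cleaned for t in tags)
    return [{t for t in tags if freq[t] >= min_count} for tags in cleaned]


def cooccurrence(posts):
    """Count how often each unordered pair of tags annotates the same resource."""
    counts = Counter()
    for tags in posts:
        for a, b in combinations(sorted(tags), 2):
            counts[(a, b)] += 1
    return counts


def related_pairs(counts, threshold=2):
    """Pairs of tags that co-occur at least `threshold` times."""
    return sorted(pair for pair, n in counts.items() if n >= threshold)


POSTS = [  # hypothetical tag sets, one per tagged resource
    ["Education", "school", "student"],
    ["education", "School ", "course"],
    ["school", "student", "lms"],
    ["education", "student", "typo-tagg"],
]

pairs = related_pairs(cooccurrence(preprocess(POSTS)))
print(pairs)
```

Each pair returned here would then be passed to the concept and relation identification stage, which is not modelled in this sketch.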


Examples

Cluster_1: {admin application archive collection component control developer dom example form innovation interface layout planning program repository resource sourcecode}

[Diagram: concepts and relations found for Cluster_1. Concepts shown: participant, innovation, event, developer, activity, creator, planning, example, application, user, admin, resource, component, interface, archive, Information Object. Relation labels shown: participatesIn, in-event, has-mention-of, type, Range]


Examples

Cluster_2: {college commerce corporate course education high instructing learn learning lms school student}

[Diagram: concepts and relations found for Cluster_2; bracketed numbers give the source ontologies listed below]

Concepts shown: education, training [1,4], qualification, corporate [1], institution, university [2,3], college [2], postSecondarySchool [2], school [2], student [3], course [3], activities [4], learning [4], teaching [4]

Relation labels shown: studiesAt, offersCourse, takesCourse

[1] http://gate.ac.uk/projects/htechsight/Employment.daml
[2] http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml
[3] http://www.mondeca.com/owl/moses/ita.owl
[4] http://www.cs.utexas.edu/users/mfkb/RKF/tree/CLib-core-office.owl


Faceted Ontology

• Ontology creation and maintenance is automated

• Ontology evolution is driven by task features and by user changes

• Large scale integration of ontology elements from massively distributed online ontologies

• Very different from traditional top-down-designed ontologies


Case Study 3: Reviewing and Rating on the Web


Revyu.com


Trust Factors

Expertise: the source has relevant expertise of the domain of the recommendation-seeking; this may be formally validated through qualifications or acquired over time.

Experience: the source has experience of solving similar scenarios in this domain, but without extensive expertise.

Impartiality: the source does not have vested interests in a particular resolution to the scenario.

Affinity: the source has characteristics in common with the recommendation seeker, such as shared tastes, standards, values, viewpoints, interests, or expectations.

Track record: the source has previously provided successful recommendations to the recommendation seeker.


[Diagram: factors emphasised (affinity, experience, expertise) across a range of solutions from subjective to objective]


Applying the framework to revyu.com

• Affinity
  – Operationalised as the degree of overlap in items reviewed, and in ratings given

• Experience
  – Proxy metric: Usage of particular tags (as proxies for topics)
    • Experience scores based on tagging data
    • Also integrates data from del.icio.us for those users who have chosen to publish their del.icio.us account on FOAF

• Expertise
  – Proxy metric: Credibility
  – Captures the social aspect of expertise: endorsement


Using trust factors for ranking reviews
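
One way such a ranking could work is sketched below, using affinity only. The formula is an assumption and not the revyu.com metric: it multiplies the overlap in items reviewed (a Jaccard ratio) by the average agreement of ratings on the shared items, assuming a 1 to 5 rating scale. All names and data are hypothetical, and experience and expertise are left out.

```python
def affinity(ratings_a, ratings_b, max_rating=5):
    """Score in [0, 1]: overlap in items reviewed, weighted by rating agreement."""
    shared = ratings_a.keys() & ratings_b.keys()
    if not shared:
        return 0.0
    item_overlap = len(shared) / len(ratings_a.keys() | ratings_b.keys())
    # Mean closeness of the two ratings on shared items (1.0 = identical).
    agreement = sum(
        1 - abs(ratings_a[i] - ratings_b[i]) / (max_rating - 1) for i in shared
    ) / len(shared)
    return item_overlap * agreement


def rank_reviews(seeker_ratings, reviews, reviewer_ratings):
    """Order reviews so that those by high-affinity reviewers come first."""
    return sorted(
        reviews,
        key=lambda rev: affinity(seeker_ratings, reviewer_ratings[rev["reviewer"]]),
        reverse=True,
    )


# Hypothetical data: ratings on a 1-5 scale, keyed by item.
seeker = {"film_a": 5, "film_b": 1, "book_c": 4}
reviewer_ratings = {
    "alice": {"film_a": 5, "film_b": 1},
    "bob": {"film_a": 1, "book_d": 3},
}
reviews = [
    {"reviewer": "bob", "item": "film_x"},
    {"reviewer": "alice", "item": "film_x"},
]

ranked = rank_reviews(seeker, reviews, reviewer_ratings)
print([r["reviewer"] for r in ranked])
```

A fuller version would combine this score with experience and expertise scores, for example as a weighted sum whose weights depend on how subjective the task is.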


PowerAqua and PowerMagpie


How does the Semantic Web relate to Artificial Intelligence research?


AI as Heuristic Search


The knowledge-based paradigm in AI

“Today there has been a shift in paradigm. The fundamental problem of understanding intelligence is not the identification of a few powerful techniques, but rather the question of how to represent large amounts of knowledge in a fashion that permits their effective use”

Goldstein and Papert, 1977


Knowledge Representation Hypothesis in AI

Any mechanically embodied intelligent process will be comprised of structural ingredients that we as external observers naturally take to represent a propositional account of the knowledge that the overall process exhibits, and independent of such external semantic attribution, play a formal but causal and essential role in engendering the behaviour that manifests that knowledge

Brian Smith, 1982


Knowledge-Based Systems

Large Body of Knowledge

Intelligent Behaviour


The Knowledge Acquisition Bottleneck

Large Body of Knowledge

Intelligent Behaviour

KA Bottleneck

Knowledge


The Cyc project


Problem Solving Method

Generic Task

Domain Model

Mapping Knowledge

Application-specific Problem-Solving Knowledge

Application Configuration

Parametric Design Library of PSMs

Mapping Ontology Ontology

Structured libraries of reusable components

Classification

Scheduling

Etc…


The next knowledge medium

“An information network with semi-automated services for the generation, distribution, and consumption of knowledge”

• However, our approach based on structured libraries of problem solving components only addressed the economic cost of KBS development…


SW as Enabler of Intelligent Behaviour

Intelligent Behaviour

Both a platform for knowledge publishing and a large scale source of knowledge


KBS vs SW Systems

                  Classic KBS     SW Systems
Provenance        Centralized     Distributed
Size              Small/Medium    Extra Huge
Repr. Schema      Homogeneous     Heterogeneous
Quality           High            Very Variable
Degree of trust   High            Very Variable


Key Paradigm Shift

Intelligence
– Classic KBS: A function of sophisticated, logical, task-centric problem solving
– SW Systems: A side-effect of being able to integrate different types of reasoning to handle size and heterogeneous quality and representation


Conclusions


Typical misconceptions…

• “The SW is a long-term vision…”
  – Ehm… actually… it already exists…

• “The SW will never work because nobody is going to annotate their web pages”
  – The SW is not about annotating web pages; the SW is a web of data, most of which are generated from DBs, or from web mining software, or from applications which produce SW technology

• “The idea of a universal ontology has failed before and will fail again. Hence the SW is doomed”
  – The SW is not about a single universal ontology. Already there are around 10K ontologies and the number is growing…
  – SW applications may use 1, 2, 3, or even hundreds of ontologies.


Large Scale Distributed Semantics

• Widespread production of formalised knowledge models (ontologies and metadata), from a variety of different groups and individuals
  – E.g., legal, bio-medical, governmental, environmental, music, art, multimedia, computing, etc.
  – “Knowledge modelling to become a new form of literacy?”
    • Stutt and Motta, 1997

• This large scale heterogeneous resource will enable a new generation of semantic-aware technologies

• These developments may provide a new context in which to address the economic barriers to KBS development

• The SW already exists to some extent; however, there is still a way to go before it reaches the required degree of maturity


Large Scale Distributed Semantics

• Much like AI, the semantic web will only succeed if it becomes ubiquitous and hidden

“There's this stupid myth out there that A.I. has failed, but A.I. is everywhere around you every second of the day. People just don't notice it. You've got A.I. systems in cars, tuning the parameters of the fuel injection systems. When you land in an airplane, your gate gets chosen by an A.I. scheduling system. Every time you use a piece of Microsoft software, you've got an A.I. system trying to figure out what you're doing, like writing a letter, and it does a pretty damned good job. Every time you see a movie with computer-generated characters, they're all little A.I. characters behaving as a group. Every time you play a video game, you're playing against an A.I. system.”

Rodney Brooks
