collaborative ontology building project

Collaborative Ontology Building Project

- a multiagent-based ontology editing and discovery environment

Jie Bao
Artificial Intelligence Research Laboratory
Dept of Computer Science
Iowa State University
Ames IA 50010

[email protected]
http://www.cs.iastate.edu/~baojie

Project homepage: http://boole.cs.iastate.edu:9090/COB/

A Research proposal
Dec 02, 2003

2003-12-02 COB proposal 2

COB

Without SHOE how can you be a RACER?
Without Sesame how can you make OIL?

Semantic Web is a plan of good
But with no ontology it’s only a nil.

Everyone makes a small piece of brick
Not in one day can we make Rome real.

Let’s build ontology together and hard
Just like ants build their hill.


Outline

1. Objectives
2. Key difficulties
3. Background review
4. A tentative framework


What is the problem

The Semantic Web needs a general and open ontology library, but ontology building is a time-consuming, knowledge-intensive process.
  • Domain experts are needed, and nobody has full knowledge
  • Intellectual property/copyright issues hinder the wide use of commercial ontologies (e.g. Cyc)
  • Automatic ontology discovery and mapping are still impossible in general

Existing ontology editing and discovery tools are standalone and too complex
  • Not suitable for team ontology generation.
  • The jargon is daunting for ordinary people who know little about ontology.

Data sources are distributed, heterogeneous, and dynamic
  • New concepts appear every day: Election2004


Related problems

Distributed Learning
  • Learning from distributed, heterogeneous, dynamic, multiple datasets

Software engineering
  • Concurrent version control and management
  • Open source issues (copyright vs. copyleft)

Knowledge Management
  • Knowledge sharing in group/project
  • Automatic knowledge aggregation


Design Philosophy (1) ----- about people

Teamwork is needed
  • Nobody can know everything

But everyone is an expert somehow
  • Everybody knows something: your dog, your department, your favorite TV show

You can build big things from small pieces
  • One expert can write several articles for an encyclopedia
  • And hundreds of experts can work together.

However, people always have different viewpoints
  • Conflict: does the 21st century begin in 2000 or 2001?
  • Redundancy: IraqWar, WarInIraq, GulfWarII


Design Philosophy (2) ----- about agent and software

Small pieces of ontologies are generated by agents
  • Those agents are domain experts or trained agents
  • Light-weight ontology editor which requires minimal user effort: browser-based
  • Automatic and controllable information collection by software robots.

Ontology repository is maintained by machine learning algorithms
  • Ontology mapping on controlled topics.
  • Detect and reduce redundancy and conflicts by inference


A Desirable Case -- Pop Music Ontology (1)

Suppose we want to build an ontology and knowledge base about pop music called PopOnt

Even kids know

John is a teenage student and knows nothing about ontology. But he knows a lot about pop music. He’d like to share his knowledge with PopOnt.

I’m willing to spend 5 minutes for you

There are millions of pop music fans like John, and their knowledge is complementary. Some of them may go to the PopOnt website and write one or two simple sentences, like [M. Jackson] [isn’t] a [country music artist]. They may also correct others’ mistakes.


A Desirable Case -- Pop Music Ontology (2)

You don’t even need to go to the website

There are also mailing lists, newsgroups, weblogs, p2p applications and websites about pop music, which can be used for validation or mining. For example, if [M. Jackson] rarely co-occurs with [country music], it is more likely that [M. Jackson] [isn’t] a [country music artist] is true.

An agent can be an expert, too.

It is even better if those articles have a subject, an abstract, or keywords, which can be used as labeled instances for machine learning. New concepts can be mined and cross-validated by people, too.

Finally, PopOnt is built in a couple of months and is free for everyone to use.
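The co-occurrence check in the M. Jackson example can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the 0.5 threshold, and the sample posts are invented here, and a real agent would need proper text normalization rather than substring matching.

```python
def cooccurrence_rate(documents, term_a, term_b):
    """Return the fraction of documents mentioning term_a that also mention term_b.

    Returns None when term_a never appears, since no evidence is available.
    """
    a, b = term_a.lower(), term_b.lower()
    with_a = [doc.lower() for doc in documents if a in doc.lower()]
    if not with_a:
        return None
    return sum(1 for doc in with_a if b in doc) / len(with_a)


def supports_negation(rate, threshold=0.5):
    """Treat a low co-occurrence rate as weak support for an [isn't] statement.

    The threshold is an arbitrary assumption, not a value from the proposal.
    """
    return rate is not None and rate < threshold


# Made-up posts standing in for mailing list or weblog text.
posts = [
    "M. Jackson releases a new pop single",
    "M. Jackson tops the pop charts again",
    "Country music festival lineup announced",
    "M. Jackson covers a country music classic",
]
rate = cooccurrence_rate(posts, "M. Jackson", "country music")
print(rate, supports_negation(rate))
```

A low rate only raises the plausibility of the negative statement; cross-validation by people remains the final check.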





Key Difficulties 1 : Logic breakdown

How to make ontology editing as easy as writing a diary?

[Figure: an ontology, drawn as a class tree (Class, SubClass, SubSubClass) with classes, slots, and instances, broken down into [subject] [predicate] [object] sentences]

Can a complex ontology be broken down into a group of single sentences? In other words, how can a complex description logic statement be decomposed into very simple FOPL sentences? The inverse composition is also needed.

Each single sentence is as simple as "A is B" or "A has B".
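For the simplest case, a pure class hierarchy, the breakdown and its inverse can be sketched as follows. This is a minimal sketch, not the COB implementation: the nested-dict representation, the function names, and the sample class names are assumptions made for illustration, and richer description logic constructs (restrictions, cardinalities) are not covered.

```python
from collections import defaultdict

SUBCLASS = "rdfs:subClassOf"


def decompose(tree, parent=None):
    """Flatten a nested {class: {subclass: {...}}} dict into single sentences."""
    triples = []
    for name, children in tree.items():
        if parent is not None:
            triples.append((name, SUBCLASS, parent))
        triples.extend(decompose(children, name))
    return triples


def compose(triples):
    """Rebuild the nested class tree from subclass sentences.

    A root class with no subclasses produces no sentence, so it cannot be
    recovered; that limitation is accepted in this sketch.
    """
    children = defaultdict(dict)
    has_parent = set()
    for subject, predicate, obj in triples:
        if predicate == SUBCLASS:
            # Share the child's dict so later sentences extend it in place.
            children[obj][subject] = children[subject]
            has_parent.add(subject)
    return {name: sub for name, sub in children.items() if name not in has_parent}


# Hypothetical class names.
tree = {"Music": {"PopMusic": {"TeenPop": {}}, "CountryMusic": {}}}
sentences = decompose(tree)
for sentence in sentences:
    print(sentence)
```

Each printed tuple is one "A is B" style sentence that a non-expert could write or correct on its own, and `compose(sentences)` returns the original tree.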


Key Difficulties 2 : Ontology Evolution

How to refine an ontology through the cooperation of experts and software agents?

• People and agents are all error-prone. Interactive and iterative cross-validation is central.

• People are “lazy” and “natural”. An ontology piece may first be written in short natural language and later refined by other people or agents into a more formal and more complex piece.

• Inference is needed to rule out conflicting information and to detect malicious or wrong information.


Key Difficulties 3 : Ontology Mining

Where to collect source information? Google search? No.
  • Pull: agents search and learn where the “good” sources are. A source can be verified by whether it is well cited (referenced) or not.
  • Push: information is automatically pushed to agents via credible channels.

Automatic extraction is still impossible
  • Depends on NLP
  • Article summaries/keywords are helpful, especially when the summary overlaps with the existing ontology.
  • Such summarized text can be used as labeled instances.

Simplified tasks are feasible
  • Is the keyword a consistent concept?
  • Are some keywords related?

Comparison: in content-based retrieval from video databases, automatic discovery of semantics based on image processing / pattern recognition has not proven very successful. Semantics from expert knowledge are needed in the MPEG-7 stream.


Key Difficulties 4 : Ontology Mapping

People often give the same thing different names, or divide concepts into groups in multiple ways.

Automatic general ontology mapping is still hard.

Simplified mapping is more feasible while still useful
  • Check whether a concept pair (with instances) is the same or not
  • Detect redundancy and suggest merges.





Beyond INDUS

INDUS is a distributed learning system, while COB is a MAS learning system
  • Agents in different channels have different focuses for learning
  • They work together toward the same goal.

INDUS has a heavy-weight database mechanism, while COB aims at a light-weight implementation
  • Ontology/KB are stored as atomic sentences
  • Interface for dummies, not for gurus.
  • Data sources are usually small but change quickly, and their number is huge.
  • Queries use the inference power of the ontology language.


Semantic Web meets MAS

COB is an application of MAS learning from data on the web
  • Learn new concepts from instances
  • Validate concepts of other agents/humans
  • The learner can take any form: BayesNet, Neural Net, Decision Tree, KNN

Everything is about semantics
  • Agents share an ontology but also have a dialect issue
  • Small pieces of semantics are carried by agents and aggregated in the “home”
  • Guess semantics from labeled instances.

An application that shows how to implement proof and trust on the Semantic Web


Ready Techniques

Dynamic knowledge sharing
  • RSS (RDF Site Summary): answers questions like "Who wrote this?", "When was this published?", and "What is/are the topic(s) of discussion?"
  • RSS is widely used for news aggregation and automatic news discovery.

Grid/Social Computation
  • Grid: distributes the computation task across the internet and composes the results together.
  • Blog and Wiki: easy-to-use site building tools, instead of an HTML editor. Topics are refined by the effort of a community.

Peer-to-peer communication
  • A local repository can be shared with other peers
  • The other peer can be an agent in COB!

However, they all somewhat lack semantics. The unfiltered information may flood the user.


Collaborative Ontology Building Example FOAF

http://xml.mfd-consult.dk/foaf/explorer/

FOAF is an acronym for Friend of a Friend, an experimental project and vocabulary for the Semantic Web.

It is based on the idea of a machine-readable version of the current World Wide Web, with homepages, mailing lists, travel itineraries, calendars, address books and the like.

Everyone can join and add their own information

It’s RDF based


Collaborative Ontology Building Example

Wikipedia

An open encyclopedia

170,000 concepts in English alone, more in other languages.

Everyone can edit any page.

Based on the assumption that most people are nice

And it’s proven true!

Limitation: the relations between items are not formal, and it is for humans to read only (at least for now)


Collaborative Ontology Building Example Open Directory Project

http://www.dmoz.org/

60,000 editors
460,000 concepts

Collaborative taxonomy building

Open to everyone

Limitation: Taxonomy only





System design

[Figure: system architecture in four parts.
  A: OntoWiki with OWL-like syntax, used by human experts
  B: Semantic RSS-aware channels over email lists, newsgroups, forums, blogs, wikis, and P2P nodes
  C: Agents (ontology mining, browser) and ontology alignment
  D: Ontology repository (version control, redundancy check, conflict check, cross validation)]


Part A (1): OntoWiki

Everyone can edit any concept

Version control is enabled

Ontology-guided editing

Should have an ontology visualizer


Part A (2): OWL-like syntax

A subset of OWL is used

Single statements are RDF-like triples: [subject] [predicate] [object]

Namespaces are used: cob:instanceOf, owl:Class, rdfs:subClassOf

The core COB language is defined in its own namespace (terms listed below)

// COB terms
cob:equals cob:documentation

// OWL terms
owl:AllDifferent owl:allValuesFrom owl:backwardCompatibleWith owl:cardinality owl:Class owl:complementOf owl:DatatypeProperty owl:DeprecatedClass owl:DeprecatedProperty owl:differentFrom owl:disjointWith owl:distinctMembers owl:equivalentClass owl:equivalentProperty owl:FunctionalProperty owl:hasValue owl:imports owl:incompatibleWith owl:intersectionOf owl:InverseFunctionalProperty owl:inverseOf owl:maxCardinality owl:minCardinality owl:Nothing owl:ObjectProperty owl:oneOf owl:onProperty owl:Ontology owl:priorVersion owl:Restriction owl:sameAs owl:someValuesFrom owl:SymmetricProperty owl:Thing owl:TransitiveProperty owl:unionOf owl:versionInfo
rdf:List rdf:nil rdf:type
rdfs:comment rdfs:Datatype rdfs:domain rdfs:label rdfs:Literal rdfs:range rdfs:subClassOf rdfs:subPropertyOf


Part A (3): Instance Example

Source (wiki page BaoJie):

# [cob:Instance]
# [cob:instanceOf] [Student]
# [cob:instanceOf] [Chinese]
# [cob:equals][ 鲍捷 ]
# [hasSurname] Bao
# [hasFirstname] Jie
# [worksOn] [semanticWeb]
# [worksOn] [MAS]
# [worksOn] [complexSystem]
# [advisedBy] [Honavar]
# [memberOf] [aiLab]
# [hasEmail] [email protected]
# [hasHomepage] http://www.cs.iastate.edu/~baojie
# [cob:documentation] Hi, I love cats

Screen shows:

BaoJie

cob:Instance
cob:instanceOf Student?
cob:instanceOf Chinese?
cob:equals 鲍捷
hasSurname Bao
hasFirstname Jie
worksOn semanticWeb?
worksOn MAS?
worksOn complexSystem?
advisedBy Honavar?
memberOf aiLab?
hasEmail [email protected]
hasHomepage http://www.cs.iastate.edu/~baojie
cob:documentation Hi, I love cats

Edit this page   More info...   Attach file...


Part A (4): Name Space

Java-like package naming, which shows the relatedness of concepts even when they don’t inherit from the same concept.

Packages form a DAG

Internationalization is enabled

//cob:Thing.Country.US.Iowa.Ames.ISU
//cob:Thing.Education.University.Iowa.ISU
[cob:instanceOf] [PublicUniversity]
[cob:instanceOf] [dmoz:University]
[cob:equals] [Iowa State University]

// cobZH: 事物 . 美国大学 . 艾奥瓦州立大学
// (in English: Thing . US University . Iowa State University)
[cob:language] zh // Chinese
[cob:equals] [cob:Thing.Country.US.Iowa.Ames.ISU]

//cob:Thing.Education.University.Idaho.ISU
[cob:instanceOf] [PublicUniversity]
[cob:instanceOf] [dmoz:University]
[cob:equals] [Idaho State University]
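How package names expose relatedness can be illustrated with a small Python sketch. The helper names below are hypothetical, and treating the shared leading path components as the relatedness signal is an assumption about how such names might be compared, not a rule stated in the proposal.

```python
def split_name(qualified):
    """Split 'cob:Thing.Education.University.Iowa.ISU' into prefix and path parts."""
    prefix, _, path = qualified.partition(":")
    return prefix, path.split(".")


def shared_prefix(name_a, name_b):
    """Return the leading package components two names have in common."""
    prefix_a, parts_a = split_name(name_a)
    prefix_b, parts_b = split_name(name_b)
    if prefix_a != prefix_b:
        return []
    common = []
    for a, b in zip(parts_a, parts_b):
        if a != b:
            break
        common.append(a)
    return common


def short_name(qualified):
    """The last path component, e.g. 'ISU'."""
    return split_name(qualified)[1][-1]


iowa_geo = "cob:Thing.Country.US.Iowa.Ames.ISU"
iowa_edu = "cob:Thing.Education.University.Iowa.ISU"
idaho_edu = "cob:Thing.Education.University.Idaho.ISU"

print(shared_prefix(iowa_edu, idaho_edu))  # related as universities
print(shared_prefix(iowa_geo, iowa_edu))   # related only at the root
print(short_name(iowa_edu) == short_name(idaho_edu))  # same short name, different concepts
```

The two ISU entries share a short name but differ in their full paths, which is what lets the repository keep Iowa State and Idaho State apart.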


Part B: Semantic RSS

RSS has no semantics

We can use Dublin Core to enhance RSS

Keywords are concepts or concept candidates in the ontology

Agents listen to S-RSS channels and discover new concepts

<channel rdf:about="http://boole.cs.iastate.edu:9090/COB/">
  <title>COB Project</title>
  <link>http://boole.cs.iastate.edu:9090/COB/</link>
  <description>AI Ontology</description>
  <language>en-us</language>
  <items>
    <rdf:Seq>
      <rdf:li rdf:resource="http://boole.cs.iastate.edu:9090/COB/Wiki.jsp?page=Main" />
    </rdf:Seq>
  </items>
</channel>

<item rdf:about="http://boole.cs.iastate.edu:9090/COB/Wiki.jsp?page=Main">
  <title>Main</title>
  <link>http://boole.cs.iastate.edu:9090/COB/Wiki.jsp?page=Main</link>
  <description>129.186.93.7 changed this page on Wed Dec 03 19:18:23 CST 2003:<br /><hr /><br /></description>
  <wiki:version>27</wiki:version>
  <wiki:diff>http://boole.cs.iastate.edu:9090/COB/Diff.jsp?page=Main&amp;r1=-1</wiki:diff>
  <dc:date>2003-12-04T01:18:23Z</dc:date>
  <dc:contributor>
    <rdf:Description>
      <rdf:value>129.186.93.7</rdf:value>
    </rdf:Description>
  </dc:contributor>
  <wiki:history>http://boole.cs.iastate.edu:9090/COB/PageInfo.jsp?page=Main</wiki:history>
</item>
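An agent listening to such a channel could pick up concept candidates roughly as sketched below, using Python's standard `xml.etree.ElementTree`. Several things here are assumptions for illustration: that keywords are carried in Dublin Core `dc:subject` elements, the sample feed (including the example.org URL), and the function name `concept_candidates`. The feed above does not itself contain keywords.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF, RSS 1.0, and Dublin Core.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up feed: one item annotated with two keywords.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/post/1">
    <title>Example post</title>
    <dc:subject>SemanticWeb</dc:subject>
    <dc:subject>Election2004</dc:subject>
  </item>
</rdf:RDF>"""


def concept_candidates(feed_xml, known_concepts):
    """Return keywords from the feed that are not yet concepts in the ontology."""
    root = ET.fromstring(feed_xml)
    candidates = []
    for item in root.findall("rss:item", NS):
        for subject in item.findall("dc:subject", NS):
            keyword = (subject.text or "").strip()
            if keyword and keyword not in known_concepts and keyword not in candidates:
                candidates.append(keyword)
    return candidates


print(concept_candidates(FEED, {"SemanticWeb"}))
```

Keywords that already match a known concept act as labeled instances for the item, while unmatched ones are only candidates that still need cross-validation.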


Part C (1): Agent

Each agent does
  • Trace back the information source and check its credibility.
  • Do filtering and text normalization
  • Extract new concepts from instances
  • Extract possible general relationships (like [cob:alsoSee]) between concepts

And they may differ
  • They need not use the same learning algorithm
  • Learning from email headers is different from learning from free text content

Dialect
  • Agent 1: I listen to the Idaho S.U. mailing list and know ISU = Idaho State University
  • Agent 2: I watch a blog in Iowa and know ISU = Iowa State University

Communication helps
  • Agent 1: P([M. Jackson]^[CountryMusic])=0.1
  • Agent 2: P([M. Jackson]^[CountryMusic])=0.03
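The proposal does not say how agents reconcile differing estimates. One simple possibility, shown purely as an assumption, is a credibility-weighted average; the function name and the credibility weights are hypothetical.

```python
def pool_estimates(estimates):
    """Credibility-weighted average of probability estimates.

    estimates: list of (probability, credibility) pairs with credibility >= 0.
    This pooling rule is an illustrative assumption, not part of COB.
    """
    total = sum(credibility for _, credibility in estimates)
    if total <= 0:
        raise ValueError("at least one estimate needs positive credibility")
    return sum(p * credibility for p, credibility in estimates) / total


# The two agents' estimates from the slide, with equal (assumed) credibility.
equal = pool_estimates([(0.1, 1.0), (0.03, 1.0)])

# The same estimates if Agent 2 were trusted three times as much (assumed weights).
weighted = pool_estimates([(0.1, 1.0), (0.03, 3.0)])
print(equal, weighted)
```

Either way, both agents end up with a low shared estimate, which is consistent with the statement that [M. Jackson] [isn’t] a [country music artist].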


Part C (2): Ontology Alignment

Do mapping on restricted cases
  • When an agent or expert doubts whether some concepts are the same, it asks the OntologyAlignmenter, supplying instance sets
  • Merge detected duplicate concepts like IraqWar and WarInIraq
  • Be careful: UniversityOfWashington and WashingtonUniversity are different. That can be learned from instances.

Manual alignment enabled, too
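Deciding sameness from instance sets can be sketched with a set-overlap score. The Jaccard measure, the 0.8 threshold, the function names, and the instance identifiers below are all assumptions for illustration; the proposal does not specify how the OntologyAlignmenter compares instances.

```python
from itertools import combinations


def jaccard(instances_a, instances_b):
    """Overlap of two instance sets: |A and B| / |A or B|."""
    a, b = set(instances_a), set(instances_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_merges(concepts, threshold=0.8):
    """Return (concept, concept, score) for pairs whose instances mostly coincide."""
    suggestions = []
    for name_a, name_b in combinations(sorted(concepts), 2):
        score = jaccard(concepts[name_a], concepts[name_b])
        if score >= threshold:
            suggestions.append((name_a, name_b, score))
    return suggestions


# Made-up instance identifiers (e.g. articles tagged with each concept).
concepts = {
    "IraqWar": {"d1", "d2", "d3", "d4"},
    "WarInIraq": {"d1", "d2", "d3", "d4", "d5"},
    "UniversityOfWashington": {"u1", "u2"},
    "WashingtonUniversity": {"u3", "u4"},
}
print(suggest_merges(concepts))
```

The two war concepts share most instances and are suggested for merging, while the two universities share none despite their similar names, so they stay separate.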


Part D : Ontology Repository

Version control
  • Keep versions for each concept, lock mature concepts, detect malicious changes

Redundancy check
  [I.S.U] [cob:instanceOf] [University]
  [I.S.U] [cob:alsoSee] [Cyclone]
  [Iowa State University] [cob:instanceOf] [PublicUniversity]
  [Iowa State University] [cob:alsoSee] [Cyclone]
  [PublicUniversity] [cob:subClassOf] [University]

Conflict check
  [ISU] [locatedIn] [Ames]
  [ISU] [locatedIn] [Des Moines]

Cross validation
  • Score each agent and expert for its credibility
  • Check the soundness of an input against its peer inputs.

Refactoring (rename, remove, merge)
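The redundancy and conflict checks on the example sentences above can be sketched over plain triples. The rules used here are assumptions for illustration: a conflict is two different values for a predicate declared single-valued (the `FUNCTIONAL` set is hypothetical), and a redundancy candidate is a pair of instances with identical non-class statements whose classes lie on the same subclass chain.

```python
from collections import defaultdict
from itertools import combinations

FUNCTIONAL = {"locatedIn"}  # assumption: predicates allowed only one value


def find_conflicts(triples, functional=FUNCTIONAL):
    """Map (subject, predicate) to its values when a functional predicate has several."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p in functional:
            values[(s, p)].add(o)
    return {key: sorted(objs) for key, objs in values.items() if len(objs) > 1}


def ancestors(cls, triples):
    """The class itself plus everything reachable via cob:subClassOf."""
    seen, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "cob:subClassOf" and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def redundancy_candidates(triples):
    """Pairs of instances with the same facts and compatible classes."""
    classes, facts = defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        if p == "cob:instanceOf":
            classes[s].add(o)
        elif p != "cob:subClassOf":
            facts[s].add((p, o))
    pairs = []
    for a, b in combinations(sorted(classes), 2):
        same_facts = bool(facts[a]) and facts[a] == facts[b]
        compatible = any(
            ca in ancestors(cb, triples) or cb in ancestors(ca, triples)
            for ca in classes[a] for cb in classes[b]
        )
        if same_facts and compatible:
            pairs.append((a, b))
    return pairs


triples = [
    ("I.S.U", "cob:instanceOf", "University"),
    ("I.S.U", "cob:alsoSee", "Cyclone"),
    ("Iowa State University", "cob:instanceOf", "PublicUniversity"),
    ("Iowa State University", "cob:alsoSee", "Cyclone"),
    ("PublicUniversity", "cob:subClassOf", "University"),
    ("ISU", "locatedIn", "Ames"),
    ("ISU", "locatedIn", "Des Moines"),
]
print(redundancy_candidates(triples))
print(find_conflicts(triples))
```

The output is a suggestion only: a flagged pair or conflict would still go through cross validation before any merge or removal.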


Summary

What’s new
  • Light-weight ontology editor for the community
  • Collaborative, distributed ontology learning based on logic decomposition
  • Semantic extension to RSS
  • Multiagent ontology mining from trusted channels.
  • Ontology management based on proof and trust

COB doesn't want to
  • Solve ontology mapping in general
  • Solve ontology extraction from free text in general
