
Almaden Services Research

Almaden Research Center, San Jose, CA 20 April 2006

Multifaceted approach to ontologizing the ONTOLOG content

Rooted in pragmatism; tools, services, and standards; and social collaboration

E. Michael (Max) Maximilien, Almaden Services Research

http://maximilien.org


Approach and thesis

Two key problems

– Lack of pragmatism in the goals of ontologies

– Heterogeneity of usage and use cases

Summary of approach

– Simple tagging for human collaboration (folksonomies) as well as rating systems for content parts

– Convert audio automatically into an annotated text transcript

– Mining tools to automate annotation of content and infer taxonomies

– Ontology for outline of content

The secret sauce is in how we combine the semantics (i.e., the algorithm) and in the use cases we try to solve


Tagging and ratings – Human collaboration

Tagging

– Idiosyncratic

– Results in bag of tags forming folksonomies

– Various available services, e.g., http://del.icio.us, http://flickr.com, and so on

– Need incentives for humans, e.g., easier search

– Evolving into some form of “ontology” (see Peter Mika’s paper “Ontologies are us: A unified model of social networks and semantics” at ISWC 2005)
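The tagging bullets describe a bag of tags that can evolve toward an ontology. A minimal sketch, assuming tagging events are recorded as hypothetical `(user, tag, resource)` triples: count tag frequencies, then fold the triples into tag-to-tag co-occurrence counts per resource, similar in spirit to the folding Mika uses to derive lightweight semantics from a folksonomy. The resource names and sample data are illustrative only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagging record: (user, tag, resource).
Tagging = tuple[str, str, str]


def tag_frequencies(taggings: list[Tagging]) -> Counter:
    """Count how many times each tag was applied (the 'bag of tags')."""
    return Counter(tag for _, tag, _ in taggings)


def tag_cooccurrence(taggings: list[Tagging]) -> Counter:
    """For each unordered tag pair, count the resources that carry both tags."""
    tags_by_resource: dict[str, set[str]] = {}
    for _, tag, resource in taggings:
        tags_by_resource.setdefault(resource, set()).add(tag)
    pairs: Counter = Counter()
    for tags in tags_by_resource.values():
        # Sorting makes each pair key order-independent, e.g. ("ontology", "owl").
        for a, b in combinations(sorted(tags), 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    data = [
        ("alice", "ontology", "session-1"),
        ("alice", "owl", "session-1"),
        ("bob", "ontology", "session-1"),
        ("bob", "ontology", "session-2"),
        ("bob", "uima", "session-2"),
        ("carol", "owl", "session-2"),
    ]
    print(tag_frequencies(data).most_common(2))
    print(tag_cooccurrence(data).most_common())
```

Pairs with high counts are candidates for related concepts; turning them into an actual ontology would still need human review.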

Ratings

– Enables feedback

– Rate ratings to avoid collusion

– Similar to http://digg.com, Amazon’s rating system, and eBay’s reputation system (various works in the literature)
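One way to read “rate ratings” is to weight each rating by how useful other users found that rater’s ratings. The sketch below assumes such meta-ratings are stored as hypothetical helpful/unhelpful vote counts; the `Rating` class, the smoothing prior, and the item id are assumptions for illustration, not the scheme used by digg, Amazon, or eBay.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rating:
    rater: str
    item: str
    score: float               # e.g., 1 to 5 stars
    helpful_votes: int = 0     # meta-ratings: others found this rating helpful
    unhelpful_votes: int = 0   # meta-ratings: others found it unhelpful


def rater_reputation(ratings: list[Rating], prior: float = 1.0) -> dict[str, float]:
    """Smoothed fraction of helpful meta-ratings per rater (in (0, 1) for prior > 0)."""
    helpful: dict[str, int] = {}
    total: dict[str, int] = {}
    for r in ratings:
        helpful[r.rater] = helpful.get(r.rater, 0) + r.helpful_votes
        total[r.rater] = total.get(r.rater, 0) + r.helpful_votes + r.unhelpful_votes
    return {u: (helpful[u] + prior) / (total[u] + 2 * prior) for u in total}


def weighted_item_score(ratings: list[Rating], item: str) -> Optional[float]:
    """Reputation-weighted mean score for one item, or None if it has no ratings."""
    rep = rater_reputation(ratings)
    relevant = [r for r in ratings if r.item == item]
    if not relevant:
        return None
    weight_sum = sum(rep[r.rater] for r in relevant)
    return sum(rep[r.rater] * r.score for r in relevant) / weight_sum


if __name__ == "__main__":
    ratings = [
        Rating("alice", "slides-2006-04-20", 5, helpful_votes=8),
        Rating("mallory", "slides-2006-04-20", 1, unhelpful_votes=6),
    ]
    # The plain mean would be 3.0; down-weighting mallory pulls it toward 5.
    print(weighted_item_score(ratings, "slides-2006-04-20"))
```

With `prior=1.0`, a rater with no meta-ratings gets weight 0.5. On its own this does not stop colluders from voting each other’s ratings up; it only shows where meta-ratings enter the computation.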


Audio content

Automated transcript

– Use services to convert audio to text transcript

– Some services, e.g., http://podzinger.com, also annotate the transcript and do more than closed captioning

– May involve human collaboration to gradually improve content (especially resolving context errors)
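The human-collaboration step can be pictured as a review queue over machine-generated transcript segments. A minimal sketch, assuming the transcription service returns timed segments with a confidence score (an assumption; the slide does not say what podzinger or other services return). The `Segment` class, the 0.6 threshold, and the sample text are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_s: float          # start time within the recording, in seconds
    end_s: float            # end time, in seconds
    text: str               # machine-generated transcript text
    confidence: float       # recognizer confidence in [0, 1] (assumed available)
    corrected: bool = False


def needs_review(segments: list[Segment], threshold: float = 0.6) -> list[int]:
    """Indices of uncorrected segments whose confidence is below the threshold."""
    return [i for i, s in enumerate(segments)
            if not s.corrected and s.confidence < threshold]


def apply_correction(segments: list[Segment], index: int, text: str) -> None:
    """Store a human correction and mark the segment as reviewed."""
    seg = segments[index]
    seg.text = text
    seg.corrected = True


if __name__ == "__main__":
    transcript = [
        Segment(0.0, 4.2, "welcome to the ontolog session", 0.92),
        Segment(4.2, 9.8, "today we discuss you eye m a", 0.41),
    ]
    print(needs_review(transcript))  # segments a human should check
    apply_correction(transcript, 1, "today we discuss UIMA")
    print(needs_review(transcript))
```

Corrected segments drop out of the queue, so repeated review passes gradually improve the transcript.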

Issues

– Some ONTOLOG audio recordings (podcasts) are low-quality MP3s

– Static noise and “voice storms”


Mining

Automatic annotation of content

– Mature tool set in UIMA (Unstructured Information Management Architecture)

– Others (?)

– Generate initial taxonomy

– Continual process to update annotation
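Generating an initial taxonomy from annotations can be approached in many ways. Swapped in here, purely as an illustration, is a simple co-occurrence subsumption heuristic; it is not a UIMA feature. The sketch assumes the annotators’ output has already been reduced to a set of terms per document; the document ids, terms, and the 0.8 threshold are hypothetical.

```python
from collections import defaultdict


def infer_subsumptions(doc_terms: dict[str, set[str]],
                       threshold: float = 0.8) -> list[tuple[str, str]]:
    """Return sorted (parent, child) pairs where parent subsumes child.

    Term x subsumes term y when most documents annotated with y are also
    annotated with x (P(x|y) >= threshold), but not vice versa (P(y|x) < 1).
    """
    docs_with: dict[str, set[str]] = defaultdict(set)
    for doc, terms in doc_terms.items():
        for t in terms:
            docs_with[t].add(doc)

    pairs = []
    for x in docs_with:
        for y in docs_with:
            if x == y:
                continue
            both = len(docs_with[x] & docs_with[y])
            p_x_given_y = both / len(docs_with[y])
            p_y_given_x = both / len(docs_with[x])
            if p_x_given_y >= threshold and p_y_given_x < 1.0:
                pairs.append((x, y))
    return sorted(pairs)


if __name__ == "__main__":
    annotations = {
        "d1": {"ontology", "owl"},
        "d2": {"ontology", "owl"},
        "d3": {"ontology", "uima"},
        "d4": {"ontology"},
    }
    print(infer_subsumptions(annotations))
```

Because the result is recomputed from the current annotations, rerunning it after each annotation pass fits the continual-update bullet. Terms that always co-occur have P(y|x) = 1 in both directions, so neither subsumes the other; they are better treated as synonym candidates.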

Dr. David Ferrucci (IBM Research), lead architect of the UIMA project, to present to the community on May 11, 2006


Ontology

Outline

– Create initial outline of site content with some upper ontology

– Reuse existing ontology
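A minimal sketch of how an initial outline could support cataloguing, assuming the outline is a simple subclass tree under a few upper-level classes. The class names (`Event`, `Artifact`, `Session`, and so on) and the content ids are hypothetical, not taken from any existing upper ontology; a real outline would more likely reuse an existing ontology, as the slide suggests.

```python
# Hypothetical outline of site content: child class -> parent class.
SUBCLASS_OF = {
    "Session": "Event",
    "Presentation": "Artifact",
    "Podcast": "Artifact",
    "Transcript": "Artifact",
}


def ancestors(cls: str, subclass_of: dict[str, str]) -> list[str]:
    """Parent, grandparent, ... of cls (assumes the outline has no cycles)."""
    chain = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        chain.append(cls)
    return chain


def catalogue(items: dict[str, str], cls: str,
              subclass_of: dict[str, str] = SUBCLASS_OF) -> list[str]:
    """Sorted ids of items typed as cls or as any subclass of cls."""
    return sorted(item for item, item_cls in items.items()
                  if item_cls == cls or cls in ancestors(item_cls, subclass_of))


if __name__ == "__main__":
    items = {
        "session-2006-04-20": "Session",
        "slides-2006-04-20": "Presentation",
        "audio-2006-04-20": "Podcast",
    }
    print(catalogue(items, "Artifact"))
```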

IMO this ontology can be specific to ONTOLOG

What are the primary goals for this outline and ontology?

– Cataloguing

– Search (why not just use Google services?)

– Statistics (why not just use Amazon’s Alexa services?)

– Others (?)


Thank You

Merci

Grazie

Gracias

Obrigado

Danke

Japanese

English

French

Russian

German

Italian

Spanish

Brazilian Portuguese

Arabic

Traditional Chinese

Simplified Chinese

Hindi

Tamil

Thai

Korean