Hans Uszkoreit, German Research Center for Artificial Intelligence and Saarland University

Semantic Annotation and Hyperlinking for Associative Digital Memories: Vision, Methods, Applications


Page 1: Hans Uszkoreit, German Research Center for Artificial Intelligence and Saarland University

Semantic Annotation and Hyperlinking for Associative Digital Memories

Vision, Methods, Applications

Page 2: The Vision of the Semantic Web

SEMANTIC ANNOTATION AND HYPERLINKING • EUROLAN 2003 © 2003 H. Uszkoreit

A new development in web technology is aimed at structuring some of the rich knowledge contained in unstructured data. The envisaged result will be a growing layer of formalized knowledge above and associated with the wealth of unstructured data. A multitude of ontologies will provide the conceptual texture for annotating rich unstructured content. The result will be a semantically structured, densely associated web of knowledge.

Page 3: Structure of the Semantic Web

The well-known layer cake of the Semantic Web proposed by Tim Berners-Lee employs...

• XML for markup,

• relational ontologies as the basis for describing information resources,

• RDF coded in XML as the language for such semantic descriptions,

• a logic language such as OWL coded in RDF as the format for further logical descriptions such as rules and constraints.

Page 4: Semantic Web and Language Technology

I will first point out five central issues for language technology resulting from the vision of the Semantic Web.

I will then briefly argue that there are no feasible models for realizing the Semantic Web through creation or evolution.

Next, I will argue that there is a less ambitious stage of a semantically enriched web that can be realized gradually.

This vision is built on the notion of associative digital memories lying in between digital repositories and digital knowledge.

Then I will describe the language and web technologies needed for realizing such digital memories.

Page 5: Semantic Web and Language Technology 1

The employment of language technology for the construction of useful ontologies:

One of the shortcomings of hand-crafted AI ontologies has been their artificial nature. Useful ontologies rarely meet the high aesthetic standards of philosophers or domain-specialized theoreticians. Can data-oriented language technology facilitate the detection of useful ontologies that reflect the needs and daily tasks of their users?

Page 6: Semantic Web and Language Technology 2

The exploitation of Semantic Web ontologies for LT applications such as information extraction:

• Domain modelling is a serious bottleneck for many language technology applications. Can the Semantic Web movement help us by providing well-designed ontologies for a multitude of knowledge domains?

Page 7: Semantic Web and Language Technology 3

The challenge of (partially) automating the detection and annotation of concepts:

• One of the major shortcomings of the original Semantic Web vision is its reliance on extensive hand annotation of large volumes of digital resources. As we know from daily experience, content developers (authors) do not even exploit the modest means for encoding meta-information that HTML provides. They do not have the time and patience to find and insert the most useful hyperlinks. How can one expect that the web will become semantified by human annotation?

Page 8: Semantic Web and Language Technology 4

The utilization of the Semantic Web as a resource for machine learning in NLP:

• Supervised learning from hand-annotated texts plays a major role in language technology research and development. Will the Semantic Web movement create large volumes of annotated texts? Can these texts be used for machine learning techniques that improve topic detection, information extraction, question answering and other language technologies? Can systems for automatic annotation be trained in a bootstrapping fashion?

Page 9: Is the vision realistic?

Authors make little use of the available means of annotation/markup such as

• hyperlinks

• metainformation

The enrichment of the available volumes of digital information is a huge task.

Page 10: Semantic Web and Language Technology 5

The relationship between the Semantic Web and multilinguality:

• The planned dense semantic markup will facilitate cross-lingual navigation and information retrieval. Will the Semantic Web really contribute to overcoming language barriers by making information more accessible across languages? Will contents in all languages be annotated and crosslinked at the same time and in comparable proportions? What is the role of language technology in this process? Will the Semantic Web help to reduce the knowledge gap among or will this gap be widened?

Page 11: outline

the concept of digital memory

automatic semantic hyperlinking

personal digital memories

collective digital memories

conclusions

Page 12: more than metaphors?

digital libraries

digital archives

digital knowledge

digital memories

Page 13: (associative) memory

stored information

associatively interconnected

immediately accessible by association

grounded in experience

Special form of memory: episodic memory

Page 14: knowledge

stored information

strongly semantically interconnected

immediately accessible

suited for inferencing

grounded in more basic knowledge and perception

Page 15: associations

neighborhood in a high-dimensional space

accessibility paths

connections in a graph
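The readings of "association" listed on this slide can be made concrete with a small sketch. The item names, the three-dimensional vectors, and the link graph below are invented purely for illustration; the talk does not prescribe any particular representation. Neighborhood is approximated here by cosine similarity, and an accessibility path by the shortest chain of links.

```python
import math
from collections import deque

# Hypothetical toy data: item names, vectors, and links are invented for illustration.
VECTORS = {
    "wrist radio": (0.9, 0.1, 0.0),
    "cell phone": (0.8, 0.2, 0.1),
    "oil painting": (0.0, 0.1, 0.9),
}

LINKS = {
    "wrist radio": ["cell phone"],
    "cell phone": ["wireless PBX"],
    "wireless PBX": [],
    "oil painting": [],
}


def cosine(u, v):
    """Cosine similarity of two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest(item):
    """Association as neighborhood: the most similar other item in vector space."""
    others = (name for name in VECTORS if name != item)
    return max(others, key=lambda name: cosine(VECTORS[item], VECTORS[name]))


def access_path(start, goal):
    """Association as accessibility path: shortest chain of links (breadth-first search)."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in LINKS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the goal cannot be reached by following links
```

In this toy setup, similarity needs no explicit link at all, whereas the path view only finds items that someone (or some program) has connected.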

Page 16: hyperlinks

the concept behind the success of the Internet

hypertext: associatively interconnected text

hypermedia: associatively interconnected medial representation of information

association is more than a reference: it is an access mechanism

Page 17: THE ONE-CLICK APPROACH

New wireless voice technology introduced

Posted at 5:09 PM PT, Feb 8, 1999

By Stephen Lawson, InfoWorld Electric

NTT Labs on Monday brought Dick Tracy into the enterprise, introducing a wireless voice and data system that can use a wrist radio at the Demo 99 conference.

AirWave technology, demonstrated for the first time in the United States at this week's conference in Indian Wells, Calif., is based on a wireless PBX. Small, handheld phones -- and a wrist radio that looks like an oversized watch -- can be used to make voice calls and exchange data around a building or campus. The handheld phones can be switched to a public cellular mode to become conventional cell phones.

Company representatives touted the system as offering higher voice quality than a typical PBX. AirWave is based on NTT's Personal Handyphone System, which is currently deployed by more than 600 users in Japan, according to the company.

Modems built in to both devices allow users to plug in a notebook or portable device for dial-up data connections as fast as 64Kbps. Users can exchange files or e-mail, or access a LAN or the Internet. There is no airtime charge for AirWave communications in the building or campus. AirWave systems are scheduled to be available through distribution partners by the end of this year, priced as low as $400 per user.

NTT Labs, the research and development arm of NTT Corp., in Tokyo, can be reached at www.nttlabs.com.

Company Info | Homepage | Other News | Products | Indicators | Contact Experts | Contacts | Accounts

Page 18: Language Technology

recognition of domain-relevant named entities with statistical and rule-based methods

tolerance with respect to morphological and syntactic variation

recognition of synonyms

exploitation of thesauri and ontologies with conceptual relations

recognition of syntactic functions and thematic roles for appropriate anchor specification

annotation of documents with hyperlink designators
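As a rough illustration of the first and last items on this list, the following sketch marks entity mentions with a small gazetteer and returns them as annotation spans. It is a deliberately simple, rule-based stand-in: the gazetteer entries and concept identifiers are hypothetical, case-insensitive matching and listed plural forms are only a crude substitute for real morphological analysis, and nothing here reflects the system actually used by the author.

```python
import re

# Hypothetical gazetteer: surface variants mapped to invented concept identifiers.
GAZETTEER = {
    "NTT Labs": "company:ntt-labs",
    "NTT Corp.": "company:ntt",
    "AirWave": "product:airwave",
    "wrist radio": "device:wrist-radio",
    "wrist radios": "device:wrist-radio",  # listed morphological variant
}


def annotate(text):
    """Return (start, end, surface, concept) tuples for all gazetteer matches.

    Longer entries come first in the pattern so that 'wrist radios' is preferred
    over 'wrist radio'; matching ignores case as a crude form of variation tolerance.
    """
    variants = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in variants), re.IGNORECASE)
    lookup = {variant.lower(): concept for variant, concept in GAZETTEER.items()}
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]
```

Each returned span could then serve as the anchor of a hyperlink designator; choosing the right anchor among overlapping or ambiguous candidates is where the syntactic and thematic analysis mentioned above would come in.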

Page 19: Web Technology

Hyperlinks need to be:

relational

typed

external

possibly multidirectional

Page 20: functional hyperlinks

today's hyperlinks are

functional

unidirectional

untyped

Page 21: relational hyperlinks

Relational Hyperlink

{person, homepage
 person, email-address}

Relational Labelled Hyperlink

{person, "homepage", homepage
 person, "email", email-address}

Relational Typed Labelled Hyperlink

person: {person, "homepage", homepage
         person, "email", email-address}
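One possible way to represent a relational typed labelled hyperlink in code is sketched below. The class and field names (`Link`, `RelationalHyperlink`, `anchor_type`) and the example person are assumptions made for this illustration, not part of the talk. The point is only that one anchor carries a type and several labelled targets rather than a single destination.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """One labelled relation from an anchor to a target."""
    label: str   # e.g. "homepage"
    target: str  # e.g. a URL or an email address


@dataclass
class RelationalHyperlink:
    """A typed anchor that carries several labelled links instead of one target."""
    anchor: str
    anchor_type: str  # e.g. "person"
    links: list = field(default_factory=list)

    def targets(self, label):
        """All targets reachable from this anchor under the given label."""
        return [link.target for link in self.links if link.label == label]


# Hypothetical example data; the name and addresses are placeholders.
jane = RelationalHyperlink(
    anchor="Jane Doe",
    anchor_type="person",
    links=[
        Link("homepage", "https://example.org/~jdoe"),
        Link("email", "mailto:jdoe@example.org"),
    ],
)
```

A functional hyperlink in today's sense would correspond to the special case of exactly one unlabelled link and no anchor type.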

Page 22: Relational Hyperlinks as Types

General schema (type: type)

FunRef1    target1
FunRef2    target2
...        ...
FunRefn    targetn

Example (type: company)

Homepage    homepages
Stocks      get–from–NYSE
News        (type: news)
    CNR–briefs    cnr–bulletin
    Paperball     paperball
    Reuters       get–reuters
    OlderNews     cnr–archive

Example (type: ka–customer)

Key Account         type–of–account
MarketingContact    ka–manager
AccountAccess       secure–connect–ka–DB

Page 23: Link Ontologies

Link Ontologies and link DBs

Page 24: Customized Ontologies

Ontologies can be customized by

Extension

Expansion

Overwriting

Merging

Page 25: Recursion

A type can have an attribute a with the value

person :=
  name:
    title: string
    first_name: string
    other_given_names: string
    last_name: string
    aka.: string
  ...
  father: person
  ...

Page 26: Recursion

The embedded type can be expanded:

person :=
  name:
    title: string
    first_name: string
    other_given_names: string
    last_name: string
    aka.: string
  ...
  father:
    name: name
    ...
    father: person
    ...

Page 27: Multiple Inheritance

location

palace

building

Page 28: Equality

person :=
  name:
    title: string
    first_name: string
    other_given_names: string
    last_name: [1] string
    aka.: string
  ...
  father:
    name:
      last_name: [1] string
    ...
    father: person
    ...

(The shared index [1] marks the two last_name values as identical.)
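The recursive `person` type and its equality constraint can be mimicked with nested dictionaries. This is only a sketch under stated assumptions: the example names are invented, and the shared index is modelled here as a pair of attribute paths that must hold the same value, which is one of several ways to express such a constraint.

```python
# Hypothetical person record; the names are invented for illustration.
person = {
    "name": {"first_name": "Anna", "last_name": "Berg"},
    "father": {
        "name": {"first_name": "Karl", "last_name": "Berg"},
        "father": None,  # recursion: another person record could go here
    },
}

# The shared index expressed as a pair of attribute paths that must agree.
EQUALITIES = [
    (("name", "last_name"), ("father", "name", "last_name")),
]


def get_path(record, path):
    """Follow a sequence of attribute names into nested dictionaries."""
    value = record
    for attribute in path:
        value = value[attribute]
    return value


def satisfies_equalities(record, equalities=EQUALITIES):
    """True if every pair of coindexed paths holds the same value."""
    return all(get_path(record, a) == get_path(record, b) for a, b in equalities)
```

A record in which the father's last name differs would fail the check, which is exactly the kind of inconsistency such a constraint is meant to expose.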

Page 29: Extension

New attributes are added

examples:

• new attribute: restored

• new attribute for: citation in <Coleman et al.>

Page 30: Expansion

An atomic type is expanded into an AVM (attribute-value matrix)

location: Berlin

is expanded into an address

technique: oil_on_canvas

is expanded into canvas:, paints:, layers:, etc.

Page 31: Overwriting

The value of attributes can be overwritten

for corrections

alternative pointers to information sources

alternative representations
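Extension, expansion, and overwriting can be sketched as three small operations on an attribute-value record. The function names and the artwork record are hypothetical and chosen only to mirror the slide examples; each function returns a new record so that the original ontology entry stays untouched, which is a design choice of this sketch rather than something stated in the talk.

```python
import copy


def extend(record, attribute, value):
    """Extension: add an attribute the record did not have before."""
    if attribute in record:
        raise KeyError(f"{attribute!r} already exists; use overwrite instead")
    result = copy.deepcopy(record)
    result[attribute] = value
    return result


def expand(record, attribute, structure):
    """Expansion: replace an atomic value by a nested attribute-value structure."""
    if isinstance(record[attribute], dict):
        raise TypeError(f"{attribute!r} is already a structure")
    result = copy.deepcopy(record)
    result[attribute] = structure
    return result


def overwrite(record, attribute, value):
    """Overwriting: replace the value of an existing attribute."""
    if attribute not in record:
        raise KeyError(f"{attribute!r} does not exist; use extend instead")
    result = copy.deepcopy(record)
    result[attribute] = value
    return result


# Hypothetical artwork record; all field values are invented.
painting = {"location": "Berlin", "technique": "oil_on_canvas"}
step1 = extend(painting, "restored", "1998")
step2 = expand(step1, "location", {"city": "Berlin", "street": None})
step3 = overwrite(step2, "technique", "tempera_on_canvas")
```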

Page 32: Merging

The attributes of concepts from two ontologies can be merged

ontologies from different disciplines

or from a discipline and a metadata initiative

equality can be employed to state identity between values

example: the value of "creator" in the Dublin Core can be set equal to "author" in the BibTeX bibliography format
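The Dublin Core and BibTeX example can be sketched as a merge of two flat records in which one declared equality maps `author` onto `creator`. The `merge` function, the `EQUIVALENT` table, and the record values are assumptions for illustration; real metadata records are richer, and conflict handling would need a policy of its own.

```python
# Declared equalities: BibTeX attribute name -> Dublin Core attribute name.
EQUIVALENT = {"author": "creator"}


def merge(dc_record, bibtex_record, equivalent=EQUIVALENT):
    """Merge two flat records, renaming BibTeX attributes to their Dublin Core equals.

    Raises ValueError when two attributes declared equal carry different values.
    """
    merged = dict(dc_record)
    for attribute, value in bibtex_record.items():
        key = equivalent.get(attribute, attribute)
        if key in merged and merged[key] != value:
            raise ValueError(f"conflict on {key!r}: {merged[key]!r} vs {value!r}")
        merged[key] = value
    return merged


# Hypothetical records describing the same document; values are placeholders.
dublin_core = {"creator": "H. Uszkoreit", "title": "Semantic Annotation and Hyperlinking"}
bibtex = {"author": "H. Uszkoreit", "year": "2003"}
combined = merge(dublin_core, bibtex)
```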

Page 33: Problem: Ambiguity

Im Jahr 1942 wurde von Essen in einer kleinen Stadt in Südschweden geboren.

In the year 1942 von Essen was born in a small town in the south of Sweden.

"Essen" may be

• the name of a city

• the plural of "Esse" meaning smokestack

• the word for food

• a family name

• the name of a bank, "Von Essen Bank"
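To show why this is a problem for automatic hyperlinking, the sketch below lists the candidate readings from the slide and applies one very rough contextual rule. The rule itself ("von" directly before the token and "geboren" elsewhere in the sentence suggests a family name) is an assumption made up for this example, not a method from the talk; a real system would combine many such cues statistically or through parsing.

```python
# Candidate readings of the German surface form "Essen", taken from the slide.
READINGS = {
    "essen": [
        "city",
        "plural of 'Esse' (smokestack)",
        "food",
        "family name",
        "bank ('Von Essen Bank')",
    ],
}


def candidate_readings(token):
    """All readings an annotator would have to choose between for a token."""
    return READINGS.get(token.lower(), [])


def guess_reading(tokens, index):
    """Toy heuristic (an assumption, not from the talk): 'von' directly before
    the token plus 'geboren' in the sentence suggests a family name."""
    candidates = candidate_readings(tokens[index])
    lowered = [t.lower().strip(".,") for t in tokens]
    if index > 0 and lowered[index - 1] == "von" and "geboren" in lowered:
        if "family name" in candidates:
            return "family name"
    return None  # undecided: the ambiguity is left for a later component


sentence = "Im Jahr 1942 wurde von Essen in einer kleinen Stadt in Südschweden geboren."
tokens = sentence.split()
```

Without such context, a link anchored on "Essen" would have five competing targets, and picking the city by default would be wrong for this sentence.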

Page 34: Problem: Polysemy, Aspects, Views

Often a descriptor or designator can be used in different aspects of meaning.

One of the sources of this type of uncertainty is systematic polysemy.

Another one is the aspects or views associated with a context or a user type.

Page 35: Polysemy

The assembly takes five minutes.

The assembly is in Building Five.

The iBook has a G3 processor and a DVD drive.

The PowerBook can be checked out but the iBook is currently in use by the project BABEL.

CNN has a special at 5 p.m.

Then he became Senior Vice President of CNN.

Page 36: Aspects

The iBook has a G3 processor and a DVD drive.

The iBook is reduced by 15% in our clearance sale.

Peter Norman will answer your questions.

The new Department Chair is Peter Norman.

Page 37: The Ultimate Information Management

Provide:

the right information

to the right people

at the right time

and in the right form

Page 38: decision triggers

All kinds of forms requiring

• Approvals,

• Recommendations,

• Selections

Examples:

• Application for a Building Permit

• Credit application

• Request for a comment on a hiring decision

Good decision triggers contain information relevant for the decision or references to such pieces of information

Page 39: Possible Targets

Short Pieces of Information (e.g., translation into Spanish)

Regular Hyperlink (e.g., homepage)

DB Access (e.g., lookup of account status)

Start of a Process (e.g., start a credit check)

Notification of a person (e.g., send query to expert)

Search out of context (e.g., search in inter-, intra-, extranet)

Page 40: first step: densely hyperlinked texts

in the ideal case: every meaningful unit carries typed relational hyperlinks

words, names, symbols, pictures, elements of pictures,

Page 41: Possible Targets

Short Pieces of Information (e.g., translation into Spanish)

Regular Hyperlink (e.g., homepage)

DB Access (e.g., lookup of account status)

Start of a Process (e.g., start a credit check)

Notification of a person (e.g., send query to expert)

Search in a context (e.g., search in inter-, intra-, extranet)
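Because a link target may be a piece of text, a page, a database lookup, a process, a notification, or a search, following a link amounts to dispatching on the kind of target. The sketch below shows this with an enumeration and a handler table; the names `TargetKind`, `HANDLERS`, and `follow` are hypothetical, and the handlers only return a description instead of performing any real action.

```python
from enum import Enum


class TargetKind(Enum):
    INFO = "short piece of information"
    HYPERLINK = "regular hyperlink"
    DB_ACCESS = "database access"
    PROCESS = "start of a process"
    NOTIFICATION = "notification of a person"
    SEARCH = "search in a context"


# Hypothetical handlers: each returns a description instead of performing the action.
HANDLERS = {
    TargetKind.INFO: lambda arg: f"show: {arg}",
    TargetKind.HYPERLINK: lambda arg: f"open page: {arg}",
    TargetKind.DB_ACCESS: lambda arg: f"query database: {arg}",
    TargetKind.PROCESS: lambda arg: f"start process: {arg}",
    TargetKind.NOTIFICATION: lambda arg: f"notify: {arg}",
    TargetKind.SEARCH: lambda arg: f"search for: {arg}",
}


def follow(kind, argument):
    """Resolve one link target according to its kind."""
    return HANDLERS[kind](argument)
```

For instance, `follow(TargetKind.PROCESS, "credit check")` stands in for the "start a credit check" example on the slide.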

Page 42: Applications

Enrichment of specialized Web Sites
example: SOG and Saarland Online

Enrichment of Portals
example: LT World

Email Processing
example: MailMinder Extension

Legacy Code
example: Dresdner Bank HyperCode

Information and Knowledge Management
no example yet

Associative Digital Memories

Page 43: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

second step: grounding in experience

From densely associative hypertexts to episodically enriched memories
• calendar, biography, timeline
• media
• situations

Page 44: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

example 1: personal digital memory

calendar: 2,000-12,000 calendar entries
email: 20,000-100,000 messages
addresses: 100-2,000 addresses
photographs: 1,000-30,000 pictures
written papers, reports, reviews: 100-1,000 documents
music: 500-5,000 titles
talks: 50-500 slide sets
read electronic papers: 200-2,000 documents
visited web-pages: 20,000-100,000 pages

Page 45: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

example 1: personal digital memory

establish a set of relevant entities and concepts
• dates
• persons
• themes
• locations
• functions
• sources

find connections among email, calendar, photographs, addresses, papers,
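
Finding connections among heterogeneous items can be sketched as an index from entities to the items that mention them: two items are connected when they share an entity. The example below is a minimal illustration under the assumption that entity recognition has already been done. The item identifiers and entity labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical items, each with the entities already recognized in it.
items = {
    "email:42": {"person:Maria Müller", "date:2003-01-17", "theme:memory"},
    "calendar:7": {"date:2003-01-17", "location:Saarbrücken"},
    "photo:311": {"person:Maria Müller", "location:Saarbrücken"},
}


def entity_index(items):
    """Map each entity to the set of items that mention it."""
    index = defaultdict(set)
    for item_id, entities in items.items():
        for entity in entities:
            index[entity].add(item_id)
    return index


def connections(items):
    """Map each pair of items to the entities they share."""
    links = defaultdict(set)
    for entity, ids in entity_index(items).items():
        for a, b in combinations(sorted(ids), 2):
            links[(a, b)].add(entity)
    return dict(links)


links = connections(items)
```

Here the email and the calendar entry are linked through a shared date, the email and the photograph through a shared person, and the calendar entry and the photograph through a shared location.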

Page 46: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

example 1: personal digital memory

entry points
• words
• names
• topics
• dates

dynamically adapt the links to new entities and concepts

Page 47: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

long term vision: third level

First Level: structure the personal information space, i.e., information that is already on your machine, e.g., texts, correspondence, direct messages (SMS etc.), calendar, graphics

Second Level: add personal archives by digitizing additional information, e.g., photographs, personal sound recordings, musical records, movie clips, etc.

Third Level: add episodic memory (life records), create extensive sound (and image) archives of selected episodes of your daily life, including meetings and pictures of people, sights, documents

Not manageable without dense associative hyperlinking

Page 48: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

example 1: personal digital memory

related projects
• MyLifeBits (Microsoft)
• Haystack (MIT)

and relevant but less related
• LifeLog (DARPA)

Page 50: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

example 2: collective/social memories

historical memories
memories of scientific developments
a combination of both

Page 51: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Humanities & Social Sciences

large digital archives
first attempts to annotate and interlink
linguistic data collection and annotation
projects Perseus and Archimedes
annotation by markup (XML stand-off markup)

Page 52: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Humanities & Social Sciences

from individual to collective research
individual research groups joint projects community research
working on shared interpreted data creates new collaboration communities
by e-science new paradigms of humanities research are emerging

Page 53: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Humanities & Social Sciences II

interpretation by multilayer stand-off annotation
interpretation of data does not destroy data
interpretations can refer to other interpretations
replicability of results by sharing of data and disclosure of interpretations
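
The idea of multilayer stand-off annotation can be illustrated with a small sketch: the source text is never modified, annotations live in separate layers and point into the text by character offsets, and an annotation may also point at another annotation. This is only an assumed minimal model, not the format used by any of the projects mentioned. The class, the layer names and the labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

TEXT = "Attalia is a coastal city in Pamphylia."  # the data, left untouched


@dataclass(frozen=True)
class Annotation:
    id: str
    layer: str                    # e.g. "entities" or "comments"
    label: str
    start: int = 0                # character offsets into TEXT
    end: int = 0
    target: Optional[str] = None  # id of another annotation, if any


p = TEXT.index("Pamphylia")
annotations = [
    Annotation("a1", "entities", "location", start=0, end=len("Attalia")),
    Annotation("a2", "entities", "location", start=p, end=p + len("Pamphylia")),
    # An interpretation that refers to another interpretation:
    Annotation("c1", "comments", "approval", target="a1"),
]
by_id = {a.id: a for a in annotations}


def covered_text(text, ann, by_id):
    """Follow annotation-to-annotation references down to the text span."""
    while ann.target is not None:
        ann = by_id[ann.target]
    return text[ann.start:ann.end]
```

Because every layer only refers to offsets or to other annotations, layers can be added, shared or withdrawn without touching the text itself.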

Page 54: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Humanities & Social Sciences III

Disciplines in the humanities exhibit large overlaps in primary sources and data.

We expect that many digitized sources can be exploited by more than one discipline.

Moreover, we can expect that the scientific interpretations of the data may also be used in part across traditional disciplines.

Page 55: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Humanities & Social Sciences IV

Topics and data are largely culture-dependent and often language-specific.

Digital data collections can be anywhere
• Wittgenstein Archives in Bergen
• Corpus of historical English in Helsinki
• Roman Texts at Tufts University
• Japanese Grammar in Saarbruecken

By shared methodology, new forms of transcultural and multilingual research will become possible.

Page 56: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Natural Sciences vs. Humanities

Data in the natural sciences and engineering (NS&E) are often so-called structured data such as databases, time sequences of measurements or matrices of numeric values.

Scientific data in NS&E can also be very large two- to four-dimensional images (pictures, spatial images, videos, VR scenes), cases of so-called unstructured data.

Most data in the humanities are so-called unstructured data: texts, pictures, sound files, films...

For the researcher in the humanities, these unstructured data possess much more structure, on the conceptual level, than the structured data in databases.

Page 57: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Densely Associated Content

videte nunc quo adfectent iter apertius quam antea. nam superiore parte legis quem ad modum Pompeium oppugnarent, a me indicati sunt; nunc iam se ipsi indicabunt. iubent venire agros Attalensium atque Olympenorum quos populo Romano Servili, fortissimi viri, victoria adiunxit, deinde agros in Macedonia regios qui partim T. Flaminini, partim L. Pauli qui Persen vicit virtute parti sunt, deinde agrum optimum et fructuosissimum Corinthium qui L. Mummi imperio ac felicitate ad vectigalia populi Romani adiunctus est, post autem agros in Hispania apud novam duorum Scipionum eximia virtute possessos; tum vero ipsam veterem Carthaginem vendunt quam P. Africanus nudatam tectis ac moenibus sive ad notandam Carthaginiensium calamitatem, sive ad testificandam nostram victoriam, sive oblata aliqua religione ad aeternam hominum memoriam consecravit.

Attalia (Attaleia, Antalya)
Coastal city in Pamphylia
• Map
• Other References
• History
• Comments
• My Comments

All meaningful units are associated via semantic links with related information distributed all over the digital global knowledge base.

Page 58: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

CULTURAL MEMORY

Cultural Heritage can be preserved, shown,

Cultural Heritage
What you have inherited from your fathers, you must acquire in order to possess.
(Was Du ererbt von deinen Vätern hast, erwirb es, um es zu besitzen. J.W. v. Goethe)

Memory is much more than storage.
Memory is associative.
Associative memory is the basis for retrieval.
Conceptually structured associative memory is the basis for inferencing and learning.

Page 59: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Interdisciplinary Interpretation

interconnect documents from
• literature
• political history
• history of arts
• geography
• sociology
• etc.

Page 60: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

linking of data and interpretation

linking of interpretation and meta-interpretation
creates a history of interpretation
add comments, criticism, approval, further evidence
link from and to your own work

Page 61: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

The True Challenge

The main challenge for the humanities and social sciences is ...
• the semantic interlinking of their contents across languages, disciplines, cultures, media and formats,
• to prepare and share treasures of human thought and cultural heritage via the web as data for research and education,
• to help semantically structure the types of content exhibiting the most complex inherent conceptual structure,
• to accept an active role in the most exciting process in contemporary IT affecting human mind, culture and society.

Page 62: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Conclusion I

there is an important step in between digital information and digital knowledge
human memory is a scarce resource
this is true for personal and social memory
we do not need machines that think for us
but we could surely use some memory extensions
to be useful these have to be adapted to human ways of information storage and access

Page 63: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Conclusion II

Associative Memories are the natural next step beyond digital content
Digital Content → Digital Memories → Digital Knowledge
Building associative memories from large document collections cannot be done with today's web technology
Here we need the contributions of language and knowledge technologies
No wire in the head

Page 64: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

SProUT Motivation

A platform for development of multilingual and domain adaptive shallow text processing and information extraction systems
Trade-off between efficiency and expressiveness
Modularity (Fine-grained modeling of linguistic components into clear-cut modules)
Portability and industrial standards

Page 65: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

SProUT Components

Linguistic Processing Resources
• Tokenizer (easily adaptable for Indo-European languages)
• Gazetteer
• Morphology Component (6 languages, Japanese under consideration)
• Interface to MMorph

Core Tools
• JTFS (Implementation without PET)
• FSM Toolkit (Adaptation of the FSM Interface to the requirements of the Grammar Interpreter)
• Tries for NLP processing

Page 66: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

System Architecture

[Figure: system architecture diagram with two areas, "Grammar Development Environment" and "Online Processing". Box labels: Finite-State Toolkit, Regular Compiler, Shallow Grammar, Shallow Grammar Interpreter, JTFS, Extended Optimized FST Representation, Lexical Resources, Linguistic Processing Resources, Input Data, Stream of Text Items, Structured Output Data]

Page 67: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

System Components

Linguistic Processing Resources
• Tokenizer (easily adaptable for Indo-European languages)
• Gazetteer
• Morphology Component (8 languages)
• Named Entity Recognition (6 languages)

Core Tools
• JTFS (Implementation without PET)
• FSM Toolkit
• Regular Compiler
• Shallow Grammar Interpreter
• Tries for NLP processing

Page 68: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

TFS and TFS-XML

TFS as data interchange format in SProUT

unification and subsumption check as basic operations for evaluation

compact XML encoding of typed feature structures (following TEI-SGML)

exchange format for linguistic resources:

• grammars

• feature structure tree banks

exchange format for visualization
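
The two basic operations named on this slide can be sketched on a simplified model. The example below treats feature structures as nested dictionaries with atomic string values and deliberately leaves out the type hierarchy and coreferences that real typed feature structures need, so it is an assumed illustration of the idea and not the JTFS implementation. The function names and the sample structures are hypothetical.

```python
def unify(a, b):
    """Return the unification of two feature structures, or None on a clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # incompatible values under the same feature
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Atomic values (or an atom against a structure) unify only if equal.
    return a if a == b else None


def subsumes(general, specific):
    """True if every constraint in `general` also holds in `specific`."""
    if isinstance(general, dict) and isinstance(specific, dict):
        return all(
            f in specific and subsumes(v, specific[f])
            for f, v in general.items()
        )
    return general == specific


rule = {"PRED": "übernehmen", "AGENT": {"NAME": "Maria_Müller"}}
matched = {"AGENT": {"NAME": "Maria_Müller"}, "THEME": {"NOM": "Vorsitz"}}
unified = unify(rule, matched)
```

The unified structure carries the information of both inputs, and each input subsumes it. Two structures with different atomic values under the same feature do not unify.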

Page 69: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

TFS-XML: Example

<FS type="pred_argument">
  <F name="PRED"> <FS type="übernehmen"/> </F>
  <F name="AGENT">
    <FS coref="1" type="argument">
      <F name="NAME"> <FS type="Maria_Müller"/> </F>
    </FS>
  </F>
  <F name="THEME">
    <FS coref="2" type="argument">
      <F name="NOM"> <FS type="Vorsitz"/> </F>
    </FS>
  </F>
</FS>
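
As a rough illustration of how such an encoding can serve as an exchange format, the sketch below reads the example with Python's standard `xml.etree.ElementTree` module and turns it into nested dictionaries. The function name `fs_to_dict` and the dictionary layout are assumptions made for this example, not part of SProUT.

```python
import xml.etree.ElementTree as ET

TFS_XML = """
<FS type="pred_argument">
  <F name="PRED"> <FS type="übernehmen"/> </F>
  <F name="AGENT">
    <FS coref="1" type="argument">
      <F name="NAME"> <FS type="Maria_Müller"/> </F>
    </FS>
  </F>
  <F name="THEME">
    <FS coref="2" type="argument">
      <F name="NOM"> <FS type="Vorsitz"/> </F>
    </FS>
  </F>
</FS>
"""


def fs_to_dict(fs):
    """Convert an <FS> element into a nested dictionary (hypothetical layout)."""
    node = {"type": fs.get("type"), "features": {}}
    if fs.get("coref") is not None:
        node["coref"] = fs.get("coref")
    for f in fs.findall("F"):
        # Each <F name="..."> wraps exactly one <FS> value in this example.
        node["features"][f.get("name")] = fs_to_dict(f.find("FS"))
    return node


structure = fs_to_dict(ET.fromstring(TFS_XML))
```

The result keeps the type of every structure, its coreference tag where present, and its features by name.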

Page 70: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Morphological Resources

Indo-European language resources
• English: 200,000 entries (MMorph (Multext))
• German: 830,000 entries (MMorph (Multext))
• French: 225,000 entries (MMorph (Multext))
• Spanish: 570,000 entries (MMorph (Multext))
• Italian: 330,000 entries (MMorph (Multext))
• Czech: 600,000 entries (Institute of Formal and Applied Linguistics in Prague)

Asian language resources
• Chinese: Shanxi-Tokenizer
• Japanese: ChaSen

Page 71: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Architecture

MMorph full-form lexica are stored as a trie
External modules (Asian and Czech) are integrated via client/server

[Figure: architecture diagram. Box labels: Parser, MMorph, Tokenizer, Shanxi, ChaSen, Czech Morph]
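
Storing a full-form lexicon as a trie means that word forms sharing a prefix share a path, and the readings sit at the node where a word form ends. The following is a minimal sketch of that data structure, not the SProUT implementation. The class name, the shape of the readings and the sample entries are hypothetical.

```python
class Trie:
    """A character trie mapping word forms to lists of readings."""

    def __init__(self):
        self.children = {}
        self.readings = []  # analyses stored where a word form ends

    def insert(self, word_form, reading):
        node = self
        for ch in word_form:
            node = node.children.setdefault(ch, Trie())
        node.readings.append(reading)

    def lookup(self, word_form):
        node = self
        for ch in word_form:
            node = node.children.get(ch)
            if node is None:
                return []  # no entry with this prefix
        return list(node.readings)


lexicon = Trie()
# Hypothetical entries: (lemma, part of speech)
lexicon.insert("evaluierten", ("evaluieren", "Adjective"))
lexicon.insert("evaluiert", ("evaluieren", "Verb"))
```

A lookup walks one node per character, so its cost depends on the length of the word form and not on the size of the lexicon.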

Page 72: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

A Type-Driven Method for Compacting MMorph

Problem: redundant and spuriously ambiguous readings (German MMorph: 5.8 readings per word form in disjunctive normal form, DNF)

The method compacts MMorph by
• deletion of redundant readings
• substitution of special readings through more general ones using type generalization and subsumption checking
• generation of a type hierarchy

Result: the average number of readings in German is now 1.6

Page 73: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Compacting an MMorph Entry

"evaluierten" = "evaluieren" Adjective[ gender=masc number=singular case=gen|dat|acc ]
"evaluierten" = "evaluieren" Adjective[ gender=fem|neutrum number=singular case=gen|dat ]
"evaluierten" = "evaluieren" Adjective[ gender=masc|fem|neutrum number=singular case=nom|gen|dat|acc ]
"evaluierten" = "evaluieren" Adjective[ gender=masc|fem|neutrum number=plural case=nom|gen|dat|acc ]

compacting

"evaluierten" = "evaluieren" Adjective[ gender=fem_masc_neutrum number=singular_plural case=acc_dat_gen_nom ]

Type hierarchies for the compacted values:

plural_singular
  plural   singular

fem_masc_neutrum
  fem_masc   fem_neutrum   masc_neutrum
    fem   masc   neutrum
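
The compaction of the four readings into one can be reproduced with a small sketch. It assumes that every reading lists the same features, drops readings subsumed by another reading, and merges two readings that differ in exactly one feature by taking the union of that feature's values. This is one plausible reconstruction for illustration, not necessarily the procedure used for MMorph. All function names are hypothetical, and the sorted type names it produces (for example `plural_singular`) are a naming choice of this sketch.

```python
from itertools import combinations


def subsumes(general, specific):
    """True if `general` admits every value combination of `specific`."""
    return all(specific[f] <= general[f] for f in general)


def drop_subsumed(readings):
    """Delete readings that another reading already covers."""
    kept = []
    for i, r in enumerate(readings):
        covered = any(
            j != i and subsumes(o, r) and (o != r or j < i)
            for j, o in enumerate(readings)
        )
        if not covered:
            kept.append(r)
    return kept


def merge_once(readings):
    """Merge one pair of readings that differ in exactly one feature."""
    for a, b in combinations(readings, 2):
        differing = [f for f in a if a[f] != b[f]]
        if len(differing) == 1:
            merged = dict(a)
            merged[differing[0]] = a[differing[0]] | b[differing[0]]
            rest = [r for r in readings if r is not a and r is not b]
            return rest + [merged], True
    return readings, False


def compact(readings):
    readings = drop_subsumed(readings)
    changed = True
    while changed:
        readings, changed = merge_once(readings)
        readings = drop_subsumed(readings)
    return readings


def type_name(values):
    """Name of the generalized type, e.g. {'fem', 'masc'} -> 'fem_masc'."""
    return "_".join(sorted(values))


def reading(gender, number, case):
    return {
        "gender": frozenset(gender.split("|")),
        "number": frozenset(number.split("|")),
        "case": frozenset(case.split("|")),
    }


evaluierten = [
    reading("masc", "singular", "gen|dat|acc"),
    reading("fem|neutrum", "singular", "gen|dat"),
    reading("masc|fem|neutrum", "singular", "nom|gen|dat|acc"),
    reading("masc|fem|neutrum", "plural", "nom|gen|dat|acc"),
]
compacted = compact(evaluierten)
```

Merging on a single differing feature is exact, because the two readings then describe the same combinations except along that one dimension.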

Page 74: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

A SProUT Grammar Rule

[Figure: a SProUT grammar rule]

Page 75: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Unification

[Figure: unification. Panel titles: Matched input structure, Extended Rule Structure After Match, Fully Unified Structure]

Page 76: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Page 77: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Multilingual Named Entity Grammars

Languages
• English, French, German, Spanish
• Chinese, Japanese

Grammar Style
• MUC-7 named entity classes with some variations
• NAMEX: person, location, organisation
• TIMEX: time point, time span (instead of date, time)
• NUMEX: percentage, money
• Named entity types with internal attribute-value structures, e.g.,
  span := timex & [FROM point, TO point].
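
The type definition `span := timex & [FROM point, TO point]` says that a span is a kind of timex whose FROM and TO attributes are themselves time points. As a loose analogy only, the same shape can be written with Python dataclasses. The class names mirror the slide, while the `iso` attribute and the sample dates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timex:
    """Supertype of all time expressions."""


@dataclass(frozen=True)
class Point(Timex):
    iso: str  # hypothetical attribute: the time point as an ISO date string


@dataclass(frozen=True)
class Span(Timex):
    FROM: Point  # attribute names follow the type definition on the slide
    TO: Point


conference = Span(FROM=Point("2003-07-28"), TO=Point("2003-08-08"))
```

Unlike a flat MUC-style tag, such a structured type lets later processing steps address the start and end of a time span separately.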

Page 78: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Future Work

Extension of NE grammars to other languages, e.g., Czech, Polish

Grammar Evaluation with JTACO

Efficiency issues
• Experiments with different search strategies
• Grammar processing optimization

Extension of XTDL expressiveness
• Functional operators
• Seek operator

Page 79: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

Thank you for your attention...

Page 80: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

HANS USZKOREIT SAARBRÜCKEN 17.01.2003

DIGITAL EXTENSIONS OF PERSONAL AND COLLECTIVE MEMORY

Page 81: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

OUTLINE

MOTIVATION
COMPONENTS OF MEMORY
FUNCTIONALITIES
SIDE EFFECTS
REALIZATION

Page 82: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

MANMADE MEMORY EXTENSIONS

Extension of short-term memory
• scratch pad, abacus

Extension of medium-term memory
• note pad

Extension of long-term memory
• written records, sound recordings, books, storage of data in databases and documents

Page 83: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

MOTIVATION

HUMAN DESIRE TO REMEMBER
• DIARY
• MEMOIRS
• FAMILY COMMUNICATION

EXPLOSIVE GROWTH OF DIGITIZED PERSONAL INFORMATION
• CORRESPONDENCE (EMAIL, LETTERS)
• WRITINGS (REPORTS, PAPERS)
• VISUAL MEMORIES (PHOTOGRAPHS, VIDEOS)
• ORGANIZERS (CALENDARS, TO-DO-LISTS, ADDRESSES)
• WEB MATERIAL (BOOKMARKS, VISITED PAGES)

COMPETITIVE ADVANTAGE
• SETTLING OF DISPUTES
• BRUSHING UP DETERIORATED KNOWLEDGE
• LEGAL ISSUES

Page 84: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

MEMORY EXTENSION

Memory consists of
• Recording
• Preparation
• Integration
• Storage
• Restructuring
• Recall

From Storage to Memory to Knowledge

Page 85: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

EXTENDED PERSONAL MEMORY

First Level: structure the personal information space, i.e., information that is already on your machine, e.g., texts, correspondence, direct messages (SMS etc.), calendar, graphics

Second Level: add personal archives by digitizing additional information, e.g., photographs, personal sound recordings, musical records, movie clips, etc.

Third Level: add episodic memory (life records), create extensive sound (and image) archives of selected episodes of your daily life, including meetings and pictures of people, sights, documents

Not manageable without dense associative hyperlinking

Page 86: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

EXTENDED COLLECTIVE MEMORY

First level: intranet with databases

Second level: add archives that were kept on paper, films, or sound recordings.

Third level: integrate workflow and intranet, produce records of meetings, processes, transactions

Not manageable without dense associative hyperlinking

Page 87: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

MEMORY EXTENSION

Memory consists of
• Recording
• Preparation
• Integration
• Storage
• Restructuring
• Recall

From Storage to Memory to Knowledge

Page 88: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

RECORDING

TYPING IN (PAPERS, MAIL, PRESENTATIONS)

DOWNLOADING (FILES, MAIL, WEB PAGES)

COPYING (CDS, CD-ROMS, DVDS)

SCANNING IN OF IMAGES (PHOTOS, ART)

SCANNING IN AND OCR OF TEXTS

SOUND RECORDING (DICTATION, CONVERSATION)

VIDEO RECORDING (TV, SEEN, MEETINGS, SELF)

Page 89: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

PREPARATION

SOURCE PROPERTIES

TOKENIZATION

POS TAGGING

CLASSIFICATION

NAMED ENTITY RECOGNITION

RELATION DETECTION

TIME MARKING

META DATA PROCESSING
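A minimal sketch of two of the preparation steps listed above, tokenization and time marking, may help make the stage concrete. The names `prepare` and `PreparedRecord`, the regular expressions, and the restriction to German-style dates (DD.MM.YYYY) are assumptions made for illustration; the slides do not specify an implementation.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns: words or single punctuation marks, and DD.MM.YYYY dates.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
DATE_RE = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")


@dataclass
class PreparedRecord:
    """Result of preparing one recorded item (illustrative structure)."""
    source: str                                   # source property, e.g. "calendar"
    tokens: list = field(default_factory=list)    # output of tokenization
    dates: list = field(default_factory=list)     # output of time marking (ISO dates)


def prepare(text: str, source: str) -> PreparedRecord:
    """Tokenize the text and mark the dates it mentions."""
    # Time marking: normalize every DD.MM.YYYY mention to YYYY-MM-DD.
    dates = [
        f"{year}-{int(month):02d}-{int(day):02d}"
        for day, month, year in DATE_RE.findall(text)
    ]
    # Tokenization: split into word and punctuation tokens.
    tokens = TOKEN_RE.findall(text)
    return PreparedRecord(source=source, tokens=tokens, dates=dates)


record = prepare("Meeting in Saarbrücken on 17.01.2003.", source="calendar")
print(record.dates)
print(record.tokens[:3])
```

POS tagging, classification, named entity recognition and relation detection would be further passes over the same record; they are left out here because they need trained models or lexical resources.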

Page 90: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

INTEGRATION

META-DATA INDEXING

IR-TYPE INDEXING

CONCEPTUAL INDEXING

Page 91: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

META-DATA INDEXING

ORIGIN (ADDRESS, URL)

KIND OF INFO (TYPE, FORMAT, SIZE)

TIME (CREATED, CHANGED, ACCESSED)

THEMATIC (KEYWORDS, AUTHOR, THEME)
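The four groups of meta-data above could be collected for a local file roughly as follows. This is a sketch under assumptions: the function name `metadata_record` and the dictionary keys are hypothetical, and the thematic fields are simply passed in by the caller because they cannot be read from the file system.

```python
import mimetypes
import os
from datetime import datetime, timezone


def _iso(timestamp: float) -> str:
    """Format a POSIX timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


def metadata_record(path, keywords=(), author=None):
    """Build one meta-data index entry for a file or directory (illustrative)."""
    stat = os.stat(path)
    kind, _encoding = mimetypes.guess_type(path)
    return {
        # ORIGIN: where the item lives (a URL would go here for web pages).
        "origin": os.path.abspath(path),
        # KIND OF INFO: type guessed from the file extension, plus size.
        "type": kind or "unknown",
        "size_bytes": stat.st_size,
        # TIME: last change and last access. A creation time is omitted
        # because it is not available portably across operating systems.
        "changed": _iso(stat.st_mtime),
        "accessed": _iso(stat.st_atime),
        # THEMATIC: supplied by the caller or by later analysis steps.
        "keywords": list(keywords),
        "author": author,
    }


print(metadata_record(".", keywords=["memory"], author="unknown"))
```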

Page 92: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

IR-TYPE INDEXING

FULL-TEXT INDEXING (STEMMING, MORPHOLOGY, THESAURI? MULTI-WORD TERMS, TRANSLATION?)

CLASSIFICATION

SUMMARIZATION
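Full-text indexing with stemming can be illustrated by a small inverted index. The suffix-stripping stemmer below is a deliberately crude stand-in chosen for brevity, not a real morphological analyser, and the names `crude_stem`, `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed English suffixes; a real system would use proper morphology.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Lowercase the word and strip suffixes repeatedly, keeping at least 3 letters."""
    w = word.lower()
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if w.endswith(suffix) and len(w) - len(suffix) >= 3:
                w = w[: -len(suffix)]
                changed = True
                break
    return w


def build_index(docs: dict) -> dict:
    """Map each stem to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text):
            index[crude_stem(token)].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return the documents that contain every (stemmed) query term."""
    terms = [crude_stem(t) for t in re.findall(r"\w+", query)]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "mail1": "Recording of the meetings",
    "note2": "We recorded one meeting",
}
index = build_index(docs)
print(sorted(search(index, "recording meeting")))
```

Because both documents and queries pass through the same stemmer, "recording" and "recorded" retrieve each other. Thesauri, multi-word terms and translation, which the slide marks as open questions, are not covered.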

Page 93: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

CONCEPTUAL INDEXING

CONSTRUCTION OF A DYNAMIC CONCEPTUAL INDEX

IDENTIFICATION OF MAJOR CLASSES: PEOPLE, TIMES, EVENTS, LOCATIONS, DOCUMENTS, THEMES, FUNCTIONS, SOURCES, ORGANIZATIONS

ONTOLOGICAL STRUCTURING

LINK DBs

NAMED ENTITY RECOGNITION

LINKING
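One way to picture the last three items (link DBs, named entity recognition, linking) is a gazetteer lookup that feeds a table of links between documents sharing an entity. Everything in this sketch is an assumption for illustration: the tiny `GAZETTEER`, the substring matching used in place of real named entity recognition, and the function names.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical gazetteer mapping names to major classes.
GAZETTEER = {
    "Uszkoreit": "PERSON",
    "Saarbrücken": "LOCATION",
    "Saarland University": "ORGANIZATION",
}


def find_entities(text: str) -> set:
    """Return (name, class) pairs for gazetteer names that occur in the text."""
    return {(name, cls) for name, cls in GAZETTEER.items() if name in text}


def build_links(docs: dict):
    """Build a conceptual index and a link table from a document collection.

    Returns:
      by_entity: entity -> set of document ids mentioning it
      links:     (doc_a, doc_b) -> set of entities the two documents share
    """
    by_entity = defaultdict(set)
    for doc_id, text in docs.items():
        for entity in find_entities(text):
            by_entity[entity].add(doc_id)

    links = defaultdict(set)
    for entity, doc_ids in by_entity.items():
        # Every pair of documents that mention the same entity gets a link.
        for a, b in combinations(sorted(doc_ids), 2):
            links[(a, b)].add(entity)
    return by_entity, links


docs = {
    "mail": "Uszkoreit speaks in Saarbrücken",
    "slides": "Talk at Saarland University, Saarbrücken",
    "photo": "Holiday picture",
}
by_entity, links = build_links(docs)
print(dict(links))
```

The index is dynamic in the sense that adding a document only adds entries to `by_entity` and `links`; ontological structuring of the classes is outside the scope of this sketch.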

Page 94: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

ISSUES: PERSONAL MEMORY

GENERAL AND PERSONAL ONTOLOGIES

WEIGHT ASSIGNMENT AND RESTRUCTURING

EXTERNAL LINKS

THE ROLE OF THE TIMELINE

PREACTIVATION & PREFETCHING

SEARCH FROM A CONTEXT

PRIVACY & SECURITY

SAFETY & TRUST

INTEREST MODELS

LANGUAGE MODELS

POSSESSION, IMMORTALITY

Page 95: Hans Uszkoreit German Research Center for Artificial Intelligence and Saarland University Hans Uszkoreit German Research Center for Artificial Intelligence

REALIZATION PLAN

STARTING IN PARALLEL:

THEORETICAL CONSIDERATIONS

BRAINSTORMING AND SPECIFICATION (Discussion Group: Callmeier, Eisele, Erbach, Schäfer, Siegel, Uszkoreit)

IMPLEMENTATION OF MOCK-UP AND SYSTEM

RECORDING AND PREPARATION OF DATA (Calendar, Email, Pictures, Papers, Web Pages): RA Jobs

DEFINITION AND ACQUISITION OF PROJECTS (SFB preparations, EU project for cognition call)
