ArtEquAKT

Harith Alani, Sanghee Kim, Wendy Hall, Paul Lewis, David Millard, Nigel Shadbolt, Mark Weal

Upload: desiree-overley

Post on 16-Dec-2015


TRANSCRIPT

Page 1: ArtEquAKT

Harith Alani, Sanghee Kim, Wendy Hall, Paul Lewis, David Millard, Nigel Shadbolt, Mark Weal

Page 2: Overview

Union of three projects: Artiste, Equator, and AKT

Aims:
• Use NLT to automatically extract relevant information about the life and work of artists from online documents
• Feed this information automatically into an ontology designed for this domain
• Generate stories by extracting and structuring information from the knowledge base in the form of biographical narratives in response to user requests

Page 3: Objectives

• To find out how effective these technologies are when used together
• To explore how the limitations of one process affect the others (e.g. how ambiguity during extraction might be reflected at the generation stage)
• To generate biographies that might not be as readable as those on the web but which:
  • contain information that is difficult to find manually
  • gather information from disparate sources

Page 4: [Figure: ArtEquAKT architecture. Components: Web (web pages), Information Extraction, Ontology, Knowledge Management (KB, DB), Narrative Generation (Linky, story template), Servlets. Numbered steps: 1. Extraction, 2. Population, 3. Consolidation, 4. Indexing, 5. Interaction, 6. Instantiation, 7. Rendering.]

Page 5: Information Extraction

[Figure: the architecture diagram from Page 4, repeated as the section divider for Information Extraction.]

Page 6: Knowledge Extraction Procedure

[Figure: knowledge extraction pipeline. Labels in the diagram: Query; Downloaded Text; Load Resources; Ontology; WordNet; GATE; Paragraph and sentence recognition; Apple Pie Parser (Syntactic Analyser); Semantic Analysis (Relational Learning); XML output.]

Page 7: Search and Filter Documents

• Query search engines (‘Yahoo’, ‘Altavista’) with the artist's name as the query
• Calculate the similarity of the retrieved documents to an example document
• Use normalised term frequency for the similarity computation
• Apply heuristics (e.g. sentence length) to filter out documents that contain mostly tables and/or links
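
The filtering step above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the slides only say "term frequency with normalisation", so the use of cosine similarity, the tokeniser, the function names, and both thresholds are assumptions made for the example.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Return length-normalised term frequencies for a document."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * tf_b.get(term, 0.0) for term, w in tf_a.items())
    norm_a = math.sqrt(sum(w * w for w in tf_a.values()))
    norm_b = math.sqrt(sum(w * w for w in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_sentence_length(text):
    """Average number of words per sentence (crude split on . ! ?)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def keep_document(text, example, min_similarity=0.2, min_sentence_words=5.0):
    """Keep prose-like documents that resemble the example document.

    Both thresholds are illustrative assumptions, not values from the slides.
    """
    similarity = cosine_similarity(term_frequencies(text), term_frequencies(example))
    return similarity >= min_similarity and mean_sentence_length(text) >= min_sentence_words


# Made-up documents for illustration only.
example = "Rembrandt was born in Leiden. He was a painter and worked in Amsterdam."
biography = "The painter Rembrandt was born in Leiden in 1606. He later worked in Amsterdam as a painter."
link_page = "Home. Links. Gallery. Shop. Contact. Rembrandt. Prints."

print(keep_document(biography, example))
print(keep_document(link_page, example))
```

The sentence-length check is what rejects the page of links here: its "sentences" are one word long, so it fails the heuristic regardless of its similarity score.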

Page 8: Relation Extraction

• Natural language processing techniques to extract relations
• Guided by an ontology
• Use GATE (General Architecture for Text Engineering) and WordNet for entity recognition (e.g. person name, place name, or date)
• Term expansion using WordNet (synonym, hypernym, and hyponym), e.g. ‘depict’ maps to ‘portray’ (synonym) and ‘represent’ (hypernym)
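
As a rough illustration of term expansion, the sketch below uses a small hand-written lexicon in place of real WordNet lookups. Only the ‘depict’ entry comes from the slide; the dictionary layout and the function names `expand_term` and `matches_relation` are hypothetical.

```python
# Hypothetical mini-lexicon standing in for WordNet lookups. Only the
# 'depict' entry is taken from the slide; the structure is an assumption.
LEXICON = {
    "depict": {
        "synonyms": {"portray"},
        "hypernyms": {"represent"},
        "hyponyms": set(),
    },
}


def expand_term(term, lexicon=LEXICON):
    """Return the term together with its synonyms, hypernyms and hyponyms."""
    entry = lexicon.get(term, {})
    expanded = {term}
    for relation in ("synonyms", "hypernyms", "hyponyms"):
        expanded |= entry.get(relation, set())
    return expanded


def matches_relation(verb, ontology_term, lexicon=LEXICON):
    """True if a verb found in the text maps onto the ontology relation term."""
    return verb in expand_term(ontology_term, lexicon)


print(sorted(expand_term("depict")))
print(matches_relation("portray", "depict"))
```

With this expansion, a sentence that uses ‘portray’ or ‘represent’ can still be matched against an ontology relation that is named ‘depict’.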

Page 9: An Example

Given the sentence:

Rembrandt Harmenszoon van Rijn was born on July 15, 1606, in Leiden, the Netherlands.

The following facts are extracted:

• Person name: Rembrandt Harmenszoon van Rijn
• Date: July 15, 1606
• Place: Leiden, the Netherlands
• Relation: Birth
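
To make the example concrete, here is a deliberately simple pattern-based sketch that pulls the same facts out of this one sentence shape. It is not how the system described in these slides works (that uses GATE, WordNet and a parser); the regular expression and the name `extract_birth` are assumptions for illustration.

```python
import re

# Matches "<Person> was born on <Month D, YYYY>, in <Place>." only.
BIRTH_PATTERN = re.compile(
    r"^(?P<person>[A-Z][\w.]*(?:\s+(?:[A-Z][\w.]*|van|von|de|der))*)"
    r"\s+was\s+born\s+on\s+"
    r"(?P<date>[A-Z][a-z]+\s+\d{1,2},\s*\d{4}),?"
    r"\s+in\s+(?P<place>[^.]+)\."
)


def extract_birth(sentence):
    """Return a Birth fact as a dict, or None if the sentence does not match."""
    match = BIRTH_PATTERN.match(sentence.strip())
    if match is None:
        return None
    return {
        "relation": "Birth",
        "person": match.group("person"),
        "date": match.group("date"),
        "place": match.group("place"),
    }


fact = extract_birth(
    "Rembrandt Harmenszoon van Rijn was born on July 15, 1606, in Leiden, the Netherlands."
)
print(fact)
```

A fixed pattern like this breaks as soon as the wording changes, which is one reason to guide extraction with an ontology and linguistic analysis instead.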

Page 11: Future Information Extraction Work

• Incorporate a learning capability for extracting relations
• Widen the scope of the NLP tool to increase performance
• Extract information about ‘painting’
• Extract links to painting images
• Investigate term expansion using WordNet further (e.g. consider context when mapping synonyms or hypernyms)

Page 12: Knowledge Management

[Figure: the architecture diagram from Page 4, repeated as the section divider for Knowledge Management.]

Page 13: Knowledge Management

• Ontology of artists based on the CIDOC CRM
• The ontology guides the extraction process
• Populating the ontology (feeding the KB)
• Knowledge consolidation
• Ontology server providing a set of inference queries

Page 14: Artequakt Ontology

Page 15: Populating the Ontology

<Paragraph>
  <url>Potted_biography.html</url>
  <text>In 1631, when Rembrandt's work had become well known and his studio in Leiden was flourishing, he moved to Amsterdam. He became the leading portrait painter in Holland and received many commissions for portraits as well as for paintings of religious subjects. …..It is estimated that he painted between 50 and 60 self-portraits.</text>
  <Painter>
    <name>Rembrandt</name>
    <place_of_work>leiden</place_of_work>
    <has_location>amsterdam</has_location>
    <number_of_paintings>between 50 and 60 self-portraits</number_of_paintings>
  </Painter>
  <Sentence>
    <url>Potted_biography.html</url>
    <text>He became the leading portrait painter in Holland and received many commissions for portraits as well as for paintings of religious subjects</text>
    <mood>third-person</mood>
    <tense>past</tense>
    <order>0</order>
  </Sentence>
  ………
</Paragraph>
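
One way to picture the population step is to read the extraction output and add each painter field to the KB as a fact that remembers its source page. The sketch below assumes a list of tuples as the KB and uses a shortened, well-formed copy of the XML above; the tuple layout and the function name `populate` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the extraction output shown on this slide.
SAMPLE = """
<Paragraph>
  <url>Potted_biography.html</url>
  <Painter>
    <name>Rembrandt</name>
    <place_of_work>leiden</place_of_work>
    <has_location>amsterdam</has_location>
  </Painter>
</Paragraph>
"""


def populate(xml_text, kb=None):
    """Append (subject, relation, value, source_url) facts to the KB list."""
    kb = [] if kb is None else kb
    root = ET.fromstring(xml_text)
    source = root.findtext("url", default="")
    for painter in root.findall("Painter"):
        name = painter.findtext("name")
        if not name:
            continue  # no subject to attach the facts to
        for field in painter:
            if field.tag != "name" and field.text:
                kb.append((name, field.tag, field.text.strip(), source))
    return kb


kb = populate(SAMPLE)
for fact in kb:
    print(fact)
```

Keeping the source URL with every fact is a design choice made for this sketch; it would let later steps compare sources when facts conflict.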

Page 16: Knowledge Consolidation

After extracting information on Rembrandt from 10 web sites, the KB was populated with the following:

• Rembrandt instance: 26 Rembrandt, 37 Rembrandt Harmenszoon, 2 Van Rijn
• Date of birth: 15/7/1606, 1606, 1620, 1641
• Place of birth: Leiden, Leyden, Netherlands, Holland

We need to merge duplicates and verify inconsistencies before we can use this knowledge.

Page 17: Duplication

• Same old problem!
• Our approach to consolidation:
  • Simple heuristics to consolidate most duplicates
    • Artist names are unique: all Rembrandts are merged
    • Merge less specific information into more detailed information: 1606 is merged into 15/7/1606
  • Term expansion using WordNet
    • Synonyms: Leiden and Leyden, The Netherlands and Holland
    • Holonyms (part of): Leiden is part of The Netherlands
  • Knowledge comparison
    • Rembrandt, Rembrandt Harmenszoon, and Van Rijn share a date of birth and a place of birth
    • Difficult when there are multiple values; verification might help
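
Two of the heuristics above, merging a bare year into a fuller date and merging place-name synonyms, could look roughly like this. The synonym table is a hand-written stand-in for a WordNet lookup, and the function names and the d/m/yyyy date format check are assumptions for the sketch.

```python
from collections import Counter

# Hypothetical synonym table standing in for a WordNet lookup.
PLACE_SYNONYMS = {
    "leiden": "Leiden",
    "leyden": "Leiden",
    "netherlands": "Netherlands",
    "the netherlands": "Netherlands",
    "holland": "Netherlands",
}


def canonical_place(name):
    """Map a place name onto one canonical spelling when a synonym is known."""
    cleaned = name.strip()
    return PLACE_SYNONYMS.get(cleaned.lower(), cleaned)


def merge_dates(dates):
    """Fold a bare year into a full d/m/yyyy date that has the same year."""
    full_dates = [d for d in dates if "/" in d]
    full_years = {d.rsplit("/", 1)[1] for d in full_dates}
    merged = Counter()
    for date in dates:
        if "/" in date or date not in full_years:
            merged[date] += 1
        else:
            # Less specific value: count it towards the matching full date.
            target = next(f for f in full_dates if f.endswith("/" + date))
            merged[target] += 1
    return merged


dates = merge_dates(["15/7/1606", "1606", "1620", "1641"])
places = {canonical_place(p) for p in ["Leiden", "Leyden", "Netherlands", "Holland"]}
print(dates)
print(places)
```

After these two merges the conflicting years 1620 and 1641 are still present, which is the part left to verification.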

Page 18: Verification

Inconsistency:
• We don’t aim for “the right answer”, but for some sort of confidence value
• Different sources may provide different information, e.g. Renoir’s date of birth is:
  • 5 Feb 1841 in www.pillipscollection.org/html/lbp.html
  • 25 Feb 1841 in www.abcgallery.com/R/renoir/renoirbio.html
• Which one is more likely to be correct?
  • Trust: certain sources can be more trusted than others, but how do we judge that?
  • Frequency: certain facts might be extracted more often than others
  • Extraction: some extraction rules are more reliable than others
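
The three signals above (trust, frequency, extraction reliability) could be combined into a confidence value in many ways; the slides do not give a formula. The sketch below assumes one simple option: each observation contributes trust × rule reliability, contributions are summed per value (so frequency counts), and the totals are normalised. The observations and weights are invented for illustration and say nothing about which Renoir date is correct.

```python
from collections import defaultdict


def confidence(observations):
    """Score each candidate value from (value, source_trust, rule_reliability).

    Trust and reliability are weights in [0, 1]. Repeated observations of the
    same value add up, so frequently extracted values score higher.
    """
    scores = defaultdict(float)
    for value, trust, reliability in observations:
        scores[value] += trust * reliability
    total = sum(scores.values())
    if not total:
        return {}
    return {value: score / total for value, score in scores.items()}


# Invented observations: counts and weights are assumptions, not real data.
observations = [
    ("25 Feb 1841", 0.5, 1.0),
    ("25 Feb 1841", 0.5, 1.0),
    ("5 Feb 1841", 0.5, 1.0),
]
scores = confidence(observations)
print(scores)
```

The result is a distribution over candidate values rather than a single answer, which matches the stated aim of a confidence value instead of "the right answer".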

Page 19: Narrative Generation

[Figure: the architecture diagram from Page 4, repeated as the section divider for Narrative Generation.]

Page 20: Biography Templates

• Specified as XML FOHM structures in Auld Linky
• Leaves of the template may be:
  • Queries into the DB for whole paragraphs
  • NLG using queries into the KB
• Context can be used to adjust the shape of the template according to user preferences
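
The idea of a template whose leaves are either paragraph queries or generated sentences, and whose shape depends on context, can be sketched with plain Python data instead of XML FOHM structures. Everything here is a stand-in: the dictionaries replace the DB and KB, the expertise tags on each leaf are an assumed reading of the context mechanism, and the paragraph texts are shortened from the slides.

```python
# Hypothetical in-memory stand-ins for the paragraph DB and the KB.
PARAGRAPH_DB = {
    "paintings": "In addition to portraits, Rembrandt attained fame for his landscapes.",
    "style": "His early work was devoted to showing the lines, light and shade.",
    "influences": "He was influenced by the work of Caravaggio.",
}
KB = {"name": "Rembrandt", "date_of_birth": "July 15, 1606"}


def paragraph_leaf(topic):
    """Leaf that queries the DB for a whole paragraph on a topic."""
    return lambda: PARAGRAPH_DB.get(topic)


def birth_sentence_leaf():
    """Leaf that builds a sentence from KB facts (a very simple form of NLG)."""
    if "name" in KB and "date_of_birth" in KB:
        return f"{KB['name']} was born on {KB['date_of_birth']}."
    return None


# Ordered sequence of (leaf, contexts in which the leaf is shown).
TEMPLATE = [
    (birth_sentence_leaf, {"low", "high"}),
    (paragraph_leaf("paintings"), {"low", "high"}),
    (paragraph_leaf("style"), {"low", "high"}),
    (paragraph_leaf("influences"), {"high"}),
]


def render(template, expertise):
    """Instantiate the template for one user context and return its paragraphs."""
    parts = []
    for leaf, contexts in template:
        if expertise in contexts:
            text = leaf()
            if text:  # skip leaves whose query returned nothing
                parts.append(text)
    return parts


biography = render(TEMPLATE, "low")
print("\n\n".join(biography))
```

Rendering the same template with `"high"` adds the influences paragraph, which is the sense in which context changes the shape of the biography.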

Page 21: [Figure: biography template. A top-level Sequence of four sections (Birth, Family, Art, Death) with nested Sequence and LoD structures. The leaves shown in the figure are listed below.]

• Search for: Paragraph with DOB. "The greatest artist of the Dutch school, Rembrandt Harmenszoon van Rijn, was born on July 15, 1606."
• Construct Sentence with DOB. "Rembrandt was born on July 15, 1606."
• Paragraph about paintings (Low Expertise). "In addition to portraits, Rembrandt attained fame for his landscapes, while as an etcher he ranks among the foremost of all time."
• Paragraph about style (Low Expertise). "His early work was devoted to showing the lines, light and shade, and color of the people he saw about him."
• Paragraph about influences (High Expertise). "He was influenced by the work of Caravaggio and was fascinated by the work of many other Italian artists."

Page 22: [Figure: the biography template (Birth, Family, Art, Death) instantiated. The paragraph about influences (High Expertise) is not selected. The generated biography reads as follows.]

The greatest artist of the Dutch school, Rembrandt Harmenszoon van Rijn, was born on July 15, 1606.

In addition to portraits, Rembrandt attained fame for his landscapes, while as an etcher he ranks among the foremost of all time.

His early work was devoted to showing the lines, light and shade, and color of the people he saw about him.

On October 4, 1669, Rembrandt died in Amsterdam.

Page 23: Future Biography Generation Work

• Use co-referencing techniques to smooth out chosen paragraphs
• Develop a ‘memory’ of what has been previously said (to catch paragraphs that include multiple ‘facts’)
• Use conflicting factual data as a resource:
  • compare conflicting accounts
  • generate statistical sentences such as “Most sources agree that…”
  • reference material so readers can evaluate the source
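
A statistical sentence of the "Most sources agree that…" kind could be generated from the extracted values roughly as follows. This is only a sketch of the proposed future work: the wording, the majority threshold, and the function name `statistical_sentence` are assumptions, and the list of values is invented.

```python
from collections import Counter


def statistical_sentence(subject, attribute, values):
    """Describe agreement between sources about one attribute of a subject."""
    if not values:
        return None
    top_value, top_count = Counter(values).most_common(1)[0]
    total = len(values)
    if top_count == total:
        return f"{subject}'s {attribute} is {top_value}."
    if top_count > total / 2:
        return f"Most sources agree that {subject}'s {attribute} is {top_value}."
    return f"Sources disagree about {subject}'s {attribute}; the most common value is {top_value}."


# Invented list of extracted values, one entry per source.
sentence = statistical_sentence(
    "Renoir", "date of birth", ["25 Feb 1841", "25 Feb 1841", "5 Feb 1841"]
)
print(sentence)
```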

Page 24: Future Direction for ArtEquAKT

• Improve the individual processes
• Incorporate images
  • Use their context (descriptions etc.) to extract knowledge about them
  • Deploy them in biographies to accompany the text
• Use inference
  • generate new relations in the KB
  • use NLP to generate sentences to describe them
• Apply the technology to a physical setting (e.g. on a PDA around a gallery space)