artequakt harith alani, sanghee kim, wendy hall, paul lewis, david millard, nigel shadbolt, mark...
TRANSCRIPT
ArtEquAKT
Harith Alani, Sanghee Kim, Wendy Hall, Paul Lewis, David Millard, Nigel Shadbolt, Mark Weal
Overview
Union of three projects: Artiste, Equator, and AKT
Aims:
• Use NLT to automatically extract relevant information about the life and work of artists from online documents
• Feed this information automatically to an ontology designed for this domain
• Generate stories by extracting and structuring information from the knowledge base in the form of biographical narratives in response to user requests
Objectives
• To find out how effective these technologies are when used together
• To explore the way in which the limitations of one process affect the others (e.g. how ambiguity during extraction might be reflected at the generation stage)
• To generate biographies that might not be as readable as those on the web but which:
    • contain information that is difficult to find out manually
    • gather information from disparate sources
[Figure: ArtEquAKT architecture. Labelled components: Web, web pages, Information Extraction, Ontology, Knowledge Management, KB, DB, Servlets, Narrative Generation, story template, Linky. Numbered steps: 1. Extraction, 2. Population, 3. Consolidation, 4. Indexing, 5. Interaction, 6. Instantiation, 7. Rendering]
Information Extraction
Knowledge Extraction Procedure
[Figure: knowledge extraction procedure. Labels: Query; Load Resources (Ontology, WordNet, GATE); Downloaded Text; Paragraph and sentence recognition; Apple Pie Parser (Syntactic Analyser); Semantic Analysis (Relational Learning); XML output]
Search and Filter Documents
• Query search engines (‘Yahoo’, ‘Altavista’) given artist name as a query
• Calculate the similarity of retrieved documents to an example document
• Use term frequency with normalisation for similarity computation
• Apply some heuristics (e.g. sentence length) to filter out documents which contain mostly tables and/or links
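The filtering steps above can be sketched as follows. This is only an illustration under stated assumptions: the slide does not name the similarity measure, so cosine similarity over length-normalised term frequencies is assumed, and the function names and both thresholds are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Return length-normalised term frequencies for a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values()) or 1
    return {term: n / total for term, n in counts.items()}


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * tf_b.get(term, 0.0) for term, w in tf_a.items())
    norm_a = math.sqrt(sum(w * w for w in tf_a.values()))
    norm_b = math.sqrt(sum(w * w for w in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_sentence_length(text):
    """Average words per sentence; low values suggest tables or link lists."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def keep_document(doc, example, min_similarity=0.2, min_sentence_words=5):
    """Keep a document that resembles the example and reads like prose.

    Both thresholds are assumed values, not taken from the slides.
    """
    similarity = cosine_similarity(term_frequencies(doc), term_frequencies(example))
    return similarity >= min_similarity and mean_sentence_length(doc) >= min_sentence_words
```

A page made up of short navigation labels shares few terms with a biography and has very short "sentences", so it fails both checks.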
Relation Extraction
• Natural language processing techniques to extract relations
• Guided by an ontology
• Use GATE (General Architecture for Text Engineering) and WordNet for entity recognition (e.g. person name, place name, or date)
• Term expansion using WordNet (synonym, hypernym, and hyponym, e.g. ‘depict’ maps to ‘portray’ (synonym) and ‘represent’ (hypernym))
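A minimal sketch of the term expansion idea, assuming a tiny hand-written lexicon in place of WordNet. Only the ‘depict’ entry comes from the slide; the table layout and the function names are hypothetical.

```python
# Hypothetical miniature lexicon standing in for WordNet. Only the
# 'depict' entry comes from the slide; the structure is an assumption.
TOY_LEXICON = {
    "depict": {"synonym": ["portray"], "hypernym": ["represent"], "hyponym": []},
}


def expand_term(term, lexicon=TOY_LEXICON):
    """Return the term followed by its synonyms, hypernyms and hyponyms."""
    entry = lexicon.get(term, {})
    expanded = [term]
    for relation in ("synonym", "hypernym", "hyponym"):
        for related in entry.get(relation, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


def matches_relation(verb, ontology_relation, lexicon=TOY_LEXICON):
    """True if a verb found in text falls within the expansion of a relation name."""
    return verb in expand_term(ontology_relation, lexicon)
```

With this table, a sentence using "portray" or "represent" would be matched against an ontology relation named "depict".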
An Example
Given the sentence:
    Rembrandt Harmenszoon van Rijn was born on July 15, 1606, in Leiden, the Netherlands.
The following facts are extracted:
    Person name: Rembrandt Harmenszoon van Rijn
    Date: July 15, 1606
    Place: Leiden, the Netherlands
    Relation: Birth
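To make the extracted facts concrete, here is a sketch that pulls the same fields out of the example sentence with a single regular expression. This is a deliberate simplification: the system described in these slides relies on parsing, entity recognition and the ontology rather than a surface pattern, and `BIRTH_PATTERN` and `extract_birth` are hypothetical names.

```python
import re

# Hypothetical surface pattern for one relation (birth). It only handles
# sentences shaped like the slide's example.
BIRTH_PATTERN = re.compile(
    r"(?P<person>[A-Z][\w ]+?) was born on "
    r"(?P<date>[A-Z][a-z]+ \d{1,2}, ?\d{4}),? in "
    r"(?P<place>[A-Z][\w ,]+?)\.?$"
)


def extract_birth(sentence):
    """Return the birth facts found in a sentence, or None if it does not match."""
    match = BIRTH_PATTERN.search(sentence.strip())
    if match is None:
        return None
    return {
        "relation": "Birth",
        "person_name": match.group("person"),
        "date": match.group("date"),
        "place": match.group("place"),
    }


facts = extract_birth(
    "Rembrandt Harmenszoon van Rijn was born on July 15, 1606, "
    "in Leiden, the Netherlands."
)
```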
Future Information Extraction Work
• Incorporate a learning capability in extracting relations
• Need to widen the scope of the NLP tool to increase performance
• Extract information about ‘painting’
• Extract links to painting images
• Further investigation about term expansion using WordNet (e.g. consider contexts in mapping synonyms or hypernyms)
Knowledge Management
• Ontology of artists based on CIDOC CRM
• The ontology guides the extraction process
• Populating the Ontology (feeding the KB)
• Knowledge consolidation
• Ontology server providing a set of inference queries
Artequakt Ontology
Populating the Ontology
<Paragraph>
  <url>Potted_biography.html</url>
  <text>In 1631, when Rembrandt's work had become well known and his studio in Leiden was flourishing, he moved to Amsterdam. He became the leading portrait painter in Holland and received many commissions for portraits as well as for paintings of religious subjects. …..It is estimated that he painted between 50 and 60 self-portraits.</text>
  <Painter>
    <name>Rembrandt</name>
    <place_of_work>leiden</place_of_work>
    <has_location>amsterdam</has_location>
    <number_of_paintings>between 50 and 60 self-portraits</number_of_paintings>
  </Painter>
  <Sentence>
    <url>Potted_biography.html</url>
    <text>He became the leading portrait painter in Holland and received many commissions for portraits as well as for paintings of religious subjects</text>
    <mood>third-person</mood>
    <tense>past</tense>
    <order>0</order>
  </Sentence>
  ………
</Paragraph>
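A minimal sketch of how extraction output shaped like the XML fragment above could be fed into a knowledge base. The KB is assumed here to be a plain dictionary keyed by artist name, which is an illustration only and not the ontology server used in the project. `SAMPLE` is a shortened copy of the slide's fragment and `populate` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's XML fragment.
SAMPLE = """<Paragraph>
  <url>Potted_biography.html</url>
  <Painter>
    <name>Rembrandt</name>
    <place_of_work>leiden</place_of_work>
    <has_location>amsterdam</has_location>
  </Painter>
</Paragraph>"""


def populate(kb, xml_text):
    """Add every <Painter> fact in the XML to kb, recording its source URL."""
    root = ET.fromstring(xml_text)
    source = root.findtext("url")
    for painter in root.iter("Painter"):
        name = painter.findtext("name")
        instance = kb.setdefault(name, {})
        for field in painter:
            if field.tag == "name":
                continue
            # Keep every extracted value with its source, so that later
            # consolidation and verification can compare them.
            instance.setdefault(field.tag, []).append((field.text, source))
    return kb


kb = populate({}, SAMPLE)
```

Storing the source next to each value keeps the information needed to weigh conflicting facts later.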
Knowledge Consolidation
• After extracting info on Rembrandt from 10 web sites, the KB was populated with the following:
    • Rembrandt instance: 26 Rembrandt, 37 Rembrandt Harmenszoon, 2 Van Rijn
    • Date of birth: 15/7/1606, 1606, 1620, 1641
    • Place of birth: Leiden, Leyden, Netherlands, Holland
• We need to merge duplications, and verify inconsistencies before we can use this knowledge
Duplication
• Same old problem!
• Our approach for consolidation
    • Simple heuristics to consolidate most duplicates
        • Artist names are unique, so all Rembrandts are merged
        • Merge less specific info into more detailed ones, so 1606 is merged into 15/7/1606
    • Term expansion using WordNet
        • Synonyms: Leiden and Leyden, The Netherlands and Holland
        • Holonyms (part of): Leiden is part of The Netherlands
    • Knowledge Comparison
        • Rembrandt, Rembrandt Harmenszoon, and Van Rijn share a date of birth and a place of birth
        • Difficult with multiple info; verification might help
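The consolidation heuristics above can be sketched as follows. The lookup tables are hypothetical stand-ins for WordNet synonym and holonym queries, dates are assumed to be strings in day/month/year form or a bare year, and all function names are illustrative.

```python
# Hypothetical lookup tables standing in for WordNet synonym and holonym queries.
SYNONYMS = {"leyden": "leiden", "holland": "netherlands", "the netherlands": "netherlands"}
PART_OF = {"leiden": "netherlands"}


def canonical_place(place):
    """Map spelling variants and synonyms onto one canonical name."""
    key = place.strip().lower()
    return SYNONYMS.get(key, key)


def places_compatible(a, b):
    """True if two places are the same, or one is part of the other."""
    a, b = canonical_place(a), canonical_place(b)
    return a == b or PART_OF.get(a) == b or PART_OF.get(b) == a


def merge_dates(dates):
    """Drop any bare year that a more detailed date in the list already covers."""
    kept = []
    for date in dates:
        covered = any(
            "/" in other and other.split("/")[-1] == date
            for other in dates
        )
        if not covered and date not in kept:
            kept.append(date)
    return kept


merged = merge_dates(["15/7/1606", "1606", "1620", "1641"])
```

Merging removes the duplicate 1606 but leaves 1620 and 1641 in place: those are inconsistencies rather than duplicates, and need verification.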
Verification
• Inconsistency
    • We don’t aim for “the right answer”, but for some sort of a confidence value
    • Different sources may provide different info, e.g. Renoir’s dob is:
        • 5 Feb 1841 in www.pillipscollection.org/html/lbp.html
        • 25 Feb 1841 in www.abcgallery.com/R/renoir/renoirbio.html
    • Which one is more likely to be correct?
• Trust: certain sources can be more trusted than others, but how do we judge that?
• Frequency: certain facts might be extracted more often than others
• Extraction: some extraction rules are more reliable than others
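One way to turn the frequency and trust factors above into a confidence value is a trust-weighted vote, sketched below. The slides do not specify a formula, so the weighting scheme, the function name and any trust weights passed in are assumptions. Extraction-rule reliability could be folded in as a further weight in the same way.

```python
from collections import defaultdict


def confidence_scores(observations, source_trust=None, default_trust=1.0):
    """Score each candidate value by trust-weighted frequency, normalised to sum to 1.

    observations: list of (value, source) pairs.
    source_trust: optional mapping from source to a weight (hypothetical values).
    """
    source_trust = source_trust or {}
    totals = defaultdict(float)
    for value, source in observations:
        totals[value] += source_trust.get(source, default_trust)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {value: weight / grand_total for value, weight in totals.items()}
```

With equal trust, two sources that each report a different date score 0.5 apiece, which states the uncertainty instead of hiding it.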
Narrative Generation
Biography Templates
• Specified as XML FOHM structures in Auld Linky
• Leaves of the template may be:
    • Queries into the DB for whole paragraphs
    • NLG using queries into the KB
• Context can be used to adjust the shape of the template according to user preferences
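The template idea can be sketched in a few lines: a sequence of sections, each with alternative leaves chosen by context, where a leaf is either a DB paragraph lookup or a sentence built from KB facts. This is not the FOHM or Auld Linky representation. The dictionary layout, the "low"/"high" context keys, the assignment of leaves to expertise levels and all function names are assumptions made for illustration.

```python
def db_paragraph(topic):
    """Leaf that fetches a stored paragraph on a topic from the document DB."""
    return lambda kb, db: db.get(topic)


def nlg_birth_sentence(kb, db):
    """Leaf that constructs a sentence from KB facts."""
    if "name" in kb and "date_of_birth" in kb:
        return f"{kb['name']} was born on {kb['date_of_birth']}."
    return None


# Hypothetical template: a sequence of sections, each offering
# alternative leaves keyed by the reader's context.
TEMPLATE = [
    {"low": nlg_birth_sentence, "high": db_paragraph("birth")},
    {"low": db_paragraph("paintings"), "high": db_paragraph("influences")},
]


def render(template, kb, db, expertise):
    """Instantiate each section with the leaf chosen by context, skipping empty leaves."""
    parts = []
    for section in template:
        leaf = section.get(expertise)
        text = leaf(kb, db) if leaf else None
        if text:
            parts.append(text)
    return " ".join(parts)
```

Changing the `expertise` argument changes which leaves are instantiated, which is one simple reading of context adjusting the shape of the template.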
[Figure: biography template. Sections: Birth, Family, Art, Death. Structure labels: Sequence, LoD. Context labels: Low Expertise, High Expertise. Example leaves and the text they return:]
Search for: Paragraph with DOB
    The greatest artist of the Dutch school, Rembrandt Harmenszoon van Rijn, was born on July 15, 1606.
Construct Sentence with DOB
    Rembrandt was born on July 15, 1606.
Paragraph about paintings
    In addition to portraits, Rembrandt attained fame for his landscapes, while as an etcher he ranks among the foremost of all time.
Paragraph about style
    His early work was devoted to showing the lines, light and shade, and color of the people he saw about him.
Paragraph about influences
    He was influenced by the work of Caravaggio and was fascinated by the work of many other Italian artists.
Example generated biography:
    The greatest artist of the Dutch school, Rembrandt Harmenszoon van Rijn, was born on July 15, 1606.
    In addition to portraits, Rembrandt attained fame for his landscapes, while as an etcher he ranks among the foremost of all time.
    His early work was devoted to showing the lines, light and shade, and color of the people he saw about him.
    On October 4, 1669, Rembrandt died in Amsterdam
Future Biography Generation Work
• Use co-referencing techniques to smooth out chosen paragraphs
• Develop a ‘memory’ of what has been previously said (to catch paragraphs that include multiple ‘facts’)
• Use conflicting factual data as a resource:
    • compare conflicting accounts
    • generate statistical sentences “Most sources agree that…”
    • Reference material so readers can evaluate the source
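As a rough illustration of the statistical sentences mentioned above, the sketch below picks a phrasing from how many sources report the most common value. The majority threshold, the wording and the function name are assumptions; the slides list this only as future work.

```python
from collections import Counter


def agreement_sentence(subject, attribute, values):
    """Describe how far the sources agree on one attribute of a subject.

    values: one reported value per source.
    """
    counts = Counter(values)
    if not counts:
        return None
    top_value, top_count = counts.most_common(1)[0]
    total = sum(counts.values())
    if top_count == total:
        return f"All sources agree that {subject}'s {attribute} is {top_value}."
    if top_count * 2 > total:
        return f"Most sources agree that {subject}'s {attribute} is {top_value}."
    return f"Sources disagree about {subject}'s {attribute}."
```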
Future Direction for ArtEquAKT
• Improve the individual processes
• Incorporate images
    • Use their context (descriptions etc) to extract knowledge about them
    • Deploy them in biographies to accompany the text
• Use inference
    • generate new relations in the KB
    • use NLP to generate sentences to describe them
• Apply technology to a physical setting (e.g. on a PDA around a gallery space)