Information Extraction and Linked Data Cloud
DESCRIPTION
In the media industry there is a great emphasis on providing descriptive metadata as part of the media assets to the consumers. Information extraction (IE) is considered an important tool for the metadata generation process, and its performance largely depends on the knowledge base it utilizes. The advances in “Linked Data Cloud” research provide a great opportunity for generating such a knowledge base that benefits from the participation of the wider community. In this talk, I will discuss our experiences of utilizing the Linked Data Cloud in conjunction with a GATE-based IE system.
TRANSCRIPT
![Page 1: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/1.jpg)
04/12/23
Information Extraction & Linked Data Cloud
Dr. Dhaval Thakker KTP Research Associate
Press Association Images & Nottingham Trent University
© Dhaval Thakker, Press Association, Nottingham Trent University
![Page 2: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/2.jpg)
Outline
• Press Association & its operations
• Introduction to the Semantic Technology Project at PA Images
• IE and Knowledge base systems
• Semantic Web browsing
• Problem of generating Knowledge bases
• Introduction to Linked Data Cloud (LDC)
• How do we use LDC
• Current and Future Work
• Conclusions
![Page 3: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/3.jpg)
Press Association (pressassociation.com)
Background Semantic Web project Knowledge base Conclusions
Press Association & its operations
• UK’s leading multimedia news & information provider
• Core News Agency operation
• Editorial services: Sports data, entertainment guides, weather forecasting, photo syndication
![Page 4: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/4.jpg)
Free-text versus Semantic Approach
Free-Text
• Lack of structure
• Have to rely on the annotator to provide all possible keywords
• Repetitive annotation effort
• Low accuracy
Semantic
• Adds structure, Concepts-Relationship
• Provides Inference (implicit reasoning) capacity
• Accurate results
• “Related”, “Similarity” based browsing
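The contrast between the two approaches can be sketched in a few lines of Python. This is an illustrative toy and not the PA Images system: the image IDs, captions, annotations and the small `BROADER` taxonomy are all invented for the example. Keyword search only finds captions that contain the literal term, whereas the taxonomy-aware search also follows concept links, which is a very simple form of the implicit reasoning mentioned on the slide.

```python
# Toy comparison of free-text keyword search and taxonomy-aware search.
# All data below is hypothetical and exists only for illustration.

CAPTIONS = {
    "img1": "Striker scores the winning goal for the home side",
    "img2": "A footballer celebrates with the crowd",
    "img3": "Player serves during the tennis final",
}

# Semantic annotations: image -> concepts it depicts (assumed to exist already).
ANNOTATIONS = {
    "img1": {"Footballer"},
    "img2": {"Footballer"},
    "img3": {"TennisPlayer"},
}

# Concept -> its direct broader concept (a minimal hand-made taxonomy).
BROADER = {
    "Footballer": "SportsPerson",
    "TennisPlayer": "SportsPerson",
    "SportsPerson": "Person",
}


def keyword_search(term):
    """Return IDs of images whose caption contains the literal term."""
    term = term.lower()
    return sorted(i for i, text in CAPTIONS.items() if term in text.lower())


def ancestors(concept):
    """Return the concept together with all of its broader concepts."""
    found = {concept}
    while concept in BROADER:
        concept = BROADER[concept]
        found.add(concept)
    return found


def semantic_search(concept):
    """Return IDs of images annotated with the concept or any narrower one."""
    return sorted(
        i for i, concepts in ANNOTATIONS.items()
        if any(concept in ancestors(c) for c in concepts)
    )


if __name__ == "__main__":
    print(keyword_search("footballer"))     # only the caption using that word
    print(semantic_search("Footballer"))    # both football images
    print(semantic_search("SportsPerson"))  # every sports image
```

The annotator in the free-text case would have had to type "footballer" into the first caption for it to be found; in the semantic case the link from `Footballer` to `SportsPerson` is stated once in the taxonomy and reused for every image.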
![Page 5: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/5.jpg)
… the Semantic Web
• Web was “invented” by Tim Berners-Lee (amongst others), a physicist working at CERN
• “The next generation WWW is a Web in which machines can converse in a meaningful way, rather than a web limited to humans requesting HTML pages.” Tim Berners-Lee
• … need to add “Semantics”
• Use Ontologies (dictionary of terms) to help computers understand the meaning (semantics) of domain concepts
![Page 6: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/6.jpg)
PA Images Workflow
[Workflow diagram: Agency/Photographers provide minimum metadata in IPTC. Images with metadata are passed to Captioners for batch processing, who modify existing metadata and add new metadata. Information Extraction then feeds Storage & Browsing in a semantic structure. Remaining labels: Metadata, Company, Website.]
![Page 7: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/7.jpg)
Utilisation of Semantic Technologies for Intelligent Indexing and Retrieval of PA Images photo Collection
Development of a comprehensive semantic-based taxonomy for PA Images domains of News, Entertainment and Sports.
Design and implementation of a web-based and semantics-transparent annotation tool.
Design and development of software programmes to semi-automate the annotation of legacy data.
Development of semantically-enabled search technology, specifically tailored for the PA Photos Image Retrieval engine.
![Page 8: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/8.jpg)
Text Mining System Overview
[Architecture diagram: images with captions enter the GATE-based IE System, which consists of a Gazetteer (known entities), JAPE Grammar (context rules) and Disambiguation/Summarisation, and produces entities of interest and annotated image captions. The PA KB, built from the Linked Data Cloud through the PA Images view and the PA Images ontology (schema), determines what to extract and what to store. Remaining labels: Confirmation, Captions, Learned Facts.]
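The first two stages of the pipeline (known entities from a gazetteer, then context rules for what the gazetteer misses) can be illustrated with a small Python sketch. This is not GATE or JAPE code and does not use any GATE API; the gazetteer entries, the single "at/in + capitalised phrase" rule and the function names are assumptions made up for the example.

```python
import re

# Hypothetical gazetteer: surface form -> entity type.
GAZETTEER = {
    "Henry VIII": "Person",
    "Anne Boleyn": "Person",
    "Hampton Court": "Location",
}

# Hypothetical context rule: a capitalised phrase following "at" or "in"
# is treated as a candidate Location even if the gazetteer lacks it.
LOCATION_CONTEXT = re.compile(
    r"\b(?:at|in)\s+((?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)*)"
)


def gazetteer_lookup(caption):
    """Return (text, type, source) for every gazetteer entry in the caption."""
    hits = []
    for surface, entity_type in GAZETTEER.items():
        if re.search(r"\b" + re.escape(surface) + r"\b", caption):
            hits.append((surface, entity_type, "gazetteer"))
    return hits


def context_rules(caption, known):
    """Return candidate locations found by context that are not yet known."""
    hits = []
    for match in LOCATION_CONTEXT.finditer(caption):
        phrase = match.group(1)
        if phrase not in known:
            hits.append((phrase, "Location", "context-rule"))
    return hits


def extract(caption):
    """Run the gazetteer first, then the context rule, and merge the results."""
    found = gazetteer_lookup(caption)
    known = {text for text, _, _ in found}
    return found + context_rules(caption, known)


if __name__ == "__main__":
    for hit in extract("Portrait of Henry VIII displayed at Hampton Court in Richmond"):
        print(hit)
```

Entities found only by the context rule are the natural candidates for the confirmation step: they are plausible, but nothing in the knowledge base vouches for them yet.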
![Page 9: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/9.jpg)
PA Images Ontology (OWL)
![Page 10: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/10.jpg)
Knowledge base (KB)
Ontology (schema)
Royalty (Royal Family)
‒ name
‒ relationship
    Type 1
    ‒ Spouse
    ‒ From
    ‒ To
    Type 2
    ‒ Partner
    ‒ From
    ‒ To
‒ predecessor
‒ successor
‒ father
‒ mother
‒ Title
Data
Royalty (Henry VIII)
‒ name (Tudor, Henry/Henry VIII of England)
‒ relationship
    Spouse (Anne Boleyn)
    Spouse (Catherine Parr)
    Spouse (Jane Seymour)
    Spouse (Anne of Cleves)
    Spouse (Catherine Howard)
    Spouse (Catherine of Aragon)
‒ Predecessor (Henry VII)
‒ Successor (Edward VI)
‒ Father (Henry VII of England)
‒ Mother (Elizabeth of York)
‒ Title (king of England and Ireland)
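One way to picture the split between schema and data is to hold the data as (subject, predicate, object) triples and check them against the properties the schema allows. The Python sketch below uses the Henry VIII facts from the slide; the identifier style, the flattening of the "relationship" types into plain `spouse` and `partner` properties, and the function names are assumptions for illustration, not the actual PA KB format.

```python
# The Henry VIII record from the slide as (subject, predicate, object) triples,
# checked against a flattened version of the Royalty schema.
# Identifier style and the flattening are assumptions for this sketch.

SCHEMA = {
    "Royalty": {
        "name", "spouse", "partner", "predecessor",
        "successor", "father", "mother", "title",
    },
}

TRIPLES = [
    ("Henry_VIII", "type", "Royalty"),
    ("Henry_VIII", "name", "Henry VIII of England"),
    ("Henry_VIII", "spouse", "Anne Boleyn"),
    ("Henry_VIII", "spouse", "Catherine Parr"),
    ("Henry_VIII", "spouse", "Jane Seymour"),
    ("Henry_VIII", "spouse", "Anne of Cleves"),
    ("Henry_VIII", "spouse", "Catherine Howard"),
    ("Henry_VIII", "spouse", "Catherine of Aragon"),
    ("Henry_VIII", "predecessor", "Henry VII"),
    ("Henry_VIII", "successor", "Edward VI"),
    ("Henry_VIII", "father", "Henry VII of England"),
    ("Henry_VIII", "mother", "Elizabeth of York"),
    ("Henry_VIII", "title", "king of England and Ireland"),
]


def objects(subject, predicate):
    """Return every object stored for the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def schema_violations():
    """Return triples whose predicate is not allowed for the subject's type."""
    types = {s: o for s, p, o in TRIPLES if p == "type"}
    bad = []
    for s, p, o in TRIPLES:
        if p == "type":
            continue
        allowed = SCHEMA.get(types.get(s), set())
        if p not in allowed:
            bad.append((s, p, o))
    return bad


if __name__ == "__main__":
    print(objects("Henry_VIII", "spouse"))
    print(schema_violations())
```

A multi-valued property such as spouse is simply several triples with the same subject and predicate, which is why the data side of the slide can list six spouses under one schema entry.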
![Page 11: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/11.jpg)
Scale of Things for KB
Emphasis on: People, Places, Organisations, Events
About 50 types of sports
• Their Events
• Type of people in these sports (Referee, Players etc.)
• Type of Locations for these sports
• Variety of Teams for these sports
• And relationships between all of them!
Similarly for Entertainment and News
![Page 12: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/12.jpg)
Outsourcing KB – Linked Data Cloud (LDC)
• Where do we get all this knowledge from?
• We don’t want it in free-text form but in a semantic structure
• It has to be comprehensive and accurate
• Free, open, extractable, evolving
• Uniform Resource Identifiers (URIs) and the Resource Description Framework (RDF) language are at the heart of the LOD
![Page 13: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/13.jpg)
Linked Data
“The term Linked Data is used to describe a method of exposing, sharing, and connecting data via dereferenceable URIs on the Web”
“The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.”
![Page 14: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/14.jpg)
Linked Data cloud
31/03/2008
![Page 15: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/15.jpg)
DBpedia
• Epicentre of the Linked Data Cloud
• Generated primarily from the Wikipedia info-boxes and improved with linkage to other sources in the cloud
• The DBpedia knowledge base currently describes more than 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies
• Many organisations and researchers are using it
![Page 16: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/16.jpg)
Linking Open Data Community
Community effort to
• Publish existing open license datasets as Linked Data on the Web
• Interlink things between different data sources
• Develop clients that consume Linked Data from the Web
![Page 17: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/17.jpg)
Organizations participating in the LOD community
Companies
• Press Association (UK)
• New York Times (USA)
• Thomson Reuters (USA) - OpenCalais
• BBC (UK) – Music Beta website, BBC Earth
• MusicBrainz
• Yahoo Microsearch
• OpenLink (UK)
• Talis (UK)
• Zitgist (USA)
• Garlik (UK)
• Mondeca (FR)
• Renault (FR)
• Boab Interactive (AUS)
• … others who are indirect consumers
Universities and Research Institutes
• Massachusetts Institute of Technology (USA)
• University of Southampton (UK)
• DERI (IRE)
• KMi, Open University (UK)
• University of London (UK)
• Universität Hannover (DE)
• University of Pennsylvania (USA)
• Universität Leipzig (DE)
• Universität Karlsruhe (DE)
• Joanneum (AT)
• Freie Universität Berlin (DE)
• Cyc Foundation (USA)
• Southeast University (CN)
![Page 18: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/18.jpg)
![Page 19: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/19.jpg)
Interested in Linking up?
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful RDF information
4. Include RDF statements that link to other URIs so that they can discover related things
Tim Berners-Lee 2007 http://www.w3.org/DesignIssues/LinkedData.html
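The four rules can be made concrete with a small Python sketch. It does not touch the network: the `WEB` dictionary is a hypothetical in-memory stand-in for servers that answer URI look-ups, and the `example.org` URIs, property names and function names are invented for the illustration.

```python
from urllib.parse import urlparse

# Hypothetical in-memory stand-in for the Web: URI -> statements about it.
WEB = {
    "http://example.org/id/london": [
        ("http://example.org/id/london", "http://example.org/ns#name", "London"),
        ("http://example.org/id/london", "http://example.org/ns#capitalOf",
         "http://example.org/id/uk"),
    ],
    "http://example.org/id/uk": [
        ("http://example.org/id/uk", "http://example.org/ns#name",
         "United Kingdom"),
    ],
}


def is_http_uri(value):
    """Rules 1 and 2: the name is a URI, and specifically an HTTP(S) URI."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def look_up(uri):
    """Rule 3: looking up a URI returns useful statements about it."""
    if not is_http_uri(uri):
        raise ValueError("not an HTTP URI: " + uri)
    return WEB.get(uri, [])


def linked_uris(uri):
    """Rule 4: statements point at other URIs that can be looked up in turn."""
    return sorted(
        {o for _, _, o in look_up(uri) if is_http_uri(o) and o != uri}
    )


if __name__ == "__main__":
    start = "http://example.org/id/london"
    for target in linked_uris(start):
        print(target, look_up(target))
```

Starting from one URI, a client discovers the second resource only because the first one links to it; that link-following step is what turns separate datasets into a cloud.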
![Page 20: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/20.jpg)
Our approach for LDC utilisation
Why not DBpedia as it is?
• A great deal of noisy data: if we store it as it is, storage will be huge
• DBpedia is less formally structured
• The data quality is below what production use requires, and there are some inconsistencies within DBpedia
• We have our own domains and our own view of them
Our approach to combining the advantages of both worlds is to interlink DBpedia with hand-crafted ontologies such as the PA Images ontology, which enables applications to use the formal knowledge from these ontologies together with the data from DBpedia.
![Page 21: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/21.jpg)
Ontology Mapping - Map the ontology and the data will follow..
[Diagram: the PA Images Ontology is connected by sameAs links to Linked Data Cloud sources (DBpedia, YAGO, Geonames, ...), which yields the knowledgebase/data for our ontology: similar entities and their features.]
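The idea that the data follows once the ontology is mapped can be sketched in Python as two small lookup tables applied to source triples. This is an illustration under assumptions, not the project's code: the mapping tables contain only a couple of terms, the sample triples (including the `ex:unmappedProperty` one) are made up, and prefixed names are treated as plain strings.

```python
# Hypothetical mapping tables: DBpedia term -> PA Images term.
CLASS_MAP = {"dbpedia-ont:City": "pa:City"}
PROPERTY_MAP = {"foaf:name": "pa:locationName"}


def map_triples(triples):
    """Rewrite mapped classes and properties; drop everything unmapped."""
    mapped = []
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            if obj in CLASS_MAP:
                mapped.append((subject, predicate, CLASS_MAP[obj]))
        elif predicate in PROPERTY_MAP:
            mapped.append((subject, PROPERTY_MAP[predicate], obj))
    return mapped


# Made-up sample data in the shape of DBpedia triples.
SAMPLE = [
    ("dbpedia:Nottingham", "rdf:type", "dbpedia-ont:City"),
    ("dbpedia:Nottingham", "foaf:name", "Nottingham"),
    ("dbpedia:Nottingham", "ex:unmappedProperty", "some noisy value"),
]

if __name__ == "__main__":
    for triple in map_triples(SAMPLE):
        print(triple)
```

Because only mapped terms survive, the mapping doubles as a filter: anything the hand-crafted ontology has no place for is never stored.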
![Page 22: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/22.jpg)
SPARQL CONSTRUCT
PREFIX dbpedia-ont: <http://dbpedia.org/ontology/>
PREFIX db: <http://dbpedia.org/>
PREFIX pa: <http://localhost/pa/images/media/entities.owl#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# CONSTRUCT template: triples created in the PA Images ontology
CONSTRUCT {
  ?newLoc a pa:City .
  ?newLoc pa:locationName ?name .
  ?newLoc pa:latitutedegrees ?lat
}
# WHERE pattern: triples matched in DBpedia
WHERE {
  ?newLoc a dbpedia-ont:City .
  ?newLoc foaf:name ?name .
  ?newLoc dbpedia-ont:latitutedegrees ?lat
}
![Page 23: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/23.jpg)
Has City -> City Of Country
PREFIX dbpedia-ont: <http://dbpedia.org/ontology/>
PREFIX pa: <http://localhost/pa/images/media/entities.owl#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX db-prop: <http://dbpedia.org/property/>

# Link each city to its country in both directions
CONSTRUCT {
  ?newLoc a pa:City .
  ?newLoc pa:cityOfCountry ?country .
  ?newLoc pa:locationName ?name .
  ?country pa:hasCity ?newLoc
}
WHERE {
  ?newLoc a dbpedia-ont:City .
  ?newLoc db-prop:subdivisionName ?country .
  ?country a dbpedia-ont:Country .
  ?newLoc foaf:name ?name
}
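What this CONSTRUCT query does can be traced by hand with a small Python emulation over in-memory triples. It is a sketch under assumptions rather than a SPARQL engine: prefixed names are plain strings, the sample triples are invented, and only the four patterns of this one query are supported.

```python
from collections import defaultdict


def construct_city_country(triples):
    """Emulate the query: join the WHERE patterns, emit the PA triples."""
    types = defaultdict(set)   # resource -> classes it belongs to
    names = defaultdict(set)   # resource -> foaf:name values
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
        elif p == "foaf:name":
            names[s].add(o)

    result = set()  # an RDF graph is a set, so duplicate triples collapse
    for city, p, country in triples:
        if p != "db-prop:subdivisionName":
            continue
        if "dbpedia-ont:City" not in types[city]:
            continue
        if "dbpedia-ont:Country" not in types[country]:
            continue
        for name in names[city]:
            result.add((city, "rdf:type", "pa:City"))
            result.add((city, "pa:cityOfCountry", country))
            result.add((city, "pa:locationName", name))
            result.add((country, "pa:hasCity", city))
    return sorted(result)


# Made-up sample data in the shape of DBpedia triples.
SAMPLE = [
    ("dbpedia:Nottingham", "rdf:type", "dbpedia-ont:City"),
    ("dbpedia:Nottingham", "foaf:name", "Nottingham"),
    ("dbpedia:Nottingham", "db-prop:subdivisionName", "dbpedia:United_Kingdom"),
    ("dbpedia:United_Kingdom", "rdf:type", "dbpedia-ont:Country"),
    ("dbpedia:Springfield", "rdf:type", "dbpedia-ont:City"),
    ("dbpedia:Springfield", "foaf:name", "Springfield"),
    # The subdivision here is not typed as a Country, so no match is produced.
    ("dbpedia:Springfield", "db-prop:subdivisionName", "Illinois"),
]

if __name__ == "__main__":
    for triple in construct_city_country(SAMPLE):
        print(triple)
```

The type check on `?country` is what keeps noisy values out: a city whose subdivision is a state, a county or a bare string yields no triples at all, while a matching city gains both the `pa:cityOfCountry` link and its inverse `pa:hasCity`.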
![Page 24: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/24.jpg)
People - Total > 200000
• Footballers -> 24k
• Cricketers -> 4k
• American Footballers -> 8k
• Actors -> 12k
• Music Artists -> 22k
• Baseball players -> 1200
• Basketball players -> 1200
• British Royalty -> 800
• Cyclists -> 2300
• Politicians -> 15k
• F1 Racing Drivers -> 1100
• …
![Page 25: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/25.jpg)
Groups Total > 50k
• National Football Teams -> 400
• Band -> 16000
• Companies -> 24k
• Clubs -> 800
![Page 26: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/26.jpg)
Work > 200000
• Album -> 80k
• Films -> 80k
• Single -> 27k
• Books -> 17k
• …
And..
• Events -> 2000
• Locations -> 200000
![Page 27: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/27.jpg)
Conclusions
• Linked data is very exciting
• The intention is that we move from a web of documents to a web of data: the Web as database
• PA knowledge base generation using the Linked Data Cloud
• A complete product that utilises semantic technologies to lower the cost of annotation and improve the search experience
![Page 28: Information Extraction and Linked Data Cloud](https://reader038.vdocuments.net/reader038/viewer/2022103000/55506820b4c905cc0f8b45f3/html5/thumbnails/28.jpg)
Acknowledgement
KTP Project, Press Association & Nottingham Trent University