Knowledge Graph Analysis
2018-10-01 Dr. Hamed Shariat Yazdi, Prof. Jens Lehmann
Introduction of Lecturers
Dr. Hamed Shariat Yazdi, Prof. Jens Lehmann Knowledge Graph Analysis 2
Prof. Dr. Jens Lehmann
• Since 2015: Full Professor for Data Engineering at University of Bonn
• 2013-2015: Leader of AKSW Research Group at University of Leipzig
• Since 2010: Leader of Machine Learning and Ontology Engineering Group in Leipzig
• 2010: PhD, University of Leipzig
• (Co-)Founder of the DBpedia, DL-Learner and LinkedGeoData projects
• Research interests: Machine Learning, Semantic Web, Logic
Dr. Hamed Shariat Yazdi
• 2016 - Now: Leader of the Knowledge Graphs and Artificial Intelligence Team and Senior Researcher at University of Bonn
• 2016: Prize of the University of Siegen for the Promotion of Young Scientists
• 2015: PhD in Computer Science at University of Siegen (Grade: Summa Cum Laude with distinction)
• Research interests: Machine Learning, Knowledge Graphs, Artificial Intelligence
Organisational Matters
Organisational Matters: Overview
• Knowledge Graph Analysis consists of two modules: Lecture/Exercises + Seminar
• Mailing list: sign-up sheet
• Lecture notes will be provided at https://sewiki.iai.uni-bonn.de/teaching/lectures/kga/2018/start
• Changes will be announced on the website and via the mailing list
Organisational Matters: Exercises and Seminar
Exercises
• Start date: Will be announced soon!
• Time: Wednesdays 14:00-16:00; Thursdays 12:00-14:00, 14:00-16:00 and 16:00-18:00; Fridays 10:00-12:00 and 12:00-14:00
• Location: Seminar Room 1.047, Informatik III
• Address: Endenicher Allee 19a, 53115 Bonn
Seminar
• Optional in addition to lecture and exercises
• Deepens understanding of topics discussed in the lecture
• Based on recently published articles
• 5 or 6 slots: initial distribution of topics, sprint presentations, final presentations
• Start date: 9 October 2018
• Time: Tuesdays 10:00-12:00
• Location: Seminar Room 1.047, Informatik III
• Address: Endenicher Allee 19a, 53115 Bonn
Organisational Matters: Exams
• Written exam (90 minutes)
• Date: not fixed yet!
Organisational Matters: Overview
Tuesday 12:00 - 14:00
• October 09: Motivation (Prof. Jens Lehmann, Dr. Hamed Shariat Yazdi)
• October 16: RDF Databases
• October 23: Property Graph Databases
• October 30: Statistical Relational Learning (SRL) for KGs
• November 06: Tensor-Factorization Methods
• November 13: No lecture
• November 20: Alternating Least Squares and Stochastic Gradient Descent
• November 27: Introducing Neural Networks
• December 04: Neural Networks for KG analysis
• December 11: Latent Distance and Graph Feature Models
• January 15: Training SRL Models on KG
• January 22: Markov Logic Networks
• January 29: Summary
Organisational Matters: Literature
• A Review of Relational Machine Learning for Knowledge Graphs by Maximilian Nickel, Kevin Murphy, Volker Tresp, Evgeniy Gabrilovich: https://arxiv.org/pdf/1503.00759.pdf
• O’Reilly’s Graph Databases by Ian Robinson, Jim Webber and Emil Eifrem: https://neo4j.com/book-graph-databases/
Background
Role of Knowledge
Douglas B. Lenat, Edward A. Feigenbaum
• “Knowledge is Power” Hypothesis (the Knowledge Principle): “If a program is to perform a complex task well, it must know a great deal about the world in which it operates.”
• The Breadth Hypothesis: “To behave intelligently in unexpected situations, an agent must be capable of falling back on increasingly general knowledge.”¹
¹ Taken from: Lenat & Feigenbaum, Artificial Intelligence 47 (1991), “On the Threshold of Knowledge”.
Information vs. Knowledge
• Information is a precursor for knowledge: necessary, but not sufficient
• Knowledge requires cognitive and analytical processing; information does not.
• Knowledge = information processed and organized so as to be useful, e.g., a map of the immediate environment
⇒ information alone is not enough!
Why Graphs?
• Data is increasingly connected (graph data)
• Relationships within data are often themselves an integral part of the data
• The world is structured: we are surrounded by entities connected by relations
• Graphs are a natural abstraction that captures the relationships between entities
• Graphs are well-understood mathematical objects
Why Graphs?
A picture is worth a thousand words...
• 5th Ave intersects E 42nd St
• 5th Ave intersects E 41st St
• Madison Ave intersects E 39th St
• Lexington Ave intersects E 38th St
• Tunnel Exit St intersects E 38th St
• Park Ave Tunnel intersects E 39th St
• etc.
Goal: Get from point A to point B. Which description would you use?
Why Graphs?
A picture is worth a thousand words...
• It is natural for human beings to represent information in graphs
• Information represented in a graph is also easier to understand and process (for humans)
• Example: we can think of a map as a graph structure:
  • Nodes = intersections
  • Edges = roads connecting the intersections
Goal: Get from point A to point B. Which description would you use?
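The map-as-graph idea can be sketched in a few lines of Python. This is only an illustration: the road segments in ROADS are a made-up toy map (not real street data), and breadth-first search is one simple way to get from A to B with the fewest road segments.

```python
from collections import deque

# Hypothetical toy map: each pair is a road segment between two intersections.
ROADS = [
    ("5th Ave & E 42nd St", "5th Ave & E 41st St"),
    ("5th Ave & E 41st St", "Madison Ave & E 41st St"),
    ("Madison Ave & E 41st St", "Madison Ave & E 39th St"),
    ("Madison Ave & E 39th St", "Park Ave Tunnel & E 39th St"),
]


def build_graph(roads):
    """Nodes = intersections, edges = roads (treated as two-way here)."""
    graph = {}
    for a, b in roads:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_route(graph, start, goal):
    """Breadth-first search; returns the route with the fewest segments, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(graph[node]):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    g = build_graph(ROADS)
    for stop in shortest_route(g, "5th Ave & E 42nd St", "Park Ave Tunnel & E 39th St"):
        print(stop)
```

With the graph in hand, route finding is a generic graph algorithm; the flat list of "X intersects Y" statements would first have to be turned into exactly this structure.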
Knowledge Graph: Intuition
• Models entities and relationships between these entities
• Data stored in a graph
• Nodes correspond to entities
• Directed labeled edges between entities keep track of relationships
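A minimal sketch of this intuition in Python, assuming a knowledge graph is stored as a set of (subject, predicate, object) triples. The class name, method names and the example facts are hypothetical and only serve as illustration.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Entities are nodes; each triple is a directed, labeled edge."""

    def __init__(self):
        # subject -> set of (predicate, object) pairs
        self._out = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._out[subject].add((predicate, obj))

    def objects(self, subject, predicate):
        """All nodes reachable from `subject` via an edge labeled `predicate`."""
        return sorted(o for (p, o) in self._out.get(subject, ()) if p == predicate)

    def triples(self):
        return sorted((s, p, o) for s, edges in self._out.items() for (p, o) in edges)


if __name__ == "__main__":
    kg = KnowledgeGraph()
    # Illustrative facts only.
    kg.add("Busan", "region", "Yeongnam")
    kg.add("Busan", "type", "City")
    kg.add("Dresden", "type", "City")
    print(kg.objects("Busan", "region"))
    print(kg.triples())
```

Because edges are directed, ("Busan", "region", "Yeongnam") does not imply an edge from Yeongnam back to Busan.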
Knowledge Graph: Things, not Strings
Connections to Related Fields
Draws on a multitude of fields:
• Information retrieval
• Natural language processing
• Databases
• Machine learning
• Artificial intelligence
History
The 2000s have seen a rapid rise of knowledge graphs to prominence
2007: DBpedia
2007: DBpedia is released through a collaborative effort of Leipzig University, the University of Mannheim, and OpenLink Software
• Structured information is mined from Wikipedia
• Allows semantic queries
• The collected information, as well as the code, is freely available
2007: Freebase
2007: A California-based startup, Metaweb, releases Freebase
• Described by Metaweb as “an open shared database of the world’s knowledge”
• Provided an interface to enable users to fill in structured data, to categorize or to connect data
• No programming experience required
• Structured data automatically harvested from the web
2008: YAGO
2008: The Max Planck Institute in Saarbrücken releases YAGO
• Stands for: Yet Another Great Ontology
• Extracts information automatically from Wikipedia, WordNet, and GeoNames
• Was used by the artificial intelligence system Watson (together with DBpedia and Freebase)
2010: NELL
2010: Carnegie Mellon proposes a design for a language learning system, NELL
• Stands for Never-Ending Language Learner
• Builds a knowledge base by reading the web
• Continuously improves its ability to process information and to build the knowledge base
2012: Google
2012: Google announces the addition of a knowledge graph to their search engine
• Goal: make search easier for the user
• Step toward enabling search engines to understand the world similarly to how humans understand it
• Facilitates a broader and deeper search
2012: Satori
2012: Microsoft introduces Satori into their Bing search engine
• Goal: improve search by making the underlying architecture more intelligent
• Provide the most relevant information and key facts automatically
• In 2010 Microsoft releases Graph Engine 1.0: a general purpose distributed graph system over a memory cloud. The goal of the engine is to enable processing of large graphs.
Latest Additions
2013: Facebook introduces “Entity Graph”
• Each “entity” (e.g. persons, institutions, restaurants, etc.) is treated as a node in a graph
• Relations between entities are modeled by their relationships in real life (e.g. friendships, alumni, checked in at, etc.)
2016: LinkedIn introduces their own knowledge graph
• A large knowledge base based on LinkedIn members, jobs, titles, companies, skills, etc.
• Goal: Create a large economic graph - a map of the world economy
In Summary, Knowledge Graphs...
• are applied both in academia and industry
• are rapidly adopted by the most popular search engines and social networks
• present a natural way of accessing/storing knowledge in a multitude of domains
• promote better interface and access to knowledge: dialog system, personalization, emotion, question answering
Image credit: this Wikipedia and Wikimedia Commons image is from the user Chris 73 and is freely available at //commons.wikimedia.org/wiki/File:WorldWideWebAroundGoogle.png under the Creative Commons CC BY-SA 3.0 license.
Detailed Examples
Detailed Example
Things, not Strings!
https://www.youtube.com/watch?v=mmQl6VGvX-c
Structured Results
Augmenting presentation with relevant facts
Google’s Knowledge Graph: Surfacing Facts Proactively
Google’s Knowledge Graph: Exploratory Search
Detailed Example: DBpedia
dbpedia.org
github.com/dbpedia/
• Crowd-sourced community effort to extract structured information from Wikipedia and Wikidata
• Currently describes 4.58 million things, of which 4.22 million are classified in a consistent ontology
• Available in 125 languages
• Enriches the extracted information with a semantic layer
• Allows semantic querying of relationships and properties + many other tools
DBpedia
• Community effort to extract structured information from Wikipedia and to make this information available on the Web
• Allows asking sophisticated queries against Wikipedia, and linking other datasets on the Web to Wikipedia data
• Semi-structured Wiki markup → structured information
Wikipedia Limitations
Simple questions are hard to answer with Wikipedia:
• What do Innsbruck and Leipzig have in common?
• Who are the mayors of central European towns elevated more than 1000 m?
• Which movies star both Brad Pitt and Angelina Jolie?
• All soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants
Structure in Wikipedia
• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links
  • Other language versions
  • Other Wikipedia pages
  • To the Web
  • Redirects
  • Disambiguation
• ...
DBpedia Information Extraction Framework
DBpedia Information Extraction Framework (DIEF)
• Started in 2007
• Hosted on SourceForge and GitHub
• Initially written in PHP, but fully rewritten in Scala and Java
• Around 40 contributors
• See https://www.ohloh.net/p/dbpedia for a detailed overview
Can potentially be adapted to other MediaWikis
• Currently Wiktionary: http://wiktionary.dbpedia.org
DIEF - Overview
DIEF - Raw Infobox Extractor
WikiText syntax:
{{Infobox Korean settlement
| title = Busan Metropolitan City
...
| area_km2 = 763.46
| pop = 3635389
| region = [[Yeongnam]]
}}
RDF serialization:
dbp:Busan dbp:title "Busan Metropolitan City"
dbp:Busan dbp:area_km2 "763.46"^^xsd:float
dbp:Busan dbp:pop "3635389"^^xsd:int
dbp:Busan dbp:region dbp:Yeongnam
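The raw infobox step can be approximated with a few regular expressions. The following Python sketch is not the DIEF implementation (which is written in Scala and Java); the function name and the simple datatype rules are assumptions chosen to reproduce the Busan example.

```python
import re

# Infobox from the example, shortened to the fields shown on the slide.
WIKITEXT = """{{Infobox Korean settlement
| title = Busan Metropolitan City
| area_km2 = 763.46
| pop = 3635389
| region = [[Yeongnam]]
}}"""


def extract_raw_infobox(subject, wikitext):
    """Turn each '| key = value' line into one (subject, property, object) triple."""
    triples = []
    for line in wikitext.splitlines():
        match = re.match(r"^\|\s*([^=]+?)\s*=\s*(.*?)\s*$", line)
        if not match:
            continue  # template header, closing braces, elided lines
        key, value = match.group(1), match.group(2)
        prop = "dbp:" + key.replace(" ", "_")
        link = re.fullmatch(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", value)
        if link:
            # Wiki links become references to other resources.
            obj = "dbp:" + link.group(1).strip().replace(" ", "_")
        elif re.fullmatch(r"-?\d+", value):
            obj = f'"{value}"^^xsd:int'
        elif re.fullmatch(r"-?\d+\.\d+", value):
            obj = f'"{value}"^^xsd:float'
        else:
            obj = f'"{value}"'
        triples.append((subject, prop, obj))
    return triples


if __name__ == "__main__":
    for s, p, o in extract_raw_infobox("dbp:Busan", WIKITEXT):
        print(s, p, o)
```

The property names are taken verbatim from the infobox keys, which is why the raw extractor inherits every naming inconsistency between templates.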
DIEF - Raw Infobox Extractor/Diversity
DIEF - Mapping-Based Infobox Extractor
Cleaner data:
• Combine what belongs together (birth place, birthplace)
• Separate what is different (bornIn, birthplace)
• Correct handling of datatypes
Mappings Wiki:
• http://mappings.dbpedia.org
• Everybody can contribute new mappings or improve existing ones
• ≈ 170 editors
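What a mapping does can be sketched as a lookup table from raw infobox keys to ontology properties with a target datatype. The table below is hypothetical and hand-written for illustration; the real mappings are maintained by the community in the Mappings Wiki and are far richer.

```python
# Hypothetical mapping table: normalised infobox key -> (ontology property, datatype).
MAPPINGS = {
    "birth place": ("dbo:birthPlace", str),
    "birthplace": ("dbo:birthPlace", str),
    "pop": ("dbo:populationTotal", int),
    "population": ("dbo:populationTotal", int),
}


def apply_mappings(raw_infobox):
    """Map raw keys onto ontology properties and convert values to the mapped datatype."""
    mapped = {}
    for key, value in raw_infobox.items():
        entry = MAPPINGS.get(key.strip().lower().replace("_", " "))
        if entry is None:
            continue  # no mapping: this key is only covered by the raw extractor
        prop, datatype = entry
        if datatype is int:
            # Strip thousands separators before converting.
            value = int(value.replace(",", "").replace(" ", ""))
        mapped[prop] = value
    return mapped


if __name__ == "__main__":
    print(apply_mappings({"birth_place": "Leipzig"}))
    print(apply_mappings({"Birthplace": "Leipzig"}))
    print(apply_mappings({"pop": "3,635,389", "motto": "Dynamic Busan"}))
```

Both spellings of the birth place key end up on the same property, which is the "combine what belongs together" part; the integer conversion stands in for datatype handling.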
Linked Data Publication via 303 Redirects
• http://dbpedia.org/resource/Dresden - URI of the city of Dresden
• http://dbpedia.org/page/Dresden - information resource describing the city of Dresden in HTML format
• http://dbpedia.org/data/Dresden - information resource describing the city of Dresden in RDF/XML format
• Further formats are supported, e.g. http://dbpedia.org/data/Dresden.n3 for N3
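The redirect logic can be sketched as a pure function from the requested resource URI and the client's Accept header to a 303 response. This is a simplified model, not the server's actual code: the function name is hypothetical, the media type text/n3 for N3 is an assumption, and real content negotiation also weighs quality values.

```python
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def see_other(resource_uri, accept="text/html"):
    """Return (status, Location) for a request to a DBpedia resource URI."""
    if not resource_uri.startswith(RESOURCE_PREFIX):
        raise ValueError("not a DBpedia resource URI: " + resource_uri)
    name = resource_uri[len(RESOURCE_PREFIX):]
    if "application/rdf+xml" in accept:
        location = "http://dbpedia.org/data/" + name
    elif "text/n3" in accept:  # assumed media type for N3
        location = "http://dbpedia.org/data/" + name + ".n3"
    else:
        # Browsers and unknown clients get the HTML description.
        location = "http://dbpedia.org/page/" + name
    return 303, location


if __name__ == "__main__":
    uri = "http://dbpedia.org/resource/Dresden"
    print(see_other(uri, "text/html"))
    print(see_other(uri, "application/rdf+xml"))
    print(see_other(uri, "text/n3"))
```

The resource URI names the city itself, so it never returns a document directly; the 303 status points the client to a document about the city.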
DBpedia Links
Data set                  Predicate                Count        Tool
Amsterdam Museum          owl:sameAs               627          S
BBC Wildlife Finder       owl:sameAs               444          S
Book Mashup               rdf:type, owl:sameAs     9 100
Bricklink                 dc:publisher             10 100
CORDIS                    owl:sameAs               314          S
Dailymed                  owl:sameAs               894          S
DBLP Bibliography         owl:sameAs               196          S
DBTune                    owl:sameAs               838          S
Diseasome                 owl:sameAs               2 300        S
Drugbank                  owl:sameAs               4 800        S
EUNIS                     owl:sameAs               3 100        S
Eurostat (Linked Stats)   owl:sameAs               253          S
Eurostat (WBSG)           owl:sameAs               137
CIA World Factbook        owl:sameAs               545          S
flickr wrappr             dbp:hasPhotoCollection   3 800 000    C
Freebase                  owl:sameAs               3 600 000    C
GADM                      owl:sameAs               1 900
GeoNames                  owl:sameAs               86 500       S
GeoSpecies                owl:sameAs               16 000       S
GHO                       owl:sameAs               196          L
Project Gutenberg         owl:sameAs               2 500        S
Italian Public Schools    owl:sameAs               5 800        S
LinkedGeoData             owl:sameAs               103 600      S
LinkedMDB                 owl:sameAs               13 800       S
MusicBrainz               owl:sameAs               23 000
New York Times            owl:sameAs               9 700
OpenCyc                   owl:sameAs               27 100       C
OpenEI (Open Energy)      owl:sameAs               678          S
Revyu                     owl:sameAs               6
Sider                     owl:sameAs               2 000        S
TCMGeneDIT                owl:sameAs               904
UMBEL                     rdf:type                 896 400
US Census                 owl:sameAs               12 600
WikiCompany               owl:sameAs               8 300
WordNet                   dbp:wordnet_type         467 100
YAGO2                     rdf:type                 18 100 000
Sum                                                27 211 732
Abbreviations: S: Silk (A Link Discovery Framework), L: LIMES (A Link Discovery Framework), C: Custom Script, Missing: no regeneration
DBpedia Links - A Query Example
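Getting the list of all musicians who were born in Germany before 1900, ordered by their birth date.
# Prefixes are spelled out so that the query is self-contained;
# ':' is assumed to denote DBpedia resources.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX : <http://dbpedia.org/resource/>
SELECT ?name ?birth ?death ?person ?countryOfBirth
WHERE {
?person rdf:type dbo:MusicalArtist .
?person foaf:name ?name .
?person dbo:birthPlace/dbo:country* ?countryOfBirth .
?person dbo:birthDate ?birth .
?person dbo:deathDate ?death .
FILTER ( ?birth < "1900-01-01"^^xsd:date ) .
FILTER ( ?countryOfBirth = :Germany ) .
}
ORDER BY ?birth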
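A query like this is sent to a SPARQL endpoint as an HTTP request. The Python sketch below only builds the request (URL and headers) and does not contact any server; the helper name is hypothetical, the endpoint is the public DBpedia one, and the prefix declarations are added here on the assumption that ':' denotes DBpedia resources.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX : <http://dbpedia.org/resource/>
SELECT ?name ?birth ?death ?person ?countryOfBirth
WHERE {
  ?person rdf:type dbo:MusicalArtist .
  ?person foaf:name ?name .
  ?person dbo:birthPlace/dbo:country* ?countryOfBirth .
  ?person dbo:birthDate ?birth .
  ?person dbo:deathDate ?death .
  FILTER ( ?birth < "1900-01-01"^^xsd:date ) .
  FILTER ( ?countryOfBirth = :Germany ) .
}
ORDER BY ?birth
"""


def build_request(endpoint, query):
    """Build a SPARQL-protocol GET request: the query goes into the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format via content negotiation.
    headers = {"Accept": "application/sparql-results+json"}
    return url, headers


if __name__ == "__main__":
    url, headers = build_request(ENDPOINT, QUERY)
    print(url[:80] + "...")
    print(headers)
```

The resulting URL and headers can be passed to any HTTP client; the response format and availability of the public endpoint are outside the scope of this sketch.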
Infrastructure
DBpedia has two extraction modes:
• Wikipedia-database-dump-based extraction
• DBpedia Live synchronisation (more later)
DBpedia Dumps:
• The DBpedia dump archive is located at: http://downloads.dbpedia.org/
• The latest downloads are described at: http://dbpedia.org/Downloads
Official Endpoint (by OpenLink): http://dbpedia.org/sparql
Query Answering
Back to our Wikipedia questions:
• What do Innsbruck and Leipzig have in common?
• Who are the mayors of central European towns elevated more than 1000 m?
• Which movies star both Brad Pitt and Angelina Jolie?
• All soccer players who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who were born in a country with more than 10 million inhabitants
Using the data extracted from Wikipedia and the public SPARQL endpoint, DBpedia can answer these questions.
DBpedia Live
• DBpedia dumps are generated on a bi-annual basis
• Wikipedia has around 100,000-150,000 page edits per day
• DBpedia Live pulls page updates in real time, and the extraction results update the triple store
• In practice, a 5-minute update delay increases performance by 15%
Links
• SPARQL Endpoint: http://live.dbpedia.org/sparql
• Documentation: http://wiki.dbpedia.org/DBpediaLive
• Statistics: http://live.dbpedia.org/LiveStats/
DBpedia Live - Overview
DBpedia Internationalization (I18n)
• DBpedia Internationalization Committee founded:
  • http://wiki.dbpedia.org/Internationalization
• Available DBpedia language editions in:
  • Korean, Greek, German, Polish, Russian, Dutch, Portuguese, Spanish, Italian, Japanese, French
  • Use the corresponding Wikipedia language edition for input
• Mappings available for 23 languages
DBpedia I18n - Overview
Applications: Disambiguation
Named entity recognition and disambiguation tools such as DBpedia Spotlight, AlchemyAPI, Semantic API, Open Calais, Zemanta and Apache Stanbol
Applications: Question Answering
3 elements of a successful question answering system:
• Questions are asked in natural language
• Allows people to use their own terminology
• Provides a concise answer
Example
• Which books did Dan Brown write?
• Which books have Dan Brown as their author?
• Which novels did Dan Brown write?
Applications: Question Answering
• DBpedia is the primary target for several QA systems in the Question Answering over Linked Data (QALD) workshop series
• IBM Watson also relied on DBpedia
Applications: Faceted Browsing
• Faceted browsing: allows finding items by restricting the overall set of items along multiple criteria (facets)
• It is a common feature on websites that deal with structured data, e.g. e-commerce sites
• Some projects:
  • Neofonie Browser
  • gFacet
  • OpenLink faceted browser (fct)
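Faceted browsing boils down to filtering a set of items on several criteria at once and showing how many items remain per facet value. A minimal Python sketch, with a made-up item list and hypothetical function names (it is not taken from any of the listed projects):

```python
from collections import Counter

# Illustrative items; in practice these would come from a knowledge graph.
ITEMS = [
    {"label": "Dresden", "type": "City", "country": "Germany"},
    {"label": "Leipzig", "type": "City", "country": "Germany"},
    {"label": "Innsbruck", "type": "City", "country": "Austria"},
    {"label": "University of Leipzig", "type": "University", "country": "Germany"},
]


def facet_filter(items, **facets):
    """Keep only the items that match every selected facet value."""
    return [item for item in items
            if all(item.get(name) == value for name, value in facets.items())]


def facet_counts(items, facet):
    """How many of the remaining items fall under each value of a facet."""
    return Counter(item[facet] for item in items if facet in item)


if __name__ == "__main__":
    cities = facet_filter(ITEMS, type="City")
    print(facet_counts(cities, "country"))
    print([item["label"] for item in facet_filter(ITEMS, type="City", country="Germany")])
```

Each additional facet narrows the result set, and the counts tell the user which further restrictions are still worth clicking.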
Applications: Search and Querying
• Sparklis: A query builder in natural language that allows people to explore and query SPARQL endpoints with all the power of SPARQL and without any knowledge of SPARQL.
• ExConQuer: Works on top of DBpedia and enables users to construct a SPARQL query without requiring any knowledge of SPARQL or the datasets’ underlying schema. Users are then able to download the data they require in a number of different formats.
Applications: Digital Libraries & Archives
• Virtual International Authority File (VIAF) project as Linked Data
  • VIAF added a total of 250,000 reciprocal authority links to Wikipedia.
• DBpedia can also provide:
  • Context information for bibliographic and archive records (e.g. an author’s demographics, a film’s homepage, an image etc.)
  • Stable and curated identifiers for linking.
  • The broad range of Wikipedia topics can form the basis for a thesaurus for subject indexing.
Applications: DBpedia Mobile
• DBpedia Mobile is a location-centric DBpedia client application for mobile devices consisting of a map view, the Marbles Linked Data Browser and a GPS-enabled launcher application.
• Its initial view is a browser-based area map that indicates the user’s position and nearby DBpedia resources with appropriate labels and icons.
Other Applications
See http://wiki.dbpedia.org/projects for a more complete list
Statistical Relational Learning
KG Completion: Link Prediction
• Problem: KGs are often automatically extracted from text and thus may be incomplete.
• Goal: use existing graph structure to predict missing edges/links.
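To make the goal concrete, here is a deliberately simple baseline in Python: score every pair of entities that is not yet linked by the number of neighbours they share. This common-neighbours heuristic is an assumption chosen for illustration, not one of the statistical relational learning models covered later; it ignores edge labels and direction.

```python
from itertools import combinations


def neighbours(triples):
    """Undirected adjacency sets built from (subject, predicate, object) triples."""
    adj = {}
    for s, _p, o in triples:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    return adj


def predict_links(triples, top_k=3):
    """Rank unlinked entity pairs by their number of common neighbours."""
    adj = neighbours(triples)
    scored = []
    for a, b in combinations(sorted(adj), 2):
        if b in adj[a]:
            continue  # already linked
        common = len(adj[a] & adj[b])
        if common:
            scored.append((common, a, b))
    # Highest score first; ties broken alphabetically for reproducibility.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(a, b, score) for score, a, b in scored[:top_k]]


if __name__ == "__main__":
    # Hypothetical toy graph.
    triples = [
        ("alice", "knows", "bob"),
        ("alice", "knows", "carol"),
        ("bob", "knows", "dave"),
        ("carol", "knows", "dave"),
    ]
    print(predict_links(triples))
```

In the toy graph, alice and dave share two neighbours and are therefore the top candidate for a missing link.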
KG Correction: Link correction
• Problem: KGs are often automatically extracted from text and thus may be wrong.
• Goal: use existing graph structure to estimate probability of correctness of an edge/link.
KG Correction: Entity resolution
• Problem: KGs are often automatically extracted from text and thus may be wrong.
• Goal: use existing graph structure and other information to merge duplicated entities.
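A rough Python sketch of the idea, combining "other information" (the entity label) with graph structure (the overlap of outgoing links). The data layout, the function names and the threshold of 0.5 are assumptions made for this illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalise(label):
    return " ".join(label.lower().split())


def find_duplicates(entities, threshold=0.5):
    """Report pairs with the same normalised label and sufficiently similar links.

    `entities` maps an id to {"label": str, "links": set of (predicate, object)}.
    """
    pairs = []
    ids = sorted(entities)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            same_label = normalise(entities[a]["label"]) == normalise(entities[b]["label"])
            overlap = jaccard(entities[a]["links"], entities[b]["links"])
            if same_label and overlap >= threshold:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Hypothetical extracted entities; e1 and e2 describe the same author.
    entities = {
        "e1": {"label": "Dan Brown",
               "links": {("author", "The Da Vinci Code"), ("author", "Inferno")}},
        "e2": {"label": "dan  brown",
               "links": {("author", "The Da Vinci Code"), ("author", "Inferno"),
                         ("author", "Origin")}},
        "e3": {"label": "Dan Brown",
               "links": {("plays", "guitar")}},
    }
    print(find_duplicates(entities))
```

The third entity shares the label but none of the links, so it is kept separate; the label alone would have merged two different people.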
Link-based clustering
• Goal: Group entities based on their similarity.
• In contrast to feature-based clustering, in link-based clustering entities are not only grouped by the similarity of their features but also by the similarity of their links.
• E.g. community detection in social networks.
Image credit: this Wikipedia and Wikimedia Commons image is from the user Daniel Tenerife and is freely available at //commons.wikimedia.org/wiki/File:Social Red.jpg under the Creative Commons license.
Related problems
• You will learn to solve the problems we just saw in this lecture.
• They also play a role for:
  • merging of KGs
  • similarity estimation of entities
  • suggestion of friends in social networks
  • ...
Machine Learning on KGs
Methods that play a role in the statistical relational learning techniques covered in this class:
• tensor factorization
• neural networks
• probabilistic approaches
• optimization techniques (like stochastic gradient descent)
• ...
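As a small preview of the tensor view used by factorization methods, a knowledge graph can be written as a binary three-way array with one entity-by-entity slice per relation. The Python sketch below only builds that array from triples; the function name and the nested-list representation are assumptions for illustration, and no factorization is performed.

```python
def adjacency_tensor(triples):
    """Build tensor[r][s][o] = 1 if the triple (s, r, o) is in the graph, else 0."""
    entities = sorted({e for s, _p, o in triples for e in (s, o)})
    relations = sorted({p for _s, p, _o in triples})
    e_idx = {e: i for i, e in enumerate(entities)}
    r_idx = {r: i for i, r in enumerate(relations)}
    n = len(entities)
    # One n x n slice of zeros per relation.
    tensor = [[[0] * n for _ in range(n)] for _ in relations]
    for s, p, o in triples:
        tensor[r_idx[p]][e_idx[s]][e_idx[o]] = 1
    return tensor, entities, relations


if __name__ == "__main__":
    # Illustrative triples only.
    triples = [
        ("Busan", "region", "Yeongnam"),
        ("Busan", "country", "South_Korea"),
    ]
    tensor, entities, relations = adjacency_tensor(triples)
    print(entities)
    print(relations)
    for name, slice_ in zip(relations, tensor):
        print(name, slice_)
```

Most entries of such an array are zero, and a zero may mean either "false" or "unknown"; predicting which zeros should be ones is the link prediction problem stated earlier.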
Further Research Problems
...which cannot be addressed by methods learned in the lecture:
• Growth: knowledge graphs are incomplete!
  • Ontology matching: connect graphs
  • Knowledge extraction: extract new entities and relations from web/text
• Validation: knowledge graphs are not always correct!
  • Entity resolution: split wrongly merged entities
  • Error detection: remove false assertions
  • Fact validation: find evidence for / against facts being true
• Interface: how to make it easier to access knowledge?
  • Semantic parsing: interpret the meaning of queries
  • Question answering: compute answers using the knowledge graph
• Intelligence: can AI emerge from knowledge graphs?
  • Automatic reasoning and planning
  • Generalization and abstraction
  • Clustering and community detection
Opportunities in the Smart Data Analytics Group
• Master thesis on most topics discussed in this lecture
• Machine Learning on knowledge graphs (and in general)
• Semantic technologies
• Spatiotemporal prediction
• Question answering on knowledge graphs
• We are looking for students and offering PhD opportunities!
SDA Project: SANSA
• SANSA will be a suite of APIs for distributed reading, querying, inferencing and analyzing of RDF knowledge graphs
SANSA (Semantic Analytics Stack)
Summary
• KGs facilitate mapping and understanding of the world in a way that is similar to how humans understand it
• KGs utilize not just information, but knowledge, and thus are an integral part toward knowledge-powered assistance
  ⇒ a better, more interactive user interface
  ⇒ understanding of user queries instead of simple string-matching
• KGs naturally allow for efficient pattern matching: numerous applications from fraud detection to spam filters to computational molecular biology
The End
Dr. Hamed Shariat Yazdi
[email protected]
University of Bonn
Prof. Dr. Jens Lehmann
http://jens-lehmann.org
University of Bonn, Fraunhofer IAIS
Thanks for your attention!
References I
S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives.
DBpedia: A nucleus for a web of open data.
In The Semantic Web, pages 722–735. Springer, 2007.
A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., and T. M. Mitchell.
Toward an architecture for never-ending language learning.
In AAAI, volume 5, page 3, 2010.
V. I. Eric Sun.
Under the hood: The entities graph.
https://www.facebook.com/notes/facebook-engineering/under-the-hood-the-entities-graph/10151490531588920/
Q. He.
Building the LinkedIn knowledge graph.
https://en.wikipedia.org/wiki/Knowledge_Graph
References II
M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich.
A review of relational machine learning for knowledge graphs.
arXiv preprint arXiv:1503.00759, 2015.
D. R. Qian.
Understand your world with Bing.
http://blogs.bing.com/search/2013/03/21/understand-your-world-with-bing/
I. Robinson, J. Webber, and E. Eifrem.
Graph Databases: New Opportunities for Connected Data.
O’Reilly Media, Inc., 2015.
A. Singhal.
Introducing the knowledge graph: things, not strings.
https://googleblog.blogspot.de/2012/05/introducing-knowledge-graph-things-not.html