gstore: a graph-based sparql query engine
TRANSCRIPT
gStore: A Graph-based SPARQL Query Engine
M. Tamer Ozsu
University of WaterlooDavid R. Cheriton School of Computer Science
Joint work with Lei Zou, Peking University and Lei Chen, HongKong University of Science and Technology
1UPenn/2012-12-04
RDF and Semantic Web
RDF is a language for the conceptual modeling of informationabout web resources
A building block of semantic web
Facilitates exchange of informationSearch engines can retrieve more relevant informationFacilitates data integration (mashes)
Machine understandable
Understand the information on the web and theinterrelationships among them
2UPenn/2012-12-04
RDF Uses
Yago and DBPedia extract facts from Wikipedia & representas RDF → structural queries
Communities build RDF data
E.g., biologists: Bio2RDF and Uniprot RDF
Web data integration
Linked Data Cloud
. . .
3UPenn/2012-12-04
RDF Data Volumes . . .
. . . are growing – and fast
Linked data cloud currently consists of 325 datasets with>25B triplesSize almost doubling every year
4UPenn/2012-12-04
RDF Data Volumes . . .
. . . are growing – and fast
Linked data cloud currently consists of 325 datasets with>25B triplesSize almost doubling every year
As of March 2009
LinkedCTReactome
Taxonomy
KEGG
PubMed
GeneID
Pfam
UniProt
OMIM
PDB
SymbolChEBI
Daily Med
Disea-some
CAS
HGNC
InterPro
Drug Bank
UniParc
UniRef
ProDom
PROSITE
Gene Ontology
HomoloGene
PubChem
MGI
UniSTS
GEOSpecies
Jamendo
BBCProgramm
es
Music-brainz
Magna-tune
BBCLater +TOTP
SurgeRadio
MySpaceWrapper
Audio-Scrobbler
LinkedMDB
BBCJohnPeel
BBCPlaycount
Data
Gov-Track
US Census Data
riese
Geo-names
lingvoj
World Fact-book
Euro-stat
IRIT Toulouse
SWConference
Corpus
RDF Book Mashup
Project Guten-berg
DBLPHannover
DBLPBerlin
LAAS- CNRS
Buda-pestBME
IEEE
IBM
Resex
Pisa
New-castle
RAE 2001
CiteSeer
ACM
DBLP RKB
Explorer
eprints
LIBRIS
SemanticWeb.org Eurécom
ECS South-ampton
RevyuSIOCSites
Doap-space
Flickrexporter
FOAFprofiles
flickrwrappr
CrunchBase
Sem-Web-
Central
Open-Guides
Wiki-company
QDOS
Pub Guide
Open Calais
RDF ohloh
W3CWordNet
OpenCyc
UMBEL
Yago
DBpediaFreebase
Virtuoso Sponger
March ’09:89 datasets
4UPenn/2012-12-04
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.http://lod-cloud.net/
RDF Data Volumes . . .
. . . are growing – and fast
Linked data cloud currently consists of 325 datasets with>25B triplesSize almost doubling every year
As of September 2010
MusicBrainz
(zitgist)
P20
YAGO
World Fact-book (FUB)
WordNet (W3C)
WordNet(VUA)
VIVO UFVIVO
Indiana
VIVO Cornell
VIAF
URIBurner
Sussex Reading
Lists
Plymouth Reading
Lists
UMBEL
UK Post-codes
legislation.gov.uk
Uberblic
UB Mann-heim
TWC LOGD
Twarql
transportdata.gov
.uk
totl.net
Tele-graphis
TCMGeneDIT
TaxonConcept
The Open Library (Talis)
t4gm
Surge Radio
STW
RAMEAU SH
statisticsdata.gov
.uk
St. Andrews Resource
Lists
ECS South-ampton EPrints
Semantic CrunchBase
semanticweb.org
SemanticXBRL
SWDog Food
rdfabout US SEC
Wiki
UN/LOCODE
Ulm
ECS (RKB
Explorer)
Roma
RISKS
RESEX
RAE2001
Pisa
OS
OAI
NSF
New-castle
LAAS
KISTIJISC
IRIT
IEEE
IBM
Eurécom
ERA
ePrints
dotAC
DEPLOY
DBLP (RKB
Explorer)
Course-ware
CORDIS
CiteSeer
Budapest
ACM
riese
Revyu
researchdata.gov
.uk
referencedata.gov
.uk
Recht-spraak.
nl
RDFohloh
Last.FM (rdfize)
RDF Book
Mashup
PSH
ProductDB
PBAC
Poké-pédia
Ord-nance Survey
Openly Local
The Open Library
OpenCyc
OpenCalais
OpenEI
New York
Times
NTU Resource
Lists
NDL subjects
MARC Codes List
Man-chesterReading
Lists
Lotico
The London Gazette
LOIUS
lobidResources
lobidOrgani-sations
LinkedMDB
LinkedLCCN
LinkedGeoData
LinkedCT
Linked Open
Numbers
lingvoj
LIBRIS
Lexvo
LCSH
DBLP (L3S)
Linked Sensor Data (Kno.e.sis)
Good-win
Family
Jamendo
iServe
NSZL Catalog
GovTrack
GESIS
GeoSpecies
GeoNames
GeoLinkedData(es)
GTAA
STITCHSIDER
Project Guten-berg (FUB)
MediCare
Euro-stat
(FUB)
DrugBank
Disea-some
DBLP (FU
Berlin)
DailyMed
Freebase
flickr wrappr
Fishes of Texas
FanHubz
Event-Media
EUTC Produc-
tions
Eurostat
EUNIS
ESD stan-dards
Popula-tion (En-AKTing)
NHS (EnAKTing)
Mortality (En-
AKTing)Energy
(En-AKTing)
CO2(En-
AKTing)
educationdata.gov
.uk
ECS South-ampton
Gem. Norm-datei
datadcs
MySpace(DBTune)
MusicBrainz
(DBTune)
Magna-tune
John Peel(DB
Tune)
classical(DB
Tune)
Audio-scrobbler (DBTune)
Last.fmArtists
(DBTune)
DBTropes
dbpedia lite
DBpedia
Pokedex
Airports
NASA (Data Incu-bator)
MusicBrainz(Data
Incubator)
Moseley Folk
Discogs(Data In-cubator)
Climbing
Linked Data for Intervals
Cornetto
Chronic-ling
America
Chem2Bio2RDF
biz.data.
gov.uk
UniSTS
UniRef
UniPath-way
UniParc
Taxo-nomy
UniProt
SGD
Reactome
PubMed
PubChem
PRO-SITE
ProDom
Pfam PDB
OMIM
OBO
MGI
KEGG Reaction
KEGG Pathway
KEGG Glycan
KEGG Enzyme
KEGG Drug
KEGG Cpd
InterPro
HomoloGene
HGNC
Gene Ontology
GeneID
GenBank
ChEBI
CAS
Affy-metrix
BibBaseBBC
Wildlife Finder
BBC Program
mesBBC
Music
rdfaboutUS Census
Media
Geographic
Publications
Government
Cross-domain
Life sciences
User-generated content
September ’10:203 datasets
4UPenn/2012-12-04
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.http://lod-cloud.net/
RDF Data Volumes . . .
. . . are growing – and fast
Linked data cloud currently consists of 325 datasets with>25B triplesSize almost doubling every year
As of September 2011
MusicBrainz
(zitgist)
P20
Turismo de
Zaragoza
yovisto
Yahoo! Geo
Planet
YAGO
World Fact-book
El ViajeroTourism
WordNet (W3C)
WordNet (VUA)
VIVO UF
VIVO Indiana
VIVO Cornell
VIAF
URIBurner
Sussex Reading
Lists
Plymouth Reading
Lists
UniRef
UniProt
UMBEL
UK Post-codes
legislationdata.gov.uk
Uberblic
UB Mann-heim
TWC LOGD
Twarql
transportdata.gov.
uk
Traffic Scotland
theses.fr
Thesau-rus W
totl.net
Tele-graphis
TCMGeneDIT
TaxonConcept
Open Library (Talis)
tags2con delicious
t4gminfo
Swedish Open
Cultural Heritage
Surge Radio
Sudoc
STW
RAMEAU SH
statisticsdata.gov.
uk
St. Andrews Resource
Lists
ECS South-ampton EPrints
SSW Thesaur
us
SmartLink
Slideshare2RDF
semanticweb.org
SemanticTweet
Semantic XBRL
SWDog Food
Source Code Ecosystem Linked Data
US SEC (rdfabout)
Sears
Scotland Geo-
graphy
ScotlandPupils &Exams
Scholaro-meter
WordNet (RKB
Explorer)
Wiki
UN/LOCODE
Ulm
ECS (RKB
Explorer)
Roma
RISKS
RESEX
RAE2001
Pisa
OS
OAI
NSF
New-castle
LAASKISTI
JISC
IRIT
IEEE
IBM
Eurécom
ERA
ePrints dotAC
DEPLOY
DBLP (RKB
Explorer)
Crime Reports
UK
Course-ware
CORDIS (RKB
Explorer)CiteSeer
Budapest
ACM
riese
Revyu
researchdata.gov.
ukRen. Energy Genera-
tors
referencedata.gov.
uk
Recht-spraak.
nl
RDFohloh
Last.FM (rdfize)
RDF Book
Mashup
Rådata nå!
PSH
Product Types
Ontology
ProductDB
PBAC
Poké-pédia
patentsdata.go
v.uk
OxPoints
Ord-nance Survey
Openly Local
Open Library
OpenCyc
Open Corpo-rates
OpenCalais
OpenEI
Open Election
Data Project
OpenData
Thesau-rus
Ontos News Portal
OGOLOD
JanusAMP
Ocean Drilling Codices
New York
Times
NVD
ntnusc
NTU Resource
Lists
Norwe-gian
MeSH
NDL subjects
ndlna
myExperi-ment
Italian Museums
medu-cator
MARC Codes List
Man-chester Reading
Lists
Lotico
Weather Stations
London Gazette
LOIUS
Linked Open Colors
lobidResources
lobidOrgani-sations
LEM
LinkedMDB
LinkedLCCN
LinkedGeoData
LinkedCT
LinkedUser
FeedbackLOV
Linked Open
Numbers
LODE
Eurostat (OntologyCentral)
Linked EDGAR
(OntologyCentral)
Linked Crunch-
base
lingvoj
Lichfield Spen-ding
LIBRIS
Lexvo
LCSH
DBLP (L3S)
Linked Sensor Data (Kno.e.sis)
Klapp-stuhl-club
Good-win
Family
National Radio-activity
JP
Jamendo (DBtune)
Italian public
schools
ISTAT Immi-gration
iServe
IdRef Sudoc
NSZL Catalog
Hellenic PD
Hellenic FBD
PiedmontAccomo-dations
GovTrack
GovWILD
GoogleArt
wrapper
gnoss
GESIS
GeoWordNet
GeoSpecies
GeoNames
GeoLinkedData
GEMET
GTAA
STITCH
SIDER
Project Guten-berg
MediCare
Euro-stat
(FUB)
EURES
DrugBank
Disea-some
DBLP (FU
Berlin)
DailyMed
CORDIS(FUB)
Freebase
flickr wrappr
Fishes of Texas
Finnish Munici-palities
ChEMBL
FanHubz
EventMedia
EUTC Produc-
tions
Eurostat
Europeana
EUNIS
EU Insti-
tutions
ESD stan-dards
EARTh
Enipedia
Popula-tion (En-AKTing)
NHS(En-
AKTing) Mortality(En-
AKTing)
Energy (En-
AKTing)
Crime(En-
AKTing)
CO2 Emission
(En-AKTing)
EEA
SISVU
education.data.g
ov.uk
ECS South-ampton
ECCO-TCP
GND
Didactalia
DDC Deutsche Bio-
graphie
datadcs
MusicBrainz
(DBTune)
Magna-tune
John Peel
(DBTune)
Classical (DB
Tune)
AudioScrobbler (DBTune)
Last.FM artists
(DBTune)
DBTropes
Portu-guese
DBpedia
dbpedia lite
Greek DBpedia
DBpedia
data-open-ac-uk
SMCJournals
Pokedex
Airports
NASA (Data Incu-bator)
MusicBrainz(Data
Incubator)
Moseley Folk
Metoffice Weather Forecasts
Discogs (Data
Incubator)
Climbing
data.gov.uk intervals
Data Gov.ie
databnf.fr
Cornetto
reegle
Chronic-ling
America
Chem2Bio2RDF
Calames
businessdata.gov.
uk
Bricklink
Brazilian Poli-
ticians
BNB
UniSTS
UniPathway
UniParc
Taxonomy
UniProt(Bio2RDF)
SGD
Reactome
PubMedPub
Chem
PRO-SITE
ProDom
Pfam
PDB
OMIMMGI
KEGG Reaction
KEGG Pathway
KEGG Glycan
KEGG Enzyme
KEGG Drug
KEGG Com-pound
InterPro
HomoloGene
HGNC
Gene Ontology
GeneID
Affy-metrix
bible ontology
BibBase
FTS
BBC Wildlife Finder
BBC Program
mes BBC Music
Alpine Ski
Austria
LOCAH
Amster-dam
Museum
AGROVOC
AEMET
US Census (rdfabout)
Media
Geographic
Publications
Government
Cross-domain
Life sciences
User-generated content
September ’11:295 datasets
4UPenn/2012-12-04
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.http://lod-cloud.net/
Outline
1 RDF Introduction
2 Answering Regular SPARQL QueriesAnswering SPARQL queries using graph pattern matching[ZouMoZhaoChenOzsu, PVLDB 2011]
3 AggregationAnswering Aggregate SPARQL Queries Over Large RDFRepositories
4 Future Research
5UPenn/2012-12-04
Outline
1 RDF Introduction
2 Answering Regular SPARQL QueriesAnswering SPARQL queries using graph pattern matching[ZouMoZhaoChenOzsu, PVLDB 2011]
3 AggregationAnswering Aggregate SPARQL Queries Over Large RDFRepositories
4 Future Research
6UPenn/2012-12-04
RDF Introduction
Everything is an uniquely namedresource
Namespaces can be used to scopethe names
Properties of resources can bedefined
Relationships with other resourcescan be defined
Resources can be contributed bydifferent people/groups and can belocated anywhere in the web
Integrated web “database”
http://en.wikipedia.org/wiki/Abraham Lincoln
xmlns:y=http://en.wikipedia.org/wikiy:Abraham Lincoln
Abraham Lincoln:hasName “Abraham Lincoln”Abraham Lincoln:BornOnDate: “1809-02-12”Abraham Lincoln:DiedOnDate: “1865-04-15”
y:Washington DC
Abraham Lincoln:DiedIn
7UPenn/2012-12-04
RDF Introduction
Everything is an uniquely namedresource
Namespaces can be used to scopethe names
Properties of resources can bedefined
Relationships with other resourcescan be defined
Resources can be contributed bydifferent people/groups and can belocated anywhere in the web
Integrated web “database”
http://en.wikipedia.org/wiki/Abraham Lincoln
xmlns:y=http://en.wikipedia.org/wikiy:Abraham Lincoln
Abraham Lincoln:hasName “Abraham Lincoln”Abraham Lincoln:BornOnDate: “1809-02-12”Abraham Lincoln:DiedOnDate: “1865-04-15”
y:Washington DC
Abraham Lincoln:DiedIn
7UPenn/2012-12-04
RDF Introduction
Everything is an uniquely namedresource
Namespaces can be used to scopethe names
Properties of resources can bedefined
Relationships with other resourcescan be defined
Resources can be contributed bydifferent people/groups and can belocated anywhere in the web
Integrated web “database”
http://en.wikipedia.org/wiki/Abraham Lincoln
xmlns:y=http://en.wikipedia.org/wikiy:Abraham Lincoln
Abraham Lincoln:hasName “Abraham Lincoln”Abraham Lincoln:BornOnDate: “1809-02-12”Abraham Lincoln:DiedOnDate: “1865-04-15”
y:Washington DC
Abraham Lincoln:DiedIn
7UPenn/2012-12-04
RDF Introduction
Everything is an uniquely namedresource
Namespaces can be used to scopethe names
Properties of resources can bedefined
Relationships with other resourcescan be defined
Resources can be contributed bydifferent people/groups and can belocated anywhere in the web
Integrated web “database”
http://en.wikipedia.org/wiki/Abraham Lincoln
xmlns:y=http://en.wikipedia.org/wikiy:Abraham Lincoln
Abraham Lincoln:hasName “Abraham Lincoln”Abraham Lincoln:BornOnDate: “1809-02-12”Abraham Lincoln:DiedOnDate: “1865-04-15”
y:Washington DC
Abraham Lincoln:DiedIn
7UPenn/2012-12-04
RDF Introduction
Everything is an uniquely namedresource
Namespaces can be used to scopethe names
Properties of resources can bedefined
Relationships with other resourcescan be defined
Resources can be contributed bydifferent people/groups and can belocated anywhere in the web
Integrated web “database”
http://en.wikipedia.org/wiki/Abraham Lincoln
xmlns:y=http://en.wikipedia.org/wikiy:Abraham Lincoln
Abraham Lincoln:hasName “Abraham Lincoln”Abraham Lincoln:BornOnDate: “1809-02-12”Abraham Lincoln:DiedOnDate: “1865-04-15”
y:Washington DC
Abraham Lincoln:DiedIn
7UPenn/2012-12-04
RDF Data Model
Triple: Subject, Predicate (Property),Object (s, p, o)
Subject: the entity that is described(URI or blank node)
Predicate: a feature of the entity (URI)Object: value of the feature (URI,
blank node or literal)
(s, p, o) ∈ (U ∪ B)× U × (U ∪ B ∪ L)
Set of RDF triples is called an RDF graph
U
Subject Object
U B U B L
U: set of URIsB: set of blank nodesL: set of literals
Predicate
Subject Predicate ObjectAbraham Lincoln hasName “Abraham Lincoln”Abraham Lincoln BornOnDate “1809-02-12”Abraham Lincoln DiedOnDate “1865-04-15”
8UPenn/2012-12-04
RDF Example InstancePrefix: y=http://en.wikipedia.org/wiki
Subject Predicate Object
y: Abraham Lincoln hasName “Abraham Lincoln”y: Abraham Lincoln BornOnDate “1809-02-12”’y: Abraham Lincoln DiedOnDate “1865-04-15”y:Abraham Lincoln bornIn y:Hodgenville KYy: Abraham Lincoln DiedIn y: Washington DCy:Abraham Lincoln title “President”y:Abraham Lincoln gender “Male”y: Washington DC hasName “Washington D.C.”y:Washington DC foundingYear “1790”y:Hodgenville KY hasName “Hodgenville”y:United States hasName “United States”y:United States hasCapital y:Washington DCy:United States foundingYear “1776”y:Reese Witherspoon bornOnDate “1976-03-22”y:Reese Witherspoon bornIn y:New Orleans LAy:Reese Witherspoon hasName “Reese Witherspoon”y:Reese Witherspoon gender “Female”y:Reese Witherspoon title “Actress”y:New Orleans LA foundingYear “1718”y:New Orleans LA locatedIn y:United Statesy:Franklin Roosevelt hasName “Franklin D. Roosevelt”y:Franklin Roosevelt bornIn y:Hyde Park NYy:Franklin Roosevelt title “President”y:Franklin Roosevelt gender “Male”y:Hyde Park NY foundingYear “1810”y:Hyde Park NY locatedIn y:United Statesy:Marilyn Monroe gender “Female”y:Marilyn Monroe hasName “Marilyn Monroe”y:Marilyn Monroe bornOnDate “1926-07-01”y:Marilyn Monroe diedOnDate “1962-08-05”
URI
Literal
URI
9UPenn/2012-12-04
RDF Graph
y:Abraham Lincoln
“Abraham Lincoln”
hasName
“1809-02-12”bornOnDate
“1865-04-15”
diedOnDate
“President”
title
“Male”
gender
y:Washington D.C.
“1790”
foundYear
“Washington D.C.”
hasName
y:Hodgenville KY “Hodgenville”hasName
y:United States
“United States”
hasName
“1776”
foundingYear
y:Reese Witherspoon
“1976-03-22”
bornOnDate
“Female”
gender
“Actress”
title
“Reese Witherspoon”
hasName
y:New Orleans LA
“1718”
foundingYear
y:Franklin Roosevelt
“Franklin D. Roosevelt”
hasName
“Male”
gender
“President”
title
y:Hyde Park NY
“1810”
foundingYear
y:Marilyn Monroe“1962-08-05”diedOnDate
“1926-07-01”
bornOnDate
“Female”
gender
“Marilyn Monroe”hasName
diedIn
bornIn
hasCapital
bornInlocatedIn locatedIn
bornIn
10UPenn/2012-12-04
RDF Query ModelQuery Model - SPARQL Protocol and RDF Query LanguageGiven U (set of URIs), L (set of literals), and V (set ofvariables), a SPARQL expression is defined recursively:
an atomic triple pattern, which is an element of
(U ∪ V )× (U ∪ V )× (U ∪ V ∪ L)
?x hasName “Abraham Lincoln”
P FILTER R, where P is a graph pattern expression and R is abuilt-in SPARQL condition (i.e., analogous to a SQL predicate)
?x price ?p FILTER(?p < 30)
P1 AND/OPT/UNION P2, where P1 and P2 are graphpattern expressions
Example:SELECT ?nameWHERE {
?m <born In> ? c i t y . ?m <hasName> ?name .?m<bornOnDate> ?bd . ? c i t y <found ingYear> ‘ ‘1718 ’ ’ .FILTER( regex ( s t r (? bd ) , ‘ ‘ 1 9 7 6 ’ ’ ) )
}11UPenn/2012-12-04
SPARQL Queries
SELECT ?nameWHERE {
?m <born In> ? c i t y . ?m <hasName> ?name .?m<bornOnDate> ?bd . ? c i t y <found ingYear> ‘ ‘1718 ’ ’ .FILTER( regex ( s t r (? bd ) , ‘ ‘ 1 9 7 6 ’ ’ ) )
}
?m ?citybornIn
?name
hasName
?bd
bornOnDate
“1718”
foundingYear
FILTER(regex(str(?bd),“1976”))
12UPenn/2012-12-04
Naıve Triple Store DesignSELECT ?nameWHERE {
?m <born In> ? c i t y . ?m <hasName> ?name .?m<bornOnDate> ?bd . ? c i t y <found ingYear> ‘ ‘1718 ’ ’ .FILTER( regex ( s t r (? bd ) , ‘ ‘ 1 9 7 6 ’ ’ ) )
}Subject Property Objecty:Abraham Lincoln hasName “Abraham Lincoln”y:Abraham Lincoln bornOnDate “1809-02-12”y:Abraham Lincoln diedOnDate “1865-04-15”y:Abraham Lincoln bornIn y:Hodgenville KYy:Abraham Lincoln diedIn y:Washington DCy:Abraham Lincoln title “President”y:Abraham Lincoln gender “Male”y:Washington DC hasName “Washington D.C.”y:Washington DC foundingYear “1790”y:Hodgenville KY hasName “Hodgenville”y:United States hasName “United States”y:United States hasCapital y:Washington DCy:United States foundingYear “1776”y:Reese Witherspoon bornOnDate “1976-03-22”y:Reese Witherspoon bornIn y:New Orleans LAy:Reese Witherspoon hasName “Reese Witherspoon”y:Reese Witherspoon gender “Female”y:Reese Witherspoon title “Actress”y:New Orleans LA foundingYear “1718”y:New Orleans LA locatedIn y:United Statesy:Franklin Roosevelt hasName “Franklin D. Roo-
sevelt”y:Franklin Roosevelt bornIn y:Hyde Park NYy:Franklin Roosevelt title “President”y:Franklin Roosevelt gender “Male”y:Hyde Park NY foundingYear “1810”y:Hyde Park NY locatedIn y:United Statesy:Marilyn Monroe gender “Female”y:Marilyn Monroe hasName “Marilyn Monroe”y:Marilyn Monroe bornOnDate “1926-07-01”y:Marilyn Monroe diedOnDate “1962-08-05”
Too many self-joins!
13UPenn/2012-12-04
Naıve Triple Store DesignSELECT ?nameWHERE {
?m <born In> ? c i t y . ?m <hasName> ?name .?m<bornOnDate> ?bd . ? c i t y <found ingYear> ‘ ‘1718 ’ ’ .FILTER( regex ( s t r (? bd ) , ‘ ‘ 1 9 7 6 ’ ’ ) )
}Subject Property Objecty:Abraham Lincoln hasName “Abraham Lincoln”y:Abraham Lincoln bornOnDate “1809-02-12”y:Abraham Lincoln diedOnDate “1865-04-15”y:Abraham Lincoln bornIn y:Hodgenville KYy:Abraham Lincoln diedIn y:Washington DCy:Abraham Lincoln title “President”y:Abraham Lincoln gender “Male”y:Washington DC hasName “Washington D.C.”y:Washington DC foundingYear “1790”y:Hodgenville KY hasName “Hodgenville”y:United States hasName “United States”y:United States hasCapital y:Washington DCy:United States foundingYear “1776”y:Reese Witherspoon bornOnDate “1976-03-22”y:Reese Witherspoon bornIn y:New Orleans LAy:Reese Witherspoon hasName “Reese Witherspoon”y:Reese Witherspoon gender “Female”y:Reese Witherspoon title “Actress”y:New Orleans LA foundingYear “1718”y:New Orleans LA locatedIn y:United Statesy:Franklin Roosevelt hasName “Franklin D. Roo-
sevelt”y:Franklin Roosevelt bornIn y:Hyde Park NYy:Franklin Roosevelt title “President”y:Franklin Roosevelt gender “Male”y:Hyde Park NY foundingYear “1810”y:Hyde Park NY locatedIn y:United Statesy:Marilyn Monroe gender “Female”y:Marilyn Monroe hasName “Marilyn Monroe”y:Marilyn Monroe bornOnDate “1926-07-01”y:Marilyn Monroe diedOnDate “1962-08-05”
SELECT T2 . o b j e c tFROM T as T1 , T as T2 , T as T3 ,
T as T4WHERE T1 . p r o p e r t y=” b o r n I n ”AND T2 . p r o p e r t y=”hasName”AND T3 . p r o p e r t y=” bornOnDate ”AND T1 . s u b j e c t=T2 . s u b j e c tAND T2 . s u b j e c t=T3 . s u b j e c tAND T4 . p r o p e t y=” f o u n d i n g Y e a r ”AND T1 . o b j e c t=T4 . s u b j e c tAND T4 . o b j e c t=” 1718 ”AND T3 . o b j e c t LIKE ’%1976% ’
Too many self-joins!
13UPenn/2012-12-04
Naıve Triple Store DesignSELECT ?nameWHERE {
?m <born In> ? c i t y . ?m <hasName> ?name .?m<bornOnDate> ?bd . ? c i t y <found ingYear> ‘ ‘1718 ’ ’ .FILTER( regex ( s t r (? bd ) , ‘ ‘ 1 9 7 6 ’ ’ ) )
}Subject Property Objecty:Abraham Lincoln hasName “Abraham Lincoln”y:Abraham Lincoln bornOnDate “1809-02-12”y:Abraham Lincoln diedOnDate “1865-04-15”y:Abraham Lincoln bornIn y:Hodgenville KYy:Abraham Lincoln diedIn y:Washington DCy:Abraham Lincoln title “President”y:Abraham Lincoln gender “Male”y:Washington DC hasName “Washington D.C.”y:Washington DC foundingYear “1790”y:Hodgenville KY hasName “Hodgenville”y:United States hasName “United States”y:United States hasCapital y:Washington DCy:United States foundingYear “1776”y:Reese Witherspoon bornOnDate “1976-03-22”y:Reese Witherspoon bornIn y:New Orleans LAy:Reese Witherspoon hasName “Reese Witherspoon”y:Reese Witherspoon gender “Female”y:Reese Witherspoon title “Actress”y:New Orleans LA foundingYear “1718”y:New Orleans LA locatedIn y:United Statesy:Franklin Roosevelt hasName “Franklin D. Roo-
sevelt”y:Franklin Roosevelt bornIn y:Hyde Park NYy:Franklin Roosevelt title “President”y:Franklin Roosevelt gender “Male”y:Hyde Park NY foundingYear “1810”y:Hyde Park NY locatedIn y:United Statesy:Marilyn Monroe gender “Female”y:Marilyn Monroe hasName “Marilyn Monroe”y:Marilyn Monroe bornOnDate “1926-07-01”y:Marilyn Monroe diedOnDate “1962-08-05”
SELECT T2 . o b j e c tFROM T as T1 , T as T2 , T as T3 ,
T as T4WHERE T1 . p r o p e r t y=” b o r n I n ”AND T2 . p r o p e r t y=”hasName”AND T3 . p r o p e r t y=” bornOnDate ”AND T1 . s u b j e c t=T2 . s u b j e c tAND T2 . s u b j e c t=T3 . s u b j e c tAND T4 . p r o p e t y=” f o u n d i n g Y e a r ”AND T1 . o b j e c t=T4 . s u b j e c tAND T4 . o b j e c t=” 1718 ”AND T3 . o b j e c t LIKE ’%1976% ’
Too many self-joins!
13UPenn/2012-12-04
Existing Solutions1. Property table
Each class of objects go to a different table ⇒ similar tonormalized relationsEliminates some of the joins
2. Vertically partitioned tablesFor each property, build a two-column table, containing bothsubject and object, ordered by subjectsCan use merge join (faster)Good for subject-subject joins but does not help withsubject-object joins
3. Exhaustive indexingCreate indexes for each permutation of the three columnsQuery components become range queries over individualrelations with merge-join to combineExcessive space usage
None can handle wildcard queries easily
Most cannot handle updates
14UPenn/2012-12-04
Existing Solutions1. Property table
Each class of objects go to a different table ⇒ similar tonormalized relationsEliminates some of the joins
2. Vertically partitioned tablesFor each property, build a two-column table, containing bothsubject and object, ordered by subjectsCan use merge join (faster)Good for subject-subject joins but does not help withsubject-object joins
3. Exhaustive indexingCreate indexes for each permutation of the three columnsQuery components become range queries over individualrelations with merge-join to combineExcessive space usage
None can handle wildcard queries easily
Most cannot handle updates
14UPenn/2012-12-04
Outline
1 RDF Introduction
2 Answering Regular SPARQL QueriesAnswering SPARQL queries using graph pattern matching[ZouMoZhaoChenOzsu, PVLDB 2011]
3 AggregationAnswering Aggregate SPARQL Queries Over Large RDFRepositories
4 Future Research
15UPenn/2012-12-04
Our Approach – General Idea
We work directly on the RDF graph and the SPARQL querygraph
Answering SPARQL query ≡ subgraph matchingSubgraph matching is computationally expensive
Use a signature-based encoding of each entity and class vertexto speed up matching
Filter-and-evaluate
Use a false positive algorithm to prune nodes and obtain a setof candidates; then do more detailed evaluation on those
We develop an index (VS∗-tree) over the data signature graph(has light maintenance load) for efficient pruning
16UPenn/2012-12-04
Why not ordinary graph pattern matching?
1. RDF graphs are much larger than what is considered in usualgraph pattern matching (by orders of magnitude)
2. Cardinality of vertices and edges in an RDF graph is muchlarger than in traditional graph databases
AIDS dataset: 10,000 data graphs, each with avg 20 vertices& 25 edges with a total size 5MBYago RDF: 500M vertices with a total size >3GB
3. SPARQL queries have special structure
Star subqueries that use several properties of the same entity
17UPenn/2012-12-04
0. Start with RDF Graph G
?m ?citybornIn
?name
hasName
?bd
bornOnDate
“1718”
foundingYear
FILTER(regex(str(?bd),“1976”))
y:Abraham Lincoln
“Abraham Lincoln”
hasName
“1809-02-12”bornOnDate
“1865-04-15”
diedOnDate
“President”
title
“Male”
gender
y:Washington D.C.
“1790”
foundYear
“Washington D.C.”
hasName
y:Hodgenville KY “Hodgenville”hasName
y:United States
“United States”
hasName
“1776”
foundingYear
y:Reese Witherspoon
“1976-03-22”
bornOnDate
“Female”
gender
“Actress”
title
“Reese Witherspoon”
hasName
y:New Orleans LA
“1718”
foundingYear
y:Franklin Roosevelt
“Franklin D. Roosevelt”
hasName
“Male”
gender
“President”
title
y:Hyde Park NY
“1810”
foundingYear
y:Marilyn Monroe“1962-08-05”diedOnDate
“1926-07-01”
bornOnDate
“Female”
gender
“Marilyn Monroe”hasName
diedIn
bornIn
hasCapital
bornInlocatedIn locatedIn
bornIn
18UPenn/2012-12-04
0. Start with RDF Graph G
?m ?citybornIn
?name
hasName
?bd
bornOnDate
“1718”
foundingYear
FILTER(regex(str(?bd),“1976”))
y:Abraham Lincoln
“Abraham Lincoln”
hasName
“1809-02-12”bornOnDate
“1865-04-15”
diedOnDate
“President”
title
“Male”
gender
y:Washington D.C.
“1790”
foundYear
“Washington D.C.”
hasName
y:Hodgenville KY “Hodgenville”hasName
y:United States
“United States”
hasName
“1776”
foundingYear
y:Reese Witherspoon
“1976-03-22”
bornOnDate
“Female”
gender
“Actress”
title
“Reese Witherspoon”
hasName
y:New Orleans LA
“1718”
foundingYear
y:Franklin Roosevelt
“Franklin D. Roosevelt”
hasName
“Male”
gender
“President”
title
y:Hyde Park NY
“1810”
foundingYear
y:Marilyn Monroe“1962-08-05”diedOnDate
“1926-07-01”
bornOnDate
“Female”
gender
“Marilyn Monroe”hasName
diedIn
bornIn
hasCapital
bornInlocatedIn locatedIn
bornIn
18UPenn/2012-12-04
Finding matches over a largegraph is not a trivial task!
1. Represent G as Adjacency List and Encode Each Vertex
Prefix: y=http://en.wikipedia.org/wiki/Label adjList...
...
y:Reese Witherspoon (BornOnDate,“1976-03-22”), (gender,“Female”),(title,“Actress”), (hasName,“Reese Witherspoon”),(BornIn,New Orleans LA)
......
19UPenn/2012-12-04
1. Represent G as Adjacency List and Encode Each Vertex
Prefix: y=http://en.wikipedia.org/wiki/Label adjList...
...
y:Reese Witherspoon (BornOnDate,“1976-03-22”), (gender,“Female”),(title,“Actress”), (hasName,“Reese Witherspoon”),(BornIn,New Orleans LA)
......
Encode all neighbors of a vertex into a bit stringvertex signature0010 1000
19UPenn/2012-12-04
2. Construct Data Signature Graph G ∗
0000 0001 0001 100010000
0011 1000
00001
0010 1000
1000 0100
10000
1000 0001
10000
01000
0100 000001000
0010 1000
10000
0010 0000
001 003
002
005
006
004 008
009
20UPenn/2012-12-04
3. Encode Q to Get Signature Graph Q∗
Q
?m ?citybornIn
?name
hasName
?bd
bornOnDate
“1718”
foundingYear
FILTER(regex(str(?bd),“1976”))
0010 1000 1000 000010000
Q∗
21UPenn/2012-12-04
4. Filter-and-Evaluate
0010 1000 1000 000010000
Query signature graph Q∗
0000 0001 0001 100010000
0011 1000
00001
0010 1000
1000 0100
10000
1000 0001
10000
01000
0100 000001000
0010 1000
10000
0010 0000
Data signature graph G ∗
Find matches of Q∗ oversignature graph G ∗
Verify each match inRDF graph G
22UPenn/2012-12-04
How to Generate Candidate List
Two step process:1. Run an inclusion query for each node of Q∗ and get lists of
nodes in G∗ that include that node.
Given query signature q and a set of data signatures S , findall data signatures si ∈ S where q&si = q
2. Do a multi-way join to get the candidate list
Alternatives:
Sequential scan of G∗
Both steps are inefficient
Use S-trees
Height-balanced tree over signaturesSupport inclusion queries wellDoes not support second step – expensive
VS-tree (and VS∗-tree)
Multi-resolution summary graph based on S-treeSupports both steps efficiently
23UPenn/2012-12-04
How to Generate Candidate List
Two step process:1. Run an inclusion query for each node of Q∗ and get lists of
nodes in G∗ that include that node.
Given query signature q and a set of data signatures S , findall data signatures si ∈ S where q&si = q
2. Do a multi-way join to get the candidate list
Alternatives:
Sequential scan of G∗
Both steps are inefficient
Use S-trees
Height-balanced tree over signaturesSupport inclusion queries wellDoes not support second step – expensive
VS-tree (and VS∗-tree)
Multi-resolution summary graph based on S-treeSupports both steps efficiently
23UPenn/2012-12-04
How to Generate Candidate List
Two step process:1. Run an inclusion query for each node of Q∗ and get lists of
nodes in G∗ that include that node.
Given query signature q and a set of data signatures S , findall data signatures si ∈ S where q&si = q
2. Do a multi-way join to get the candidate list
Alternatives:Sequential scan of G∗
Both steps are inefficient
Use S-trees
Height-balanced tree over signaturesSupport inclusion queries wellDoes not support second step – expensive
VS-tree (and VS∗-tree)
Multi-resolution summary graph based on S-treeSupports both steps efficiently
23UPenn/2012-12-04
How to Generate Candidate List
Two step process:1. Run an inclusion query for each node of Q∗ and get lists of
nodes in G∗ that include that node.
Given query signature q and a set of data signatures S , findall data signatures si ∈ S where q&si = q
2. Do a multi-way join to get the candidate list
Alternatives:Sequential scan of G∗
Both steps are inefficient
Use S-trees
Height-balanced tree over signaturesSupport inclusion queries wellDoes not support second step – expensive
VS-tree (and VS∗-tree)
Multi-resolution summary graph based on S-treeSupports both steps efficiently
23UPenn/2012-12-04
How to Generate Candidate List
Two step process:1. Run an inclusion query for each node of Q∗ and get lists of
nodes in G∗ that include that node.
Given query signature q and a set of data signatures S , findall data signatures si ∈ S where q&si = q
2. Do a multi-way join to get the candidate list
Alternatives:Sequential scan of G∗
Both steps are inefficient
Use S-trees
Height-balanced tree over signaturesSupport inclusion queries wellDoes not support second step – expensive
VS-tree (and VS∗-tree)
Multi-resolution summary graph based on S-treeSupports both steps efficiently
23UPenn/2012-12-04
S-tree Solution
1111 1101
0011 1001 1101 1101
0010 1001 0011 1000 0101 1000 1000 0101
0000 0001
0010 1000 0010 0000
0011 1000
0010 1000
0001 1000
0100 0000
1000 0001
1000 0100
001
005 009
002
007
003
008
004
006
d11
d21 d2
2
d31 d3
2 d33 d3
4
G 3
G 2
G 1
1000 00000010 100010000
0010 1000
0011 1000
0010 1000 0010 1000
002
005
007
1000 0000
1000 0001
1000 0100
004
006
004
006on
0010 1000 1000 0100
Large join space!
24UPenn/2012-12-04
S-tree Solution
1111 1101
0011 1001 1101 1101
0010 1001 0011 1000 0101 1000 1000 0101
0000 0001
0010 1000 0010 0000
0011 1000
0010 1000
0001 1000
0100 0000
1000 0001
1000 0100
001
005 009
002
007
003
008
004
006
d11
d21 d2
2
d31 d3
2 d33 d3
4
G 3
G 2
G 1
1000 00000010 100010000
0010 1000
0011 1000
0010 1000 0010 1000
002
005
007
1000 0000
1000 0001
1000 0100
004
006
004
006on
0010 1000 1000 0100
Large join space!
24UPenn/2012-12-04
S-tree Solution
1111 1101
0011 1001 1101 1101
0010 1001 0011 1000 0101 1000 1000 0101
0000 0001
0010 1000 0010 0000
0011 1000
0010 1000
0001 1000
0100 0000
1000 0001
1000 0100
001
005 009
002
007
003
008
004
006
d11
d21 d2
2
d31 d3
2 d33 d3
4
G 3
G 2
G 1
1000 00000010 100010000
0010 1000
0011 1000
0010 1000 0010 1000
002
005
007
1000 0000
1000 0001
1000 0100
004
006
004
006on
0010 1000 1000 0100
Large join space!
24UPenn/2012-12-04
S-tree Solution
1111 1101
0011 1001 1101 1101
0010 1001 0011 1000 0101 1000 1000 0101
0000 0001
0010 1000 0010 0000
0011 1000
0010 1000
0001 1000
0100 0000
1000 0001
1000 0100
001
005 009
002
007
003
008
004
006
d11
d21 d2
2
d31 d3
2 d33 d3
4
G 3
G 2
G 1
1000 00000010 100010000
0010 1000
0011 1000
0010 1000 0010 1000
002
005
007
1000 0000
1000 0001
1000 0100
004
006
004
006on
0010 1000 1000 0100
Large join space!
24UPenn/2012-12-04
S-tree Solution
1111 1101
0011 1001 1101 1101
0010 1001 0011 1000 0101 1000 1000 0101
0000 0001
0010 1000 0010 0000
0011 1000
0010 1000
0001 1000
0100 0000
1000 0001
1000 0100
001
005 009
002
007
003
008
004
006
d11
d21 d2
2
d31 d3
2 d33 d3
4
G 3
G 2
G 1
1000 00000010 100010000
0010 1000
0011 1000
0010 1000 0010 1000
002
005
007
1000 0000
1000 0001
1000 0100
004
006
004
006on
0010 1000 1000 0100
Large join space!
24UPenn/2012-12-04
S-tree Solution
1111 1101
0011 1001 1101 1101
0010 1001 0011 1000 0101 1000 1000 0101
0000 0001
0010 1000 0010 0000
0011 1000
0010 1000
0001 1000
0100 0000
1000 0001
1000 0100
001
005 009
002
007
003
008
004
006
d11
d21 d2
2
d31 d3
2 d33 d3
4
G 3
G 2
G 1
1000 00000010 100010000
0010 1000
0011 1000
0010 1000 0010 1000
002
005
007
1000 0000
1000 0001
1000 0100
004
006
004
006on
0010 1000 1000 0100
Large join space!
24UPenn/2012-12-04
VS-tree
1111 1101
0011 1001 1101 1101
0010 1001 0011 1000 0101 1000 1000 0101
0000 0001
0010 1000 0010 0000
0011 1000
0010 1000
0001 1000
0001 0000
1000 0001
1000 0100
001
005 009
002
007
003
008
004
006
d11
d21 d2
2
d31 d3
2 d33 d3
4G 3
G 2
G 1
11101
00100
10000
00001 01000
10000
00001
00010
10000 01000 00100
00001
10000
10000
10000
01000
01000
01000
00100
Super edge
25UPenn/2012-12-04
Pruning with VS-Tree
1111 1101
0011 1001 1101 1101
0010 1001 0011 1000 0101 1000 1000 0101
0000 0001
0010 1000 0010 0000
0011 1000
0010 1000
0001 1000
0001 0000
1000 0001
1000 0100
001
005 009
002
007
003
008
004
006
d11
d21 d2
2
d31 d3
2 d33 d3
4G 3
G 2
G 1
11101
00100
10000
00001 01000
10000
00001
00010
10000 01000 00100
00001
10000
10000
10000
01000
01000
01000
00100
0010 1000 1000 000010000
d31
d32
d34
G 3
10000
005
002
007
004
006on
Reduced join space!
Performance
26UPenn/2012-12-04
Pruning with VS-Tree
1111 1101
0011 1001 1101 1101
0010 1001 0011 1000 0101 1000 1000 0101
0000 0001
0010 1000 0010 0000
0011 1000
0010 1000
0001 1000
0001 0000
1000 0001
1000 0100
001
005 009
002
007
003
008
004
006
d11
d21 d2
2
d31 d3
2 d33 d3
4G 3
G 2
G 1
11101
00100
10000
00001 01000
10000
00001
00010
10000 01000 00100
00001
10000
10000
10000
01000
01000
01000
00100
0010 1000 1000 000010000
d31
d32
d34
G 3
10000
005
002
007
004
006on
Reduced join space!
Performance
26UPenn/2012-12-04
Pruning with VS-Tree
1111 1101
0011 1001 1101 1101
0010 1001 0011 1000 0101 1000 1000 0101
0000 0001
0010 1000 0010 0000
0011 1000
0010 1000
0001 1000
0001 0000
1000 0001
1000 0100
001
005 009
002
007
003
008
004
006
d11
d21 d2
2
d31 d3
2 d33 d3
4G 3
G 2
G 1
11101
00100
10000
00001 01000
10000
00001
00010
10000 01000 00100
00001
10000
10000
10000
01000
01000
01000
00100
0010 1000 1000 000010000
d31
d32
d34
G 3
10000
005
002
007
004
006on
Reduced join space!
Performance
26UPenn/2012-12-04
Pruning with VS-Tree
1111 1101
0011 1001 1101 1101
0010 1001 0011 1000 0101 1000 1000 0101
0000 0001
0010 1000 0010 0000
0011 1000
0010 1000
0001 1000
0001 0000
1000 0001
1000 0100
001
005 009
002
007
003
008
004
006
d11
d21 d2
2
d31 d3
2 d33 d3
4G 3
G 2
G 1
11101
00100
10000
00001 01000
10000
00001
00010
10000 01000 00100
00001
10000
10000
10000
01000
01000
01000
00100
0010 1000 1000 000010000
d31
d32
d34
G 3
10000
005
002
007
004
006on
Reduced join space!
Performance
26UPenn/2012-12-04
Pruning with VS-Tree
1111 1101
0011 1001 1101 1101
0010 1001 0011 1000 0101 1000 1000 0101
0000 0001
0010 1000 0010 0000
0011 1000
0010 1000
0001 1000
0001 0000
1000 0001
1000 0100
001
005 009
002
007
003
008
004
006
d11
d21 d2
2
d31 d3
2 d33 d3
4G 3
G 2
G 1
11101
00100
10000
00001 01000
10000
00001
00010
10000 01000 00100
00001
10000
10000
10000
01000
01000
01000
00100
0010 1000 1000 000010000
d31
d32
d34
G 3
10000
005
002
007
004
006on
Reduced join space!
Performance
26UPenn/2012-12-04
Pruning with VS-Tree
1111 1101
0011 1001 1101 1101
0010 1001 0011 1000 0101 1000 1000 0101
0000 0001
0010 1000 0010 0000
0011 1000
0010 1000
0001 1000
0001 0000
1000 0001
1000 0100
001
005 009
002
007
003
008
004
006
d11
d21 d2
2
d31 d3
2 d33 d3
4G 3
G 2
G 1
11101
00100
10000
00001 01000
10000
00001
00010
10000 01000 00100
00001
10000
10000
10000
01000
01000
01000
00100
0010 1000 1000 000010000
d31
d32
d34
G 3
10000
005
002
007
004
006on
Reduced join space!
Performance 26UPenn/2012-12-04
Evaluation of VS-tree
1111 1101
0011 1001 1101 1101
0010 1001 0011 1000 0101 1000 1000 0101
0000 0001
0010 1000 0010 0000
0011 1000
0010 1000
0001 1000
0001 0000
1000 0001
1000 0100
001
005 009
002
007
003
008
004
006
d11
d21 d2
2
d31 d3
2 d33 d3
4G 3
G 2
G 1
11101
00100
10000
00001 01000
10000
00001
00010
10000 01000 00100
00001
10000
10000
10000
01000
01000
01000
00100
Very dense
in higher levelsof VS-tree
Lots of summary matches
in higher levelsof VS-tree
Reduced
pruning power
27UPenn/2012-12-04
Evaluation of VS-tree
1111 1101
0011 1001 1101 1101
0010 1001 0011 1000 0101 1000 1000 0101
0000 0001
0010 1000 0010 0000
0011 1000
0010 1000
0001 1000
0001 0000
1000 0001
1000 0100
001
005 009
002
007
003
008
004
006
d11
d21 d2
2
d31 d3
2 d33 d3
4G 3
G 2
G 1
11101
00100
10000
00001 01000
10000
00001
00010
10000 01000 00100
00001
10000
10000
10000
01000
01000
01000
00100
Very dense
in higher levelsof VS-tree
Lots of summary matches
in higher levelsof VS-tree
Reduced
pruning power
27UPenn/2012-12-04
Evaluation of VS-tree
1111 1101
0011 1001 1101 1101
0010 1001 0011 1000 0101 1000 1000 0101
0000 0001
0010 1000 0010 0000
0011 1000
0010 1000
0001 1000
0001 0000
1000 0001
1000 0100
001
005 009
002
007
003
008
004
006
d11
d21 d2
2
d31 d3
2 d33 d3
4G 3
G 2
G 1
11101
00100
10000
00001 01000
10000
00001
00010
10000 01000 00100
00001
10000
10000
10000
01000
01000
01000
00100
Very dense
in higher levelsof VS-tree
Lots of summary matches
in higher levelsof VS-tree
Reduced
pruning power
27UPenn/2012-12-04
Evaluation of VS-tree
1111 1101
0011 1001 1101 1101
0010 1001 0011 1000 0101 1000 1000 0101
0000 0001
0010 1000 0010 0000
0011 1000
0010 1000
0001 1000
0001 0000
1000 0001
1000 0100
001
005 009
002
007
003
008
004
006
d11
d21 d2
2
d31 d3
2 d33 d3
4G 3
G 2
G 1
11101
00100
10000
00001 01000
10000
00001
00010
10000 01000 00100
00001
10000
10000
10000
01000
01000
01000
00100
Very dense
in higher levelsof VS-tree
Lots of summary matches
in higher levelsof VS-tree
Reduced
pruning power
27UPenn/2012-12-04
Possible Optimizations
1. Do not start the multi-way join from the root
Find an oracle that determines the best place to start the joinCost-based
2. Do not materialize summary matches at higher levels ofVS-tree
Use a DFS-search
3. Reduce the number of super-edges when building the VS-tree
Reduce the number of vertices in each list
28UPenn/2012-12-04
VS*-tree
1001 1000
1100 0000 1011 0001
1000 0000 0100 0000 1001 0000 0010 0001
d11
d21 d2
2
01000
10000 00010
10000 00010
01000
G 1
G 2
G ∗
1001 1000
1100 0000 1011 0001
1000 0000
0100 0000
0000 1000 1001 0000 0010 0001
d11
d21 d2
2
01000
10000 00010
10000
00010
01000
10000
10000
G 1
G 2
G ∗
Enter new vertex signature u intonon-leaf node d where
Hamming(u, d) is minimum
29UPenn/2012-12-04
VS*-tree
1001 1000
1100 0000 1011 0001
1000 0000 0100 0000 1001 0000 0010 0001
d11
d21 d2
2
01000
10000 00010
10000 00010
01000
G 1
G 2
G ∗
1001 1000
1100 0000 1011 0001
1000 0000 0100 0000 1001 0000
0010 0001
0000 1000
d11
d21 d2
2
01000
10000 00010
10000
00010
01000
10000
G 1
G 2
G ∗
Enter new vertex signature u intonon-leaf node d considering
Hamming distance &no of supernodes introduced
29UPenn/2012-12-04
Pruning Effect
? x1 y : hasGivenName ? x5? x1 y : hasFamilyName ? x6? x1 r d f : t y p e
<w o r d n e t s c i e n t i s t 1 1 0 5 6 0 6 3 7 >? x1 y : BornIn ? x2? x1 y : h a s A c a d e m i c A d v i s o r ? x4? x2 y : l o c a t e d I n <S w i t z e r l a n d >? x3 y : l o c a t e d I n <Germany>? x4 y : b o r n I n ? x3
Before AfterPruning Pruning
x1 810 810x2 424 197x3 66 66x4 36187 6686
30UPenn/2012-12-04
Exact Queries
A1 A2 B1 B2 B3 C1 C20
1,000
2,000
3,000
4,000
5,000
6,000
Queries
Qu
ery
Res
pon
seT
ime
(in
ms)
Yago network (20 million triples & size 3.1GB)
gStore RDF-3X SW-Store x-RDF-3x BigOWLIM GRIN
31UPenn/2012-12-04
Wildcard Queries
A1 A2 B1 B2 B3 C1 C20
2,000
4,000
6,000
8,000
10,000
Queries
Qu
ery
Res
pon
seT
ime
(in
ms)
Yago network (20 million triples & size 3.1GB)
gStore RDF-3X+PF SW-Store+PFx-RDF-3x+PF BigOWLIM GRIN+PF
32UPenn/2012-12-04
Outline
1 RDF Introduction
2 Answering Regular SPARQL QueriesAnswering SPARQL queries using graph pattern matching[ZouMoZhaoChenOzsu, PVLDB 2011]
3 AggregationAnswering Aggregate SPARQL Queries Over Large RDFRepositories
4 Future Research
33UPenn/2012-12-04
Aggregation Queries
(Q1) Group all individuals by theirgenders, titles, and the founding yearof their birth places, and report thenumber of individuals in each group.SELECT ? t ?g ? y COUNT(?m)WHERE {?m <born In> ? c .
?m <t i t l e > ? t .?m <gender> ?g .? c <found ingYear> ? y .}
GROUP BY ? t ?g , ? y .
(Q2) Group all individuals by their genders,titles, and the founding year of theirbirth places, and report the numberof individuals in each group for thosegroups with more than 10 members.SELECT ? t ?g ? y COUNT(?m)WHERE {?m <born In> ? c .
?m <t i t l e > ? t .?m <gender> ?g .? c <found ingYear> ? y .}
GROUP BY ? t ?g , ? y .HAVING COUNT(?m)>10.
Prefix: y=http://en.wikipedia.org/wikiSubject Predicate Objecty: Abraham Lincoln hasName “Abraham Lincoln”y: Abraham Lincoln BornOnDate “1809-02-12”’y: Abraham Lincoln DiedOnDate “1865-04-15”y:Abraham Lincoln bornIn y:Hodgenville KYy: Abraham Lincoln DiedIn y: Washington DCy:Abraham Lincoln title “President”y:Abraham Lincoln gender “Male”y: Washington DC hasName “Washington D.C.”y:Washington DC foundingYear “1790”y:Hodgenville KY hasName “Hodgenville”y:United States hasName “United States”y:United States hasCapital y:Washington DCy:United States foundingYear “1776”y:Reese Witherspoon bornOnDate “1976-03-22”y:Reese Witherspoon bornIn y:New Orleans LAy:Reese Witherspoon hasName “Reese Witherspoon”y:Reese Witherspoon gender “Female”y:Reese Witherspoon title “Actress”y:New Orleans LA foundingYear “1718”y:New Orleans LA locatedIn y:United Statesy:Franklin Roosevelt hasName “Franklin D. Roosevelt”y:Franklin Roosevelt bornIn y:Hyde Park NYy:Franklin Roosevelt title “President”y:Franklin Roosevelt gender “Male”y:Hyde Park NY foundingYear “1810”y:Hyde Park NY locatedIn y:United Statesy:Marilyn Monroe gender “Female”y:Marilyn Monroe hasName “Marilyn Monroe”y:Marilyn Monroe bornOnDate “1926-07-01”y:Marilyn Monroe diedOnDate “1962-08-05”
34UPenn/2012-12-04
Aggregation Queries
(Q1) Group all individuals by theirgenders, titles, and the founding yearof their birth places, and report thenumber of individuals in each group.SELECT ? t ?g ? y COUNT(?m)WHERE {?m <born In> ? c .
?m <t i t l e > ? t .?m <gender> ?g .? c <found ingYear> ? y .}
GROUP BY ? t ?g , ? y .
(Q2) Group all individuals by their genders,titles, and the founding year of theirbirth places, and report the numberof individuals in each group for thosegroups with more than 10 members.SELECT ? t ?g ? y COUNT(?m)WHERE {?m <born In> ? c .
?m <t i t l e > ? t .?m <gender> ?g .? c <found ingYear> ? y .}
GROUP BY ? t ?g , ? y .HAVING COUNT(?m)>10.
Prefix: y=http://en.wikipedia.org/wikiSubject Predicate Objecty: Abraham Lincoln hasName “Abraham Lincoln”y: Abraham Lincoln BornOnDate “1809-02-12”’y: Abraham Lincoln DiedOnDate “1865-04-15”y:Abraham Lincoln bornIn y:Hodgenville KYy: Abraham Lincoln DiedIn y: Washington DCy:Abraham Lincoln title “President”y:Abraham Lincoln gender “Male”y: Washington DC hasName “Washington D.C.”y:Washington DC foundingYear “1790”y:Hodgenville KY hasName “Hodgenville”y:United States hasName “United States”y:United States hasCapital y:Washington DCy:United States foundingYear “1776”y:Reese Witherspoon bornOnDate “1976-03-22”y:Reese Witherspoon bornIn y:New Orleans LAy:Reese Witherspoon hasName “Reese Witherspoon”y:Reese Witherspoon gender “Female”y:Reese Witherspoon title “Actress”y:New Orleans LA foundingYear “1718”y:New Orleans LA locatedIn y:United Statesy:Franklin Roosevelt hasName “Franklin D. Roosevelt”y:Franklin Roosevelt bornIn y:Hyde Park NYy:Franklin Roosevelt title “President”y:Franklin Roosevelt gender “Male”y:Hyde Park NY foundingYear “1810”y:Hyde Park NY locatedIn y:United Statesy:Marilyn Monroe gender “Female”y:Marilyn Monroe hasName “Marilyn Monroe”y:Marilyn Monroe bornOnDate “1926-07-01”y:Marilyn Monroe diedOnDate “1962-08-05”
34UPenn/2012-12-04
Aggregate Query ResultsSELECT ? t ?g ? y COUNT(?m)WHERE {?m <born In> ? c .
?m <t i t l e > ? t .?m <gender> ?g .? c <found ingYear> ? y .}
GROUP BY ? t ?g , ? y .
v1
gender title
?mv2
foundingYearbornIn
gender title foundingYear COUNT(?m)
Male President 1718 1
Female Actress 1810 1
Aggregate Result
y:Abraham Lincoln
“Abraham Lincoln”
hasName
“1809-02-12”bornOnDate
“1865-04-15”
diedOnDate
“President”
title
“Male”
gender
y:Washington D.C.
“1790”
foundYear
“Washington D.C.”
hasName
y:Hodgenville KY “Hodgenville”hasName
y:United States
“United States”
hasName
“1776”
foundingYear
y:Reese Witherspoon
“1976-03-22”
bornOnDate
“Female”
gender
“Actress”
title
“Reese Witherspoon”
hasName
y:New Orleans LA
“1718”
foundingYear
y:Franklin Roosevelt
“Franklin D. Roosevelt”
hasName
“Male”
gender
“President”
title
y:Hyde Park NY
“1810”
foundingYear
y:Marilyn Monroe“1962-08-05”diedOnDate
“1926-07-01”
bornOnDate
“Female”
gender
“Marilyn Monroe”hasName
001
010
011
012
013 014
002
015016
003017
004
018
019
005
020
021
022 023 006
024
007
025 026
027
008
028
009029
030 031
032
diedIn
bornIn
hasCapital
bornInlocatedIn locatedIn
bornIn
36UPenn/2012-12-04
Aggregate Query ResultsSELECT ? t ?g ? y COUNT(?m)WHERE {?m <born In> ? c .
?m <t i t l e > ? t .?m <gender> ?g .? c <found ingYear> ? y .}
GROUP BY ? t ?g , ? y .
v1
gender title
?mv2
foundingYearbornIn
gender title foundingYear COUNT(?m)
Male President 1718 1
Female Actress 1810 1
Aggregate Result
y:Abraham Lincoln
“Abraham Lincoln”
hasName
“1809-02-12”bornOnDate
“1865-04-15”
diedOnDate
“President”
title
“Male”
gender
y:Washington D.C.
“1790”
foundYear
“Washington D.C.”
hasName
y:Hodgenville KY “Hodgenville”hasName
y:United States
“United States”
hasName
“1776”
foundingYear
y:Reese Witherspoon
“1976-03-22”
bornOnDate
“Female”
gender
“Actress”
title
“Reese Witherspoon”
hasName
y:New Orleans LA
“1718”
foundingYear
y:Franklin Roosevelt
“Franklin D. Roosevelt”
hasName
“Male”
gender
“President”
title
y:Hyde Park NY
“1810”
foundingYear
y:Marilyn Monroe“1962-08-05”diedOnDate
“1926-07-01”
bornOnDate
“Female”
gender
“Marilyn Monroe”hasName
001
010
011
012
013 014
002
015016
003017
004
018
019
005
020
021
022 023 006
024
007
025 026
027
008
028
009029
030 031
032
diedIn
bornIn
hasCapital
bornInlocatedIn locatedIn
bornIn
36UPenn/2012-12-04
Measuredimension
Group-bydimensions
Querypattern
Existing Solutions
1. Using Existing SPARQL Query Engines
Aggregate queryQ
Query withoutaggregation (Q ′)
Result withoutaggregation (R(Q ′))
Query resultR(Q)
Ignore aggregationfunction
Employ existingSPARQL engine
Online computeaggregation
Problem: missed opportunities for query optimization
2. Using a Relational DBMS
RDF datasetMap to table
(property tables)Build relational
aggregate index
Problems: (1) Usual problems with relationalimplementations; (2 )RDF data is high dimensional (DBpediahas more than 1000 properties) ⇒ prohibits using relationaloptimization techniques
37UPenn/2012-12-04
Existing Solutions
1. Using Existing SPARQL Query Engines
Aggregate queryQ
Query withoutaggregation (Q ′)
Result withoutaggregation (R(Q ′))
Query resultR(Q)
Ignore aggregationfunction
Employ existingSPARQL engine
Online computeaggregation
Problem: missed opportunities for query optimization
2. Using a Relational DBMS
RDF datasetMap to table
(property tables)Build relational
aggregate index
Problems: (1) Usual problems with relationalimplementations; (2 )RDF data is high dimensional (DBpediahas more than 1000 properties) ⇒ prohibits using relationaloptimization techniques
37UPenn/2012-12-04
Existing Solutions
1. Using Existing SPARQL Query Engines
Aggregate queryQ
Query withoutaggregation (Q ′)
Result withoutaggregation (R(Q ′))
Query resultR(Q)
Ignore aggregationfunction
Employ existingSPARQL engine
Online computeaggregation
Problem: missed opportunities for query optimization
2. Using a Relational DBMS
RDF datasetMap to table
(property tables)Build relational
aggregate index
Problems: (1) Usual problems with relationalimplementations; (2 )RDF data is high dimensional (DBpediahas more than 1000 properties) ⇒ prohibits using relationaloptimization techniques
37UPenn/2012-12-04
Existing Solutions
1. Using Existing SPARQL Query Engines
Aggregate queryQ
Query withoutaggregation (Q ′)
Result withoutaggregation (R(Q ′))
Query resultR(Q)
Ignore aggregationfunction
Employ existingSPARQL engine
Online computeaggregation
Problem: missed opportunities for query optimization
2. Using a Relational DBMS
RDF datasetMap to table
(property tables)Build relational
aggregate index
Problems: (1) Usual problems with relationalimplementations; (2 )RDF data is high dimensional (DBpediahas more than 1000 properties) ⇒ prohibits using relationaloptimization techniques
37UPenn/2012-12-04
RDF Data are not Well Structured
In relational systems aggregates can be computed from materializedviews.
Key A B C SUM Key A B SUM
View V1 Aggregate table T
T can be constructed by scanning V1
38UPenn/2012-12-04
RDF Data are not Well Structured
This can’t always be done in RDF data
(Q1) Group all individuals by theirtitles, gender and birthday,and report the number ofindividuals in each group.
SELECT ? t ?g ? y COUNT(?m)WHERE {?m <born In> ? c .
?m <t i t l e > ? t .?m <gender> ?g .? c <bornOnDate> ? y .}
GROUP BY ? t ?g , ? y .
(Q2) Group all individuals by theirtitles and gender, and reportthe number of individuals ineach group.
SELECT ? t ?g ? y COUNT(?m)WHERE {?m <born In> ? c .
?m <t i t l e > ? t .?m <gender> ?g .}
GROUP BY ? t ?g .
gender title bornOnDate COUNT
Male President 1865-04-15 1
Female Actress 1976-03-22 1
gender title COUNT
Male President 2
Female Actress 1
R(Q1) R(Q2)
R(Q2) cannot be constructed by scanning R(Q1)
38UPenn/2012-12-04
Overview of Approach
Work on RDF graphs and query graphs ⇒ graph pattern matching.
1. Decompose aggregatequery Q into starqueries {S1, . . . ,Sn}
2. Evaluate each starquery to get resultsR(S1), . . . , R(Sn)
T-index: triestructureEvaluate withoutjoins
3. Compute result of Qas R(Q) =oni (Ri )
Use VS*-tree tospeed up joinprocessing
v1
gender title
?mv2
foundingYearbornIn
Q
v1
gender title
?mv2
foundingYear
S1 S2
gender title
Male President
Female Actress
foundingYear
1790
1718
1810
1776
R(S1) R(S2)
gender title foundingYear COUNT(?m)Male President 1718 1
Female Actress 1810 1
on
39UPenn/2012-12-04
Overview of Approach
Work on RDF graphs and query graphs ⇒ graph pattern matching.
1. Decompose aggregatequery Q into starqueries {S1, . . . ,Sn}
2. Evaluate each starquery to get resultsR(S1), . . . , R(Sn)
T-index: triestructureEvaluate withoutjoins
3. Compute result of Qas R(Q) =oni (Ri )
Use VS*-tree tospeed up joinprocessing
v1
gender title
?mv2
foundingYearbornIn
Q
v1
gender title
?mv2
foundingYear
S1 S2
gender title
Male President
Female Actress
foundingYear
1790
1718
1810
1776
R(S1) R(S2)
gender title foundingYear COUNT(?m)Male President 1718 1
Female Actress 1810 1
on
39UPenn/2012-12-04
Overview of Approach
Work on RDF graphs and query graphs ⇒ graph pattern matching.
1. Decompose aggregatequery Q into starqueries {S1, . . . ,Sn}
2. Evaluate each starquery to get resultsR(S1), . . . , R(Sn)
T-index: triestructureEvaluate withoutjoins
3. Compute result of Qas R(Q) =oni (Ri )
Use VS*-tree tospeed up joinprocessing
v1
gender title
?mv2
foundingYearbornIn
Q
v1
gender title
?mv2
foundingYear
S1 S2
gender title
Male President
Female Actress
foundingYear
1790
1718
1810
1776
R(S1) R(S2)
gender title foundingYear COUNT(?m)Male President 1718 1
Female Actress 1810 1
on
39UPenn/2012-12-04
Selective Materialization of Views: T-index
Selectively materialize views through a trie structure called T-index.
VertexID
Entity vertex Adjacent Attributes
001 Abraham Lincoln hasName, gender,bornOnDate,title,diedOnDate
002 Washington DC hasName,foundingYear
003 Hodgenville KY hasName004 United States hasName,
foundingYear005 Reese Wither-
spoonhasName, gender,bornOnDate, title
006 New Orleans foundingYear007 Franklin Roo-
sevelthasName, gender,title
008 Hyde Park NY foundingYear009 Marilyn Monroe hasName,
gender, bornOnDate,diedOnDate
N0
N1
N2
N3
N4
N5
N6
N7
N8
N9
root
hasName
gender
bonOnDate
title
diedOnDate
diedOnDate
title
foundingYear
foundingYear{001,002,003,004,005,007,009}
{001,005,007,009}
{001,005,009}
{001,005}
{001}
{009}
{007}
{002,004}
{006,008}
hasName gender bornOnDate title L
AbrahamLincoln
Male 1865-04-15 President {001}
Reese With-erspoon
Female 1976-03-22 Actress {005}
M(N4)
hasName gender title L
Franklin D.Roosevelt
Male President {007}
M(N7)
hasName:7foundingYear:4
gender:4
bornOnDate:3title:3
diedOnDate:2
DimensionList DL
40UPenn/2012-12-04
Transaction Database D
T-index Construction
VertexID
Entity vertex Adjacent Attributes
001 Abraham LincolnhasName, gender,bornOnDate,title,diedOnDate
002 Washington DChasName,foundingYear
003 Hodgenville KYhasName
004 United StateshasName,foundingYear
005 Reese Wither-spoon hasName, gender,
bornOnDate, title006 New Orleans foundingYear007 Franklin Roo-
sevelthasName, gender,title
008 Hyde Park NY foundingYear009 Marilyn Monroe hasName,
gender, bornOnDate,diedOnDate
N0
N1
N2
N3
N4
N5
root
hasName
gender
bonOnDate
title
diedOnDate
N8
foundingYear
{001,002,003}{001,002,003,004}
{002,004}
{001,002,003,004,005}
{002,004}
{001,005}
{001,005}
{001,005}
N9
foundingYear
N6
N7
diedOnDate
title
{001,002,003,004,005,007,009}
{001,005,007,009}
{001,005,009}
{001,005}
{001}
{009}
{007}
{002,004}
{006,008}
hasName gender bornOnDate title L
AbrahamLincoln
Male 1865-04-15 President {001}
Reese With-erspoon
Female 1976-03-22 Actress {005}
M(N4)
hasName gender title L
Franklin D.Roosevelt
Male President {007}
M(N7)
hasName:7
foundingYear:4
gender:4
bornOnDate:3
title:3
diedOnDate:2
DimensionList DL
41UPenn/2012-12-04
Transaction Database D
T-index Construction
VertexID
Entity vertex Adjacent Attributes
001 Abraham LincolnhasName, gender,bornOnDate,title,diedOnDate
002 Washington DChasName,foundingYear
003 Hodgenville KYhasName
004 United StateshasName,foundingYear
005 Reese Wither-spoon hasName, gender,
bornOnDate, title006 New Orleans foundingYear007 Franklin Roo-
sevelthasName, gender,title
008 Hyde Park NY foundingYear009 Marilyn Monroe hasName,
gender, bornOnDate,diedOnDate
N0
N1
N2
N3
N4
N5
root
hasName
gender
bonOnDate
title
diedOnDate
{001}
{001}
{001}
{001}
{001}
N8
foundingYear
{001,002,003}{001,002,003,004}
{002,004}
{001,002,003,004,005}
{002,004}
{001,005}
{001,005}
{001,005}
N9
foundingYear
N6
N7
diedOnDate
title
{001,002,003,004,005,007,009}
{001,005,007,009}
{001,005,009}
{001,005}
{001}
{009}
{007}
{002,004}
{006,008}
hasName gender bornOnDate title L
AbrahamLincoln
Male 1865-04-15 President {001}
Reese With-erspoon
Female 1976-03-22 Actress {005}
M(N4)
hasName gender title L
Franklin D.Roosevelt
Male President {007}
M(N7)
hasName:7
foundingYear:4
gender:4
bornOnDate:3
title:3
diedOnDate:2
DimensionList DL
41UPenn/2012-12-04
Transaction Database D
T-index Construction
VertexID
Entity vertex Adjacent Attributes
001 Abraham LincolnhasName, gender,bornOnDate,title,diedOnDate
002 Washington DChasName,foundingYear
003 Hodgenville KYhasName
004 United StateshasName,foundingYear
005 Reese Wither-spoon hasName, gender,
bornOnDate, title006 New Orleans foundingYear007 Franklin Roo-
sevelthasName, gender,title
008 Hyde Park NY foundingYear009 Marilyn Monroe hasName,
gender, bornOnDate,diedOnDate
N0
N1
N2
N3
N4
N5
root
hasName
gender
bonOnDate
title
diedOnDate
{001}
{001}
{001}
{001}
N8
foundingYear
{001,002}
{002}
{001,002,003}{001,002,003,004}
{002,004}
{001,002,003,004,005}
{002,004}
{001,005}
{001,005}
{001,005}
N9
foundingYear
N6
N7
diedOnDate
title
{001,002,003,004,005,007,009}
{001,005,007,009}
{001,005,009}
{001,005}
{001}
{009}
{007}
{002,004}
{006,008}
hasName gender bornOnDate title L
AbrahamLincoln
Male 1865-04-15 President {001}
Reese With-erspoon
Female 1976-03-22 Actress {005}
M(N4)
hasName gender title L
Franklin D.Roosevelt
Male President {007}
M(N7)
hasName:7
foundingYear:4
gender:4
bornOnDate:3
title:3
diedOnDate:2
DimensionList DL
41UPenn/2012-12-04
Transaction Database D
T-index Construction
VertexID
Entity vertex Adjacent Attributes
001 Abraham LincolnhasName, gender,bornOnDate,title,diedOnDate
002 Washington DChasName,foundingYear
003 Hodgenville KYhasName
004 United StateshasName,foundingYear
005 Reese Wither-spoon hasName, gender,
bornOnDate, title006 New Orleans foundingYear007 Franklin Roo-
sevelthasName, gender,title
008 Hyde Park NY foundingYear009 Marilyn Monroe hasName,
gender, bornOnDate,diedOnDate
N0
N1
N2
N3
N4
N5
root
hasName
gender
bonOnDate
title
diedOnDate
{001}
{001}
{001}
{001}
N8
foundingYear
{002}
{001,002,003}
{001,002,003,004}
{002,004}
{001,002,003,004,005}
{002,004}
{001,005}
{001,005}
{001,005}
N9
foundingYear
N6
N7
diedOnDate
title
{001,002,003,004,005,007,009}
{001,005,007,009}
{001,005,009}
{001,005}
{001}
{009}
{007}
{002,004}
{006,008}
hasName gender bornOnDate title L
AbrahamLincoln
Male 1865-04-15 President {001}
Reese With-erspoon
Female 1976-03-22 Actress {005}
M(N4)
hasName gender title L
Franklin D.Roosevelt
Male President {007}
M(N7)
hasName:7
foundingYear:4
gender:4
bornOnDate:3
title:3
diedOnDate:2
DimensionList DL
41UPenn/2012-12-04
Transaction Database D
T-index Construction
VertexID
Entity vertex Adjacent Attributes
001 Abraham LincolnhasName, gender,bornOnDate,title,diedOnDate
002 Washington DChasName,foundingYear
003 Hodgenville KYhasName
004 United StateshasName,foundingYear
005 Reese Wither-spoon hasName, gender,
bornOnDate, title006 New Orleans foundingYear007 Franklin Roo-
sevelthasName, gender,title
008 Hyde Park NY foundingYear009 Marilyn Monroe hasName,
gender, bornOnDate,diedOnDate
N0
N1
N2
N3
N4
N5
root
hasName
gender
bonOnDate
title
diedOnDate
{001}
{001}
{001}
{001}
N8
foundingYear
{001,002,003}
{001,002,003,004}
{002,004}
{001,002,003,004,005}
{002,004}
{001,005}
{001,005}
{001,005}
N9
foundingYear
N6
N7
diedOnDate
title
{001,002,003,004,005,007,009}
{001,005,007,009}
{001,005,009}
{001,005}
{001}
{009}
{007}
{002,004}
{006,008}
hasName gender bornOnDate title L
AbrahamLincoln
Male 1865-04-15 President {001}
Reese With-erspoon
Female 1976-03-22 Actress {005}
M(N4)
hasName gender title L
Franklin D.Roosevelt
Male President {007}
M(N7)
hasName:7
foundingYear:4
gender:4
bornOnDate:3
title:3
diedOnDate:2
DimensionList DL
41UPenn/2012-12-04
Transaction Database D
T-index Construction
VertexID
Entity vertex Adjacent Attributes
001 Abraham LincolnhasName, gender,bornOnDate,title,diedOnDate
002 Washington DChasName,foundingYear
003 Hodgenville KYhasName
004 United StateshasName,foundingYear
005 Reese Wither-spoon hasName, gender,
bornOnDate, title006 New Orleans foundingYear007 Franklin Roo-
sevelthasName, gender,title
008 Hyde Park NY foundingYear009 Marilyn Monroe hasName,
gender, bornOnDate,diedOnDate
N0
N1
N2
N3
N4
N5
root
hasName
gender
bonOnDate
title
diedOnDate{001}
N8
foundingYear
{001,002,003}{001,002,003,004}
{002,004}
{001,002,003,004,005}
{002,004}
{001,005}
{001,005}
{001,005}
N9
foundingYear
N6
N7
diedOnDate
title
{001,002,003,004,005,007,009}
{001,005,007,009}
{001,005,009}
{001,005}
{001}
{009}
{007}
{002,004}
{006,008}
hasName gender bornOnDate title L
AbrahamLincoln
Male 1865-04-15 President {001}
Reese With-erspoon
Female 1976-03-22 Actress {005}
M(N4)
hasName gender title L
Franklin D.Roosevelt
Male President {007}
M(N7)
hasName:7
foundingYear:4
gender:4
bornOnDate:3
title:3
diedOnDate:2
DimensionList DL
41UPenn/2012-12-04
Transaction Database D
T-index Construction
VertexID
Entity vertex Adjacent Attributes
001 Abraham LincolnhasName, gender,bornOnDate,title,diedOnDate
002 Washington DChasName,foundingYear
003 Hodgenville KYhasName
004 United StateshasName,foundingYear
005 Reese Wither-spoon hasName, gender,
bornOnDate, title006 New Orleans foundingYear007 Franklin Roo-
sevelthasName, gender,title
008 Hyde Park NY foundingYear009 Marilyn Monroe hasName,
gender, bornOnDate,diedOnDate
N0
N1
N2
N3
N4
N5
root
hasName
gender
bonOnDate
title
diedOnDate{001}
N8
foundingYear
{001,002,003}{001,002,003,004}
{002,004}
{001,002,003,004,005}
{002,004}
{001,005}
{001,005}
{001,005}
N9
foundingYear
{006}
{001,002,003,004,005}
{002,004}
{001,005}
{001,005}
{001,005}
N6
N7
diedOnDate
title
{001,002,003,004,005,007,009}
{001,005,007,009}
{001,005,009}
{001,005}
{001}
{009}
{007}
{002,004}
{006,008}
hasName gender bornOnDate title L
AbrahamLincoln
Male 1865-04-15 President {001}
Reese With-erspoon
Female 1976-03-22 Actress {005}
M(N4)
hasName gender title L
Franklin D.Roosevelt
Male President {007}
M(N7)
hasName:7
foundingYear:4
gender:4
bornOnDate:3
title:3
diedOnDate:2
DimensionList DL
41UPenn/2012-12-04
Transaction Database D
T-index Construction
VertexID
Entity vertex Adjacent Attributes
001 Abraham LincolnhasName, gender,bornOnDate,title,diedOnDate
002 Washington DChasName,foundingYear
003 Hodgenville KYhasName
004 United StateshasName,foundingYear
005 Reese Wither-spoon hasName, gender,
bornOnDate, title006 New Orleans foundingYear007 Franklin Roo-
sevelthasName, gender,title
008 Hyde Park NY foundingYear009 Marilyn Monroe hasName,
gender, bornOnDate,diedOnDate
N0
N1
N2
N3
N4
N5
root
hasName
gender
bonOnDate
title
diedOnDate{001}
N8
foundingYear
{001,002,003}{001,002,003,004}
{002,004}
{001,002,003,004,005}
{002,004}
{001,005}
{001,005}
{001,005}
N9
foundingYear
N6
N7
diedOnDate
title
{001,002,003,004,005,007,009}
{001,005,007,009}
{001,005,009}
{001,005}
{001}
{009}
{007}
{002,004}
{006,008}
hasName gender bornOnDate title L
AbrahamLincoln
Male 1865-04-15 President {001}
Reese With-erspoon
Female 1976-03-22 Actress {005}
M(N4)
hasName gender title L
Franklin D.Roosevelt
Male President {007}
M(N7)
hasName:7
foundingYear:4
gender:4
bornOnDate:3
title:3
diedOnDate:2
DimensionList DL
41UPenn/2012-12-04
Transaction Database D
T-index Construction
VertexID
Entity vertex Adjacent Attributes
001 Abraham LincolnhasName, gender,bornOnDate,title,diedOnDate
002 Washington DChasName,foundingYear
003 Hodgenville KYhasName
004 United StateshasName,foundingYear
005 Reese Wither-spoon hasName, gender,
bornOnDate, title006 New Orleans foundingYear007 Franklin Roo-
sevelthasName, gender,title
008 Hyde Park NY foundingYear009 Marilyn Monroe hasName,
gender, bornOnDate,diedOnDate
N0
N1
N2
N3
N4
N5
root
hasName
gender
bonOnDate
title
diedOnDate{001}
N8
foundingYear
{001,002,003}{001,002,003,004}
{002,004}
{001,002,003,004,005}
{002,004}
{001,005}
{001,005}
{001,005}
N9
foundingYear
N6
N7
diedOnDate
title
{001,002,003,004,005,007,009}
{001,005,007,009}
{001,005,009}
{001,005}
{001}
{009}
{007}
{002,004}
{006,008}
hasName gender bornOnDate title L
AbrahamLincoln
Male 1865-04-15 President {001}
Reese With-erspoon
Female 1976-03-22 Actress {005}
M(N4)
hasName gender title L
Franklin D.Roosevelt
Male President {007}
M(N7)
hasName:7
foundingYear:4
gender:4
bornOnDate:3
title:3
diedOnDate:2
DimensionList DL
41UPenn/2012-12-04
Transaction Database D
T-index Construction
VertexID
Entity vertex Adjacent Attributes
001 Abraham LincolnhasName, gender,bornOnDate,title,diedOnDate
002 Washington DChasName,foundingYear
003 Hodgenville KYhasName
004 United StateshasName,foundingYear
005 Reese Wither-spoon hasName, gender,
bornOnDate, title006 New Orleans foundingYear007 Franklin Roo-
sevelthasName, gender,title
008 Hyde Park NY foundingYear009 Marilyn Monroe hasName,
gender, bornOnDate,diedOnDate
N0
N1
N2
N3
N4
N5
root
hasName
gender
bonOnDate
title
diedOnDate{001}
N8
foundingYear
{001,002,003}{001,002,003,004}
{002,004}
{001,002,003,004,005}
{002,004}
{001,005}
{001,005}
{001,005}
N9
foundingYear
N6
N7
diedOnDate
title
{001,002,003,004,005,007,009}
{001,005,007,009}
{001,005,009}
{001,005}
{001}
{009}
{007}
{002,004}
{006,008}
hasName gender bornOnDate title L
AbrahamLincoln
Male 1865-04-15 President {001}
Reese With-erspoon
Female 1976-03-22 Actress {005}
M(N4)
hasName gender title L
Franklin D.Roosevelt
Male President {007}
M(N7)
hasName:7
foundingYear:4
gender:4
bornOnDate:3
title:3
diedOnDate:2
DimensionList DL
41UPenn/2012-12-04
Transaction Database D
Star Aggregate (SA) Query Processing
Given a SA QueryS(v , p1, . . . , pn):
1. Find all matches of(p1, . . . , pn) inT-index
2. For each match
2.1 Let Ni be thelowest node in thematch
2.2 Let M(Ni ) be Ni ’saggregate set
2.3 M ′(Ni ) =Π(P1,...,Pn)M(Ni )
3. M = ∪iM ′(Ni )
v1
gender title
?m
42UPenn/2012-12-04
Star Aggregate (SA) Query Processing
Given a SA QueryS(v , p1, . . . , pn):
1. Find all matches of(p1, . . . , pn) inT-index
2. For each match
2.1 Let Ni be thelowest node in thematch
2.2 Let M(Ni ) be Ni ’saggregate set
2.3 M ′(Ni ) =Π(P1,...,Pn)M(Ni )
3. M = ∪iM ′(Ni )
v1
gender title
?m
42UPenn/2012-12-04
{N1,N2,N3,N4} {N1,N2,N7}
Star Aggregate (SA) Query Processing
Given a SA QueryS(v , p1, . . . , pn):
1. Find all matches of(p1, . . . , pn) inT-index
2. For each match
2.1 Let Ni be thelowest node in thematch
2.2 Let M(Ni ) be Ni ’saggregate set
2.3 M ′(Ni ) =Π(P1,...,Pn)M(Ni )
3. M = ∪iM ′(Ni )
v1
gender title
?m
hasName gender bornOnDate title L
AbrahamLincoln
Male 1865-04-15 President {001}
ReeseWitherspoon
Female 1976-03-22 Actress {005}
gender title L
Male President {001}Female Actress {005}
gender title L
Male President {007}
hasName gender title L
Franklin D.Roosevelt
Male President {007}
42UPenn/2012-12-04
{N1,N2,N3,N4} {N1,N2,N7}M(N4)
M′(N4) M′(N7)
M(N7)
Star Aggregate (SA) Query Processing
Given a SA QueryS(v , p1, . . . , pn):
1. Find all matches of(p1, . . . , pn) inT-index
2. For each match
2.1 Let Ni be thelowest node in thematch
2.2 Let M(Ni ) be Ni ’saggregate set
2.3 M ′(Ni ) =Π(P1,...,Pn)M(Ni )
3. M = ∪iM ′(Ni )
v1
gender title
?m
gender title L
Male President {001}Female Actress {005}
gender title L
Male President {007}
gender title L COUNT
Male President {001,007} 2
Female Actress {005} 1
42UPenn/2012-12-04
M′(N4) M′(N7)
∪
General Aggregate (GA) Query Processing
1. Decompose aggregatequery Q into starqueries {S1, . . . ,Sn}
2. Evaluate them to getR(S1), . . . , R(Sn)
3. Generate vertex lists(L1, . . . , Ln) for eachR(S1), . . . , R(Sn)
v1
gender title
?mv2
foundingYearbornIn
Q
v1
gender title
?mv2
foundingYear
S1 S2
gender title L
Male President {001,007}Female Actress {005}
foundingYear L
1790 {002}1718 {006}1810 {008}1776 {004}
43UPenn/2012-12-04
General Aggregate (GA) Query Processing
1. Decompose aggregatequery Q into starqueries {S1, . . . ,Sn}
2. Evaluate them to getR(S1), . . . , R(Sn)
3. Generate vertex lists(L1, . . . , Ln) for eachR(S1), . . . , R(Sn)
v1
gender title
?mv2
foundingYearbornIn
Q
v1
gender title
?mv2
foundingYear
S1 S2
gender title L
Male President {001,007}Female Actress {005}
foundingYear L
1790 {002}1718 {006}1810 {008}1776 {004}
43UPenn/2012-12-04
R(S1) R(S2)
General Aggregate (GA) Query Processing
1. Decompose aggregatequery Q into starqueries {S1, . . . ,Sn}
2. Evaluate them to getR(S1), . . . , R(Sn)
3. Generate vertex lists(L1, . . . , Ln) for eachR(S1), . . . , R(Sn)
v1
gender title
?mv2
foundingYearbornIn
Q
v1
gender title
?mv2
foundingYear
S1 S2
gender title L
Male President {001,007}Female Actress {005}
foundingYear L
1790 {002}1718 {006}1810 {008}1776 {004}
L1 = {001, 005, 007}L2 = {002, 004, 006, 008}
43UPenn/2012-12-04
R(S1) R(S2)
GA Query Processing (cont’d)
4. Use VS*-tree to findall subgraph matchesof (L1, . . . , Ln) –result is joinaccording to linkattribute
5. Compute the finalaggregate result (wehave optimizationshere)
u1 u2bornIn
u1 u2
005 006
007 008
Matchesgender title foundingYear u1 u2 COUNT
Male President 1718 005 006 1
Female Actress 1810 007 008 1
44UPenn/2012-12-04
L1 = {001, 005, 007} L2 = {002, 004, 006, 008}
GA Query Processing (cont’d)
4. Use VS*-tree to findall subgraph matchesof (L1, . . . , Ln) –result is joinaccording to linkattribute
5. Compute the finalaggregate result (wehave optimizationshere)
u1 u2bornIn
u1 u2
005 006
007 008
Matchesgender title foundingYear u1 u2 COUNT
Male President 1718 005 006 1
Female Actress 1810 007 008 1
44UPenn/2012-12-04
L1 = {001, 005, 007} L2 = {002, 004, 006, 008}
GA Query Processing (cont’d)
4. Use VS*-tree to findall subgraph matchesof (L1, . . . , Ln) –result is joinaccording to linkattribute
5. Compute the finalaggregate result (wehave optimizationshere)
u1 u2bornIn
u1 u2
005 006
007 008
Matchesgender title foundingYear u1 u2 COUNT
Male President 1718 005 006 1
Female Actress 1810 007 008 1
44UPenn/2012-12-04
L1 = {001, 005, 007} L2 = {002, 004, 006, 008}
Performance Comparison
SA1 SA2 SA3 GA1 GA2 GA30
2,000
4,000
6,000
8,000
Queries
Qu
ery
Res
pon
seT
ime
(in
ms)
Yago network (20 million triples & size 3.1GB)
gStore CAA Virtuoso45UPenn/2012-12-04
Outline
1 RDF Introduction
2 Answering Regular SPARQL QueriesAnswering SPARQL queries using graph pattern matching[ZouMoZhaoChenOzsu, PVLDB 2011]
3 AggregationAnswering Aggregate SPARQL Queries Over Large RDFRepositories
4 Future Research
47UPenn/2012-12-04
Systematic Study of SPARQL queryprocessing/optimization (Gunes Aluc)
Representation as graphs
Systematic study of dynamic RDF data management
Continuously and seamlessly self-tune their configurations
Systematic study of distributed RDF data management
Utilization of view materialization to improve performanceexploiting graph structure
End
48UPenn/2012-12-04
Efficient SPARQL Query Execution over Dynamic RDFDatabases
Problem: Existing RDF management systems are designedfor static datasets and workloads, whereas the RDF data onthe Web require dynamic solutions.
The underlying physical database layout in existing RDFmanagement systems is fixed a-priori.Hence, existing systems portray suboptimal performance whendata are updated or the characteristics of the workloadschange.
Objective: To design techniques (e.g., data structures,indexes, algorithms, etc.) that enable RDF managementsystems to continuously and seamlessly self-tune theirconfigurations to maintain an optimal performance.
49UPenn/2012-12-04
Efficient SPARQL Query Execution over Distributed RDFDatabases
Problem: RDF is inherently distributed across the Web,where query execution can be much more expensive than in acentralized setting.
Objective: To reduce the cost of query execution in adistributed RDF database through view materialization. Morespecifically,
to design efficient view selection algorithms (w.r.t. space/time)that potentially exploit the graph-theoretic properties of RDF.
50UPenn/2012-12-04
Thank you!
51UPenn/2012-12-04
Encoding Technique
vLabel adjListy:Abraham Lincoln (hasName, “Abraham Lincoln”), (BornOnDate, “1809-02-12”), (DiedOnDate,“1865-04-15”),
(DiedIn, y:Washington DC)
(hasName, “Abraham Lincoln”)
0010 0000 0000
1000 0010 0100 0000
“Abr”
“bra”
“rah”. . .
0000 0010 0000 0000
1000 0000 0000 0000
0000 0000 0100 0000. . .
OR
1000 0010 0100 0000
(BornOnDate, “1908-02-12”)
0100 0000 0000 0100 0010 0100 1000
(DiedOnDate, “1965-04-15”)
0000 1000 0000 0000 0010 0100 0000
(DiedIn, y:Washington DC)
0000 0010 0000 1000 0010 0100 0001
0000 0010 0000 1100 0010 0100 1001
OR
52UPenn/2012-12-04
Encoding Technique
vLabel adjListy:Abraham Lincoln (hasName, “Abraham Lincoln”), (BornOnDate, “1809-02-12”), (DiedOnDate,“1865-04-15”),
(DiedIn, y:Washington DC)
(hasName, “Abraham Lincoln”)
0010 0000 0000
1000 0010 0100 0000
“Abr”
“bra”
“rah”. . .
0000 0010 0000 0000
1000 0000 0000 0000
0000 0000 0100 0000. . .
OR
1000 0010 0100 0000
(BornOnDate, “1908-02-12”)
0100 0000 0000 0100 0010 0100 1000
(DiedOnDate, “1965-04-15”)
0000 1000 0000 0000 0010 0100 0000
(DiedIn, y:Washington DC)
0000 0010 0000 1000 0010 0100 0001
0000 0010 0000 1100 0010 0100 1001
OR
52UPenn/2012-12-04
Encoding Technique
vLabel adjListy:Abraham Lincoln (hasName, “Abraham Lincoln”), (BornOnDate, “1809-02-12”), (DiedOnDate,“1865-04-15”),
(DiedIn, y:Washington DC)
(hasName, “Abraham Lincoln”)
0010 0000 0000
1000 0010 0100 0000
“Abr”
“bra”
“rah”. . .
0000 0010 0000 0000
1000 0000 0000 0000
0000 0000 0100 0000. . .
OR
1000 0010 0100 0000
(BornOnDate, “1908-02-12”)
0100 0000 0000 0100 0010 0100 1000
(DiedOnDate, “1965-04-15”)
0000 1000 0000 0000 0010 0100 0000
(DiedIn, y:Washington DC)
0000 0010 0000 1000 0010 0100 0001
0000 0010 0000 1100 0010 0100 1001
OR
52UPenn/2012-12-04
Encoding Technique
vLabel adjListy:Abraham Lincoln (hasName, “Abraham Lincoln”), (BornOnDate, “1809-02-12”), (DiedOnDate,“1865-04-15”),
(DiedIn, y:Washington DC)
(hasName, “Abraham Lincoln”)
0010 0000 0000
1000 0010 0100 0000
“Abr”
“bra”
“rah”. . .
0000 0010 0000 0000
1000 0000 0000 0000
0000 0000 0100 0000. . .
OR
1000 0010 0100 0000
(BornOnDate, “1908-02-12”)
0100 0000 0000 0100 0010 0100 1000
(DiedOnDate, “1965-04-15”)
0000 1000 0000 0000 0010 0100 0000
(DiedIn, y:Washington DC)
0000 0010 0000 1000 0010 0100 0001
0000 0010 0000 1100 0010 0100 1001
OR
52UPenn/2012-12-04
Encoding Technique
vLabel adjListy:Abraham Lincoln (hasName, “Abraham Lincoln”), (BornOnDate, “1809-02-12”), (DiedOnDate,“1865-04-15”),
(DiedIn, y:Washington DC)
(hasName, “Abraham Lincoln”)
0010 0000 0000 1000 0010 0100 0000
“Abr”
“bra”
“rah”. . .
0000 0010 0000 0000
1000 0000 0000 0000
0000 0000 0100 0000. . .
OR
1000 0010 0100 0000
(BornOnDate, “1908-02-12”)
0100 0000 0000 0100 0010 0100 1000
(DiedOnDate, “1965-04-15”)
0000 1000 0000 0000 0010 0100 0000
(DiedIn, y:Washington DC)
0000 0010 0000 1000 0010 0100 0001
0000 0010 0000 1100 0010 0100 1001
OR
52UPenn/2012-12-04
Encoding Technique
vLabel adjListy:Abraham Lincoln (hasName, “Abraham Lincoln”), (BornOnDate, “1809-02-12”), (DiedOnDate,“1865-04-15”),
(DiedIn, y:Washington DC)
(hasName, “Abraham Lincoln”)
0010 0000 0000 1000 0010 0100 0000
“Abr”
“bra”
“rah”. . .
0000 0010 0000 0000
1000 0000 0000 0000
0000 0000 0100 0000. . .
OR
1000 0010 0100 0000
(BornOnDate, “1908-02-12”)
0100 0000 0000 0100 0010 0100 1000
(DiedOnDate, “1965-04-15”)
0000 1000 0000 0000 0010 0100 0000
(DiedIn, y:Washington DC)
0000 0010 0000 1000 0010 0100 0001
0000 0010 0000 1100 0010 0100 1001
OR
52UPenn/2012-12-04
Encoding Technique
vLabel adjListy:Abraham Lincoln (hasName, “Abraham Lincoln”), (BornOnDate, “1809-02-12”), (DiedOnDate,“1865-04-15”),
(DiedIn, y:Washington DC)
(hasName, “Abraham Lincoln”)
0010 0000 0000 1000 0010 0100 0000
“Abr”
“bra”
“rah”. . .
0000 0010 0000 0000
1000 0000 0000 0000
0000 0000 0100 0000. . .
OR
1000 0010 0100 0000
(BornOnDate, “1908-02-12”)
0100 0000 0000 0100 0010 0100 1000
(DiedOnDate, “1965-04-15”)
0000 1000 0000 0000 0010 0100 0000
(DiedIn, y:Washington DC)
0000 0010 0000 1000 0010 0100 0001
0000 0010 0000 1100 0010 0100 1001
OR
52UPenn/2012-12-04
T-index Maintenance
Updates modeled as triple deletion + insertion - Consider (s, p, o)
DL’s order does not change – s not in the RDF data
Insert new transaction into DLInsert the new transaction into one path of T-indexUpdate materialized sets along the path
VertexID
Entity ver-tex
Adjacency Attributes
001 Person1 salary, gender, title
002 Person2 salary, gender, title
003 Person3 salary, gender, title
004 Person4 salary, gender,title, nationality
005 Person5 salary, gender,nationality
006 Person6 salary, title,nationality
007 Paper1 year, inProceedings,paperTitle
008 Paper2 year, inProceedings,paperTitle
009 Paper3 year, inProceedings,paperTitle
010 School1 locatedIn, name
011 School2 locatedIn, name
null
N0
N1
N2
N3
N4
N5
N6
N7
N8
N9
N10
N11
salary
gender
title
nationality
title
nationality
year
inProceedings
paperTitle
locatedIn
name
nationality
{001,002,003,004,005,006}
{001,002,003,004,005}
{001,002,003,004}
{004}{005}
{006}
{006}
{007,008,009}
{007,008,009}
{007,008,009}
{010,011}
{010,011}
salary gender title L
$100,000 Male Professor {002}$50,000 Male Assistant Professor {001}$40,000 Male Assistant Professor {003}$50,000 Female Assistant Professor {004}
salary gender title nationality L
$50,000 Female Assistant Professor Chinese {004}
M(N2)
M(N3)
salary
gender
title
nationality
year
inProceedings
paperTitle
locatedIn
name
53UPenn/2012-12-04
Transaction Database D
DimensionList DL
T-index Maintenance
Updates modeled as triple deletion + insertion - Consider (s, p, o)
DL’s order does not change – s is in the RDF data; pi > p > pi+1
Find Nn with (p1, . . . , pi , . . . , pn) and Ni with (p1, . . . , pi )Remove s from path Ni+1 − Nn and update M(Nj), i + 1 ≤ j ≤ nInsert (p, pi+1, . . . , pn) at Ni ; update M(Nj), null ≤ j ≤ i
VertexID
Entity ver-tex
Adjacency Attributes
001 Person1 salary, gender, title
002 Person2 salary, gender, title
003 Person3 salary, gender, title
004 Person4 salary, gender,title, nationality
005 Person5 salary, gender,nationality
006 Person6 salary, title,nationality
007 Paper1 year, inProceedings,paperTitle
008 Paper2 year, inProceedings,paperTitle
009 Paper3 year, inProceedings,paperTitle
010 School1 locatedIn, name
011 School2 locatedIn, name
null
N0
N1
N2
N3
N4
N5
N6
N7
N8
N9
N10
N11
salary
gender
title
nationality
title
nationality
year
inProceedings
paperTitle
locatedIn
name
nationality
{001,002,003,004,005,006}
{001,002,003,004,005}
{001,002,003,004}
{004}{005}
{006}
{006}
{007,008,009}
{007,008,009}
{007,008,009}
{010,011}
{010,011}
salary gender title L
$100,000 Male Professor {002}$50,000 Male Assistant Professor {001}$40,000 Male Assistant Professor {003}$50,000 Female Assistant Professor {004}
salary gender title nationality L
$50,000 Female Assistant Professor Chinese {004}
M(N2)
M(N3)
salary
gender
title
nationality
year
inProceedings
paperTitle
locatedIn
name
53UPenn/2012-12-04
Transaction Database D
DimensionList DL
T-index Maintenance
Updates modeled as triple deletion + insertion - Consider (s, p, o)
DL’s order does not change – s is in the RDF data; pi > p > pi+1
Find Nn with (p1, . . . , pi , . . . , pn) and Ni with (p1, . . . , pi )Remove s from path Ni+1 − Nn and update M(Nj), i + 1 ≤ j ≤ nInsert (p, pi+1, . . . , pn) at Ni ; update M(Nj), null ≤ j ≤ i
VertexID
Entity ver-tex
Adjacency Attributes
001 Person1 salary, gender, title
002 Person2 salary, gender, title
003 Person3 salary, gender, title
004 Person4 salary, gender,title, nationality
005 Person5 salary, gender,nationality
006 Person6 salary, gender, title,nationality
007 Paper1 year, inProceedings,paperTitle
008 Paper2 year, inProceedings,paperTitle
009 Paper3 year, inProceedings,paperTitle
010 School1 locatedIn, name
011 School2 locatedIn, name
null
N0
N1
N2
N3
N4
N5
N6
N7
N8
N9
N10
N11
salary
gender
title
nationality
title
nationality
year
inProceedings
paperTitle
locatedIn
name
nationality
{001,002,003,004,005,006}
{001,002,003,004,005}
{001,002,003,004}
{004}{005}
{006}
{006}
{007,008,009}
{007,008,009}
{007,008,009}
{010,011}
{010,011}
salary gender title L
$100,000 Male Professor {002}$50,000 Male Assistant Professor {001}$40,000 Male Assistant Professor {003}$50,000 Female Assistant Professor {004}
salary gender title nationality L
$50,000 Female Assistant Professor Chinese {004}
M(N2)
M(N3)
salary
gender
title
nationality
year
inProceedings
paperTitle
locatedIn
name
53UPenn/2012-12-04
Transaction Database D
DimensionList DL
Insert (Person6, gender, “Male”)
T-index Maintenance
Updates modeled as triple deletion + insertion - Consider (s, p, o)
DL’s order does not change – s is in the RDF data; pi > p > pi+1
Find Nn with (p1, . . . , pi , . . . , pn) and Ni with (p1, . . . , pi )Remove s from path Ni+1 − Nn and update M(Nj), i + 1 ≤ j ≤ nInsert (p, pi+1, . . . , pn) at Ni ; update M(Nj), null ≤ j ≤ i
VertexID
Entity ver-tex
Adjacency Attributes
001 Person1 salary, gender, title
002 Person2 salary, gender, title
003 Person3 salary, gender, title
004 Person4 salary, gender,title, nationality
005 Person5 salary, gender,nationality
006 Person6 salary, gender, title,nationality
007 Paper1 year, inProceedings,paperTitle
008 Paper2 year, inProceedings,paperTitle
009 Paper3 year, inProceedings,paperTitle
010 School1 locatedIn, name
011 School2 locatedIn, name
null
N0
N1
N2
N3
N4
N5
N6
N7
N8
N9
N10
N11
salary
gender
title
nationality
title
nationality
year
inProceedings
paperTitle
locatedIn
name
nationality
{001,002,003,004,005,006}
{001,002,003,004,005}
{001,002,003,004}
{004}{005}
{006}
{006}
{007,008,009}
{007,008,009}
{007,008,009}
{010,011}
{010,011}
salary gender title L
$100,000 Male Professor {002}$50,000 Male Assistant Professor {001}$40,000 Male Assistant Professor {003}$50,000 Female Assistant Professor {004}
salary gender title nationality L
$50,000 Female Assistant Professor Chinese {004}
M(N2)
M(N3)
salary
gender
title
nationality
year
inProceedings
paperTitle
locatedIn
name
53UPenn/2012-12-04
Transaction Database D
DimensionList DL
Insert (Person6, gender, “Male”)
T-index Maintenance
Updates modeled as triple deletion + insertion - Consider (s, p, o)
DL’s order does not change – s is in the RDF data; pi > p > pi+1
Find Nn with (p1, . . . , pi , . . . , pn) and Ni with (p1, . . . , pi )Remove s from path Ni+1 − Nn and update M(Nj), i + 1 ≤ j ≤ nInsert (p, pi+1, . . . , pn) at Ni ; update M(Nj), null ≤ j ≤ i
VertexID
Entity ver-tex
Adjacency Attributes
001 Person1 salary, gender, title
002 Person2 salary, gender, title
003 Person3 salary, gender, title
004 Person4 salary, gender,title, nationality
005 Person5 salary, gender,nationality
006 Person6 salary, gender, title,nationality
007 Paper1 year, inProceedings,paperTitle
008 Paper2 year, inProceedings,paperTitle
009 Paper3 year, inProceedings,paperTitle
010 School1 locatedIn, name
011 School2 locatedIn, name
null
N0
N1
N2
N3
N4
N5
N6
N7
N8
N9
N10
N11
salary
gender
title
nationality
title
nationality
year
inProceedings
paperTitle
locatedIn
name
nationality
{001,002,003,004,005,006}
{001,002,003,004,005,006}
{001,002,003,004,006}
{004,006}{005}
{006}
{006}
{007,008,009}
{007,008,009}
{007,008,009}
{010,011}
{010,011}
salary gender title L
$100,000 Male Professor {002}$50,000 Male Assistant Professor {001}$40,000 Male Assistant Professor {003}$50,000 Female Assistant Professor {004}$50,000 Male Associate Professor {006}
salary gender title nationality L
$50,000 Female Assistant Professor Chinese {004}
M(N2)
M(N3)
salary
gender
title
nationality
year
inProceedings
paperTitle
locatedIn
name
53UPenn/2012-12-04
Transaction Database D
DimensionList DL
Insert (Person6, gender, “Male”)
T-index Maintenance
Updates modeled as triple deletion + insertion - Consider (s, p, o)
DL’s order does change
This causes swapping of pi and pi+1 in DLFirst update as if the order does not changeThen update DL, T-index and materialized sets
VertexID
Entity ver-tex
Adjacency Attributes
001 Person1 salary, gender, title
002 Person2 salary, gender, title
003 Person3 salary, gender, title
004 Person4 salary, gender,title, nationality
005 Person5 salary, gender,nationality
006 Person6 salary, title,nationality
007 Paper1 year, inProceedings,paperTitle
008 Paper2 year, inProceedings,paperTitle
009 Paper3 year, inProceedings,paperTitle
010 School1 locatedIn, name
011 School2 locatedIn, name
null
N0
N1
N2
N3
N4
N5
N6
N7
N8
N9
N10
N11
salary
gender
title
nationality
title
nationality
year
inProceedings
paperTitle
locatedIn
name
nationality
{001,002,003,004,005,006}
{001,002,003,004,005}
{001,002,003,004}
{004}{005}
{006}
{006}
{007,008,009}
{007,008,009}
{007,008,009}
{010,011}
{010,011}
salary gender title L
$100,000 Male Professor {002}$50,000 Male Assistant Professor {001}$40,000 Male Assistant Professor {003}$50,000 Female Assistant Professor {004}
salary gender title nationality L
$50,000 Female Assistant Professor Chinese {004}
M(N2)
M(N3)
salary
gender
title
nationality
year
inProceedings
paperTitle
locatedIn
name
53UPenn/2012-12-04
Transaction Database D
DimensionList DL
T-index Maintenance
Updates modeled as triple deletion + insertion - Consider (s, p, o)
DL’s order does change
This causes swapping of pi and pi+1 in DLFirst update as if the order does not changeThen update DL, T-index and materialized sets
VertexID
Entity ver-tex
Adjacency Attributes
001 Person1 salary, gender, title
002 Person2 salary, gender, title
003 Person3 salary, gender, title
004 Person4 salary, gender,title, nationality
005 Person5 salary, gender,nationality
006 Person6 salary, title,nationality
007 Paper1 year, inProceedings,paperTitle
008 Paper2 year, inProceedings,paperTitle
009 Paper3 year, inProceedings,paperTitle
010 School1 locatedIn, name
011 School2 locatedIn, name
null
N0
N1
N2
N3
N4
N5
N6
N7
N8
N9
N10
N11
salary
gender
title
nationality
title
nationality
year
inProceedings
paperTitle
locatedIn
name
nationality
{001,002,003,004,005,006}
{001,002,003,004,005}
{001,002,003,004}
{004}{005}
{006}
{006}
{007,008,009}
{007,008,009}
{007,008,009}
{010,011}
{010,011}
salary gender title L
$100,000 Male Professor {002}$50,000 Male Assistant Professor {001}$40,000 Male Assistant Professor {003}$50,000 Female Assistant Professor {004}
salary gender title nationality L
$50,000 Female Assistant Professor Chinese {004}
M(N2)
M(N3)
salary
gender
title
nationality
year
inProceedings
paperTitle
locatedIn
name
53UPenn/2012-12-04
Transaction Database D
DimensionList DL
Delete (Person2, gender, “Male”)
swap gender (pi )and title (pi+1)
T-index Maintenance
Updates modeled as triple deletion + insertion - Consider (s, p, o)
DL’s order does change
This causes swapping of pi and pi+1 in DLFirst update as if the order does not changeThen update DL, T-index and materialized sets
VertexID
Entity ver-tex
Adjacency Attributes
001 Person1 salary, gender, title
002 Person2 salary, gender, title
003 Person3 salary, gender, title
004 Person4 salary, gender,title, nationality
005 Person5 salary, gender,nationality
006 Person6 salary, title,nationality
007 Paper1 year, inProceedings,paperTitle
008 Paper2 year, inProceedings,paperTitle
009 Paper3 year, inProceedings,paperTitle
010 School1 locatedIn, name
011 School2 locatedIn, name
null
N0
N1
N2
N3
N4
N5
N6
N7
N8
N9
N10
N11
salary
gender
title
nationality
title
nationality
year
inProceedings
paperTitle
locatedIn
name
nationality
{001,002,003,004,005,006}
{001,003,004,005}
{001,003,004}
{004}{005}
{002,006}
{006}
{007,008,009}
{007,008,009}
{007,008,009}
{010,011}
{010,011}
salary gender title L
$100,000 Male Professor {002}$50,000 Male Assistant Professor {001}$40,000 Male Assistant Professor {003}$50,000 Female Assistant Professor {004}
salary gender title nationality L
$50,000 Female Assistant Professor Chinese {004}
M(N2)
M(N3)
salary
gender
title
nationality
year
inProceedings
paperTitle
locatedIn
name
53UPenn/2012-12-04
Transaction Database D
DimensionList DL
Delete (Person2, gender, “Male”)
Delete 002 from path null − N3
T-index Maintenance
Updates modeled as triple deletion + insertion - Consider (s, p, o)
DL’s order does change
This causes swapping of pi and pi+1 in DLFirst update as if the order does not changeThen update DL, T-index and materialized sets
VertexID
Entity ver-tex
Adjacency Attributes
001 Person1 salary, gender, title
002 Person2 salary, gender, title
003 Person3 salary, gender, title
004 Person4 salary, gender,title, nationality
005 Person5 salary, gender,nationality
006 Person6 salary, title,nationality
007 Paper1 year, inProceedings,paperTitle
008 Paper2 year, inProceedings,paperTitle
009 Paper3 year, inProceedings,paperTitle
010 School1 locatedIn, name
011 School2 locatedIn, name
null
N0
N1
N2
N3
N4
N5
N6
N7
N8
N9
N10
N11
salary
gender
title
nationality
title
nationality
year
inProceedings
paperTitle
locatedIn
name
nationality
{001,002,003,004,005,006}
{001,003,004,005}
{001,003,004}
{004}{005}
{002,006}
{006}
{007,008,009}
{007,008,009}
{007,008,009}
{010,011}
{010,011}
salary gender title L
$100,000 Male Professor {002}$50,000 Male Assistant Professor {001}$40,000 Male Assistant Professor {003}$50,000 Female Assistant Professor {004}
salary gender title nationality L
$50,000 Female Assistant Professor Chinese {004}
M(N2)
M(N3)
salary
gender
title
nationality
year
inProceedings
paperTitle
locatedIn
name
53UPenn/2012-12-04
Transaction Database D
DimensionList DL
Delete (Person2, gender, “Male”)
Insert (Person, salary, title) intoT-index
T-index Maintenance
Updates modeled as triple deletion + insertion - Consider (s, p, o)
DL’s order does change
This causes swapping of pi and pi+1 in DLFirst update as if the order does not changeThen update DL, T-index and materialized sets
VertexID
Entity ver-tex
Adjacency Attributes
001 Person1 salary, gender, title
002 Person2 salary, gender, title
003 Person3 salary, gender, title
004 Person4 salary, gender,title, nationality
005 Person5 salary, gender,nationality
006 Person6 salary, title,nationality
007 Paper1 year, inProceedings,paperTitle
008 Paper2 year, inProceedings,paperTitle
009 Paper3 year, inProceedings,paperTitle
010 School1 locatedIn, name
011 School2 locatedIn, name
null
N0
N ′1
N2
N3
N4
N5
N6
N7
N8
N9
N10
N11N ′′1
gender{005}
salary
title
gender
nationality
title
nationality
year
inProceedings
paperTitle
locatedIn
name
nationality
{001,002,003,004,005,006}
{001,003,004}
{001,003,004}
{004}{005}
{002,006}
{006}
{007,008,009}
{007,008,009}
{007,008,009}
{010,011}
{010,011}
salary gender title L
$100,000 Male Professor {002}$50,000 Male Assistant Professor {001}$40,000 Male Assistant Professor {003}$50,000 Female Assistant Professor {004}
salary gender title nationality L
$50,000 Female Assistant Professor Chinese {004}
M(N2)
M(N3)
salary
title
gender
nationality
year
inProceedings
paperTitle
locatedIn
name
53UPenn/2012-12-04
Transaction Database D
DimensionList DL
Delete (Person2, gender, “Male”)
Swap “gender” and “title”
T-index Maintenance
Updates modeled as triple deletion + insertion - Consider (s, p, o)
DL’s order does change
This causes swapping of pi and pi+1 in DLFirst update as if the order does not changeThen update DL, T-index and materialized sets
VertexID
Entity ver-tex
Adjacency Attributes
001 Person1 salary, gender, title
002 Person2 salary, gender, title
003 Person3 salary, gender, title
004 Person4 salary, gender,title, nationality
005 Person5 salary, gender,nationality
006 Person6 salary, title,nationality
007 Paper1 year, inProceedings,paperTitle
008 Paper2 year, inProceedings,paperTitle
009 Paper3 year, inProceedings,paperTitle
010 School1 locatedIn, name
011 School2 locatedIn, name
null
N0
N ′1
N2
N3
N4
N5
N6
N7
N8
N9
N10
N11N ′′1
gender{005}
N6
nationality
salary
title
gender
nationality
title
nationality
year
inProceedings
paperTitle
locatedIn
name
nationality
{001,002,003,004,005,006}
{001,003,004,006}
{001,003,004}
{004}{005}
{002,006}
{006}
{007,008,009}
{007,008,009}
{007,008,009}
{010,011}
{010,011}
salary gender title L
$100,000 Male Professor {002}$50,000 Male Assistant Professor {001}$40,000 Male Assistant Professor {003}$50,000 Female Assistant Professor {004}
salary gender title nationality L
$50,000 Female Assistant Professor Chinese {004}
M(N2)
M(N3)
salary
title
gender
nationality
year
inProceedings
paperTitle
locatedIn
name
53UPenn/2012-12-04
Transaction Database D
DimensionList DL
Delete (Person2, gender, “Male”)
Swap “gender” and “title”
T-index Maintenance
Updates modeled as triple deletion + insertion - Consider (s, p, o)
DL’s order does change
This causes swapping of pi and pi+1 in DLFirst update as if the order does not changeThen update DL, T-index and materialized sets
VertexID
Entity ver-tex
Adjacency Attributes
001 Person1 salary, gender, title
002 Person2 salary, gender, title
003 Person3 salary, gender, title
004 Person4 salary, gender,title, nationality
005 Person5 salary, gender,nationality
006 Person6 salary, title,nationality
007 Paper1 year, inProceedings,paperTitle
008 Paper2 year, inProceedings,paperTitle
009 Paper3 year, inProceedings,paperTitle
010 School1 locatedIn, name
011 School2 locatedIn, name
null
N0
N ′1
N2
N3
N4
N5
N6
N7
N8
N9
N10
N11N ′′1
gender{005}
N6
nationality
salary
title
gender
nationality
title
nationality
year
inProceedings
paperTitle
locatedIn
name
nationality
{001,002,003,004,005,006}
{001,003,004,006}
{001,003,004}
{004}{005}
{002,006}
{006}
{007,008,009}
{007,008,009}
{007,008,009}
{010,011}
{010,011}
salary gender title L
$100,000 Male Professor {002}$50,000 Male Assistant Professor {001}$40,000 Male Assistant Professor {003}$50,000 Female Assistant Professor {004}
salary gender title nationality L
$50,000 Female Assistant Professor Chinese {004}
M(N2)
M(N3)
salary
title
gender
nationality
year
inProceedings
paperTitle
locatedIn
name
53UPenn/2012-12-04
Transaction Database D
DimensionList DL
Delete (Person2, gender, “Male”)
Update materialized sets