geographica: a benchmark for geospatial rdf stores
Post on 10-May-2015
741 Views
Preview:
DESCRIPTION
TRANSCRIPT
Benchmarking Geospatial RDF Stores
George Garbis, Kostis Kyzirakos, Manolis Koubarakis
Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens, Greece
1st International Workshop on Benchmarking RDF Systems (BeRSys 2013)
25/26/2013
Outline
• Motivation
• SPARQL extensions for querying geospatial data expressed in RDF
• State-of-the-art geospatial RDF stores
• Related benchmarks
• The benchmark Geographica
• Evaluating the performance of geospatial RDF stores using Geographica
• Conclusions
4
Geospatial data on the Web
5
Open Government Data
6
Linked geospatial data – Ordnance Survey
7
Linked geospatial data – Open Street Map
8
Linked geospatial data – http://www.linkedopendata.gr
9
Linked geospatial data – Wildfire Monitoring
105/26/2013
Outline
• Motivation
• SPARQL extensions for querying geospatial data expressed in RDF
• State-of-the-art geospatial RDF stores
• Related benchmarks
• The benchmark Geographica
• Evaluating the performance of geospatial RDF stores using Geographica
• Conclusions
Our Contributions
• The data model stRDF and the query language stSPARQL
• The system Strabon
11
[ ISWC 2012 ]
The Data Model stRDF
• stRDF stands for spatiotemporal RDF.
• It is an extension of the W3C standard RDF for the representation of geospatial data that may change over time.
• stRDF extends RDF with:
• Spatial literals encoded in OGC standards Well-Known Text or GML
• New datatypes for spatial literals (strdf:WKT, strdf:GML and strdf:geometry)
• Valid time of triples (ignored in this talk) [ ESWC 2013 ]
stRDF: An example
stRDF: An example (WKT)
ex:BurntArea1 rdf:type noa:BurntArea.
ex:BurntArea1 noa:hasID "1"^^xsd:decimal.
ex:BurntArea1 noa:hasArea "23.7636"^^xsd:double.
ex:BurntArea1 strdf:hasGeometry "POLYGON(( 38.16 23.7, 38.18 23.7, 38.18 23.8, 38.16 23.8, 38.16 23.7)); <http://spatialreference.org/ref/epsg/4121/>"^^strdf:WKT .
Spatial Literal (OpenGIS Simple Features)
Spatial Data Type Well-Known Text
stRDF: An example (GML)
ex:BurntArea1 rdf:type noa:BurntArea.
ex:BurntArea1 noa:hasID "1"^^xsd:decimal.
ex:BurntArea1 noa:hasArea "23.7636"^^xsd:double.
ex:BurntArea1 strdf:hasGeometry
"<gml:Polygon
srsName='http://www.opengis.net/def/crs/EPSG/0/4121'>
<gml:outerBoundaryIs>
<gml:LinearRing>
<gml:coordinates>38.16,23.70 38.18,23.70 38.18,
23.80 38.16,23.80,38.16 23.70
</gml:coordinates>
</gml:LinearRing>
</gml:outerBoundaryIs>
</gml:Polygon>"^^strdf:GML .
Spatial Literal (GML Simple Features Profile)
Spatial Data Type GML
• Find all burnt forests close to a citySELECT ?BA ?BAGEO
WHERE {
?R rdf:type noa:Region .
?R strdf:hasGeometry ?RGEO .
?R noa:hasLandCover ?F .
?F rdfs:subClassOf clc:Forests .
?CITY rdf:type dbpedia:City .
?CITY strdf:hasGeometry ?CGEO .
?BA rdf:type noa:BurntArea .
?BA strdf:hasGeometry ?BAGEO .
FILTER(strdf:anyInteract(?RGEO,?BAGEO) &&
strdf:distance(?BAGEO,?CGEO) < 0.02) }
stSPARQL: An example
Spatial Functions(OGC Simple Feature Access)
stSPARQL: Geospatial SPARQL 1.1
• We start from SPARQL 1.1.
• We add a SPARQL extension function for each function defined in the OGC standard OpenGIS Simple Feature Access – Part 2: SQL option (ISO 19125) for adding geospatial data to relational DBMSs and SQL.
• We add appropriate geospatial extensions to SPARQL 1.1 Update language
stSPARQL (cont’d)
• Basic functions
• Get a property of a geometry (e.g., strdf:srid)
• Get the desired representation of a geometry (e.g., strdf:AsText)
• Test whether a certain condition holds (e.g., strdf:IsEmpty, strdf:IsSimple)
• Functions for testing topological spatial relationships (e.g., strdf:equals, strdf:intersects)
• OGC Simple Features Access, Egenhofer, RCC-8
• Spatial analysis functions
• Construct new geometric objects from existing geometric objects (e.g., strdf:buffer, strdf:intersection, strdf:convexHull)
• Spatial metric functions (e.g., strdf:distance, strdf:area)
• Spatial aggregate functions (e.g., strdf:union, strdf:extent)
stSPARQL (cont’d)
• SELECT clause
Construction of new geometries (e.g., strdf:buffer(?geo, 0.1))
Spatial aggregate functions (e.g., strdf:extent(?geo))
Metric functions (e.g., strdf:area(?geo))
• FILTER clause
Functions for testing topological spatial relationships between spatial terms (e.g., strdf:contains(?G1, strdf:union(?G2, ?G3)))
Numeric expressions involving spatial metric functions (e.g.,strdf:area(?G1)<=2*strdf:area(?G2)+1)
• HAVING clause
Spatial aggregate functions and spatial metric functions or functions testing for topological relationships between spatial terms (e.g., strdf:area(strdf:union(?geo))>1)
• Isolate the parts of the burnt areas that lie in coniferous forests.
SELECT ?burntArea (strdf:intersection(?baGeom, strdf:union(?fGeom)) AS ?burntForest) WHERE {
?burntArea rdf:type noa:BurntArea; noa:hasGeometry [ noa:hasSerialization ?baGeo ].
?forest rdf:type noa:Region; clc:hasLandCover noa:coniferousForest;
clc:hasGeometry [ clc:hasSerialization ?fGeom ].
FILTER(strdf:intersects(?baGeom,?fGeom)) } GROUP BY ?burntArea ?baGeom
stSPARQL: An example
GeoSPARQL
Core
Topology VocabularyExtension
- relation family
Geometry Extension - serialization - version
Geometry TopologyExtension
- serialization - version - relation family
Query RewriteExtension
- serialization - version - relation family
RDFS Entailment Extension
- serialization - version - relation family
Parameters• Serialization
• WKT• GML
• Relation Family• Simple
Features• RCC8• Egenhofer
System Language Index Geometries CRS support
Comments on Functionality
Strabon stSPARQL/ GeoSPARQL*
R-tree-over-GiST
WKT / GML support
Yes • OGC-SFA• Egenhofer• RCC-8
Parliament GeoSPARQL R-Tree WKT / GML support
Yes •OGC-SFA•Egenhofer•RCC-8
Brodt et al. (RDF-3X)
SPARQL R-Tree WKT support No OGC-SFA
Perry SPARQL-ST R-Tree GeoRSS GML Yes RCC8
AllegroGraph Extended SPARQL
Distribution sweeping technique
2D point geometries
Partial •Buffer•Bounding Box•Distance
OWLIM Extended SPARQL
Custom 2D point geometries (W3C Basic Geo Vocabulary)
No •Point-in-polygon•Buffer•Distance
Virtuoso SPARQL R-Tree 2D point geometries (in WKT)
Yes SQL/MM (subset)
uSeekM GeoSPARQL R-tree-over GiST
WKT support No OGC-SFA
235/26/2013
Outline
• Motivation
• SPARQL extensions for querying geospatial data expressed in RDF
• State-of-the-art geospatial RDF stores
• Related benchmarks
• The benchmark Geographica
• Evaluating the performance of geospatial RDF stores using Geographica
• Conclusions
245/26/2013
Related Work
• Benchmarks for SPARQL query processing:
• LUBM (JWS 2005)
• DBpedia SPARQL Benchmark (ISWC 2011)
• …
• Benchmarks for geospatial relational DBMS:
• Sequoia 2000 (SIGMOD 1993)
• VESPA (BNCOD 2000)
• Jackpine (ICDE 2011)
• Benchmarks for spatial indexing and query processing operations
• Benchmarks for geospatial RDF stores:
• A Benchmark for Spatial Semantic Web Systems (Kolas, SSWS 2008)
255/26/2013
The Benchmark Geographica
• Aim: measure the performance of today’s geospatial RDF stores
• Organized around two workloads:
• Real-world workload:
• Based on existing linked geospatial datasets and known application scenarios
• Synthetic workload:
• Measure performance in a controlled environment where we can play around with properties of the data and the queries.
• Γεωγραφικά: 17-volume geographical
encyclopedia by Στράβων (AD 17)
265/26/2013
Real-World Workload
• Datasets: Real-world datasets for the geographic area of Greece playing an important role in the LOD cloud or having complex geometries
• LinkedGeoData (LGD) for rivers and roads in Greece
• GeoNames for Greece
• DBpedia for Greece
• Greek Administrative Geography (GAG)
• CORINE land cover (CLC) for Greece
• Hotspots
275/26/2013
Real-World WorkloadDatasets
Dataset Size # of Triples
# of Points
# of Lines(max/min/avg points/line)
# of Polygons(max/min/avg
points/polygon)
GAG 33MB 4K - - 325 (15K/4/400)
CLC 401MB 630K - - 45K (5K/4/140)
LGD 29MB 150K - 12K (1.6K/2/21)
-
GeoNames 45MB 400K 22K - -
DBpedia 89MB 430K 8K - -
Hotspots 90MB 450K - - 37K (4/4/4)
285/26/2013
Real-World WorkloadParts
• For this workload, Geographica has two parts (following Jackpine):
• Micro part: Tests primitive spatial functions offered by geospatial RDF stores
• Macro part: Simulates some typical application scenarios
295/26/2013
Real-World WorkloadMicro part
• 29 queries that consist of one or two triple patterns and a spatial function.
• Functions included:
• Spatial analysis: boundary, envelope, convex hull, buffer, area
• Topological: equals, intersects, overlaps, crosses, within, distance, disjoint
• As used in spatial selections and spatial joins
• Spatial aggregates: extent, union
• Functions are applied to many representative types of geometries .
305/26/2013
Example – spatial analysis
• Construct the boundary of all polygons of CLC
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ( geof:boundary(?o1) as ?ret )
WHERE {
GRAPH <http://geographica.di.uoa.gr/dataset/clc>
{ ?s1 <http://geo.linkedopendata.gr/corine/ontology#asWKT> ?o1} }
315/26/2013
Example – spatial selection
• Find all points in Geonames that are contained in a given polygon.
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?s1 ?o1
WHERE {
GRAPH <http://geographica.di.uoa.gr/dataset/geonames>
{ ?s1 <http://www.geonames.org/ontology#asWKT> ?o1 }
FILTER( geof:sfWithin(?o1, "GIVEN_POLYGON_IN_WKT"^^<http://www.opengis.net/ont/geosparql#wktLiteral>)). }
325/26/2013
Example – spatial join
• Find all pairs of GAG polygons that overlap
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?s1 ?s2
WHERE {
GRAPH <http://geographica.di.uoa.gr/dataset/gag>
{?s1 <http://geo.linkedopendata.gr/gag/ontology/asWKT> ?o1}
GRAPH <http://geographica.di.uoa.gr/dataset/clc>
{?s2 <http://geo.linkedopendata.gr/corine/ontology#asWKT> ?o2}
FILTER( geof:sfOverlaps(?o1, ?o2) )
}
Query Point Query Line Query Polygon
Point Within BufferDistance
WithinDisjoint
Line EqualsCrosses
IntersectsDisjoint
Polygon Intersects EqualsOverlaps
Real-World WorkloadMicro part
• Spatial Selections
• Spatial JoinsPoint Line Polygon
Point Equals Intersects IntersectsWithin
Line IntersectsWithinCrosses
Polygon WithinTouchesOverlaps
345/26/2013
Real-World WorkloadMacro part
• Reverse Geocoding: Attribute a street address and place to a given point.
• Queries:
• Find the closest populated place (from GeoNames)
• Find the closest street (from LGD)
355/26/2013
Real-World WorkloadMacro part
• Web Map Search and Browsing
• Queries:
• Find the co-ordinates of a given POI based on thematic criteria (from GeoNames)
• Find roads in a given bounding box around these co-ordinates (from LGD)
• Find other POI in a given bounding box around these co-ordinates (from LGD)
365/26/2013
Real-World WorkloadMacro part
• Rapid Mapping for Fire Monitoring: representative of typical rapid mapping tasks carried out by space agencies in the case of an emergency
375/26/2013
Real-World WorkloadMacro part
• Rapid Mapping for Fire Monitoring
• Queries:
• Simple tasks retrieving background mapping information:
• Find the land cover of areas inside a given bounding box (from CLC)
• Find primary roads inside a given bounding box (from LGD)
• Find capitals of prefectures inside a given bounding box (from GAG)
• (Often) more complex main mapping tasks:
• Find municipality boundaries inside a bounding box (from GAG)
• Find coniferous forests which are on fire (from CLC and Hotspots)
• Find road segments which may be damaged by fire (from LGD and Hotspots)
385/26/2013
Experimental EvaluationReal-world workload
• Geospatial RDF stores tested: Strabon, Parliament, uSeekM
• Machine: Intel Xeon E5620, 12MB L3 cache, 2.4GHz, 24GB RAM, 4 HDD with RAID-5
• Micro part:
• Metric: response time
• Run 3 times and compute the median
• Time out: 1 hour
• Run both on warm caches and cold caches
• Macro part:
• Run each scenario many times for one hour with warm caches
• Metric: Average time for a complete execution
395/26/2013
ResultsData Storage
• Strabon is the slowest given the PostGIS tables and indices it is building.
• uSeekM does much better using the Sesame native store
• Parliament slows down due to geo:asWKT subproperty inferencing
System Strabon uSeekM Parliament
Storage time 550 sec 214 sec 250 sec
405/26/2013
ResultsMicro part (cold caches)
415/26/2013
ResultsMacro part
Scenario Strabon uSeekM Parliament
Reverse Geocoding 65 sec 0.77 sec 2.6 sec
Map Search and Browsing
0.9 sec 0.6 sec 22.2 sec
Rapid Mapping for Fire Monitoring
207.4 sec - -
425/26/2013
Synthetic Workload
• Goal: Evaluate performance in a controlled environment where we can vary the thematic and spatial selectivity of queries
• Thematic selectivity: the fraction of the total geographic features of a dataset that satisfy the non-spatial part of a query
• Spatial selectivity: the fraction of the total geographic features of a dataset which satisfy the topological relation in the FILTER clause of a query
435/26/2013
Synthetic Workload
• Dataset: As in VESPA, the produced datasets are geographic features on a synthetic map:
• States in a country ( (n/3)2 )
• Land ownership (n2)
• Roads (n)
• POI (n2)
445/26/2013
Synthetic WorkloadOntology
• Based roughly on the ontology of OpenStreetMap and the GeoSPARQL vocabulary
• Tagging each feature with a key enables us to select a known fraction of features in a uniform way
455/26/2013
Synthetic WorkloadQuery template for spatial selections
SELECT ?s
WHERE {?s ns:hasGeometry ?g.?s c:hasTag ?tag.?g ns:asWKT ?wkt.?tag ns:hasKey “THEMA”
FILTER(FUNCTION(?wkt, “GEOM”))}
• Parameters:
• ns: specifies the kind of feature (and geometry type) examined
• THEMA: defines the thematic selectivity of the query using another parameter k
• FUNCTION: specifies the topological function examined
• GEOM: specifies a rectangle that controls the spatial selectivity of the query
465/26/2013
Synthetic WorkloadQuery template for spatial joins
SELECT ?s1 ?s2
WHERE {?s1 ns1:hasGeometry ?g1.?s1 c:hasTag ?tag1.?g1 ns:asWKT ?wkt1.?tag1 ns:hasKey “THEMA”
?s2 ns2:hasGeometry ?g2.?s2 c:hasTag ?tag2.?g2 ns2:asWKT ?wkt2.?tag2 ns2:hasKey “THEMA’”
FILTER(FUNCTION(?wkt1, ?wkt2))}
475/26/2013
Experimental EvaluationSynthetic workload
• Geospatial RDF stores tested: Strabon, Parliament, uSeekM
• Machine: Intel Xeon E5620, 12MB L3 cache, 2.4GHz, 24GB RAM, 4 HDD with RAID-5
• Details:
• Metric: response time
• Run 3 times and compute the median
• Time out: 1 hour
• Run both on warm caches and cold caches
485/26/2013
ResultsData Storage
• We generate the synthetic dataset with n=512 and k=9. This results in:
• 28900 states
• 262144 land ownerships
• 512 roads
• 262144 points of interest
• Size: 3,880,224 triples (745 MB)
System Strabon uSeekM Parliament
Storage time 221 sec 406 sec 462 sec
495/26/2013
ResultsSynthetic Workload (Spatial Selections)
IntersectsTag 1, cold caches
IntersectsTag 512, cold caches
505/26/2013
ResultsSynthetic Workload (Spatial Joins)
Intersects
515/26/2013
Outline
• Motivation
• SPARQL extensions for querying geospatial data expressed in RDF
• State-of-the-art geospatial RDF stores
• Related benchmarks
• The benchmark Geographica
• Evaluating the performance of geospatial RDF stores using Geographica
• Conclusions
525/26/2013
Conclusions
• We defined Geographica, a new comprehensive benchmark for geospatial RDF stores, and used it to compare 3 relevant systems (Strabon, Parliament, uSeekM).
• More implementation work is necessary in adding features to other geospatial RDF stores beyond the ones tested.
• More real-world scenarios can be added.
• Next target: spatiotemporal RDF stores
535/26/2013
Advertisement
Strabon: http://strabon.di.uoa.gr
Geographica: http://geographica.di.uoa.gr
Tutorials/Survey paper
More at ESWC
Paper: Storing and Querying the Valid Time of Triples in Linked Geospatial Data
Demo: Sextant, a web tool for browsing and mapping Linked Geospatial Data http://strabon.di.uoa.gr:8080/sextant/
Project networking: TELEIOS http://www.earthobservatory.eu
top related