Geographica: A Benchmark for Geospatial RDF Stores - ISWC 2013
Geographica: A Benchmark for Geospatial RDF Stores
George Garbis, Kostis Kyzirakos, Manolis Koubarakis
Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens, Greece
12th International Semantic Web Conference (Evaluation Track)
Outline
• Motivation
• The benchmark Geographica
  • Real-world workload
  • Synthetic workload
• Evaluating the performance of geospatial RDF stores using Geographica
• Conclusions
23/10/2013 2
Motivation
• Lots of geospatial data is available on the Web today.
• Lots of geospatial data is quickly being transformed into linked geospatial data!
• People have started building applications using such data.
• Geospatial extensions of SPARQL (e.g., GeoSPARQL and stSPARQL) have been recently developed.
• RDF stores provide support for GeoSPARQL (e.g., Strabon, Oracle 12c, uSeekM, Parliament) or provide limited geospatial functionality (e.g., Virtuoso, BigOwlim, AllegroGraph).
The Benchmark Geographica
• Aim: measure the performance of today's geospatial RDF stores
• Organized around two workloads:
  • Real-world workload:
    • Based on existing linked geospatial datasets and known application scenarios
  • Synthetic workload:
    • Measures performance in a controlled environment where we can vary the selectivity of queries
• Geographica (Γεωγραφικά): 17-volume geographical encyclopedia by Strabo (Στράβων), AD 17
Real-World Workload: Datasets
• Datasets: Real-world datasets for the geographic area of Greece playing an important role in the LOD cloud or having complex geometries
  • LinkedGeoData (LGD) for rivers and main roads in Greece
  • GeoNames for Greece
  • DBpedia for Greece
  • Greek Administrative Geography (GAG)
  • CORINE land cover (CLC) for Greece
  • Hotspots
| Dataset | Size | # of Triples | # of Points | # of Lines (max/min/avg points/line) | # of Polygons (max/min/avg points/polygon) |
|---|---|---|---|---|---|
| GeoNames | 45MB | 400K | 22K | - | - |
| DBpedia | 89MB | 430K | 8K | - | - |
| LGD | 29MB | 150K | - | 12K (1.6K/2/21) | - |
| GAG | 33MB | 4K | - | - | 325 (15K/4/400) |
| CLC | 401MB | 630K | - | - | 45K (5K/4/140) |
| Hotspots | 90MB | 450K | - | - | 37K (4/4/4) |
Real-World Workload: Parts
• For this workload, Geographica has two parts (following Jackpine):
  • Micro part: Tests primitive spatial functions offered by geospatial RDF stores
  • Macro part: Simulates some typical application scenarios
Real-World Workload: Micro part
• 29 SPARQL queries that consist of one or two triple patterns and a spatial function.
• Functions included:
  • Non-topological: boundary, envelope, convex hull, buffer, area
  • Topological: equals, intersects, overlaps, crosses, within, distance, disjoint
  • Spatial aggregates: extent, union
• These functions are used for spatial selections and spatial joins
Micro part: Example (non-topological)
• Construct the boundary of all polygons of CLC

PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX dataset: <http://geographica.di.uoa.gr/dataset/>
PREFIX clc: <http://geo.linkedopendata.gr/corine/ontology#>

SELECT ( geof:boundary(?o1) AS ?ret )
WHERE {
  GRAPH dataset:clc { ?s1 clc:asWKT ?o1. }
}
Micro part: Example (spatial selection)
• Find all points in GeoNames that are within a given polygon.

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX dataset: <http://geographica.di.uoa.gr/dataset/>
PREFIX geonames: <http://www.geonames.org/ontology#>

SELECT ?s1 ?o1
WHERE {
  GRAPH dataset:geonames { ?s1 geonames:asWKT ?o1 }
  FILTER( geof:sfWithin(?o1, "POLYGON((…))"^^geo:wktLiteral) ).
}
Micro part: Example (spatial join)
• Find all pairs of GAG and CLC polygons that overlap

PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX dataset: <http://geographica.di.uoa.gr/dataset/>
PREFIX gag: <http://geo.linkedopendata.gr/gag/ontology/>
PREFIX clc: <http://geo.linkedopendata.gr/corine/ontology#>

SELECT ?s1 ?s2
WHERE {
  GRAPH dataset:gag { ?s1 gag:asWKT ?o1 }
  GRAPH dataset:clc { ?s2 clc:asWKT ?o2 }
  FILTER( geof:sfOverlaps(?o1, ?o2) )
}
Real-World Workload: Micro part
• Spatial Selections

| | Query Point | Query Line | Query Polygon |
|---|---|---|---|
| Points | Within Buffer, Distance | - | Within, Disjoint |
| Lines | - | Equals, Crosses | Intersects, Disjoint |
| Polygons | - | Intersects | Equals, Overlaps |

• Spatial Joins

| | Points | Lines | Polygons |
|---|---|---|---|
| Points | Equals | Intersects | Intersects, Within |
| Lines | - | - | Intersects, Within, Crosses |
| Polygons | - | - | Within, Touches, Overlaps |
Real-World Workload: Macro part (Scenarios)
• Reverse Geocoding
• Web Map Search and Browsing
• Rapid Mapping for Fire Monitoring
Synthetic Workload
• Goal: Evaluate performance in a controlled environment where we can vary the thematic and spatial selectivity of queries
  • Thematic selectivity: the fraction of the total geographic features of a dataset that satisfy the non-spatial part of a query
  • Spatial selectivity: the fraction of the total geographic features of a dataset which satisfy the topological relation in the FILTER clause of a query
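As a small illustration of the two definitions above, the following Python sketch computes both fractions over a toy list of features. The feature records, the tag key "2", and the point-in-rectangle test that stands in for the topological relation in the FILTER clause are all hypothetical; a real store would evaluate a GeoSPARQL function instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Toy point feature: a tag key plus x/y coordinates (all hypothetical)."""
    tag_key: str
    x: float
    y: float


def thematic_selectivity(features, key):
    """Fraction of all features that satisfy the non-spatial part (tag match)."""
    return sum(f.tag_key == key for f in features) / len(features)


def spatial_selectivity(features, rect):
    """Fraction of all features that satisfy the spatial predicate.

    The predicate is a simple point-in-rectangle test, used here only as a
    stand-in for the topological relation of a real query.
    """
    min_x, min_y, max_x, max_y = rect
    inside = sum(min_x <= f.x <= max_x and min_y <= f.y <= max_y for f in features)
    return inside / len(features)


features = [
    Feature("1", 0.5, 0.5),
    Feature("2", 1.5, 0.5),
    Feature("1", 2.5, 2.5),
    Feature("2", 3.5, 3.5),
]
print(thematic_selectivity(features, "2"))           # 0.5
print(spatial_selectivity(features, (0, 0, 2, 2)))   # 0.5
```

Both fractions are taken over the whole dataset, so they can be varied independently of each other.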
Synthetic Workload: Generator
• Dataset: As in VESPA, the produced datasets are geographic features on a synthetic map:
  • States in a country ((n/3)^2)
  • Land ownership (n^2)
  • Roads (n)
  • POI (n^2)
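The feature counts per layer follow directly from n. The sketch below evaluates them in Python; the function name and dictionary keys are hypothetical, and rounding n/3 down before squaring is an assumption, chosen because it reproduces the 28,900 states reported for n=512 in the results.

```python
def synthetic_feature_counts(n: int) -> dict:
    """Number of features per layer for a synthetic map of size n.

    Assumption: n/3 is rounded down to an integer before squaring.
    """
    return {
        "states": (n // 3) ** 2,
        "land_ownerships": n ** 2,
        "roads": n,
        "points_of_interest": n ** 2,
    }


print(synthetic_feature_counts(512))
# {'states': 28900, 'land_ownerships': 262144, 'roads': 512, 'points_of_interest': 262144}
```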
Synthetic Workload: Ontology
• Based roughly on the ontology of OpenStreetMap and the GeoSPARQL vocabulary
• Tagging each feature with a key enables us to select a known fraction of features in a uniform way
Synthetic Workload: Query template for spatial selections

SELECT ?s
WHERE {
  ?s ns:hasGeometry ?g .
  ?s c:hasTag ?tag .
  ?g ns:asWKT ?wkt .
  ?tag ns:hasKey "THEMA" .
  FILTER(FUNCTION(?wkt, "GEOM"^^geo:wktLiteral))
}

• Parameters:
  • ns: specifies the kind of feature (and geometry type) examined
  • FUNCTION: specifies the topological function examined
  • THEMA: defines the thematic selectivity of the query using another parameter k
  • GEOM: specifies a rectangle that controls the spatial selectivity of the query
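To make the role of the four parameters concrete, here is a minimal Python sketch that fills in the selection template by string substitution. The helper names, the `poi` prefix, the tag key "512", and the rectangle coordinates are hypothetical; `geof:sfIntersects` is the GeoSPARQL simple-features intersects function, and the `c:hasTag` predicate is copied from the template as it appears on the slide.

```python
# Template with the four parameters from the slide as format fields.
SELECTION_TEMPLATE = """SELECT ?s
WHERE {{
  ?s {ns}:hasGeometry ?g .
  ?s c:hasTag ?tag .
  ?g {ns}:asWKT ?wkt .
  ?tag {ns}:hasKey "{thema}" .
  FILTER({function}(?wkt, "{geom}"^^geo:wktLiteral))
}}"""


def rectangle_wkt(min_x, min_y, max_x, max_y):
    """WKT polygon for an axis-aligned rectangle (closed ring of 5 points)."""
    corners = [
        (min_x, min_y), (max_x, min_y), (max_x, max_y),
        (min_x, max_y), (min_x, min_y),
    ]
    ring = ", ".join(f"{x} {y}" for x, y in corners)
    return f"POLYGON(({ring}))"


def instantiate_selection(ns, function, thema, rect):
    """Fill in ns, FUNCTION, THEMA and GEOM for one concrete query."""
    return SELECTION_TEMPLATE.format(
        ns=ns, function=function, thema=thema, geom=rectangle_wkt(*rect)
    )


query = instantiate_selection("poi", "geof:sfIntersects", "512", (0, 0, 10, 10))
print(query)
```

Enlarging or shrinking the rectangle changes how many geometries satisfy the FILTER, which is how GEOM controls spatial selectivity.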
Synthetic Workload: Query template for spatial joins

SELECT ?s1 ?s2
WHERE {
  ?s1 ns1:hasGeometry ?g1 .
  ?s1 ns1:hasTag ?tag1 .
  ?g1 ns1:asWKT ?wkt1 .
  ?tag1 ns1:hasKey "THEMA" .
  ?s2 ns2:hasGeometry ?g2 .
  ?s2 ns2:hasTag ?tag2 .
  ?g2 ns2:asWKT ?wkt2 .
  ?tag2 ns2:hasKey "THEMA'" .
  FILTER(FUNCTION(?wkt1, ?wkt2))
}
Experimental Setup
• Geospatial RDF stores tested: Strabon, Parliament, uSeekM
• Machine: Intel Xeon E5620, 12MB L3 cache, 2.4GHz, 24GB RAM, 4 HDD with RAID-5
• Micro part (real-world workload) & synthetic workload:
  • Metric: response time
  • Run 3 times and compute the median
  • Time out: 1 hour
  • Run both on warm caches and cold caches
• Macro part (real-world workload):
  • Run many instantiations of each scenario for one hour without cleaning caches
  • Metric: Average time for a complete execution
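The measurement protocol for the micro part and the synthetic workload (three runs, median response time, one-hour time-out) can be sketched as follows. This is an illustrative harness, not the benchmark's own driver: `run_query` is a hypothetical callable that submits one query to a store, and the time-out is only checked after a run finishes rather than aborting it.

```python
import statistics
import time
from typing import Callable, Optional

TIMEOUT_SECONDS = 3600.0  # one-hour time-out from the setup
REPETITIONS = 3           # each query is run three times


def median_response_time(
    run_query: Callable[[], object],
    repetitions: int = REPETITIONS,
    timeout: float = TIMEOUT_SECONDS,
) -> Optional[float]:
    """Median wall-clock time over several runs, or None on a time-out.

    Simplification: a run that exceeds the time-out is detected after it
    returns; a real driver would cancel the query instead.
    """
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        elapsed = time.perf_counter() - start
        if elapsed > timeout:
            return None
        timings.append(elapsed)
    return statistics.median(timings)


# Usage with a dummy workload standing in for a real query.
print(median_response_time(lambda: sum(range(10_000))))
```

Taking the median rather than the mean keeps a single unusually slow run from dominating the reported time.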
Results: Real Workload, micro part (cold caches)
[Figure not reproduced in the transcript]
Results: Macro part

| Scenario | Strabon | uSeekM | Parliament |
|---|---|---|---|
| Reverse Geocoding | 65 sec | 0.77 sec | 2.6 sec |
| Map Search and Browsing | 0.9 sec | 0.6 sec | 22.2 sec |
| Rapid Mapping for Fire Monitoring | 207.4 sec | - | - |
Results: Synthetic Workload
• We generate the synthetic dataset with n=512. This results in:
  • 28,900 states
  • 262,144 land ownerships
  • 512 roads
  • 262,144 points of interest
• Size: 3,880,224 triples (745 MB)
Results: Synthetic Workload, spatial selections
[Figure panels, not reproduced in the transcript: Intersects, Tag 1, cold caches; Intersects, Tag 512, cold caches]
Results: Synthetic Workload, spatial joins
[Figure panel, not reproduced in the transcript: Touches]
Conclusions
• We defined Geographica, a new comprehensive benchmark for geospatial RDF stores, and used it to compare 3 relevant systems:
  • Strabon
  • Parliament
  • uSeekM
• Two workloads: real-world and synthetic
Future Work
• Capture the full GeoSPARQL standard.
• Study scaling issues with larger datasets.
• Add more application scenarios.
• Extend the generator to produce datasets that do not follow a uniform distribution.
• Extend the benchmark to include time-evolving geospatial data.
Thanks!
Geographica: http://geographica.di.uoa.gr
This work was supported in part by the European Commission project TELEIOS http://www.earthobservatory.eu
Any Questions?