![Page 1: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/1.jpg)
October 11-14, 2016 • Boston, MA
![Page 2: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/2.jpg)
H-Hypermap: Heatmap Analytics at Scale
David Smiley
Freelance Search Developer/Consultant
![Page 3: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/3.jpg)
About: David Smiley
• Software Engineer (16 years)
• Search (7 years)
• Java (full-stack), Web, Spatial
• Freelance search consultant / developer
• Apache Lucene / Solr committer & PMC
• Wrote first book on Solr, updated twice
![Page 4: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/4.jpg)
Agenda
• About this project
• Architecture
• Solr & time sharding
• Experiences with:
– Kotlin, Dropwizard, Swagger
– Kafka
– Docker, Kontena
• Solr for geo-enrichment
• Solr adapter for Lucene BKD Lat-Lon point search & sort
• Heatmaps
– Existing functionality
• demo
– New functionality
![Page 5: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/5.jpg)
H-Hypermap / BOP
• Harvard University, CGA: Center for Geospatial Analysis http://gis.harvard.edu
• Harvard Hypermap Project
– Managed by Ben Lewis
• BOP “Billion Object Platform”
– Funded by the Sloan Foundation
![Page 6: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/6.jpg)
BOP Requirements Summary
• Most recent ~billion geo-tweets
• Realtime search (<5 sec latency)
• Sub-second queries
– Including heatmaps!
• On the cheap: ~6 mediocre boxes
Provide a proof-of-concept platform designed to lower the barrier for researchers who need to access big streaming spatio-temporal datasets.
![Page 7: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/7.jpg)
Logical High-Level Architecture
[Diagram components: Harvesting, Enrichment, Archival, Realtime, “BOP”, various clients]
• Data flows via Apache Kafka
• Systems expose HTTP web services
![Page 8: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/8.jpg)
The BOP
[Diagram components: Kafka Topic, Ingester, ZooKeeper, Shards W51, W52, W53, W54, ..., RT, Web-Service]
Ingester: Kafka Streams
• Creates Solr doc
• Routes to shard
Web-Service: REST/JSON API
• Keyword search
• Faceting
• Heatmaps
• CSV export
![Page 9: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/9.jpg)
BOP Solr Sharding Architecture
[Diagram: time shards T2016_05_20, T2016_05_06, T2016_04_22, T2016_04_08, … (4-5 mo.), drawn twice alongside the labels G_North_America and G_Elsewhere; plus a lone Realtime collection/shard holding 1-25 hrs of data, labeled “Copy then delete, at night”]
• Realtime shard is where realtime search happens. No caches, but small.
• Primary collections have useful caches
• Housekeeping tasks:
– Move data from RT to primary
– Create new shards; expire old
– Merge/optimize shards
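The shard names on this slide are spaced 14 days apart, which suggests fixed two-week windows. A minimal sketch of time-based routing under that assumption follows; the anchor date, the window length, the convention that a shard is named after its window's first day, and the function name `shard_for` are all illustrative assumptions, not the project's actual code.

```python
from datetime import date, timedelta

# Assumptions (not confirmed by the slides): shards cover fixed 14-day
# windows and each shard is named after the first day of its window.
ANCHOR = date(2016, 4, 8)      # taken from the shard name T2016_04_08
WINDOW = timedelta(days=14)

def shard_for(day: date) -> str:
    """Return the hypothetical time-shard name whose window contains `day`."""
    # Floor division also gives the right window for dates before the anchor.
    periods = (day - ANCHOR).days // WINDOW.days
    start = ANCHOR + periods * WINDOW
    return start.strftime("T%Y_%m_%d")
```

For example, `shard_for(date(2016, 5, 19))` falls in the window that starts on 2016-05-06. A nightly housekeeping job could use the same function to decide which primary shard receives documents copied out of the realtime shard.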
![Page 10: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/10.jpg)
Building a Search Web-Service
• Kotlin language (JVM based)
– Nullity as first-class language feature
• DropWizard framework
– Designed for web-services
• Swagger
– Dynamically generated dev UI for web-services
![Page 11: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/11.jpg)
Apache Kafka
• Kafka: a scalable message/queue platform
• See new Kafka Streams & Kafka Connect APIs
• No back-pressure; can be a challenge
• Non-obvious use:
– For storage; time partitioning
• Lots of benefits yet serious limitations
![Page 12: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/12.jpg)
Docker
• Easy to find/try/use software
– No installation
– Simplified configuration (env variables)
– Common logging
– Isolated
• Ideal for:
– Continuous Int. servers
– Trying new software
– Production advantages
• But “new”
![Page 13: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/13.jpg)
Docker in Production
• I use “Kontena”
• Common logging, machine/proc stats, security
– VPN to secure network; access everything as local
• No longer need to care about:
– Ansible, Chef, Puppet, etc.
– Security at network or proxy; not service specific
• Challenges: state & big-data
![Page 14: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/14.jpg)
Enrichment
[Diagram: Kafka Topic → Enrich → Kafka Topic]
Enrichers:
– Twitter Sentiment Classifier
– Geo: Solr with regional polygons & metadata
Geo: Query Solr via spatial point query; attach related metadata to tweet
![Page 15: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/15.jpg)
Solr for Geo Enrichment
• Tweets (docs) can have a geo lat/lon
• Enrich tweet with Country, State/Province, …
– Gazetteer lookup (point-in-polygon)

| Data Set | Features | Raw size | Index time | Index size |
| --- | --- | --- | --- | --- |
| Admin2 | 46,311 | 824 MB | 510 min | 892 MB |
| US States | 74,002 | 747 MB | 4.9 min | 840 MB |
| Massachusetts Census Blocks | 154,621 | 152 MB | 5.9 min | 507 MB |
![Page 16: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/16.jpg)
Fast Point-in-Polygon Tricks
Index/Config
• Optimize to 1 segment
• RptWithGeometrySpatialField
– precisionModel="floating_single"
– autoIndex="true"
• <cache name="perSegSpatialFieldCache_WKT" …
Search
• Embed Solr (in-process)
• Use docValues, not stored
– fl=block:field(GEOID10)
• Query like this:
q={!field cache=false f=WKT}Intersects(POINT($lon $lat))
Sub-Millisecond!
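As an illustration of the query shape above, here is a minimal sketch that fills in `$lon` and `$lat` and pairs the query with the `fl` parameter from the slide. The function name is hypothetical. The slide recommends embedding Solr in-process, so these parameters would be handed to the embedded server rather than sent over HTTP; the dictionary only shows how the pieces fit together.

```python
def point_in_polygon_params(lon: float, lat: float) -> dict:
    """Build request parameters for a point-in-polygon lookup.

    The field names WKT and GEOID10 come from the slide; everything else
    about the surrounding request is an assumption.
    """
    return {
        # WKT points are written as "x y", that is, longitude first.
        "q": "{!field cache=false f=WKT}Intersects(POINT(%s %s))" % (lon, lat),
        # Read the block id from docValues instead of a stored field.
        "fl": "block:field(GEOID10)",
    }
```

Note the argument order: WKT puts longitude before latitude, which is the opposite of the `lat,lon` convention used by some other Solr spatial parameters.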
![Page 17: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/17.jpg)
Lucene “LatLonPoint”
• Uses new PointValues (BKD index) in Lucene 6
• Fastest: http://home.apache.org/~mikemccand/geobench.html
• Presently in Lucene sandbox module
• Some limitations: WGS84 points only
• Credit to Rob Muir and Mike McCandless
![Page 18: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/18.jpg)
Solr Adapter For LatLonPoint
• New Solr FieldType for Lucene LatLonPoint
– Filter points by circle, rect, polygon
– Distance sort; but no boosting
Coming soon! Solr 6.4?
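The slides do not show query syntax for the new field type. As a rough sketch, assuming it accepts Solr's existing spatial request parameters (the `geofilt` query parser and the `geodist()` function), a circle filter with distance sort might be assembled as below. The field name `coord` in the usage note and the function name are hypothetical.

```python
def circle_filter_params(field: str, lat: float, lon: float, radius_km: float) -> dict:
    """Build parameters for a circle filter sorted by distance.

    Assumption: the new field type works with Solr's standard spatial
    parameters (sfield, pt, d), which geofilt and geodist() both read.
    """
    return {
        "q": "*:*",
        "fq": "{!geofilt}",        # keep only points inside the circle
        "sfield": field,           # spatial field to filter and sort on
        "pt": f"{lat},{lon}",      # circle center, latitude first
        "d": radius_km,            # radius in kilometers
        "sort": "geodist() asc",   # nearest first
    }
```

For example, `circle_filter_params("coord", 42.36, -71.06, 5)` describes a 5 km circle around a point in Boston.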
![Page 19: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/19.jpg)
Heatmaps: Spatial Grid Faceting
• Spatial density summary grid faceting, also useful for point-plotting search results
• Lucene & Solr APIs
• Scalable & fast usually…
• Usually rendered with a gradient radius
• See: http://spacemansteve.github.io/leaflet-solr-heatmap/example/index.html
![Page 20: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/20.jpg)
How-to: Heatmaps
• On an RPT field:
geo="false"
worldBounds="ENVELOPE(-180, 180, 180, -180)"
prefixTree="packedQuad"
• Query:
/select?facet=true
&facet.heatmap=geo_rpt
&facet.heatmap.geom=["-180 -90" TO "180 90"]
&facet.heatmap.format=ints2D or png
// Normal Solr response...
"facet_counts":{
... // facet response fields
"facet_heatmaps":{
"geo_rpt":[
"gridLevel",2,
"columns",32,
"rows",32,
"minX",-180.0,
"maxX",180.0,
"minY",-90.0,
"maxY",90.0,
"counts_ints2D", [null, null, [0, 1, ... ]]
...
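A client has to do two things with this API: build the facet request and turn the response into a usable grid. The sketch below shows one way to do both, assuming the response arrives as the flat, alternating key/value list shown on the slide and that a `null` row means every cell in that row is zero. The function names, `q=*:*`, `rows=0`, and `wt=json` are illustrative assumptions.

```python
from urllib.parse import urlencode

def heatmap_query(field: str, geom: str = '["-180 -90" TO "180 90"]') -> str:
    """Build a URL query string for a heatmap facet request in ints2D format."""
    return urlencode({
        "q": "*:*",     # assumption: facet over all documents
        "rows": 0,      # assumption: only the facet is wanted, no documents
        "wt": "json",
        "facet": "true",
        "facet.heatmap": field,
        "facet.heatmap.geom": geom,
        "facet.heatmap.format": "ints2D",
    })

def parse_heatmap(flat: list) -> dict:
    """Convert the flat [key, value, key, value, ...] list into a dict and
    expand null rows into rows of zeros so the grid is fully dense."""
    info = dict(zip(flat[0::2], flat[1::2]))
    columns, rows = info["columns"], info["rows"]
    raw = info.get("counts_ints2D") or [None] * rows
    info["counts_ints2D"] = [
        row if row is not None else [0] * columns for row in raw
    ]
    return info
```

With the dense grid in hand, each cell's width is `(maxX - minX) / columns` and its height is `(maxY - minY) / rows`, which is enough to place the cells on a map.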
![Page 21: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/21.jpg)
New HeatmapSpatialField
• Why?
– With new BKD/PointValues, no “RPT” field to use
– Scalable for heatmaps; don’t worry about search
• Scalable at all resolutions; many millions of docs/shard
– Can be specific about grid resolutions
Coming soon! Solr 6.4?
![Page 22: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/22.jpg)
Heatmaps with Stats
• Instead of counting docs; calculate a metric
– Ex: avg(minuteOfDay)
• Will require JSON Facet API
• Inherently slower than just doc counts
Coming soon! Solr 6.4?
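To make the idea of a per-cell metric concrete, here is a small client-side illustration that averages a value such as minuteOfDay within each grid cell. It is not the planned Solr implementation, which the slide says would go through the JSON Facet API; the function name, the input format, and the choice of row 0 as the top row are assumptions.

```python
def heatmap_avg(points, min_x, max_x, min_y, max_y, columns, rows):
    """Average a per-document value within each grid cell.

    `points` is an iterable of (x, y, value) tuples. Returns a rows x columns
    grid where each cell is the mean value, or None if the cell is empty.
    """
    sums = [[0.0] * columns for _ in range(rows)]
    counts = [[0] * columns for _ in range(rows)]
    cell_w = (max_x - min_x) / columns
    cell_h = (max_y - min_y) / rows
    for x, y, value in points:
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue  # ignore points outside the requested area
        # Clamp so points exactly on the far edge land in the last cell.
        col = min(int((x - min_x) / cell_w), columns - 1)
        row = min(int((max_y - y) / cell_h), rows - 1)  # row 0 is the top
        sums[row][col] += value
        counts[row][col] += 1
    return [
        [(s / c if c else None) for s, c in zip(sum_row, count_row)]
        for sum_row, count_row in zip(sums, counts)
    ]
```

The extra cost relative to plain counting is visible here: every document's value must be read and accumulated, which matches the slide's note that stats are inherently slower than doc counts.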
![Page 23: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/23.jpg)
![Page 24: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/24.jpg)
![Page 25: H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smiley LLC](https://reader030.vdocuments.net/reader030/viewer/2022021502/58f2bbca1a28ab8b5e8b457f/html5/thumbnails/25.jpg)
Final Remarks
• Open-Source
– https://github.com/dsmiley/hhypermap-bop
• In-progress
• Improvements to Solr expected to be available before December; officially in Solr 6.4.