Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Confidential and Proprietary © Copyright 2013
This Ain’t Your Parents’ Search Engine
Grant Ingersoll, CTO, LucidWorks
Twitter: @gsingers
Search is dead.
Long live search
Search is good for…
• Traditional: Fast, fuzzy text matching across a large document collection
• Denormalized data: “light” relational
• Top N problems
  - Key-value (n=1)
  - Recommendations
  - “Good enough” classification, clustering
• Faceting, aggregations, analytical slicing and dicing of data
• Spatial, record/event linkage, alerting
http://cheezburger.com/5243950080
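The “Top N problems” bullet is the core of what a search engine does at query time: score every matching document and keep only the best N, with key-value lookup as the degenerate n=1 case. A minimal Python sketch, assuming a hypothetical term-count scorer in place of Lucene's real similarity:

```python
import heapq
from collections import Counter


def score(query_terms, doc_text):
    """Hypothetical scorer: total occurrences of the query terms in the document.

    This stands in for Lucene's real similarity (TF-IDF by default in Lucene/Solr 4).
    """
    counts = Counter(doc_text.lower().split())
    return sum(counts[term] for term in query_terms)


def top_n(query, docs, n):
    """Return the n best (score, doc_id) pairs among documents that match at all."""
    terms = query.lower().split()
    scored = ((score(terms, text), doc_id) for doc_id, text in docs.items())
    matching = (pair for pair in scored if pair[0] > 0)
    # nlargest keeps only n candidates in a heap instead of sorting every match
    return heapq.nlargest(n, matching)


docs = {
    "d1": "solr search engine",
    "d2": "hadoop map reduce",
    "d3": "search search solr",
}
print(top_n("solr search", docs, 2))  # [(3, 'd3'), (2, 'd1')]
```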
Foundational Changes in Lucene/Solr 4
• Reduced memory usage
• Pluggable codecs and similarity
• FSA/FST (finite state automata and transducers)
• DocValues (column-oriented storage)
• Spatial upgrade
• New facets and functions
• Cursors (deep paging)
• Distributed capabilities
• Joins and grouping
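Cursors (the `cursorMark` parameter, added in Solr 4.7) replace deep `start` offsets: each response carries a `nextCursorMark` to send on the next request, and the walk ends when Solr hands back the same mark it was given. The sketch below shows that loop. `fake_solr` is a hypothetical in-memory stand-in so the code runs without a server; its cursor is simply the last id, whereas real marks are opaque strings.

```python
from typing import Any, Callable, Dict, List

Params = Dict[str, str]


def fetch_all(fetch: Callable[[Params], Dict[str, Any]], rows: int = 2) -> List[Dict[str, Any]]:
    """Walk an entire result set with Solr's cursorMark protocol.

    The sort must include the uniqueKey field ("id" here) so the order is total.
    """
    params = {"q": "*:*", "sort": "id asc", "rows": str(rows), "cursorMark": "*"}
    docs: List[Dict[str, Any]] = []
    while True:
        resp = fetch(params)
        docs.extend(resp["response"]["docs"])
        next_mark = resp["nextCursorMark"]
        # Solr signals the end by returning the same cursorMark it was sent
        if next_mark == params["cursorMark"]:
            return docs
        params = dict(params, cursorMark=next_mark)


IDS = ["a", "b", "c", "d", "e"]


def fake_solr(params: Params) -> Dict[str, Any]:
    """Hypothetical stand-in for /select; its cursor is just the last id returned."""
    mark, rows = params["cursorMark"], int(params["rows"])
    start = 0 if mark == "*" else IDS.index(mark) + 1
    page = IDS[start:start + rows]
    next_mark = page[-1] if page else mark
    return {"response": {"docs": [{"id": i} for i in page]}, "nextCursorMark": next_mark}


print([d["id"] for d in fetch_all(fake_solr)])  # ['a', 'b', 'c', 'd', 'e']
```

Unlike `start`-based paging, the cost of each request does not grow with how deep into the result set you are.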
Search + Hadoop
• What’s Old is New Again
• “Traditional” use cases:
  - Build/store indexes
  - https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
• Enrichment and signal processing
  - PageRank, Statistically Interesting Phrases, etc.
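As an illustration of enrichment, here is a plain power-iteration PageRank on a toy link graph. It is a single-process sketch, not the Hadoop job the slide alludes to; in a real pipeline the scores would be computed with MapReduce and written to a numeric field (a hypothetical `pagerank_f`, say) that a function query can use as a boost.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over an adjacency dict {page: [linked pages]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # every page first receives the "random jump" share
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # dangling page: spread its rank evenly so the total stays 1
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank


ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b']
```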
LucidWorks + Hadoop
• Ingestion help
  - Flexible MapReduce content ingestion supporting:
    » Directory of files
    » CSV, Writable, etc.
    » LogStash
    » Build your own
• Pig Load/Store and UDFs
• Hive two-way support
• http://www.lucidworks.com/search-for-hadoop/
  - Open source this summer
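The “Build your own” ingestion path ultimately comes down to turning records into Solr documents. A minimal single-process sketch for CSV, assuming an `id` column as the unique key; it emits a JSON array of documents, the shape Solr's JSON update handler accepts, and is not the LucidWorks MapReduce ingest code itself:

```python
import csv
import io
import json


def csv_to_solr_json(csv_text, id_field="id"):
    """Convert CSV text into a JSON array of Solr documents.

    Rows without a unique key are skipped; empty cells are dropped rather than
    indexed as empty strings.
    """
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row.get(id_field):
            continue
        docs.append({field: value for field, value in row.items() if value})
    return json.dumps(docs)


sample = "id,title,price\n1,Widget,9.99\n2,Gadget,\n,Orphan,1.00\n"
print(csv_to_solr_json(sample))
```

The resulting payload could be POSTed to a collection's `/update` handler with `Content-Type: application/json`; in a MapReduce job the same per-row conversion would run inside the mapper.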
LucidWorks SiLK
[Diagram layers: Data Sources (Data Warehouse, Clickstream, Networking); Connectors (JDBC Connector, Web/File System Crawl, Hadoop Connectors); Servers (LucidWorks Search)]
Search Analytics: Data Ingestion & Visualization
[Diagram components: Gateway (Reverse Proxy), Search Logs, LogStash with the Solr Output Writer (HTTP), Hadoop Connector with GrokIngestMapper, Solr/SolrCloud, and Visualization via Configurable Dashboards]
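Grok, used by LogStash, is essentially a library of named regular expressions. The sketch below emulates one Grok-style pattern with Python named groups to turn a Solr request-log line into a document ready for indexing. The log line is an assumed example of Solr 4 request logging, and the output field names are hypothetical.

```python
import re

# Grok-style pattern expressed as a Python regex with named groups
REQUEST_PATTERN = re.compile(
    r"\[(?P<core>[^\]]+)\]\s+webapp=(?P<webapp>\S+)\s+path=(?P<path>\S+)\s+"
    r"params=\{(?P<params>[^}]*)\}\s+hits=(?P<hits>\d+)\s+"
    r"status=(?P<status>\d+)\s+QTime=(?P<qtime>\d+)"
)
QUERY_PARAM = re.compile(r"(?:^|&)q=([^&]*)")


def parse_request_log(line):
    """Return a dict of fields for a Solr request-log line, or None if it does not match."""
    match = REQUEST_PATTERN.search(line)
    if match is None:
        return None
    doc = match.groupdict()
    for field in ("hits", "status", "qtime"):
        doc[field] = int(doc[field])
    # pull the main query out of the params block (fq= and friends are ignored)
    q = QUERY_PARAM.search(doc["params"])
    doc["q"] = q.group(1) if q else None
    return doc


SAMPLE = ("INFO  - 2014-06-10 19:02:11.123; org.apache.solr.core.SolrCore; "
          "[collection1] webapp=/solr path=/select params={q=ipod&rows=10} "
          "hits=42 status=0 QTime=3")
print(parse_request_log(SAMPLE))
```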
LucidWorks Open Source
• Logstash for Solr: https://github.com/LucidWorks/solrlogmanager
• Banana (Kibana for Solr): https://github.com/LucidWorks/banana
• Effortless AWS deployment and monitoring: http://www.github.com/lucidworks/solr-scale-tk
• Data Quality Toolkit: https://github.com/LucidWorks/data-quality
Demos
Fly the friendly skies
http://www.ibm.com/developerworks/library/j-solr-lucene/index.html
Make $$$
• Leverage time series data and visualization using LucidWorks SiLK
• Monitor social media
• Traditional research
https://github.com/lucidworks/lws-financial-demo
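Time-series panels like the ones in a SiLK dashboard rest on Solr range faceting over a date field. A sketch of such a request, assuming hypothetical `symbol` and `timestamp` fields; `facet.range`, `facet.range.start`, `facet.range.end`, and `facet.range.gap` are standard Solr faceting parameters:

```python
from urllib.parse import urlencode


def daily_facet_url(symbol, days=30, time_field="timestamp"):
    """Build a /select URL that buckets matching docs into one count per day."""
    params = [
        ("q", "symbol:" + symbol),       # hypothetical field holding a ticker symbol
        ("rows", "0"),                   # only the facet counts are needed
        ("facet", "true"),
        ("facet.range", time_field),     # hypothetical date field
        ("facet.range.start", "NOW/DAY-%dDAYS" % days),
        ("facet.range.end", "NOW/DAY+1DAY"),
        ("facet.range.gap", "+1DAY"),
    ]
    return "/select?" + urlencode(params)


print(daily_facet_url("AAPL"))
```

Date math such as `NOW/DAY` rounds to midnight, so each bucket lines up with a calendar day.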
Cure what ails you
Space-Time Continuum
• Leverage Solr’s spatial capabilities to index non-spatial data, such as time ranges
  - Index each range as a point (x = start, y = end)
  - Useful for open hours, shifts, etc.
• Query using rectangle intersections
  - q=shift:"Intersects(0 19 23 365)"
https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/
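The trick from the linked talk: index each range as a point whose x coordinate is its start and whose y coordinate is its end. A range overlaps the query interval [a, b] exactly when start <= b and end >= a, which is a rectangle in that plane. A minimal sketch, assuming world bounds of 0 to 365 to match the slide's query (the slide does not state its units):

```python
def as_point(start, end):
    """Index-side mapping: a range becomes the 2-D point (x=start, y=end)."""
    return (start, end)


def overlap_rect(q_start, q_end, world_min=0, world_max=365):
    """Query-side mapping: ranges overlapping [q_start, q_end] are exactly the
    points with start <= q_end and end >= q_start, i.e. this rectangle
    (minX, minY, maxX, maxY)."""
    return (world_min, q_start, q_end, world_max)


def intersects(point, rect):
    x, y = point
    min_x, min_y, max_x, max_y = rect
    return min_x <= x <= max_x and min_y <= y <= max_y


def solr_query(field, rect):
    """Render the rectangle in the 'Intersects(minX minY maxX maxY)' form used on the slide."""
    return '%s:"Intersects(%s %s %s %s)"' % ((field,) + tuple(rect))


rect = overlap_rect(19, 23)
print(solr_query("shift", rect))           # shift:"Intersects(0 19 23 365)"
print(intersects(as_point(17, 20), rect))  # True: 17-20 overlaps 19-23
print(intersects(as_point(10, 18), rect))  # False: ends before 19
```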
Signal Processing for Search and Discovery
• Signals power modern relevance
  - Clicks, conversions, sharing, history, signatures
• LucidWorks 5 makes it easy to capture and leverage signals
  - Recommendations, analytics, discovery
• Simplifies your data workflow
• Simplifies your operational footprint
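How LucidWorks 5 processes signals internally is not shown on the slide. As one common approach, assumed here for illustration: weight each event type, sum per document, and log-damp the total so it can serve as a multiplicative boost. The signal names and weights below are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical weights: stronger intent signals count for more
SIGNAL_WEIGHTS = {"click": 1.0, "add_to_cart": 3.0, "purchase": 5.0}


def aggregate_signals(events, weights=SIGNAL_WEIGHTS):
    """Turn (doc_id, signal_type) events into a per-document boost >= 1.0.

    Unknown signal types contribute nothing; log1p damping keeps a handful of
    very popular items from overwhelming text relevance.
    """
    totals = defaultdict(float)
    for doc_id, signal_type in events:
        totals[doc_id] += weights.get(signal_type, 0.0)
    return {doc_id: 1.0 + math.log1p(total) for doc_id, total in totals.items()}


events = [("p1", "click"), ("p1", "purchase"), ("p2", "click"), ("p3", "share")]
print(aggregate_signals(events))
```

Computing the boost in batch and storing it with each document keeps the query path cheap; the trade-off is that boosts lag behind the freshest signals.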
Solr Powered Signal Processing
• Use Case: eCommerce
• Data:
  - Product catalog (~1.2M items)
  - Click data (~3.9M clicks)
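One simple recommendation built from click data like this is item co-occurrence: products clicked in the same session are treated as related. A toy sketch, assuming clicks have already been grouped into sessions (the session data below is made up and unrelated to the demo dataset):

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_click_counts(sessions):
    """Count how often each pair of items was clicked in the same session."""
    co = defaultdict(Counter)
    for items in sessions:
        # set() so repeat clicks within one session count once
        for a, b in combinations(sorted(set(items)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, item, n=3):
    """Items most often co-clicked with `item`, strongest first."""
    return [other for other, _ in co.get(item, Counter()).most_common(n)]


sessions = [
    ["tv", "hdmi", "mount"],
    ["tv", "hdmi"],
    ["tv", "soundbar"],
    ["laptop", "mouse"],
]
co = co_click_counts(sessions)
print(recommend(co, "tv"))
```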
Meta
• http://www.lucidworks.com
  - [email protected]
  - @gsingers
• Sales
  - Steve Drane (based here in Chicago)
  - [email protected]
• Lucene/Solr Revolution
  - Washington DC, Nov 11-14
  - http://www.lucenerevolution.org