![Page 1: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/1.jpg)
Confidential and Proprietary © Copyright 2013
This Ain’t Your Parents’ Search Engine
Grant Ingersoll
CTO, LucidWorks
Twitter: @gsingers
![Page 2: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/2.jpg)
Search is dead.
![Page 3: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/3.jpg)
Long live search
![Page 4: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/4.jpg)
Search is good for…
• Traditional: Fast, fuzzy text matching across a large document collection
• De-normalized data
- “light” relational
• Top N problems
- Key-value (n=1)
- Recommendations
- “Good enough” classification, clustering
• Faceting, aggregations, analytical slicing and dicing of data
• Spatial, record/event linkage, alerting
http://cheezburger.com/5243950080
![Page 5: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/5.jpg)
Foundational Changes in Lucene/Solr 4
• Reduced memory usage
• Pluggable codecs/similarity
• FS(A|T): finite-state automata and transducers
• Doc Values (column-oriented)
• Spatial upgrade
• New facets and functions
• Cursors (deep paging)
• Distributed capabilities
• Joins/Grouping
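To make the "Cursors (deep paging)" item concrete, here is a minimal sketch of walking a full result set with Solr's `cursorMark` parameter (available from Solr 4.7). The base URL, the `id` unique key, and the injectable `fetch` helper are assumptions for illustration, not part of the talk.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, params):
    """Send one select request and return the decoded JSON response."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_all_docs(base_url, query="*:*", rows=100, unique_key="id",
                  fetch=fetch_page):
    """Yield every matching document by following nextCursorMark.

    The sort must include the collection's unique key so that the
    cursor position is unambiguous; `unique_key` is an assumed name.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {
            "q": query,
            "rows": rows,
            "sort": f"{unique_key} asc",
            "wt": "json",
            "cursorMark": cursor,
        }
        data = fetch(base_url, params)
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor means no more results
            break
        cursor = next_cursor
```

Unlike `start`/`rows` paging, each request here carries only an opaque cursor, so the cost per page should stay roughly flat however deep the walk goes. Passing a stub as `fetch` lets the loop be exercised without a running Solr.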
![Page 6: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/6.jpg)
Search + Hadoop
• What’s Old is New Again
• “Traditional” Use Cases:
- Build/Store indexes
- https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
• Enrichment and Signal processing
- PageRank, Statistically Interesting Phrases, etc.
![Page 7: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/7.jpg)
LucidWorks + Hadoop
• Ingestion Help
- Flexible Map-Reduce content ingestion supporting:
» Directory of files
» CSV, Writable, etc.
» LogStash
» Build Your Own
• Pig Load/Store and UDFs
• Hive 2-way support
• http://www.lucidworks.com/search-for-hadoop/
- Open source this summer
![Page 8: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/8.jpg)
[Architecture diagram with three tiers labeled Data Sources, Connectors, and Servers. Component labels: Data Warehouse, Clickstream, Networking, JDBC Connector, Web/File System Crawl, Hadoop Connectors, LucidWorks Search, LucidWorks SiLK.]
![Page 9: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/9.jpg)
Search Analytics: Data Ingestion & Visualization
[Diagram. Component labels: Search Logs, LogStash, Solr Output Writer for LogStash (HTTP), Hadoop Connector, GrokIngestMapper, Gateway (Reverse Proxy), Solr/Solr Cloud, Visualization (Configurable Dashboards).]
![Page 10: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/10.jpg)
LucidWorks Open Source
• Logstash for Solr: https://github.com/LucidWorks/solrlogmanager
• Banana (Kibana for Solr): https://github.com/LucidWorks/banana
• Effortless AWS deployment and monitoring: http://www.github.com/lucidworks/solr-scale-tk
• Data Quality Toolkit: https://github.com/LucidWorks/data-quality
![Page 11: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/11.jpg)
Demos
![Page 12: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/12.jpg)
Fly the friendly skies
http://www.ibm.com/developerworks/library/j-solr-lucene/index.html
![Page 13: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/13.jpg)
Make $$$
• Leverage time series data and visualization using LucidWorks SiLK
• Monitor Social
• Traditional Research
https://github.com/lucidworks/lws-financial-demo
![Page 14: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/14.jpg)
Cure what ails you
![Page 15: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/15.jpg)
Space-Time Continuum
• Leverage Solr’s spatial capabilities to index non-spatial data, such as time ranges
- Useful for Open Hours, Shifts, etc.
• Query using rectangle intersections
- q = shift:"Intersects(0 19 23 365)"
https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/
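One way to read the rectangle query above, sketched in Python under stated assumptions: each shift is indexed as a 2D point (x = start, y = end), the four numbers are `minX minY maxX maxY`, and the "world" runs from 0 to 365. The `Shift` class and helper names are hypothetical; only the `shift` field name and the query string come from the slide.

```python
from dataclasses import dataclass

# Assumed world bounds for the time axis; the slide's query uses 0 and 365.
WORLD_MIN = 0
WORLD_MAX = 365


@dataclass(frozen=True)
class Shift:
    """A time range, treated as the 2D point (x=start, y=end)."""
    start: int
    end: int


def overlap_query(field, q_start, q_end):
    """Build a rectangle query matching ranges that overlap [q_start, q_end].

    A range overlaps the query window when start <= q_end and
    end >= q_start, i.e. when its point lies in the rectangle
    x in [WORLD_MIN, q_end], y in [q_start, WORLD_MAX].
    """
    return f'{field}:"Intersects({WORLD_MIN} {q_start} {q_end} {WORLD_MAX})"'


def overlaps(shift, q_start, q_end):
    """Pure-Python check of the same rectangle test, for illustration."""
    return (WORLD_MIN <= shift.start <= q_end
            and q_start <= shift.end <= WORLD_MAX)
```

Under this reading, `overlap_query("shift", 19, 23)` reproduces the slide's query string, and `overlaps(Shift(10, 20), 19, 23)` is true because that shift is still running at 19.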
![Page 16: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/16.jpg)
Signal Processing for Search and Discovery
• Signals power modern relevance
– Clicks, conversions, sharing, history, signatures
• LucidWorks 5 makes it easy to capture and leverage signals
– Recommendations, analytics, discovery
• Simplifies your data workflow
• Simplifies your operational footprint
![Page 17: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/17.jpg)
Solr Powered Signal Processing
• Use Case: eCommerce
• Data:
– Product catalog (~1.2M items)
– Click data (~3.9M clicks)
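As a rough illustration of how click data like this could become a relevance signal, the sketch below counts clicks per (query, product) pair and returns the most-clicked products for a query. It is not the LucidWorks implementation; the input shape (pairs of query string and product ID) and the function names are assumptions.

```python
from collections import Counter, defaultdict


def aggregate_clicks(clicks):
    """Count clicks per product for each normalized query.

    `clicks` is an iterable of (query, product_id) pairs, an assumed
    shape for the click log.
    """
    agg = defaultdict(Counter)
    for query, product_id in clicks:
        agg[query.strip().lower()][product_id] += 1
    return agg


def top_products(agg, query, n=3):
    """Return up to n product IDs with the most clicks for the query."""
    counts = agg.get(query.strip().lower(), Counter())
    return [product_id for product_id, _ in counts.most_common(n)]
```

The resulting counts could then feed a boost at query time or a recommendation list; which of those fits depends on the application.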
![Page 18: This Ain't Your Parents' Search Engine](https://reader030.vdocuments.net/reader030/viewer/2022032420/55a4e1251a28abb20e8b47a7/html5/thumbnails/18.jpg)
Meta
• http://www.lucidworks.com
– [email protected]
– @gsingers
• Sales
– Steve Drane (based here in Chicago)
– [email protected]
• Lucene/Solr Revolution
– Washington DC, Nov 11-14
– http://www.lucenerevolution.org