Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine


Upload: lucidworks-archived

Post on 27-Jan-2015


Page 1: Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine

Confidential and Proprietary © Copyright 2013

This Ain’t Your Parents’ Search Engine

Grant Ingersoll, CTO, LucidWorks

Twitter: @gsingers

Page 2

Search is dead.

Page 3

Long live search

Page 4

Search is good for…

• Traditional: Fast, fuzzy text matching across a large document collection

• De-normalized data
  - “Light” relational

• Top N problems
  - Key-value (n=1)
  - Recommendations
  - “Good enough” classification, clustering

• Faceting, aggregations, analytical slicing and dicing of data

• Spatial, record/event linkage, alerting

http://cheezburger.com/5243950080

Page 5

Foundational Changes in Lucene/Solr 4

• Reduced memory usage
• Pluggable codecs/similarity
• FS(A|T)
• Doc Values (column oriented)
• Spatial upgrade
• New facets and functions
• Cursors (deep paging)
• Distributed capabilities
• Joins/Grouping
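
The cursor bullet refers to deep paging with Solr's `cursorMark` request parameter and the `nextCursorMark` value returned in each response. A minimal sketch of the paging loop follows, assuming the collection's uniqueKey field is `id`. The `fetch_page` callable is a hypothetical stand-in for whatever HTTP client sends the parameters to a `/select` handler and returns the parsed JSON; `make_fake_fetch` exists only so the sketch runs without a server, and it uses the last ID as the mark, whereas real cursor marks are opaque strings.

```python
from typing import Callable, Dict, Iterator, List


def iter_all_docs(fetch_page: Callable[[Dict[str, str]], dict],
                  query: str = "*:*", rows: int = 2) -> Iterator[dict]:
    """Yield every matching document by following Solr cursor marks."""
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {
            "q": query,
            "rows": str(rows),
            "sort": "id asc",  # the sort must include the uniqueKey field (assumed: id)
            "cursorMark": cursor,
        }
        response = fetch_page(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged mark means there are no more results
            return
        cursor = next_cursor


def make_fake_fetch(ids: List[str]) -> Callable[[Dict[str, str]], dict]:
    """Hypothetical in-memory stand-in for a Solr /select request."""
    ordered = sorted(ids)

    def fetch(params: Dict[str, str]) -> dict:
        mark = params["cursorMark"]
        remaining = ordered if mark == "*" else [i for i in ordered if i > mark]
        page = remaining[: int(params["rows"])]
        return {
            "response": {"docs": [{"id": i} for i in page]},
            "nextCursorMark": page[-1] if page else mark,
        }

    return fetch


docs = list(iter_all_docs(make_fake_fetch(["c", "a", "e", "b", "d"])))
print([d["id"] for d in docs])
```

Unlike `start`/`rows` paging, each request here carries only the mark from the previous page, so the cost of a page does not grow with how deep into the result set it is.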

Page 6

Search + Hadoop

• What’s Old is New Again

• “Traditional” use cases:
  - Build/store indexes
  - https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS

• Enrichment and signal processing
  - PageRank, Statistically Interesting Phrases, etc.

Page 7

LucidWorks + Hadoop

• Ingestion help
  - Flexible Map-Reduce content ingestion supporting:
    » Directory of files
    » CSV, Writable, etc.
    » LogStash
    » Build Your Own

• Pig Load/Store and UDFs
• Hive 2-way support
• http://www.lucidworks.com/search-for-hadoop/
  - Open source this summer

Page 8

[Diagram with three tiers: Data Sources, Connectors, Servers. Component labels: LucidWorks SiLK, LucidWorks Search, JDBC Connector, Web/File System Crawl, Data Warehouse, Hadoop Connectors, Clickstream Networking.]

Page 9

Search Analytics: Data Ingestion & Visualization

[Diagram. Component labels: Solr/Solr Cloud; Gateway (Reverse Proxy); Solr Output Writer for LogStash (HTTP); Search Logs; Visualization Configurable Dashboards; Hadoop Connector; GrokIngestMapper; LogStash.]

Page 10

LucidWorks Open Source

• Logstash for Solr: https://github.com/LucidWorks/solrlogmanager

• Banana (Kibana for Solr): https://github.com/LucidWorks/banana

• Effortless AWS deployment and monitoring: http://www.github.com/lucidworks/solr-scale-tk

• Data Quality Toolkit: https://github.com/LucidWorks/data-quality

Page 11

Demos

Page 12

Fly the friendly skies

http://www.ibm.com/developerworks/library/j-solr-lucene/index.html

Page 13

Make $$$

• Leverage time series data and visualization using LucidWorks SiLK

• Monitor Social

• Traditional Research

https://github.com/lucidworks/lws-financial-demo

Page 14

Cure what ails you

Page 15

Space-Time Continuum

• Leverage Solr’s spatial capabilities to index non-spatial data, such as time ranges
  - Useful for open hours, shifts, etc.

• Query using rectangle intersections
  - q = shift:"Intersects(0 19 23 365)"

https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/
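
One way to read the query on this slide, sketched under assumptions the slide does not state: each shift is indexed as a 2-D point (x = start, y = end), and a rectangle `minX minY maxX maxY` then selects shifts whose start is at most `maxX` and whose end is at least `minY`, that is, shifts overlapping the range from `minY` to `maxX`. The helper names, the sample shifts, and the 0 to 365 bounds of the time axis are illustrative only; the sketch runs in plain Python and does not call Solr.

```python
from typing import Dict, Tuple

Rect = Tuple[int, int, int, int]  # (min_x, min_y, max_x, max_y)

WORLD_MIN, WORLD_MAX = 0, 365  # assumed bounds of the time axis


def overlap_rectangle(query_start: int, query_end: int) -> Rect:
    """Rectangle matching every (start, end) point whose range overlaps the query."""
    # A range [start, end] overlaps the query when
    # start <= query_end and end >= query_start.
    return (WORLD_MIN, query_start, query_end, WORLD_MAX)


def to_solr_query(field: str, rect: Rect) -> str:
    """Format the rectangle in the Intersects(minX minY maxX maxY) form from the slide."""
    return '{}:"Intersects({} {} {} {})"'.format(field, *rect)


def matches(point: Tuple[int, int], rect: Rect) -> bool:
    """In-memory stand-in for the spatial filter: is the point inside the rectangle?"""
    x, y = point
    min_x, min_y, max_x, max_y = rect
    return min_x <= x <= max_x and min_y <= y <= max_y


# Hypothetical shifts, each stored as a (start, end) point.
shifts: Dict[str, Tuple[int, int]] = {
    "early": (6, 14),
    "late": (14, 22),
    "night": (22, 30),
    "next_day": (40, 48),
}

rect = overlap_rectangle(19, 23)
print(to_solr_query("shift", rect))
print(sorted(name for name, point in shifts.items() if matches(point, rect)))
```

Under this reading, a range-overlap test becomes a single rectangle intersection, which is the kind of query a spatial index is built to answer.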

Page 16

Signal Processing for Search and Discovery

• Signals power modern relevance
  – Clicks, conversions, sharing, history, signatures

• LucidWorks 5 makes it easy to capture and leverage signals
  – Recommendations, analytics, discovery

• Simplify your data workflow
• Simplify your operational footprint

Page 17

Solr Powered Signal Processing

• Use Case: eCommerce

• Data:
  – Product catalog (~1.2M items)
  – Click data (~3.9M clicks)
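
As a rough illustration of how click data could act as a relevance signal in an eCommerce setting, the sketch below aggregates raw click events into per-query, per-product counts and uses them to reorder a result list. The event layout, product IDs, and the `aggregate_clicks` and `rerank` helpers are hypothetical and are not taken from the demo; in a Solr-based setup the aggregated counts could instead be stored in the index and applied as boosts at query time.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def aggregate_clicks(events: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count clicks per product for each normalized query string."""
    signals: Dict[str, Counter] = {}
    for query, product_id in events:
        signals.setdefault(query.strip().lower(), Counter())[product_id] += 1
    return signals


def rerank(query: str, results: List[str], signals: Dict[str, Counter]) -> List[str]:
    """Move frequently clicked products up; ties keep the engine's original order."""
    clicks = signals.get(query.strip().lower(), Counter())
    # sorted() is stable, so products with equal click counts stay in engine order.
    return sorted(results, key=lambda product_id: -clicks[product_id])


# Hypothetical click log: (query, clicked product id)
events = [
    ("ipad", "p3"), ("iPad", "p3"), ("ipad ", "p1"),
    ("laptop", "p9"), ("ipad", "p3"),
]
signals = aggregate_clicks(events)
print(rerank("iPad", ["p1", "p2", "p3"], signals))
```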

Page 18

Meta

• http://www.lucidworks.com
  – [email protected]
  – @gsingers

• Sales
  – Steve Drane (based here in Chicago)
  – [email protected]

• Lucene/Solr Revolution
  – Washington DC, Nov 11-14
  – http://www.lucenerevolution.org