Introduction to Apache Solr
DESCRIPTION
Introduction to Solr, presented at the Bangkok meetup in April 2014: http://www.meetup.com/bkk-web/events/172090992/ Covers high-level use cases for Solr. Demos include support for the Thai language (with a GitHub link for the source). Has slides showcasing the Solr ecosystem as well as a couple of ideas for possible Solr-specific learning projects.
TRANSCRIPT
Introduction to Apache Solr
Software is eating the world
The search is eating the software
April 2014
Web search engines are quite sophisticated
3
4
But the real search needs are much DEEPER and BROADER
5
Searching code
6
Searching people and companies
7
Searching products
8
Searching library material
9
Searching languages
10
Understanding full-text search
SELECT * FROM database WHERE field LIKE '%word%'
This DOES NOT scale
Instead:
break text into tokens
domain-specific processing (e.g. lower-casing)
build fast-access structures
algorithms for term, phrase, and proximity search
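The steps on this slide can be sketched in a few lines of Python. This is only a toy illustration of the idea (tokenize, lower-case, build an inverted index with positions, then answer term and phrase queries), not how Lucene or Solr implement it. The sample documents and all function names are made up for the example, and proximity search is left out.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Break text into lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Map each token to {doc_id: [positions]} for fast lookup."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, token in enumerate(tokenize(text)):
            index[token].setdefault(doc_id, []).append(position)
    return index


def search_term(index, term):
    """Return the sorted ids of documents containing the term."""
    return sorted(index.get(term.lower(), {}))


def search_phrase(index, phrase):
    """Return the sorted ids of documents where the tokens appear consecutively."""
    tokens = tokenize(phrase)
    if not tokens:
        return []
    hits = []
    for doc_id, positions in index.get(tokens[0], {}).items():
        for start in positions:
            # Every following token must sit exactly one position further on.
            if all(start + offset in index.get(token, {}).get(doc_id, [])
                   for offset, token in enumerate(tokens)):
                hits.append(doc_id)
                break
    return sorted(hits)


# Hypothetical sample documents, for illustration only.
docs = {
    1: "Solr is a search server built on Lucene",
    2: "Lucene is a full-text search library",
    3: "Full text search does not scale with LIKE queries",
}
index = build_index(docs)
print(search_term(index, "Lucene"))         # [1, 2]
print(search_phrase(index, "text search"))  # [2, 3]
```

The lookup cost depends on how many documents contain the term, not on the total size of the collection, which is the main difference from scanning every row with LIKE.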
11
Basic search engine features
Search (Duh!): keyword, phrase, field-specific
Positive and negative terms
Sort: relevancy, recency
Pagination
Compact summary in results
SPEED
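As a rough illustration, several of these features map onto request parameters of a Solr select handler: `q` for the query (with `+` and `-` marking positive and negative terms), `sort`, `start` and `rows` for pagination, and `fl` to keep the returned summary compact. The sketch below only builds the URL and does not contact a server. The base URL, the `collection1` core name, and the `id` and `name` fields are assumptions borrowed from a default example setup and may not match your installation.

```python
from urllib.parse import urlencode


def build_query(required, excluded=(), field=None):
    """Build a query string with positive (+) and negative (-) terms."""
    prefix = field + ":" if field else ""
    parts = ["+" + prefix + term for term in required]
    parts += ["-" + prefix + term for term in excluded]
    return " ".join(parts)


def build_select_url(base_url, query, sort="score desc", page=1, page_size=10):
    """Build a select URL with sorting, pagination and a short field list."""
    params = {
        "q": query,
        "sort": sort,                     # relevancy by default
        "start": (page - 1) * page_size,  # offset of the first result
        "rows": page_size,                # results per page
        "fl": "id,name,score",            # assumed field names
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)


# Assumed local core URL; adjust to your own installation.
query = build_query(["ipod"], excluded=["belkin"], field="name")
url = build_select_url("http://localhost:8983/solr/collection1", query, page=3)
print(query)
print(url)
```

Sorting by recency would mean passing a date field to `sort` instead of `score`; which field that is depends on the schema.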
12
Advanced search engine features
Facet/taxonomy-based navigation with live counts
Language-specific processing
Domain-specific text processing (WiFi = Wi-Fi = WIFI)
Geographic search
More-like-this, did-you-mean, autocomplete
Scaling/Clustering
NOT web crawling - different, but related
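The WiFi = Wi-Fi = WIFI example above comes down to normalizing tokens the same way at index time and at query time. A minimal sketch of that idea follows; the `normalize_token` function is hypothetical, and in Solr itself this kind of processing is configured as an analyzer chain in the schema rather than written by hand.

```python
import re


def normalize_token(token):
    """Lower-case a token and drop hyphens and other non-word characters."""
    return re.sub(r"[\W_]+", "", token.lower())


variants = ["WiFi", "Wi-Fi", "WIFI", "wi-fi"]
print({normalize_token(v) for v in variants})  # {'wifi'}
```

Because every variant collapses to the same indexed form, a search for any one spelling matches documents that use the others.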
13
Search engine solutions?
Solr
Elasticsearch
Xapian
Sphinx
Zoie
Groonga
Searchdaimon
{F}lexSearch
Algolia (SaaS)
Searchify (SaaS)
ForageJS
Lunr.js
FACT-Finder
DtSearch
MarkLogic
Verity
Fast
Most databases
…AND MORE
14
Used with permission from SemaText
Open Source Search Evolution
15
Secret Ingredient - Lucene
Solr
Elasticsearch
Zoie
SwiftType
PyLucene (Python wrapper)
Lucene.NET (C# port)
Scalable, high-performance indexing
Incremental indexing
Full-text search
Information-Retrieval algorithms
Implemented in Java
Written in 1999, still going strong
16
Secret Ingredient - Solr
Certified distributions:
LucidWorks
HelioSearch
Big Data platforms:
Cloudera
Hortonworks HDP
Hosted and SaaS:
Amazon CloudSearch
WebSolr, SolrHQ, SearchBox
Lucene full-text search
XML and REST config
Schema/Schemaless
SolrCloud (clustering)
Caching
Near real-time
Rich-document indexing (Tika inside)
Plugins, components, processors
17
Solr Ecosystem sample
Drupal
Project Blacklight
LuxDB
SolrMeter
CrafterCMS
Typo3
Magento
HippoCMS
ColdFusion
SolrNet
DataStax
Dovecot
NGData Lily
Basho Riak
YaCy
Apache ManifoldCF
Apache Camel
Franz AllegroGraph
BitNami Solr Stack
Carrot2
Broadleaf Commerce
Cloudera CDK
CodeLibs Fess (フェス)
Splunk
Alfresco
Rosette by BasisTech
Luwak by Flax
Quepid by OSC
TwigKit
SPM by SemaText
SILK by LucidWorks
Banana (O/S Solr Kibana)
18
DEMO Time
19
DEMO - Basic
Unzip
Go to example directory
Run Solr
Import some documents from example docs
grep -l store *.xml | xargs ./post.sh
Show off Solr 4 admin panel
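For readers less familiar with the shell pipeline above: `grep -l store *.xml` lists the XML files whose contents mention "store", and `xargs` hands those file names to `post.sh`. The selection step can be sketched in Python as below. The `files_containing` function is a hypothetical helper written for this example, and the sketch only selects files; it does not post anything to Solr.

```python
from pathlib import Path


def files_containing(directory, word):
    """Return sorted names of *.xml files in directory whose text contains word."""
    matches = []
    for path in sorted(Path(directory).glob("*.xml")):
        if word in path.read_text(encoding="utf-8", errors="replace"):
            matches.append(path.name)
    return matches


if __name__ == "__main__":
    # Assumes the script runs from the directory holding the example XML files.
    print(files_containing(".", "store"))
```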
20
DEMO - Browse handler
Restart Solr with -Dsolr.clustering.enabled=true
Visit http://localhost:8983/solr/browse/
Show off:
Search
Facets - Categories and Ranges
Spatial/Geo-distance
Clusters
21
DEMO - Thai specific
Index Thai and English text
Search in English, Thai, Auto-transliterated Thai
Show Analysis screen
Code at: https://github.com/arafalov/solr-thai-test
22
Getting into Solr
23
Start for free
Download, unzip, cd example; java -jar start.jar
Go through the basic tutorial in docs/tutorial.html
Copy the example directory, modify schema.xml until happy
If coming from Elasticsearch, look at example-schemaless
Do NOT follow this path to production
The example schema is a kitchen sink!!!
24
Accelerate your learning
Buy my book - seriously. That’s what it’s for
All code/data is at: https://github.com/arafalov/solr-indexing-book
Buy Solr In Action - just published and is a great reference
Use my www.solr-start.com resource and join the mailing list
Join solr-user mailing list - full of advanced hackers
Watch Lucid Revolution videos for background
Start helping out on Stack Overflow #solr
Blog what you learned, tweet with #Solr
25
Pick a project - make it happen
Solr + Dart => Better search experience for Dart packages
Solr consultants discovery website
Visualise Solr search request - step by step
Solr + your language => is client library up to date?
ToDoMVC for Solr clients
Package LARGE dataset for others (e.g. Project Gutenberg)
Rebuild lernu.net Esperanto dictionary with Solr backend
26
With Solr, how far can I go?
Cloudera (Big Data) has over USD 1,000,000,000 in investments - opportunities?
8M+ searches/day, 40 languages, 100ms NRT, 1024 cores, 256 shards, 32 servers on #solr at Bloomberg http://bit.ly/1jmG72G (via @FlaxSearch)
27
Other Search-related books
Designing the Search Experience: The Information Architecture of Discovery - by a TwigKit creator +1
Search Analytics for Your Site: Conversations with Your Customers by Louis Rosenfeld - see also Quepid
Enterprise Search by Martin White
28