Solr 1.5 and Beyond
Yonik Seeley
May 11, 2010
NYC Lucene/Solr Meetup
Lucid Imagination, Inc.
Agenda
Lucene/Solr merge
Relevancy (Extended Dismax Parser)
Scalability (Solr Cloud)
Spatial/Geo Search
Near Real Time
Field Collapsing
Q&A
Lucene-Solr Merge
Lucene/Solr voted to merge (March 2010)
• Were already separate sub-projects of the Lucene TLP
• High committer overlap
• Solr had stopped using Lucene trunk/development versions
• Much code duplication
What it means
• Single set of committers
• Single developer mailing list ([email protected])
• Single Subversion trunk
• Keep separate downloads, user mailing lists
Lucene/Solr Development Changes
Nutch, Tika, Mahout spun off to their own TLPs
• Still may be considered part of the “Lucene Ecosystem”
Lucene/Solr development changes
• trunk is now always the next major release (currently 4.0)
• branch_3x will be the base for all 3.x releases
• No back-compat guarantees between major releases
Relevance
Extended Dismax Parser
Superset of dismax
• &defType=edismax&q=foo&qf=body
Fixes edge cases where dismax could still throw exceptions
• e.g. stray operators and unbalanced quotes: OR AND NOT - “
Full Lucene syntax support
• Tries Lucene syntax first
• Smart escaping is done if there are syntax errors
• Optionally supports treating “and”/“or” as AND/OR in Lucene syntax
Fielded queries (e.g. myfield:foo) even in degraded mode
• uf parameter controls what field names may be directly specified in “q”
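As a rough illustration of the request parameters above, here is a minimal Python sketch that builds an edismax query URL. The host and port come from the SolrCloud examples later in the deck; the /select handler path and the helper name edismax_url are assumptions for illustration only.

```python
from urllib.parse import urlencode

def edismax_url(base_url, query, query_fields):
    """Build a query URL that selects the extended dismax parser.

    base_url is assumed to point at a Solr select handler.
    """
    params = {
        "defType": "edismax",            # switch the query parser to edismax
        "q": query,                      # the raw user query
        "qf": " ".join(query_fields),    # fields to search, space separated
    }
    return base_url + "?" + urlencode(params)

# Hypothetical local Solr instance
url = edismax_url("http://localhost:8983/solr/select", "foo", ["body"])
print(url)
```

Because edismax falls back to escaping on syntax errors, raw user input can be passed as q without pre-validation on the client side.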
Extended Dismax Parser (continued)
boost parameter for multiplicative boost-by-function
Pure negative query clauses
• Example: solr OR (-solr)
Enhanced term proximity boosting
• pf2=myfield results in term bigrams in sloppy phrase queries
• myfield:“aa bb cc” -> myfield:“aa bb” myfield:“bb cc”
Enhanced stopword handling
• Stopwords omitted in main query, but added in optional proximity boosting part
• Example: q=solr is awesome & qf=myfield & pf2=myfield -> +myfield:(solr awesome) (myfield:“solr is” myfield:“is awesome”)
• Currently controlled by the absence of StopWordFilter in index analyzer, and presence in query analyzer
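The pf2 bigram expansion can be sketched in a few lines of Python. This only mimics the rewriting shown on the slide (whitespace tokenization, no analysis chain, no slop or boost); the function name pf2_bigram_phrases is hypothetical and this is not the parser's actual code.

```python
def pf2_bigram_phrases(field, text):
    """Turn a query string into per-bigram phrase clauses on one field."""
    words = text.split()
    # Pair each word with its successor: (w0, w1), (w1, w2), ...
    return ['{}:"{} {}"'.format(field, a, b) for a, b in zip(words, words[1:])]

phrases = pf2_bigram_phrases("myfield", "solr is awesome")
print(" ".join(phrases))
```

Note that the stopword "is" stays in the bigrams here, matching the slide's point that stopwords are kept in the optional proximity part even when dropped from the main query.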
Scalability
SolrCloud
First steps toward simplifying cluster management
Integrates ZooKeeper
• Central configuration (schema.xml, solrconfig.xml, etc.)
• Tracks live nodes + shards of collections
Removes need for external load balancers
• shards=localhost:8983/solr|localhost:8900/solr,localhost:7574/solr|localhost:7500/solr
Can specify logical shard ids
• shards=NY_shard,NJ_shard
Clients don’t need to know shards:
• http://localhost:8983/solr/collection1/select?distrib=true
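To make the shards syntax concrete, the sketch below parses the parameter value under the assumption that "," separates shards and "|" separates equivalent replicas of one shard, then picks one replica per shard at random. It is a client-side illustration only, not how Solr implements its internal load balancing; parse_shards and pick_replicas are hypothetical names.

```python
import random

def parse_shards(shards_param):
    """Split a shards value into a list of shards, each a list of replica URLs."""
    return [
        [replica.strip() for replica in shard.split("|")]
        for shard in shards_param.split(",")
    ]

def pick_replicas(shards, rng=random):
    """Choose one replica per shard (naive random load balancing)."""
    return [rng.choice(replicas) for replicas in shards]

shards = parse_shards(
    "localhost:8983/solr|localhost:8900/solr,"
    "localhost:7574/solr|localhost:7500/solr"
)
print(shards)
print(pick_replicas(shards))
```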
SolrCloud: The Future
Eliminate all single points of failure
Remove Master/Searcher distinction
• Enables near real-time search in a highly scalable environment
High Availability for Writes
• Eventual consistency model (like Amazon Dynamo, Cassandra)
Elastic
• Simply add/subtract servers, cluster will rebalance automatically
• By default, Solr will handle document partitioning
Spatial Search
PointType
• Generic improvement: polyField (single value -> multiple indexed fields)
• Compound values: 38.89,-77.03
• Range queries and exact matches supported
  • q=location:21.33,51.37
  • q=location:[10,20 TO 30,40]
Distance Functions
• Generic improvement: function queries can yield multiple values
• Haversine: hsin(3963.205, store, vector(10,20))
• Many possibilities, including boost by distance
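For reference, the haversine great-circle distance that hsin is named after can be sketched as follows. The first argument is the sphere radius; 3963.205 appears to be Earth's radius in miles. This sketch assumes coordinates in degrees, which may differ from the units the Solr function expects, and it is not Solr's implementation.

```python
import math

EARTH_RADIUS_MILES = 3963.205  # radius value used in the slide's hsin() call

def hsin(radius, lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    a = min(1.0, a)  # guard against rounding slightly above 1
    return 2 * radius * math.asin(math.sqrt(a))

# Distance in miles from the compound value on the slide to the point (10, 20)
print(hsin(EARTH_RADIUS_MILES, 38.89, -77.03, 10, 20))
```

The same value can drive sorting or a multiplicative boost, for example by boosting with the reciprocal of the distance.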
Spatial Search (continued)
Sorting by function query
• sort=hsin(3963.205,store,vector(10,20)) asc
Distance Filtering (SOLR-1568)
• fq={!sfilt fl=store_tiles}&pt=45.17614,-93.87341&d=1
• Implementations: trie range queries, spatial tiles, geohash
Return sort values or function query values for each doc
• FunctionQuery results as pseudo-fields (SOLR-1298)
• fl=field1,field2,{!func key=dist}hsin(…) ???
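Of the filtering implementations listed, geohash is the easiest to show in isolation. The sketch below is the standard public geohash encoding (interleaved longitude/latitude bits, base-32 alphabet), given as background on the technique; how Solr indexes or filters on geohashes is not shown here, and the function name is an assumption.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair (degrees) as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

# The pt value from the distance filter example on the slide
print(geohash_encode(45.17614, -93.87341, 6))
```

Nearby points usually share a geohash prefix, which is what makes prefix or range matching usable as a coarse distance filter.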
Near Real Time
Near Real-Time Search
Shorter times until updates are searchable/visible
Lucene 2.9 first laid the groundwork with per-segment searching
• Per-segment FieldCache entries for sorting and FunctionQueries
• NRT IndexWriter.getReader()
  • Make new segments available before merging is done in background
  • Doesn’t cause commit/fsync first
Solr still needs
• Per-segment faceting
• Per-segment caching
• Per-segment statistics (and anything else that uses FieldCache)
Existing single-valued faceting algorithm
q=Juggernaut&facet=true&facet.field=hero
[Diagram] Lucene FieldCache Entry (StringIndex) for the “hero” field
• order (for each doc, an index into the lookup array): 5 3 5 1 4 5 2 1
• lookup (the string values): (null), batman, flash, spiderman, superman, wolverine
Documents matching the base query “Juggernaut”: 0 2 7
For each matching document, look up its order value and increment that slot of the accumulator: 0 1 0 0 0 2
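The diagram's algorithm can be sketched in Python. The arrays below are one reading of the slide's values (order as eight single-digit entries, base query matching docs 0, 2 and 7); the function name is hypothetical and this is an illustration of the counting step, not Solr's code.

```python
def facet_counts(order, lookup, matching_docs):
    """Count facet values for matching docs using a StringIndex-style cache.

    order[doc] is an index into lookup; slot 0 means "no value".
    """
    accumulator = [0] * len(lookup)
    for doc in matching_docs:
        accumulator[order[doc]] += 1  # one array lookup and one increment per doc
    return accumulator

order = [5, 3, 5, 1, 4, 5, 2, 1]
lookup = [None, "batman", "flash", "spiderman", "superman", "wolverine"]
accumulator = facet_counts(order, lookup, [0, 2, 7])

# Translate non-zero slots back to terms, skipping the null slot
counts = {lookup[i]: n for i, n in enumerate(accumulator) if i > 0 and n > 0}
print(accumulator, counts)
```

The cost per request is proportional to the number of matching documents, but the order and lookup arrays cover the whole index, so any index change forces a full rebuild of the cache entry.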
Per-segment single-valued faceting algorithm
[Diagram] The Base DocSet is spread across Segment1 to Segment4, each with its own FieldCache Entry. One thread per segment (thread1 to thread4) does the lookup and increment into its own accumulator (accumulator1 to accumulator4). A FieldCache + accumulator merger (priority queue) combines the per-segment counts into the final result, e.g. batman, 3; flash, 5.
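A minimal Python sketch of the per-segment variant: each segment is counted independently (here in a thread pool), and the per-segment term counts, each already sorted by term, are merged with a heap-based merge standing in for the priority queue. The data layout and function names are assumptions for illustration, not Solr's implementation.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def count_segment(segment, matching_docs):
    """Return sorted (term, count) pairs for one segment."""
    order, lookup = segment
    accumulator = [0] * len(lookup)
    for doc in matching_docs:
        accumulator[order[doc]] += 1
    # lookup is sorted within a segment; slot 0 is the null entry
    return [(lookup[i], accumulator[i])
            for i in range(1, len(lookup)) if accumulator[i]]

def facet_per_segment(segments, matching_per_segment, threads=4):
    """Count each segment in parallel, then merge counts by term."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        per_segment = list(pool.map(count_segment, segments, matching_per_segment))
    merged = {}
    for term, count in heapq.merge(*per_segment):  # priority-queue style merge
        merged[term] = merged.get(term, 0) + count
    return merged

# Two hypothetical segments, each with its own (order, lookup) arrays
segments = [
    ([1, 2, 1], [None, "batman", "flash"]),
    ([2, 1, 2, 2], [None, "flash", "superman"]),
]
matching = [[0, 1, 2], [0, 1, 3]]  # matching doc ids, local to each segment
result = facet_per_segment(segments, matching)
print(result)
```

The merge step is needed because the same term can have a different index in each segment's lookup array, so accumulators cannot simply be added slot by slot.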
Per-segment faceting
Enable with facet.method=fcs
Controllable multi-threading
• facet.field={!threads=4}myfield
Disadvantages
• Larger memory use (FieldCaches + accumulators)
• Slower (extra FieldCache merge step needed)
Advantages
• Rebuilds FieldCache entries only for new segments (NRT friendly)
• Multi-threaded
Per-segment faceting performance comparison
Test index: 10M documents, 18 segments, single-valued field

A: Base DocSet=100 docs, facet.field on a field with 100,000 unique terms

| Time for request* | facet.method=fc | facet.method=fcs |
|---|---|---|
| static index | 3 ms | 244 ms |
| quickly changing index | 1388 ms | 267 ms |

B: Base DocSet=1,000,000 docs, facet.field on a field with 100 unique terms

| Time for request* | facet.method=fc | facet.method=fcs |
|---|---|---|
| static index | 26 ms | 34 ms |
| quickly changing index | 741 ms | 94 ms |

*complete request time, measured externally
Field Collapsing
Field collapsing
• Limit the number of results per category
• “category” defined by unique values in a field
Uses
• Web Search: collapse by web site
• Email threads: collapse by thread id
• Ecommerce/retail
  • Show the top 5 items for each store category (music, movies, etc.)
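The idea can be illustrated with a small Python sketch that walks an already ranked result list and keeps at most a fixed number of documents per distinct field value. The document shape and the name collapse are assumptions; Solr's actual field collapsing works inside the search component rather than on a finished result list.

```python
from collections import defaultdict

def collapse(ranked_docs, field, limit):
    """Keep at most `limit` docs per distinct value of `field`, preserving rank order."""
    seen = defaultdict(int)
    kept = []
    for doc in ranked_docs:
        key = doc.get(field)
        if seen[key] < limit:
            seen[key] += 1
            kept.append(doc)
    return kept

# Hypothetical web search results, best match first
docs = [
    {"id": 1, "site": "a.com"},
    {"id": 2, "site": "a.com"},
    {"id": 3, "site": "b.com"},
    {"id": 4, "site": "a.com"},
    {"id": 5, "site": "b.com"},
]
print([d["id"] for d in collapse(docs, "site", 1)])
```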
Field Collapsing by Site
Field Collapse on Product Type
Q&A