Keynote: Yonik Seeley & Steve Rowe, Lucene/Solr Roadmap
Lucene Roadmap
Steve Rowe
LucidWorks
Some Lucene (& Solr) History & Stats
• 1997: Doug Cutting creates Lucene
• 2000-2001: SourceForge hosts Lucene
• 2001-present: Lucene @ Apache Software Foundation
• 2006: Flexible indexing planning starts
• 2007: Solr graduates from the Apache Incubator to join the Lucene PMC as a sub-project
• 2008: Flexible indexing implementation begins
• 2010: Lucene and Solr development merge
• 2011: Lucene and Solr 3.1 released together; all further releases coordinated (13 joint releases so far)
• 2012: Lucene/Solr 4.0 released
Lucene 4.0 Highlights
• Flexible indexing: pluggable codecs (index format suites)
• Flexible scoring: more index statistics, and similarities that use them
• Faster multithreaded indexing via concurrent flushing (DocumentsWriterPerThread, DWPT)
• Doc values: typed, single-valued per-document fields for flexible sorting and scoring
• Norms are now doc values: you can have more than one byte!
• More RAM-efficient data structures, e.g. the terms dictionary/index and the FieldCache
• Faster search filtering
• Merge I/O can be rate-limited to reduce I/O contention
• IndexReader is now per-segment
• Completely reworked spatial search
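Rate-limiting merge I/O amounts to pacing writes: each chunk of bytes pushes a target time forward by (bytes / allowed rate), and the writer sleeps whenever it is ahead of that target. The Python sketch below models that pacing rule with an injected clock so it can be checked without actually sleeping. It is loosely patterned on our reading of Lucene's `RateLimiter.SimpleRateLimiter`; the class and method names here are hypothetical and are not Lucene API.

```python
class MergeRateLimiterSketch:
    """Hypothetical pacing model for rate-limited merge writes (not Lucene code)."""

    def __init__(self, mb_per_sec: float, start: float = 0.0):
        self.bytes_per_sec = mb_per_sec * 1024 * 1024
        self.last = start  # time by which all previously written bytes are "paid for"

    def pause(self, num_bytes: int, now: float) -> float:
        """Return how many seconds to sleep after writing num_bytes at time `now`."""
        target = self.last + num_bytes / self.bytes_per_sec
        # An idle writer must not bank unused bandwidth, so never fall behind `now`.
        self.last = max(target, now)
        return max(0.0, target - now)
```

At 1 MB/s, writing 1 MB twice in a row yields a one-second pause each time, while a writer that has been idle for a while gets no pause for a small write.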
Lucene 4.1 & 4.2 Highlights
• Seeks eliminated when writing out index files
• Compressed stored fields and term vectors
• AnalyzingSuggester and FuzzySuggester
• Lucene facet module improvements: speedups, NRT support, DrillSideways
• PostingsHighlighter: uses postings offsets
• CommonTermsQuery: speeds up queries containing very high-frequency terms
• Doc Values API and performance improvements
• The FST package supports FSTs over 2GB in size
• LiveFieldValues: real-time get for Lucene
• New classification module
Lucene 4.3 Highlights
• Major performance improvement for BooleanQuery with minShouldMatch
• SortingAtomicReader and SortingMergePolicy
• DocIdSetIterator and Scorer now have a cost API
• AnalyzingSuggester and FuzzySuggester can now record an arbitrary byte[] as a payload
• Spatial module: support for the query relations Within, Contains, and Disjoint
• Facet module: a new method computes facet counts using SortedSetDocValuesField, without a separate taxonomy index
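The point of the sorted-set approach is that each document stores only ordinals into a sorted dictionary of values, so counting is an array increment per ordinal and labels are looked up once at the end, with no taxonomy index involved. The Python sketch below simulates that idea with plain lists for a single segment. It is a conceptual model, not Lucene's facet API, and the `dim/value` labels are made up for illustration.

```python
from bisect import bisect_left


def build_sorted_set(docs: list) -> tuple:
    """Return (sorted unique values, per-document sorted ordinal lists)."""
    dictionary = sorted({v for values in docs for v in values})
    ords = [sorted({bisect_left(dictionary, v) for v in values}) for values in docs]
    return dictionary, ords


def facet_counts(dictionary: list, ords: list, hits: list) -> dict:
    """Count values over the matching doc ids by ordinal; resolve labels last."""
    counts = [0] * len(dictionary)
    for doc in hits:
        for ordinal in ords[doc]:
            counts[ordinal] += 1
    return {dictionary[o]: c for o, c in enumerate(counts) if c}
```

Because counting touches only integers, the label strings are never compared during the hot loop; that is what makes a separate taxonomy index unnecessary for this style of faceting.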
On the horizon
• More efficient positional queries
• Incremental field updates
• Korean Analyzer
Solr Dev/User Survey Results
Solr Developer/User survey, April 2013
• Survey invitation emailed to 4,136 people:
– LucidWorks training class attendees
– Revolution attendees
– LucidWorks webinar registrants
• 177 have responded so far
![Page 9: Keynote Yonik Seeley & Steve Rowe lucene solr roadmap](https://reader034.vdocuments.net/reader034/viewer/2022052310/554f6570b4c9058a148b4aa2/html5/thumbnails/9.jpg)
Please rank the following features by priority (Answered: 165, Skipped: 12)
![Page 10: Keynote Yonik Seeley & Steve Rowe lucene solr roadmap](https://reader034.vdocuments.net/reader034/viewer/2022052310/554f6570b4c9058a148b4aa2/html5/thumbnails/10.jpg)
![Page 11: Keynote Yonik Seeley & Steve Rowe lucene solr roadmap](https://reader034.vdocuments.net/reader034/viewer/2022052310/554f6570b4c9058a148b4aa2/html5/thumbnails/11.jpg)
![Page 12: Keynote Yonik Seeley & Steve Rowe lucene solr roadmap](https://reader034.vdocuments.net/reader034/viewer/2022052310/554f6570b4c9058a148b4aa2/html5/thumbnails/12.jpg)
![Page 13: Keynote Yonik Seeley & Steve Rowe lucene solr roadmap](https://reader034.vdocuments.net/reader034/viewer/2022052310/554f6570b4c9058a148b4aa2/html5/thumbnails/13.jpg)
![Page 14: Keynote Yonik Seeley & Steve Rowe lucene solr roadmap](https://reader034.vdocuments.net/reader034/viewer/2022052310/554f6570b4c9058a148b4aa2/html5/thumbnails/14.jpg)
More questions
1. How many attendees are Eclipse developers?
2. How many attendees are running Solr Cloud in production?
Solr: Past, Present & Future
Yonik Seeley
LucidWorks
Origins of Solr
• CNET was driven to find an alternative to a discontinued commercial enterprise search product
• Plan A: ATOMICS (Apache TO MySQL In CNET Search)
– Standalone server speaking XML over HTTP
– Meet the majority of “search” needs
– http://conferences.oreillynet.com/cs/mysqluc2005/view/e_sess/7066
• Plan B: “Something based on Lucene”
– Started Summer 2004
– First prototype called “Fusion”, later renamed SOLAR (Search On Lucene And Resin)
![Page 18: Keynote Yonik Seeley & Steve Rowe lucene solr roadmap](https://reader034.vdocuments.net/reader034/viewer/2022052310/554f6570b4c9058a148b4aa2/html5/thumbnails/18.jpg)
Origins of the first Solr admin UI
![Page 19: Keynote Yonik Seeley & Steve Rowe lucene solr roadmap](https://reader034.vdocuments.net/reader034/viewer/2022052310/554f6570b4c9058a148b4aa2/html5/thumbnails/19.jpg)
New admin UI
![Page 20: Keynote Yonik Seeley & Steve Rowe lucene solr roadmap](https://reader034.vdocuments.net/reader034/viewer/2022052310/554f6570b4c9058a148b4aa2/html5/thumbnails/20.jpg)
Timeline (up to 1.4)
(Timeline diagram; version markers 1.0, 1.1, 1.2, 1.3, 1.4, 3.1 and 4.0)
• Milestones: initial prototype, CNET production, CNET contributes Solr to ASF, Solr graduates from Incubator
• Features shown: simple faceting, replication, highlighting, dismax, spellchecking, CSV, Luke, MLT, Update Request Processors, QParsers, Search Components, multi-core, distributed search, Data Import Handler, JMX, Statistics Component, Java replication, Terms and TermVector Components, multi-select faceting, dynamic clustering
Solr 4
• Solr Cloud
– Distributed Indexing
– No single points of failure
– Near Real Time friendly (push replication)
• NoSQL feature set
– Update Durability
– Real-time get
– Atomic Updates
– Optimistic Concurrency
• Pseudo-join, pivot faceting, pseudo-fields, etc.
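The atomic update and optimistic concurrency features can be illustrated with a toy model. The Python sketch below keeps documents in a dict and applies Solr 4 style atomic operations (`set`, `add`, `inc`) together with `_version_` preconditions as we understand them: a version greater than 1 must match exactly, 1 means the document must exist, a negative value means it must not exist, and 0 or no version means no check. The class names, the starting version number and the in-memory storage are illustrative assumptions; real Solr enforces this server side and reports a conflict as HTTP 409.

```python
import itertools


class VersionConflict(Exception):
    """Raised when a _version_ precondition fails (Solr reports HTTP 409)."""


class ToySolrStore:
    """In-memory model of atomic updates plus optimistic concurrency (not Solr code)."""

    def __init__(self):
        self.docs = {}
        self._versions = itertools.count(1000)  # hypothetical version source

    def _check_version(self, doc_id, expected):
        current = self.docs.get(doc_id)
        if not expected:                 # None or 0: no constraint
            return
        if expected == 1:                # 1: document must exist
            if current is None:
                raise VersionConflict(f"{doc_id}: document must exist")
        elif expected < 0:               # negative: document must not exist
            if current is not None:
                raise VersionConflict(f"{doc_id}: document must not exist")
        elif current is None or current["_version_"] != expected:
            raise VersionConflict(f"{doc_id}: version mismatch")  # >1: exact match

    def update(self, update):
        doc_id = update["id"]
        self._check_version(doc_id, update.get("_version_"))
        fields = {k: v for k, v in update.items() if k not in ("id", "_version_")}
        atomic = any(isinstance(v, dict) for v in fields.values())
        # Atomic updates modify the stored document; plain updates replace it.
        doc = dict(self.docs.get(doc_id, {})) if atomic else {}
        doc["id"] = doc_id
        for name, value in fields.items():
            if not isinstance(value, dict):
                doc[name] = value
                continue
            op, arg = next(iter(value.items()))
            if op == "set":
                doc[name] = arg
            elif op == "add":
                old = doc.get(name, [])
                old = old if isinstance(old, list) else [old]
                doc[name] = old + (arg if isinstance(arg, list) else [arg])
            elif op == "inc":
                doc[name] = doc.get(name, 0) + arg
            else:
                raise ValueError(f"unsupported atomic op: {op}")
        doc["_version_"] = next(self._versions)
        self.docs[doc_id] = doc
        return doc["_version_"]
```

A client doing read-modify-write sends back the `_version_` it last saw; if another writer got there first, the update fails instead of silently overwriting.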
![Page 22: Keynote Yonik Seeley & Steve Rowe lucene solr roadmap](https://reader034.vdocuments.net/reader034/viewer/2022052310/554f6570b4c9058a148b4aa2/html5/thumbnails/22.jpg)
What search solution/version are you currently using?
Recent Enhancements
![Page 24: Keynote Yonik Seeley & Steve Rowe lucene solr roadmap](https://reader034.vdocuments.net/reader034/viewer/2022052310/554f6570b4c9058a148b4aa2/html5/thumbnails/24.jpg)
Document Routing
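(Diagram: a collection created with numShards=4 and router=compositeId; the 32-bit hash space is divided into four ranges, 00000000-3fffffff, 40000000-7fffffff, 80000000-bfffffff and c0000000-ffffffff, one for each of shard1 to shard4.)
• Indexing: id = BigCo!doc5 hashes (MurmurHash3) to 1f27 3c71; the first 16 bits come from the hash of the prefix “BigCo”, the last 16 bits from the hash of “doc5”
• Querying: q=my_query shard.keys=BigCo! hashes the prefix to the range 1f27 0000 to 1f27 ffff, so only shard1 needs to be searched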
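The compositeId scheme can be sketched in a few lines of Python. The sketch implements standard MurmurHash3 (x86, 32-bit, seed 0) over UTF-8 bytes and, assuming the simple two-part `prefix!id` form used on the slide, takes the top 16 bits from the prefix hash and the bottom 16 bits from the rest of the id. Hashes are kept unsigned to match the slide's hex notation (Solr itself works with signed 32-bit ints), the helper names are hypothetical, and we have not checked that it reproduces the exact value 1f27 3c71 shown above.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32, returned as an unsigned 32-bit integer."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h = seed & mask
    nblocks = len(data) // 4
    for i in range(nblocks):  # body: 4-byte little-endian blocks
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        h ^= (k * c2) & mask
        h = ((h << 13) | (h >> 19)) & mask
        h = (h * 5 + 0xE6546B64) & mask
    tail, k = data[4 * nblocks:], 0  # tail: remaining 1-3 bytes
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if tail:
        k ^= tail[0]
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        h ^= (k * c2) & mask
    h ^= len(data)  # finalization mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    return h ^ (h >> 16)


def composite_id_hash(doc_id: str) -> int:
    """Hash 'prefix!rest' as 16 bits of prefix hash + 16 bits of rest hash."""
    prefix, sep, rest = doc_id.partition("!")
    if not sep:
        return murmur3_32(doc_id.encode("utf-8"))
    high = murmur3_32(prefix.encode("utf-8")) & 0xFFFF0000
    return high | (murmur3_32(rest.encode("utf-8")) & 0x0000FFFF)


def shard_key_range(prefix: str) -> tuple:
    """Hash range searched for shard.keys=<prefix>!: every id sharing the prefix."""
    high = murmur3_32(prefix.encode("utf-8")) & 0xFFFF0000
    return high, high | 0x0000FFFF


# The four ranges from the slide, in unsigned hex notation.
SLIDE_RANGES = [(0x00000000, 0x3FFFFFFF), (0x40000000, 0x7FFFFFFF),
                (0x80000000, 0xBFFFFFFF), (0xC0000000, 0xFFFFFFFF)]


def ranges_hit(lo: int, hi: int) -> list:
    """Indexes of the slide ranges that overlap [lo, hi]."""
    return [i for i, (a, b) in enumerate(SLIDE_RANGES) if a <= hi and lo <= b]
```

Since every range boundary is a multiple of 0x10000, a 64K-wide prefix range can never straddle two shards here, which is why `shard.keys=BigCo!` needs to query just one of them.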
![Page 25: Keynote Yonik Seeley & Steve Rowe lucene solr roadmap](https://reader034.vdocuments.net/reader034/viewer/2022052310/554f6570b4c9058a148b4aa2/html5/thumbnails/25.jpg)
Seamless Online Shard Splitting
(Diagram: Shard1, Shard2 and Shard3, each with a leader and a replica; Shard2 is being split into sub-shards Shard2_0 and Shard2_1 while updates keep arriving at the Shard2 leader.)
1. New sub-shards are created in the “construction” state
2. The leader starts forwarding applicable updates, which are buffered by the sub-shards
3. The leader's index is split and installed on the sub-shards
4. The sub-shards apply the buffered updates, then become “active” leaders, and the old shard becomes “inactive”
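The four steps can be modelled as a small state machine. The Python sketch below is a single-process toy, not SolrCloud code: a parent shard's hash range is cut in half, updates that arrive while the sub-shards are in the “construction” state are buffered, and the buffers are replayed after a snapshot of the leader's index has been split and installed. The even midpoint split, the explicit snapshot argument and all names are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Toy shard: a hash range, a lifecycle state and its documents (id -> hash)."""
    name: str
    lo: int
    hi: int
    state: str = "active"
    docs: dict = field(default_factory=dict)
    buffer: list = field(default_factory=list)

    def covers(self, h: int) -> bool:
        return self.lo <= h <= self.hi


def start_split(parent: Shard) -> list:
    """Step 1: create two sub-shards in the 'construction' state."""
    mid = (parent.lo + parent.hi) // 2
    return [Shard(f"{parent.name}_0", parent.lo, mid, state="construction"),
            Shard(f"{parent.name}_1", mid + 1, parent.hi, state="construction")]


def leader_update(parent: Shard, subs: list, doc_id: str, h: int) -> None:
    """Step 2: the leader indexes the update and forwards it to the matching sub-shard."""
    parent.docs[doc_id] = h
    for sub in subs:
        if sub.state == "construction" and sub.covers(h):
            sub.buffer.append((doc_id, h))


def finish_split(parent: Shard, subs: list, snapshot: dict) -> None:
    """Steps 3 and 4: install the split index, replay buffers, switch states."""
    for sub in subs:
        sub.docs = {d: h for d, h in snapshot.items() if sub.covers(h)}
    for sub in subs:
        for doc_id, h in sub.buffer:  # replay is idempotent for docs already in the snapshot
            sub.docs[doc_id] = h
        sub.buffer.clear()
        sub.state = "active"
    parent.state = "inactive"
```

Buffering is what keeps the split online: updates that land after the snapshot is taken are not lost, because each sub-shard replays them before it turns active.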
Cloud Enhancements
• Request forwarding
– In a multi-collection cluster, any node can handle/forward requests for any collection
• Collection Aliases
– http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=northeast&collections=NY,NJ,PA,CT,ME,MA,NH,RI,VT
• Coming Soon: Shard Aliases
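The CREATEALIAS call is an ordinary HTTP GET against the Collections API. This Python sketch only builds the request URL with the standard library (it sends nothing); the helper name is hypothetical, and commas are deliberately left unescaped to mirror the slide.

```python
from urllib.parse import urlencode


def create_alias_url(base: str, alias: str, collections: list) -> str:
    """Build a Collections API CREATEALIAS request URL (does not send it)."""
    query = urlencode(
        {"action": "CREATEALIAS", "name": alias, "collections": ",".join(collections)},
        safe=",",  # keep the comma-separated collection list readable
    )
    return f"{base.rstrip('/')}/admin/collections?{query}"
```

Passing the result to `urllib.request.urlopen` would issue the request against a running cluster.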
Schema REST API
• Restlet is now integrated with Solr
• Get a specific field:
curl http://localhost:8983/solr/schema/fields/price
{"field":{
  "name":"price",
  "type":"float",
  "indexed":true,
  "stored":true }}
• Get all fields:
curl http://localhost:8983/solr/schema/fields
• Get the entire schema!
curl http://localhost:8983/solr/schema
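A client can read the same field definitions programmatically. The Python sketch below builds the slide's URLs and parses a response body like the one shown for `price`; `get_field` needs a running Solr at the assumed base URL, so only the pure helpers are exercised on their own. Helper names are hypothetical, and real responses may carry extra keys (such as a response header) that the parser simply ignores.

```python
import json
from urllib.request import urlopen


def schema_url(base: str, *parts: str) -> str:
    """Build a Schema REST API URL, e.g. schema_url(base, 'fields', 'price')."""
    return "/".join([base.rstrip("/"), "schema", *parts])


def parse_field(body: str) -> dict:
    """Extract the field definition from a /schema/fields/<name> response body."""
    return json.loads(body)["field"]


def get_field(base: str, name: str) -> dict:
    """Fetch one field definition (requires a reachable Solr server)."""
    with urlopen(schema_url(base, "fields", name)) as resp:
        return parse_field(resp.read().decode("utf-8"))
```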
Dynamic Schema
• Add a new field (Solr 4.4):
curl -XPUT http://localhost:8983/solr/schema/fields/strength -d '
{"type":"float", "indexed":"true"}
'
• Works in distributed (cloud) mode too!
• Future: more schemaless
– Reality: there is no such thing as truly schemaless for Lucene-based systems
– Type guessing for fields we haven’t seen before
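The add-field call can also be built from code. The Python sketch below constructs, but does not send, a PUT equivalent to the curl command above, using only the standard library; passing the result to `urllib.request.urlopen` would send it to a running server. The helper name and the explicit JSON `Content-Type` header are our assumptions, not something the slide specifies.

```python
import json
from urllib.request import Request


def add_field_request(base: str, name: str, definition: dict) -> Request:
    """Build (but do not send) a PUT that adds a field via the Schema REST API."""
    body = json.dumps(definition).encode("utf-8")
    return Request(
        f"{base.rstrip('/')}/schema/fields/{name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},  # assumed content type
    )
```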
Future
• Greater scalability
• More “NoSQL”
– More ways to update & manipulate documents
• Analytics
– More powerful faceting, functions, statistics
• Improved relational queries
• More dynamic (settings & configuration)
• Continued focus on ease of use
Thank You!