Solr Cluster with SolrCloud at Lucene Revolution (Tutorial)
DESCRIPTION
In this presentation we show how to build a highly available Solr cluster with Solr 4.1, using only Solr and a few bash scripts. The goal is a self-healing infrastructure built on cheap instances with ephemeral storage. We start with a comprehensive overview of the relationships between collections, Solr cores, shards, and cluster nodes. We continue with an introduction to Solr 4.x clustering using ZooKeeper, with particular emphasis on cluster state status/monitoring and Solr collection configuration. The core of the presentation is demonstrated on a live cluster. We show how to use cron and bash to monitor the state of the cluster and of its nodes. We then show how to extend this monitoring to auto-generate new nodes, attach them to the cluster, and assign them shards (choosing between missing shards or replication for HA). We show that with a high replication factor it is possible to keep shards on ephemeral storage without the risk of data loss, greatly reducing the cost and management burden of the architecture. Future work, which might be pursued as an open-source effort, includes monitoring the activity of individual nodes so as to scale the cluster according to traffic and usage.
TRANSCRIPT
Lucene Revolution 2013
SIMPLE & “CHEAP” SOLR CLUSTER
Stéphane Gamard, Searchbox CTO <[email protected]>
Searchbox - Search as a Service
“We are in the business of providing search engines on demand”
Solr Provisioning

High Availability
- Redundancy
- Sustained QPS
- Monitoring
- Recovery

Index Provisioning
- Collection creation
- Cluster resizing
- Node distribution
Solr Clustering

Before 4.x:
- Master/Slave
- Custom Routing
- Complex Provisioning

[Diagram: pre-4.x architecture - load balancers in front of master/slave/backup groups, with separate monitoring]
Solr Clustering

After 4.x:
- Nodes
- Automatic Routing
- Simple Provisioning

[Diagram: SolrCloud architecture - load balancers in front of interchangeable nodes, coordinated by a ZooKeeper ensemble, with monitoring]
Thank you to the SolrCloud Team !!!
What is SolrCloud?

Backward compatibility
- Plain old Solr (with Lucene 4.x)
- Same schema
- Same solrconfig
- Same plugins

Some plugins might need an update (distrib)
What is SolrCloud?

Centralized configuration
- /conf
  - /conf/schema.xml
  - /conf/solrconfig.xml
- numShards
- replicationFactor
- ...
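The centralized configuration above lives in ZooKeeper. As a minimal sketch, Solr 4.x ships a zkcli.sh tool (under example/cloud-scripts/) that can publish a config directory; the hosts, paths, and config name below are hypothetical placeholders for your environment:

```shell
# Push a collection's config (schema.xml, solrconfig.xml) to ZooKeeper.
# ZKHOST/CONFDIR/CONFNAME are placeholders - adjust to your cluster.
ZKHOST="zk1:2181,zk2:2181,zk3:2181"   # the ZooKeeper ensemble
CONFDIR="./pubmed/conf"               # contains schema.xml and solrconfig.xml
CONFNAME="pubmed-conf"                # the name nodes will reference

# Echoed as a dry run; drop the leading echo to execute from a Solr install:
echo ./example/cloud-scripts/zkcli.sh -cmd upconfig \
  -zkhost "$ZKHOST" -confdir "$CONFDIR" -confname "$CONFNAME"
```

Every node that joins the cluster then pulls this config by name instead of carrying its own copy on disk.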
What is SolrCloud?

Configuration & Architecture Agnostic Nodes
- ZK-driven configuration
- Shard (1 core)
- ZK-driven role:
  - Leader
  - Replica
- Peer & Replication
- Disposable
What is SolrCloud?

Automatic Routing
- Smart clients connect to ZK
- Any node can forward a request to a node that can process it
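To make the routing point concrete: the same query URL works against any live node, and the receiving node forwards the request to the shards that hold the data. The host and collection name below are hypothetical:

```shell
# With SolrCloud routing, any node in the cluster accepts this query and
# scatters/gathers across shards; "node3" and "mycoll" are placeholders.
q_url="http://node3:8983/solr/mycoll/select?q=*:*&rows=10"
echo "$q_url"   # curl "$q_url" against any live node returns merged results
```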
What is SolrCloud?

Collection API
- Abstraction level
- An index is a collection
- A collection is a set of shards
- A shard is a set of cores
- CRUD API for collections

“Collections represent a set of cores with identical configuration. The set of cores of a collection covers the entire index”
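As a sketch of that CRUD API: in Solr 4.x the Collection API is an HTTP endpoint that any live node answers. The host and collection name below are hypothetical placeholders:

```shell
# Solr 4.x Collection API over HTTP; any live node answers.
SOLR="http://localhost:8983/solr"   # placeholder host

# CREATE: 3 shards x 2 replicas = 6 cores, spread over the live nodes
create_url="$SOLR/admin/collections?action=CREATE&name=mycoll&numShards=3&replicationFactor=2&collection.configName=mycoll-conf"
# RELOAD: pick up a config change published to ZooKeeper
reload_url="$SOLR/admin/collections?action=RELOAD&name=mycoll"
# DELETE: drop the collection and all of its cores
delete_url="$SOLR/admin/collections?action=DELETE&name=mycoll"

echo "$create_url"   # curl each URL to execute the action
```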
What is SolrCloud?

Abstraction layers:
- Collection: abstraction level of interaction & config
- Shard: scaling factor for collection size (numShards)
- Core: scaling factor for QPS (replicationFactor)
- Node: scaling factor for cluster size (liveNodes)

=> SolrCloud is highly geared toward horizontal scaling
That’s SolrCloud

High Availability:
- Redundancy => # replicas
- Sustained QPS => # replicas & # shards
- Monitoring => ZK (clusterstate, live_nodes)
- Recovery => peer & replication
- Nodes => a single effort for scalability (across collections, shards, cores, and nodes)
SolrCloud - Design

Key metrics
- Collection size & complexity
- JVM requirement
- Node requirement
SolrCloud - Collection Metrics

Pubmed Index
- ~12M documents
- 7 indexed fields
- 2 TF fields
- 3 sorted fields
- 5 stored fields
A note on sharding: “The magic sauce of webscale”

[Chart: RAM requirement effect - RAM per shard (0-6000) vs. number of shards (0-12)]
A note on sharding: “The magic sauce of webscale”

[Chart: disk requirement effect - disk space per shard (0-50) vs. number of shards (0-16)]
SolrCloud - Collection Configuration

Pubmed Index
- ~12M documents
- 7 indexed fields
- 2 TF fields
- 3 sorted fields
- 5 stored fields

Configuration
- numShards: 3
- replicationFactor: 2
- JVM RAM: ~3G
- Disk: ~15G
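The configuration above implies the total cluster footprint. A back-of-envelope sketch (the per-core heap/disk figures are this talk's estimates for the Pubmed index, not universal constants):

```shell
# Sizing arithmetic for the slide's Pubmed configuration.
num_shards=3
replication_factor=2
jvm_per_core_gb=3     # talk's estimate per core
disk_per_core_gb=15   # talk's estimate per core

cores=$(( num_shards * replication_factor ))   # one core per shard replica
total_jvm=$(( cores * jvm_per_core_gb ))       # heap needed cluster-wide
total_disk=$(( cores * disk_per_core_gb ))     # disk needed cluster-wide
echo "${cores} cores, ~${total_jvm}G heap, ~${total_disk}G disk"
# prints: 6 cores, ~18G heap, ~90G disk
```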
SolrCloud - Core Sizing

Heuristically inferred from “experience”
- Size on shard, not collection
- Do NOT starve resources on nodes
- Settle for JVM/Disk sizing
- Large amount of spare disk (optimize)

RAM: 3 G    Disk: 60 G
SolrCloud - Cluster Availability

Depends on the nodes!!!

Instance      RAM (GB)  Disk (GB)  $/h   Cores/node  Min nodes  Cores  $/core/mo
m1.medium     3.75      410        0.12  1           6          6      87
m1.large      7.5       850        0.24  2           6          12     87
m1.xlarge    15        1690       0.48  5           6          30     70
m2.xlarge    17.1       420        0.41  5           6          30     60
m2.2xlarge   34.2       850        0.82  11          6          66     54
m1.medium     3.75      410        0.12  3           6          18     28
CCtrl (paas)  1.02      420        -     1           6          6      (75)
SolrCloud - Monitoring

Solr Monitoring
- clusterstate.json
- /live_nodes

Node Monitoring *
- load average
- core-to-resource consumption (core to CPU)
- collection-to-node consumption (LB logs)
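A minimal sketch of the cron-driven check over clusterstate.json: the helper counts replicas marked "down". The curl line assumes the Solr 4.x admin ZooKeeper servlet (/solr/zookeeper) and a hypothetical host; a stricter parse would use a real JSON tool.

```shell
# Count replicas in the "down" state from a clusterstate.json on stdin.
# Crude grep-based parse - a sketch, not a robust JSON parser.
count_down_replicas() {
  grep -o '"state":"down"' | wc -l
}

# In cron, e.g. every minute (host is a placeholder):
# curl -s "http://localhost:8983/solr/zookeeper?path=/clusterstate.json" \
#   | count_down_replicas
```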
SolrCloud - Provisioning

Stand-by nodes
- Automatically assigned as replicas
- Provide a metric of HA

Node addition * (self-healing)
- Scheduled check on cluster congestion
- Automatically spawn new nodes as needed
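The self-healing loop above can be sketched as a cron job: if the live-node count drops below a floor, decide how many replacements to spawn. The floor value and the spawn step are hypothetical placeholders; wire the decision to your cloud provider's instance-launch CLI.

```shell
# Self-healing decision for cron. MIN_NODES and the spawn step are
# placeholders - replace the echo with your provider's launch command.
MIN_NODES=6

heal() {
  live=$1   # current live node count (e.g. from ZK's /live_nodes children)
  if [ "$live" -lt "$MIN_NODES" ]; then
    echo "spawn $(( MIN_NODES - live )) node(s)"   # launch replacements here
  else
    echo "cluster ok"
  fi
}

heal 4   # prints: spawn 2 node(s)
```

New nodes pull their config from ZooKeeper and are assigned shards automatically, which is what makes them disposable and the loop safe to automate.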
SolrCloud - Conclusion

Using SolrCloud is like juggling
- Gets better with practice
- There is always some magic left
- Could become very overwhelming
- When it fails you lose your balls

Test -> Test -> Test -> some more tests -> Test
Next Steps

What would make our current SolrCloud cluster even more awesome:
- Balance/distribute cores based on machine load
- Standby cores (replicas not serving requests and auto-shutting down)
Further Information

Requirement for SolrCloud:
- Solr mailing list: [email protected]

Further information:
- Blog & feed: http://www.searchbox.com/blog/
- Searchbox email: [email protected]
CONFERENCE PARTY
The Tipsy Crow: 770 5th Ave
Starts after Stump The Chump
Your conference badge gets you in the door

TOMORROW
Breakfast starts at 7:30
Keynotes start at 8:30

CONTACT
Stéphane Gamard
[email protected]