an introduction to distributed search with datastax enterprise search

32
TOO BIG TO FAIL An Introduction to Distributed Search with Cassandra and Solr OpenSource Connections @PatriciaGorla [email protected]

Upload: patricia-gorla

Post on 10-May-2015

2.684 views

Category:

Technology


1 download

DESCRIPTION

This is an overview of distributed search with Cassandra and Solr.

TRANSCRIPT

Page 1: An Introduction to Distributed Search with Datastax Enterprise Search

TOO BIG TO FAILAn Introduction to Distributed Search with Cassandra and Solr

OpenSource Connections@PatriciaGorla

[email protected]

Page 2: An Introduction to Distributed Search with Datastax Enterprise Search

ABOUT MESystems AnalystProgramming

Information Retrieval

Page 3: An Introduction to Distributed Search with Datastax Enterprise Search

Created at Facebook to power inbox search

Distributed data store run on commodity servers

Highly available

No one single point of failure

CASSANDRA

Page 4: An Introduction to Distributed Search with Datastax Enterprise Search

WHO USES CASSANDRA?

Page 5: An Introduction to Distributed Search with Datastax Enterprise Search

SEARCH + CASSANDRA, 1

• First implementation: Solandra (originally Lucandra)

• Replaced Lucene index with Cassandra column families

Page 6: An Introduction to Distributed Search with Datastax Enterprise Search

SEARCH + CASSANDRA, 2

•DataStax Enterprise Search

• Uses native Lucene index

• All data is retrieved from Cassandra

Page 7: An Introduction to Distributed Search with Datastax Enterprise Search

Datastax Enterprise Search Cluster

DistributedLinearly ScalableHighly AvailableEventually ConsistentFull-text searchAggregation

Page 8: An Introduction to Distributed Search with Datastax Enterprise Search

SETTING UP THE SCHEMA

• <fields>

• <field name="id" type="string" indexed="true" stored="true"/>

• <field name="name" type="text" indexed="true" stored="true"/>

• <field name="body" type="text" indexed="true" stored="true"/>

• <field name="title" type="text" indexed="true" stored="true"/>

• <field name="date" type="string" indexed="true" stored="true"/>

• </fields>

Page 9: An Introduction to Distributed Search with Datastax Enterprise Search

WRITING TO CLUSTER

•Write to either Cassandra clients or Solr API

•Write process is the same

• True atomic updates to Cassandra

Page 10: An Introduction to Distributed Search with Datastax Enterprise Search

Cassandra nodes are set up according to row-key hash.

Page 11: An Introduction to Distributed Search with Datastax Enterprise Search

Data can be written directly to Cassandra

Page 12: An Introduction to Distributed Search with Datastax Enterprise Search

Data is distributed according to row key hash and replication factor

Page 13: An Introduction to Distributed Search with Datastax Enterprise Search

DSE first writes to Cassandra

Page 14: An Introduction to Distributed Search with Datastax Enterprise Search

And then updates the secondary index on Solr

Page 15: An Introduction to Distributed Search with Datastax Enterprise Search

The quorum responds with success / failure

Page 16: An Introduction to Distributed Search with Datastax Enterprise Search

Data is now distributed evenly

Page 17: An Introduction to Distributed Search with Datastax Enterprise Search

READING FROM CLUSTER

• Read either Cassandra-side or through Solr API

• Cassandra: fast reads*

• Solr : full-text search

• Read direction affects performance

•Data is stored in Cassandra

Page 18: An Introduction to Distributed Search with Datastax Enterprise Search

Query is sent to node

Page 19: An Introduction to Distributed Search with Datastax Enterprise Search

Node uses gossip to find who has the information

Page 20: An Introduction to Distributed Search with Datastax Enterprise Search

QUERYING CASSANDRA

• Can query Solr or Cassandra directly

• Limited syntax with CQL, can use solr_query parameter

Page 21: An Introduction to Distributed Search with Datastax Enterprise Search

Querying Cassandra directly

Page 22: An Introduction to Distributed Search with Datastax Enterprise Search

Cassandra retrieves information from column

family

Page 23: An Introduction to Distributed Search with Datastax Enterprise Search

Querying Solr index

Page 24: An Introduction to Distributed Search with Datastax Enterprise Search

Row-key hashes are stored in Solr, and

Cassandra is queried for stored data

Page 25: An Introduction to Distributed Search with Datastax Enterprise Search

Cassandra node sends request to node with the

corresponding hash, returns information

Page 26: An Introduction to Distributed Search with Datastax Enterprise Search

Data is always synced

Page 27: An Introduction to Distributed Search with Datastax Enterprise Search

Both nodes respond with information

Page 28: An Introduction to Distributed Search with Datastax Enterprise Search

Updates can be committed and searched over in real time

Page 29: An Introduction to Distributed Search with Datastax Enterprise Search

PRODUCTION USE

•Will want a mix of analytics, search nodes

Page 30: An Introduction to Distributed Search with Datastax Enterprise Search

An OLTP - OLAP integrated solution

Page 31: An Introduction to Distributed Search with Datastax Enterprise Search

TRADEOFFS

• Changing the Solr schema requires reindex (standard for Solr)

•No multi-valued fields or composite columns