Using Elasticsearch and Couchbase Together to Build Large Scale Applications

Post on 26-Jun-2015

Category:

Technology

DESCRIPTION

Couchbase Server 2.0 allows for full-text search integration. In this webinar we examine how you can integrate your Couchbase Server 2.0 cluster with an Elasticsearch Cluster to provide enhanced querying capabilities and build large scale applications.

TRANSCRIPT

Using Elasticsearch and Couchbase Together to Build Large Scale Apps

Uri Boness, Founder, Elasticsearch

Dipti Borkar, Director, Products, Couchbase

Introduction to Elasticsearch

What is Elasticsearch?

• Open source, Apache 2 license
• Multi-tenant, real-time and distributed search & analytics engine
• Backed by Elasticsearch (the company)
• Proven technology in production: over 2 million downloads

What can Elasticsearch do?
• Unstructured search: find all companies in the “search” market
• Structured search: find all companies founded since 2000
• Analytics: find the average annual revenue of all companies
• Combine all: find the average annual revenue of all companies founded since 2000 within the “search” market
• All in (near) real-time!

Distributed & multi-tenant
• A node is a single Elasticsearch instance
• Multiple nodes can form a cluster
• A cluster can manage multiple indices
• A cluster is agile & self-managing: it continuously ensures that the distributed characteristics of all indices are maintained and that all nodes in the cluster are efficiently & effectively utilized

The Index

What’s in an index?
• An identified collection of documents
• Built & designed for small & large scales
  - data volumes: data can be split and distributed between shards
  - loads & HA: each shard can have zero or more replicas

Starting a node

[Diagram: a single node, node_1]

Creating our first index

curl -XPUT 'localhost:9200/companies' -d '{ "settings" : { "index" : { "number_of_shards" : 2, "number_of_replicas" : 1 } }}'

The two shards are allocated

[Diagram: node_1 holds shards 0 and 1]

Starting a second node

[Diagram: node_1 and node_2; shards 0 and 1]

Shards are relocating

[Diagram: node_1 and node_2; shards 0 and 1]

Replicas are allocated

[Diagram: node_1 and node_2 each hold a copy of shard 0 and a copy of shard 1]
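The allocation steps above follow from simple arithmetic on the index settings. As a minimal sketch (the helper names `index_settings` and `total_shard_copies` are made up for illustration and are not part of any Elasticsearch client), the following builds the same settings body and counts how many shard copies the cluster has to place:

```python
import json


def index_settings(number_of_shards: int, number_of_replicas: int) -> dict:
    """Build the settings body sent in the index-creation request."""
    return {
        "settings": {
            "index": {
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }


def total_shard_copies(settings: dict) -> int:
    """Each primary shard exists once, plus once per configured replica."""
    index = settings["settings"]["index"]
    return index["number_of_shards"] * (1 + index["number_of_replicas"])


body = index_settings(number_of_shards=2, number_of_replicas=1)
print(json.dumps(body))          # JSON body for the PUT request
print(total_shard_copies(body))  # 2 primaries * (1 + 1 replica) = 4
```

With 2 shards and 1 replica there are 4 shard copies to place, which matches the final state above where two nodes hold two shard copies each.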

Indexing Data

The data
• Documents are typically JSON formatted

curl -XPUT 'localhost:9200/companies/company/1' -d '{ "id" : "elasticsearch", "name" : "elasticsearch", "website" : "http://www.elasticsearch.com", "category" : "software", "overview" : "distributed search & analytics engine", "founded_year" : 2012, "location" : { "city" : "Amsterdam", "country_code" : "NL", "geo" : { "lat" : 52.370176, "lon" : 4.895008 } }}'

Sending the request to one of the nodes

[Diagram: a client and three nodes (node_1, node_2, node_3) holding the copies of shards 0 and 1]

Resolve the target shard, then index the document on the primary

Replicate to the replicas
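Resolving the target shard is a hash-and-modulo step: a routing value (the document id by default) is hashed and taken modulo the number of primary shards. The sketch below illustrates the idea only. `zlib.crc32` is a stand-in hash chosen for this example, not the hash function Elasticsearch uses, and `resolve_shard` is a hypothetical helper name.

```python
import zlib


def resolve_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Map a routing value to a primary shard number.

    zlib.crc32 is an assumption made for this sketch; the real hash differs,
    but the "hash modulo number of primaries" shape is the same.
    """
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


# The same id always resolves to the same shard, so any node can route it.
for doc_id in ("1", "2", "3"):
    print(doc_id, "-> shard", resolve_shard(doc_id, 2))
```

Because the result depends on the number of primary shards, that number is fixed when the index is created, while the replica count can change later.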

Searching

Unstructured search
• Using an extensive & powerful Query DSL

curl -XGET 'localhost:9200/companies/_search' -d '{ "query" : { "match" : { "overview" : "search" } }}'

Search for the term “search” in the “overview” field.

Structured search
• Narrows the “searchable” document space

curl -XGET 'localhost:9200/companies/company/_search' -d '{ "query" : { "filtered" : { "query" : { "match" : { "overview" : "search" } }, "filter" : { "range" : { "founded_year" : { "gte" : 2000 } } } } }}'

Only search companies that were founded since the year 2000.
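To make the division of labour concrete, here is a small in-memory sketch of what the filtered query expresses: the range filter first narrows the candidate set, then the match query is evaluated on what remains. It does not call Elasticsearch. The whitespace tokenisation is a simplification, and the second and third sample documents are invented for illustration.

```python
def matches_term(doc: dict, field: str, term: str) -> bool:
    """Very rough stand-in for a match query: whole-word, case-insensitive."""
    return term.lower() in doc.get(field, "").lower().split()


def filtered_search(docs, field, term, range_field, gte):
    """Apply the structured filter first, then the unstructured match."""
    candidates = [d for d in docs if d.get(range_field, 0) >= gte]
    return [d for d in candidates if matches_term(d, field, term)]


companies = [
    {"id": "elasticsearch", "founded_year": 2012,
     "overview": "distributed search & analytics engine"},
    # Hypothetical documents, not from the original slides:
    {"id": "oldsearch", "founded_year": 1995,
     "overview": "enterprise search appliance"},
    {"id": "acme-storage", "founded_year": 2005,
     "overview": "cloud storage service"},
]

hits = filtered_search(companies, "overview", "search", "founded_year", 2000)
print([d["id"] for d in hits])
```

Only the first document survives both steps: the 1995 company is removed by the filter, and the storage company does not match the term.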

Returned hits

{ ... "hits": [ { "_index": "companies", "_type": "company", "_id": "1", "_score": 0.13424811, "_source": { "id": "elasticsearch", "name": "elasticsearch", "website": "http://www.elasticsearch.com", "category": "software", "founded_year": 2012, "overview": "distributed search & analytics engine", "location": { "city": "Amsterdam", "country_code": "NL", "geo": { "lat": 52.370176, "lon": 4.895008 } } } } ] }}

Query DSL

Queries (unstructured)
• term queries
• boolean queries
• phrase (proximity) queries
• fuzzy/prefix/regexp/wildcards
• more...

Filters (structured)
• term (exact match)
• range
• boolean
• geo_* (e.g. geo_distance)

Analytics (a.k.a. facets)
• Slice & dice your data
• Compute aggregations over field values
• Across any index field/s
• All in (near) real-time

Used as a navigation aid or in analytics dashboards.

Elasticsearch is often used purely for analytics (without incorporating free text search).

Example
• Find the average revenue of all companies since 2000

curl -XGET 'localhost:9200/companies/revenues/_search' -d '{ "query" : { "match_all" : {} }, "facets" : { "revenue_stats" : { "date_histogram" : { "key_field" : "year", "value_field" : "value", "interval" : "year" } } }}'

Return a yearly breakdown of stats over companies’ revenues.

Response

"facets": { "revenue_stats": { "_type": "date_histogram", "entries": [ { "time": 956448895664, "mean": 23.0 }, { "time": 987984922557, "mean": 267.1034482758621 }, { "time": 1019520942098, "mean": 195.51724137931035 } ... ] } }

In each entry, “time” identifies the year (the first entry falls in the year 2000) and “mean” is the average revenue for that year.
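The facet above groups revenue documents by year and averages a value field within each group. A minimal in-memory sketch of that computation follows, assuming each revenue document carries a `year` field in epoch milliseconds and a numeric `value` field (the field names come from the request above; the sample values are invented).

```python
from collections import defaultdict
from datetime import datetime, timezone


def yearly_mean(revenues):
    """Group documents by calendar year of `year` and average `value`."""
    buckets = defaultdict(list)
    for doc in revenues:
        moment = datetime.fromtimestamp(doc["year"] / 1000, tz=timezone.utc)
        buckets[moment.year].append(doc["value"])
    return {year: sum(vals) / len(vals) for year, vals in sorted(buckets.items())}


# Made-up sample data; timestamps reuse two "time" values from the response.
sample = [
    {"year": 956448895664, "value": 20.0},
    {"year": 956448895664, "value": 26.0},
    {"year": 987984922557, "value": 100.0},
]

print(yearly_mean(sample))
```

Elasticsearch computes these buckets across shards at query time; the sketch only shows the grouping and averaging logic on a single list.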

Types of analytics
• terms: unique value counts
• range: statistics of a specific field for a set of range groups of another field
• statistical: stats over a specific field
• terms_stats: stats over a specific field for every unique field value
• date_/histogram: a breakdown of statistics of a specific field over a …

There’s much more
• Fine control of how documents are treated: indexed, stored, text analysis, relations
• Additional features
  - highlighting
  - suggest API (type ahead, auto-completion)
  - percolator (reverse search)
  - support of document relations (parent/child)
  - extensive geo-location search & analytics
  - more...

Introduction to Couchbase

Couchbase Server NoSQL Document Database

Couchbase Open Source Project

•  Leading NoSQL database project focused on distributed database technology and surrounding ecosystem
•  Supports both key-value and document-oriented use cases
•  All components are available under the Apache 2.0 Public License
•  Obtained as packaged software in both enterprise and community editions

Couchbase Server

•  Easy Scalability: grow cluster without application changes, without downtime, with a single click
•  Consistent High Performance: consistent sub-millisecond read and write response times with consistent high throughput
•  Always On 24x365: no downtime for software upgrades, hardware maintenance, etc.
•  Flexible Data Model: JSON document model with no fixed schema

Features in Couchbase Server 2.0
•  JSON support
•  Indexing and Querying
•  Cross data center replication
•  Incremental Map Reduce

Additional Features
•  Built-in clustering: all nodes equal
•  Data replication with auto-failover
•  Zero-downtime maintenance
•  Built-in managed cache
•  Append-only storage layer
•  Online compaction
•  Monitoring and admin API & UI
•  SDK for a variety of languages

Couchbase Server 2.0 Architecture

[Diagram: every node contains a Data Manager and a Cluster Manager.
Data Manager: Query Engine (Query API, port 8092); Moxi (port 11211, Memcapable 1.0); Memcached with the Couchbase EP Engine (port 11210, Memcapable 2.0); storage interface; New Persistence Layer.
Cluster Manager (Erlang/OTP): http REST management API/Web UI (HTTP 8091); Erlang port mapper (4369); Distributed Erlang (21100 - 21199).
Cluster Manager components, labelled “on each node” or “one per cluster” in the diagram: heartbeat, process monitor, global singleton supervisor, configuration manager, rebalance orchestrator, node health monitor, vBucket state and replication manager.]

Cross data center replication: Data flow

[Diagram: the App Server writes Doc 1 to a Couchbase Server node. The document lands in the Managed Cache and is then placed on the Replication Queue (to other node), the Disk Queue (to disk), and the XDCR Queue (to other cluster).]

Cross Datacenter Replication (XDCR)

Couchbase plug-in for Elasticsearch

How does it work?

Elasticsearch Integration (via XDCR)

[Diagram: a Couchbase cluster in the West Coast data center (Server 1, Server 2, Server 3, each holding documents in a RAM cache and on disk) feeds an Elasticsearch cluster (ES Server 1, ES Server 2, ES Server 3) through unidirectional Cross Data Center Replication. Each Elasticsearch server runs the Couchbase Transport Plugin.]

Install the Couchbase Plug-in
•  Prerequisite
   -  Existing Couchbase and Elasticsearch clusters
•  Install the Elasticsearch Couchbase Transport Plug-in
   -  bin/plugin -install couchbaselabs/elasticsearch-transport-couchbase/1.0.0-dp
•  Configure the Plug-in
   -  Set a password
   -  Install the Couchbase Index Template
•  Restart Elasticsearch
•  Create an Elasticsearch index for your documents

Configure Couchbase XDCR (step 1)

Configure Couchbase XDCR (step 2)

Documents are now indexed in Elasticsearch

Document count increasing

Reference Architecture

Recommended Usage Pattern

1. Elasticsearch Query
2. Elasticsearch Result
3. Couchbase Multi-GET
4. Couchbase Result
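The four steps above can be sketched as a single function: ask Elasticsearch which documents match, then fetch the full documents from Couchbase by id. This is a sketch under assumptions, not the plug-in's API. `search_then_fetch` is a hypothetical helper, and the two callables are in-memory stand-ins for an Elasticsearch search call and a Couchbase multi-get so that the example runs without any cluster.

```python
from typing import Callable, Dict, List


def search_then_fetch(
    query: str,
    es_search: Callable[[str], List[dict]],
    cb_multi_get: Callable[[List[str]], Dict[str, dict]],
) -> List[dict]:
    """Steps 1-2: query Elasticsearch for hits. Steps 3-4: multi-get from Couchbase."""
    hits = es_search(query)
    ids = [hit["_id"] for hit in hits]
    docs = cb_multi_get(ids)
    # Keep the search ranking order; skip ids that no longer exist.
    return [docs[doc_id] for doc_id in ids if doc_id in docs]


# In-memory stand-ins (assumptions for this sketch only).
FAKE_BUCKET = {
    "1": {"name": "elasticsearch", "category": "software"},
    "2": {"name": "couchbase", "category": "software"},
}


def fake_es_search(query: str) -> List[dict]:
    return [{"_id": key, "_score": 1.0}
            for key, doc in FAKE_BUCKET.items() if query in doc["name"]]


def fake_cb_multi_get(ids: List[str]) -> Dict[str, dict]:
    return {doc_id: FAKE_BUCKET[doc_id] for doc_id in ids if doc_id in FAKE_BUCKET}


print(search_then_fetch("couch", fake_es_search, fake_cb_multi_get))
```

The design keeps Elasticsearch responsible for finding and ranking ids, while Couchbase stays the source of truth for document bodies.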

Common Couchbase Use Cases

Social Gaming
•  Couchbase stores player and game data
•  Example customers include: Zynga, Tapjoy, Ubisoft, Tencent

Mobile Apps
•  Couchbase stores user info and app content
•  Example customers include: Kobo, Playtika

Ad Targeting
•  Couchbase stores user information for fast access
•  Example customers include: AOL, Mediamind, Convertro

Session store
•  Couchbase Server as a key-value store
•  Example customers include: Concur, Sabre

User Profile Store
•  Couchbase Server as a key-value store
•  Example customers include: Tunewiki

High availability cache
•  Couchbase Server used as a cache tier replacement
•  Example customers include: Orbitz

Content & Metadata Store
•  Couchbase document store with Elasticsearch
•  Example customers include: McGraw Hill

3rd party data aggregation
•  Couchbase stores social media and data feeds
•  Example customers include: Sambacloud

Real-world example: Couchbase + Elasticsearch

Use Case: Content and Metadata Store

Types of Data
•  Content metadata
•  Content: articles, text
•  Landing pages for website
•  Digital content: eBooks, magazine, research material

Application Requirements
•  Flexibility to store any kind of content
•  Fast access to content metadata (most accessed objects) and content
•  Full-text search across data set
•  Scales horizontally as more content gets added to the system

Why NoSQL and Couchbase
•  Fast access to metadata and content via object-managed cache
•  JSON provides schema flexibility to store all types of content and metadata
•  Indexing and querying provides real-time analytics capabilities across dataset
•  Integration with Elasticsearch for full-text search
•  Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows

McGraw Hill Education Labs Learning portal

Use Case: Content and metadata store

Building a self-adapting, interactive learning portal with Couchbase and Elasticsearch

The Problem

As learning moves online in great numbers, there is a growing need to build interactive learning environments that:
•  Scale to millions of learners
•  Serve MHE as well as third-party content, including open content
•  Support learning apps
•  Self-adapt via usage data

The Challenge

Backend is an Interactive Content Delivery Cloud that must:
•  Allow for elastic scaling under spike periods
•  Ability to catalog & deliver content from many sources
•  Consistent low-latency for metadata and stats access
•  Require full-text search support for content discovery
•  Offer tunable content ranking & recommendation functions

Experimented with a combination of:
•  XML Databases
•  SQL/MR Engines
•  In-memory Data Grids
•  Enterprise Search Servers

Techniques Used
•  Document Modeling
•  Metadata & Content Storage
•  View Querying to support Content Browsing
•  Elasticsearch Integration
   -  Content Updated in near Real-Time
   -  Search Content Summaries
   -  Relevancy boosted based on User Preferences
•  Real-Time Content Updates
•  Event Logging for offline analysis

Couchbase 2.0 + Elasticsearch

[Diagram with four numbered callouts]
•  Store full-text articles as well as document metadata for image, video and text content in Couchbase
•  Combine user preference statistics with custom relevancy scoring to provide personalized search results
•  Log user behavior to calculate user preference statistics (e.g. video > text)
•  Continuously accept updates from Couchbase with new content & stats

Data Model

Content Metadata Bucket
•  Stores content metadata for media objects and content for articles
•  Includes tags, contributors, type information
•  Includes pointer to the media

User Profiles Bucket
•  Stores user view details per type
•  Updated every time a user views a doc, with a running count
•  To be used for customizing ES search results per user preference

Content Stats Bucket
•  Stores content view details
•  Updated every time a document is viewed
•  To be used for boosting ES search results based on popularity
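As a rough illustration of how the two statistics buckets change when a document is viewed, the sketch below keeps a running count per content type in a user profile and a total view count in a content stats record. All field names (`views_by_type`, `views`, `doc_id`) are hypothetical; the slides do not show the actual document layout.

```python
from collections import Counter


def new_user_profile(user: str) -> dict:
    """Hypothetical shape of a document in the User Profiles bucket."""
    return {"user": user, "views_by_type": Counter()}


def new_content_stats(doc_id: str) -> dict:
    """Hypothetical shape of a document in the Content Stats bucket."""
    return {"doc_id": doc_id, "views": 0}


def record_view(profile: dict, stats: dict, doc_type: str) -> None:
    """Update both running counts for a single view event."""
    profile["views_by_type"][doc_type] += 1
    stats["views"] += 1


profile = new_user_profile("user@example.com")
stats = new_content_stats("379823")
record_view(profile, stats, "video")
record_view(profile, stats, "video")
print(dict(profile["views_by_type"]), stats["views"])
```

The per-type counts are what a user preference boost could be derived from, and the per-document count is what a popularity boost could be derived from.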

Calculating statistics via Couchbase

Couchbase Views

Top Contributors & Tags driven by Incremental MapReduce Views!
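A "top tags" statistic is a classic map/reduce count: the map step emits one `(tag, 1)` pair per tag, and the reduce step sums the pairs per key. Couchbase views are written in JavaScript and are maintained incrementally by the server; the Python below only illustrates the map and reduce logic on an in-memory list, and the sample documents are invented.

```python
from collections import Counter


def map_tags(doc: dict):
    """Map step: emit (tag, 1) for every tag on a content document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_counts(pairs) -> Counter:
    """Reduce step: sum the emitted values per key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def top_tags(docs, n: int):
    pairs = (pair for doc in docs for pair in map_tags(doc))
    return reduce_counts(pairs).most_common(n)


# Hypothetical content metadata documents.
content = [
    {"tags": ["math", "video"]},
    {"tags": ["math"]},
    {"tags": ["physics", "math"]},
]

print(top_tags(content, 1))
```

The same shape works for top contributors by emitting the contributor name instead of the tag.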

Tuning content ranking via Elasticsearch

Elasticsearch-driven, based on the settings below!

Content popularity boost!

User preference boost!

Analytics and Event Logging
•  Store full event log for offline analysis
•  Stored on a separate analytics cluster
•  Limit impact on OLTP
•  Tuned differently
•  Keep an upper-bound on data size via TTL (24 hrs)

Example event documents:

{
  "_id": "4ae5be2df3122f06ba45b70753001841",
  "_rev": "1-0013b349ffc3afc700000000068000000",
  "$flags": 0,
  "#expiration": 0,
  "type": "access",
  "user": "chris@gmail.com",
  "resource": "379823",
  "timestamp": "2012-09-02T22:46:07Z"
}

{
  "_id": "4ae5be2df3122f06ba45b70753001842",
  "_rev": "1-0013b349ffc3afc700000000068000000",
  "$flags": 0,
  "#expiration": 0,
  "type": "create",
  "user": "chris.tse@gmail.com",
  "resource": "948177",
  "timestamp": "2012-09-02T22:48:32Z"
}

The callouts on the slide map to the fields: What? (type), Who? (user), Which? (resource), When? (timestamp).
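The 24-hour TTL bounds the event log because every event disappears a fixed time after it is written. The sketch below models that bound in memory. In Couchbase the expiry is enforced by the server; here `expires_at` is a hypothetical field added only so the effect can be shown, and the user address is a placeholder.

```python
from datetime import datetime, timedelta, timezone

EVENT_TTL = timedelta(hours=24)  # upper bound on event lifetime, as on the slide


def make_event(event_type: str, user: str, resource: str, now: datetime) -> dict:
    """Build an event document shaped like the examples above."""
    return {
        "type": event_type,
        "user": user,
        "resource": resource,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "expires_at": now + EVENT_TTL,  # hypothetical field for this sketch
    }


def live_events(events, now: datetime):
    """Return only the events whose TTL has not elapsed yet."""
    return [event for event in events if event["expires_at"] > now]


start = datetime(2012, 9, 2, 22, 46, 7, tzinfo=timezone.utc)
event = make_event("access", "user@example.com", "379823", start)

print(len(live_events([event], start + timedelta(hours=23))))
print(len(live_events([event], start + timedelta(hours=25))))
```

At a steady event rate, the log never holds more than roughly 24 hours of events, which is what keeps the analytics cluster's data size bounded.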

User Preference Boost
•  Use Elasticsearch filter boosting

{ "filter": { "term": { "type": "video" } }, "boost": USER_VIDEO_PREFERENCE * PREFERENCE_SLIDER }

Document Popularity Boost
•  Use Elasticsearch custom script to score documents

"script": "_score * (((doc['popularity'].value + 1) / AVG_POPULARITY ) * POPULARITY_SLIDER)"

Combined Algorithm in a Query

"filters": [ { "filter": { "term": { "type": "video" } }, "boost": USER_VIDEO_PREFERENCE * PREFERENCE_SLIDER }, … image and texts filters omitted … ], "score_mode": "total" } }, "script": "_score * (((doc['popularity'].value + 1) / AVG_POPULARITY ) * POPULARITY_SLIDER)" }
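The arithmetic behind the combined query can be followed with a small worked example. This is a simplified model, assuming that the boosts of all matching filters are summed (`score_mode: total`), that the sum multiplies the text relevance score, and that the popularity script is then applied to the result. How Elasticsearch evaluates the query internally may differ in detail, and all numbers below are invented.

```python
def combined_score(query_score, doc, filter_boosts, avg_popularity, popularity_slider):
    """Simplified model of the preference boost followed by the popularity script."""
    # Preference boost: sum the boosts of every filter the document matches.
    matching = [boost for doc_type, boost in filter_boosts.items()
                if doc["type"] == doc_type]
    boosted = query_score * sum(matching) if matching else query_score
    # Popularity script: _score * (((popularity + 1) / AVG_POPULARITY) * POPULARITY_SLIDER)
    return boosted * (((doc["popularity"] + 1) / avg_popularity) * popularity_slider)


# Hypothetical per-type boosts, i.e. USER_<TYPE>_PREFERENCE * PREFERENCE_SLIDER.
boosts = {"video": 2.0, "image": 1.0, "text": 0.5}

video_doc = {"type": "video", "popularity": 9}
text_doc = {"type": "text", "popularity": 9}

print(combined_score(1.0, video_doc, boosts, avg_popularity=5.0, popularity_slider=1.0))
print(combined_score(1.0, text_doc, boosts, avg_popularity=5.0, popularity_slider=1.0))
```

With equal relevance and popularity, the video document scores four times higher than the text document for a user who prefers video, which is the personalization effect the sliders are meant to tune.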

The Learning Portal

•  Designed and built as a collaboration between MHE Labs and Couchbase

•  Serves as proof-of-concept and testing harness for Couchbase + Elasticsearch integration

•  Available for download and further development as open source code

https://github.com/couchbaselabs/learningportal

Q & A

Thank you

dipti@couchbase.com
uri.boness@elasticsearch.com