Building a relevance platform with Couchbase and Elasticsearch

Posted on 26-Jan-2015

DESCRIPTION

These slides are from my Goto Amsterdam presentation, in which I went into detail about how we are building a high-performance relevance platform at Hippo with Couchbase and Elasticsearch. The talk also covered why we chose Couchbase for storage and how Elasticsearch can be used for search and analytics, and I showed how we integrated both products and use them end to end from within our Hippo CMS product.

TRANSCRIPT

OneHippo @ Goto

follow the Hippo trail

Building a relevance platform with Couchbase and Elasticsearch

@jreijn | Hippo

#gotoams, June 18

About me

• Architect @ Hippo

• DevOps guy

• Blogger @ http://blog.jeroenreijn.com

About Hippo

Relevance?

“The capability of a search engine or function to

retrieve data appropriate to a user's needs.”

http://www.thefreedictionary.com/relevance

How we deliver relevant content @Hippo

Registration

Visitor - an entity making HTTP requests

Collector - records data about a visitor or their behavior

Example: location collector (GeoIPCollector)

Targeting Data - all data about a specific visitor

Example: IP address is located in Amsterdam
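The registration vocabulary maps onto a small interface: collectors inspect a request and contribute to the visitor's targeting data. The sketch below is a minimal illustration only. The `Collector` base class, its `collect(request)` method, the `register` helper and the stubbed IP-to-city table are hypothetical (not Hippo's API); a real GeoIPCollector would consult a GeoIP database.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Request:
    """Hypothetical minimal view of an HTTP request made by a visitor."""
    visitor_id: str
    remote_addr: str


class Collector:
    """Hypothetical base class: records data about a visitor or their behavior."""
    collector_id = "base"

    def collect(self, request: Request) -> Dict[str, object]:
        raise NotImplementedError


class GeoIPCollector(Collector):
    """Sketch of a location collector; the lookup table stands in for a GeoIP database."""
    collector_id = "geo"

    def __init__(self, ip_to_city: Dict[str, str]):
        self.ip_to_city = ip_to_city

    def collect(self, request: Request) -> Dict[str, object]:
        city = self.ip_to_city.get(request.remote_addr, "")
        return {"collectorId": self.collector_id, "city": city}


@dataclass
class TargetingData:
    """All data collected about one visitor, keyed by collector id."""
    visitor_id: str
    data: Dict[str, Dict[str, object]] = field(default_factory=dict)


def register(request: Request, collectors: List[Collector],
             store: Dict[str, TargetingData]) -> TargetingData:
    """Run every collector for the request and merge results into the visitor's targeting data."""
    targeting = store.setdefault(request.visitor_id, TargetingData(request.visitor_id))
    for collector in collectors:
        targeting.data[collector.collector_id] = collector.collect(request)
    return targeting


store: Dict[str, TargetingData] = {}
geo = GeoIPCollector({"203.0.113.7": "Amsterdam"})  # documentation IP, hypothetical mapping
td = register(Request("7a1c7e75", "203.0.113.7"), [geo], store)
print(td.data["geo"]["city"])  # Amsterdam
```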

Matching

Characteristic - a type of fact about visitors

Example: "comes from a city", "experiences a type of weather"

Target Group - the specification of a Characteristic

Example: "comes from a European city", "comes from Amsterdam"

Persona - one or more target groups that describe a certain type of visitor

Example: "Jim, the European urban consumer", "Alice, the pet owner"
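One way to read the matching vocabulary: a characteristic selects a fact from the targeting data, a target group fixes the value(s) that fact must have, and a persona combines target groups. The sketch assumes a simple scoring rule (the fraction of a persona's target groups the visitor matches) purely for illustration; `TargetGroup`, `Persona` and `score` are hypothetical names, not Hippo's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TargetGroup:
    """A specification of a characteristic: which fact to check and the accepted values."""
    characteristic: str   # hypothetical key into the targeting data, e.g. "city"
    accepted: frozenset   # e.g. frozenset({"Amsterdam"})

    def matches(self, facts: Dict[str, str]) -> bool:
        return facts.get(self.characteristic) in self.accepted


@dataclass
class Persona:
    """One or more target groups describing a type of visitor."""
    name: str
    groups: List[TargetGroup]

    def score(self, facts: Dict[str, str]) -> float:
        # Assumed rule: fraction of this persona's target groups that the visitor matches.
        matched = sum(1 for g in self.groups if g.matches(facts))
        return matched / len(self.groups)


european_cities = frozenset({"Amsterdam", "Berlin", "Paris"})
jim = Persona("Jim, the European urban consumer",
              [TargetGroup("city", european_cities),
               TargetGroup("channel", frozenset({"English Website"}))])
alice = Persona("Alice, the pet owner", [TargetGroup("interest", frozenset({"pets"}))])

facts = {"city": "Amsterdam", "channel": "English Website"}
scores = {p.name: p.score(facts) for p in (jim, alice)}
print(scores)  # {'Jim, the European urban consumer': 1.0, 'Alice, the pet owner': 0.0}
```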

What do we store?

Request log

Targeting data

Statistics

Averages, e.g. how many visitors became which persona
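Statistics such as "how many visitors became which persona" are aggregates over the request log. A rough sketch, assuming each entry in the request log's `personaIdScores` list is an object with hypothetical `id` and `score` fields (the example document later in the deck only shows an empty list), and counting each visitor once by their most recent request:

```python
from collections import Counter
from typing import Dict, List


def persona_share(request_log: List[dict]) -> Dict[str, float]:
    """Fraction of distinct visitors whose most recent request scored highest for each persona.

    Assumes personaIdScores entries look like {"id": ..., "score": ...} (hypothetical shape).
    """
    latest: Dict[str, dict] = {}
    for entry in request_log:
        prev = latest.get(entry["visitorId"])
        if prev is None or entry["timestamp"] >= prev["timestamp"]:
            latest[entry["visitorId"]] = entry

    counts: Counter = Counter()
    for entry in latest.values():
        scores = entry.get("personaIdScores", [])
        if scores:
            best = max(scores, key=lambda s: s["score"])
            counts[best["id"]] += 1

    total = len(latest)
    return {persona: n / total for persona, n in counts.items()} if total else {}


log = [
    {"visitorId": "a", "timestamp": 1, "personaIdScores": [{"id": "jim", "score": 0.5}]},
    {"visitorId": "a", "timestamp": 2,
     "personaIdScores": [{"id": "jim", "score": 1.0}, {"id": "alice", "score": 0.2}]},
    {"visitorId": "b", "timestamp": 3, "personaIdScores": [{"id": "alice", "score": 0.9}]},
    {"visitorId": "c", "timestamp": 4, "personaIdScores": []},
]
print(persona_share(log))
```

Visitors without any persona score (visitor `c` above) still count towards the total, so the shares answer "what fraction of all visitors became persona X".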

Real-time analysis

Architecture

Diagram: an App server running the Hippo Delivery Tier and the Hippo Repository, backed by an RDBMS; output formats XML, JSON and (X)HTML.

Delivery Tier

Diagram: Request → URL Matching → Fetch content → Compose output → Response

Delivery Tier

Diagram: the same request pipeline (Request, URL Matching, Fetch content, Compose output, Response) extended with two steps: Targeting Data Collection and Scoring.
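The relevance-enabled Delivery Tier adds Targeting Data Collection and Scoring to the URL Matching, Fetch content and Compose output steps. A minimal sketch of such a pipeline, assuming one plausible ordering (collect and score before fetching, so the scores can influence which content variant is fetched); all step functions, the sitemap, the persona ids and the variant table are hypothetical.

```python
from typing import Callable, Dict, List

# Each step receives and returns a mutable request context (a plain dict in this sketch).
Step = Callable[[dict], dict]


def url_matching(ctx: dict) -> dict:
    # Hypothetical sitemap: map the request path to a content item id.
    ctx["content_id"] = {"/news": "news-overview"}.get(ctx["path"], "not-found")
    return ctx


def collect_targeting_data(ctx: dict) -> dict:
    ctx["targeting"] = {"city": ctx.get("city", "")}
    return ctx


def scoring(ctx: dict) -> dict:
    ctx["persona"] = "european-urban" if ctx["targeting"]["city"] == "Amsterdam" else "default"
    return ctx


def fetch_content(ctx: dict) -> dict:
    # Hypothetical per-persona variants of the matched content item.
    variants: Dict[str, str] = {"european-urban": "News for Amsterdam", "default": "News"}
    ctx["content"] = variants[ctx["persona"]]
    return ctx


def compose_output(ctx: dict) -> dict:
    ctx["response"] = f"<h1>{ctx['content']}</h1>"
    return ctx


PIPELINE: List[Step] = [url_matching, collect_targeting_data, scoring,
                        fetch_content, compose_output]


def handle(request: dict) -> str:
    ctx = dict(request)
    for step in PIPELINE:
        ctx = step(ctx)
    return ctx["response"]


print(handle({"path": "/news", "city": "Amsterdam"}))  # <h1>News for Amsterdam</h1>
```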

Scaling

Scaling out

Diagram: two app servers, each running a Hippo Delivery Tier and a Hippo Repository, sharing one RDBMS.

Scaling out

Diagram: the same two app servers (Delivery Tier and Repository) sharing the RDBMS, plus a separate TargetingDatastore.
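A likely motivation for a separate TargetingDatastore when scaling out: behind a load balancer, consecutive requests from one visitor can land on different app servers, so targeting data held in each server's memory ends up fragmented. The toy simulation below assumes round-robin balancing and uses plain dicts for both the per-node stores and the shared store; it only illustrates the problem, not any particular product.

```python
from itertools import cycle
from typing import Dict, List


class AppServer:
    """Toy app server that counts page views per visitor in whatever store it is given."""

    def __init__(self, store: Dict[str, int]):
        self.store = store

    def handle(self, visitor_id: str) -> int:
        self.store[visitor_id] = self.store.get(visitor_id, 0) + 1
        return self.store[visitor_id]


def simulate(servers: List[AppServer], requests: int, visitor_id: str = "v1") -> int:
    """Send one visitor's requests round-robin; return the view count seen on the last request."""
    balancer = cycle(servers)
    seen = 0
    for _ in range(requests):
        seen = next(balancer).handle(visitor_id)
    return seen


# Per-node state: each server only sees part of the visitor's history.
local = simulate([AppServer({}), AppServer({})], requests=4)

# Shared targeting datastore: both servers read and write the same store.
shared_store: Dict[str, int] = {}
shared = simulate([AppServer(shared_store), AppServer(shared_store)], requests=4)

print(local, shared)  # 2 4
```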

What kind of ‘storage’?

Distributed Cache?

We have a winner!

Requirements change!

NoSQL to the rescue

Suitable types

• Key-value store

• Document database

Assessment Criteria

• Maturity

• Data model

• Consistency model

• Performance

• Replication

• Caching model

• Query model

• Monitoring

• Scalability

• Reliability

• Support

Selection Criteria

• Performance!

• Scalability

• Schema flexibility

• Simplicity

• Monitoring

• Support

Performance!!

Scalability

Schema flexibility

{
  "visitorId": "7a1c7e75-8539-40",
  "pageUrl": "http://localhost:8080/site/news",
  "pathInfo": "/news",
  "remoteAddr": "127.0.0.1",
  "referer": "http://localhost:8080/site/",
  "timestamp": 1371419505909,
  "collectorData": {
    "geo": { "country": "", "city": "", "latitude": 0, "longitude": 0 },
    "returningvisitor": false,
    "channel": "English Website"
  },
  "personaIdScores": [],
  "globalPersonaIdScores": []
}

Request log document
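A request log document like the one above can be assembled directly from request attributes. A sketch using only the standard `json`, `time` and `uuid` modules; the helper name `request_log_document` and its parameters are hypothetical, and the field names simply mirror the example document (including `timestamp` in milliseconds since the epoch).

```python
import json
import time
import uuid
from typing import Optional


def request_log_document(page_url: str, path_info: str, remote_addr: str, referer: str,
                         collector_data: dict, visitor_id: Optional[str] = None,
                         timestamp_ms: Optional[int] = None) -> str:
    """Serialize one request-log entry with the same fields as the example document."""
    doc = {
        "visitorId": visitor_id or str(uuid.uuid4()),
        "pageUrl": page_url,
        "pathInfo": path_info,
        "remoteAddr": remote_addr,
        "referer": referer,
        # Milliseconds since the epoch, like the example's 1371419505909.
        "timestamp": timestamp_ms if timestamp_ms is not None else int(time.time() * 1000),
        "collectorData": collector_data,
        "personaIdScores": [],
        "globalPersonaIdScores": [],
    }
    return json.dumps(doc)


doc = json.loads(request_log_document(
    "http://localhost:8080/site/news", "/news", "127.0.0.1", "http://localhost:8080/site/",
    {"geo": {"country": "", "city": "", "latitude": 0, "longitude": 0},
     "returningvisitor": False, "channel": "English Website"},
    visitor_id="7a1c7e75-8539-40", timestamp_ms=1371419505909))
print(doc["collectorData"]["channel"])  # English Website
```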

{
  "geo": { "collectorId": "geo", "city": "", "country": "", "latitude": 0, "longitude": 0 },
  "channel": {
    "collectorId": "channel",
    "channels": [ "English Website" ],
    "lastVisitedChannel": "English Website"
  }
}

Visitor document
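The visitor document is keyed by collector id, so each request can update it with a per-collector merge. The sketch below folds the channel collector's output into it. The merge rules (append a newly seen channel, overwrite `lastVisitedChannel`) are an assumption inferred from the field names, the helper `merge_channel` is hypothetical, and "Dutch Website" is a made-up channel name.

```python
import copy


def merge_channel(visitor_doc: dict, channel: str) -> dict:
    """Return a copy of the visitor document with the channel collector's data updated."""
    doc = copy.deepcopy(visitor_doc)  # leave the caller's document untouched
    entry = doc.setdefault("channel", {"collectorId": "channel", "channels": []})
    if channel not in entry["channels"]:
        entry["channels"].append(channel)
    entry["lastVisitedChannel"] = channel
    return doc


visitor = {
    "geo": {"collectorId": "geo", "city": "", "country": "", "latitude": 0, "longitude": 0},
    "channel": {"collectorId": "channel", "channels": ["English Website"],
                "lastVisitedChannel": "English Website"},
}
updated = merge_channel(visitor, "Dutch Website")
print(updated["channel"]["channels"])  # ['English Website', 'Dutch Website']
```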

Simplicity

Monitoring

Support

Couchbase

Why Couchbase?

• Drop-in replacement for memcached

• Read/Write-through cache

• High throughput

• Easy scalability

• Schema flexibility

• Low latency
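"Read/write-through cache" means the application talks only to the cache layer, which loads misses from the backing store and writes updates through to it. A minimal in-memory sketch of that pattern, with a dict standing in for the persistent store; it illustrates the pattern only and is not Couchbase's internals or client API. The `visitor::N` key format is hypothetical.

```python
from typing import Callable, Dict, Optional


class ReadWriteThroughCache:
    """Cache that loads misses from, and writes updates through to, a backing store."""

    def __init__(self, load: Callable[[str], Optional[dict]], store: Callable[[str, dict], None]):
        self._load = load
        self._store = store
        self._cache: Dict[str, dict] = {}
        self.misses = 0

    def get(self, key: str) -> Optional[dict]:
        if key not in self._cache:
            self.misses += 1
            value = self._load(key)  # read-through on a miss
            if value is None:
                return None
            self._cache[key] = value
        return self._cache[key]

    def put(self, key: str, value: dict) -> None:
        self._store(key, value)  # write-through: persist first ...
        self._cache[key] = value  # ... then update the cached copy


backing: Dict[str, dict] = {"visitor::1": {"city": "Amsterdam"}}
cache = ReadWriteThroughCache(backing.get, backing.__setitem__)
cache.get("visitor::1")
cache.get("visitor::1")  # served from the cache, no second miss
cache.put("visitor::2", {"city": "Berlin"})
print(cache.misses, backing["visitor::2"])  # 1 {'city': 'Berlin'}
```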

Couchbase

• Open Source

• Document-oriented

• Easily scalable

• Consistent High Performance

Performance

• Object-managed cache

• Write queue to disk

• Avoids a cold cache
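The three performance bullets describe one design idea: writes are acknowledged from memory and queued for disk, and after a restart memory is warmed from disk so clients do not hit a cold cache. A toy model of that idea, with a JSON file standing in for the disk and an explicit `flush()` in place of a background writer; the class and file name are hypothetical, and Couchbase's actual persistence engine is far more involved.

```python
import json
import os
import tempfile
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class WriteQueuedStore:
    """Toy memory-first store: writes hit RAM immediately and are persisted later from a queue."""

    def __init__(self, path: str):
        self.path = path
        self.memory: Dict[str, dict] = {}
        self.queue: Deque[Tuple[str, dict]] = deque()
        self._warm_up()

    def _warm_up(self) -> None:
        # Load persisted data back into memory so a restart does not start with a cold cache.
        if os.path.exists(self.path):
            with open(self.path) as f:
                self.memory.update(json.load(f))

    def set(self, key: str, value: dict) -> None:
        self.memory[key] = value  # acknowledged from memory
        self.queue.append((key, value))  # persisted later (here: on flush)

    def get(self, key: str) -> Optional[dict]:
        return self.memory.get(key)

    def flush(self) -> None:
        on_disk: Dict[str, dict] = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                on_disk = json.load(f)
        while self.queue:
            key, value = self.queue.popleft()
            on_disk[key] = value
        with open(self.path, "w") as f:
            json.dump(on_disk, f)


path = os.path.join(tempfile.mkdtemp(), "bucket.json")
store = WriteQueuedStore(path)
store.set("visitor::1", {"city": "Amsterdam"})
store.flush()
restarted = WriteQueuedStore(path)  # simulated restart: memory is warmed from disk
print(restarted.get("visitor::1"))  # {'city': 'Amsterdam'}
```

The trade-off is visible in the model: a write that is still in the queue when the process dies is lost, which is the price of acknowledging writes from memory.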

Easily scalable

• Auto-sharding

• Cross datacenter replication (XDCR)

• Master-master replication
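Auto-sharding in Couchbase hashes each document key to one of a fixed number of vBuckets and maps vBuckets to nodes, so clients route a key with the map alone and a rebalance moves vBuckets, not individual keys. A simplified sketch using the standard library's CRC32. Treat the details as assumptions: real clients derive the vBucket from the CRC32 value slightly differently, the vBucket count is assumed to be 1024, and the round-robin initial map and greedy rebalance are illustrative (replicas are ignored).

```python
import zlib
from collections import Counter
from typing import List

NUM_VBUCKETS = 1024  # assumed vBucket count for this sketch


def vbucket_for(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Map a document key to a vBucket with CRC32 (simplified)."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def initial_map(nodes: List[str], num_vbuckets: int = NUM_VBUCKETS) -> List[str]:
    """vBucket -> node map, assigned round-robin."""
    return [nodes[vb % len(nodes)] for vb in range(num_vbuckets)]


def add_node(vbucket_map: List[str], new_node: str) -> List[str]:
    """Simplified rebalance: move just enough vBuckets to the new node to even out the load."""
    target = len(vbucket_map) // (len(set(vbucket_map)) + 1)
    new_map = list(vbucket_map)
    counts = Counter(new_map)
    for vb, owner in enumerate(vbucket_map):
        if counts[new_node] >= target:
            break
        if counts[owner] > target:
            new_map[vb] = new_node
            counts[owner] -= 1
            counts[new_node] += 1
    return new_map


def node_for(key: str, vbucket_map: List[str]) -> str:
    # The key -> vBucket step never changes; only the vBucket -> node map does.
    return vbucket_map[vbucket_for(key, len(vbucket_map))]


before = initial_map(["cb1", "cb2", "cb3"])
after = add_node(before, "cb4")
keys = [f"visitor::{i}" for i in range(1000)]
moved = [k for k in keys if node_for(k, before) != node_for(k, after)]
print(len(moved), "of", len(keys), "keys moved, all of them to cb4")
```

With the assumed 1024 vBuckets, adding a fourth node moves 256 vBuckets, roughly a quarter of the keys, instead of reshuffling everything.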

Flexible data model

• Native JSON support

• Incremental MapReduce

• Gives power to the developer
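Couchbase views are defined by a map function (JavaScript in Couchbase) and an optional reduce, and the index is updated incrementally: only documents that changed since the last update are re-mapped. The Python model below mimics that behavior to show why incremental updates are cheap. It is a conceptual sketch with hypothetical class and function names, not the view engine; for simplicity the `_count`-style reduce is recomputed over the stored rows.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Emit = Tuple[str, object]


class IncrementalView:
    """Toy view index: keeps each document's emitted rows so changed documents can be re-mapped alone."""

    def __init__(self, map_fn: Callable[[dict], Iterable[Emit]]):
        self.map_fn = map_fn
        self.rows_by_doc: Dict[str, List[Emit]] = {}
        self.mapped = 0  # number of map-function invocations, to show the work saved

    def update(self, changed: Dict[str, Optional[dict]]) -> None:
        """Re-map only the changed documents; a value of None means the document was deleted."""
        for doc_id, doc in changed.items():
            if doc is None:
                self.rows_by_doc.pop(doc_id, None)
            else:
                self.rows_by_doc[doc_id] = list(self.map_fn(doc))
                self.mapped += 1

    def reduce_count(self) -> Dict[str, int]:
        """Equivalent of a _count reduce grouped by key."""
        totals: Dict[str, int] = defaultdict(int)
        for rows in self.rows_by_doc.values():
            for key, _ in rows:
                totals[key] += 1
        return dict(totals)


def by_city(doc: dict):
    # Map function, analogous to emit(doc.geo.city, null) in a JavaScript view.
    yield doc["geo"]["city"], None


view = IncrementalView(by_city)
view.update({"v1": {"geo": {"city": "Amsterdam"}}, "v2": {"geo": {"city": "Berlin"}}})
view.update({"v2": {"geo": {"city": "Amsterdam"}}})  # only v2 is re-mapped
print(view.reduce_count(), view.mapped)  # {'Amsterdam': 2} 3
```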

How we run Couchbase @Hippo

Diagram: a Load Balancer in front of the Hippo Delivery Tier, which uses a Database cluster and a Couchbase cluster. The Couchbase cluster holds:

• Request log data

• Targeting data

• Statistics data

Query capabilities

• Querying via views

• Secondary indexes via views

• Views based on MapReduce

• Lacks some advanced query capabilities
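Querying a view means range-scanning its index, which is sorted by emitted key; that is how views act as secondary indexes (Couchbase view queries take parameters such as `startkey`, `endkey` and `limit`). A sketch of that lookup over a sorted list with `bisect`, using a hypothetical view that emits `[city, timestamp]` per request-log document. It also hints at the limitation above: a query that is not a key range over some view's keys needs a new view.

```python
import bisect
from typing import List, Optional, Tuple

Row = Tuple[list, str]  # (emitted key, document id)


class SortedViewIndex:
    """Toy secondary index: rows kept sorted by emitted key, queried by key range."""

    def __init__(self, rows: List[Row]):
        self.rows = sorted(rows)
        self.keys = [key for key, _ in self.rows]

    def query(self, startkey: list, endkey: list, limit: Optional[int] = None) -> List[str]:
        """Return ids of rows with startkey <= key <= endkey (inclusive end)."""
        lo = bisect.bisect_left(self.keys, startkey)
        hi = bisect.bisect_right(self.keys, endkey)
        ids = [doc_id for _, doc_id in self.rows[lo:hi]]
        return ids[:limit] if limit is not None else ids


# Hypothetical view emitting [city, timestamp] for each request-log document.
index = SortedViewIndex([
    (["Amsterdam", 1371419505909], "req::1"),
    (["Berlin", 1371419506000], "req::2"),
    (["Amsterdam", 1371419507000], "req::3"),
])
print(index.query(["Amsterdam", 0], ["Amsterdam", 9999999999999]))  # ['req::1', 'req::3']
```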

Elasticsearch

• Built on Apache Lucene

• Designed to be distributed

• Schema free

• Apache 2 licensed

• RESTful API

Added value of ES

• Full-text search

• Faceted search

• Geospatial search

• All in (near) real-time
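To make "faceted" and "geospatial" search concrete: a facet is a count of matching documents per distinct field value, and a geo distance filter keeps documents within a radius of a point. The plain-Python sketch below computes both over request-log-like documents, using a substring match as a stand-in for full-text matching and the haversine formula on a spherical Earth. Elasticsearch does this on its Lucene index and exposes it through its query DSL, which is not reproduced here; the document fields and coordinates are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a spherical Earth (radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def search(docs: List[dict], text: str, center: Tuple[float, float],
           radius_km: float) -> Dict[str, int]:
    """Text match on pathInfo, geo distance filter, then a facet count per city."""
    hits = [
        d for d in docs
        if text.lower() in d["pathInfo"].lower()
        and haversine_km(d["geo"]["latitude"], d["geo"]["longitude"], *center) <= radius_km
    ]
    return dict(Counter(d["geo"]["city"] for d in hits))


docs = [
    {"pathInfo": "/news", "geo": {"city": "Amsterdam", "latitude": 52.37, "longitude": 4.90}},
    {"pathInfo": "/news/2013", "geo": {"city": "Utrecht", "latitude": 52.09, "longitude": 5.12}},
    {"pathInfo": "/news", "geo": {"city": "Berlin", "latitude": 52.52, "longitude": 13.40}},
    {"pathInfo": "/about", "geo": {"city": "Amsterdam", "latitude": 52.37, "longitude": 4.90}},
]
print(search(docs, "news", center=(52.37, 4.90), radius_km=100))  # {'Amsterdam': 1, 'Utrecht': 1}
```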

Replicating to ES

Diagram: the Hippo Delivery Tier uses the Java API to write to the Couchbase Server Cluster and read from the Elasticsearch Server Cluster; Couchbase replicates to Elasticsearch via XDCR and the Couchbase ES Transport plugin.
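On the replication slide, XDCR ships a stream of document mutations from Couchbase and the Couchbase ES Transport plugin lets Elasticsearch act as the target of that stream. Replication can redeliver or reorder mutations, so the receiving side should apply them idempotently. The sketch assumes each mutation carries a revision number and keeps the highest one per key (a last-writer-wins rule chosen for illustration); `Mutation` and `SearchIndexTarget` are hypothetical names, and this models the idea, not the plugin's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Mutation:
    key: str
    rev: int  # revision number carried with each mutation (assumed)
    doc: Optional[dict]  # None represents a deletion


class SearchIndexTarget:
    """Toy replication target: applies mutations idempotently, keeping the highest revision."""

    def __init__(self):
        self.revs: Dict[str, int] = {}  # kept after deletes, acting as tombstones
        self.docs: Dict[str, dict] = {}

    def apply(self, batch: List[Mutation]) -> None:
        for m in batch:
            if m.rev <= self.revs.get(m.key, 0):
                continue  # duplicate or stale mutation: ignore
            self.revs[m.key] = m.rev
            if m.doc is None:
                self.docs.pop(m.key, None)
            else:
                self.docs[m.key] = m.doc


target = SearchIndexTarget()
target.apply([Mutation("visitor::1", 1, {"city": ""}),
              Mutation("visitor::1", 2, {"city": "Amsterdam"})])
target.apply([Mutation("visitor::1", 1, {"city": ""})])  # redelivered old mutation is ignored
print(target.docs)  # {'visitor::1': {'city': 'Amsterdam'}}
```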

Demo time!

What’s Next?

Advanced analytics

Thank you!

Questions?

j.reijn@onehippo.com

@jreijn

P.S. We’re hiring!