Elasticsearch in Netflix

Elasticsearch In Netflix Danny Yuan, Jae Bae

Upload: danny-yuan

Post on 15-Jan-2015


DESCRIPTION

Slides for the Elasticsearch Meetup in Netflix

TRANSCRIPT

Page 1: Elasticsearch in Netflix

Elasticsearch In Netflix

Danny Yuan, Jae Bae

Page 2: Elasticsearch in Netflix

Welcome

@Elasticsearch - Elasticsearch

@stonse - Sudhir Tonse

@g9yuayon - Danny Yuan

@metacret - Jae Bae

Hashtag: #ES_in_Netflix

Page 5: Elasticsearch in Netflix

Who Are We?

Software engineers in Netflix’s Platform Engineering team, working on large-scale data infrastructure

Building and operating Netflix’s cloud real-time query service

Page 10: Elasticsearch in Netflix

Why Are We Here?

How We Use Elasticsearch

Why Elasticsearch

How We Run Elasticsearch

To Seek Your Feedback

Page 11: Elasticsearch in Netflix

How We Use Elasticsearch

Page 12: Elasticsearch in Netflix

Querying Log Events

Tracking Service Deployments

Page 13: Elasticsearch in Netflix

Querying Log Events

Page 14: Elasticsearch in Netflix

A Little Historical Perspective

Page 15: Elasticsearch in Netflix

photo credit: http://www.flickr.com/photos/decade_null/142235888/sizes/o/in/photostream/

Netflix is a log generating company that also happens to stream movies

- Adrian Cockcroft

Page 16: Elasticsearch in Netflix
Page 17: Elasticsearch in Netflix
Page 18: Elasticsearch in Netflix

A Humble Beginning

Page 22: Elasticsearch in Netflix

Things Changed

Page 23: Elasticsearch in Netflix
Page 24: Elasticsearch in Netflix
Page 25: Elasticsearch in Netflix

[Diagram: multiple boxes, each labeled "Application"]

Page 26: Elasticsearch in Netflix

70,000,000,000

Page 27: Elasticsearch in Netflix

1,500,000

Page 28: Elasticsearch in Netflix

Making Sense of Billions of Events

Page 35: Elasticsearch in Netflix

So We Evolved

hgrep -C 10 -k 5,2,3 'users.*[1-9]{3}' *catalina.out s3//bucket

select * from log_events where dateint=20140101

Page 36: Elasticsearch in Netflix

Field Name Field Value

Client “API”

Server “Cryptex”

StatusCode 200

ResponseTime 73

Page 37: Elasticsearch in Netflix

[Diagram: three server farms, each sending log data to the log collectors]

Page 38: Elasticsearch in Netflix
Page 39: Elasticsearch in Netflix
Page 40: Elasticsearch in Netflix
Page 41: Elasticsearch in Netflix
Page 42: Elasticsearch in Netflix
Page 43: Elasticsearch in Netflix

What Could Go Wrong?

Page 44: Elasticsearch in Netflix
Page 45: Elasticsearch in Netflix

You thought parallelization would save the day? Think again

Page 47: Elasticsearch in Netflix

What Is Missing?

Page 48: Elasticsearch in Netflix

Interactive Exploration

Page 49: Elasticsearch in Netflix

Functional Requirements

Arbitrary Boolean Queries

Aggregated Query

- Top N Query

- Trend

- Distribution
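As a rough illustration of these query types, the sketch below builds Elasticsearch request bodies as plain Python dictionaries. It is not taken from the talk. The field names Client, Server, StatusCode and ResponseTime come from the sample event shown earlier; the `timestamp` field, the helper names and the aggregation names are assumptions, and the exact aggregation syntax (for example the `interval` key) depends on the Elasticsearch version in use.

```python
def boolean_query(must=None, must_not=None):
    """Combine exact-match conditions into a bool query body."""
    def as_terms(conditions):
        return [{"term": {field: value}} for field, value in (conditions or {}).items()]
    return {"query": {"bool": {"must": as_terms(must), "must_not": as_terms(must_not)}}}


def top_n(field, n):
    """Top N query: the n most frequent values of a field."""
    return {"size": 0, "aggs": {"top": {"terms": {"field": field, "size": n}}}}


def trend(timestamp_field, interval):
    """Trend: event counts per time bucket."""
    return {"size": 0, "aggs": {"trend": {
        "date_histogram": {"field": timestamp_field, "interval": interval}}}}


def distribution(field, bucket_width):
    """Distribution: event counts per fixed-width numeric bucket."""
    return {"size": 0, "aggs": {"dist": {
        "histogram": {"field": field, "interval": bucket_width}}}}


# Events from client "API" that did not return status 200.
failed_api_calls = boolean_query(must={"Client": "API"}, must_not={"StatusCode": 200})
# The five servers that appear most often.
busiest_servers = top_n("Server", 5)
# Events per minute ("timestamp" is a hypothetical field name).
events_per_minute = trend("timestamp", "1m")
# Response times grouped into 50 ms buckets.
latency_histogram = distribution("ResponseTime", 50)
```

Each dictionary would be sent as the JSON body of a search request; building them as data keeps the example independent of any particular client library.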

Page 50: Elasticsearch in Netflix

Non-Functional Requirements

- Interactive (response within seconds)

- Quickly locates the right log events

- Minimal programming effort

Page 51: Elasticsearch in Netflix

It’s All about Extracting Small Data Out of Big Data

Page 52: Elasticsearch in Netflix
Page 53: Elasticsearch in Netflix
Page 54: Elasticsearch in Netflix
Page 55: Elasticsearch in Netflix

Now Back to the Use Case

Page 56: Elasticsearch in Netflix

Intelligent Alerts

Page 57: Elasticsearch in Netflix

Guided Debugging in the Right Context

Page 65: Elasticsearch in Netflix

A Useful Pattern

Page 66: Elasticsearch in Netflix

Aggregated Query -> Individual Query
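One way to read this pattern: run an aggregation first, pick the bucket that stands out, then fetch the individual events behind it. The sketch below assumes a response shaped like a terms aggregation result (`aggregations.<name>.buckets` with `key` and `doc_count`); the function name, the aggregation name `top` and the sample response are made up for illustration.

```python
def drill_down(agg_response, agg_name, field, size=20):
    """Build a query for the individual events in the largest bucket.

    Returns None when the aggregation produced no buckets.
    """
    buckets = agg_response["aggregations"][agg_name]["buckets"]
    if not buckets:
        return None
    largest = max(buckets, key=lambda bucket: bucket["doc_count"])
    return {"size": size, "query": {"term": {field: largest["key"]}}}


# Hypothetical aggregation result: event counts per server.
sample_response = {
    "aggregations": {
        "top": {
            "buckets": [
                {"key": "Cryptex", "doc_count": 120},
                {"key": "Gateway", "doc_count": 45},
            ]
        }
    }
}

follow_up = drill_down(sample_response, "top", "Server")
```

Here `follow_up` asks for up to 20 events whose Server field matches the busiest bucket, which is the step from the aggregated view to the individual log events.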

Page 67: Elasticsearch in Netflix

Examples

- S3 diagnostics

- Tracking email campaigns

- Request traces

Page 68: Elasticsearch in Netflix

RequestId   Parent Id  Node Id  Service Name  Status
4965-4a74   0          123      Edge Service  200
4965-4a74   123        456      Gateway       200
4965-4a74   456        789      Service A     200
4965-4a74e  456        abc      Service B     200

Status:200
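The Parent Id and Node Id columns are enough to rebuild the call tree of a request once its trace events have been fetched. The sketch below is an illustration, not the speakers' code: it uses the four rows from the table (RequestId omitted) and assumes that a Parent Id of 0 marks the root of the trace.

```python
from collections import defaultdict

# (parent_id, node_id, service_name, status), copied from the table above.
SPANS = [
    ("0", "123", "Edge Service", 200),
    ("123", "456", "Gateway", 200),
    ("456", "789", "Service A", 200),
    ("456", "abc", "Service B", 200),
]


def build_call_tree(spans, root_parent="0"):
    """Return the trace as indented lines, one per service call."""
    children = defaultdict(list)
    for parent_id, node_id, service, status in spans:
        children[parent_id].append((node_id, service, status))

    def render(parent_id, depth):
        lines = []
        for node_id, service, status in children.get(parent_id, []):
            lines.append("  " * depth + f"{service} ({node_id}) -> {status}")
            lines.extend(render(node_id, depth + 1))
        return lines

    return render(root_parent, 0)


for line in build_call_tree(SPANS):
    print(line)
```

With the rows above, Service A and Service B both appear as children of Gateway, which is itself a child of Edge Service.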

Page 69: Elasticsearch in Netflix
Page 70: Elasticsearch in Netflix

Edge Service (456) ---> Gateway (789)

Data Name      Value
Request ID     4965-4a74
Status Code    200
Response Time  25 ms
Endpoints      /rest/service

Page 71: Elasticsearch in Netflix

Why Elasticsearch?

Page 72: Elasticsearch in Netflix

Automatic Sharding and Replication

Page 73: Elasticsearch in Netflix
Page 76: Elasticsearch in Netflix

Flexible Schema

- Schemaless

- Reasonable defaults

Page 77: Elasticsearch in Netflix
Page 82: Elasticsearch in Netflix

Nice Extension Model

- Customizable REST Actions

- Site Plugins

- River Plugins

- Discovery Module

Page 83: Elasticsearch in Netflix
Page 84: Elasticsearch in Netflix

Ecosystem - Plugins, Kibana

Page 85: Elasticsearch in Netflix

Tracking Service Deployments

Page 86: Elasticsearch in Netflix

{ edda }

Page 87: Elasticsearch in Netflix
Page 91: Elasticsearch in Netflix

Built by Netflix Monitoring Eng Team

Tracks History and Changes to Service Deployments

Keeps Many Revisions

Tracks Dozens of Document Types

Page 92: Elasticsearch in Netflix
Page 93: Elasticsearch in Netflix
Page 94: Elasticsearch in Netflix

Why Elasticsearch?

Page 95: Elasticsearch in Netflix
Page 97: Elasticsearch in Netflix

Schemas may change at any time

Go schemaless

Page 98: Elasticsearch in Netflix
Page 100: Elasticsearch in Netflix

Users may search for any combination of fields

This is what a search engine is designed for

Page 101: Elasticsearch in Netflix
Page 103: Elasticsearch in Netflix

Users often need only a few fields

Projection via “fields” query
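A minimal sketch of such a projection, assuming the request body is built as a Python dictionary. The slide names the `fields` option; newer Elasticsearch releases generally express the same idea through `_source` filtering, so the helper can emit either key. The helper name and the field names `instanceId` and `state` are hypothetical.

```python
def projected_search(query, wanted_fields, use_source_filtering=False):
    """Return a search body that asks for only `wanted_fields` of each hit."""
    key = "_source" if use_source_filtering else "fields"
    return {"query": query, key: list(wanted_fields)}


# Ask for two fields of every document instead of the full record.
body = projected_search({"match_all": {}}, ["instanceId", "state"])
```

Returning only the requested fields keeps responses small when the stored documents are large.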

Page 104: Elasticsearch in Netflix
Page 107: Elasticsearch in Netflix

Need range queries on date and revisions

Natively supported by Elasticsearch

Route by document ID
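The sketch below shows what a range query combined with routing could look like. It rests on several assumptions: that "route by document ID" means the document ID is passed as the routing value so that all revisions of one document live on the same shard, and that the documents carry fields named `id`, `stime` and `revision`. None of these names appear in the slides.

```python
def revisions_in_window(doc_id, start, end, min_revision=None):
    """Build the search body and request parameters for one document's history."""
    must = [
        {"term": {"id": doc_id}},
        {"range": {"stime": {"gte": start, "lte": end}}},
    ]
    if min_revision is not None:
        must.append({"range": {"revision": {"gte": min_revision}}})
    body = {"query": {"bool": {"must": must}}}
    # The routing value limits the search to the shard that holds this document.
    params = {"routing": doc_id}
    return body, params


body, params = revisions_in_window("i-12345", "2014-01-01", "2014-01-31", min_revision=3)
```

The `params` dictionary would be sent as query-string parameters of the search request, alongside the JSON body.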

Page 108: Elasticsearch in Netflix

Running ES in Netflix

Page 114: Elasticsearch in Netflix

Operational Challenges

Back pressure when indexing

Diverse configurations and data

Dynamic flow of log events

Needs extensive monitoring and alerting

Tolerating outage at different scales

Page 115: Elasticsearch in Netflix

Favor Pulling Over Pushing
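The slides do not show the mechanism, so the following is only one possible reading: instead of producers pushing events into Elasticsearch as fast as they arrive, an indexer pulls batches from a buffer at whatever pace the cluster can sustain. The function names and the in-memory queue are stand-ins chosen for illustration.

```python
import queue


def pull_batch(source, max_items):
    """Take up to max_items events from the buffer without waiting."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(source.get_nowait())
        except queue.Empty:
            break
    return batch


def run_indexer(source, index_batch, batch_size):
    """Drain the buffer batch by batch and return how many events were indexed.

    The next batch is pulled only after index_batch returns, so a slow
    cluster slows the consumer down instead of being flooded.
    """
    indexed = 0
    while True:
        batch = pull_batch(source, batch_size)
        if not batch:
            return indexed
        index_batch(batch)
        indexed += len(batch)


# A bounded buffer: producers cannot run arbitrarily far ahead of the indexer.
buffer = queue.Queue(maxsize=1000)
for event_number in range(5):
    buffer.put(event_number)

seen_batches = []
total = run_indexer(buffer, seen_batches.append, batch_size=2)
```

In a real deployment `index_batch` would issue a bulk indexing request; here it just records the batches so the flow is visible.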

Page 116: Elasticsearch in Netflix
Page 117: Elasticsearch in Netflix
Page 118: Elasticsearch in Netflix

Choose Config with Data

Page 119: Elasticsearch in Netflix
Page 120: Elasticsearch in Netflix

Integrating ES

Page 121: Elasticsearch in Netflix

AMI for Deployment by Asgard

Page 122: Elasticsearch in Netflix

Archaius for Configuration

Page 123: Elasticsearch in Netflix

Eureka for Server Discovery

Page 124: Elasticsearch in Netflix

Suro for Data Delivery

Page 125: Elasticsearch in Netflix

Servo for Monitoring Metrics

Page 126: Elasticsearch in Netflix

Zone-aware Replication

Page 128: Elasticsearch in Netflix

Multi-region Deployment

Discovery over Cassandra

Region-aware replication

Page 130: Elasticsearch in Netflix

Favor Index Rolling Over TTL

A dedicated service manages index rolling

Uses index template and routing
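A minimal sketch of what an index-rolling service might compute each day, assuming one index per day and a fixed retention window. The naming scheme (`prefix-YYYY.MM.DD`), the retention value and the function names are assumptions; the index template and routing mentioned on the slide are not covered here. Dropping a whole expired index is generally cheaper than expiring documents one by one, which is the usual argument for rolling over TTL.

```python
from datetime import date, timedelta


def index_name(prefix, day):
    """Name of the index that holds one day's events."""
    return f"{prefix}-{day:%Y.%m.%d}"


def plan_roll(prefix, today, retention_days, existing):
    """Decide which indices to create ahead of time and which to drop."""
    keep = {index_name(prefix, today - timedelta(days=offset))
            for offset in range(retention_days)}
    tomorrow = index_name(prefix, today + timedelta(days=1))
    to_create = [] if tomorrow in existing else [tomorrow]
    to_delete = sorted(
        name for name in existing
        if name.startswith(prefix + "-") and name not in keep and name != tomorrow
    )
    return to_create, to_delete


existing_indices = {"logs-2014.01.01", "logs-2014.01.02", "logs-2014.01.03"}
to_create, to_delete = plan_roll("logs", date(2014, 1, 3), 2, existing_indices)
```

With a two-day retention on 3 January 2014, the plan is to create the index for 4 January in advance and to drop the one for 1 January.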

Page 134: Elasticsearch in Netflix

Worth Trying G1

Not recommended by ES team, but

Has fewer and shorter GC pauses

Occasional SIGSEGV, but it’s okay

Page 138: Elasticsearch in Netflix

Simple Majority for Master Election

Split-brain problem

discovery.zen.minimum_master_nodes

Dynamically updated
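A simple majority of N master-eligible nodes is floor(N / 2) + 1, so two halves of a partitioned cluster can never both elect a master. The sketch below computes that value and wraps it in a cluster settings update body for the Zen discovery setting named on the slide. How the node count is obtained and when the update is sent are left open, and the function names are illustrative.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Smallest number of nodes that forms a strict majority."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def settings_update(master_eligible_nodes):
    """Body for a cluster settings update that applies the new quorum."""
    quorum = minimum_master_nodes(master_eligible_nodes)
    return {"persistent": {"discovery.zen.minimum_master_nodes": quorum}}


# After the cluster grows from 3 to 5 master-eligible nodes.
update_body = settings_update(5)
```

Recomputing and resending this body whenever the number of master-eligible nodes changes is one way to keep the quorum in step with the cluster size.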

Page 143: Elasticsearch in Netflix

Future Work

Automatic incremental backup and restore

Auto scaling

Fully automated deployment

Support more use cases

Page 144: Elasticsearch in Netflix

We’re Hiring

Page 145: Elasticsearch in Netflix

Thank You!