In-Flux Limiting for a Multi-Tenant Logging Service

Ambud Sharma & Suma Cherukuri, Cloud Platform Engineering @ Symantec

Upload: dataworks-summithadoop-summit

Post on 16-Apr-2017


TRANSCRIPT

Page 1: In Flux Limiting for a multi-tenant logging service

In-Flux Limiting for a Multi-Tenant Logging Service

Ambud Sharma & Suma Cherukuri
Cloud Platform Engineering @ Symantec

In-Flux Limiting for a Multi-Tenant Logging Service 1

Page 2: In Flux Limiting for a multi-tenant logging service

Overview
• Who are we?
• Architecture
• Streaming Pipeline
• Influx Issue
• Influx Limiting Design & Solution
• Conclusion
• Q & A

Page 3: In Flux Limiting for a multi-tenant logging service

Who are we?
• Symantec’s internal cloud team
• Host applications that generate over $1B in revenue
• Team
– Logging as a Service (LaaS) – Elasticsearch/Kibana
– Metering as a Service (MaaS) – InfluxDB/Grafana
– Alerting as a Service (AaaS) – Hendrix

We are hiring!

Also check out Hendrix: https://github.com/Symantec/hendrix

Page 4: In Flux Limiting for a multi-tenant logging service

Our Data

Logs
• Application and system log data from VMs and containers
• Used for troubleshooting

Metrics
• Application and system telemetry
• Used for Application Performance Monitoring

Log Event

{
  "message": "User logged in from 1.1.1.1",
  "@version": "1",
  "@timestamp": "2014-07-16T06:49:39.919Z",
  "host": "value",
  "path": "/opt/logstash/sample.log",
  "tenant_id": "291167ebed3221a006eb",
  "apikey": "06be8a-28ef-4568-8cb8-612",
  "string_boolean": "true",
  "host_ip": "192.168.99.01"
}

Metric Event

{
  "@version": "1",
  "@timestamp": "2014-07-16T06:49:39.919Z",
  "host": "host1.symantec.com",
  "tenant_id": "291167ebed3221a006ebf6",
  "apikey": "06be8a-28ef-4568-8cb8-618",
  "value": 0.65,
  "name": "cpu"
}

Page 5: In Flux Limiting for a multi-tenant logging service

LMM Architecture

[Architecture diagram. Components shown: Customer Agents, Logstash, Kafka, Log Topology, Metrics Topology, Redis, Elasticsearch, InfluxDB, Users. Annotation: "Open to customers"]

Page 6: In Flux Limiting for a multi-tenant logging service

Streaming Pipeline

• Validate events to match schema to optimize indexing

• Authenticate events to route data to the correct index

• Have 1 index per day per tenant

[Pipeline diagram: Kafka → Validate → Auth → Index]
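The validate, authenticate, and index steps above can be sketched roughly as follows. This is a minimal illustration, not the team's topology code: the required-field list, the in-memory `API_KEYS` lookup, and the `<tenant_id>-<YYYY.MM.DD>` index naming scheme are all assumptions made for the example; only the field names come from the sample log event.

```python
from datetime import datetime

# Assumed minimal schema; the real schema is not given in the slides.
REQUIRED_FIELDS = ("@timestamp", "tenant_id", "apikey")

# Hypothetical credential store mapping tenant_id -> apikey.
API_KEYS = {"291167ebed3221a006eb": "06be8a-28ef-4568-8cb8-612"}


def validate(event):
    """Return True when every assumed required field is present."""
    return all(field in event for field in REQUIRED_FIELDS)


def authenticate(event):
    """Return True when the event's apikey matches the tenant's stored key."""
    return API_KEYS.get(event["tenant_id"]) == event["apikey"]


def index_name(event):
    """Build a per-tenant, per-day index name (naming scheme is an assumption)."""
    ts = datetime.strptime(event["@timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return f"{event['tenant_id']}-{ts:%Y.%m.%d}"


def route(event):
    """Return the target index, or None when the event is rejected."""
    if not validate(event) or not authenticate(event):
        return None
    return index_name(event)


event = {
    "message": "User logged in from 1.1.1.1",
    "@timestamp": "2014-07-16T06:49:39.919Z",
    "tenant_id": "291167ebed3221a006eb",
    "apikey": "06be8a-28ef-4568-8cb8-612",
}
print(route(event))  # 291167ebed3221a006eb-2014.07.16
```

Deriving the index from the authenticated tenant and the event timestamp is what yields one index per day per tenant.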

Page 7: In Flux Limiting for a multi-tenant logging service

Influx Issue
• You know your data store's performance limits (find the sustainable events per second, EPS, from benchmarks/capacity)

• Tenants send a lot of data and ingestion rate is never linear

• Ingestion spikes are bound to happen in a real-time streaming application

• Wouldn’t it be great if you could normalize these spikes?

Page 8: In Flux Limiting for a multi-tenant logging service

Influx Limiting
• Normalize the EPS curve using buffers
• Like a hydro dam, explicitly allocate EPS resource to tenants

[Figure: EPS curve before and after influx limiting]
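To make the "hydro dam" idea concrete, here is a small simulation of a buffer that absorbs a spike and releases events at a capped rate. It is only a sketch under simple assumptions: time advances in one-second steps, the buffer is unbounded, and `max_eps` is a hypothetical name for the allocated rate.

```python
def smooth(arrivals, max_eps):
    """Release buffered events at no more than max_eps per second.

    arrivals: number of events arriving in each one-second step.
    Returns (events emitted per step, backlog left in the buffer).
    """
    buffered = 0
    emitted = []
    for incoming in arrivals:
        buffered += incoming           # the spike lands in the buffer
        out = min(buffered, max_eps)   # drain at the capped rate
        emitted.append(out)
        buffered -= out
    return emitted, buffered


# A 900-event spike in the second step is spread over the following seconds.
emitted, backlog = smooth([100, 900, 50, 0, 0], max_eps=300)
print(emitted)   # [100, 300, 300, 300, 50]
print(backlog)   # 0
```

No events are dropped: the spike is flattened into a plateau at the cap, at the cost of added latency while the buffer drains.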

Page 9: In Flux Limiting for a multi-tenant logging service

Design - Options

Approach 1
• Route to separate Kafka topic
• No back-pressure in primary queue
• Secondary queue is drained at a slower pace
• Events may appear out of order

Approach 2
• Controlled back-pressure in the primary queue
• Selectively reduce ingestion rate for tenants
• Events will always appear in order
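The ordering difference between the two approaches can be illustrated with a toy model. This is a deliberately simplified sketch for a single tenant, assuming events arrive in per-window batches and `quota` events may be indexed per window; the function names are hypothetical and no Kafka is involved.

```python
def approach_1(windows, quota):
    """Overflow is diverted to a secondary queue that is drained later."""
    indexed, secondary = [], []
    for batch in windows:
        indexed.extend(batch[:quota])     # within quota: index immediately
        secondary.extend(batch[quota:])   # over quota: divert
    return indexed + secondary            # secondary queue drains last


def approach_2(windows, quota):
    """Overflow is held back in the primary queue (back-pressure)."""
    indexed, pending = [], []
    for batch in windows:
        pending.extend(batch)             # new events queue behind held ones
        indexed.extend(pending[:quota])
        pending = pending[quota:]
    return indexed + pending              # leftover drains in arrival order


windows = [[1, 2, 3, 4], [5, 6]]          # event sequence numbers per window
print(approach_1(windows, quota=3))       # [1, 2, 3, 5, 6, 4]
print(approach_2(windows, quota=3))       # [1, 2, 3, 4, 5, 6]
```

In the first approach event 4 is indexed after events 5 and 6; holding it in the primary queue keeps the tenant's events in order.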

Page 10: In Flux Limiting for a multi-tenant logging service

Customer Requirements
• Customers want threshold quotas defined for them
• Thresholds defined as policies (duration in seconds)
• Policies saved in a data store

Tenant A: {"threshold": 100, "window": 90}
Tenant B: {"threshold": 700, "window": 10}
Tenant C: {"threshold": 900, "window": 1}
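As an illustration of how such policies might be loaded and compared, the sketch below parses the three example policies and derives an average rate for each. It assumes that `threshold` is an event count allowed per `window` seconds, which the slides imply but do not state; the tenant keys and helper names are hypothetical.

```python
import json

# The three example policies, keyed by hypothetical tenant identifiers.
POLICIES_JSON = """
{
  "tenant_a": {"threshold": 100, "window": 90},
  "tenant_b": {"threshold": 700, "window": 10},
  "tenant_c": {"threshold": 900, "window": 1}
}
"""


def load_policies(raw):
    """Parse policies and reject non-positive thresholds or windows."""
    policies = json.loads(raw)
    for tenant, policy in policies.items():
        if policy["threshold"] <= 0 or policy["window"] <= 0:
            raise ValueError(f"invalid policy for {tenant}")
    return policies


def average_eps(policy):
    """Average events per second, assuming threshold is a count per window."""
    return policy["threshold"] / policy["window"]


policies = load_policies(POLICIES_JSON)
for tenant, policy in policies.items():
    print(tenant, round(average_eps(policy), 2))
```

Under that assumption, a long window with a low threshold (Tenant A) allows short bursts while keeping the average rate low, whereas a one-second window (Tenant C) caps the rate every second.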

Page 11: In Flux Limiting for a multi-tenant logging service

Bolt Design

[Pipeline diagram: Kafka → Validate → Auth → Throttle → Index]

1. Track “Event Rate” for each tenant over the policy window

2. If the event rate exceeds the threshold, throttle; otherwise allow the events

3. Reset the window when the time interval is complete (tumbling window)
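The three steps above can be sketched as a per-tenant tumbling-window counter. This is a plain-Python illustration rather than an actual Storm bolt: the class name, the caller-supplied `now` timestamp in seconds, and the boolean return value are assumptions, and what happens to a throttled event (delay, hold back, reroute) is left to the caller.

```python
class TumblingThrottle:
    """Count events per tenant in fixed, non-overlapping windows."""

    def __init__(self, policies):
        self.policies = policies   # tenant -> {"threshold": n, "window": secs}
        self.window_id = {}        # tenant -> index of the current window
        self.counts = {}           # tenant -> events seen in that window

    def allow(self, tenant, now):
        """Return True to pass the event, False to throttle it."""
        policy = self.policies[tenant]
        current = now // policy["window"]      # which tumbling window is this?
        if self.window_id.get(tenant) != current:
            self.window_id[tenant] = current   # step 3: interval complete, reset
            self.counts[tenant] = 0
        self.counts[tenant] += 1               # step 1: track the event rate
        return self.counts[tenant] <= policy["threshold"]  # step 2: compare


throttle = TumblingThrottle({"tenant_a": {"threshold": 2, "window": 10}})
print([throttle.allow("tenant_a", t) for t in (0, 1, 2, 9, 10)])
# [True, True, False, False, True]
```

The third and fourth events fall in the same 10-second window and exceed the threshold of 2; the event at t=10 starts a new window and is allowed again.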

Page 12: In Flux Limiting for a multi-tenant logging service

Scheduled-task design pattern
• Clock is maintained using Storm Tick Tuple
• Tenant’s counter is incremented when an event is received from it
• Counters are reset when the clock time modulo the tenant's throttle duration is zero

[Flowchart: Is Time % Throttle Duration = 0? If yes, reset counters for each tenant in this slice; otherwise, nothing to reset. Legend: Tenant Throttle Counter; Tenant Throttle Duration (modulated)]
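The scheduled-task pattern can be sketched without Storm by treating each tick as a method call. In a real topology the tick would arrive as a tick tuple (Storm enables these through the `topology.tick.tuple.freq.secs` setting); here `on_tick` stands in for that, and the class name, the one-second tick, and the grouping of tenants into slices by window duration are assumptions made for illustration.

```python
from collections import defaultdict


class TickThrottle:
    """Reset tenant counters on clock ticks instead of per-event timestamps."""

    def __init__(self, policies):
        self.policies = policies          # tenant -> {"threshold": n, "window": secs}
        self.clock = 0                    # advanced once per tick
        self.counters = defaultdict(int)  # tenant -> events since last reset
        self.slices = defaultdict(list)   # window duration -> tenants using it
        for tenant, policy in policies.items():
            self.slices[policy["window"]].append(tenant)

    def on_event(self, tenant):
        """Increment the tenant's counter; return False when over threshold."""
        self.counters[tenant] += 1
        return self.counters[tenant] <= self.policies[tenant]["threshold"]

    def on_tick(self):
        """Advance the clock and reset every slice whose duration divides it."""
        self.clock += 1
        for duration, tenants in self.slices.items():
            if self.clock % duration == 0:    # Is Time % Throttle Duration = 0?
                for tenant in tenants:
                    self.counters[tenant] = 0
            # otherwise there is nothing to reset for this slice


throttle = TickThrottle({
    "tenant_a": {"threshold": 1, "window": 2},
    "tenant_b": {"threshold": 1, "window": 3},
})
print(throttle.on_event("tenant_a"), throttle.on_event("tenant_a"))  # True False
throttle.on_tick()                    # clock = 1, nothing to reset
throttle.on_tick()                    # clock = 2, resets the 2-second slice
print(throttle.on_event("tenant_a"))  # True
```

Grouping tenants by duration means each tick costs one modulo check per distinct window length rather than one timestamp comparison per event.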

Page 13: In Flux Limiting for a multi-tenant logging service

Results

• Reduced EPS to Elasticsearch

• We can normalize flow rate based on load

Page 14: In Flux Limiting for a multi-tenant logging service

Conclusion
• Overview of real-time log and metric indexing
• Approaches to rate limiting in a real-time streaming application
• Design pattern to efficiently perform counting in Storm

That’s all folks!

Page 15: In Flux Limiting for a multi-tenant logging service

Questions?
