Traxticsearch Search for the elusive ELK Stack

Upload: will-button

Post on 16-Apr-2017



Previous Architecture

One Cluster:
● 3 master nodes
● 12 data nodes
● Logs, processing data, "User" data

Current Architecture

Logstash Cluster:
● 3 master / 9 data
● Logs only

Custer Cluster:
● 3 master / 10 data
● Processing data, mission critical
● Soon to be firewalled off

Winston Cluster:
● 3 master / 3 data
● "Prod" quality playground
● Kibana access
● Requires CCB to create index/dashboard

Stats by Cluster

Cluster    Nodes  Indices  Shards  Docs   Data      Closed indices
Logstash   12     1,251    1,158   436M   716 GB    1,158
Custer     13     1,187    1,995   115M   1.75 TB   559
Winston    6      2        3       10M    5.39 GB   0

Decision to split

● Data types
● Data usage
● SLAs + Use cases

Better Performance

Elastizabbix: Monitoring

● Written angrily (...Friday night)
● Old fashioned
● Auto-discovers nodes and indices
● Dot-notation syntax to collect anything
● Managed from the Zabbix user interface
● Will not overload the cluster with data collection
● Works surprisingly well

Elastizabbix: Monitoring

Elastic Stats API:

GET _cluster/stats

"indices": {
  "docs": {
    "count": 418156163,
    "deleted": 2278242
  }
}

Zabbix Item (avoids scripting):

elastizabbix[cluster, indices.docs.count] = 418156163
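The dot-notation lookup behind an item key like the one above can be sketched in a few lines of Python. This is only an illustration of the idea, not Elastizabbix's actual code; the function name `resolve` and the sample dictionary are assumptions made for the example.

```python
from typing import Any


def resolve(stats: dict, path: str) -> Any:
    """Walk a nested dict using a dot-separated path such as 'indices.docs.count'."""
    value = stats
    for key in path.split("."):
        value = value[key]  # raises KeyError if the path does not exist
    return value


# Sample values copied from the _cluster/stats excerpt on the slide.
cluster_stats = {"indices": {"docs": {"count": 418156163, "deleted": 2278242}}}

print(resolve(cluster_stats, "indices.docs.count"))  # 418156163
```

Splitting on every dot is the simplest approach; keys that themselves contain dots would need a different separator or escaping.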

Elastizabbix: Alerting

Triggers (get an adult!):

{elastizabbix[nodes,nodes.{#NODE}.jvm.mem.heap_used_percent].last()}>95 = Disaster!

● Escalate to operations (email, XMPP, Slack, Kibana, etc.)

● Look at your favorite monitoring tool (Zabbix, Marvel, HQ, Kopf, etc.)

● Do something about it before the API becomes unreliable.
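As a rough illustration of what the heap trigger checks, the sketch below applies the same 95% threshold to a dictionary shaped like a node stats response. In the deck the evaluation happens inside Zabbix; the function name `nodes_over_heap_limit`, the node IDs, and the sample values here are hypothetical.

```python
def nodes_over_heap_limit(node_stats: dict, limit: float = 95.0) -> list:
    """Return the names of nodes whose heap_used_percent is above the limit."""
    over = []
    for node_id, node in node_stats["nodes"].items():
        used = node["jvm"]["mem"]["heap_used_percent"]
        if used > limit:
            # Fall back to the node ID when no human-readable name is present.
            over.append(node.get("name", node_id))
    return sorted(over)


# Hypothetical sample using the nodes.<id>.jvm.mem.heap_used_percent path from the trigger.
sample = {
    "nodes": {
        "a1": {"name": "data-1", "jvm": {"mem": {"heap_used_percent": 97}}},
        "b2": {"name": "data-2", "jvm": {"mem": {"heap_used_percent": 62}}},
    }
}

print(nodes_over_heap_limit(sample))  # ['data-1']
```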

The quest for mbeans

Relying on the Elasticsearch API for monitoring/statistics is the equivalent of relying on the patient for info during surgery.

Things I wish I knew before...

get data out of jail

Use case

● Time based?
● Sharding strategies

Bulk Indexing

● Tune for payload size, not doc count: ~5-15 MB
● EsRejectedExecutionException or TOO_MANY_REQUESTS (429)
● Handling failures
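The bulk indexing advice can be sketched as two small helpers: one groups documents by serialized size instead of count, the other retries a batch with exponential backoff when the cluster answers 429. Both function names, the `send` callback, and the retry defaults are assumptions for illustration. No real client library is used, and the size estimate counts only the document source, not the bulk action lines.

```python
import json
import time
from typing import Callable, Iterable, Iterator, List


def batches_by_size(docs: Iterable[dict], max_bytes: int) -> Iterator[List[str]]:
    """Group serialized docs so each batch stays at or under max_bytes.

    A single document larger than max_bytes is emitted as its own batch.
    """
    batch, size = [], 0
    for doc in docs:
        line = json.dumps(doc)
        line_size = len(line.encode("utf-8")) + 1  # +1 for the newline separator
        if batch and size + line_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += line_size
    if batch:
        yield batch


def send_with_backoff(send: Callable[[List[str]], int], batch: List[str],
                      retries: int = 5, base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call send(batch), which returns an HTTP status; back off and retry on 429."""
    for attempt in range(retries):
        status = send(batch)
        if status != 429:
            return status < 300
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    return False
```

With a real workload `max_bytes` would sit somewhere in the 5-15 MB range from the slide, for example `batches_by_size(docs, 10 * 1024 * 1024)`. Per-item failures reported inside an otherwise successful bulk response are not handled here.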

Mapping

● _default_ mapping
● Dynamic mapping
● Templates

Eventually we learned...

● 0 vs 0.0
● 2^53-1 vs 2^63-1
● Lucene query syntax
● Bazillion shards
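The two numeric bullets can be illustrated in Python. The interpretation is an assumption, since the slide gives no detail: "0 vs 0.0" is read as the integer/float distinction in JSON that type inference depends on, and "2^53-1 vs 2^63-1" as the gap between what a 64-bit float holds exactly and what a signed 64-bit integer holds. The helper name `survives_double` is made up for the example.

```python
import json

# JSON keeps the integer/float distinction: 0 parses as int, 0.0 as float.
print(type(json.loads("0")).__name__)    # int
print(type(json.loads("0.0")).__name__)  # float

MAX_EXACT_DOUBLE_INT = 2**53 - 1  # largest integer a 64-bit float round-trips safely
MAX_LONG = 2**63 - 1              # largest signed 64-bit integer


def survives_double(n: int) -> bool:
    """True if n is unchanged after a round trip through a 64-bit float."""
    return int(float(n)) == n


print(survives_double(MAX_EXACT_DOUBLE_INT))  # True
print(survives_double(2**53 + 1))             # False
print(survives_double(MAX_LONG))              # False
```

Any consumer that parses numbers as doubles can therefore silently alter integer values above 2^53-1, even though they fit in a 64-bit integer field.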