Accumulo Summit 2015: Tracing in Accumulo and HDFS [Internals]

© Hortonworks Inc. 2015 Tracing in Accumulo & HDFS Billie Rinaldi VP Apache Accumulo, PMC Chair Sr. Member of Technical Staff, Hortonworks, Inc. April 28, 2015 Page 1 Apache, Accumulo, Apache Accumulo, and the Accumulo logo are trademarks of the Apache Software Foundation.

Post on 15-Jul-2015


TRANSCRIPT

  • Tracing in Accumulo & HDFS

    Billie Rinaldi

    VP Apache Accumulo, PMC Chair
    Sr. Member of Technical Staff, Hortonworks, Inc.

    April 28, 2015

    Apache, Accumulo, Apache Accumulo, and the Accumulo logo are trademarks of the Apache Software Foundation.

  • What is Tracing?

    Allows applications to time various operations across a distributed system, which enables better understanding of system behavior and performance

    Each trace is broken into sub-operations called spans, so you can see which ones take the longest

    Trace information is transmitted across RPC so that tracing can continue on other servers

    Based on Google's Dapper paper: http://research.google.com/pubs/pub36356.html

  • Tracing Operations

    [Figure: the spans of one trace drawn on a timeline from start to end, with start times in milliseconds]

    Trace 1, Span A, No Parent
    Trace 1, Span B, Parent A
    Trace 1, Span C, Parent A
    Trace 1, Span D, Parent B

  • Span Components

    Description: string
    Trace ID: random long
    Span ID: random long
    Parent Span ID: random long
    Start Time: milliseconds
    Stop Time: milliseconds
    Key/Value Annotations: map
    Timeline Annotations: list of (milliseconds, string)
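The span fields listed on this slide, together with the parent links shown on the Tracing Operations slide, can be modeled roughly as follows. This is a minimal plain-Java sketch, not the HTrace `Span` interface: the class name `SpanSketch`, the `NO_PARENT` sentinel, and the `String` key/value annotation types are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Illustrative span record; hypothetical, not the HTrace Span interface. */
final class SpanSketch {
    /** Assumed sentinel meaning "this span has no parent". */
    static final long NO_PARENT = 0L;

    /** One timeline entry: a timestamp in milliseconds plus a message. */
    static final class TimelineAnnotation {
        final long timeMillis;
        final String message;
        TimelineAnnotation(long timeMillis, String message) {
            this.timeMillis = timeMillis;
            this.message = message;
        }
    }

    final String description;
    final long traceId;       // shared by every span in one trace
    final long spanId;        // identifies this span
    final long parentSpanId;  // spanId of the parent, or NO_PARENT
    final long startMillis;
    long stopMillis;
    // Key/value types are an assumption; the slide text does not give them.
    final Map<String, String> kvAnnotations = new LinkedHashMap<>();
    final List<TimelineAnnotation> timelineAnnotations = new ArrayList<>();

    SpanSketch(String description, long traceId, long spanId,
               long parentSpanId, long startMillis) {
        this.description = description;
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
        this.startMillis = startMillis;
    }

    /** Creates a child span in the same trace that points back at this span. */
    SpanSketch child(String description, long spanId, long startMillis) {
        return new SpanSketch(description, traceId, spanId, this.spanId, startMillis);
    }

    void addTimelineAnnotation(long timeMillis, String message) {
        timelineAnnotations.add(new TimelineAnnotation(timeMillis, message));
    }

    boolean isRoot() { return parentSpanId == NO_PARENT; }

    long elapsedMillis() { return stopMillis - startMillis; }

    public static void main(String[] args) {
        Random ids = new Random();
        long now = System.currentTimeMillis();
        // Rebuild the tree from the Tracing Operations slide: A -> {B -> D, C}.
        SpanSketch a = new SpanSketch("A", ids.nextLong(), ids.nextLong(), NO_PARENT, now);
        SpanSketch b = a.child("B", ids.nextLong(), now + 1);
        SpanSketch c = a.child("C", ids.nextLong(), now + 2);
        SpanSketch d = b.child("D", ids.nextLong(), now + 3);
        for (SpanSketch s : new SpanSketch[] {a, b, c, d}) {
            System.out.println(s.description + " root=" + s.isRoot()
                + " sameTrace=" + (s.traceId == a.traceId));
        }
    }
}
```

Every span carries the trace ID of its root, so a collector can regroup spans that arrive from different processes and rebuild the tree from the parent IDs.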

  • Tracing Components

    Instrumentation library
      Formerly cloudtrace / accumulo-trace, now htrace (http://htrace.incubator.apache.org/)

    Span Receivers
      In-process class that takes spans and stores or sends them

    Collection System
      Receives spans and inserts them into a central store (e.g. Accumulo, HBase, leveldb)
      Accumulo tracer, Zipkin, htraced

    Visualization / Inspection System
      Accumulo monitor / shell formatter, Zipkin, htraced

  • Tracing Components

    [Figure: two Server Processes and one Client Process, each with a SpanReceiver; a Collector Process; Storage; legend: Instrumented Code]
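The division of labor between an in-process span receiver and a collector can be sketched in a few lines. This is a hypothetical plain-Java illustration, not the HTrace `SpanReceiver` interface or the Accumulo tracer: the class names, the bounded queue, and the drop-when-full policy are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical stand-in for a collector process plus its central store. */
final class CollectorSketch {
    final List<String> stored = new ArrayList<>();

    synchronized void accept(List<String> batch) {
        stored.addAll(batch);
    }
}

/** Hypothetical in-process receiver: buffers finished spans, then ships them. */
final class QueueingReceiverSketch {
    private final BlockingQueue<String> queue;
    private final CollectorSketch collector;

    QueueingReceiverSketch(int capacity, CollectorSketch collector) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.collector = collector;
    }

    /** Called by instrumented code; never blocks, and drops the span when the buffer is full. */
    boolean receiveSpan(String span) {
        return queue.offer(span);
    }

    /** Sends everything buffered so far to the collector and returns the batch size. */
    int flush() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        collector.accept(batch);
        return batch.size();
    }

    public static void main(String[] args) {
        CollectorSketch collector = new CollectorSketch();
        QueueingReceiverSketch receiver = new QueueingReceiverSketch(2, collector);
        receiver.receiveSpan("span A");
        receiver.receiveSpan("span B");
        System.out.println("accepted third span: " + receiver.receiveSpan("span C"));
        System.out.println("flushed: " + receiver.flush());
        System.out.println("stored: " + collector.stored);
    }
}
```

Keeping the receiver non-blocking means a slow or unreachable collector costs dropped spans rather than stalled application threads, which is one plausible reason to buffer in-process.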

  • Apache HTrace (incubating)

    Released version 3.1.0-incubating

    Used by Hadoop 2.7.0 and Accumulo 1.7.0

    Ongoing work on improving the htraced collector for the upcoming 3.2.0-incubating
      Leveldb for storage
      REST API for span collection
      Simple UI

  • Enabling Tracing

    Load SpanReceivers in the current process by performing only one of the following steps:

    Load SpanReceivers based on the Accumulo conf, using trace.span.receivers and related properties:
      DistributedTrace.enableTracing(myAppName);

    Load SpanReceivers based on the Hadoop conf, using hadoop.htrace.spanreceiver.classes and related properties:
      SpanReceiverHost.getInstance(conf);

    Load directly:
      Trace.addReceiver(spanReceiver);

  • Creating Spans

    Start a new span if tracing
      TraceScope traceScope = Trace.startSpan(operation);

    Start a new span even if not tracing
      TraceScope traceScope = Trace.startSpan(operation, Sampler.ALWAYS);

    Add annotations
      Trace.addTimelineAnnotation(...);
      Trace.addKVAnnotation(...);

    Close a span
      traceScope.close();
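The start/close pairing on this slide fits a try/finally or try-with-resources block, so that a span is closed even when the traced operation throws. The sketch below does not use HTrace; `ScopeSketch` is a hypothetical plain-Java stand-in that shows how scopes opened on one thread can pick up the enclosing span as their parent. Whether HTrace tracks the current span per thread in this way is an assumption here, not something the slides state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical scope helper; illustrates the start/close pattern only. */
final class ScopeSketch implements AutoCloseable {
    // Scopes currently open on this thread, innermost first.
    private static final ThreadLocal<Deque<ScopeSketch>> OPEN =
        ThreadLocal.withInitial(ArrayDeque::new);
    // Finished spans in the order they were closed (single-threaded demo only).
    static final List<String> FINISHED = new ArrayList<>();

    final String description;
    final String parent; // description of the enclosing span, or null for a root
    final long startMillis = System.currentTimeMillis();

    private ScopeSketch(String description, String parent) {
        this.description = description;
        this.parent = parent;
    }

    /** Opens a scope whose parent is whatever scope is innermost on this thread. */
    static ScopeSketch start(String description) {
        Deque<ScopeSketch> open = OPEN.get();
        ScopeSketch current = open.peek();
        ScopeSketch scope =
            new ScopeSketch(description, current == null ? null : current.description);
        open.push(scope);
        return scope;
    }

    /** Ends the span and records it, as a receiver would be handed it. */
    @Override
    public void close() {
        OPEN.get().remove(this);
        FINISHED.add(description + " <- " + (parent == null ? "(root)" : parent));
    }

    public static void main(String[] args) {
        try (ScopeSketch outer = ScopeSketch.start("scan")) {
            try (ScopeSketch inner = ScopeSketch.start("readBlock")) {
                // traced work would happen here
            }
        }
        System.out.println(FINISHED);
    }
}
```

Inner scopes close before outer ones, so the child span is recorded first and the root span last.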

  • Sampling

    [Figure: two Server Processes and one Client Process, each with a SpanReceiver; a Collector Process; Storage; legend: Instrumented Code]

    Sampling is performed at the root
    CountSampler, ProbabilitySampler, AlwaysSampler
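The three sampler names suggest three simple decision rules, and "performed at the root" suggests that the sampler is consulted only when no trace is already in progress. The following plain-Java sketch illustrates that reading; it is not the HTrace sampler code, and the method names and exact counting behavior are assumptions.

```java
import java.util.Random;
import java.util.function.BooleanSupplier;

/** Hypothetical samplers; each call answers "should this root operation be traced?". */
final class SamplerSketches {
    private SamplerSketches() {}

    /** Traces every operation. */
    static BooleanSupplier always() {
        return () -> true;
    }

    /** Traces the first call and every n-th call after it. */
    static BooleanSupplier everyNth(long n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        long[] calls = {0};
        return () -> (calls[0]++ % n) == 0;
    }

    /** Traces each operation independently with probability p. */
    static BooleanSupplier probability(double p, Random rng) {
        return () -> rng.nextDouble() < p;
    }

    /** Consults the sampler only at the root; work inside a trace is always traced. */
    static boolean shouldStartSpan(boolean alreadyTracing, BooleanSupplier sampler) {
        return alreadyTracing || sampler.getAsBoolean();
    }

    public static void main(String[] args) {
        BooleanSupplier oneInTen = everyNth(10);
        int traced = 0;
        for (int i = 0; i < 100; i++) {
            if (shouldStartSpan(false, oneInTen)) {
                traced++;
            }
        }
        System.out.println("traced " + traced + " of 100 root operations");
    }
}
```

Deciding once at the root keeps traces whole: a downstream server never samples independently, so it never records a fragment of a trace whose root was skipped.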

  • Existing Instrumentation

    Accumulo
      Minor Compaction: all / 10%
      Major Compaction: all / 10%
      Garbage Collection: 1%
      Replication: 10%
      Shell commands: as requested with trace command

    HDFS
      RPC
      DFSInputStream
      DFSOutputStream
      DFSClient
      as governed by hadoop.htrace.sampler and additional properties

  • Span Receivers

    Accumulo ZooTraceClient (Accumulo's default)

    accumulo-site.xml
      trace.span.receivers = o.a.a.core.trace.ZooTraceClient
      trace.zookeeper.path = /tracers
      trace.span.receiver.* (e.g. tracer.queue.size)

    HDFS core-site.xml
      hadoop.htrace.spanreceiver.classes
      hadoop.htrace.tracer.zookeeper.path
      hadoop.htrace.tracer.zookeeper.host = zkHost:2181
      hadoop.htrace.*

    Other Span Receivers
      ZipkinSpanReceiver, HTracedRESTReceiver, FlumeSpanReceiver, LocalFileSpanReceiver

  • Trace Collector

    Accumulo Tracer
      ZooTraceClient SpanReceiver
      Registers in ZooKeeper
      Uses Accumulo for storage
      Basic UI in Accumulo monitor

    Zipkin Collector (https://github.com/twitter/zipkin)
      ZipkinSpanReceiver
      Pluggable storage
      Search / Graphical UI

    Htraced (near future)
      HTracedRESTReceiver
      Uses leveldb for storage
      Search / Graphical UI

  • Trace Inspection

    Accumulo Monitor

  • Trace Inspection

    Accumulo Monitor

  • Trace Inspection

    Zipkin

  • Trace Inspection

    Zipkin

  • Trace Inspection

    Zipkin

  • Summary

    Accumulo tracing can now collect more detailed information, thanks to HDFS tracing and the use of a common instrumentation library with HDFS

    Accumulo and HDFS configure tracing separately, so they must be configured compatibly to take advantage of the new feature

    There are multiple options for collection and visualization systems

  • Questions?

    [email protected]
    [email protected]
    [email protected]
    IRC #accumulo