Microservices Tracing with Spring Cloud and Zipkin @ Szczecin JUG
1
© 2017 Pivotal
Implementing Microservices Tracing with Spring Cloud and Zipkin
Marcin Grzejszczak, @mgrzejszczak
2
Spring Cloud developer at Pivotal
Working mostly on
● Spring Cloud Sleuth
● Spring Cloud Contract
● Spring Cloud Pipelines
About me
Twitter: @mgrzejszczak
Blog: http://toomuchcoding.com
3
What is distributed tracing?
How to correlate logs with Spring Cloud Sleuth?
How to visualize latency with Spring Cloud Sleuth and Zipkin?
Agenda
10
https://tonysbologna.files.wordpress.com/2015/09/mario-and-luigi.jpg?w=468&h=578&crop=1
Time to debug
16
The answer: distributed tracing
• Span
• Trace
• Baggage
• Logs (annotations)
• Tags (binary annotations)
18
Span
The basic unit of work (e.g. sending RPC)
• Spans are started and stopped
• They keep track of their timing information
• Once you create a span, you must stop it at some point in the future
• A span has a parent (except the root span) and can have multiple children
• All spans have unique span ids
• Spans in a single hierarchy share a trace id
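The span rules above can be modelled in a few lines of plain Java. This is only an illustrative sketch: `SketchSpan` and its methods are hypothetical names, not Spring Cloud Sleuth's API, and ids come from a simple counter rather than a real id generator.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative model only; this is NOT Spring Cloud Sleuth's Span class.
final class SketchSpan {
    // Simple counter standing in for a real id generator (assumption).
    private static final AtomicLong IDS = new AtomicLong(1);

    final long traceId;    // shared by every span in one hierarchy
    final long spanId;     // unique per span
    final Long parentId;   // null for the root span
    final long startNanos; // timing information kept by the span itself
    private long stopNanos = -1;

    private SketchSpan(long traceId, Long parentId) {
        this.traceId = traceId;
        this.spanId = IDS.getAndIncrement();
        this.parentId = parentId;
        this.startNanos = System.nanoTime();
    }

    // Starts a new trace: the root span has no parent.
    static SketchSpan startRoot() {
        return new SketchSpan(IDS.getAndIncrement(), null);
    }

    // Starts a child span in the same trace, pointing back at this span.
    SketchSpan startChild() {
        return new SketchSpan(traceId, spanId);
    }

    // Every started span must be stopped at some point.
    void stop() {
        if (stopNanos < 0) {
            stopNanos = System.nanoTime();
        }
    }

    boolean isRunning() {
        return stopNanos < 0;
    }

    long durationNanos() {
        if (stopNanos < 0) {
            throw new IllegalStateException("span has not been stopped");
        }
        return stopNanos - startNanos;
    }
}
```

A root span created with `SketchSpan.startRoot()` and a child created with `root.startChild()` share the trace id, have different span ids, and the child's parent id equals the root's span id.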
19
Trace
A set of spans forming a tree-like structure.
• For example, if you are running a bookstore, then
• a trace could be retrieving a list of available books
• Assuming that to retrieve the books you have to send 3 requests to 3 services, you could have at least 3 spans (1 for each hop) forming 1 trace
20
Baggage (from Sleuth 1.2.0)
Key value pairs that get propagated between network boundaries
• once set, it is accessible in every application for the duration of the trace
• works for HTTP and messaging based communication
• WARNING: if the baggage is too large, your latency can increase
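A rough sketch of how baggage could travel across an HTTP hop, and why its size matters. `BaggageSketch`, its methods and the `baggage-` header prefix are assumptions made for this illustration, not Sleuth's implementation; check the Sleuth docs for the actual header names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model only; names and the header prefix are assumptions.
final class BaggageSketch {
    static final String PREFIX = "baggage-";

    // Sender side: every baggage entry becomes one prefixed header.
    static Map<String, String> inject(Map<String, String> baggage) {
        Map<String, String> headers = new LinkedHashMap<>();
        baggage.forEach((key, value) -> headers.put(PREFIX + key, value));
        return headers;
    }

    // Receiver side: prefixed headers are turned back into baggage,
    // all other headers are ignored.
    static Map<String, String> extract(Map<String, String> headers) {
        Map<String, String> baggage = new LinkedHashMap<>();
        headers.forEach((name, value) -> {
            if (name.startsWith(PREFIX)) {
                baggage.put(name.substring(PREFIX.length()), value);
            }
        });
        return baggage;
    }

    // Number of characters added to EVERY request of the trace;
    // this is the overhead the warning above is about.
    static int overheadChars(Map<String, String> headers) {
        int total = 0;
        for (Map.Entry<String, String> e : headers.entrySet()) {
            total += e.getKey().length() + e.getValue().length();
        }
        return total;
    }
}
```

Because the headers are re-sent on each hop, the overhead grows with both the baggage size and the number of calls in the trace.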
[Diagram: propagation of trace and span ids across four services]
• REQUEST enters SERVICE 1 with No Trace Id, No Span Id
• SERVICE 1: Trace Id = X, Span Id = A
• SERVICE 1 → SERVICE 2: Trace Id = X, Span Id = B (Client Send, Server Received, Server Sent, Client Received)
• SERVICE 2: Trace Id = X, Span Id = C
• SERVICE 2 → SERVICE 3: Trace Id = X, Span Id = D (Client Send, Server Received, Server Sent, Client Received)
• SERVICE 3: Trace Id = X, Span Id = E
• SERVICE 2 → SERVICE 4: Trace Id = X, Span Id = F (Client Send, Server Received, Server Sent, Client Received)
• SERVICE 4: Trace Id = X, Span Id = G
22
Span Id = A, Parent Id = null
Span Id = B, Parent Id = A
Span Id = C, Parent Id = B
Span Id = D, Parent Id = C
Span Id = E, Parent Id = D
Span Id = F, Parent Id = C
Span Id = G, Parent Id = F
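The parent ids are enough to rebuild the tree. A minimal sketch using the ids from the slide; `SpanTreeSketch` and its methods are hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper; rebuilds the span tree from (span id -> parent id) pairs.
final class SpanTreeSketch {

    // The pairs from the slide; the root span A has no parent.
    static Map<String, String> parents() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("A", null);
        p.put("B", "A");
        p.put("C", "B");
        p.put("D", "C");
        p.put("E", "D");
        p.put("F", "C");
        p.put("G", "F");
        return p;
    }

    // Direct children of a span, in insertion order.
    static List<String> childrenOf(String id, Map<String, String> parents) {
        List<String> children = new ArrayList<>();
        for (Map.Entry<String, String> e : parents.entrySet()) {
            if (id.equals(e.getValue())) {
                children.add(e.getKey());
            }
        }
        return children;
    }

    // Number of parent links between a span and the root.
    static int depth(String id, Map<String, String> parents) {
        int depth = 0;
        for (String p = parents.get(id); p != null; p = parents.get(p)) {
            depth++;
        }
        return depth;
    }
}
```

Span C is the only span with two children (D and F), which matches SERVICE 2 calling both SERVICE 3 and SERVICE 4.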
24
Is context propagation simple?
How do you pass tracing information (incl. Trace ID)
between:
• different libraries?
• thread pools?
• asynchronous communication?
• …?
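The thread pool case shows why this is not trivial. A value kept in a `ThreadLocal` is not visible in a pool thread unless the task is wrapped. The sketch below uses only the JDK; `TraceContextSketch` and its methods are hypothetical names and this is not how Sleuth is implemented internally.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative model only: a trace id kept in a ThreadLocal.
final class TraceContextSketch {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    static void set(String traceId) { TRACE_ID.set(traceId); }
    static String get() { return TRACE_ID.get(); }
    static void clear() { TRACE_ID.remove(); }

    // Captures the caller's trace id now and restores it inside the worker thread.
    static <T> Callable<T> wrap(Callable<T> task) {
        final String captured = get();
        return () -> {
            String previous = get();
            set(captured);
            try {
                return task.call();
            } finally {
                // Leave the pool thread as we found it.
                if (previous == null) {
                    clear();
                } else {
                    set(previous);
                }
            }
        };
    }

    // Reads the trace id from a pool thread, with or without wrapping.
    static String runInPool(boolean wrapped) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = TraceContextSketch::get;
            Callable<String> toRun = wrapped ? wrap(task) : task;
            return pool.submit(toRun).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

After `TraceContextSketch.set("X")`, `runInPool(false)` returns `null` (the context is lost), while `runInPool(true)` returns `"X"`.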
25
Log correlation with Spring Cloud Sleuth
We take care of passing tracing information between threads / libraries / contexts for:
• Hystrix
• RxJava
• RestTemplate
• Feign
• Messaging with Spring Integration
• Zuul
• ...
If you don’t do anything unexpected there’s nothing you need to do to make Sleuth work. Check the docs for more info.
27
Now let’s aggregate the logs!
Instead of SSHing to the machines to grep logs, let’s aggregate them!
• With Cloud Foundry’s (CF) Loggregator the logs from different instances
are streamed into a single place
• You can harvest your logs with Logstash Forwarder / FileBeat
• You can use ELK stack to stream and visualize the logs
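Once the logs of all instances are in one place, matching them by trace id is a plain filter. The sketch assumes every log line starts with a prefix of the form `[appName,traceId,spanId,exportable]`; `LogCorrelationSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper; assumes a "[app,traceId,spanId,exportable] message" line format.
final class LogCorrelationSketch {

    // Builds one correlated log line.
    static String line(String app, String traceId, String spanId,
                       boolean exportable, String message) {
        return "[" + app + "," + traceId + "," + spanId + "," + exportable + "] " + message;
    }

    // Keeps only the lines that belong to the given trace.
    static List<String> forTrace(List<String> lines, String traceId) {
        List<String> matching = new ArrayList<>();
        for (String line : lines) {
            int close = line.indexOf(']');
            if (!line.startsWith("[") || close < 0) {
                continue; // not a correlated line
            }
            // The trace id is the second comma-separated field of the prefix.
            String[] fields = line.substring(1, close).split(",");
            if (fields.length >= 2 && fields[1].equals(traceId)) {
                matching.add(line);
            }
        }
        return matching;
    }
}
```

In an ELK setup the same filter would typically be a query on a parsed trace id field rather than string matching.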
28
<!-- Import the Spring Cloud BOM so that starter versions are managed for you -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Camden.SR6</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<!-- Goes inside the <dependencies> section of your pom.xml -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
Spring Cloud Sleuth with Maven
29
[Diagram: a request to /start on SERVICE 1 calls SERVICE 2, which calls SERVICE 3 and SERVICE 4; each call is a REQUEST / RESPONSE pair]
● SERVICE 3 responds: “Hello from service3”
● SERVICE 4 responds: “Hello from service4”
● SERVICE 2 responds: “Hello from service2, response from service3 [Hello from service3] and from service4 [Hello from service4]”
39
● Client Send (CS) - The client has made a request
● Server Received (SR) - The server side got the request and will start processing
● Server Send (SS) - Annotated upon completion of request processing
● Client Received (CR) - The client has successfully received the response from the server side
Let’s log events!
41
● The request started at T=0ms
● It took 450 ms for the client to receive a response
● Server side received the request at T=100 ms
● The request got processed on the server side in 200 ms
Conclusions
CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms
42
Why is there a delay between sending and receiving messages?!!11!one!?!1!
Conclusions
CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms
44
Distributed tracing - terminology
• Span
• Trace
• Baggage
• Logs (annotations)
• Tags (binary annotations)
45
Logs
A log represents an event in time associated with a span
● Every span has zero or more logs
● Each log is a timestamped event name
● Event should be the stable name of some notable moment in the lifetime of a span
○ For instance, a span representing a browser page load might add an event for each of the Performance.timing moments (check https://developer.mozilla.org/en-US/docs/Web/API/PerformanceTiming)
47
Main logs
● Client Send (CS)
○ The client has made a request - the span was started
● Server Received (SR)
○ The server side got the request and will start processing it
○ SR timestamp - CS timestamp = NETWORK LATENCY
CS 0 ms, SR 100 ms
48
Main logs
● Server Send (SS)
○ Annotated upon completion of request processing
○ SS timestamp - SR timestamp = SERVER SIDE PROCESSING TIME
● Client Received (CR)
○ The client has successfully received the response from the server side
○ CR timestamp - CS timestamp = TIME NEEDED TO RECEIVE RESPONSE
○ CR timestamp - SS timestamp = NETWORK LATENCY
CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms
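The four formulas can be checked against the numbers on the slide. A small sketch; `AnnotationMathSketch` is a hypothetical name, and the arithmetic assumes client and server clocks are in sync, which real deployments cannot always guarantee.

```java
// Illustrative arithmetic on the four main logs (all values in milliseconds).
final class AnnotationMathSketch {
    final long cs; // Client Send
    final long sr; // Server Received
    final long ss; // Server Send
    final long cr; // Client Received

    AnnotationMathSketch(long cs, long sr, long ss, long cr) {
        this.cs = cs;
        this.sr = sr;
        this.ss = ss;
        this.cr = cr;
    }

    long requestNetworkLatency()  { return sr - cs; } // SR - CS
    long serverProcessingTime()   { return ss - sr; } // SS - SR
    long timeToReceiveResponse()  { return cr - cs; } // CR - CS
    long responseNetworkLatency() { return cr - ss; } // CR - SS

    public static void main(String[] args) {
        AnnotationMathSketch call = new AnnotationMathSketch(0, 100, 300, 450);
        System.out.println(call.requestNetworkLatency());  // 100
        System.out.println(call.serverProcessingTime());   // 200
        System.out.println(call.timeToReceiveResponse());  // 450
        System.out.println(call.responseNetworkLatency()); // 150
    }
}
```

Request latency, processing time and response latency add up to the total seen by the client: 100 + 200 + 150 = 450 ms.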
49
Key-value pair
● Every span may have zero or more key/value Tags
● They do not have timestamps and simply annotate the spans.
● Example of default tags in Sleuth
○ message/payload-size
○ http.method
○ commandKey for Hystrix
Tag
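The difference between logs and tags comes down to the data they carry: a log is an event name with a timestamp, a tag is a plain key-value pair. A minimal sketch; `AnnotatedSpanSketch` is a hypothetical name and not Sleuth's or Zipkin's data model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: a span holding logs (timestamped) and tags (not timestamped).
final class AnnotatedSpanSketch {

    // A log: a stable event name plus the moment it happened.
    static final class Log {
        final long timestampMillis;
        final String event;

        Log(long timestampMillis, String event) {
            this.timestampMillis = timestampMillis;
            this.event = event;
        }
    }

    private final List<Log> logs = new ArrayList<>();
    private final Map<String, String> tags = new LinkedHashMap<>();

    void logEvent(long timestampMillis, String event) {
        logs.add(new Log(timestampMillis, event));
    }

    // Tags simply annotate the span; setting a key again overwrites the value.
    void tag(String key, String value) {
        tags.put(key, value);
    }

    List<Log> logs() {
        return Collections.unmodifiableList(logs);
    }

    Map<String, String> tags() {
        return Collections.unmodifiableMap(tags);
    }
}
```

For the HTTP call from the earlier slides this could mean two logs (`cs` at 0 ms, `sr` at 100 ms) and one tag (`http.method` = `GET`).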
52
[Diagram: each APP sends its spans to the collectors, the collectors store them in a DB, and the UI queries for trace info via the API]
How does Zipkin work?
54
Spring Cloud Sleuth and Zipkin integration
● We take care of passing tracing information between threads / libraries / contexts
● Upon closing of a Span we will send it to Zipkin
○ either via HTTP (spring-cloud-sleuth-zipkin)
○ or via Spring Cloud Stream (spring-cloud-sleuth-stream)
● You can run Zipkin Spring Cloud Stream Collector as a Spring Boot app (spring-cloud-sleuth-zipkin-stream)
○ you can add the dependency to Zipkin UI!
55
Spring Cloud Sleuth Zipkin with Maven
<!-- Import the Spring Cloud BOM so that starter versions are managed for you -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Camden.SR6</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<!-- Goes inside the <dependencies> section of your pom.xml -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>
57
Sampling to the rescue!
● By default Spring Cloud Sleuth sends only 10% of requests to Zipkin
● You can change that by setting the property spring.sleuth.sampler.percentage (for 100% pass 1.0)
● Or register a custom org.springframework.cloud.sleuth.Sampler implementation
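To make the percentage concrete, here is a deliberately simple, deterministic sampler. It is a stand-in written for this example: `PercentageSamplerSketch` is a hypothetical class, it does not implement Sleuth's `Sampler` interface, and Sleuth's own percentage-based sampler may choose requests differently.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative model only: samples the first N requests of every window of 100.
final class PercentageSamplerSketch {
    private final int perHundred;
    private final AtomicLong counter = new AtomicLong();

    // percentage uses the same scale as the property above: 0.1 = 10%, 1.0 = 100%.
    PercentageSamplerSketch(double percentage) {
        if (percentage < 0.0 || percentage > 1.0) {
            throw new IllegalArgumentException("percentage must be between 0.0 and 1.0");
        }
        this.perHundred = (int) Math.round(percentage * 100);
    }

    // Returns true if the current request should be sent to Zipkin.
    boolean isSampled() {
        long n = counter.getAndIncrement();
        return (n % 100) < perHundred;
    }
}
```

With 0.1, exactly 100 out of 1000 calls to `isSampled()` return true; with 1.0 every call does.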
58
[Diagram: REQUEST / RESPONSE flow between SERVICE 1 (/start), SERVICE 2 (/foo), SERVICE 3 (/bar), SERVICE 4 (/baz) and SZCZECINSERVICE (/szczecin)]
61
Traced call
1st request Service1 calling Service2
Service2 calling Service3
Service2 calling Service4
67
● Log correlation allows you to match logs for a given trace
● Distributed tracing allows you to quickly see latency issues in your system
● Zipkin is a great tool to visualize the latency graph and system
dependencies
● Spring Cloud Sleuth integrates with Zipkin and grants you log correlation
● With 1.2.0 you’ll be able to propagate any information via baggage
● With 1.2.0 you’ll be able to use annotations to create / continue spans
and add logs and tags
Summary
68
● A test app for Spring Cloud end to end tests
● Source code:
https://github.com/spring-cloud-samples/brewery
● Around 10 applications involved
● Zipkin deployed to PCF for Brewery Sample app:
http://docsbrewing-zipkin-server.cfapps.io
Zipkin for Brewery
72
▪ Code for this presentation
(clone and run getReadyForConference.sh - NOTE: you need Vagrant!) :
https://github.com/marcingrzejszczak/vagrant-elk-box/tree/presentation
▪ Sleuth samples: https://github.com/spring-cloud-samples/sleuth-documentation-apps
▪ Sleuth’s documentation: http://cloud.spring.io/spring-cloud-sleuth/
▪ Repo with Spring Boot Zipkin server: https://github.com/openzipkin/zipkin-java
▪ Zipkin deployed to PCF for Brewery Sample app:
http://docsbrewing-zipkin-server.cfapps.io
▪ Pivotal Web Services trial : https://run.pivotal.io/
▪ PCF on your laptop : https://docs.pivotal.io/pcf-dev/
Links
73
Learn More. Stay Connected.
▪ Read the docs
▪ Check the samples
▪ Talk to us on Gitter
Twitter: twitter.com/springcentral
YouTube: spring.io/video
LinkedIn: spring.io/linkedin
Google Plus: spring.io/gplus