microservices tracing with spring cloud and zipkin @szczecin jug

© 2017 Pivotal

Implementing Microservices Tracing with Spring Cloud and Zipkin

Marcin Grzejszczak, @mgrzejszczak

2


About me

Spring Cloud developer at Pivotal

Working mostly on:

● Spring Cloud Sleuth
● Spring Cloud Contract
● Spring Cloud Pipelines

Twitter: @mgrzejszczak
Blog: http://toomuchcoding.com

3


Agenda

What is distributed tracing?

How to correlate logs with Spring Cloud Sleuth?

How to visualize latency with Spring Cloud Sleuth and Zipkin?

5

An ordinary system...

6

UI calls backend

UI -> BACKEND

7

CLICK 200

Everything is awesome

8

CLICK 500

Until it’s not

10

https://tonysbologna.files.wordpress.com/2015/09/mario-and-luigi.jpg?w=468&h=578&crop=1

Time to debug

11

It doesn’t look like this

12

More like this

13

On which server / instance was the exception thrown?

14

SSH and grep for ERROR to find it?

15

How to find all logs from all servers that correspond to that business action?

16

The answer: distributed tracing

• Span

• Trace

• Baggage

• Logs (annotations)

• Tags (binary annotations)


18

Span

The basic unit of work (e.g. sending RPC)

• Spans are started and stopped

• They keep track of their timing information

• Once you create a span, you must stop it at some point in the future

• A span has a parent (except for the root span) and can have multiple children

• All spans have unique span ids

• Spans in a single hierarchy share a trace id
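The span properties listed above can be sketched in plain Java. This is a minimal illustration, not Sleuth's own `Span` class: the `SimpleSpan` type, its fields and its method names are all hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical, simplified span used for illustration; not Sleuth's Span class. */
final class SimpleSpan {
    final long traceId;    // shared by every span in one trace
    final long spanId;     // identifies this span
    final Long parentId;   // null for the root span
    final String name;
    private final long startNanos = System.nanoTime(); // timing starts on creation
    private long endNanos;
    private boolean running = true;

    private SimpleSpan(long traceId, Long parentId, String name) {
        this.traceId = traceId;
        this.spanId = ThreadLocalRandom.current().nextLong();
        this.parentId = parentId;
        this.name = name;
    }

    /** Starts a new trace with a root span. */
    static SimpleSpan root(String name) {
        return new SimpleSpan(ThreadLocalRandom.current().nextLong(), null, name);
    }

    /** Starts a child span that shares this span's trace id. */
    SimpleSpan child(String name) {
        return new SimpleSpan(traceId, spanId, name);
    }

    /** Every started span has to be stopped; a second stop() keeps the first end time. */
    void stop() {
        if (running) {
            endNanos = System.nanoTime();
            running = false;
        }
    }

    boolean isRunning() {
        return running;
    }

    /** Duration so far, or the final duration once the span is stopped. */
    long durationNanos() {
        return (running ? System.nanoTime() : endNanos) - startNanos;
    }

    public static void main(String[] args) {
        SimpleSpan root = SimpleSpan.root("get-books");
        SimpleSpan child = root.child("call-inventory");
        child.stop();
        root.stop();
        System.out.println("same trace: " + (child.traceId == root.traceId));
        System.out.println("root took " + root.durationNanos() + " ns");
    }
}
```

The span names `get-books` and `call-inventory` are made up to match the bookstore example used later for traces.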

19

Trace

A set of spans forming a tree-like structure.

• For example, if you are running a bookstore, a trace could be retrieving a list of available books

• Assuming that to retrieve the books you have to send 3 requests to 3 services, you could have at least 3 spans (1 for each hop) forming 1 trace

20

Baggage (from Sleuth 1.2.0)

Key value pairs that get propagated between network boundaries

• Once set, a baggage item is accessible in every application for the duration of the trace

• Works for HTTP and messaging based communication

• WARNING: the larger the baggage, the higher your latency can get

[Diagram: span propagation across four services. Every span carries Trace Id = X.

Service 1 receives a request with no trace id and no span id and starts span A.

Service 1 -> Service 2: span B (Client Send, Server Received, Server Sent, Client Received). Service 2 then creates span C.

Service 2 -> Service 3: span D (Client Send, Server Received, Server Sent, Client Received). Service 3 then creates span E.

Service 2 -> Service 4: span F (Client Send, Server Received, Server Sent, Client Received). Service 4 then creates span G.]

22

Span Id = A, Parent Id = null
Span Id = B, Parent Id = A
Span Id = C, Parent Id = B
Span Id = D, Parent Id = C
Span Id = E, Parent Id = D
Span Id = F, Parent Id = C
Span Id = G, Parent Id = F
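The parent ids listed above are enough to rebuild the tree of the trace. A minimal sketch, assuming the span ids A to G from the slide; the `SpanTree` class and its methods are hypothetical helpers, not part of Sleuth or Zipkin.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SpanTree {
    /** span id -> parent id, copied from the slide; null marks the root span. */
    static final Map<String, String> PARENTS = new LinkedHashMap<>();
    static {
        PARENTS.put("A", null);
        PARENTS.put("B", "A");
        PARENTS.put("C", "B");
        PARENTS.put("D", "C");
        PARENTS.put("E", "D");
        PARENTS.put("F", "C");
        PARENTS.put("G", "F");
    }

    /** Number of ancestors between the given span and the root span. */
    static int depth(String spanId) {
        int depth = 0;
        for (String p = PARENTS.get(spanId); p != null; p = PARENTS.get(p)) {
            depth++;
        }
        return depth;
    }

    /** Direct children of the given span, in the order they were listed. */
    static List<String> childrenOf(String spanId) {
        List<String> children = new ArrayList<>();
        for (Map.Entry<String, String> e : PARENTS.entrySet()) {
            if (spanId.equals(e.getValue())) {
                children.add(e.getKey());
            }
        }
        return children;
    }

    public static void main(String[] args) {
        // Print each span indented by its depth in the tree.
        for (String spanId : PARENTS.keySet()) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth(spanId); i++) {
                line.append("  ");
            }
            System.out.println(line.append(spanId));
        }
    }
}
```

Span C is the only span with two children (D and F), which matches Service 2 calling both Service 3 and Service 4.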

23

Is it that simple?

24

Is context propagation simple?

How do you pass tracing information (incl. Trace ID) between:

• different libraries?

• thread pools?

• asynchronous communication?

• …?
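To see why thread pools are on this list, here is a minimal sketch of the problem and of one common fix: capture the trace id on the submitting thread and restore it inside the task. It assumes the trace id lives in a `ThreadLocal`; the `TraceContext` class and the `trace-X` value are hypothetical, and this is not how any particular library is implemented.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class TraceContext {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    static void set(String traceId) { TRACE_ID.set(traceId); }
    static String get() { return TRACE_ID.get(); }
    static void clear() { TRACE_ID.remove(); }

    /** Captures the caller's trace id now and restores it on whichever thread runs the task. */
    static <T> Callable<T> wrap(Callable<T> task) {
        String captured = get();
        return () -> {
            String previous = get();
            set(captured);
            try {
                return task.call();
            } finally {
                // Leave the pool thread as we found it.
                if (previous == null) {
                    clear();
                } else {
                    set(previous);
                }
            }
        };
    }

    /** Returns what a pool thread sees without and with wrapping. */
    static String[] demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            set("trace-X");
            Callable<String> readTraceId = () -> String.valueOf(get());
            String plain = pool.submit(readTraceId).get();
            String wrapped = pool.submit(wrap(readTraceId)).get();
            return new String[] {plain, wrapped};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            clear();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] seen = demo();
        System.out.println("plain task sees:   " + seen[0]);
        System.out.println("wrapped task sees: " + seen[1]);
    }
}
```

The plain task loses the trace id because a `ThreadLocal` value belongs to the submitting thread only; every library that hops threads needs some equivalent of `wrap`.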

25

Log correlation with Spring Cloud Sleuth

We take care of passing tracing information between threads / libraries / contexts for:

• Hystrix

• RxJava

• RestTemplate

• Feign

• Messaging with Spring Integration

• Zuul

• ...

If you don’t do anything unexpected there’s nothing you need to do to make Sleuth work. Check the docs for more info.

26

Spring Cloud Sleuth logging format

We set a logging format for you...
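As a sketch of what that format enables: Sleuth 1.x adds a `[application,traceId,spanId,exportable]` section to each log line, so all lines of one business action can be selected by trace id. The sample log lines, ids and the `LogCorrelation` helper below are hypothetical; only the shape of the bracketed section is taken from Sleuth.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogCorrelation {
    // Matches the [application,traceId,spanId,exportable] section of a log line.
    private static final Pattern TRACE_SECTION =
            Pattern.compile("\\[([^,\\]]+),([0-9a-f]+),([0-9a-f]+),(true|false)\\]");

    // Hypothetical log lines from two services; the ids are made up.
    static final List<String> SAMPLE = Arrays.asList(
            "2017-03-01 10:00:00.001  INFO [service1,2485ec27856c56f4,2485ec27856c56f4,true] 1234 --- [nio-8081-exec-1] Hello from service1",
            "2017-03-01 10:00:00.120 ERROR [service2,2485ec27856c56f4,9aa10ee6fbde75fa,true] 5678 --- [nio-8082-exec-3] Read timed out",
            "2017-03-01 10:00:00.300  INFO [service1,bd7a977555f6b982,bd7a977555f6b982,false] 1234 --- [nio-8081-exec-2] Hello from service1");

    /** Returns the trace id found in the line, or null if the line has no trace section. */
    static String traceIdOf(String logLine) {
        Matcher m = TRACE_SECTION.matcher(logLine);
        return m.find() ? m.group(2) : null;
    }

    /** Keeps only the lines that belong to the given trace. */
    static List<String> linesForTrace(List<String> lines, String traceId) {
        List<String> matching = new ArrayList<>();
        for (String line : lines) {
            if (traceId.equals(traceIdOf(line))) {
                matching.add(line);
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        for (String line : linesForTrace(SAMPLE, "2485ec27856c56f4")) {
            System.out.println(line);
        }
    }
}
```

In practice the same filter is usually a query in the log aggregation tool rather than application code.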

27

Now let’s aggregate the logs!

Instead of SSHing to the machines to grep logs, let’s aggregate them!

• With Cloud Foundry’s (CF) Loggregator the logs from different instances are streamed into a single place

• You can harvest your logs with Logstash Forwarder / FileBeat

• You can use ELK stack to stream and visualize the logs

28

Spring Cloud Sleuth with Maven

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Camden.SR6</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<!-- goes inside the <dependencies> section of the pom -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>

29

[Diagram: a request to Service 1 /start calls Service 2, which calls Service 3 and Service 4.

Service 3 responds with “Hello from service3”.

Service 4 responds with “Hello from service4”.

Service 2 responds with “Hello from service2, response from service3 [Hello from service3] and from service4 [Hello from service4]”.]

30

[Diagram: a request to Service 1 /readtimeout is forwarded to Service 2; each request in the chain ends with “BOOM!” instead of a response.]

31

Log correlation with Spring Cloud Sleuth

DEMO

35

Great! We’ve found the exception!

But meanwhile...

36

CLICK 200

The system is slow...

37

One of the services is slow...

38

Which one?

How to measure that?

39

● Client Send (CS) - The client has made a request

● Server Received (SR) - The server side got the request and will start processing

● Server Send (SS) - Annotated upon completion of request processing

● Client Received (CR) - The client has successfully received the response from the server side

Let’s log events!

40

CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms

41

Conclusions

CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms

● The request started at T = 0 ms
● The server side received the request at T = 100 ms
● The request got processed on the server side in 200 ms
● It took 450 ms for the client to receive a response

42

Why is there a delay between sending and receiving messages?!!11!one!?!1!

Conclusions

CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms

43

https://blogs.oracle.com/jag/resource/Fallacies.html

44

Distributed tracing - terminology

• Span

• Trace

• Baggage



45

Logs

A log represents an event in time associated with a span

● Every span has zero or more logs

● Each log is a timestamped event name

● Event should be the stable name of some notable moment in the lifetime of a span

○ For instance, a span representing a browser page load might add an event for each of the Performance.timing moments (check https://developer.mozilla.org/en-US/docs/Web/API/PerformanceTiming)


47

Main logs

● Client Send (CS)
  ○ The client has made a request - the span was started

● Server Received (SR)
  ○ The server side got the request and will start processing it
  ○ SR timestamp - CS timestamp = NETWORK LATENCY

CS 0 ms, SR 100 ms

48

Main logs

● Server Send (SS)
  ○ Annotated upon completion of request processing
  ○ SS timestamp - SR timestamp = SERVER SIDE PROCESSING TIME

● Client Received (CR)
  ○ The client has successfully received the response from the server side
  ○ CR timestamp - CS timestamp = TIME NEEDED TO RECEIVE RESPONSE
  ○ CR timestamp - SS timestamp = NETWORK LATENCY

CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms
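The four formulas above can be checked against the example timestamps with a few lines of Java. This is a minimal sketch: the `RpcTimings` class is hypothetical, and it assumes the client and server clocks are synchronized, which real deployments cannot take for granted.

```java
/** Hypothetical holder for the four core event timestamps of one RPC span, in milliseconds. */
final class RpcTimings {
    final long cs; // Client Send
    final long sr; // Server Received
    final long ss; // Server Send
    final long cr; // Client Received

    RpcTimings(long cs, long sr, long ss, long cr) {
        this.cs = cs;
        this.sr = sr;
        this.ss = ss;
        this.cr = cr;
    }

    /** SR - CS: network latency of the request. */
    long requestNetworkLatency() { return sr - cs; }

    /** SS - SR: server side processing time. */
    long serverProcessingTime() { return ss - sr; }

    /** CR - SS: network latency of the response. */
    long responseNetworkLatency() { return cr - ss; }

    /** CR - CS: time the client needed to receive the response. */
    long totalClientTime() { return cr - cs; }

    public static void main(String[] args) {
        RpcTimings t = new RpcTimings(0, 100, 300, 450); // values from the slide
        System.out.println("request network latency:  " + t.requestNetworkLatency() + " ms");
        System.out.println("server processing time:   " + t.serverProcessingTime() + " ms");
        System.out.println("response network latency: " + t.responseNetworkLatency() + " ms");
        System.out.println("total client time:        " + t.totalClientTime() + " ms");
    }
}
```

With the slide's values the three parts (100 ms, 200 ms and 150 ms) add up to the 450 ms the client waited.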

49

Tag

Key-value pair

● Every span may have zero or more key/value Tags

● They do not have timestamps and simply annotate the spans.

● Example of default tags in Sleuth
  ○ message/payload-size
  ○ http.method
  ○ commandKey for Hystrix

50

How to visualise latency in a distributed system?

51

The answer is:ZIPKIN

52

How does Zipkin work?

[Diagram: the apps send spans to collectors, the collectors store them in a DB, and the UI queries for trace info via the API.]

53

What does Zipkin look like?

54

Spring Cloud Sleuth and Zipkin integration

● We take care of passing tracing information between threads / libraries / contexts

● Upon closing of a Span we will send it to Zipkin

○ either via HTTP (spring-cloud-sleuth-zipkin)

○ or via Spring Cloud Stream (spring-cloud-sleuth-stream)

● You can run Zipkin Spring Cloud Stream Collector as a Spring Boot app (spring-cloud-sleuth-zipkin-stream)

○ you can add the Zipkin UI dependency to that app too!

55

Spring Cloud Sleuth Zipkin with Maven

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Camden.SR6</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<!-- goes inside the <dependencies> section of the pom -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>

56

Hold it!

If I have a billion services that emit a gazillion spans, won’t I kill Zipkin?

57

Sampling to the rescue!

● By default Spring Cloud Sleuth sends only 10% of requests to Zipkin

● You can change that by changing the property spring.sleuth.sampler.percentage (for 100% pass 1.0)

● Or register a custom org.springframework.cloud.sleuth.Sampler implementation
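To illustrate what percentage-based sampling means, here is a minimal, deterministic sketch. The `TraceSampler` interface and `CountingPercentageSampler` class are hypothetical stand-ins written for this example; they are not Sleuth's `Sampler` interface, and Sleuth's own implementation may decide differently.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sampler contract, standing in for a real tracer's sampler. */
interface TraceSampler {
    boolean isSampled();
}

/** Samples the first N requests out of every 100, where N = percentage * 100. */
final class CountingPercentageSampler implements TraceSampler {
    private final int sampledPerHundred;
    private final AtomicLong counter = new AtomicLong();

    CountingPercentageSampler(double percentage) {
        if (!(percentage >= 0.0 && percentage <= 1.0)) {
            throw new IllegalArgumentException("percentage must be between 0.0 and 1.0");
        }
        this.sampledPerHundred = (int) Math.round(percentage * 100);
    }

    @Override
    public boolean isSampled() {
        // Position of this request within the current window of 100 requests.
        return counter.getAndIncrement() % 100 < sampledPerHundred;
    }

    /** Helper for the demo: how many of the given number of requests get sampled. */
    static int countSampled(double percentage, int requests) {
        TraceSampler sampler = new CountingPercentageSampler(percentage);
        int sampled = 0;
        for (int i = 0; i < requests; i++) {
            if (sampler.isSampled()) {
                sampled++;
            }
        }
        return sampled;
    }

    public static void main(String[] args) {
        System.out.println("10% of 1000 requests: " + countSampled(0.1, 1000));
        System.out.println("100% of 50 requests:  " + countSampled(1.0, 50));
    }
}
```

A counter keeps the example predictable; a production sampler would more likely spread the sampled requests randomly instead of taking the first ones in each window.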

58

[Diagram: demo setup. Service 1 /start calls Service 2 /foo, which calls Service 3 /bar and Service 4 /baz. The diagram also shows SZCZECINSERVICE /szczecin with its own request and response.]

59

DEMO

60

Traced call

61

Traced call

1st request Service1 calling Service2

Service2 calling Service3

Service2 calling Service4

62

Traced call

RPC call

Tags

Events

63

Traced call - error

click!

64

Traced call - error

65

Baggage

Setting a baggage item

Retrieving a baggage item
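How baggage crosses a network boundary can be sketched without any tracing library: the sender copies each baggage item into a prefixed header and the receiver collects them again. The `BaggageHeaders` class is hypothetical, and the `baggage-` header prefix is an assumption made for this example; check the Sleuth documentation for the exact header names it uses.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class BaggageHeaders {
    /** Assumed header prefix that marks a header as a baggage item. */
    static final String PREFIX = "baggage-";

    /** Sender side: returns the outgoing headers plus one prefixed header per baggage item. */
    static Map<String, String> inject(Map<String, String> baggage, Map<String, String> headers) {
        Map<String, String> result = new LinkedHashMap<>(headers);
        for (Map.Entry<String, String> item : baggage.entrySet()) {
            result.put(PREFIX + item.getKey(), item.getValue());
        }
        return result;
    }

    /** Receiver side: collects the baggage items and ignores all other headers. */
    static Map<String, String> extract(Map<String, String> headers) {
        Map<String, String> baggage = new LinkedHashMap<>();
        for (Map.Entry<String, String> header : headers.entrySet()) {
            String name = header.getKey();
            // Header names are case-insensitive, so compare in lower case.
            if (name.toLowerCase(Locale.ROOT).startsWith(PREFIX)) {
                baggage.put(name.substring(PREFIX.length()), header.getValue());
            }
        }
        return baggage;
    }

    public static void main(String[] args) {
        Map<String, String> baggage = new LinkedHashMap<>();
        baggage.put("foo", "bar");
        Map<String, String> outgoing = new LinkedHashMap<>();
        outgoing.put("Accept", "text/plain");

        Map<String, String> onTheWire = inject(baggage, outgoing);
        System.out.println("headers sent:      " + onTheWire);
        System.out.println("baggage received:  " + extract(onTheWire));
    }
}
```

Because every item travels with every request of the trace, this also shows where the latency warning about large baggage comes from.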

66

Manipulating spans via annotations (from Sleuth 1.2.0)

New span

Continue span

67

Summary

● Log correlation allows you to match logs for a given trace
● Distributed tracing allows you to quickly see latency issues in your system
● Zipkin is a great tool to visualize the latency graph and system dependencies
● Spring Cloud Sleuth integrates with Zipkin and grants you log correlation
● With 1.2.0 you’ll be able to propagate any information via baggage
● With 1.2.0 you’ll be able to use annotations to create / continue spans and add logs and tags

68

Zipkin for Brewery

● A test app for Spring Cloud end to end tests
● Source code: https://github.com/spring-cloud-samples/brewery
● Around 10 applications involved
● Zipkin deployed to PCF for Brewery Sample app: http://docsbrewing-zipkin-server.cfapps.io






69

Zipkin for Brewery

70

Zipkin for Brewery

72

Links

▪ Code for this presentation (clone and run getReadyForConference.sh - NOTE: you need Vagrant!): https://github.com/marcingrzejszczak/vagrant-elk-box/tree/presentation
▪ Sleuth samples: https://github.com/spring-cloud-samples/sleuth-documentation-apps
▪ Sleuth’s documentation: http://cloud.spring.io/spring-cloud-sleuth/
▪ Repo with Spring Boot Zipkin server: https://github.com/openzipkin/zipkin-java
▪ Zipkin deployed to PCF for Brewery Sample app: http://docsbrewing-zipkin-server.cfapps.io
▪ Pivotal Web Services trial: https://run.pivotal.io/
▪ PCF on your laptop: https://docs.pivotal.io/pcf-dev/

73

Learn More. Stay Connected.

▪ Read the docs
▪ Check the samples
▪ Talk to us on Gitter: https://gitter.im/spring-cloud/spring-cloud-sleuth

Twitter: twitter.com/springcentral
YouTube: spring.io/video
LinkedIn: spring.io/linkedin
Google Plus: spring.io/gplus

74
