Distributed & Highly Available Server Applications in Java and Scala
Max Alexejev, Aleksei Kornev
JavaOne Moscow 2013, 24 April 2013

DESCRIPTION

Presentation Aleksei Kornev and I gave at JavaOne Moscow 2013.

TRANSCRIPT

Page 1: Distributed & Highly Available server applications in Java and Scala

Distributed & highly available server applications in Java and Scala
Max Alexejev, Aleksei Kornev

JavaOne Moscow 2013

24 April 2013

Page 2: Distributed & Highly Available server applications in Java and Scala

What is talkbits?

Maxim Alexejev
Create a "JavaOne Moscow 2013" geo-channel?
Page 3: Distributed & Highly Available server applications in Java and Scala

Architecture

by Max Alexejev

Page 4: Distributed & Highly Available server applications in Java and Scala

Lightweight SOA

Key principles

•S1, S2 - edge services

•Each service consists of 0..1 server and 0..N clients built together

•No special "broker" services

•All services are stateless

•All instances are equal

What about state?

State is kept in specialized distributed systems and fronted by dedicated services.

Example follows...

Page 5: Distributed & Highly Available server applications in Java and Scala

Case study: Talkbits backend

Recursive call

Page 6: Distributed & Highly Available server applications in Java and Scala

Requirements for a distributed RPC system: must-haves and nice-to-haves

•Elastic and reliable discovery - should handle nodes being brought up and shut down transparently, and must not be a SPOF itself

•Support for N-N topology of client and server instances

•Disconnect detection and transparent reconnects

•Fault tolerance - for example, retries against the remaining instances when the called instance goes down

•Built-in client backoff - i.e., clients should not overload servers when load spikes, as far as possible

•Configurable load distribution - i.e., which server instance to call for this specific request

•Configurable networking layer (keepalives & heartbeats, timeouts, connection pools, etc.)

•Distributed tracing facilities

•Portability among different platforms

•Distributed stack traces for exceptions

•Transactions

Page 7: Distributed & Highly Available server applications in Java and Scala

Key principles: stay lightweight and get rid of architectural waste

•Java SE

•No containers - even servlet containers, where used, are lightweight and embedded

•Standalone applications: unified configuration, deployment, metrics, logging, single development framework - more on this later

•All launched instances are equal and process requests - no "special" nodes or "active-standby" patterns

•Minimal dependencies and JAR size

•Minimal memory footprint

•One service - one purpose

•Highly tuned for this one purpose (app, JVM, OS, HW)

•Isolated fault domains - i.e., single datasource or external service is fronted by one service only

No bloatware in technology stack!

"Lean" services

Page 8: Distributed & Highly Available server applications in Java and Scala

Finagle library

(twitter.github.io/finagle) acts as a distributed RPC framework.

Services are written in Java and Scala and use the Thrift communication protocol.

Talkbits implementation choices

Apache Zookeeper (zookeeper.apache.org)

Provides reliable service discovery mechanics. Finagle has a nice built-in integration with Zookeeper.
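As a hedged sketch of what this integration might look like (Finagle 6.x-era APIs; the Zookeeper host, path and shard id below are placeholders, and plain HTTP is used only to keep the example self-contained, since Thrift would require generated interfaces): a server announces itself under a Zookeeper path, and clients resolve that path instead of a fixed host list.

import com.twitter.finagle.{Http, Service}
import com.twitter.finagle.http.{Request, Response}
import com.twitter.util.{Await, Future}

object EchoServer extends App {
  // Trivial HTTP service used only to illustrate discovery.
  val service = new Service[Request, Response] {
    def apply(req: Request): Future[Response] = {
      val rep = Response()
      rep.contentString = "hello from echo"
      Future.value(rep)
    }
  }

  val server = Http.serve(":8080", service)
  // Announce this instance under a Zookeeper path (requires finagle-serversets).
  server.announce("zk!zk-host:2181!/services/echo!0")
  Await.ready(server)
}

object EchoClient extends App {
  // Resolve all live instances registered under the same path.
  val client: Service[Request, Response] =
    Http.newService("zk!zk-host:2181!/services/echo")
  val response = Await.result(client(Request("/")))
  println(response.contentString)
}

With this style of resolution, instances that join or leave the Zookeeper path are picked up by clients without redeploying or reconfiguring them.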

Page 9: Distributed & Highly Available server applications in Java and Scala

Finagle server: networking

Finagle is built on top of Netty, an asynchronous, non-blocking TCP server.

Finagle codec

trait Codec[Req, Rep]

class ThriftClientFramedCodec(...) extends Codec[ThriftClientRequest, Array[Byte]] {
  pipeline.addLast("thriftFrameCodec", new ThriftFrameCodec)
  pipeline.addLast("byteEncoder", new ThriftClientChannelBufferEncoder)
  pipeline.addLast("byteDecoder", new ThriftChannelBufferDecoder)
  ...
}

Finagle comes with ready-made codecs for Thrift, HTTP, Memcache, Kestrel, HTTP streaming.

Page 10: Distributed & Highly Available server applications in Java and Scala

Finagle services and filters

// Service is simply a function from a request to a future of a response.
trait Service[Req, Rep] extends (Req => Future[Rep])

// Filter[A, B, C, D] converts a Service[C, D] to a Service[A, B].
abstract class Filter[-ReqIn, +RepOut, +ReqOut, -RepIn] extends ((ReqIn, Service[ReqOut, RepIn]) => Future[RepOut])

abstract class SimpleFilter[Req, Rep] extends Filter[Req, Rep, Req, Rep]

// Service transformation example
val serviceWithTimeout: Service[Req, Rep] =
  new RetryFilter[Req, Rep](..) andThen
  new TimeoutFilter[Req, Rep](..) andThen
  service

Finagle comes with rate limiting, retries, statistics, tracing, uncaught exceptions handling, timeouts and more.
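As a hedged illustration of the filter mechanism (the LatencyFilter name and its timing logic are invented for this example, not part of the talk), a custom SimpleFilter that measures request latency composes in front of any service just like the built-in ones:

import com.twitter.finagle.{Service, SimpleFilter}
import com.twitter.util.{Future, Stopwatch}

// Hypothetical filter: records how long each request takes.
class LatencyFilter[Req, Rep] extends SimpleFilter[Req, Rep] {
  def apply(request: Req, service: Service[Req, Rep]): Future[Rep] = {
    val elapsed = Stopwatch.start()
    service(request) ensure {
      println(s"request took ${elapsed().inMilliseconds} ms")
    }
  }
}

// Composed exactly like the built-in filters above:
// val timedService = new LatencyFilter[Req, Rep] andThen service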

Page 11: Distributed & Highly Available server applications in Java and Scala

Functional composition

Given Future[A]

Sequential composition

def map[B](f: A => B): Future[B]

def flatMap[B](f: A => Future[B]): Future[B]

def rescue[B >: A](rescueException: PartialFunction[Throwable, Future[B]]): Future[B]

Concurrent composition

def collect[A](fs: Seq[Future[A]]): Future[Seq[A]]

def select[A](fs: Seq[Future[A]]): Future[(Try[A], Seq[Future[A]])]

And more

times(), whileDo() etc.
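A small hedged sketch of how these combinators behave on plain com.twitter.util Futures (the values and the error type are made up for illustration):

import com.twitter.util.{Await, Future}

val answer: Future[Int] = Future.value(21)

// Sequential composition with map and flatMap.
val doubled: Future[Int] = answer map { _ * 2 }
val described: Future[String] = doubled flatMap { n => Future.value(s"result=$n") }

// Recovering from failures with rescue.
val risky: Future[Int] = Future.exception(new IllegalStateException("boom"))
val recovered: Future[Int] = risky rescue {
  case _: IllegalStateException => Future.value(-1)
}

// Concurrent composition with collect.
val all: Future[Seq[Int]] = Future.collect(Seq(doubled, recovered))
println(Await.result(all)) // Seq(42, -1) once both futures are satisfied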

Page 12: Distributed & Highly Available server applications in Java and Scala

Functional composition on RPC calls

Sequential composition

val nearestChannel: Future[Channel] =
  metadataClient.getUserById(uuid) flatMap { user =>
    geolocationClient.getNearestChannelId(user.getLocation())
  } flatMap { channelId =>
    metadataClient.getChannelById(channelId)
  }

Concurrent composition

val userA: Future[User] = metadataClient.getUserById("a")
val userB: Future[User] = metadataClient.getUserById("b")
val userC: Future[User] = metadataClient.getUserById("c")

val users = Future.collect(Seq(userA, userB, userC)).get()

*All this stuff works in Java just like in Scala, but does not look as cool.

Page 13: Distributed & Highly Available server applications in Java and Scala

Finagle server: threading model

To achieve high performance (throughput), you should never block worker threads.

For blocking IO or long computations, delegate to a FuturePool.

val diskIoFuturePool = FuturePool(Executors.newFixedThreadPool(4))

diskIoFuturePool( { scala.io.Source.fromFile(..) } )

Boss thread accepts new client connections and binds NIO Channel to a specific worker thread.

Worker threads perform all client IO.
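A hedged sketch of the delegation pattern described above (the pool size and file path are arbitrary): blocking file IO runs on a dedicated FuturePool, so the Netty worker thread only attaches non-blocking continuations.

import java.util.concurrent.Executors
import com.twitter.util.{Future, FuturePool}
import scala.io.Source

// Dedicated pool for blocking disk IO, sized independently of the Netty workers.
val diskIoFuturePool = FuturePool(Executors.newFixedThreadPool(4))

def readGreeting(path: String): Future[String] =
  diskIoFuturePool {
    // Runs on a pool thread; the worker thread is never blocked here.
    val src = Source.fromFile(path)
    try src.mkString finally src.close()
  }

// The worker thread only composes the resulting Future.
val shouted: Future[String] = readGreeting("/tmp/greeting.txt") map { _.toUpperCase }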

Page 14: Distributed & Highly Available server applications in Java and Scala

More gifts and bonuses from Finagle

In addition to everything said before, Finagle also provides (a hedged wiring sketch follows this list):

•Load distribution in N-N topologies - HeapBalancer ("least active connections") by default

•Client backoff strategies - comes with TruncatedBinaryBackoff implementation

•Failure detection

•Failover/Retry

•Connection Pooling

•Distributed Tracing (Zipkin project based on Google Dapper paper)
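A hedged wiring sketch of several of these features via ClientBuilder (the host list, limits and timeouts are placeholders, and exact builder options varied between Finagle releases):

import com.twitter.conversions.time._
import com.twitter.finagle.builder.ClientBuilder
import com.twitter.finagle.http.Http

// The builder load-balances over all listed hosts (N-N topology),
// pools connections per host and retries failed requests,
// which gives failover to the remaining instances.
val client = ClientBuilder()
  .codec(Http())
  .hosts("10.0.0.1:8080,10.0.0.2:8080") // placeholder host list
  .hostConnectionLimit(10)              // connection pooling
  .requestTimeout(2.seconds)            // per-request timeout
  .retries(3)                           // failover / retry
  .build()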

Page 15: Distributed & Highly Available server applications in Java and Scala

Finagle, Thrift & Java: lessons learned

Pros

•Gives a lot out of the box

•Production-proven and stable

•Active development community

•Lots of extension points in the library

Cons

•Good for Scala, usable with Java

•Works well with Thrift and HTTP (plus trivial protocols), but lacks support for Protobuf and other stuff

•Poor exception-handling experience in Java (no Scala pattern matches) and ugly code

•finagle-thrift is a pain (old libthrift version lock-in, dependency clashes with Cassandra, cannot return nulls, and more). All problems are avoidable, though.

•The cluster scatters and never re-gathers when the whole Zookeeper ensemble is down.

Page 16: Distributed & Highly Available server applications in Java and Scala

Finagle: competitors & alternatives

Trending

•Akka 2.0 (Scala, OpenSource) by Typesafe

•ZeroRPC (Python & Node.js, OpenSource) by DotCloud

•RxJava (Java, OpenSource) by Netflix

Old

•JGroups (Java, OpenSource)

•JBOSS Remoting (Java, OpenSource) by JBOSS

•Spread Toolkit (C/C++, Commercial & OpenSource)

Page 17: Distributed & Highly Available server applications in Java and Scala

Configuration, deployment, monitoring and logging

by Aleksei Kornev

Page 18: Distributed & Highly Available server applications in Java and Scala

Get stuff done...

Page 19: Distributed & Highly Available server applications in Java and Scala

Typical application

Page 20: Distributed & Highly Available server applications in Java and Scala

Architecture of talkbits service

One way to configure services, logs and metrics.

One way to package and deploy services.

One way to launch services.

Bundled in a single one-jar.

Page 21: Distributed & Highly Available server applications in Java and Scala

One delivery unit. Contains:

Java service

In a single executable fat-jar.

Installation script

[Re]installs service on the machine, registers it in /etc/init.d

Init.d script

Contains instructions to start, stop, and restart the JVM and to get a quick status.

Delivery

Page 22: Distributed & Highly Available server applications in Java and Scala

Logging

Configuration

•SLF4J as the API; all other logging libraries are redirected to it

•Logback as a logging implementation

•Each service logs to /var/log/talkbits/... (application logs, GC logs)

•Daily rotation policy applied

•Also sent to loggly.com for aggregation, grouping etc.

Aggregation

•loggly.com

•sshfs for analyzing logs with Linux tools such as grep, tail, less, etc.

Aggregation alternatives

Splunk.com, Flume, Scribe, etc...

Page 23: Distributed & Highly Available server applications in Java and Scala

Metrics

Application metrics and health checks are implemented with the CodaHale Metrics library (metrics.codahale.com), which reports metrics via JMX.

The Jolokia JVM agent (www.jolokia.org/agent/jvm.html) exposes JMX beans via REST (JSON over HTTP), using the JVM's internal HTTP server.

The monitoring agent uses the Jolokia REST interface to fetch metrics and send them to the monitoring system.

All metrics are divided into common metrics (HW, JVM, etc) and service-specific metrics.
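A hedged Scala sketch of the metrics side (the metric names and the health check are invented examples, not the actual Talkbits code): a CodaHale MetricRegistry is populated by the service and exported over JMX, where the Jolokia agent can then serve it as JSON.

import com.codahale.metrics.{Gauge, JmxReporter, MetricRegistry}
import com.codahale.metrics.health.{HealthCheck, HealthCheckRegistry}

val metrics = new MetricRegistry()
val healthChecks = new HealthCheckRegistry()

// Service-specific metrics.
val requestTimer = metrics.timer("talkbits.requests")
metrics.register("talkbits.queue-depth", new Gauge[Int] {
  override def getValue: Int = 0 // replace with a real probe
})

// A simple health check registered alongside the metrics.
healthChecks.register("zookeeper-session", new HealthCheck {
  override def check(): HealthCheck.Result = HealthCheck.Result.healthy()
})

// Report everything via JMX so Jolokia can expose it over HTTP.
JmxReporter.forRegistry(metrics).build().start()

// Usage inside a request handler:
val ctx = requestTimer.time()
try { /* handle the request */ } finally ctx.stop()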

Page 24: Distributed & Highly Available server applications in Java and Scala

Deployment

Fabric (http://fabfile.org) is used for environment provisioning and service deployment.

Process

•The Fabric script provisions a new environment (or uses an existing one) according to the cluster scheme

•Amazon instances are automatically tagged with a list of services (i.e., instance roles)

•The Fabric script reads the instance roles and deploys (or redeploys) the appropriate components.

Page 25: Distributed & Highly Available server applications in Java and Scala

Monitoring

As the monitoring platform we chose Datadoghq.com. Datadog is a SaaS that is easy to integrate into your infrastructure. The Datadog agent is open-sourced and implemented in Python. There are many predefined check sets (plugins, or integrations) for popular products out of the box, including the JVM, Cassandra, Zookeeper and ElasticSearch.

Datadog provides REST API.

Alternatives

•Nagios, Zabbix - these require having a bearded admin on the team. We wanted to go SaaS and outsource infrastructure as far as possible.

•Amazon CloudWatch, LogicMonitor, ManageEngine, etc.

Process

Each service has its own monitoring agent instance on its machine. If a node has the 'monitoring-agent' role in the roles tag of its EC2 instance, a monitoring agent is installed for each service on that node.

Page 26: Distributed & Highly Available server applications in Java and Scala

Talkbits cluster structure

Page 27: Distributed & Highly Available server applications in Java and Scala

Q&A

Max Alexejev
http://ru.linkedin.com/pub/max-alexejev/51/820/ab9
http://www.slideshare.net/MaxAlexejev/
[email protected]

Aleksei Kornev
[email protected]