Assorted Topics on Scalable Distributed Services - Sharma Podila Senior Software Engineer

Upload: sharma-podila

Post on 06-May-2015


Page 1: Prezo tooracleteam (2)

Assorted Topics on Scalable Distributed Services

- Sharma Podila, Senior Software Engineer

Page 9: Prezo tooracleteam (2)

PS: Lots of things can, and will, fail at that scale

Page 10: Prezo tooracleteam (2)

Topics

Managing Service Dependencies
Asynchronous Event Processing (RxJava)
Async IO
Mesos - DataCenter OS
Deployment and Build Automation

Page 11: Prezo tooracleteam (2)

Managing Service Dependencies

Distributed architectures can have dozens of dependencies

Each can fail independently
Even 0.01% downtime on each of dozens of services equates to potentially hours a month of downtime if not engineered for resilience
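The arithmetic behind that claim can be sketched in a few lines. The inputs are assumptions chosen for illustration (30 dependencies, each 99.99% available, failing independently, in a 720-hour month); the class and method names are hypothetical.

```java
// Hedged sketch: compound availability of N independent dependencies.
final class CompoundAvailability {
    /** Probability that all dependencies are up at once, assuming independent failures. */
    static double combinedUptime(double perServiceUptime, int services) {
        return Math.pow(perServiceUptime, services);
    }

    /** Expected hours of downtime over a period of hoursInMonth hours. */
    static double downtimeHours(double perServiceUptime, int services, double hoursInMonth) {
        return (1.0 - combinedUptime(perServiceUptime, services)) * hoursInMonth;
    }

    public static void main(String[] args) {
        // Assumed inputs: 30 dependencies, 99.99% uptime each, 30-day month (720 h).
        double uptimePercent = combinedUptime(0.9999, 30) * 100.0;
        double hours = downtimeHours(0.9999, 30, 720.0);
        System.out.printf("combined uptime = %.3f%%, downtime = %.2f h/month%n",
                uptimePercent, hours);
    }
}
```

Under these assumptions the combined uptime drops to roughly 99.7%, which works out to a little over two hours of downtime a month.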

[Figure: a request passing through Service A, Service B, and Service C, each with its own dependencies]

Page 12: Prezo tooracleteam (2)

A Healthy Request Flow

Page 13: Prezo tooracleteam (2)

When One Backend Service Becomes Latent

Page 14: Prezo tooracleteam (2)

Threads and other resources can be exhausted under high volume

Page 15: Prezo tooracleteam (2)

Hystrix to the Rescue

Wrap calls to external systems in a dependency command object, run in a separate thread
Time out calls that take longer than roughly the 99.5th percentile of all latencies
Control #threads w/ a pool or semaphore
Measure success, trip circuit if needed
Perform fallback logic
Monitor metrics and change in real time

Hystrix Wiki
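A minimal sketch of the first steps of that list (separate thread, bounded pool, timeout, fallback), using only the JDK. This is not the Hystrix API, and it leaves out the circuit breaker and metrics; `GuardedCall` and its methods are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Plain-JDK illustration of the pattern; NOT the Hystrix API.
final class GuardedCall {
    // Bounded pool: caps how many threads one slow dependency can tie up.
    private final ExecutorService pool;
    private final long timeoutMillis;

    GuardedCall(int maxThreads, long timeoutMillis) {
        this.pool = Executors.newFixedThreadPool(maxThreads);
        this.timeoutMillis = timeoutMillis;
    }

    /** Runs the call on the pool; returns the fallback on timeout or failure. */
    <T> T execute(Callable<T> call, T fallback) {
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return fallback;
        } catch (Exception e) {
            // TimeoutException (too slow) or ExecutionException (the call threw).
            future.cancel(true); // interrupt the worker so its thread is freed
            return fallback;
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

Usage would look like `new GuardedCall(10, 250).execute(() -> fetchFromBackend(), cachedValue)`, where `fetchFromBackend` and `cachedValue` stand in for a real dependency call and its fallback.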

Page 16: Prezo tooracleteam (2)

Taming Tail Latencies of Service Calls

Real-time metrics show problems as they occur
Trends help configure timeouts
Set timeouts based on histogram data

99.5th percentile + buffer is a good start
Tier timeouts for retries on other servers (e.g., 5, 15, 30)
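One way to turn latency samples into a timeout is sketched below. It assumes raw samples are available as an array and uses the nearest-rank percentile definition; a production system would more likely read the value from its metrics histogram. All names are hypothetical.

```java
import java.util.Arrays;

// Hedged sketch: derive a timeout from observed latencies.
final class TimeoutFromLatencies {
    /** Nearest-rank percentile of the samples, for p in (0, 100]. */
    static long percentile(long[] latenciesMillis, double p) {
        long[] sorted = latenciesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Timeout = chosen percentile plus a fixed buffer. */
    static long timeoutMillis(long[] latenciesMillis, double p, long bufferMillis) {
        return percentile(latenciesMillis, p) + bufferMillis;
    }
}
```

Calling `timeoutMillis(samples, 99.5, buffer)` gives the "99.5th + buffer" starting point; the size of the buffer is a judgment call the slide leaves open.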

Page 17: Prezo tooracleteam (2)

Reactive Data/Event Stream Processing

Page 18: Prezo tooracleteam (2)

Processing a Data/Event Stream

Iterator<T> iterator = dataStream.iterator();

while(iterator.hasNext()) { process(iterator.next()); }

What if dataStream represents an unbounded stream?
What if data comes over the network? With latencies, failures.
What if data comes from multiple sources?
How would you manage concurrency? Threads? Semaphores?

The RxJava implementation of Reactive Extensions addresses these questions

“...provides a collection of operators with which you can filter, select, transform, combine, and compose Observables. This allows for efficient execution and composition…”

Page 19: Prezo tooracleteam (2)

RxJava
Java impl for Reactive Extensions

A library for composing asynchronous and event-based programs by using observable sequences

Extends the observer pattern to support sequences of data/events and adds operators that allow you to compose sequences together declaratively while abstracting away concerns about things like low-level threading, synchronization, thread-safety, concurrent data structures, and non-blocking I/O

Event            Iterable (pull)     Observable (push)
retrieve data    T next()            onNext(T)
discover error   throws Exception    onError(Exception)
complete         returns             onComplete()
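The pull/push duality in the table can be shown without RxJava. In this sketch `PushObserver` is a hypothetical interface that mirrors the three callbacks; it is not RxJava's own `Observer` type.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hedged sketch of the two consumption styles; no RxJava involved.
final class PushVsPull {
    /** Hypothetical observer with the three callbacks from the table. */
    interface PushObserver<T> {
        void onNext(T item);
        void onError(Exception error);
        void onComplete();
    }

    /** Pull: the consumer asks for each item and may block inside next(). */
    static <T> List<T> pullAll(Iterable<T> source) {
        List<T> out = new ArrayList<>();
        Iterator<T> it = source.iterator();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    /** Push: the producer drives and calls the consumer back. */
    static <T> void pushAll(Iterable<T> source, PushObserver<T> observer) {
        try {
            for (T item : source) {
                observer.onNext(item);
            }
        } catch (Exception e) {
            observer.onError(e); // an error is delivered as a signal, not thrown at the consumer
            return;
        }
        observer.onComplete();
    }
}
```

In the push version the consumer never calls into the source, which is what lets a producer deliver items whenever they arrive.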

Page 20: Prezo tooracleteam (2)

RxJava Operator Examples

Page 21: Prezo tooracleteam (2)

Example Code: Iterable and Observable

// Iterable: pull data from local memory
getDataFromLocalMemory()
    .skip(10)
    .take(5)
    .map({ s -> return s + " transformed" })
    .forEach({ println "next => " + it })

// Observable: data is pushed from the network
getDataFromNetwork()
    .skip(10)
    .take(5)
    .map({ s -> return s + " transformed" })
    .subscribe({ println "onNext => " + it })

Data can be pushed from multiple sources
No need to block for result availability

RxJava is a tool to react to push data
Java Futures as an alternative are non-trivial with nested async execution
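For comparison, the pull-side chain has a close analogue in the JDK's `java.util.stream` API. This sketch covers only the in-memory, pull-based case and says nothing about the RxJava side; `PullPipeline` and `sampleData` are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hedged sketch: the skip/take/map chain on an in-memory list (pull side only).
final class PullPipeline {
    /** Same shape as the slide's chain: skip 10, take 5, transform each item. */
    static List<String> transform(List<String> data) {
        return data.stream()
                .skip(10)
                .limit(5) // Stream's name for take(5)
                .map(s -> s + " transformed")
                .collect(Collectors.toList());
    }

    /** Builds n placeholder items: item0, item1, ... */
    static List<String> sampleData(int n) {
        return IntStream.range(0, n)
                .mapToObj(i -> "item" + i)
                .collect(Collectors.toList());
    }
}
```

The calling thread does all the work here and waits for the result, which is the blocking behavior the push model avoids.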

Page 22: Prezo tooracleteam (2)

Async IO

Page 23: Prezo tooracleteam (2)

Async IO with Netty, RxNetty

Netty is an NIO client-server framework (see Java IO vs. NIO)
Supports non-blocking IO
High throughput, low latency, less resource consumption
RxNetty is a Reactive Extensions adaptor for Netty

When using something like Netty, Total #threads in app = Total #cores in the system
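The threads-equals-cores rule can be illustrated with a JDK thread pool. This is only a sketch of the sizing idea; it does not show how Netty configures its own event loops, and the class name is hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;

// Hedged sketch: size a worker pool to the number of available cores.
final class CoreSizedPool {
    /** One worker per core reported by the JVM. */
    static int workerCount() {
        return Runtime.getRuntime().availableProcessors();
    }

    static ExecutorService create() {
        return Executors.newFixedThreadPool(workerCount());
    }

    /** Runs one task on a core-sized pool and shuts the pool down afterwards. */
    static <T> T runOnce(Callable<T> task) {
        ExecutorService pool = create();
        try {
            return pool.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The sizing only pays off when the workers never block on IO; a blocked worker leaves a core idle.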

Page 24: Prezo tooracleteam (2)

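RxNetty Server Example

// Imports assume RxNetty 0.4.x and RxJava 1.x package names.
import io.netty.buffer.ByteBuf;
import io.reactivex.netty.RxNetty;
import io.reactivex.netty.protocol.http.server.HttpServerRequest;
import io.reactivex.netty.protocol.http.server.HttpServerResponse;
import io.reactivex.netty.protocol.http.server.RequestHandler;
import java.nio.charset.Charset;
import java.util.Map;
import rx.Notification;
import rx.Observable;
import rx.functions.Func1;

public static void main(final String[] args) {
    final int port = 8080;
    RxNetty.createHttpServer(port, new RequestHandler<ByteBuf, ByteBuf>() {
        @Override
        public Observable<Void> handle(HttpServerRequest<ByteBuf> request, final HttpServerResponse<ByteBuf> response) {
            // Log the request line and all headers.
            System.out.println("New request received");
            System.out.println(request.getHttpMethod() + " " + request.getUri() + ' ' + request.getHttpVersion());
            for (Map.Entry<String, String> header : request.getHeaders().entries()) {
                System.out.println(header.getKey() + ": " + header.getValue());
            }

<continued…>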

Page 25: Prezo tooracleteam (2)

RxNetty Server Example (Cntd.)

            return request.getContent().materialize()
                .flatMap(new Func1<Notification<ByteBuf>, Observable<Void>>() {
                    @Override
                    public Observable<Void> call(Notification<ByteBuf> notification) {
                        if (notification.isOnCompleted()) {
                            // Request body fully read: send the response.
                            return response.writeStringAndFlush("Welcome!!!");
                        } else if (notification.isOnError()) {
                            return Observable.error(notification.getThrowable());
                        } else {
                            // A chunk of the request body: print it and keep reading.
                            ByteBuf next = notification.getValue();
                            System.out.println(next.toString(Charset.defaultCharset()));
                            return Observable.empty();
                        }
                    }
                });
        }
    }).startAndWait();
}

Page 26: Prezo tooracleteam (2)

Mesos

Page 27: Prezo tooracleteam (2)

Mesos Cluster Manager

Resource allocation across distributed applications (aka Frameworks) on a shared pool of nodes
Akin to Google Borg
Pluggable isolation for CPU, I/O, etc. via Linux cgroups, Docker, etc.
Fault-tolerant leader election via ZooKeeper
Used at Twitter, Airbnb, etc.

Page 28: Prezo tooracleteam (2)

Mesos Architecture

Page 29: Prezo tooracleteam (2)

Mesos Resource Offers

Page 30: Prezo tooracleteam (2)

Mesos Framework Development

Implement Framework Scheduler and Executor

Scheduler:
    resourceOffers(SchedulerDriver driver, java.util.List<Offer> offers)
    executorLost(SchedulerDriver driver, ExecutorID executorId, SlaveID slaveId, int status)
    statusUpdate(SchedulerDriver driver, TaskStatus status)
    … and more

Executor:
    launchTask(ExecutorDriver driver, TaskInfo task)
    killTask(ExecutorDriver driver, TaskID taskId)
    … and more

Page 31: Prezo tooracleteam (2)

Mesos Framework Fault Tolerance

Mesos task reconciliation
Periodic heartbeats from executors
State engine

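// Sketch in RxJava 1.x lambda style. getJobId() and isTerminal() are hypothetical
// accessors on the state type; 2000 is assumed to be milliseconds.
taskStatesStream
    .groupBy(state -> state.getJobId())
    .flatMap(jobStates -> jobStates
        .takeWhile(state -> !state.isTerminal())
        // debounce emits a state only after 2000 ms pass with no newer state in the group
        .debounce(2000, TimeUnit.MILLISECONDS)
        .doOnNext(state -> schedule(taskStuckInState(state), stateTimeout)))

RxJava Style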
