scylla summit 2017: smf: the fastest rpc in the west


smf: the fastest RPC framework in the west

Principal Engineer, Platform Engineering - Akamai Technologies

Alexander Gallego

Principal Engineer @ Akamai - Platform Group

Ex CTO / founder of Concord.io - a distributed stream processor written in C++ atop Apache Mesos (now part of Akamai)

First employee, engineer for Yieldmo.com (ad-tech) startup in NYC

maintainer of smf: github.com/senior7515/smf

background

Can we do transactional streaming?

▪ At Concord.io, I worked on a streaming platform

o Can we do transactional writes (3x replication - even if in memory)

• Can we do it with low latency and high throughput?

– double digit ms *tail* latency at 1024 batches?

▪ Fastest open source queue did 150ms p90 and 2secs p9999

o Unpredictable JVM spikes

• Spark once stalled for 4 seconds reading from Kafka - couldn’t replicate.

o Concord was in C++ - we wanted predictability

Can we do better?

How about this! - per node overhead

7us p90 latency

8us p99 latency

26us p100 latency

Read Socket -> RPC Parsing -> Method Execution -> Flush Socket

60 byte payload + 20 bytes of TCP - with full type serialization! p90 = 7 microseconds, p99 = 8 microseconds, p100 = 26 microseconds

p99=8us

smf RPC

▪ Built for microsecond tail latencies

▪ Atop seastar::future<>s

▪ IDL Compiler/CodeGen - using Google Flatbuffers’ IDL

▪ Multi-language compatibility

▪ Small - 16 byte overhead (with rich types, headers, compression, etc.)

… it’s like gRPC / Thrift / Cap’n Proto - for microsecond latencies.

namespace smf_gen.demo;

table Request { name: string; }

table Response { name: string; }

rpc_service SmfStorage { Get(Request):Response; }

Service Definition -> output

smf_gen --filename demo_service.fbs

smf::rpc_client

smf::rpc_typed_envelope<Request> req;
req.data->name = "Hello, smf-world!";

auto client = SmfStorageClient::make_shared("127.0.0.1", 2121);
client->Get(req.serialize_data()).then([](auto reply) {
  std::cout << reply->name() << std::endl;
});

data to send: req

actual socket: client (seastar::shared_ptr<T>, non-thread safe)

method to call: Get

smf::rpc_server

class storage_service : public SmfStorage {
  virtual seastar::future<rpc_typed_envelope<Response>>
  Get(rpc_recv_typed_context<Request> rec) final {
    rpc_typed_envelope<Response> data;
    data.data->name = "Hello, cruel world!";
    data.envelope.set_status(200);
    return make_ready_future<decltype(data)>(std::move(data));
  }
};

code-gen’ed service

Method

return data

smf::rpc_filter

template <typename T>
struct rpc_filter {
  seastar::future<T> operator()(T t);
};

struct zstd_compression_filter : rpc_filter<rpc_envelope> {
  explicit zstd_compression_filter(uint32_t min_size)
    : min_compression_size(min_size) {}

  seastar::future<rpc_envelope> operator()(rpc_envelope &&e);

  const uint32_t min_compression_size;
};

// add it to your clients
client->outgoing_filters().push_back(
  smf::zstd_compression_filter(1000));

// add it to your servers
using zstd_t = smf::zstd_compression_filter;
return rpc.invoke_on_all(
  &smf::rpc_server::register_outgoing_filter<zstd_t>, 1000);
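The filter idea can be illustrated without seastar. What follows is a minimal synchronous sketch, not smf's implementation: `toy_envelope`, `toy_filter`, `toy_compression_filter` and `apply_filters` are hypothetical names, the real filters return `seastar::future<T>`, and the "compression" here only sets a flag. The `>=` comparison against the minimum size is also an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for smf::rpc_envelope: a payload plus a compression tag.
struct toy_envelope {
  std::string payload;
  bool compressed = false;
};

// A filter takes an envelope and returns a (possibly modified) envelope.
using toy_filter = std::function<toy_envelope(toy_envelope)>;

// Same shape as zstd_compression_filter: it only acts on payloads of at
// least min_compression_size bytes. No real compression is performed.
struct toy_compression_filter {
  explicit toy_compression_filter(uint32_t min_size)
    : min_compression_size(min_size) {}

  toy_envelope operator()(toy_envelope e) const {
    if (e.payload.size() >= min_compression_size) {
      e.compressed = true;  // assumption: threshold is inclusive
    }
    return e;
  }

  const uint32_t min_compression_size;
};

// Apply every registered filter in registration order.
inline toy_envelope apply_filters(const std::vector<toy_filter> &filters,
                                  toy_envelope e) {
  for (const auto &f : filters) {
    e = f(std::move(e));
  }
  return e;
}
```

With a threshold of 1000, a 10 byte payload passes through untouched while a 2000 byte payload comes back flagged, which is the point of a minimum size: tiny messages skip the compression cost.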

static thread_local auto incoming_stage =
  seastar::make_execution_stage("smf::incoming",
                                &rpc_client::apply_incoming_filters);

static thread_local auto outgoing_stage =
  seastar::make_execution_stage("smf::outgoing",
                                &rpc_client::apply_outgoing_filters);

request anatomy

smf request anatomy

/// total = 128bits == 16bytes
MANUALLY_ALIGNED_STRUCT(4) header FLATBUFFERS_FINAL_CLASS {
  int8_t compression_;
  int8_t bitflags_;
  uint16_t session_;
  uint32_t size_;
  uint32_t checksum_;
  uint32_t meta_;
};
STRUCT_END(header, 16);

MANUALLY_ALIGNED_STRUCT(4):

- Turn off padding by compiler
- Enforce layout
- Store everything in little endian
- X-lang, X-platform compat
- noop on most platforms

Field by field:

int8_t compression_;   // zstd, lz4
int8_t bitflags_;      // headers?
uint16_t session_;     // max # of concurrent requests per client
uint32_t size_;
uint32_t checksum_;    // xxhash32 - very fast! 5.4GB/s
uint32_t meta_;        // request_id or status (response) code
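The 16 byte claim and the little-endian rule can be checked in plain C++. This is a sketch under assumptions, not the FlatBuffers-generated code: `wire_header`, `put_le` and `encode_le` are hypothetical names, and the `static_assert`s rely on the natural alignment that mainstream ABIs give these field types.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Plain-C++ mirror of the header fields (hypothetical name).
struct wire_header {
  int8_t compression;
  int8_t bitflags;
  uint16_t session;
  uint32_t size;
  uint32_t checksum;
  uint32_t meta;
};

// 1 + 1 + 2 + 4 + 4 + 4 = 16 bytes, with no padding between fields.
static_assert(sizeof(wire_header) == 16, "header must be 128 bits");
static_assert(offsetof(wire_header, session) == 2, "session at byte 2");
static_assert(offsetof(wire_header, size) == 4, "size at byte 4");
static_assert(offsetof(wire_header, checksum) == 8, "checksum at byte 8");
static_assert(offsetof(wire_header, meta) == 12, "meta at byte 12");

// Write v into out[0..sizeof(T)), least-significant byte first.
template <typename T>
inline void put_le(uint8_t *out, T v) {
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    out[i] = static_cast<uint8_t>(v >> (8 * i));
  }
}

// Serialize the header in little-endian byte order regardless of host order.
inline std::array<uint8_t, 16> encode_le(const wire_header &h) {
  std::array<uint8_t, 16> out{};
  out[0] = static_cast<uint8_t>(h.compression);
  out[1] = static_cast<uint8_t>(h.bitflags);
  put_le<uint16_t>(&out[2], h.session);
  put_le<uint32_t>(&out[4], h.size);
  put_le<uint32_t>(&out[8], h.checksum);
  put_le<uint32_t>(&out[12], h.meta);
  return out;
}
```

On a little-endian host the bytes produced by `encode_le` match the in-memory layout of the struct, which is why the conversion is a noop on most platforms.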

smf Code Gen’d XOR id

auto fqn = fully_qualified_name;

service_id = hash( fqn(service_name) )

method_id = hash( ∀ fqn(x) input_args_types,
                  ∀ fqn(x) output_args_types,
                  fqn(method_name),
                  separator = ":")

rpc_dispatch_id = service_id ^ method_id;

/// RequestID: 212494116 ^ 1719559449

/// ServiceID: 212494116

/// MethodID: 1719559449
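The XOR scheme is small enough to sketch directly. The code below is an illustration under assumptions: the slides do not say which hash the code generator uses, so `toy_hash` is FNV-1a chosen only as a placeholder, and the exact string that `method_id` hashes (argument order and `:` placement) is a guess based on the pseudocode. All function names are hypothetical.

```cpp
#include <cstdint>
#include <string>

// FNV-1a, used here only as a placeholder hash; the hash smf's code
// generator actually uses is not stated in the slides.
inline uint32_t toy_hash(const std::string &s) {
  uint32_t h = 2166136261u;
  for (unsigned char c : s) {
    h ^= c;
    h *= 16777619u;
  }
  return h;
}

// Hash of the fully qualified service name.
inline uint32_t service_id(const std::string &fqn_service) {
  return toy_hash(fqn_service);
}

// Hash of input types, output types and method name joined by ":".
// The exact concatenation format is an assumption.
inline uint32_t method_id(const std::string &fqn_inputs,
                          const std::string &fqn_outputs,
                          const std::string &fqn_method) {
  return toy_hash(fqn_inputs + ":" + fqn_outputs + ":" + fqn_method);
}

// The id that travels on the wire.
inline uint32_t dispatch_id(uint32_t service, uint32_t method) {
  return service ^ method;
}

// XOR is its own inverse: knowing one half recovers the other.
inline uint32_t method_from_dispatch(uint32_t dispatch, uint32_t service) {
  return dispatch ^ service;
}
```

For the ids on the slide, `method_from_dispatch(dispatch_id(212494116, 1719559449), 212494116)` gives back `1719559449`. That reversibility is a property of XOR; whether the smf router relies on it is not something the slides state.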

future<smf::rpc_recv_typed_context<Response>>
Get(smf::rpc_envelope e) {
  e.set_request_id(212494116, 1719559449);
  return send<smf_gen::demo::Response>(std::move(e));
}

Method ID

handles.emplace_back(
  "Get", 1719559449,
  [this](smf::rpc_recv_context c) {
    using req_t = smf::rpc_recv_typed_context<Request>;
    auto session_id = c.session();
    return Get(req_t(std::move(c))).then(
      [session_id](auto typed_env) {
        typed_env....mutate_session(session_id);
        return make_ready_future<rpc_envelope>(
          typed_env.serialize_data());
      });
  });

Method ID

struct rpc_service {
  virtual const char *service_name() const = 0;
  virtual uint32_t service_id() const = 0;
  virtual std::vector<rpc_service_method_handle> methods() = 0;
  virtual ~rpc_service() {}
};

telemetry

smf built in telemetry

High Dynamic Range Histogram (HDR) … Expensive 185 KB

::hdr_init(1, // 1 microsec - minimum value

INT64_C(3600000000), // 1 hour in microsecs - max value

3, // Number of significant figures

&hist); // Pointer to initialize

// clients

client = ClientService::make_shared(std::move(opts));

client->enable_histogram_metrics();

// servers enabled by default
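To make the p90 / p99 / p100 figures in this talk concrete, here is a minimal sketch of a nearest-rank percentile over raw latency samples. It is a simple exact stand-in, not the HDR histogram: HDR buckets values into fixed memory, while this version sorts every sample. The function name `percentile` is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-rank percentile over raw samples (for example, microseconds).
// p is in (0, 100]; samples must be non-empty.
inline int64_t percentile(std::vector<int64_t> samples, double p) {
  std::sort(samples.begin(), samples.end());
  const std::size_t n = samples.size();
  // Rank of the smallest sample that is >= p percent of all samples.
  auto rank = static_cast<std::size_t>(
    std::ceil(p / 100.0 * static_cast<double>(n)));
  if (rank < 1) rank = 1;
  if (rank > n) rank = n;
  return samples[rank - 1];
}
```

With this definition p100 is simply the slowest request observed, which is why a single cold-cache request can dominate the p100 figure while leaving p99 almost unchanged.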

smf built in telemetry (prometheus)

performance?

smf DPDK client - DPDK server*

7us p90 latency

8us p99 latency

26us p100 latency

smf end-to-end latency (DPDK)

p90=51us, p99=56us, p100=500us

2 Threads. Includes connection open time - cold cache

smf end-to-end latency (DPDK)

Same graph, minus the **first** request of each of the 2 threads

p50=51us, p99=56us, p100=151us

smf future work

● Currently could only do 1.5MM qps on the server setup

○ size = 60 byte payload + 20 TCP frame bytes

○ Hit TCP.hh bug in seastar with httpd/seawreck and my own impl

■ `(_snd.window > 0) || ((_snd.window == 0) && (len == 1))' failed.

■ Could be my lab setup

■ Because of this - couldn't fill the wire fast enough

● Add JVM, Python, Go codegen

● Improve Docs: https://senior7515.github.io/smf/

THANK YOU

gallego.alexx@gmail.com | alexgallego.org

@emaxerrno

Please stay in touch

Any questions? https://senior7515.github.io/smf/

smf Write Ahead Log (latency)

percentile   Apache Kafka   smf WAL   speedup
p50          878ms          21ms      41x
p95          1340ms         36ms      37x
p99          1814ms         49ms      37x
p999         1896ms         54ms      35x
p100         1930ms         54ms      35x
