siphon - near real time databus using kafka, eric boyd, nitin kumar
Post on 16-Apr-2017
1.871 Views
Preview:
TRANSCRIPT
Thursday, April 14, 2016
Siphon – Near Real Time Databus Using KafkaEric Boyd – CVP Engineering – Microsoft
Nitin Kumar – Principal Eng Manager - Microsoft
Linux is a
cancer
Thursday, April 14, 2016
Ads Oslo Schedule
Ads Oslo Feature List
Bing Ads Execution
• Shipped once every 6 months
• Averaged 3 marketplace experiments per month
• Big bets on marketplace features that didn’t work.
• Focused teams on 6 tracks with
independent metrics.
• Pushed teams to ship as quickly as they
could, focusing only on moving their
metric.
• Built/borrowed infrastructure to enable
much more rapid experimentation.
• Over 3 years got to a rate of >1000
experiments a month
Profitability!!
Eric joinsMSFT
What drove the turnaround?
• Focus on small teams with clear metrics each team was driving.
• Pushing each team to experiment and iterate as fast as possible. Data alone determines what gets shipped.
• Iterated on key metrics until we found the ones with the most impact.
• Commitment that we would get 1.5-2% better each month, and ship a package of experimentally tested improvements each month.
Relationship with Open Source
• From “Linux is a cancer…”
• To contributing to open source • Storm with C# - SCP.NET (http://www.nuget.org/packages/Microsoft.SCP.Net.SDK/)
• Spark with C# - Mobius (https://github.com/Microsoft/Mobius)
• Kafka with C# - C# Client for Kafka (https://github.com/Microsoft/Kafkanet)
• BOND (https://github.com/Microsoft/bond)
• Across MSFT• C#• VSCode• Hyper-V drivers for Linux• https://github.com/Microsoft/ with 18 pages of repositories!
Microsoft Big Data History
• Massive batch oriented systems• Hundreds of thousands of machines• Exabytes of storage• SQL-like language with C# extensions
Moving to streaming
Data Bus
Devices Services
Streaming Processing
BatchProcessing
Applications
Scalable pub/sub for NRT data streams
Interactive analytics
Vision
• A Databus for all Near Real Time (NRT) data in an organization.
• Quick and Easy Publication, Discovery and Subscription of NRT dataset.
• Compatibility with various Stream Processing systems like Storm, Spark, Splunk.
Siphon Adoption
15 months since launch
Excel Word Outlook
Windows 10
Usage
Bing Ads Campaign perf
Bing Live site telemetryCortana
Office 365
0
10
20
30
40
50
60
70
80
Thro
ugh
pu
t (i
n G
Bp
s)
Siphon Data Volume (Ingress and Egress)
Volume published (GBps) Volume subscribed (GBps) Total Volume (GBps)
0
2
4
6
8
10
12
14
16
18
Thro
ugh
pu
t (e
ven
ts p
er s
ec)
Mill
ion
s
Siphon Events per second (Ingress and Egress)
EPS In Eps Out Total EPS
1.3 millionEVENTS PER SECOND INGRESS AT PEAK
~1 trillionEVENTS PER DAY PROCESSED AT PEAK
3.5 petabytesPROCESSED PER DAY
100 thousandUNIQUE DEVICES AND MACHINES
1,300PRODUCTION KAFKA BROKERS
Scale: Kafka at Microsoft (Ads, Bing, Office)
Kafka Brokers 1300+ across 5 Datacenters
Operating System Windows Server 2012 R2
Hardware Spec 12 Cores, 32 GB RAM, 4x2 TB HDD (JBOD), 10 GB Network
Incoming Events 1.3 million per sec, (112 Billion per day, 500 TB per day)
Outgoing Events 5 million per sec, (~1 Trillion per day, 3.5 PB per day)
Kafka Topics/Partitions 50+/5000+
Kafka version 0.8.1.1 (3 way replication)
Siphon Architecture
Asia DC
Zookeeper Canary
Kafka
Collector
Agent
Services Data Pull (Agent)
Services Data Push
Device Proxy Services
Consumer API (Push/
Pull)
Europe DC
Zookeeper Canary
Kafka
US DC
Zookeeper Canary
Kafka
Streaming
Batch
Audit Trail
Open Source
Microsoft Internal
Siphon
Multiple sources and schemas
Siphon Bond
Schema
Part
A Main Header
MessageId
AuditId
TimeStamp
Part
B Extended HeaderKey-Value[]
Part
C Payload
CSV
XML
JSONJSON
XML
CSVSiphon Bond
Schema
Bond (https://github.com/Microsoft/bond) Cross platform framework for working with schematized data. Cross language (de) serialization. Similar to Protobuf, Thrift and AVRO.
Collector – Data Ingestion (Producer)
• Http(s) Server • Restful API with SSL support.• Abstraction from Kafka
internals (Partition, Kafka version)• Throttling, QPS Monitoring• PII scrubbing• Load balancing/failover to multiple DCs• Supported for both Windows and Linux
servers.
Device Proxy Services
Collector
Kafka Brokers
Broker
Broker
Broker
Broker
P0
P1
P2
P3
P4
P5
P6
P7
P8
P9
P10
P11
Collector
Collector
Load
Bal
ance
r
Services Data Push
Agent
Services Data Pull (Agent)
Open Source
Microsoft Internal
Siphon
URL : http://localhost/produce/<version>?topic=<toipic>Method : POST
Pull & Push Consumers
Virtual Network A
HLC
Pull
Kafka Brokers
Broker
Broker
Broker
Broker
P0
P2
P3
P4
P5
P6
P7
P8
P9
P10
P11
P1Collector
Collector
RES
T A
PI
Virtual Network B
Pull• RESTful API with SSL support• Works for out of network consumers• Supports metadata and data operation• Implement Simple consumer APIs• Spark streaming receiver for Kafka REST
Push• Configurable push to destinations like HDFS,
Cosmos, Kafka.• Utilizes KafkaNet - .NET High Level Consumer
(https://github.com/Microsoft/Kafkanet)
High Level Consumer
Monitoring using Canary
Device Proxy Services
Collector
Kafka Brokers
Broker
Broker
Broker
Broker
P0
P1
P2
P3
P4
P5
P6
P7
P8
P9
P10
P11
Collector
Collector
Load
Bal
ance
rServices Data Push
Agent
Services Data Pull (Agent)
Synthetic message
Audit Trail
Canary - https://github.com/Microsoft/Availability-Monitor-for-Kafka
High Level Consumer
Device Proxy Services
Collector
Kafka Brokers
Broker
Broker
Broker
Broker
P0
P1
P2
P3
P4
P5
P6
P7
P8
P9
P10
P11
Collector
Collector
Load
Bal
ance
rServices Data Push
Agent
Services Data Pull (Agent)
Audit Trail
Sampled vs Full Auditing support
Data completeness – Audit Trail
Production Experience – Telemetry Charts
• Monitoring using ELK• E2E Latency
• Data Completeness
• Processing Lag
• EPS breakdown by data center.
Key Takeaways
• Scale out with Kafka (50K -> 1M -> multi-million Events Per sec)
• Ability to build tunable Auditing/Monitoring
• Producer/Consumer Restful API provides a nice abstraction
• Config driven Pub/Sub system
top related