![Page 1: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/1.jpg)
Google SDN Peering: An Early Engagement Case Study
Murali Suriar, msuriar@google.com
On behalf of Google Technical Infrastructure and Network Infrastructure SRE
August 30, 2017
![Page 2: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/2.jpg)
Who am I?
● Murali Suriar
● Seven years at Google*
  ○ Network Engineer, Dublin
  ○ SRE, London
    ■ Initially working on proxies/load balancing
    ■ Currently running SDN control systems
● @msuriar on GitHub, Twitter, IRC
* = minus a brief stint on a boat
![Page 3: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/3.jpg)
Today's talk
● What is SDN?
● A brief history of SDN at Google
● An overview of Espresso (SDN internet peering)
● SRE early engagement with the Espresso dev team
![Page 4: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/4.jpg)
What is SDN?
![Page 5: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/5.jpg)
Traditional networking
● Common protocols and standards (mostly).
● Proprietary/vertically integrated implementations.
![Page 6: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/6.jpg)
An aside - why hardware?
● IP networking is all about packets per second (pps).
● Weird standards.
![Page 7: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/7.jpg)
Planes of a switch/router
![Page 9: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/9.jpg)
Planes of a switch/router ("swouter")
● Control plane scales with protocol/network complexity.
● Network vendors use long-term supported hardware.
● Long depreciation cycles lead to underpowered control plane.
[Diagram labels: Control, Management, Forwarding/data.]
![Page 10: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/10.jpg)
The dream of SDN
● Create standard for programming the forwarding plane.
● Separate control plane from network devices.
[Diagram labels: Management, Control, and four Forwarding elements.]
![Page 11: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/11.jpg)
Complexities of SDN
● Need a new network to connect control and data plane together.
● Network engineers need to learn about running binaries and managing machines.
● Or sysadmins/SREs need to learn about networking.
![Page 12: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/12.jpg)
New failure modes of SDN
● Less shared fate between control plane and data plane.
● Single controller outage has (potentially) large impact on data plane.
● Increased latency in reacting to some classes of failures.
![Page 13: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/13.jpg)
A brief history of SDN at Google
![Page 14: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/14.jpg)
The Pillars of SDN @ Google
● B4: WAN Interconnect
● Andromeda: NFV and network virtualization
● Jupiter: Datacenter Networking
![Page 15: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/15.jpg)
B4: Google's Software Defined WAN
B4: [Jain et al, SIGCOMM 13] BwE: [Jain et al, SIGCOMM 15]
![Page 16: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/16.jpg)
B4: From Copy Network to Business Critical
[Chart: B4 traffic, 2012 to 2016.]
B4: [Jain et al, SIGCOMM 13] BwE: [Jain et al, SIGCOMM 15]
![Page 17: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/17.jpg)
Andromeda
[Diagram labels: NFV; Load Balancing; DoS; ACLs; VPN; Internal Network; Google Infrastructure Services; VNETs 5.4/16, 192.168.32/24, and 10.1.1/24; ToR switches for 10.1.1/24, 10.1.2/24, 10.1.3/24, and 10.1.4/24.]
![Page 18: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/18.jpg)
Google Datacenter Network Innovation
And hardware scale that we could not buy
[Chart: capacity over time across fabric generations: 4 Post, Firehose 1.0, Firehose 1.1, Watchtower, Saturn, Jupiter.]
1.3 Pb/s clusters in 2013
![Page 19: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/19.jpg)
The Pillars of SDN @ Google
● B4: WAN Interconnect
● Andromeda: NFV and network virtualization
● Jupiter: Datacenter Networking
● Public Internet?
![Page 20: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/20.jpg)
The Pillars of SDN @ Google
● B4: WAN Interconnect
● Andromeda: NFV and network virtualization
● Jupiter: Datacenter Networking
● Espresso: SDN for public Internet
![Page 21: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/21.jpg)
Enter Espresso
![Page 22: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/22.jpg)
Espresso in Context
[Diagram labels: Google, Jupiter Data Center, B4.]
![Page 23: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/23.jpg)
Espresso in Context
[Diagram labels: Google, Jupiter Data Center, B4, B2, Peering Metro.]
![Page 24: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/24.jpg)
Espresso in Context
[Diagram labels: Google, Jupiter Data Center, B4, B2, Espresso, Peering Metro, Internet, User.]
![Page 25: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/25.jpg)
Espresso: Before and After
● Before (Cloud 1.0: router-centric protocols)
  ○ Local view
  ○ Connectivity first
  ○ Coarse fault recovery
● After (Espresso: SDN peering)
  ○ Per-metro and global view
  ○ Application signals
  ○ Real-time optimization
![Page 26: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/26.jpg)
Espresso Architecture Overview
[Diagram labels: Espresso Metro, Peering Fabric, Label-switched Fabric, BGP speaker, eBGP Peering, External Peer.]
![Page 27: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/27.jpg)
Espresso Architecture Overview
[Diagram labels: Espresso Metro, Peering Fabric, Label-switched Fabric, Hosts, Packet Processor, BGP speaker, eBGP Peering, External Peer.]
Labeled packets specify egress.
![Page 28: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/28.jpg)
Espresso Architecture Overview
[Diagram labels: Espresso Metro, Peering Fabric, Label-switched Fabric, Hosts, Packet Processor, Local Control, Global Controller, Application Signals, BGP speaker, eBGP Peering, External Peer.]
Labeled packets specify egress.
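The "labeled packets specify egress" idea can be sketched roughly as follows. This is an illustration under assumptions, not Espresso's implementation: the table contents, label values, port names, and the functions `choose_label` and `fabric_egress` are all hypothetical. The sketch assumes the host-side packet processor picks a label by longest-prefix match, and the peering fabric forwards on the label alone.

```python
import ipaddress

# Hypothetical host-side table: destination prefix -> label naming an egress.
HOST_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): 1000,
    ipaddress.ip_network("192.0.2.0/24"): 1001,
    ipaddress.ip_network("192.0.2.128/25"): 1002,
}

# Hypothetical peering fabric table: label -> egress port toward a peer.
FABRIC_TABLE = {
    1000: "port-default",
    1001: "port-peer-a",
    1002: "port-peer-b",
}


def choose_label(dst: str) -> int:
    """Host side: longest-prefix match on the destination, return its label."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in HOST_TABLE if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return HOST_TABLE[best]


def fabric_egress(label: int) -> str:
    """Fabric side: switch on the label only, without an IP lookup."""
    return FABRIC_TABLE[label]
```

In this sketch the large routing table lives on the hosts, so the fabric only needs one entry per egress.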
![Page 29: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/29.jpg)
SRE for Espresso
![Page 30: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/30.jpg)
Complexities of Espresso
● Large set of distributed systems.
● Many teams, different skill sets.
● Massive, top-to-bottom change.
● How do we contain and direct all of this so we make progress?
![Page 31: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/31.jpg)
Espresso team
● Cross-functional team
  ○ Network engineers
  ○ SREs
  ○ Developers
  ○ Testers
  ○ ...
![Page 32: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/32.jpg)
Espresso team
● Responsible for supporting Espresso from inception to production.
● Set up testing infrastructure.
● Set up job control, monitoring.
● Oncall when Espresso shipped its first bytes.
● Eventually spun down and handed off oncall to permanent teams.
![Page 33: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/33.jpg)
Test/release infrastructure
● Unit tests on everything.
● Some software integration tests.
● Automated hardware integration tests.
● CD pipeline cutting a release every night from the latest green commit and deploying to hardware testbeds.
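The nightly "latest green commit" selection might look something like the sketch below. The `Commit` type, the `tests_passed` flag, and the testbed names are assumptions made for illustration; the talk does not describe the pipeline's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Commit:
    sha: str
    tests_passed: bool  # hypothetical: True when all automated tests were green


def latest_green(commits: Sequence[Commit]) -> Optional[Commit]:
    """Return the newest passing commit; commits are ordered oldest to newest."""
    for commit in reversed(commits):
        if commit.tests_passed:
            return commit
    return None


def nightly_release(commits: Sequence[Commit],
                    testbeds: Sequence[str]) -> Dict[str, str]:
    """Map each testbed to the release it should run; empty if nothing is green."""
    green = latest_green(commits)
    if green is None:
        return {}
    return {testbed: green.sha for testbed in testbeds}
```

Cutting from the latest green commit rather than from head means a broken head commit delays new changes without blocking the nightly release.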
![Page 34: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/34.jpg)
Production environment
● Reused/adapted standard building blocks.
  ○ Borg
  ○ Chubby
  ○ Borgmon (Prometheus is the closest open-source equivalent)
● Had a post-lab, prod-parallel testbed which paged Espresso oncall.
![Page 35: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/35.jpg)
"I have a question…"
"Do you know how to let a Borg job SSH into a production machine?"
"Yes. I'm not going to tell you how, though. What are you trying to do?"
(SSH is almost never used for system to system communication at Google; we prefer RPCs.)
![Page 36: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/36.jpg)
"I have a question…"
"I want to save some binary data to disk, then log in, copy it off, and then get it into Dremel."
"So… you want to save some structured (ProtoBuf?) logs into Dremel."
"Yes."
(It turns out Google has an existing toolkit to solve precisely this problem.)
![Page 37: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/37.jpg)
Monitoring/alerting
● Lots of possible points of failure:
  ○ Peering Fabric.
  ○ Packet processing on hosts.
  ○ Software (Local controller, BGP speakers).
  ○ Global control plane.
● How to tell what's broken?
![Page 39: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/39.jpg)
Monitoring/alerting
"Network devices have counters everywhere. If we page on the drop counters, that'll catch all the failures we see with traditional peering devices?"
"Oooor… we could build some blackbox probing infrastructure to catch failures which don't show up in counters?"
![Page 40: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/40.jpg)
Monitoring/alerting
● Built a couple of high-signal, symptom-based alerts.
  ○ Black box prober, doing end-to-end test of control and data plane.
● Used lots of whitebox telemetry to help point to root cause.
  ○ ALL THE GRAPHS.
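A symptom-based alert driven by blackbox probe results could be sketched as below. The `ProbeResult` type, the 0.99 threshold, and the choice to treat "no probe data" as a failure are assumptions for illustration, not details from the talk.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProbeResult:
    target: str       # hypothetical name of the probed metro or path
    succeeded: bool   # True if the end-to-end probe got a valid reply


def success_ratio(results: Sequence[ProbeResult]) -> float:
    """Fraction of probes that succeeded; no data counts as total failure."""
    if not results:
        return 0.0
    return sum(r.succeeded for r in results) / len(results)


def should_page(results: Sequence[ProbeResult], min_ratio: float = 0.99) -> bool:
    """Page on the user-visible symptom: too few probes getting through."""
    return success_ratio(results) < min_ratio
```

The alert fires on what a user would see (traffic not getting through), while the whitebox graphs are kept for working out which component is responsible.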
![Page 41: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/41.jpg)
Monitoring/alerting
1. Blackbox realtime monitoring of PF availability + encap + decap + GFE reachability.
2. Greybox realtime monitoring of packet processor ACL: decap + ACL-is-blocking + ACL-is-permitting.
3. Passive loss/blackhole monitoring.
[Diagram labels: Internet, PF, B2, GFE, Packet processor, USPS (ACL), Monitoring.]
![Page 42: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/42.jpg)
Introspection
● Alerting/monitoring tells you something is broken.
● How do you find out what exactly is causing you to be paged?
![Page 43: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/43.jpg)
Introspection tools
● Google has standard HTTP endpoints for debugging.
  ○ "Show me the important things about this binary."
  ○ "Packet processor, what do you know about 192.0.2.1?"
● Custom traceroute-like tools for debugging dataplane.
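A debugging endpoint that answers "what do you know about 192.0.2.1?" might be approximated with the Python standard library as below. This is not Google's tooling: the `/debug/ip` path, the `addr` query parameter, the `ROUTES` table, and the `describe_ip` helper are all hypothetical.

```python
import ipaddress
import json
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory state the binary exposes for debugging.
ROUTES = {"192.0.2.0/24": {"label": 1001, "peer": "peer-a"}}


def describe_ip(ip: str) -> dict:
    """Return the most specific route covering `ip`, or a null match."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    hits = []
    for prefix, info in ROUTES.items():
        net = ipaddress.ip_network(prefix)
        if addr in net:
            hits.append((net, info))
    if not hits:
        return {"ip": ip, "match": None}
    net, info = max(hits, key=lambda hit: hit[0].prefixlen)
    return {"ip": ip, "match": str(net), **info}


class DebugHandler(BaseHTTPRequestHandler):
    """Serves GET /debug/ip?addr=<address> as JSON."""

    def do_GET(self) -> None:
        url = urlparse(self.path)
        if url.path != "/debug/ip":
            self.send_error(404)
            return
        ip = parse_qs(url.query).get("addr", [""])[0]
        try:
            body = json.dumps(describe_ip(ip)).encode()
        except ValueError:
            self.send_error(400, "bad address")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, the handler can be served with `http.server.HTTPServer(("localhost", 8080), DebugHandler).serve_forever()` and queried at `/debug/ip?addr=192.0.2.1`.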
![Page 44: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/44.jpg)
What broke?
● Most common failure mode: control plane breakage.
● Example: Local controller OOM on new version.
  ○ No traffic impact. (Fail static.)
  ○ Caught in first production canary.
  ○ Added regression test.
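"Fail static" can be illustrated with a small sketch: when the controller goes away, the forwarding side keeps its last programmed state instead of clearing it. The `ForwardingAgent` class and its methods are hypothetical names chosen for this example, and `None` stands in for "controller unreachable".

```python
from typing import Dict, Optional


class ForwardingAgent:
    """Holds the forwarding table last programmed by a controller."""

    def __init__(self) -> None:
        self.table: Dict[str, str] = {}
        self.controller_healthy = False

    def on_update(self, new_table: Optional[Dict[str, str]]) -> None:
        """Apply a controller update; None models a crashed or unreachable controller."""
        if new_table is None:
            # Fail static: keep the existing table so traffic keeps flowing.
            self.controller_healthy = False
            return
        self.table = dict(new_table)
        self.controller_healthy = True

    def next_hop(self, prefix: str) -> Optional[str]:
        """Look up the egress for an exact prefix in the current table."""
        return self.table.get(prefix)
```

The trade-off in this sketch is that forwarding state can go stale while the controller is down, which is why the controller's health is tracked separately and can be alerted on.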
![Page 45: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/45.jpg)
What broke?
● SDN management.
● Example: accidentally disabled non-SSH access to Peering Fabrics.
  ○ No traffic impact. (Fail static.)
  ○ Used SSH access to restore SDN management.
  ○ Added more conservative canarying for device management changes.
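One way to picture more conservative canarying is a staged rollout that halts as soon as a stage looks unhealthy. The stage layout, device names, and the `apply` and `healthy` callbacks are assumptions for illustration; the talk does not describe the actual rollout tooling.

```python
from typing import Callable, List, Sequence


def staged_rollout(stages: Sequence[Sequence[str]],
                   apply: Callable[[str], None],
                   healthy: Callable[[str], bool]) -> List[str]:
    """Apply a change stage by stage; stop after the first unhealthy stage.

    Returns the devices that received the change, in order.
    """
    changed: List[str] = []
    for stage in stages:
        for device in stage:
            apply(device)
            changed.append(device)
        # Check the whole stage before widening the blast radius.
        if not all(healthy(device) for device in stage):
            break
    return changed
```

Starting with a single-device stage limits a bad management change to one device, which can then be recovered out of band (over SSH in the example above).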
![Page 46: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/46.jpg)
Comprehensibility
● Complex system needed an architecture diagram.
● Espresso architecture doc has:
  ○ All components.
  ○ What talked to what.
  ○ Links to individual design docs.
  ○ (Later) Who was oncall for what.
![Page 47: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/47.jpg)
Oncall
● Everyone in the Espresso team was in the oncall rotation:
  ○ SREs.
  ○ Developers.
  ○ Network engineers.
● Some people had never been oncall before.
● Some people were already oncall for other stuff.
● Needed to account for all of this in oncall practices.
![Page 48: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/48.jpg)
Oncall
● Initially Espresso team oncall for all Espresso deployments.
● Then only for a couple of sites where we were testing new features.
● Eventually spun down and handed off to many existing teams.
![Page 49: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/49.jpg)
Summary
![Page 50: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/50.jpg)
What did early engagement get us?
● Dev familiarity with production.
  ○ When you're paged by a bug, you fix it faster.
● Broad knowledge across lots of disciplines.
● Significant design changes:
  ○ Reusing more production infrastructure.
  ○ Symptom-based monitoring.
![Page 51: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/51.jpg)
Lessons learned
● Design for testability.
● Reuse whatever you can.
● System architecture diagrams are great.
● Focus on a few high-signal, symptom-based alerts.
● Lots of white box telemetry to aid with root causing.
![Page 52: Engagement Case Study Google SDN Peering: An Early · PDF fileEngagement Case Study Murali Suriar, msuriar@google.com ... BGP speaker eBGP Peering External Peer Espresso Metro Labeled](https://reader030.vdocuments.net/reader030/viewer/2022020411/5a9ece6b7f8b9a8e178be208/html5/thumbnails/52.jpg)
Thank You!