geecon microservices 2015 scaling micro services at gilt
![Page 1: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/1.jpg)
scaling μ-services at Gilt
Sopot, Poland, 11th September 2015
Adrian Trenaman, SVP Engineering, Gilt, @adrian_trenaman
@gilttech
![Page 2: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/2.jpg)
why was I late today?
and…
were micro-services to blame?
![Page 3: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/3.jpg)
[Diagram: svc-localised-string, mongodb, login-reg, mosaic, product listing, product search]
A localisation file was loaded with a problematic character encoding
The driver spun on CPU, consuming CPU credits
The service starved and fell over.
Core parts of the site were broken
![Page 4: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/4.jpg)
so…
… how did I really feel about micro-services yesterday?
![Page 5: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/5.jpg)
gilt: luxury designer brands at discounted prices
![Page 6: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/6.jpg)
we shoot the product in our studios
![Page 7: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/7.jpg)
we receive, store, pick, pack and ship...
![Page 8: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/8.jpg)
we sell every day at noon
![Page 9: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/9.jpg)
stampede...
![Page 10: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/10.jpg)
this is what the stampede really looks like...
![Page 11: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/11.jpg)
![Page 12: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/12.jpg)
rails to riches: 2007 - ruby-on-rails monolith
![Page 13: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/13.jpg)
2011: java, loosely-typed, monolithic services
Hidden linkages; buried business logic
Monolithic Java App; huge bottleneck for innovation.
lots of duplicated code :(
teams focused on business lines
Large loosely-typed JSON/HTTP services
![Page 14: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/14.jpg)
enter: µ-services
“How can we arrange our teams around strategic initiatives? How can we make it fast and easy to get a change to production?”
![Page 15: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/15.jpg)
2015: micro-services
![Page 16: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/16.jpg)
driving forces behind gilt’s emergent architecture
● team autonomy
● voluntary adoption (tools, techniques, processes)
● kpi or goal-driven initiatives
● failing fast and openly
● open and honest, even when it’s difficult
![Page 17: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/17.jpg)
service growth over time: point of inflexion === scala.
![Page 18: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/18.jpg)
what are all these services doing?
![Page 19: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/19.jpg)
anatomy of a gilt service
![Page 20: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/20.jpg)
anatomy of a gilt service - typical choices
gilt-service-framework, log4j, cloudwatch, Cave, java, javascript
![Page 21: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/21.jpg)
lines of code per service
![Page 22: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/22.jpg)
# source files per service
![Page 23: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/23.jpg)
service discovery: straightforward
zookeeper
Brocade Traffic Manager (aka Zeus, Stingray, SteelApp, ...)
![Page 24: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/24.jpg)
from bare-metal...
PHX, IAD
![Page 25: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/25.jpg)
… to vapour.
![Page 26: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/26.jpg)
single tenant deployment: one AMI per service instance
![Page 27: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/27.jpg)
reproducible, immutable deployments: docker
![Page 28: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/28.jpg)
service discovery: new services use ELB
zookeeper
Amazon ELB
![Page 29: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/29.jpg)
# running AMIs per service
![Page 30: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/30.jpg)
lift’n’shift + elastic teams
Existing Data Centre
dual 10Gb direct connect line, 2ms latency
![Page 31: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/31.jpg)
AWS instance sizing
![Page 32: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/32.jpg)
evolution of architecture and tech organisation
![Page 33: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/33.jpg)
We (heart) μ-services
Lessen dependencies between teams: faster code-to-prod
Lots of initiatives in parallel
Your favourite <tech/language/framework> here
Graceful degradation of service
Disposable Code: easy to innovate, easy to fail and move on.
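Graceful degradation can be pictured with a minimal sketch, which is an illustration and not Gilt's implementation: a caller wraps a dependency in a fallback, so a failing service costs one feature rather than the whole page. Every name here (`with_fallback`, `remote_localised_string`, `DEFAULTS`) is hypothetical.

```python
from typing import Callable, Dict


def with_fallback(call: Callable[[str], str],
                  fallback: Callable[[str], str]) -> Callable[[str], str]:
    """Return a function that uses `fallback` whenever `call` raises."""
    def guarded(key: str) -> str:
        try:
            return call(key)
        except Exception:
            # Degrade: serve a lower-quality answer instead of failing the request.
            return fallback(key)
    return guarded


# Hypothetical stand-in for a remote localisation service that is down.
def remote_localised_string(key: str) -> str:
    raise TimeoutError("localisation service unavailable")


# Hypothetical built-in defaults shipped with the caller.
DEFAULTS: Dict[str, str] = {"cart.title": "Your cart"}


def default_string(key: str) -> str:
    # Fall back to the raw key when no default text exists.
    return DEFAULTS.get(key, key)


localise = with_fallback(remote_localised_string, default_string)
```

With the remote call failing, `localise("cart.title")` still yields usable text from the local defaults.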
![Page 34: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/34.jpg)
We (heart) cloud
Do devops in a meaningful way.
Low barrier to entry for new tech (DynamoDB, Kinesis, ...)
Isolation
Cost visibility
Security tools (IAM)
Well documented
Resilience is easy
Hybrid is easy
Performance is great
![Page 35: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/35.jpg)
seven μ-service challenges (& some solutions)
no one ever said this was gonna be easy
![Page 36: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/36.jpg)
1. staging vs test-in-prod
We find it hard to maintain staging environments across multiple teams with lots of services.
● We think TiP is the way to go: invest in automation, use dark canaries in prod.
● However, some teams have found TiP counter-productive, and use minimal staging environments.
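A rough sketch of the dark-canary idea follows. It assumes that "dark canary" means a new version receiving a copy of production traffic while its responses are discarded; the slides do not spell this out. The `DarkCanary` class and its handlers are hypothetical.

```python
from typing import Callable, List, Tuple

Handler = Callable[[str], str]


class DarkCanary:
    """Serve from `stable`; shadow each request to `canary` and record differences."""

    def __init__(self, stable: Handler, canary: Handler) -> None:
        self.stable = stable
        self.canary = canary
        self.mismatches: List[Tuple[str, str, str]] = []

    def handle(self, request: str) -> str:
        expected = self.stable(request)
        try:
            observed = self.canary(request)
        except Exception as exc:
            # A canary failure is recorded but never reaches the user.
            observed = f"error: {exc!r}"
        if observed != expected:
            self.mismatches.append((request, expected, observed))
        # Only the stable version's answer is ever returned.
        return expected


# Hypothetical handlers: the canary disagrees with stable on request "b".
router = DarkCanary(stable=str.upper,
                    canary=lambda s: s.upper() if s != "b" else "?")
```

After some traffic, `router.mismatches` lists the requests where the candidate version diverged, which an automated check could inspect before promoting it.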
![Page 37: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/37.jpg)
2. ownership
Who ‘owns’ that service? What happens if that person decides to work on something else?
We have chosen for teams and departments to own and maintain their services. No throwing this stuff over the fence.
![Page 38: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/38.jpg)
1. Software is owned by departments, tracked in ‘genome project’. Directors assign services to teams.
2. Teams are responsible for building & running their services; directors are accountable for their overall estate.
bottom-up ownership, RACI-style
![Page 39: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/39.jpg)
‘ownership donut’ informs tech strategy
3. Ownership is classified: active, passive, at-risk.
‘done’ === 0% ‘at risk’
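The classification above can be expressed as a small calculation. This is a sketch over a made-up inventory, not the 'genome project' tooling: the service names and the `ownership_shares` and `is_done` helpers are assumptions.

```python
from collections import Counter
from typing import Dict

STATUSES = ("active", "passive", "at-risk")

# Hypothetical inventory: service name -> ownership status.
services: Dict[str, str] = {
    "svc-a": "active",
    "svc-b": "passive",
    "svc-c": "at-risk",
    "svc-d": "active",
}


def ownership_shares(inventory: Dict[str, str]) -> Dict[str, float]:
    """Percentage of services in each ownership class (the donut segments)."""
    total = len(inventory)
    if total == 0:
        return {status: 0.0 for status in STATUSES}
    counts = Counter(inventory.values())
    return {status: 100.0 * counts[status] / total for status in STATUSES}


def is_done(inventory: Dict[str, str]) -> bool:
    """'done' means no service is classified at-risk."""
    return ownership_shares(inventory)["at-risk"] == 0.0
```

For the sample inventory, a quarter of the estate is at risk, so `is_done(services)` is false until `svc-c` gets an owner.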
![Page 40: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/40.jpg)
3. deployment
Services need somewhere to live. We’ve open-sourced tooling over docker and AWS to give:
elasticity + fast provisioning + service isolation + fast rollback + repeatable, immutable deployment.
https://github.com/gilt/ionroller
![Page 41: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/41.jpg)
4. lightweight APIs
We’ve settled on REST-style APIs, using http://apidoc.me. Separate interface from implementation; ‘an AVRO for REST’ (Mike Bryzek, Gilt Founder)
We strongly recommend zero-dependency strongly-typed clients.
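To illustrate what a zero-dependency, strongly-typed client might look like, here is a hand-written sketch using only the Python standard library. It is not apidoc output, and the `Product` resource, the `/products/{id}` path and the `ProductClient` class are all hypothetical.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Product:
    """Typed view of a (hypothetical) product resource."""
    id: int
    name: str

    @staticmethod
    def from_json(raw: str) -> "Product":
        data = json.loads(raw)
        # Fail loudly on a malformed payload instead of passing loose dicts around.
        return Product(id=int(data["id"]), name=str(data["name"]))


def http_get(url: str) -> str:
    """Default transport: standard library only, so the client adds no dependencies."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


class ProductClient:
    def __init__(self, base_url: str,
                 get: Callable[[str], str] = http_get) -> None:
        self.base_url = base_url.rstrip("/")
        self.get = get  # injectable transport, handy for tests

    def product(self, product_id: int) -> Product:
        return Product.from_json(self.get(f"{self.base_url}/products/{product_id}"))
```

Callers get a `Product` with known fields rather than an untyped JSON blob, and pulling in the client drags no third-party libraries into the calling service.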
![Page 42: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/42.jpg)
5. audit + alerting
How do we stay compliant while giving engineers full autonomy in prod?
Really smart alerting: http://cavellc.github.io
orders[shipTo: US].count.5m == 0
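The rule above reads as "alert when no US-bound orders were counted in the last five minutes". The sketch below shows one way such a check could be evaluated. It is not Cave's implementation or API: the `Event` type and both functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    metric: str
    timestamp: float  # seconds since the epoch
    tags: Dict[str, str] = field(default_factory=dict)


def count_in_window(events: List[Event], metric: str, tags: Dict[str, str],
                    now: float, window_seconds: float) -> int:
    """Count events for `metric` carrying all of `tags` within the trailing window."""
    return sum(
        1 for e in events
        if e.metric == metric
        and now - window_seconds < e.timestamp <= now
        and all(e.tags.get(k) == v for k, v in tags.items())
    )


def no_us_orders_alert(events: List[Event], now: float) -> bool:
    """True when no US-bound order was seen in the last five minutes."""
    return count_in_window(events, "orders", {"shipTo": "US"}, now, 5 * 60) == 0
```

Alerting on a business metric like this catches outages that per-host health checks can miss: every instance may be up while orders have quietly stopped.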
![Page 43: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/43.jpg)
6. io explosion
Each service call begets more service calls, some of which are redundant... => unintended complexity and performance problems
Looking to lambda architecture for critical-path APIs: precompute, real-time updates, O(1) lookup
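The three ingredients named above (precompute, real-time updates, O(1) lookup) can be sketched in a few lines. This is a toy in-memory model under assumed names (`ProductView`, `apply`, `get`), not a description of Gilt's system.

```python
from typing import Dict, Iterable, Tuple


class ProductView:
    """Precomputed read model: one dict lookup per request, no fan-out calls."""

    def __init__(self, snapshot: Iterable[Tuple[str, dict]]) -> None:
        # Batch step: build the whole view up front from a full snapshot.
        self._view: Dict[str, dict] = {pid: dict(doc) for pid, doc in snapshot}

    def apply(self, product_id: str, changes: dict) -> None:
        # Real-time step: fold each change event into the precomputed entry.
        self._view.setdefault(product_id, {}).update(changes)

    def get(self, product_id: str) -> dict:
        # Serving step: average-case O(1) hash lookup.
        return self._view[product_id]
```

The critical-path request then costs a single lookup, and the work of combining data from several services moves off the request path into the batch and event-handling steps.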
![Page 44: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/44.jpg)
7. reporting
Many services => many databases => data is decentralized.
Solution: real-time event queues to a data-lake.
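As a sketch of the shape of this solution, the example below has services publish events to a shared queue, and a single consumer append them to one store. An in-process `queue.Queue` and a plain list stand in for a real event stream and data lake; `publish` and `drain_to_lake` are hypothetical names.

```python
import json
import queue
from typing import List

# Stand-in for a real-time event stream shared by all services.
events: "queue.Queue[dict]" = queue.Queue()


def publish(service: str, event_type: str, payload: dict) -> None:
    """Called by a service whenever something report-worthy happens."""
    events.put({"service": service, "type": event_type, "payload": payload})


def drain_to_lake(lake: List[str]) -> int:
    """Append every queued event to the lake as one JSON line; return the count moved."""
    moved = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return moved
        lake.append(json.dumps(event, sort_keys=True))
        moved += 1
```

Reporting then queries one place, while each service keeps its own database private.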
![Page 45: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/45.jpg)
so…
how did I really feel about yesterday’s outage?
great.
![Page 46: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/46.jpg)
[Diagram: svc-localised-string, mongodb, login-reg, mosaic, product listing, product search]
A localisation file was loaded with a problematic character encoding
The driver spun on CPU, consuming CPU credits
The service was small: it was re-written in about an hour, deployed, and that fixed the site.
We knew exactly where the problem was.
We focussed and rapidly deployed tentative incremental fixes.
Once we fixed that problem, all of our problems were fixed.
Try that in a monolith :)
![Page 47: GeeCON Microservices 2015 scaling micro services at gilt](https://reader034.vdocuments.net/reader034/viewer/2022042706/588427251a28ab485c8b6919/html5/thumbnails/47.jpg)
scaling μ-services at Gilt
Sopot, Poland, 11th September 2015
Adrian Trenaman, SVP Engineering, Gilt, @adrian_trenaman
@gilttech