![Page 1: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/1.jpg)
Breaking Azure for Fun and Profit
Pavel Michailov
Identity Division
![Page 2: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/2.jpg)
Service Challenges
![Page 3: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/3.jpg)
Cloud Services - Resilience
▪ Not a solved problem
▪ Goal is:
  ▪ 100% uptime
  ▪ No degradation
  ▪ Responsive
![Page 4: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/4.jpg)
Cloud Services - Deployment
![Page 5: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/5.jpg)
Cloud Services – Testing challenges
▪ Continuous evolution
▪ Multiple dependencies
▪ Global distribution
▪ Traffic fluctuation
![Page 6: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/6.jpg)
Fault Injection System
▪ Inject faults in deployed service
▪ Verify correct service response
▪ Overcome limitations of traditional testing
![Page 7: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/7.jpg)
Agenda
System Overview
Applications
![Page 8: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/8.jpg)
System Architecture
[Diagram: Fault Management Service, Cloud Management Service, Target Service VMs, Fault Agents]
![Page 9: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/9.jpg)
Faults
▪ Resource pressure
▪ Network
▪ Processes
▪ Virtual machine
▪ Application specific
▪ Custom
![Page 10: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/10.jpg)
Resource Pressure Faults
▪ CPU
▪ Memory
▪ Hard disk
  ▪ Capacity
  ▪ Read
  ▪ Write
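A CPU pressure fault can be approximated by busy-looping for a fixed share of each time slice. The sketch below is illustrative only: `cpu_pressure`, `duty_cycle` and `slice_s` are hypothetical names, it loads a single core, and it is not taken from the system described in the slides.

```python
import time


def cpu_pressure(duration_s: float, duty_cycle: float = 0.8, slice_s: float = 0.05) -> float:
    """Keep one core busy for roughly `duty_cycle` of every slice.

    Returns the total time spent in the busy loop, in seconds.
    """
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in (0, 1]")

    deadline = time.monotonic() + duration_s
    busy_total = 0.0
    while time.monotonic() < deadline:
        slice_start = time.monotonic()
        busy_until = slice_start + slice_s * duty_cycle
        # Spin until the busy part of the slice is used up.
        while time.monotonic() < busy_until:
            pass
        busy_total += time.monotonic() - slice_start
        # Yield the CPU for the rest of the slice.
        time.sleep(slice_s * (1.0 - duty_cycle))
    return busy_total
```

The fault is bounded by `duration_s`, so it ends on its own even if nothing removes it explicitly.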
![Page 11: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/11.jpg)
Network faults
▪ Types
  ▪ Disconnect
  ▪ Latency
▪ Filters
  ▪ Domain / IP / Subnet
  ▪ Port
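The filters above decide which connections a network fault applies to. One way to express that matching, as a sketch: the class name `NetworkFaultFilter` and its fields are assumptions, and only the standard-library `ipaddress` module is used.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NetworkFaultFilter:
    """Hypothetical filter: a criterion left as None matches everything."""
    domain: Optional[str] = None   # matches the domain and its subdomains
    subnet: Optional[str] = None   # CIDR notation; a bare IP is treated as a single host
    port: Optional[int] = None

    def matches(self, host: str, ip: str, port: int) -> bool:
        if self.domain is not None:
            h = host.lower().rstrip(".")
            d = self.domain.lower().rstrip(".")
            if h != d and not h.endswith("." + d):
                return False
        if self.subnet is not None:
            network = ipaddress.ip_network(self.subnet, strict=False)
            if ipaddress.ip_address(ip) not in network:
                return False
        if self.port is not None and port != self.port:
            return False
        return True
```

A disconnect or latency fault would then be applied only to connections for which `matches` returns `True`.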
![Page 12: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/12.jpg)
Process faults
▪ Stop / Kill
▪ Restart
▪ Crash
▪ Hang
![Page 13: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/13.jpg)
Virtual Machine / OS faults
▪ Stop
▪ Restart
▪ Re-image
▪ Machine Hang
▪ Change date
![Page 14: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/14.jpg)
Application specific faults
▪ Hooks
  ▪ Instrument service code
▪ Intercept / Re-route calls
  ▪ No access to service code
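A hook in instrumented service code can be as small as a named fault point that normally does nothing. The sketch below assumes a decorator-based design. `fault_point`, `arm_fault`, `disarm_fault` and `read_token` are hypothetical names, not the API of the system in the slides.

```python
import functools
import threading
from typing import Callable, Dict

# Fault point name -> factory that builds the exception to raise.
_armed: Dict[str, Callable[[], Exception]] = {}
_lock = threading.Lock()


def arm_fault(point: str, make_exc: Callable[[], Exception]) -> None:
    with _lock:
        _armed[point] = make_exc


def disarm_fault(point: str) -> None:
    with _lock:
        _armed.pop(point, None)


def fault_point(name: str):
    """Mark a function as a place where a fault can be injected."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with _lock:
                make_exc = _armed.get(name)
            if make_exc is not None:
                raise make_exc()          # fault armed: fail instead of calling through
            return func(*args, **kwargs)  # no fault armed: normal behavior
        return wrapper
    return decorator


@fault_point("token_store.read")
def read_token(user_id: str) -> str:
    # Stand-in for real service code.
    return f"token-for-{user_id}"
```

Arming `"token_store.read"` makes `read_token` fail until the point is disarmed, which lets a test check how callers handle the failure.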
![Page 15: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/15.jpg)
Custom Faults
▪ Support for custom code execution
▪ Partner teams contribute as needed
▪ Faults subject to security review
![Page 16: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/16.jpg)
Injection mechanism
▪ VM External
▪ VM internal, external to service code: Agent
▪ VM internal, internal to service code: Hooks
![Page 17: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/17.jpg)
External injection
▪ VM / Region Stop
▪ VM / Region Restart
▪ Re-image
[Diagram: Cloud Management Service, Target VMs]
![Page 18: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/18.jpg)
VM internal injection - Agent
▪ Resource pressure
▪ Network
▪ Processes
▪ OS
▪ Detours
▪ …
[Diagram: Target Service VM, Target Application, Virtual Machine, Operating System, Fault Agent]
![Page 19: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/19.jpg)
VM internal injection - Hooks
▪ Application behavior
▪ Flexibility
▪ Service specific
[Diagram: Target Application]
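The three injection paths from the preceding slides (external, agent, hooks) can be summarized as a routing table from fault kind to mechanism. This is a sketch. The fault kind strings and the names `Mechanism` and `mechanism_for` are assumptions chosen for illustration.

```python
from enum import Enum


class Mechanism(Enum):
    EXTERNAL = "via the cloud management service"
    AGENT = "via the fault agent inside the VM"
    HOOK = "via hooks inside the service code"


# Hypothetical fault kinds, grouped as on the slides.
_ROUTES = {
    "vm_stop": Mechanism.EXTERNAL,
    "vm_restart": Mechanism.EXTERNAL,
    "vm_reimage": Mechanism.EXTERNAL,
    "resource_pressure": Mechanism.AGENT,
    "network": Mechanism.AGENT,
    "process": Mechanism.AGENT,
    "os": Mechanism.AGENT,
    "detour": Mechanism.AGENT,
    "application_behavior": Mechanism.HOOK,
}


def mechanism_for(fault_kind: str) -> Mechanism:
    """Return the injection path for a fault kind, or raise ValueError."""
    try:
        return _ROUTES[fault_kind]
    except KeyError:
        raise ValueError(f"unknown fault kind: {fault_kind!r}") from None
```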
![Page 20: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/20.jpg)
Security and Safety
▪ Azure AD Integration
▪ Granular access control
▪ Secure communication
▪ Kill-switch/automated removal
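Kill-switch and automated removal can be sketched as a registry in which every fault carries an expiry time and a single call clears everything. `FaultRegistry` and its methods are hypothetical. The injectable `clock` is an assumption made so that expiry can be tested without waiting.

```python
import threading
import time
from typing import Callable, Dict, List


class FaultRegistry:
    """Tracks active faults; each one expires after its time-to-live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lock = threading.Lock()
        self._expiry: Dict[str, float] = {}
        self._killed = False

    def inject(self, fault_id: str, ttl_seconds: float) -> None:
        with self._lock:
            if self._killed:
                raise RuntimeError("kill switch engaged; no new faults accepted")
            self._expiry[fault_id] = self._clock() + ttl_seconds

    def active(self) -> List[str]:
        """Drop expired faults (automated removal) and list the remaining ones."""
        now = self._clock()
        with self._lock:
            for fault_id in [f for f, t in self._expiry.items() if t <= now]:
                del self._expiry[fault_id]
            return sorted(self._expiry)

    def kill_switch(self) -> List[str]:
        """Remove every fault at once and refuse further injections."""
        with self._lock:
            removed = sorted(self._expiry)
            self._expiry.clear()
            self._killed = True
            return removed
```

Requiring a time-to-live on every injection means a forgotten fault still goes away without operator action.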
![Page 21: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/21.jpg)
Applications
Resilience verification
Test new features
Training
![Page 22: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/22.jpg)
Resilience Verification
![Page 23: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/23.jpg)
Automated Regression Testing
▪ Scheduled periodic test runs
▪ Verify alert generation
▪ Verify telemetry and service behavior
![Page 24: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/24.jpg)
Scheduled Runs
![Page 25: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/25.jpg)
Verify Alert Generation
▪ Integration with internal alerting system
▪ Configurable time window, expected field values
▪ Incident auto-mitigation/resolution
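The time window and expected field values amount to a simple check: after a fault is injected, was there at least one alert inside the window whose fields match? A minimal sketch follows. The `Alert` shape and the function `alert_was_raised` are assumptions, since the internal alerting system is not described in the slides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Mapping


@dataclass(frozen=True)
class Alert:
    raised_at: datetime
    fields: Mapping[str, str]


def alert_was_raised(
    alerts: Iterable[Alert],
    injected_at: datetime,
    window: timedelta,
    expected_fields: Mapping[str, str],
) -> bool:
    """True if some alert falls inside the window and carries every expected field value."""
    deadline = injected_at + window
    for alert in alerts:
        if not (injected_at <= alert.raised_at <= deadline):
            continue
        if all(alert.fields.get(k) == v for k, v in expected_fields.items()):
            return True
    return False
```

Extra fields on an alert are ignored, so a test only needs to pin down the values it cares about, for example severity and component.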
![Page 26: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/26.jpg)
Verify Service Behavior
![Page 27: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/27.jpg)
Security Verification
▪ Custom Faults
  ▪ Local User Creation
  ▪ Malware upload – EICAR test file
▪ Verify security alerting
![Page 28: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/28.jpg)
New Feature Verification
▪ Fill gap in testing frameworks
▪ Manual injection of relevant faults
▪ Existing regression tests catch edge-cases
![Page 29: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/29.jpg)
Challenges – Moving Parts
▪ Multiple unmocked components
▪ Complex scenarios difficult to verify reliably
▪ Time consuming
![Page 30: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/30.jpg)
Challenges – Adoption
▪ Full benefit only when applied across stack
▪ Non-functional testing often deprioritized
▪ Multi-team coordination difficult
![Page 31: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/31.jpg)
Recovery Games
![Page 32: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/32.jpg)
Recovery Games - Planning
▪ Attacker prepares weekly fault
▪ Identify area of interest
▪ Develop and test fault
![Page 33: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/33.jpg)
Recovery Games – During the Game
▪ Attacker injects fault, provides hints
▪ Defender assesses impact
▪ Defender provides mitigation plan
▪ Senior team members and managers observe
![Page 34: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/34.jpg)
Recovery Games - Goals
▪ Familiarize with monitoring tools
▪ Recognize outage patterns
▪ Train on assessing the impact
▪ Root-cause / mitigation mindset
▪ Practice log analysis
![Page 35: Breaking Azure for Fun and Profit](https://reader036.vdocuments.net/reader036/viewer/2022062522/58859b421a28abd2498b5a11/html5/thumbnails/35.jpg)
Recovery Games – Issue Discovery