Check before You Change - USENIX
TRANSCRIPT
![Page 1: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/1.jpg)
Check before You Change: Preventing Correlated Failures in Service Updates
Ennan Zhai Ang Chen Ruzica Piskac Mahesh Balakrishnan
Bingchuan Tian Bo Song Haoliang Zhang
![Page 2: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/2.jpg)
Background
• Cloud services ensure reliability by redundancy:
  - Storing data redundantly
  - Replicating service states across multiple nodes
• Examples:
  - Amazon AWS, AliCloud, Google Cloud, etc. replicate their data and service states
![Page 3: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/3.jpg)
However, cloud outages still occur
Why does redundancy not help?
![Page 4: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/4.jpg)
An AWS Outage in 2018
![Page 5: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/5.jpg)
Elastic Compute Cloud (EC2)
Elastic Block Store (EBS)
![Page 6: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/6.jpg)
[Figure: Elastic Compute Cloud (EC2) with EBS Cluster1 and EBS Cluster2]
![Page 7: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/7.jpg)
[Figure: Netflix, its VMs (labeled VM1 through VM4), EBS Cluster1, and EBS Cluster2]
![Page 11: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/11.jpg)
[Figure: Netflix, its VMs (labeled VM1 through VM4), EBS Cluster1, and EBS Cluster2]
Correlated failures resulting from deep dependencies
![Page 12: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/12.jpg)
Correlated Failures
• Correlated failures are harmful and epidemic:
  - Propagated to all the redundant instances
  - Undermine redundancy and fault tolerance efforts
![Page 13: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/13.jpg)
Correlated failures are prevalent
![Page 14: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/14.jpg)
Service initialization
Service Runtime
State of the Art
![Page 15: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/15.jpg)
State of the Art
Service initialization
Service Runtime
Post-Failure Forensics
1. Diagnosis (e.g., Sherlock [SIGCOMM’07])
2. Accountability (e.g., AVM [OSDI’10])
3. Provenance (e.g., DiffProv [SIGCOMM’16])
4. ...
![Page 16: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/16.jpg)
State of the Art
Service initialization
Service Runtime
Proactive Auditing
1. INDaaS [OSDI’14]
2. reCloud [CoNEXT’16]
3. RepAudit [OOPSLA’17]
4. ...
Post-Failure Forensics
1. Diagnosis (e.g., Sherlock [SIGCOMM’07])
2. Accountability (e.g., AVM [OSDI’10])
3. Provenance (e.g., DiffProv [SIGCOMM’16])
4. ...
![Page 17: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/17.jpg)
Proactive Auditing
• Prior work made pre-deployment recommendations:
  - Step 1: Automatically collecting dependency data
  - Step 2: Modeling the system stack as a fault graph
  - Step 3: Evaluating the independence of alternative deployments
[Figure: auditing pipeline, labeled INDaaS. A service deployment (network/software stacks) and an auditing program in RAL feed the auditing engine, which uses a weighted MaxSAT solver and produces auditing results]
Example formula for a replication across Replica1 and Replica2, with leaf events A1, A2, A3 and a <Weight Vector>:
CNF = (A2) ∧ (A1 ∨ A3)
Auditing program in RAL:
let Server(“172.28.228.21”) -> s1
let Server(“172.28.228.22”) -> s2
let [s1, s2] -> rep
let FaultGraph(rep) -> ft
let RankRCG(ft, 2, NET, ft) -> ranklist
Auditing results:
1. {Core1[“75.142.33.98”]}
2. {Agg1[“10.0.0.1”], Agg2[“10.0.0.2”]}
[Figure: example topology. Internet; Core Router1 (Core1) and Core Router2 (Core2), with addresses 75.142.33.98 and 75.142.33.99; Agg Switch1 (Agg1), Agg Switch2 (Agg2), and Agg Switch3 (Agg3), with addresses 10.0.0.1, 10.0.0.2, and 10.0.0.3; Server1 (S1) 172.28.228.21, Server2 (S2) 172.28.228.22, Server3 (S3) 172.28.228.23, Server4 (S4) 172.28.228.24; HDFS and HBase instances]
![Page 21: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/21.jpg)
Redundancy configuration fails
...
![Page 22: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/22.jpg)
Redundancy configuration fails
[Figure: the root event "Redundancy configuration fails" sits above "Server 1 fails" and "Server 2 fails"]
AND gate: the upper-layer node fails only if all of its sublayer nodes fail
![Page 23: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/23.jpg)
Redundancy configuration fails
[Figure: below the root, "Server 1 fails" and "Server 2 fails" each sit above "Net fails", "HW fails", and "SW fails"]
OR gate: the upper-layer node fails if any one of its sublayer nodes fails
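The AND and OR gate semantics can be sketched in a few lines of Python. This is an illustrative model only, not code from INDaaS or CloudCanary: the `Node` class, the gate names, and the example graph (two servers under an AND gate, each failing when its own network, hardware, or software fails) are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """A fault-graph node: a leaf (no children) or an AND/OR gate."""
    name: str
    gate: str = "LEAF"              # "LEAF", "AND", or "OR"
    children: List["Node"] = field(default_factory=list)


def fails(node: Node, failed_leaves: Set[str]) -> bool:
    """Return True if `node` fails given the set of failed leaf names."""
    if node.gate == "LEAF":
        return node.name in failed_leaves
    results = [fails(child, failed_leaves) for child in node.children]
    # AND gate: fails only when every child fails.
    # OR gate: fails when at least one child fails.
    return all(results) if node.gate == "AND" else any(results)


def server(index: int) -> Node:
    """Hypothetical server that fails if its net, HW, or SW leaf fails."""
    leaves = [Node(f"{kind}{index}") for kind in ("net", "hw", "sw")]
    return Node(f"server{index}", "OR", leaves)


# Redundancy fails only if both servers fail (AND gate at the root).
root = Node("redundancy", "AND", [server(1), server(2)])

one_down = fails(root, {"hw1"})           # only server 1 is down
both_down = fails(root, {"hw1", "sw2"})   # both servers are down
print(one_down, both_down)
```

In this toy graph the two servers share no leaves, so a single leaf failure never reaches the root. A leaf shared by both servers would be exactly the kind of correlated-failure risk the talk is about.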
![Page 24: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/24.jpg)
Redundancy configuration fails
[Figure: the full fault graph. "Server 1 fails" and "Server 2 fails" sit below the root; each has "Net fails", "HW fails", and "SW fails" below it; the lowest layer shown includes Core1, Agg1, Agg2, Path1, Path2, HBase, and HDFS, with further nodes elided]
![Page 27: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/27.jpg)
Correlated Failure Risks in Updates
[Figure: the State of the Art timeline (service initialization, service runtime, proactive auditing, post-failure forensics), extended with service updates]
Service updates:
• Changing network paths
• Upgrading software components
Reported outages caused by updates:
• Azure global outage: Our DNS update mangled domain records
• Benjamin Treynor Sloss, Google's VP of engineering, explained that the root cause of last Sunday's outage was a configuration change for a small group of servers in one region being wrongly applied to a larger number of servers across several neighboring regions.
![Page 28: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/28.jpg)
Problem 1: Inefficient Auditing in Updates
[Figure: the State of the Art timeline (service initialization, service runtime, proactive auditing, post-failure forensics), extended with service updates]
Service updates:
• Changing network paths
• Upgrading software components
O(50) hours per audit vs. one update every 3 hours
![Page 29: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/29.jpg)
Problem 2: Lack of Risk Fixing
[Figure: the State of the Art timeline (service initialization, service runtime, proactive auditing, post-failure forensics), extended with service updates]
Service updates:
• Changing network paths: Fix?
• Upgrading software components: Fix?
![Page 30: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/30.jpg)
Our Contribution
CloudCanary: Fast Audit & Fix
[Figure: the State of the Art timeline (service initialization, service runtime, proactive auditing, post-failure forensics), extended with service updates]
Service updates:
• Changing network paths
• Upgrading software components
![Page 31: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/31.jpg)
CloudCanary’s Workflow
[Figure: workflow diagram. Labels: Operator; Updated Service Snapshot; Reliability Goal; Dependency acquisition and Fault graph generator; Fault Graph; SnapAudit; ranked risk groups (1. {CoreRouter-1} 2. {Agg1, Agg2} ...); DepBooster; Improvement Plans]
![Page 34: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/34.jpg)
CloudCanary’s Workflow
[Figure: workflow diagram. Labels: Operator; Updated Service Snapshot; Reliability Goal; Dependency acquisition and Fault graph generator; Fault Graph; SnapAudit; ranked risk groups (1. {CoreRouter-1} 2. {Agg1, Agg2} ...); DepBooster; Improvement Plans]
• Challenge 1: SnapAudit
• Challenge 2: DepBooster
![Page 37: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/37.jpg)
A Fault Graph
[Figure: example fault graph]
![Page 40: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/40.jpg)
Risk Groups in Fault Graphs
[Figure: example fault graph]
• A risk group is a set of leaf nodes whose simultaneous failure causes the root node to fail
{A2} and {A1, A3} are risk groups; {A1} alone and {A3} alone are not
Identifying correlated failure risks can be reduced to the problem of finding risk groups in the fault graph.
However, analyzing risk groups is an NP-complete problem.
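To make the definition concrete, here is a minimal brute-force sketch in Python. It assumes a small example graph in which the root fails if A2 fails, or if both A1 and A3 fail; that structure is an assumption chosen only because it reproduces the risk groups named on the slide. The function names are hypothetical. The search visits every subset of leaves, which is exactly why this approach does not scale.

```python
from itertools import combinations

# Assumed example graph, chosen to match the risk groups on the slide:
# the root fails if A2 fails, or if both A1 and A3 fail.
LEAVES = ["A1", "A2", "A3"]


def root_fails(failed):
    """Evaluate the assumed fault graph for a set of failed leaves."""
    return "A2" in failed or {"A1", "A3"} <= failed


def minimal_risk_groups(leaves, fails):
    """Enumerate minimal risk groups by brute force over all leaf subsets."""
    groups = []
    # Visit subsets in order of increasing size so smaller groups come first.
    for size in range(1, len(leaves) + 1):
        for combo in combinations(leaves, size):
            candidate = frozenset(combo)
            # A superset of a known risk group is not minimal; skip it.
            if any(group <= candidate for group in groups):
                continue
            if fails(candidate):
                groups.append(candidate)
    return groups


groups = minimal_risk_groups(LEAVES, root_fails)
print([sorted(group) for group in groups])
```

With n leaves the loop considers up to 2^n - 1 subsets, so this is only workable for toy graphs.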
![Page 42: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/42.jpg)
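The Insight of SnapAudit
[Figure: timeline from service initialization through service runtime, with updates such as changing network paths and upgrading software components]
Service snapshots: S, S′, S′′, ...
Each update changes little: Δ is small, Δ′ is small, Δ′′ is small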
![Page 43: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/43.jpg)
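The Insight of SnapAudit
[Figure: timeline from service initialization through service runtime, with updates such as changing network paths and upgrading software components]
Service snapshots: S, S′, S′′, ...
Each update changes little: Δ is small, Δ′ is small, Δ′′ is small
CloudCanary: FirstAudit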
![Page 44: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/44.jpg)
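The Insight of SnapAudit
[Figure: timeline from service initialization through service runtime, with updates such as changing network paths and upgrading software components]
Service snapshots: S, S′, S′′, ...
Each update changes little: Δ is small, Δ′ is small, Δ′′ is small
CloudCanary: FirstAudit, then IncAudit, IncAudit, IncAudit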
![Page 45: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/45.jpg)
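SnapAudit: FirstAudit & IncAudit
[Figure: timeline from service initialization through service runtime, with updates such as changing network paths and upgrading software components]
Service snapshots: S, S′, S′′, ...
Each update changes little: Δ is small, Δ′ is small, Δ′′ is small
CloudCanary: FirstAudit, then IncAudit, IncAudit, IncAudit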
![Page 46: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/46.jpg)
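FirstAudit Primitive
[Figure: fault graph. Leaves A and B sit below D; leaves B and C sit below E; D and E sit below F; X and Y, whose subtrees are elided, sit below Z; R is the root]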
![Page 47: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/47.jpg)
FirstAudit Primitive
[Figure: fault graph, each node annotated with a hash H(·) and its risk groups]
H(D)=43cd: {A}, {B}
H(E)=a4vo: {B}, {C}
H(F)=x31g: {B}, {A, C}
H(X)=xbn7: {A}, {S, T}
H(Y)=bbk9: {K}, {A, S}
H(Z)=lktd: {A}, {K}, {S, T}
H(R)=aed8: {A}, {K}, {S, T}
![Page 50: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/50.jpg)
FirstAudit Primitive
[Figure: fault graph, each node annotated with a hash H(·) and its risk groups]
H(D)=43cd: {A}, {B}
H(E)=a4vo: {B}, {C}
H(F)=x31g: {A, B}, {A, C}, {B, B}, {B, C}
H(X)=xbn7: {A}, {S, T}
H(Y)=bbk9: {K}, {A, S}
H(Z)=lktd: {A}, {K}, {S, T}
H(R)=aed8: {A}, {K}, {S, T}
![Page 52: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/52.jpg)
FirstAudit Primitive
[Figure: fault graph, each node annotated with a hash H(·) and its risk groups]
H(D)=43cd: {A}, {B}
H(E)=a4vo: {B}, {C}
H(F)=x31g: {A, B}, {A, C}, {B}, {B, C}
H(X)=xbn7: {A}, {S, T}
H(Y)=bbk9: {K}, {A, S}
H(Z)=lktd: {A}, {K}, {S, T}
H(R)=aed8: {A}, {K}, {S, T}
![Page 54: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/54.jpg)
FirstAudit Primitive
[Figure: fault graph, each node annotated with a hash H(·) and its risk groups]
H(D)=43cd: {A}, {B}
H(E)=a4vo: {B}, {C}
H(F)=x31g: {B}, {A, C}, {B}, {B}
H(X)=xbn7: {A}, {S, T}
H(Y)=bbk9: {K}, {A, S}
H(Z)=lktd: {A}, {K}, {S, T}
H(R)=aed8: {A}, {K}, {S, T}
![Page 55: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/55.jpg)
FirstAudit Primitive
[Figure: fault graph, each node annotated with a hash H(·) and its risk groups]
H(D)=43cd: {A}, {B}
H(E)=a4vo: {B}, {C}
H(F)=x31g: {B}, {A, C}
H(X)=xbn7: {A}, {S, T}
H(Y)=bbk9: {K}, {A, S}
H(Z)=lktd: {A}, {K}, {S, T}
H(R)=aed8: {A}, {K}, {S, T}
![Page 58: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/58.jpg)
FirstAudit Primitive
[Figure: fault graph, each node annotated with a hash H(·) and its risk groups]
H(D)=43cd: {A}, {B}
H(E)=a4vo: {B}, {C}
H(F)=x31g: {B}, {A, C}
H(X)=xbn7: {A}, {S, T}
H(Y)=bbk9: {K}, {A, S}
H(Z)=lktd: {A}, {K}, {S, T}
H(R)=aed8: {A}, {K}, {B}, {S, T}
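The bottom-up merging shown across these FirstAudit slides can be sketched as follows. This is a minimal sketch, not the CloudCanary implementation: the helper names are hypothetical, and the gate types (D and E as OR gates, F as an AND gate, R as an OR gate over F and Z) are inferred from the risk groups shown on the slides rather than stated there. The risk groups of Z are copied from the slide because its subtrees are elided.

```python
from itertools import product


def minimize(groups):
    """Drop any risk group that is a strict superset of another one."""
    unique = set(groups)
    return {g for g in unique if not any(other < g for other in unique)}


def or_gate(*children):
    """OR gate: every risk group of any child also fails the parent."""
    merged = set()
    for groups in children:
        merged |= groups
    return minimize(merged)


def and_gate(*children):
    """AND gate: pick one risk group per child and take their union."""
    merged = {frozenset().union(*combo) for combo in product(*children)}
    return minimize(merged)


def leaf(name):
    """A leaf is its own single risk group."""
    return {frozenset([name])}


D = or_gate(leaf("A"), leaf("B"))    # {A}, {B}
E = or_gate(leaf("B"), leaf("C"))    # {B}, {C}
# Cross product gives {A,B}, {A,C}, {B}, {B,C}; minimizing leaves {B}, {A,C}.
F = and_gate(D, E)

# Risk groups of Z as listed on the slide (its subtrees are elided there).
Z = {frozenset("A"), frozenset("K"), frozenset("ST")}
R = or_gate(F, Z)

print(sorted(sorted(g) for g in F))
print(sorted(sorted(g) for g in R))
```

The `minimize` step is what turns {B, B} into {B} and then absorbs {A, B} and {B, C}, matching the sequence of slides for node F. At the root, {A, C} is absorbed by {A}, which is why only {B} is added to R.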
![Page 60: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/60.jpg)
Our Insight
• Algorithm sketch:
- Finding all the border nodes (black nodes)
- Computing their risk groups
- Merging these risk groups towards the root
Update
![Page 61: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/61.jpg)
Original Deployment
[Figure: fault graph; cached hash and risk groups per node]
D: H(D)=43cd, risk groups {A}, {B}
E: H(E)=a4vo, risk groups {B}, {C}
F: H(F)=x31g, risk groups {B}, {A, C}
X: H(X)=xbn7, risk groups {A}, {S, T}
Y: H(Y)=bbk9, risk groups {K}, {A, S}
Z: H(Z)=lktd, risk groups {A}, {K}, {S, T}
… … … …
R: H(R)=aed8, risk groups {A}, {K}, {B}, {S, T}
![Page 62: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/62.jpg)
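Updated Deployment
[Figure: fault graph with node Q added; cached hash and risk groups per node]
D: H(D)=43cd, risk groups {A}, {B}
E: H(E)=a4vo, risk groups {B}, {C}
F: H(F)=x31g, risk groups {B}, {A, C}
X: H(X)=xbn7, risk groups {A}, {S, T}
Y: H(Y)=bbk9, risk groups {K}, {A, S}
Q: (no hash or risk groups shown)
Z: H(Z)=lktd, risk groups {A}, {K}, {S, T}
… … … …
R: H(R)=aed8, risk groups {A}, {K}, {B}, {S, T}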
![Page 63: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/63.jpg)
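Updated Deployment
[Figure: fault graph with node Q added; H(Z) has changed]
D: H(D)=43cd, risk groups {A}, {B}
E: H(E)=a4vo, risk groups {B}, {C}
F: H(F)=x31g, risk groups {B}, {A, C}
X: H(X)=xbn7, risk groups {A}, {S, T}
Y: H(Y)=bbk9, risk groups {K}, {A, S}
Q: (no hash or risk groups shown)
Z: H(Z)=2xzb, risk groups {A}, {K}, {S, T}
… … … …
R: H(R)=aed8, risk groups {A}, {K}, {B}, {S, T}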
![Page 64: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/64.jpg)
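Updated Deployment
[Figure: fault graph with node Q added; H(Z) and H(R) have changed]
D: H(D)=43cd, risk groups {A}, {B}
E: H(E)=a4vo, risk groups {B}, {C}
F: H(F)=x31g, risk groups {B}, {A, C}
X: H(X)=xbn7, risk groups {A}, {S, T}
Y: H(Y)=bbk9, risk groups {K}, {A, S}
Q: (no hash or risk groups shown)
Z: H(Z)=2xzb, risk groups {A}, {K}, {S, T}
… … … …
R: H(R)=45zc, risk groups {A}, {K}, {B}, {S, T}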
![Page 65: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/65.jpg)
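[Figure: updated fault graph with the border nodes marked; cached hash and risk groups per node]
D: H(D)=43cd, risk groups {A}, {B}
E: H(E)=a4vo, risk groups {B}, {C}
F: H(F)=x31g, risk groups {B}, {A, C}
X: H(X)=xbn7, risk groups {A}, {S, T}
Y: H(Y)=bbk9, risk groups {K}, {A, S}
Q: (no hash or risk groups shown)
Z: H(Z)=2xzb, risk groups {A}, {K}, {S, T}
… … … …
R: H(R)=45zc, risk groups {A}, {K}, {B}, {S, T}
Step 1: Find Border Nodes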
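The slides show H(Z) and H(R) changing once Q is added, while H(D), H(E), H(F), H(X) and H(Y) stay the same. One way such behavior could arise is a Merkle-style hash in which every node's hash covers the hashes of its children. The sketch below illustrates only that idea. The hash function, the graph encoding, the gate types, and the simplified layout (X and Y treated as opaque leaves, R placed directly above F and Z) are all assumptions made for illustration, and the slides do not spell out how border nodes are selected from the changed nodes.

```python
import hashlib

def node_hash(graph, node, memo):
    """Merkle-style hash: covers the node name, its gate type,
    and the sorted hashes of its children (leaves have no children)."""
    if node in memo:
        return memo[node]
    gate, children = graph.get(node, ("leaf", []))
    parts = sorted(node_hash(graph, child, memo) for child in children)
    text = "|".join([node, gate] + parts)
    memo[node] = hashlib.sha256(text.encode()).hexdigest()[:8]
    return memo[node]

def all_hashes(graph, root):
    """Hash every node reachable from root."""
    memo = {}
    node_hash(graph, root, memo)
    return memo

def changed_nodes(old_hashes, new_hashes):
    """Nodes that are new or whose hash differs from the cached one."""
    return {n for n, h in new_hashes.items() if old_hashes.get(n) != h}

# Hypothetical, simplified encoding of the example graph (assumption).
old_graph = {
    "D": ("or", ["A", "B"]),
    "E": ("or", ["B", "C"]),
    "F": ("and", ["D", "E"]),
    "Z": ("or", ["X", "Y"]),
    "R": ("or", ["F", "Z"]),
}
# The update attaches a new node Q under Z.
new_graph = dict(old_graph, Z=("or", ["X", "Y", "Q"]))

old_hashes = all_hashes(old_graph, "R")
new_hashes = all_hashes(new_graph, "R")
dirty = changed_nodes(old_hashes, new_hashes)
```

Under these assumptions only Q, Z and R end up in `dirty`; the cached entries for D, E and F can be reused as they are.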
![Page 66: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/66.jpg)
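[Figure: updated fault graph with the border nodes marked; cached hash and risk groups per node]
D: H(D)=43cd, risk groups {A}, {B}
E: H(E)=a4vo, risk groups {B}, {C}
F: H(F)=x31g, risk groups {B}, {A, C}
X: H(X)=xbn7, risk groups {A}, {S, T}
Y: H(Y)=bbk9, risk groups {K}, {A, S}
Q: (no hash or risk groups shown)
Z: H(Z)=2xzb, risk groups {A}, {K}, {S, T}
… … … …
R: H(R)=45zc, risk groups {A}, {K}, {B}, {S, T}
Step 2: Q’s Risk Groups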
![Page 67: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/67.jpg)
Boolean formula = E1 ∧ E2 = (A1 ∨ A2) ∧ (A2 ∨ A3)
Our Insight
![Page 68: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/68.jpg)
Boolean formula = E1 ∧ E2 = (A1 ∨ A2) ∧ (A2 ∨ A3)
SAT solver → satisfying assignment: {A1=1, A2=0, A3=1}
Our Insight
![Page 69: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/69.jpg)
Boolean formula = E1 ∧ E2 = (A1 ∨ A2) ∧ (A2 ∨ A3)
SAT solver → {A1=0, A2=1, A3=0}
• Problem:
- A standard SAT solver outputs an arbitrary satisfying assignment
Our Insight
![Page 70: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/70.jpg)
Boolean formula = E1 ∧ E2 = (A1 ∨ A2) ∧ (A2 ∨ A3)
SAT solver → {A1=0, A2=1, A3=0}
• Problem:
- A standard SAT solver outputs an arbitrary satisfying assignment
- What we want is the top-k critical (minimal) risk groups
Our Insight
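To make the problem concrete, the example formula is small enough to enumerate by brute force. The sketch below is an illustration only, not the solver used by the authors; the helper names are made up for this example. It lists every satisfying assignment of (A1 ∨ A2) ∧ (A2 ∨ A3) and then keeps only the minimal ones, which correspond to the risk groups an auditor would care about.

```python
from itertools import product

NAMES = ("A1", "A2", "A3")

def formula(a1, a2, a3):
    """The example formula (A1 or A2) and (A2 or A3)."""
    return bool((a1 or a2) and (a2 or a3))

def satisfying_assignments():
    """All 0/1 assignments to (A1, A2, A3) that satisfy the formula."""
    return [bits for bits in product((0, 1), repeat=3) if formula(*bits)]

def risk_group(bits):
    """The set of variables set to 1 in an assignment."""
    return frozenset(name for name, bit in zip(NAMES, bits) if bit)

def minimal_groups(groups):
    """Drop every group that strictly contains another group."""
    return {g for g in groups if not any(h < g for h in groups)}

models = satisfying_assignments()
groups = {risk_group(m) for m in models}
minimal = minimal_groups(groups)
```

Both assignments shown on the slides, {A1=1, A2=0, A3=1} and {A1=0, A2=1, A3=0}, are among the satisfying ones, and they happen to be exactly the two minimal groups. A plain SAT solver could just as well have returned a non-minimal assignment such as {A1=1, A2=1, A3=1}.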
![Page 71: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/71.jpg)
• Using MinCostSAT solver
- Satisfying assignment with the least total weight
- Obtain the least C = ∑ ci ∙ wi
- Very fast with 100% accuracy
• We can use Weighted Partial MaxSAT:
- Solves 100 instances in less than 100 sec
- Each instance contains ~1000 clauses
- Industry-scale competition
- Pr
Identifying Risk Groups
![Page 72: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/72.jpg)
We set the weights of all the leaf nodes (i.e., wi) to 1
Identifying Risk Groups
![Page 73: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/73.jpg)
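Leaf weights (w1, w2, w3): 1 1 1
A1 A2 A3 | Weight
1  0  0  | -
0  1  0  | 1
0  0  1  | -
1  1  0  | 2
1  0  1  | 2
0  1  1  | 2
0  0  0  | -
1  1  1  | 3
(-: the assignment does not satisfy the formula, so no weight is listed)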
Identifying Risk Groups
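The weight table can be reproduced with a few lines of brute force. This is only a sketch of the objective C = ∑ ci ∙ wi on the three-variable example; a real MinCostSAT or Weighted Partial MaxSAT solver would not enumerate assignments, and the function names here are invented for the illustration.

```python
from itertools import product

UNIT_WEIGHTS = (1, 1, 1)  # every leaf weight wi is set to 1, as on the slide

def satisfies(bits):
    """The example formula (A1 or A2) and (A2 or A3)."""
    a1, a2, a3 = bits
    return bool((a1 or a2) and (a2 or a3))

def cost(bits, weights=UNIT_WEIGHTS):
    """C = sum of ci * wi, where ci is the 0/1 value of variable i."""
    return sum(c * w for c, w in zip(bits, weights))

def min_cost_assignment(weights=UNIT_WEIGHTS):
    """Satisfying assignment with the least total weight."""
    candidates = [b for b in product((0, 1), repeat=3) if satisfies(b)]
    return min(candidates, key=lambda b: cost(b, weights))

best = min_cost_assignment()
# Hypothetical non-unit weights, to show that the weights steer the result.
best_weighted = min_cost_assignment(weights=(1, 5, 1))
```

With unit weights the cheapest satisfying assignment is {A1=0, A2=1, A3=0}, i.e. the risk group {A2}. With the hypothetical weights (1, 5, 1) the minimum moves to {A1=1, A2=0, A3=1}.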
![Page 75: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/75.jpg)
• Find the top-k critical risk groups
- Use a ∧ to connect the current formula with the negation of the resulting assignment
(A1 ∨ A2) ∧ (A2 ∨ A3) ∧ ¬(¬A1 ∧ A2 ∧ ¬A3)
Identifying Risk Groups
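The blocking step can be sketched as a loop: find the cheapest satisfying assignment, record it, conjoin its negation, and repeat. The code below does this by brute force on the example formula with unit weights. It is an illustration, not the authors' implementation. One detail is an assumption added here: blocking a full assignment still leaves its supersets satisfiable, so the sketch skips any result that contains an already reported group.

```python
from itertools import product

NAMES = ("A1", "A2", "A3")

def base_formula(bits):
    """The example formula (A1 or A2) and (A2 or A3)."""
    a1, a2, a3 = bits
    return bool((a1 or a2) and (a2 or a3))

def top_k_risk_groups(k):
    """Return up to k minimal risk groups, cheapest first (unit weights)."""
    blocked = []  # assignments already returned; each acts as a blocking clause
    groups = []
    all_bits = list(product((0, 1), repeat=len(NAMES)))
    while len(groups) < k:
        # "formula AND NOT(previous assignments)" from the slide
        candidates = [b for b in all_bits
                      if base_formula(b) and b not in blocked]
        if not candidates:
            break  # formula plus blocking clauses is unsatisfiable
        best = min(candidates, key=sum)  # unit weights: cost = number of 1s
        blocked.append(best)
        group = frozenset(n for n, bit in zip(NAMES, best) if bit)
        # Assumption (not on the slide): drop supersets of reported groups.
        if not any(found <= group for found in groups):
            groups.append(group)
    return groups

top2 = top_k_risk_groups(2)
```

For the example formula this reports {A2} first and {A1, A3} second; asking for more than two groups returns the same two, since every other satisfying assignment contains one of them.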
![Page 77: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/77.jpg)
[Figure: updated fault graph; Q now has a hash and risk groups]
D: H(D)=43cd, risk groups {A}, {B}
E: H(E)=a4vo, risk groups {B}, {C}
F: H(F)=x31g, risk groups {B}, {A, C}
X: H(X)=xbn7, risk groups {A}, {S, T}
Y: H(Y)=bbk9, risk groups {K}, {A, S}
Q: H(Q)=x1r7, risk groups {S}, {K}
Z: H(Z)=2xzb, risk groups {A}, {K}, {S, T}
… … … …
R: H(R)=45zc, risk groups {A}, {K}, {B}, {S, T}
Step 2: Q’s Risk Groups
![Page 78: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/78.jpg)
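[Figure: updated fault graph before merging; cached hash and risk groups per node]
D: H(D)=43cd, risk groups {A}, {B}
E: H(E)=a4vo, risk groups {B}, {C}
F: H(F)=x31g, risk groups {B}, {A, C}
X: H(X)=xbn7, risk groups {A}, {S, T}
Y: H(Y)=bbk9, risk groups {K}, {A, S}
Q: H(Q)=x1r7, risk groups {S}, {K}
Z: H(Z)=2xzb, risk groups {A}, {K}, {S, T}
… … … …
R: H(R)=45zc, risk groups {A}, {K}, {B}, {S, T}
Step 3: Merging Changed Caches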
![Page 79: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/79.jpg)
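[Figure: updated fault graph; Z's cached risk groups have been merged with Q's]
D: H(D)=43cd, risk groups {A}, {B}
E: H(E)=a4vo, risk groups {B}, {C}
F: H(F)=x31g, risk groups {B}, {A, C}
X: H(X)=xbn7, risk groups {A}, {S, T}
Y: H(Y)=bbk9, risk groups {K}, {A, S}
Q: H(Q)=x1r7, risk groups {S}, {K}
Z: H(Z)=2xzb, risk groups {A}, {K}, {S}
… … … …
R: H(R)=45zc, risk groups {A}, {K}, {B}, {S, T}
Step 3: Merging Changed Caches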
![Page 81: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/81.jpg)
[Figure: updated fault graph; the merge has reached the root R]
D: H(D)=43cd, risk groups {A}, {B}
E: H(E)=a4vo, risk groups {B}, {C}
F: H(F)=x31g, risk groups {B}, {A, C}
X: H(X)=xbn7, risk groups {A}, {S, T}
Y: H(Y)=bbk9, risk groups {K}, {A, S}
Q: H(Q)=x1r7, risk groups {S}, {K}
Z: H(Z)=2xzb, risk groups {A}, {K}, {S}
… … … …
R: H(R)=45zc, risk groups {A}, {K}, {B}, {S}
Step 3: Merging Changed Caches
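The cached values in the example are consistent with a simple merge rule, sketched below: at an OR gate the parent's risk groups are the union of the children's groups, at an AND gate they are the pairwise unions, and in both cases any group that strictly contains another is dropped. The gate types (D, E, Z, R as OR; F as AND) and the direct R-over-F-and-Z layout are assumptions inferred from the cached risk groups on the slides, not something the slides state, and the function names are made up for this illustration.

```python
def minimize(groups):
    """Keep only groups that do not strictly contain another group."""
    groups = set(groups)
    return {g for g in groups if not any(h < g for h in groups)}

def merge_or(*children):
    """OR gate: the parent fails if any child fails."""
    return minimize(g for child in children for g in child)

def merge_and(left, right):
    """AND gate: the parent fails only if both children fail."""
    return minimize(a | b for a in left for b in right)

def rg(*groups):
    """Helper: build a set of risk groups from lists of component names."""
    return {frozenset(g) for g in groups}

# Cached risk groups taken from the slides.
D = rg(["A"], ["B"])
E = rg(["B"], ["C"])
X = rg(["A"], ["S", "T"])
Y = rg(["K"], ["A", "S"])
Q = rg(["S"], ["K"])

F = merge_and(D, E)          # assumed AND gate
Z_old = merge_or(X, Y)       # before the update
Z_new = merge_or(X, Y, Q)    # after Q is attached under Z
R_new = merge_or(F, Z_new)   # assumed: R sits directly above F and Z
```

Under these assumptions the sketch reproduces the slide values: F = {B}, {A, C}; Z goes from {A}, {K}, {S, T} to {A}, {K}, {S}; and R becomes {A}, {K}, {B}, {S}. Only Z and R are recomputed; D, E, F, X and Y are read from the cache.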
![Page 82: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/82.jpg)
[Figure: CloudCanary workflow. Components: Updated Service Snapshot; Reliability Goal; Dependency acquisition and Fault graph generator; Fault Graph; SnapAudit; DepBooster; Improvement Plans; Operator]
Risk groups:
1. {CoreRouter-1}
2. {Agg1, Agg2}
… ...
• Challenge 1: SnapAudit
• Challenge 2: DepBooster
CloudCanary’s Workflow
![Page 83: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/83.jpg)
[Figure: data center topology. Internet; Core Router1 (Core1), Core Router2 (Core2); Agg Switch1 (Agg1), Agg Switch2 (Agg2), Agg Switch3 (Agg3); Server1 (S1) 172.28.228.21, Server2 (S2) 172.28.228.22, Server3 (S3) 172.28.228.23, Server4 (S4) 172.28.228.24; addresses 10.0.0.1, 10.0.0.2, 10.0.0.3, 75.142.33.98, 75.142.33.99; HDFS and HBase shown twice]
Correlated Failure Risk Repairing
![Page 84: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/84.jpg)
Specification:
$Server -> 172.28.228.21, 172.28.228.22
goal(failProb(ft)<0.08 | ChNode | Agg3)
[Figure: data center topology. Internet; Core Router1 (Core1), Core Router2 (Core2); Agg Switch1 (Agg1), Agg Switch2 (Agg2), Agg Switch3 (Agg3); Server1 (S1) 172.28.228.21, Server2 (S2) 172.28.228.22, Server3 (S3) 172.28.228.23, Server4 (S4) 172.28.228.24; addresses 10.0.0.1, 10.0.0.2, 10.0.0.3, 75.142.33.98, 75.142.33.99; HDFS and HBase shown twice]
Correlated Failure Risk Repairing
![Page 86: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/86.jpg)
Specification:
$Server -> 172.28.228.21, 172.28.228.22
goal(failProb(ft)<0.08 | ChNode | Agg3)
[Figure: data center topology. Internet; Core Router1 (Core1), Core Router2 (Core2); Agg Switch1 (Agg1), Agg Switch2 (Agg2), Agg Switch3 (Agg3); Server1 (S1) 172.28.228.21, Server2 (S2) 172.28.228.22, Server3 (S3) 172.28.228.23, Server4 (S4) 172.28.228.24; addresses 10.0.0.1, 10.0.0.2, 10.0.0.3, 75.142.33.98, 75.142.33.99; HDFS and HBase shown twice]
DepBooster
Plan 1: Move replica from S1 -> S4
Plan 2: Move replica from S2 -> S4
Correlated Failure Risk Repairing
![Page 87: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/87.jpg)
Specification:
$Server -> 172.28.228.21, 172.28.228.22
goal(failProb(ft)<0.08 | ChNode | Agg3)
[Figure: data center topology. Internet; Core Router1 (Core1), Core Router2 (Core2); Agg Switch1 (Agg1), Agg Switch2 (Agg2), Agg Switch3 (Agg3); Server1 (S1) 172.28.228.21, Server2 (S2) 172.28.228.22, Server3 (S3) 172.28.228.23, Server4 (S4) 172.28.228.24; addresses 10.0.0.1, 10.0.0.2, 10.0.0.3, 75.142.33.98, 75.142.33.99; HDFS and HBase shown twice]
DepBooster: Synthesis
Plan 1: Move replica from S1 -> S4
Plan 2: Move replica from S2 -> S4
Correlated Failure Risk Repairing
![Page 90: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/90.jpg)
Evaluation
• Comparing CloudCanary with the state of the art
• Evaluating CloudCanary’s practicality on a real dataset
![Page 91: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/91.jpg)
Evaluation
Accuracy Efficiency Improvement
INDaaS [OSDI’14]
ProbINDaaS [OSDI’14]
reCloud [CoNEXT’16]
RepAudit [OOPSLA’17]
CloudCanary
✔
✔
✘✘✔
✘✔✔✔✔
✘✘✘✘✔
![Page 92: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/92.jpg)
Efficiency Comparison
[Figure: auditing time (hours) across update snapshots S0-S6 for CloudCanary, INDaaS, reCloud (10^7), ProbINDaaS (10^7), and RepAudit; annotations in the plot: ~8 hours, > 16 hours]
![Page 93: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/93.jpg)
Accuracy vs. Efficiency
[Figure: accuracy (axis ticks 20 to 100) against turnaround time in minutes (axis ticks 1 to 4096) for INDaaS, ProbINDaaS (10^7 rounds), reCloud (10^7 rounds), RepAudit, and CloudCanary]
• 20,608 switches; 524,288 servers; 638,592 software components
• Auditing a random update affecting 20% of components
![Page 94: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/94.jpg)
Accuracy vs. Efficiency
[Figure: accuracy (axis ticks 20 to 100) against turnaround time in minutes (axis ticks 1 to 4096) for INDaaS, ProbINDaaS (10^7 rounds), reCloud (10^7 rounds), RepAudit, and CloudCanary]
• 20,608 switches; 524,288 servers; 638,592 software components
• Auditing a random update affecting 20% of components
Our approach is 200x faster than the state of the art and offers 100% accurate results.
![Page 95: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/95.jpg)
Evaluation
• We evaluated CloudCanary on a real update trace:
Category | Detected Num | Confirmed | Examples
Microservices | 50+ | 96% | Authentication and access control systems introduce most risk groups
Power Sources | 10+ | 100% | Primary and backup power sources are carelessly assigned to multiple racks hosting a critical service
Network | 30+ | 100% | Aggregation and ToR switches are easily updated to be risk groups
![Page 96: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/96.jpg)
Conclusion
• CloudCanary is the first system for real-time auditing
- SnapAudit primitive: quickly auditing update snapshots
- DepBooster: quickly generating improvement plans
• We evaluated CloudCanary with a real trace and large-scale emulations
![Page 97: Check before You Change - USENIX · Server1 (S1) 172.28.228.21 INDaaS Weighted MaxSAT solver + + Replication Replica1 Replica2 A1 A2 A3 Auditing Engine Auditing](https://reader034.vdocuments.net/reader034/viewer/2022050206/5f58edfe112d542f8b3381ab/html5/thumbnails/97.jpg)
Thanks, questions?