re-architecting congestion management in losslessethernet · re-architecting congestion management...

38
Re-architecting Congestion Management in Lossless Ethernet Wenxue Cheng, Kun Qian, Wanchun Jiang(CSU), Tong Zhang, Fengyuan Ren NNS group @ Department of Computer Science and Technology, Tsinghua University

Upload: others

Post on 04-Jun-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Re-architecting Congestion Management in

Lossless EthernetWenxue Cheng, Kun Qian, Wanchun Jiang(CSU), Tong Zhang, Fengyuan Ren

NNS group @ Department of Computer Science and Technology, Tsinghua University

Page 2: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Data Center Networks

Large Scales> 10000 machines

Short Messages< 1 KB

Special ProtocolsRDMA

Small RTT< 100𝜇𝑠

High Bandwidth10/40~100/400 Gbps

Shallow Buffer< 30 MB for ToR

Packets Loss ↑

Performance ↓

Page 3: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Data Center Networks

Small RTT< 100𝜇𝑠

High Bandwidth10/40~100/400 Gbps

Shallow Buffer< 30 MB for ToR

Large Scales> 10000 machines

Short Messages< 1 KB

Special ProtocolsRDMA

Lossless Ethernet

Page 4: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Priority-based Flow Control (PFC)

XOFF XON

Upstream Port(Sender)

Downstream Port(Receiver)

Page 5: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

PAUSE

Priority-based Flow Control (PFC)

PAUSE

XOFF XON

Upstream Port(Sender)

Downstream Port(Receiver)

Page 6: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Priority-based Flow Control (PFC)

PAUSE

XOFF XON

Upstream Port(Sender)

Downstream Port(Receiver)

Congestion Spreading

Page 7: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

RESUMERESUME

Priority-based Flow Control (PFC)

XOFF XON

Upstream Port(Sender)

Downstream Port(Receiver)

Page 8: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

PFC IssuesCongestion Spreading & Head-of-Line Blocking

F0F1

Burst

H0

H1

H2

R0

R1

S0 S1

… H15

P2P1

P0

Congestion tree from P2 to H0 and H1.

F0 is a victim flow.

Page 9: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Congestion Control Schemes

F0F1

Burst

H0

H1

H2

R0

R1

S0 S1

… H15

P2P1

P0

Congestion control schemes are needede.g. QCN[IEEE 802.1], DCQCN[RoCEv2] and TIMELY[SIGCOMM 2015].

Congestion NotificationRate Adjustment

Time

Rate

Page 10: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Experimental Observation

Time (ms)0 3

H0H1H2…

H15

16 Messages of 64KB

Page 11: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Experimental Observation

(1) Congestion spreading still exists.

Sending Rate of F1

Evolution-based rate decrease is slower than PFC’s effect.

Page 12: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Experimental Observation

Sending Rate of F0 (Gbps)

(1) Congestion spreading still exists.(2) F0 is also victimized by CC.

PFC infects congestion detection of congestion control schemes.

Page 13: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Experimental Observation

(1) Congestion spreading still exists.(2) F0 is also victimized by CC.(3) Rate recovery is inadaptable to

dynamic network conditions.

Liner rate increase method and tuning parameters.

Page 14: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Basic Idea

Re-architecting Congestion Management

• Congestion Flows ⟷Victim Flows

Congestion Detection

• Fast Rate Decrease• Automatic Rate Increase

Rate Adjustment

Page 15: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Congestion Detection

Real-Congestion (P2)

∑𝑅 > 𝐶

Non-Congestion (P3)

∑𝑅 < 𝐶

Quasi-Congestion (P0)

RESUME RESUMEPAUSE

∑𝑅 ?𝐶

Page 16: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Congestion Detection

Quasi-Congestion (P0)1 1 1 1

RESUME RESUMEPAUSE

Real-Congestion (P2)1 1 1 11 1

Explicit Congestion Notification (ECN)• Only based on queue length• Fail to distinguish quasi-congestion and

real-congestion

Page 17: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Congestion Detection

Non-Paused ECN (NP-ECN)• Don’t change ECN for packets that has

been paused• Counter PN: number of packets that

has been paused

Page 18: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Congestion Detection

Quasi-Congestion (P0)

Real-Congestion (P2)Non-Paused ECN (NP-ECN)• Don’t change ECN for packets that has

been paused• Counter PN: number of packets that

has been paused

PAUSERESUMEPN=3

0 0 0PN=2PN=1PN=0

Page 19: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Congestion Detection

Quasi-Congestion (P0)

Real-Congestion (P2)Non-Paused ECN (NP-ECN)• Don’t change ECN for packets that has

been paused• Counter PN: number of packets that

has been paused

PAUSERESUMEPN=3PN=2PN=1PN=0

1 0 0 0

Page 20: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Congestion Detection

Quasi-Congestion (P0)

Real-Congestion (P2)Non-Paused ECN (NP-ECN)• Don’t change ECN for packets that has

been paused• Counter PN: number of packets that

has been paused

PAUSERESUMEPN=3

0 0PN=2PN=1

Page 21: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Congestion Detection

Quasi-Congestion (P0)0 0 1 0

RESUME RESUMEPAUSE

Real-Congestion (P2)1 1 1 11 1

Partially marked with ECN

Non-Paused ECN (NP-ECN)• Don’t change ECN for packets that has

been paused• Counter PN: number of packets that

has been paused

Page 22: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Congestion Detection

Quasi-Congestion (P0)0 0 1 0

RESUME RESUMEPAUSE

Real-Congestion (P2)111111

Non-Paused ECN (NP-ECN)• Don’t change ECN for packets that has

been paused• Counter PN: number of packets that

has been paused PN=0

Page 23: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Congestion Detection

Quasi-Congestion (P0)0 0 1 0

RESUME RESUMEPAUSE

Real-Congestion (P2)1 1 1 11 1

Continuously marked with ECN

Non-Paused ECN (NP-ECN)• Don’t change ECN for packets that has

been paused• Counter PN: number of packets that

has been paused

Page 24: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Congestion Detection

Quasi-Congestion (P0)0 0 1 0

RESUME RESUMEPAUSE

Real-Congestion (P2)1 1 1 11 1

Continuously marked with ECN

Partially marked with ECNNon-Paused ECN (NP-ECN)• Don’t change ECN for packets that has

been paused• Counter PN: number of packets that

has been paused

Victim Flows

Congested Flows

Page 25: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Rate Adjustment

How to adjust the rates of • Congested Flows --> target?• Victim Flows --> no decrease?• Non-congested Flows

Burst = 40𝐺𝑏𝑝𝑠, F0 = 20𝐺𝑏𝑝𝑠,Reduce F1’s rate

H4

Page 26: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

F0 = 20𝐺𝑏𝑝𝑠, Reduce F1’s rate

Rate Adjustment

How to adjust the rates of • Congested Flows è reduce to receiving rate immediately • Victim Flows & Uncongested Flows à rate increase

Page 27: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Rate Adjustment

How to adjust the rates of • Congested Flows è reduce to receiving rate immediately • Victim Flows & Uncongested Flows à rate increase

Receiver-Driven Rate Decrease• 𝑠𝑒𝑛𝑑𝑅𝑎𝑡𝑒 ← min 𝑠𝑒𝑛𝑑𝑅𝑎𝑡𝑒, 1 − 𝑤IJK 𝑟𝑒𝑐𝑅𝑎𝑡𝑒• No PFC & no serious throughput loss & 1 control loop

Page 28: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Rate Adjustment

How to adjust the rates of • Congested Flows è reduce to receiving rate immediately • Victim Flows & Non-congested Flows à rate increase

Receiver-Driven Rate Decrease• 𝑠𝑒𝑛𝑑𝑅𝑎𝑡𝑒 ← min 𝑠𝑒𝑛𝑑𝑅𝑎𝑡𝑒, 1 − 𝑤IJK 𝑟𝑒𝑐𝑅𝑎𝑡𝑒• No congestion & No PFC triggers in one control loop

Self-weighted Rate increase

• N𝑠𝑒𝑛𝑑𝑅𝑎𝑡𝑒 ← 𝑠𝑒𝑛𝑑𝑅𝑎𝑡𝑒 1 − 𝑤 +𝑀𝑎𝑥𝑅𝑎𝑡𝑒 R 𝑤𝑤 ← 𝑤 1 − 𝑤 +𝑤IST R 𝑤

• Automatic gentle-to-aggressive

Page 29: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Photonic Congestion Notification (PCN)

• Receiver-driven Rate Decrease

• Self-weighted Rate increase

• NP-ECN• Identify

Congested Flows • Rate Estimator

SenderSwitches

Receiver

Congestion Notification Packet (CNP)

Period T

Page 30: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

PCN’s Benefit

Time (ms)0 3

H0H1H2…

H15

16 Messages of 64KB

No congestion tree

No serious throughput loss

Page 31: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Benefit

Time (ms)0 3

H0H1H2…

H15

16 Messages of 64KB

Sending Rate of F0 (Gbps)

Sending Rate of F1 (Gbps)

F1 is reduced in one loop

F0 is not victimized by PCN

Page 32: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Evaluation SetupTestbed Setup

• Dumbbell topology • Implementation on DPDK (Intel 82599)• 4 hosts (PowerEdge R530) connected to single ToR• 10Gbps

NS-3 Simulation Setup• Clos topology• 512 hosts / 32 ToRs / 16 Leafs / 8 Spines• 10Gbps / 40Gbps

Page 33: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Evaluations

Basic Prosperities

• Convergence• Fairness• Stability

Workbench

• Burst Tolerance• Parameter

sensitivity• Realistic

Workloads

Special Cases

• Flow Scalability• Adversarial Traffic• Multiple

Bottlenecks• Multiple Priorities• DeadlockTestbed

NS-3 Simulations

Page 34: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Evaluation: Large-Scale SimulationsSimulation Setup

W1: Web-server workloadW2: Hadoop cluster workload

Pod 0 Pod 7

512 hosts

Page 35: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Evaluation: Large-Scale Simulations

Web-server Workload

PAUSE Rate (Mbps) Flow Complete Time (ms) Flow Complete Rate (Kps)

Page 36: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Evaluation: Large-Scale Simulations

Hadoop Workload

PAUSE Rate (Mbps) Flow Complete Time (ms) Flow Complete Rate (Kps)

Page 37: Re-architecting Congestion Management in LosslessEthernet · Re-architecting Congestion Management in LosslessEthernet WenxueCheng, KunQian, WanchunJiang(CSU), Tong Zhang, ... RESUME

Conclusion

Evaluations on testbed and ns-3 simulation show, PCN triggers fewer PFC and achieves lower flow completion time.

Re-architecting congestion management

Proposing Photonic Congestion Notification (PCN)• NP-ECN à victim flows/congested flows• Receiver-driven rate decrease à no PFC in 1 loop• Automatic rate increase