![Page 1: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/1.jpg)
Cross-Layer Scheduling in Cloud Systems
Hilfi Alkaff, Indranil Gupta, Luke Leslie
Department of Computer Science
University of Illinois at Urbana-Champaign
Distributed Protocols Research Group: http://dprg.cs.uiuc.edu
![Page 2: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/2.jpg)
Inside a Datacenter: Networks Connecting Servers
Tree
Fat Tree [Leiserson 85]
Jellyfish [Singla 12]
Clos [Dally 04]
VL2 [Greenberg 09]
![Page 3: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/3.jpg)
Inside a Datacenter: Networks Connecting Servers
Tree
Fat Tree [Leiserson 85]
Jellyfish [Singla 12]
Clos [Dally 04]
VL2 [Greenberg 09]
Structured Networks
Unstructured Networks and/or routing
![Page 4: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/4.jpg)
SDN
• Software Defined Networking
• For any end-host pair, multiple routes available
• SDN Controller helps to choose one of these routes
  – Configures switches accordingly
• Which route is the “best”?
![Page 5: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/5.jpg)
SDNs and Applications
• Which route is the “best”?
• Our approach
  – The best network routes should be decided based on the application that is using the network
    • To minimize interference (and thus congestion) and to optimize bandwidth use
    • Today: SDN routes are selected in an application-agnostic way
  – But the application itself can help, by placing tasks at servers
    • Today: Applications schedule tasks in a network-agnostic way, leading to poor bandwidth utilization
  – The SDN Controller and the Application Scheduler should coordinate with each other
    • This is our cross-layer scheduling approach
![Page 6: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/6.jpg)
Applications: Short Real-Time Analytics Jobs
Batch Processing: MapReduce, Hadoop
Stream Processing: Storm
![Page 7: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/7.jpg)
Tasks
[Figure: tasks in a Storm job and in a Hadoop job]
![Page 8: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/8.jpg)
Tasks and Flows
[Figure: tasks and flows in a Storm job and in a Hadoop job]
![Page 9: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/9.jpg)
Challenges
• Two large state spaces to explore
  1. Set of possible routes for each end-to-end flow
     – Large numbers of flows and possible routes
  2. Set of possible task-to-server placements
     – Large numbers of servers and tasks
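To get a feel for how quickly the two state spaces grow, here is a rough counting sketch. It assumes (these assumptions are not stated on the slide) that each flow picks one of k candidate routes independently, and that any task may be placed on any server. The function names are illustrative only.

```python
def route_state_count(k: int, num_flows: int) -> int:
    """Number of route assignments if each flow independently picks 1 of k routes."""
    return k ** num_flows


def placement_state_count(num_servers: int, num_tasks: int) -> int:
    """Number of placements if each task may go on any server (sharing allowed)."""
    return num_servers ** num_tasks


# Even a small job on a 30-host cluster yields far too many states to enumerate.
print(route_state_count(k=10, num_flows=9))
print(placement_state_count(num_servers=30, num_tasks=10))
```

Both counts are exponential in the job size, which is why exhaustive search is impractical and a heuristic search is attractive.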
![Page 10: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/10.jpg)
Our Strategy
• To explore the state space, use simulated annealing
  – At the application-level scheduler
  – And separately at the routing (SDN) level
• Simulated Annealing
  – Probabilistic approach
  – Avoids getting stuck in local optima, with some non-zero probability of jumping away
  – Probability of jumping away decreases quickly over time (like the annealing process for steel)
![Page 11: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/11.jpg)
Pre-computation
• For all pairs of servers, pre-compute the k shortest paths
  – Store them in a hash table, indexed by server pair
  – Compact storage by merging overlapping routes (for a server pair) into a tree
• Small in size and quick to compute
  – 1000 servers, k=10
  – 50 M entries
  – After compaction, 6 MB
  – 3 minutes to generate
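The pre-computation step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it finds the k fewest-hop loop-free paths with a simple best-first search (adequate for small graphs only), stores them in a dictionary keyed by server pair, and omits the tree compaction. The adjacency-list format and all function names are assumptions.

```python
import heapq
from itertools import combinations


def k_shortest_paths(adj, src, dst, k):
    """Return up to k loop-free paths from src to dst, fewest hops first."""
    heap = [(0, [src])]          # (hop count, partial path)
    found = []
    while heap and len(found) < k:
        hops, path = heapq.heappop(heap)
        node = path[-1]
        if node == dst:
            found.append(path)
            continue
        for nxt in adj[node]:
            if nxt not in path:  # keep paths loop-free
                heapq.heappush(heap, (hops + 1, path + [nxt]))
    return found


def precompute_routes(adj, servers, k):
    """Hash table of candidate routes, indexed by (server, server) pair."""
    return {
        (a, b): k_shortest_paths(adj, a, b, k)
        for a, b in combinations(sorted(servers), 2)
    }


# Toy network: two servers (s1, s2) connected through two linked switches (a, b).
adj = {
    "s1": ["a", "b"],
    "s2": ["a", "b"],
    "a": ["s1", "s2", "b"],
    "b": ["s1", "s2", "a"],
}
table = precompute_routes(adj, ["s1", "s2"], k=3)
for path in table[("s1", "s2")]:
    print(" -> ".join(path))
```

A production version would likely use a dedicated k-shortest-paths algorithm (for example Yen's) and share common path prefixes, as the compaction bullet suggests.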
![Page 12: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/12.jpg)
When a Job Arrives
• Don’t change the allocations or routes of existing jobs
  – Non-intrusive
  – Reduces state space to explore
• Simulated Annealing is run offline, and the resultant schedule is used to schedule the new job’s tasks and flows
• Primary Simulated Annealing (SA) runs at the Application level
  – Calls Routing-level SA
![Page 13: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/13.jpg)
Simulated Annealing Steps
• Start from an arbitrary state
  – Tasks to servers, and routes to flows
• Generate next-state S’ (At Application Level)
  1. De-allocate one task
     • Prefer tasks that affect computation more, e.g., closer to the beginning or end of the topology
  2. Allocate this task to a random server
  3. Call Routing-level SA
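The application-level move (steps 1 and 2 above) might look like the sketch below. The weighting function `edge_weight` is a hypothetical way to express "prefer tasks closer to the beginning or end of the topology"; the slides do not give the actual preference rule. Step 3, the call into routing-level SA, is left as a comment.

```python
import random


def edge_weight(depth, max_depth):
    """Hypothetical preference: larger for tasks near the start or end of the topology."""
    return 1.0 + abs(depth - max_depth / 2)


def next_state(placement, task_weight, servers, rng=random):
    """Return (new placement, moved task); the input placement is not modified."""
    tasks = list(placement)
    weights = [task_weight[t] for t in tasks]
    # Step 1: de-allocate one task, biased by its weight.
    task = rng.choices(tasks, weights=weights, k=1)[0]
    # Step 2: allocate it to a server chosen uniformly at random.
    new_placement = dict(placement)
    new_placement[task] = rng.choice(servers)
    # Step 3 (not shown): call routing-level SA for the flows of the moved task.
    return new_placement, task


# Three-stage chain: spout (depth 0) -> bolt (depth 1) -> sink (depth 2).
placement = {"spout": "h1", "bolt": "h2", "sink": "h3"}
weights = {"spout": edge_weight(0, 2), "bolt": edge_weight(1, 2), "sink": edge_weight(2, 2)}
new_placement, moved = next_state(placement, weights, ["h1", "h2", "h3", "h4"], random.Random(0))
print(moved, "->", new_placement[moved])
```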
![Page 14: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/14.jpg)
Simulated Annealing Steps (2)
…
3. Call Routing-level SA
4. (At Routing Level)
5. De-path one route
   • Select a random server pair
   • Remove its worst path
     – Prefer higher number of hops, and break ties by lower bandwidth
6. Allocate Path: Change this route to a better path
   – Prefer lower number of hops, and break ties by higher bandwidth
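One possible reading of the routing-level move is sketched below. It assumes each server pair has a list of paths currently in use, drawn from its pre-computed candidate list, and that every path carries a hop count and an available-bandwidth estimate. The `Path` record and the function names are hypothetical.

```python
import random
from typing import NamedTuple


class Path(NamedTuple):
    name: str
    hops: int
    bandwidth: float  # assumed: available bandwidth estimate, in Mbps


def worst_path(paths):
    """Most hops; ties broken by lower bandwidth."""
    return max(paths, key=lambda p: (p.hops, -p.bandwidth))


def best_path(paths):
    """Fewest hops; ties broken by higher bandwidth."""
    return min(paths, key=lambda p: (p.hops, -p.bandwidth))


def routing_move(in_use, candidates, rng=random):
    """De-path the worst route of a random server pair, then allocate a better one."""
    pair = rng.choice(sorted(in_use))
    current = list(in_use[pair])
    current.remove(worst_path(current))                   # de-path
    unused = [p for p in candidates[pair] if p not in current]
    current.append(best_path(unused))                     # allocate path
    new_in_use = dict(in_use)
    new_in_use[pair] = current
    return new_in_use


a, b, c = Path("a", 2, 100.0), Path("b", 2, 50.0), Path("c", 3, 100.0)
pair = ("s1", "s2")
moved = routing_move({pair: [c]}, {pair: [a, b, c]}, random.Random(0))
print([p.name for p in moved[pair]])
```

If the removed path is already the best candidate, this sketch simply re-selects it, so the move leaves the state unchanged.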
![Page 15: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/15.jpg)
Simulated Annealing Steps (3)
• After generating next-state S’
  – Calculate utility(S’)
  – Utility function considers all jobs in the cluster (not just the new job)
  – Utility function accounts for bottlenecked paths from source tasks to sink tasks
• If utility(S’) > utility(current state)
  – Transition from current state to S’
• If utility(S’) ≤ utility(current state)
  – Transition with probability e^((utility(S’) − utility(current state))/t)
  – Non-zero probability of transitioning even if S’ is a worse state
  – Probability decreases over time (as t is lowered)
• Wait until convergence
• Re-run the entire simulated annealing 5 times, and take the best result
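The acceptance rule and the repeated runs can be sketched as below. This is an illustration under assumptions: a geometric cooling schedule and a fixed step budget stand in for "wait until convergence", and the `neighbor` and `utility` callables are placeholders for the next-state generation and the cluster-wide utility function, neither of which is specified here.

```python
import math
import random


def accept(u_new, u_cur, t, rng=random):
    """Always accept a better state; accept others with probability e^((u_new - u_cur)/t)."""
    if u_new > u_cur:
        return True
    return rng.random() < math.exp((u_new - u_cur) / t)


def anneal(initial, neighbor, utility, t0=1.0, cooling=0.9, steps=200, rng=random):
    """One SA run; returns the best state seen. Cooling schedule and step count are assumptions."""
    current = best = initial
    t = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        if accept(utility(candidate), utility(current), t, rng):
            current = candidate
        if utility(current) > utility(best):
            best = current
        t *= cooling  # lower t so that worse states are accepted less and less often
    return best


def best_of_runs(initial, neighbor, utility, runs=5, seed=0):
    """Re-run SA several times with different seeds and keep the best result."""
    results = [
        anneal(initial, neighbor, utility, rng=random.Random(seed + i))
        for i in range(runs)
    ]
    return max(results, key=utility)


# Toy problem: integer states, utility peaks at x = 3.
def toy_utility(x):
    return -(x - 3) ** 2


def toy_neighbor(x, rng):
    return x + rng.choice([-1, 1])


print(best_of_runs(10, toy_neighbor, toy_utility))
```

Tracking the best state seen guarantees that a run never returns something worse than its starting point, which is convenient when the result is used directly as a schedule.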
![Page 16: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/16.jpg)
Experiments
• Implemented in Apache Hadoop (YARN)
• Implemented in Apache Storm
• Deployment experiments on Emulab: up to 30 hosts
  – Emulated network using ZeroMQ and Thrift
  – Emulated Fat-Tree and Jellyfish
• Larger-scale simulation experiments
  – Up to 1000 hosts
![Page 17: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/17.jpg)
Experimental Settings
• 10 hosts, 100 Mbps, 5 links per router, #links selected via scaling rules
  – 3 GHz, 2 GB RAM
• Hadoop cluster workload
  – Facebook’s SWIM benchmark
  – Shuffle ranges from 100 B to 10 GB
  – 1 job per second
• Storm cluster workload: Random tree topologies
  – Topologies constructed randomly, with the number of children selected by a Gaussian (mean = sd = 2)
  – 100 B tuples
  – Each source generates 1 MB – 100 MB of data
  – 10 jobs per minute
• Each experimental run is 10 minutes
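A random Storm-style tree topology of the kind described above could be generated roughly as follows. The rounding of the Gaussian sample, the clamping of negative values to zero, and the `max_nodes` cap are assumptions made so that the sketch terminates; the slides do not say how these cases were handled.

```python
import random


def random_tree(max_nodes=20, mean=2.0, sd=2.0, rng=random):
    """Build a tree as {node: [children]}, expanding nodes breadth-first from root 0."""
    children = {0: []}
    frontier = [0]
    next_id = 1
    while frontier and next_id < max_nodes:
        node = frontier.pop(0)
        # Number of children drawn from a Gaussian, rounded and clamped to >= 0.
        n = max(0, round(rng.gauss(mean, sd)))
        n = min(n, max_nodes - next_id)  # assumed cap on total topology size
        for _ in range(n):
            children[node].append(next_id)
            children[next_id] = []
            frontier.append(next_id)
            next_id += 1
    return children


tree = random_tree(max_nodes=15, rng=random.Random(1))
for node, kids in tree.items():
    print(node, kids)
```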
![Page 19: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/19.jpg)
Storm on Jellyfish Topology
App+Routing SA: 34.1% improvement in throughput at 30 hosts
Application-only SA: 21.2%
Routing-only SA: 23.2%
Performance improves with scale
![Page 20: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/20.jpg)
Hadoop on Fat-Tree Topology
App+Routing SA: 26% improvement in throughput at 30 hosts
Application-only SA & Routing-only SA: smaller than combining both
Performance improves with scale
![Page 21: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/21.jpg)
Other Experimental Results
• Similar results for other combinations
• Hadoop on Jellyfish
  – App+Routing SA: 31.9% improvement in throughput at 30 hosts
  – Performance improves with scale
  – Application-only SA: 18.8%
  – Routing-only SA: 25.5%
• Storm on Fat-Tree
  – App+Routing SA: 30% improvement in throughput at 30 hosts
  – Performance improves with scale
  – Application-only SA: 21.1%
  – Routing-only SA: 22.7%
![Page 22: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/22.jpg)
Other Experimental Results (2)
• Scheduling time is small
  – Time to schedule a new job in a 1000-server cluster
  – Fat-Tree: 0.48 s (Hadoop) to 0.53 s (Storm)
  – Jellyfish: 0.67 s (Hadoop) to 0.74 s (Storm)
• No starvation
  – Worst-case degradation in completion time for any job is 20% in Hadoop, 30% in Storm
  – Outliers are large jobs (rare in real-time analytics with short jobs)
• Fault-recovery is fast
  – Upon failure, re-run simulated annealing once
  – Recovery occurs within 0.35 s to 0.4 s
![Page 23: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/23.jpg)
Takeaways
• Today: Application schedulers and the SDN scheduler are disjoint
  – Leads to suboptimal placement and routing
• Our approach: coordinated cross-layer scheduling
  – Explore small state spaces
  – Use simulated annealing
• At 30 hosts, gives between 26% and 34% improvement in Hadoop and Storm for both structured and unstructured networks
  – Other networks will fall between these two numbers
• Overheads are small, and improvement gets better with scale
![Page 24: Cross-Layer Scheduling in Cloud Systems Hilfi Alkaff, Indranil Gupta, Luke Leslie Department of Computer Science University of Illinois at Urbana-Champaign](https://reader037.vdocuments.net/reader037/viewer/2022103123/56649d355503460f94a0c1da/html5/thumbnails/24.jpg)
Ongoing/Future Work
Our work opens the door:
• Explore other heuristics, e.g., data affinity for tasks, congestion
• Explore other non-SA approaches
• Available bandwidth estimation
• OpenFlow integration
• Batching multiple jobs into scheduling