wait-free data structures on embedded multi-core systems
DESCRIPTION
Presentation for my master's thesis on wait-free data structures for embedded multi-core systems.
TRANSCRIPT
![Page 1: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/1.jpg)
• Master's thesis presentation
• Supervisor: Prof. Dr. Dieter Kranzlmüller
• Advisors: Dr. Karl Fürlinger (LMU)
Dr. Tobias Schüle (Siemens CT)
• Date of presentation: 5 November 2014
Evaluation of Task Scheduling
Algorithms and Wait-Free Data
Structures for Embedded Multi-Core
Systems
Tobias Fuchs
![Page 2: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/2.jpg)
Structure of this talk
1. Introduction
   1. Motivation
   2. Problem Statement and Objectives
2. Wait-free data structures
   1. Foundations
   2. Pools
   3. Queues
   4. Stacks
3. Task Scheduling
   1. Work stealing
   2. Prioritized work stealing in EMBB
4. Conclusion
Task Scheduling Algorithms and Wait-Free Data Structures for Embedded Multi-Core Systems
![Page 3: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/3.jpg)
Wait-freedom:
Motivation
![Page 4: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/4.jpg)
Motivation
Wait-free algorithms
• Strongest possible fault tolerance
• Guarantee progress and upper bound for execution time
Gains:
+ Progress is potentially a formal constraint in real-time
computing
+ Wait-freedom eliminates the classic concurrency problems:
Deadlocks, Priority Inversion, Convoying, Kill-Intolerance
![Page 5: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/5.jpg)
Problem statement
State of the art
No suitable wait-free data structures for embedded systems:
• Employing mechanisms such as garbage collection
• Not designed for restricted resources
• No evaluation for latency
Challenges:
- Transforming data structures to wait-free equivalents is
non-trivial, usually from-scratch redesign
- Implementations depend on platform architecture
![Page 6: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/6.jpg)
Objectives
1. Review and evaluation of state of the art approaches for
suitability on embedded systems
2. Real-time compliant implementations of wait-free data
structures
3. Definition, implementation and evaluation of suitable
benchmark scenarios for wait-free data structures and
task scheduling algorithms
+ Automated verification derived from semantic definition
![Page 7: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/7.jpg)
Foundations
![Page 8: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/8.jpg)
Progress conditions
Classification of progress
On the Nature of Progress (Herlihy, Shavit, 2011)
![Page 9: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/9.jpg)
Real-time requirements
Performance priorities on real-time systems
Guarantees on worst-case runtime behavior
Aim for latency / jitter-reduction, neglecting throughput
Avoid non-determinism, as in malloc / new (see: MISRA)
![Page 10: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/10.jpg)
Evaluation methodology
Real-time applications are designed to optimize latency
Related work does not evaluate latency, but only mean or
median throughput
Evaluation of worst-case latency is tough:
• In related work, measurements outside of 97.5% confidence
interval are considered outliers and ignored
• These outliers are our data
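To illustrate why mean or median figures can hide the values that matter here, the following is a minimal sketch that reports the worst observed latency next to mean and median without discarding any sample. The names `LatencySummary` and `Summarize` are hypothetical and not taken from the thesis or its benchmark framework.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Summary of a set of latency samples (unit is whatever the caller measured).
struct LatencySummary {
  double mean;
  double median;
  double worst;  // maximum observed latency, deliberately not filtered
};

// Computes mean, median and worst case without removing outliers.
// Precondition: samples is not empty.
inline LatencySummary Summarize(std::vector<double> samples) {
  std::sort(samples.begin(), samples.end());
  const std::size_t n = samples.size();
  const double sum = std::accumulate(samples.begin(), samples.end(), 0.0);
  const double median = (n % 2 == 1)
      ? samples[n / 2]
      : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
  return LatencySummary{sum / n, median, samples.back()};
}
```

For the samples 1, 1, 1, 1, 96 the median stays at 1 while the worst case is 96, which is exactly the value an outlier filter would have dropped.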
![Page 11: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/11.jpg)
Pools
![Page 12: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/12.jpg)
Wait-free data structures:
Pools
Pools
… realize dynamic memory allocation
… while eliminating heap fragmentation
• Fundamental data structure of any concurrent container
• Fixed number of objects in static or automatic memory
• Pools manage concurrent removal and reclamation of
objects
RemoveAny(pool, er): Remove and return element er
Add(pool, e): Add element e back to the pool
![Page 13: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/13.jpg)
Pools:
Related work
Related work
Close to none:
• Several lock-free pools, e.g. tree-based
• Wait-free pools: array-based, simple yet inefficient
Why are wait-free pools hard to design?
Common wait-free paradigms require dynamic memory
allocation …
![Page 14: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/14.jpg)
Array-based pools
Array-based wait-free pools
• Consists of array holding atomic reservation flags
• Threads traverse reservation array from the beginning
and try to reserve a flag atomically (CAS)
• Index of successfully toggled flag is acquired element index
• Worst-case complexity: O(n)
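The scheme described above can be sketched as follows. This is a minimal illustration under assumptions, not the thesis implementation: the class name `ArrayPool`, the use of indices as elements, and the return value -1 for an exhausted pool are all hypothetical.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical fixed-capacity pool that hands out element indices.
template <std::size_t N>
class ArrayPool {
 public:
  ArrayPool() {
    for (auto& f : reserved_) f.store(false, std::memory_order_relaxed);
  }

  // Traverses the reservation array from the beginning and tries to toggle
  // one flag with CAS. At most N CAS attempts are made, so the number of
  // steps is bounded: O(N). Returns the acquired index, or -1 if no free
  // flag was found during this single pass.
  int RemoveAny() {
    for (std::size_t i = 0; i < N; ++i) {
      bool expected = false;
      if (reserved_[i].compare_exchange_strong(expected, true)) {
        return static_cast<int>(i);
      }
    }
    return -1;
  }

  // Returns the element with index i to the pool using one atomic store.
  void Add(int i) { reserved_[static_cast<std::size_t>(i)].store(false); }

 private:
  std::atomic<bool> reserved_[N];
};
```

Because every thread starts at index 0, contention concentrates on the first flags; this is one way to read the slide's remark that the approach is simple yet inefficient.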
![Page 15: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/15.jpg)
Compartment pool
Wait-free pool with thread-specific compartments
• Array-based pool with additional range of elements that
can only be acquired by a specific thread
• Threads acquire elements from their private compartment
first
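A possible layout for such a pool is sketched below. The memory layout (private compartments first, shared range last), the template parameters and the name `CompartmentPool` are assumptions made for illustration; the slide does not specify them.

```cpp
#include <atomic>
#include <cstddef>

// Assumed layout: [compartment of thread 0 | thread 1 | ... | shared range]
template <std::size_t Threads, std::size_t PerThread, std::size_t Shared>
class CompartmentPool {
 public:
  static constexpr std::size_t kSize = Threads * PerThread + Shared;

  CompartmentPool() {
    for (auto& f : reserved_) f.store(false, std::memory_order_relaxed);
  }

  // Tries the caller's private compartment first, then the shared range.
  // thread_id must be in [0, Threads). Returns an index, or -1 if both
  // ranges are exhausted.
  int RemoveAny(std::size_t thread_id) {
    const std::size_t begin = thread_id * PerThread;
    const int i = TryRange(begin, begin + PerThread);
    if (i >= 0) return i;
    return TryRange(Threads * PerThread, kSize);
  }

  // Returns the element with index i to the pool.
  void Add(int i) { reserved_[static_cast<std::size_t>(i)].store(false); }

 private:
  // Bounded scan over [begin, end) with one CAS attempt per flag.
  int TryRange(std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
      bool expected = false;
      if (reserved_[i].compare_exchange_strong(expected, true)) {
        return static_cast<int>(i);
      }
    }
    return -1;
  }

  std::atomic<bool> reserved_[kSize];
};
```

In this sketch no other thread competes for flags in a private compartment, so the common case avoids contended CAS operations and only falls back to the shared range when the compartment is empty.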
![Page 16: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/16.jpg)
Wait-free data structures:
Pools - Evaluation
![Page 17: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/17.jpg)
Wait-free data structures:
Pools - Evaluation
![Page 18: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/18.jpg)
Wait-free data structures:
Pools - Evaluation
![Page 19: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/19.jpg)
Queues
![Page 20: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/20.jpg)
Queues:
Related work
Related work
Kogan and Petrank presented the first wait-free queue for multiple enqueuers and dequeuers
Wait-Free Queues With Multiple Enqueuers and Dequeuers (Kogan, Petrank, 2011)
- Implemented in Java
- Relying on garbage collection
- Requires monotonic counter (phase)
![Page 21: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/21.jpg)
Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Redesign helping scheme to remove phase counter
• In original publication, new phase value is greater than all
phases of any announced operation (including non-pending)
![Page 22: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/22.jpg)
Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Redesign helping scheme to remove phase counter
• Modification: Help all other non-pending operations first
• Possibly helping operations that are newer than the thread's own operation
![Page 23: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/23.jpg)
Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Redesign helping scheme to remove phase counter
• Fairness is maintained: all other threads are guaranteed
to help this thread’s operation before engaging in their own
![Page 24: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/24.jpg)
Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Memory reclamation
Hazard pointers scheme typically presented as a solution
Hazard pointers: Safe memory reclamation for lock-free objects (Michael, 2004)
![Page 25: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/25.jpg)
Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Introduce hazard pointers
Step 1: Find upper memory bound for hazard pointers
Step 2: Guard queue nodes using hazard pointers
![Page 26: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/26.jpg)
Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Introduce hazard pointers
Step 2: Guard queue nodes using hazard pointers
Culprit: Guarding is not wait-free
pointer p = node.Next;
// -- possible change of node.Next --
while (hp.GuardPointer(p) && p != node.Next) {
  // Release and retry, unbounded number of retries
  hp.ReleaseGuard(p);
  p = node.Next;  // re-read the pointer before guarding again
}
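The same retry pattern can be written as a compilable sketch with `std::atomic`. The types `Node` and `HazardSlot` and the function `GuardNext` are hypothetical stand-ins, not the EMBB hazard pointer API; the optional `retries` counter only exists to make the unbounded loop observable.

```cpp
#include <atomic>

struct Node {
  int value;
  std::atomic<Node*> next{nullptr};
};

// One hazard pointer slot of the calling thread (simplified, hypothetical).
struct HazardSlot {
  std::atomic<Node*> guarded{nullptr};
};

// Publishes a guard for the current successor of `node` and returns it.
// The loop ends only when the guarded pointer still equals node.next after
// the guard was published, so the number of iterations is not bounded if
// other threads keep changing node.next.
inline Node* GuardNext(Node& node, HazardSlot& hp, int* retries = nullptr) {
  Node* p = node.next.load();
  for (;;) {
    hp.guarded.store(p);               // announce intent to access p
    Node* current = node.next.load();  // re-validate after announcing
    if (current == p) return p;        // guard is effective
    p = current;                       // node.next changed: retry
    if (retries) ++*retries;
  }
}
```

Each failed validation means another thread made progress, which is why this pattern is lock-free but not wait-free.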
![Page 27: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/27.jpg)
Kogan-Petrank queue
Adapting the Kogan-Petrank wait-free queue
Introduce hazard pointers
Step 2: Guard queue nodes using hazard pointers
Culprit: Guarding is not wait-free
Fortunately, retry loops can be avoided in the Kogan-Petrank queue, but the implementation is not trivial.
See the implementation at
https://github.com/fuchsto/embb/tree/benchmark/
![Page 28: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/28.jpg)
Queues - Evaluation
Queue benchmark scenarios
In addition to scenarios for bag semantics
• Buffer latency
Elements enqueued with current timestamp, difference from
timestamp at dequeue is buffer latency
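The buffer latency measurement can be sketched as below. `std::queue` is only a single-threaded stand-in for the queue under test, and the names `Stamped`, `EnqueueStamped` and `DequeueLatencyNs` are hypothetical; the actual benchmark framework may be structured differently.

```cpp
#include <chrono>
#include <cstdint>
#include <queue>

using Clock = std::chrono::steady_clock;

// Queue element carrying its enqueue timestamp alongside the payload.
struct Stamped {
  int payload;
  Clock::time_point enqueued_at;
};

// Stand-in for the queue under test (std::queue is NOT concurrent).
using TestQueue = std::queue<Stamped>;

// Enqueues the payload together with the current timestamp.
inline void EnqueueStamped(TestQueue& q, int payload) {
  q.push(Stamped{payload, Clock::now()});
}

// Dequeues one element and returns its buffer latency in nanoseconds,
// i.e. the time between enqueue and dequeue.
// Precondition: q is not empty.
inline std::int64_t DequeueLatencyNs(TestQueue& q, int* payload_out) {
  const Stamped s = q.front();
  q.pop();
  if (payload_out) *payload_out = s.payload;
  return std::chrono::duration_cast<std::chrono::nanoseconds>(
             Clock::now() - s.enqueued_at).count();
}
```

A monotonic clock such as `std::chrono::steady_clock` is used so that clock adjustments cannot produce negative latencies.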
![Page 29: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/29.jpg)
Queues - Evaluation
![Page 30: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/30.jpg)
Queues - Evaluation
![Page 31: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/31.jpg)
Stacks
![Page 32: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/32.jpg)
Stacks:
Related work
Related work
Fatourou presented a wait-free “universal” construction that is applicable to stacks
A highly efficient universal construction (Fatourou, 2011)
![Page 33: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/33.jpg)
Elimination stack
Fatourou’s universal construction SIM
A highly efficient universal construction (Fatourou, 2011)
Principle
• Optimized helping scheme
• Threads apply operations to a local copy of the stack
• Every thread tries to replace the global shared object with
its local copy via CAS
• Only applicable for shared objects with small state
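The copy-then-CAS step alone can be sketched as follows. This deliberately omits the helping scheme, so the sketch is lock-free rather than wait-free and is not the SIM algorithm itself. The class `TinyStack` and its packing of up to seven byte-sized elements into one 64-bit word are assumptions chosen to show why the approach needs small state.

```cpp
#include <atomic>
#include <cstdint>

// Tiny stack packed into one 64-bit word so that the whole state can be
// replaced by a single CAS: bits 0..7 hold the size (0..7), byte k+1 holds
// element k.
class TinyStack {
 public:
  // Returns false if the stack is full.
  bool Push(std::uint8_t v) {
    std::uint64_t global = state_.load();
    for (;;) {
      std::uint64_t local = global;  // local copy of the shared state
      const std::uint64_t size = local & 0xFF;
      if (size == 7) return false;
      const unsigned shift = 8 * (static_cast<unsigned>(size) + 1);
      local &= ~(std::uint64_t{0xFF} << shift);  // clear stale byte
      local |= std::uint64_t{v} << shift;        // apply operation locally
      local = (local & ~std::uint64_t{0xFF}) | (size + 1);
      // Try to install the local copy as the new global state.
      if (state_.compare_exchange_weak(global, local)) return true;
      // On failure, `global` now holds the fresh state; redo on a new copy.
    }
  }

  // Returns false if the stack is empty.
  bool Pop(std::uint8_t* out) {
    std::uint64_t global = state_.load();
    for (;;) {
      std::uint64_t local = global;
      const std::uint64_t size = local & 0xFF;
      if (size == 0) return false;
      const unsigned shift = 8 * static_cast<unsigned>(size);
      const std::uint8_t top = static_cast<std::uint8_t>(local >> shift);
      local = (local & ~std::uint64_t{0xFF}) | (size - 1);
      if (state_.compare_exchange_weak(global, local)) {
        *out = top;
        return true;
      }
    }
  }

 private:
  std::atomic<std::uint64_t> state_{0};
};
```

Copying the whole object for every operation is cheap here because the state is a single word; for larger objects the copy cost grows with the state size, which matches the restriction named in the last bullet.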
![Page 34: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/34.jpg)
Elimination stack
Fatourou’s universal construction SIM
A highly efficient universal construction (Fatourou, 2011)
Elimination
• Push and Pop have reverse semantics:
Push(Pop(stack)) = Pop(Push(stack)) = stack
• Eliminated operations are completed immediately
if they do not alter the object’s state
Significantly improves performance if applicable
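The idea of elimination can be illustrated on a batch of announced operations. The sketch below is sequential and hypothetical (`Op`, `PopResult` and `Eliminate` are illustrative names); it shows only the pairing rule, not how the published algorithm collects or applies operations.

```cpp
#include <cstddef>
#include <vector>

struct Op {
  bool is_push;
  int value;  // pushed value (ignored for pops)
};

struct PopResult {
  std::size_t op_index;  // position of the pop in the batch
  int value;             // value it received from the eliminated push
};

// Scans a batch of operations in order and cancels each pop against the
// most recent push of the same batch that has not been cancelled yet.
// `residual` receives the operations that still have to be applied to the
// stack; `completed` receives pops answered directly by an eliminated push.
inline void Eliminate(const std::vector<Op>& batch,
                      std::vector<Op>* residual,
                      std::vector<PopResult>* completed) {
  for (std::size_t i = 0; i < batch.size(); ++i) {
    const Op& op = batch[i];
    if (!op.is_push && !residual->empty() && residual->back().is_push) {
      completed->push_back(PopResult{i, residual->back().value});
      residual->pop_back();  // push and pop cancel: stack state unchanged
    } else {
      residual->push_back(op);
    }
  }
}
```

For the batch push(1), push(2), pop, pop, pop, the first two pops complete immediately with 2 and 1, and only a single pop remains to be applied to the shared stack.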
![Page 35: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/35.jpg)
Elimination stack
Fatourou’s universal construction SIM
A highly efficient universal construction (Fatourou, 2013)
Original version is not suitable for real-time applications:
- ABA problem is prevented using tagged pointers
- Thread-local pools with unbounded capacity
- No deallocation in published algorithm
![Page 36: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/36.jpg)
Elimination stack
Fatourou’s universal construction SIM
A highly efficient universal construction (Fatourou, 2013)
Modified version of Fatourou’s stack
- Uses hazard pointers for safe reclamation
- Uses compartment pool with limited capacity
- Employs the elimination scheme from the original
publication
![Page 37: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/37.jpg)
Stacks:
Evaluation
![Page 38: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/38.jpg)
Stacks:
Evaluation
![Page 39: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/39.jpg)
Task scheduling
![Page 40: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/40.jpg)
Task Scheduling:
Objectives
Task Scheduling
• Intra-process task scheduling with priority queues
• Low-overhead, fine-grained scheduling of thousands of
small tasks
Priorities:
Focus on low latency and jitter reduction (i.e. predictability),
thus regarding maximum throughput as a secondary
benchmark.
![Page 41: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/41.jpg)
Task scheduling:
Work stealing
Work stealing
• One worker thread per SMP core, no migration
• Tasks passed as &func
• Load-balancing on task queues
• Many flavors of concrete implementations
![Page 42: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/42.jpg)
Task scheduling:
Work stealing
Work stealing with task priorities
• Work stealing extended by one queue for every priority
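One way to read work stealing with per-priority queues is sketched below. The selection policy (for each priority level, check the worker's own queue first and then steal from others), the number of priorities and the names `Worker` and `NextTask` are assumptions; the slide does not state the exact policy used in EMBB. `std::deque` is used for brevity and is not thread-safe, so the sketch shows only the task selection order.

```cpp
#include <array>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

using Task = std::function<void()>;
constexpr std::size_t kPriorities = 3;  // assumed; 0 is the highest priority

// One set of task queues per worker, one queue per priority level.
struct Worker {
  std::array<std::deque<Task>, kPriorities> queues;
};

// Selects the next task for worker `self`. For each priority, starting with
// the highest, the worker's own queue is checked first; if it is empty, the
// worker tries to steal a task of that priority from another worker.
// Returns false if no task is available anywhere.
inline bool NextTask(std::vector<Worker>& workers, std::size_t self,
                     Task* out) {
  for (std::size_t prio = 0; prio < kPriorities; ++prio) {
    auto& own = workers[self].queues[prio];
    if (!own.empty()) {
      *out = std::move(own.back());  // owner takes the newest task
      own.pop_back();
      return true;
    }
    for (std::size_t v = 0; v < workers.size(); ++v) {
      if (v == self) continue;
      auto& victim = workers[v].queues[prio];
      if (!victim.empty()) {
        *out = std::move(victim.front());  // thief takes the oldest task
        victim.pop_front();
        return true;
      }
    }
  }
  return false;
}
```

With this policy a high-priority task queued at another worker is preferred over a low-priority task in the worker's own queue.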
![Page 43: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/43.jpg)
Conclusion
![Page 44: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/44.jpg)
Conclusion
Revisiting the objective
• Wait-free implementations of pools, queues and stacks now
available for real-time applications
• Benchmark framework and evaluation tools (R) are
published as open source
• Reproducible evaluation of real-time performance
• Verification tool chain on the way
![Page 45: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/45.jpg)
Conclusion
Recommendations
• Wait-free data structures can rival performance of lock-free
implementations
• But are hard to maintain
• Formal wait-freedom is practically not achievable
Employ wait-free data structures for fault-tolerance, not as a
guarantee for critical deadlines
![Page 46: Wait-free data structures on embedded multi-core systems](https://reader034.vdocuments.net/reader034/viewer/2022052304/559b738a1a28ab844f8b4585/html5/thumbnails/46.jpg)
Thank You
Source code (data structures, benchmarks, R scripts): https://github.com/fuchsto/embb/tree/benchmark/
Official development source base of embb: https://github.com/siemens/embb/tree/development/
Wiki for this thesis: http://wiki.coreglit.ch