
Capriccio: Scalable Threads for Internet Services

Rob von Behren, Jeremy Condit, Feng Zhou, George C. Necula and Eric Brewer

University of California, Berkeley

Symposium on Operating Systems Principles (SOSP), October 2003

Presented By: Vijayasarathy Kannan

Overview

• Background

• Thread Design

• Linked Stack Management

• Resource Aware Scheduling

• Evaluation

• Related Work

• Conclusions

• Thoughts

Internet Services

• Transitioned from being a data repository to providing a variety of network-accessible services.

• Focus has shifted from structured web sites providing static content to distributed on-line services.

• Scalability and performance demands are increasing rapidly.

• Need for a programming model to design servers that cater to these demands.

• Threaded and Event-based models have become popular choices.


Duality Argument 1

• Neither model is inherently preferable.

• A system based on one model has a distinct counterpart in the other.

• Provide mapping between concepts of the two models.


1 On the Duality of Operating System Structures – Hugh C. Lauer and Roger M. Needham.

Events are favored!

• Events (Better than threads for most purposes) 1

• Flexible control flow.

• Lower synchronization cost due to less shared state.

• Faster than threads on single CPU.

• Threads (Use only when true CPU concurrency is needed) 1

• Hard to program and debug (synchronization, deadlocks).

• Hard to achieve good performance and port threaded code.


1 Why Threads Are A Bad Idea – John Ousterhout

Why not threads?

• Natural extension of sequential programming style to exploit concurrency.

• Map tasks that need to be executed with associated flows of control.

• Encapsulate state and express control flow in a natural manner (“callback soup” problem with events).

• Automatically manage state – no need to save and restore state manually (“stack ripping” problem with events). 1

• Ease of exception handling due to stack lifetime. 1

• Tools and infrastructure available.


1 Why Events Are A Bad Idea (for high-concurrency servers) – Rob von Behren, Jeremy Condit and Eric Brewer.

So, which paradigm?

“These two categories are duals of each other.” – Lauer and Needham.

“Improve thread runtime system to eliminate historical reasons that favor events” – Knot 1, Capriccio’s library-based server.

Event-based servers – µServer 1

Hybrid approach – WatPipe 1, a hybrid pipelined server based on SEDA.

Event-based systems with thread-like code – Cooperative Task Management, Adya et al.


1 Comparing the Performance of Web Server Architectures – David Pariag, Tim Brecht, Ashif Harji, Peter Buhr, and Amol Shukla.

Capriccio: Objectives & Approach


• Scalability and Flexibility – User-Level Threads

• Memory Management – Linked Stacks

• Application-specific scheduling – Resource Aware Scheduler

User-Level Threads


• Concerns about Kernel Threads

• Kernels have evolved over the years.

• Semantics among modern kernels vary substantially.

• Decouple threads of the programming model from those of the underlying kernel.

• Logical threads encapsulate OS variation and kernel evolution.

Motivation for using UL Threads

Flexibility

• Scale number of threads without worrying about threading overhead.

• Makes scheduling of threads flexible (specific to applications).

Performance

• Lightweight – reduced synchronization overhead.


A case against UL Threads

• Complicate preemption.

• Translation of blocking I/O to non-blocking I/O introduces complexity.

• More user-mode to kernel-mode switches due to non-blocking I/O.

• Cannot take advantage of multiple processors.


Thread Package Implementation

• Context Switches
  • Uses Toernig’s coroutine library.
  • Cooperative scheduling.

• I/O
  • Intercepts blocking I/O calls and converts them to asynchronous I/O (sketch below).
  • Uses epoll and AIO mechanisms.

• Scheduling
  • Similar to an event-driven application.
  • Based on thread resource utilization.

• Synchronization
  • Through cooperative scheduling.
  • Simple locked/unlocked flag for synchronization primitives.
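To make the I/O interception concrete, here is a minimal sketch (not Capriccio's actual code) of wrapping a blocking read() for a cooperative user-level thread library. The wrapped_read() and wait_for_fd() names are invented for the example, and wait_for_fd() degenerates to a single blocking epoll_wait() so the file compiles and runs on its own; in the real runtime the calling thread would yield and be resumed once epoll reports the descriptor ready.

```c
/* Sketch only: how a user-level thread library might wrap blocking read(). */
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Stand-in for the runtime's scheduler hook.  Here it simply blocks in
 * epoll_wait(); a real runtime would park the thread and switch to another. */
static void wait_for_fd(int fd, uint32_t events)
{
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = events, .data.fd = fd };
    epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
    epoll_wait(ep, &ev, 1, -1);      /* a real scheduler would yield here */
    close(ep);
}

ssize_t wrapped_read(int fd, void *buf, size_t count)
{
    for (;;) {
        ssize_t n = read(fd, buf, count);      /* fd is assumed O_NONBLOCK */
        if (n >= 0)
            return n;                          /* data or EOF */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                         /* genuine error */
        wait_for_fd(fd, EPOLLIN);              /* would block: wait until readable */
    }
}
```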


Memory Management


• Bounded stack size for threads.

• Perform compiler analysis to determine stack space.

• Allocate on-demand and release memory when the thread requires less stack space.

• Linked Stack Management feature to minimize wasted stack space.

Weighted Call Graph

Nodes represent functions in the program, weighted by the maximum stack size needed for execution.

A directed edge from node U to node V indicates that function U calls function V directly.

Path lengths (the sum of the weights of all nodes on a path) give the total size of the associated sequence of stack frames (see the sketch after the figure).

[Figure: example weighted call graph over functions M, A, B, C, D and E, with node weights between 0.2K and 1K.]
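To make the path-length idea concrete, here is a small sketch of the analysis on a toy call graph. The function ids, edges and frame sizes are invented, and the graph is assumed acyclic; Capriccio's real analysis works on the program's full call graph and handles cycles with checkpoints on back-edges.

```c
/* Toy version of the compile-time analysis: the longest weighted path in an
 * acyclic call graph is the worst-case stack usage. */
#include <stdio.h>

#define NFUNC 6
enum { M, A, B, C, D, E };                                /* made-up ids    */
static const int frame[NFUNC] = { 256, 204, 819, 204, 204, 512 }; /* bytes  */
static int calls[NFUNC][NFUNC];                           /* calls[u][v]: u calls v */

/* Maximum stack (in bytes) needed when function u starts executing. */
static int max_stack(int u)
{
    int deepest = 0;
    for (int v = 0; v < NFUNC; v++)
        if (calls[u][v]) {
            int need = max_stack(v);
            if (need > deepest)
                deepest = need;
        }
    return frame[u] + deepest;
}

int main(void)
{
    calls[M][A] = calls[M][B] = 1;                        /* invented edges */
    calls[A][C] = calls[A][D] = 1;
    calls[B][D] = calls[B][E] = 1;
    printf("worst-case stack from M: %d bytes\n", max_stack(M));
    return 0;
}
```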


Checkpoints

• Inserted at call sites (edges) at compile time.

• Checks whether there is enough stack space left to reach the next checkpoint.

• Allocate a new stack chunk and adjust the stack pointer when the check fails (see the sketch below).

• Unlink the stack chunk when the function call returns.

• Ensure that all paths between checkpoints are within a desired bound on stack size.
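Below is a rough sketch of what an inserted checkpoint and its matching unlink might look like. It is not Capriccio's generated code: ordinary variables (sp, limit, top) stand in for the real stack pointer and chunk bookkeeping, and chunks come from malloc()/free() instead of Capriccio's reuse pool.

```c
/* Simplified sketch of checkpoint/unlink logic. */
#include <stdlib.h>

struct stack_chunk {
    struct stack_chunk *prev;      /* previously active chunk */
    char *saved_sp;                /* stack pointer to restore on unlink */
    char *saved_limit;             /* chunk limit to restore on unlink */
    char data[];                   /* the new stack space itself */
};

static char *sp, *limit;           /* current stack pointer / chunk bottom */
static struct stack_chunk *top;    /* most recently linked chunk */

/* Inserted before a call site: if fewer than `needed` bytes remain before
 * the next checkpoint, link in a fresh chunk and move the stack into it. */
void checkpoint(size_t needed, size_t chunk_size)
{
    if ((size_t)(sp - limit) >= needed)
        return;                                    /* fast path: enough room */

    struct stack_chunk *c = malloc(sizeof *c + chunk_size);
    c->prev = top;
    c->saved_sp = sp;
    c->saved_limit = limit;
    top = c;
    limit = c->data;
    sp = c->data + chunk_size;                     /* stacks grow downward */
}

/* Run when a call that linked a new chunk returns.  Capriccio keeps chunks
 * on a free list for reuse; free() keeps the sketch short. */
void checkpoint_return(void)
{
    struct stack_chunk *c = top;
    sp = c->saved_sp;
    limit = c->saved_limit;
    top = c->prev;
    free(c);
}
```

In the paper these calls are inserted automatically by a CIL-based source transformation, so application code does not change.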


Where to place checkpoints?

• On entry – C0

• On each back-edge – C1

• On each edge where the needed stack space to reach a leaf node or the next checkpoint exceeds the bound – C2 and C3

[Figure: the same call graph annotated with checkpoints C0–C3; bound on stack size = 1K.]


Dynamic Allocation

• Function B is executing.
• 3 chunks (at C0, C2 and C3) have been allocated.
• 0.5K and 0.2K are wasted in the first and second chunks (shaded in gray).

Internal wasted space – use MaxPath to specify the maximum desired path length.


Dynamic Allocation (contd.)

• Function A has directly called function D.
• Only two chunks were necessary.
• No space is wasted when function D is called.


Dynamic Allocation (contd.)

• Function C has directly called function D.
• The total path length is only 0.9K, resulting in 0.1K of wasted space.

External wasted space – use MinChunk to specify the minimum stack chunk size.


Dynamic Allocation (contd.)

• A new chunk is allocated when E calls C.
• At C1, the code decides that there is enough space remaining in the current chunk to reach either a leaf (D) or the next checkpoint (C1).


Challenges

• Function pointers
  • Issue: cannot determine at compile time which function may be called through a given function pointer.
  • Solution: categorize function pointers by the number and types of their arguments.

• External function calls
  • Issue: pre-compiled libraries make bounding stack space difficult.
  • Solution: annotate external functions with a bound (see the sketch below).
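One plausible shape for such annotations is a table of conservative per-function stack bounds, consulted whenever the analysis meets a call into a pre-compiled library. The sketch below is illustrative only; the function names, bounds and fallback value are assumptions, not values from Capriccio.

```c
/* Illustrative annotation table for external (pre-compiled) functions. */
#include <stddef.h>
#include <string.h>

struct extern_bound { const char *name; size_t stack_bytes; };

static const struct extern_bound extern_bounds[] = {
    { "printf", 4096 },        /* made-up conservative bounds */
    { "malloc", 2048 },
    { NULL, 0 }
};

/* Return the annotated bound, or a large default for unknown functions. */
size_t external_stack_bound(const char *fn)
{
    for (const struct extern_bound *b = extern_bounds; b->name; b++)
        if (strcmp(b->name, fn) == 0)
            return b->stack_bytes;
    return 64 * 1024;          /* unknown: assume a whole large chunk */
}
```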


Memory Benefits

• Avoids pre-allocation of large stacks.

• Linked stack chunks can be reused – reducing the working set of the application.

• bigstack() micro-benchmark – supplements scalability claims.


Blocking Graph


Nodes denote locations in the program that blocked, identified by the call chain that reached the blocking point.

An edge connects two consecutive blocking points.
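As an illustration of naming a blocking point by its call chain, the sketch below hashes the return addresses on the current stack into a node identifier using glibc's backtrace(). The helper name and the FNV hashing scheme are assumptions; Capriccio's actual bookkeeping differs.

```c
/* Sketch: derive a blocking-graph node id from the current call chain. */
#include <execinfo.h>
#include <stdint.h>

uint64_t blocking_node_id(void)
{
    void *chain[32];
    int depth = backtrace(chain, 32);             /* return addresses */
    uint64_t h = 1469598103934665603ULL;          /* FNV offset basis */
    for (int i = 1; i < depth; i++) {             /* i = 0 is this frame */
        h ^= (uint64_t)(uintptr_t)chain[i];
        h *= 1099511628211ULL;                    /* FNV prime */
    }
    return h;
}
```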

Annotating a Blocking Graph

• Annotate nodes and edges with information about thread behavior.

• Weighted averages for
  • Edges – average running time for each edge.
  • Nodes – how long the next edge will take.

• Annotate changes in CPU, memory and file-descriptor usage (a sketch of one way to keep these averages follows).

• Annotations help determine a thread’s usage of each resource – will running the thread increase or decrease resource usage?
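A minimal sketch of keeping such annotations as exponentially weighted averages, with invented field names and an illustrative 0.1 weighting factor (neither taken from the paper):

```c
/* Per-edge annotations maintained as exponentially weighted averages. */
struct bg_edge {
    double avg_cycles;       /* average CPU time spent on this edge */
    double avg_mem_delta;    /* average change in memory use */
    double avg_fd_delta;     /* average change in open file descriptors */
};

static void bg_edge_update(struct bg_edge *e,
                           double cycles, double mem_delta, double fd_delta)
{
    const double a = 0.1;                           /* weight of the new sample */
    e->avg_cycles    = (1 - a) * e->avg_cycles    + a * cycles;
    e->avg_mem_delta = (1 - a) * e->avg_mem_delta + a * mem_delta;
    e->avg_fd_delta  = (1 - a) * e->avg_fd_delta  + a * fd_delta;
}
```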


Making use of a Blocking Graph

• Track resource utilization and decide if each resource is within its limit.

• Use annotations to predict the impact on each resource if a thread is scheduled.

• Prioritize threads for scheduling – when a resource is scarce, schedule threads that release that resource (see the sketch below).
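The sketch below captures that priority rule for a single resource (memory). The thread structure, field name and scoring function are hypothetical; a real scheduler would track several resources and combine their scores.

```c
/* Toy priority rule: favour threads expected to release memory when
 * memory is scarce; otherwise treat all threads equally. */
#include <stdbool.h>

struct thread {
    double next_edge_mem_delta;   /* predicted memory change of its next edge */
    /* ... other runtime state ... */
};

static bool memory_pressure;      /* set when memory use approaches its limit */

/* Higher score = run sooner. */
static double sched_score(const struct thread *t)
{
    return memory_pressure ? -t->next_edge_mem_delta : 0.0;
}
```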


Limitations of Resource-Aware Scheduler

• Difficult to determine maximum capacity of a resource.

• Application-specific and logical resources are hidden.

• Threads that do not yield present a performance problem.


Thread Scalability


Web Server Performance


Related Work


• Programming Models
  - Comparison of thread-based and event-based models.
  - Scalable server systems – the Flash and Harvest systems.
  - Extensive web-server performance comparison – Knot vs. µServer vs. WatPipe.

• User-Level Threads
  - Cooperative threading – Filaments, NT’s Fibers, the State Threads package.

• Application-Specific Optimization
  - Moving application code into the kernel – the SPIN and VINO operating systems.

• Resource-Aware Scheduling
  - Monitoring progress of applications – Douceur and Bolosky, Fowler et al.

Trivia


• Rust Programming Language 1

• The runtime implemented “segmented stacks” but later abandoned them.

• Go Programming Language 2 – Incorporates Capriccio’s properties

• Share by communicating – achieve concurrency by communicating (shared) variables to separate threads of execution.

• Contiguous stacks – allocate/reallocate when stack for a routine fills up.

• Goroutines – functions executing concurrently with other goroutines in the same address space.

1 http://www.rust-lang.org/
2 Documentation at http://golang.org

Conclusions


• Stay with threads! Fix them for scalable servers.

• Linked Stacks and Resource-Aware Scheduling help achieve performance improvements.

• Opportunities for performance tuning.

Thoughts

• Presents a strong case for the threaded programming model for scalable Internet servers.

• Capriccio’s runtime overcomes limitations of existing thread packages.

• The micro-benchmarks and testing setup used support the evaluation methodology well.

• No multi-CPU support, yet!


Questions?
