Fork-Join Threadpool
Help Session
Friday, March 20th, 6 PM
Monday, March 23rd, 6 PM
Kent McDonough <[email protected]>
Benjamin Reidys <[email protected]>
CS 3214: Project 2
Topics
– Basics / Getting Started
– Threadpool Design
  – Work Stealing
  – Work Helping
– Implementation Decisions
– Logistics
  – Grades
  – Testdriver / scoreboard
– FAQ / Debugging Tips
REVISED
Basics / Getting Started
Some Basics
– Threads
  – Different from processes?
  – How?
– Threadpools
  – Automatic dependency management of tasks
  – Easy to quickly add concurrency without locking semantics: just call the threadpool API
Getting Started
– https://git.cs.vt.edu/cs3214-staff/threadlab
  – Fork / clone the repo
  – Set your fork to private
  – As with the shell project, git usage is graded
$ git clone https://git.cs.vt.edu/ashishb/threadlab
Getting Started
– What do we write?
  – Only threadpool.c
  – Forward declarations of structs, prototypes of functions given in threadpool.h
  – Provide implementations of functions, definitions of structs
  – Any other functions/variables should be static
Futures
– What is a future?
– They’re becoming a more common concept in today’s world of parallel computers
fetch('https://s1.jamestaylr.net/status', { method: 'get' })
  .then((response) => {
    return response.json();
  })
  .then((json) => {
    console.log(json);
  })
  .catch((err) => {
    console.error(err);
  });
Futures
– For this project, a future is an instance of a task that you must execute
– To the user, a future is an opaque object; they can only use it to get the result later
– The result can be retrieved via future_get()
– The job of the threadpool is to execute these futures in parallel (as much as possible)
struct future
– For this project, you control what the representation of a future is
– You’ll need (at a minimum) to keep track of the fork_join_task and the argument
– fork_join_task_t is just a typedef of a function pointer type
– It’s just a function that you will execute to get the result that you should return later
struct future
– You may also need synchronization primitives for your future, such as semaphores or mutexes
struct future {
    fork_join_task_t task;
    void * args;
    void * result;
    ...
};
struct thread_pool
– Should contain any state you need for a threadpool
– Ideas:
  – Locks (pthread_mutex_t)
  – Queues/Deques (can use P1 struct list)
  – Signaling primitives (semaphores or condition variables, your choice)
  – Other information you may need
Functions to Implement
struct thread_pool * thread_pool_new(int nthreads);
void thread_pool_shutdown_and_destroy(struct thread_pool *);
struct future * thread_pool_submit(struct thread_pool *pool,
                                   fork_join_task_t task,
                                   void * data);
void * future_get(struct future *);
void future_free(struct future *);
Threadpool Design
Threadpool Design
– Methodologies
  – Split up tasks among n workers
  – Work sharing
  – Work stealing
– Differences?
  – Advantages, disadvantages?
– Read section 2.1 thoroughly
– No global variables!
Example
mergesort(threadpool tp, array A) {
    future* f = thread_pool_submit(tp, mergesort, A[..left]);
    mergesort(tp, A[right..]);
    return merge(future_get(f), A[right..]);
}
Task Tree
sort(A[0..64])
  sort(A[0..32])
    sort(A[0..16])
    sort(A[16..32])
  sort(A[32..64])
    sort(A[32..48])
    sort(A[48..64])
Illustrated TP API
[Figure: the global queue and workers A and B executing the sort(...) tasks from the task tree; idle workers are shown asleep (💤)]
Illustrated Work Stealing
[Figure: the global queue and workers A and B with the same sort(...) tasks from the task tree, distributed by work stealing; idle workers are shown asleep (💤)]
Work Stealing
– Global queue of tasks
– Local deque of tasks, per worker
– Worker’s main loop:
  – Do I have tasks? Pop from front (LIFO)
  – Are there global tasks? Pop from back (FIFO)
  – Does anyone else have tasks? Pop from back (FIFO)
– Idle threads spread out the work evenly
– Each queue/deque has its own synchronization
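The order in which a worker looks for its next task can be sketched as follows. This is only an illustration under stated assumptions: `struct deque`, `push_front`, `pop_front`, `pop_back`, and `next_task` are hypothetical names, tasks are plain integer ids, and all locking is left out so that the selection order is easy to see. A real pool would guard each queue/deque with its own lock.

```c
#include <assert.h>

#define MAX_TASKS 16

/* Hypothetical fixed-capacity ring buffer of task ids (no locking shown). */
struct deque {
    int items[MAX_TASKS];
    int head;   /* index of the front element */
    int count;  /* number of stored elements */
};

void push_front(struct deque *d, int task) {
    assert(d->count < MAX_TASKS);
    d->head = (d->head + MAX_TASKS - 1) % MAX_TASKS;
    d->items[d->head] = task;
    d->count++;
}

/* Newest task first: LIFO with respect to push_front. */
int pop_front(struct deque *d) {
    assert(d->count > 0);
    int task = d->items[d->head];
    d->head = (d->head + 1) % MAX_TASKS;
    d->count--;
    return task;
}

/* Oldest task first: FIFO with respect to push_front. */
int pop_back(struct deque *d) {
    assert(d->count > 0);
    int task = d->items[(d->head + d->count - 1) % MAX_TASKS];
    d->count--;
    return task;
}

/* Returns the next task id for worker `self`, or -1 if there is no work. */
int next_task(struct deque *global, struct deque *workers, int nworkers, int self) {
    if (workers[self].count > 0)          /* 1. own deque, from the front */
        return pop_front(&workers[self]);
    if (global->count > 0)                /* 2. global queue, from the back */
        return pop_back(global);
    for (int i = 0; i < nworkers; i++)    /* 3. steal from another worker's back */
        if (i != self && workers[i].count > 0)
            return pop_back(&workers[i]);
    return -1;
}
```

Taking from the front of your own deque and from the back of everyone else's means the owner and a thief work at opposite ends of the same deque.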
Work Helping
– In future_get, you can only return the result once the future is done executing
– The task might not be completed when future_get is called
  – It might not even be running yet!
Example
fib(int n) {
    future* f = submit(fib, n - 1);
    int y = fib(n - 2);
    return future_get(f) + y;
}
Work Helping
– Consider what needs to happen for you to get the result from future_get in all cases
– If the future is already executed, you’re done, you have the result!
  – Woo!
– If the future is not done?
Work Helping
– Naively, you could try to block on a semaphore / condition variable until the future is completed, then return the result
– What if you only have one thread in the threadpool?
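The naive approach might look like the sketch below, which uses a mutex and condition variable rather than a semaphore. Everything here is hypothetical: `task_fn` is a simplified stand-in for fork_join_task_t (the real typedef is in threadpool.h), and `naive_future`, `naive_run`, `naive_future_get`, and `add_one` are illustrative names. Note that `naive_future_get` only returns if some other thread calls `naive_run`. If the caller is the pool's only worker, nobody ever will, and it sleeps forever.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified task type (not the real fork_join_task_t). */
typedef void *(*task_fn)(void *arg);

struct naive_future {
    task_fn task;
    void *arg;
    void *result;
    int done;                 /* 1 once result is valid */
    pthread_mutex_t lock;     /* protects done and result */
    pthread_cond_t finished;  /* signaled when done becomes 1 */
};

void naive_future_init(struct naive_future *f, task_fn task, void *arg) {
    assert(task != NULL);
    f->task = task;
    f->arg = arg;
    f->result = NULL;
    f->done = 0;
    pthread_mutex_init(&f->lock, NULL);
    pthread_cond_init(&f->finished, NULL);
}

/* Run by some other thread: execute the task, then wake any waiter. */
void *naive_run(void *p) {
    struct naive_future *f = p;
    void *value = f->task(f->arg);
    pthread_mutex_lock(&f->lock);
    f->result = value;
    f->done = 1;
    pthread_cond_broadcast(&f->finished);
    pthread_mutex_unlock(&f->lock);
    return NULL;
}

/* Naive future_get: sleep until someone else has finished the task. */
void *naive_future_get(struct naive_future *f) {
    pthread_mutex_lock(&f->lock);
    while (!f->done)
        pthread_cond_wait(&f->finished, &f->lock);
    void *value = f->result;
    pthread_mutex_unlock(&f->lock);
    return value;
}

/* Sample task for trying the sketch: returns its argument plus one. */
void *add_one(void *arg) {
    return (void *)((intptr_t)arg + 1);
}
```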
Work Helping
– You want to minimize threads sleeping, and maximize the time the threads are executing tasks
– If no threads are executing the task you depend on, you might as well do it yourself
– What if a thread is executing the task you depend on?
  – May be beneficial to execute other tasks instead of sleeping until that task is done
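A minimal sketch of the "do it yourself" case, assuming a hypothetical three-state future (`WORK_PENDING`, `WORK_RUNNING`, `WORK_DONE`) and a simplified task type `work_fn` that is not the real fork_join_task_t. It leaves out two things a real pool needs: removing the future from whichever deque holds it, and running other tasks (rather than sleeping) when another thread is already executing the one you depend on.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified task type (not the real fork_join_task_t). */
typedef void *(*work_fn)(void *arg);

enum work_state { WORK_PENDING, WORK_RUNNING, WORK_DONE };

struct helping_future {
    work_fn task;
    void *arg;
    void *result;
    enum work_state state;
    pthread_mutex_t lock;     /* protects state and result */
    pthread_cond_t finished;  /* signaled when state becomes WORK_DONE */
};

void helping_future_init(struct helping_future *f, work_fn task, void *arg) {
    assert(task != NULL);
    f->task = task;
    f->arg = arg;
    f->result = NULL;
    f->state = WORK_PENDING;
    pthread_mutex_init(&f->lock, NULL);
    pthread_cond_init(&f->finished, NULL);
}

void *helping_future_get(struct helping_future *f) {
    pthread_mutex_lock(&f->lock);
    if (f->state == WORK_PENDING) {
        /* Nobody has started it: claim it and run it in this thread. */
        f->state = WORK_RUNNING;
        pthread_mutex_unlock(&f->lock);   /* never hold the lock while running a task */
        void *value = f->task(f->arg);
        pthread_mutex_lock(&f->lock);
        f->result = value;
        f->state = WORK_DONE;
        pthread_cond_broadcast(&f->finished);
    }
    /* Someone else is running it: this sketch simply waits. */
    while (f->state != WORK_DONE)
        pthread_cond_wait(&f->finished, &f->lock);
    void *result = f->result;
    pthread_mutex_unlock(&f->lock);
    return result;
}

/* Sample task for trying the sketch: doubles its argument. */
void *times_two(void *arg) {
    return (void *)((intptr_t)arg * 2);
}
```

Because the caller runs a pending task itself, this version completes even when there is only one thread.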
Illustrated Example
[Figure: two lists of futures (F1 F2 F3 F4 and F4 F5), each with its own list lock]
future_get only has a pointer to this future. How does it gain access to the list to remove it, or another thread’s list to work on dependent tasks?
Implementation Decisions
Thread Local Variables
– Want to be able to efficiently access your worker’s deque (and probably locks) during thread_pool_submit()
struct future * thread_pool_submit(struct thread_pool * pool,
                                   fork_join_task_t task,
                                   void * data);
Thread Local Variables
– How do you know which worker you are?
– Could naively iterate all workers and check pthread_self()...
– Better idea to use some variable which would be different per thread!
  – AKA thread local variables/storage
__thread int i;
extern __thread struct state s;
static __thread char *p;
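As a small illustration of how a `__thread` variable answers "which worker am I?", the sketch below gives every thread its own copy of a hypothetical `worker_id`. The names `worker_id`, `register_worker`, and `current_worker_id` are assumptions made for this example; in a pool you might store a pointer to the worker's own struct instead of an integer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* One copy per thread. -1 marks a thread that is not a pool worker
 * (for example, the main thread submitting the first task). */
static __thread int worker_id = -1;

/* Thread start routine: record this thread's id in its own copy. */
void *register_worker(void *arg) {
    assert(worker_id == -1);              /* each thread registers once */
    worker_id = (int)(intptr_t)arg;
    return (void *)(intptr_t)worker_id;   /* report what this thread sees */
}

/* Callable from any function, e.g. from inside a submit routine. */
int current_worker_id(void) {
    return worker_id;
}
```

Writes made by one thread are invisible to the others, so a thread that never registered still reads -1.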
Discussion of Method Requirements
struct future * thread_pool_submit(struct thread_pool * pool, fork_join_task_t task, void * data)
void * future_get(struct future * future)
struct thread_pool * thread_pool_new(int nthreads)
Logistics / Grading
Logistics
– Please submit code that compiles
– Test using the driver before submitting!
  – Don’t just run the tests individually
– “Passing” a test means that you get the correct result without crashing, within the time limit
– When grading, these tests will be run 3-5 times, and if you crash a single time, it’s considered failing
– Benchmarked times will be the average of the 3-5 runs, assuming you pass all of them
Logistics: Grading
– Grade breakdown:
  – 9 points for logistics (git, documentation, etc.)
  – 18 points for basic tests
  – 28 points for advanced tests
  – 45 points for performance
– New dimension to systems assignments: performance
Logistics: Test Points
– Test points breakdown:
  – 6 points per basic test, 2 per thread count for passing
  – 2 points per advanced test/size; only counts if you pass all thread counts for a test
    – That is, if you pass Mergesort Large with 5 threads, but don’t pass with 20 threads, you won’t get any points for Mergesort Large
Logistics: Performance
– Relative to peers and sample implementations
– No multithread performance increase = none of the performance points (you may still get correctness points)
– Points only for the tests on the scoreboard (N queens, mergesort, quicksort, all the largest size)
  – 5 points per thread count per test
  – Total of 45 points
Logistics: Performance
– Pay attention to how your peers are doing on the scoreboard
– Aim for less than 5 seconds on the sorts (with 20 threads) and less than 2 seconds on N queens (with 20 threads) as a very loose guideline based on previous results
Test Driver
– Run with -r for only basic tests
– Run with -t basic1 to select one test
– Can take a long time to run all tests
– Reports if you passed each test, and times for the benchmarked ones
$ ~cs3214/bin/fjdriver.py [options]
Test Driver
– Make sure you run multiple times; race conditions can cause you to crash only 20% of the time
– Will run multiple times to ensure consistency when grading (and get a good average for times)
– All of the tests are C programs, compiled against your threadpool
  – Threadpool acts as a library
– Some toy tests (parallel fibonacci)
– Some more practical tests that are benchmarked
Test Driver
– Simulate grading environment:
  – Runs the tests 5 times and averages results
$ ~cs3214/bin/fjdriver.py -g -B 5
Willgrind
https://courses.cs.vt.edu/~cs3214/spring2020/#!/willgrind
• Handy tool for mass benchmarking/checking for race conditions
• Creates a custom webpage with visualizations
• Run from the directory containing threadpool.c
$ ~cs3214/bin/wjdriver.py
Scoreboard
– https://courses.cs.vt.edu/~cs3214/spring2020/#/fjpoolstats
– You can post your results to the scoreboard by using the fjpostresults.py script
– Remove your old submissions to not clog it up
FAQ / Discussion
FAQ
How long does this take?
– Writing, not so long
  – Implementations roughly 250 lines
– Can spend a lot of time debugging
  – GDB, Helgrind, and regular Valgrind are friends
– Start early, as with all the projects
– Don’t underestimate creating prototypes or trying out different strategies
FAQ
My code only fails the test sometimes, do I get partial credit?
– No, unless you get lucky and it doesn’t fail when grading
FAQ
My code fails ___ test but not the others!
– The tests are similar, but each submits tasks differently and at different times
– Possible to uncover different race conditions
– Use Helgrind to see where you might have atomicity violations
– Use GDB (info threads, thread apply all bt) if you’re deadlocking to see where you’re stuck
FAQ
Why does this not work as expected?
Thread A:
lock(list)
future = list_pop(list)
unlock(list)
lock(future)
future.in_list = false
…
unlock(future)
Thread B:
lock(future)
if (future.in_list)
    lock(future.list_its_in)
    list_remove(future.elem)
    unlock(future.list_its_in)
…
unlock(future)
FAQ
Why does this not work as expected?
Thread A:
lock(list)
future = list_pop(list)
lock(future)
future.in_list = false
…
unlock(future)
unlock(list)
Thread B:
lock(future)
if (future.in_list)
    lock(future.list_its_in)
    list_remove(future.elem)
    unlock(future.list_its_in)
…
unlock(future)
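One reading of the two snippets, offered as an interpretation rather than the official answer: in the first, Thread B can observe in_list still set after Thread A has already popped the future, and then removes an element that is no longer in the list. In the second, Thread A takes the list lock and then the future lock while Thread B takes them in the opposite order, which can deadlock. A possible way around both, sketched below, is to let the list's lock also protect the in_list flag, so there is only one lock and one order. The names `task_node`, `task_list`, `list_push`, `list_pop_claim`, and `list_remove_if_queued` are hypothetical, and the sketch assumes a single singly linked list rather than one deque per worker.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct task_node {
    struct task_node *next;
    bool in_list;             /* protected by the owning list's lock */
};

struct task_list {
    pthread_mutex_t lock;     /* protects head and every node's in_list */
    struct task_node *head;
};

void list_push(struct task_list *l, struct task_node *n) {
    pthread_mutex_lock(&l->lock);
    assert(!n->in_list);
    n->next = l->head;
    l->head = n;
    n->in_list = true;
    pthread_mutex_unlock(&l->lock);
}

/* Thread A's side: pop and clear in_list in one critical section. */
struct task_node *list_pop_claim(struct task_list *l) {
    pthread_mutex_lock(&l->lock);
    struct task_node *n = l->head;
    if (n != NULL) {
        l->head = n->next;
        n->next = NULL;
        n->in_list = false;
    }
    pthread_mutex_unlock(&l->lock);
    return n;
}

/* Thread B's side: unlink n only if it is still queued. */
bool list_remove_if_queued(struct task_list *l, struct task_node *n) {
    bool removed = false;
    pthread_mutex_lock(&l->lock);
    if (n->in_list) {
        struct task_node **link = &l->head;
        while (*link != n)            /* n is in the list, so this terminates */
            link = &(*link)->next;
        *link = n->next;
        n->next = NULL;
        n->in_list = false;
        removed = true;
    }
    pthread_mutex_unlock(&l->lock);
    return removed;
}
```

Whichever thread gets the lock first claims the node; the other sees in_list already cleared and leaves the list alone.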
Debugging
– Multi-threading is difficult
– Sometimes GDB will be useful, other times you may want to try Helgrind instead
– Unless you’re doing something very fancy, your code should run quietly under Helgrind
– However, running quietly under Helgrind gives no guarantees
Debugging
– Also try out Willgrind, a tool developed specifically for this project.
– Find race conditions, deadlock, and even profile how balanced the tasks are between your threads.
Exercise 3: Outfoxed
Overview
– Great intro to threadpool!
– Teaches important threading constructs
  – Don’t be lazy
– Ordering, locking, synchronization
  – Semaphores
  – Condition variables / broadcasts
  – Mutex locks
Program Flow
– Three classes of threads: chicken detectives, clues, and suspects
– Functions: decode_clue, eliminate_suspect, announce_thief
Questions?
Thank you for attending!