efficient(concurrency(supportroblkw.com/likamwa2015starfish-slides.pdf · facedetect(img) vision*...
TRANSCRIPT
Efficient Concurrency Supportfor
Computer Vision Applications
Robert LiKamWaLin Zhong
Rice University
Starfish
1
In the year 2020...
2
Where did I place my keys?
Oh, I haven't talked to Fred
recently...
3
Remind me to tell Lin to let me graduate!
Continuous mobile vision
?????
?????
4
Continuous mobile vision
Insufficient concurrency support for vision!
Energy consumption overwhelms wearable battery!
5
Application Capture Service
Vision Headers Vision
Library
6
Application Capture Service
Vision Headers Scale
Face
VisionLibrary
7
VisionLibrary
VisionLibrary
We need efficient concurrency support!
Camera is overworkedComputation is overworked
200+ ms
200+ ms
200+ ms
8
VisionLibrary
VisionLibrary
VisionLibrary
Observations:• Vision apps utilize common vision library
9
Face
SURF
Scale
Scale
Scale
Face
Observations:• Vision apps utilize common vision library• Capture/Computation is redundant across apps
common primitives
10
Face
SURF
Scale
Scale
Scale
Face
Observations:• Vision apps utilize common vision library• Capture/Computation is redundant across apps
common primitives
11
Share computations,Reduce redundancyKey Idea:
Proposal: Split-‐process design for centralized control
Scale
SURF
Face
Starfish CoreService
12
Vision Library
Starfish LibraryUses vision headers
to intercept library calls
Starfish Core ServiceExecutes, tracks, and shareslibrary call computations
Vision LibraryStarfishLibrary
StarfishLibrary
StarfishLibrary
Starfish CoreService
13
Vision Library
Vision LibraryStarfishLibrary
StarfishLibrary
StarfishLibrary
StarfishSplit-‐process library solution for
efficient, transparent multi-‐app service
14
First CallExecute call (20-‐200 ms)
Store results
Subsequent Call(s)Skip execution (0 ms)
Retrieve results
Starfish Core
Vision LibraryExec.
Function CacheStorage/Retrieval
Function Cache Arg Search
faceDetect(img)
faceDetect(img)
Vision LibraryStarfishLibrary
StarfishLibrary
StarfishLibrary
Starfish Core
FunctionCache
15
StarfishShare computations,Reduce redundancy
Cache Search
Vision Library
Vision LibraryStarfishLibrary
StarfishLibrary
StarfishLibrary
16
Timing optimizations:+ Reuse "Fresh Frames” to promote cache hits+ Delay call return to encourage device sleep
Starfish Core
FunctionCache
Cache Search
Vision Library
Vision LibraryStarfishLibrary
StarfishLibrary
StarfishLibrary
Starfish Core
FunctionCache
17
StarfishLibrary
StarfishLibrary
StarfishLibrary
Cache Search
Vision Library
+ Promote sharing by reusing "Fresh Frames”+ Maintain code privacy by delaying call return+ Manage concurrent requests through fine-‐grained locks
Mitigating Split-‐Process Overhead
Starfish Core
18
StarfishLibrary
StarfishLibrary
StarfishLibrary
FunctionCache
Cache Search
Vision Library
Reduce expense of argument passing
Avoid library object modification
Split-‐Process in the Literature
19
Zero-‐copyRequire library redesign
Object trackingOptimized for code offload
• Object Redefinition• Lightweight RPC• SUN RPC
• Cloud Transfer • COMET• MAUI• CloneCloud
Split-‐Process Argument Passing
20
faceDetect( )
Goal: Minimize deep copy overhead
3.5 ms
Shared MemoryStarfish Library Starfish Core
deep copyis expensive
Vision Library
Execution
1) Protected shallow copy
21
Issue copy-‐on-‐write (mprotect) on received objects
faceDetect( )
Shared MemoryStarfish Library Starfish Core
Vision Library
Execution
CoW
CoW
2) Direct output marshalling
22
Write new data directly into shared memory
faceDetect( )
Shared MemoryStarfish Library Starfish Core
Vision Library
Execution
CoW
CoW malloc()
3) Reuse arguments from prior calls
23
Track, reuse previous inputs/outputs
faceDetect( )
Shared MemoryStarfish Library Starfish Core
Vision Library
Execution
CoW
CoW malloc()
Argument Table (img, shm_ptr)
(Usually) Zero-‐Copy Argument Passing
24
+ Reduce deep copies through argument reuse+ Reduce allocation through buffer reuse
faceDetect( )
Shared MemoryStarfish Library Starfish Core
Vision Library
Execution
CoW
CoW malloc()
Argument Table (img, shm_ptr)
Starfish Optimizations• Share Computations• Maintain expectations of developers & users
• Increase sharing efficiency by relaxing timing
• Decrease redundancy• Reduce argument copy• Reuse memory buffers
25
26
Experimental Platform
Benchmarks:1) Per-‐call micro-‐benchmarks2) Multi-‐app benchmarks
OMAP4430 dual-‐core Cortex-‐A9 pinned to 600 MHz
OpenCV + Android + Google Glass Monsoon Power Monitor
27
First CallNative : 21 ms
Unopt. Starfish : 42 ms Starfish : 24ms
Memory optimizations cut Starfish overhead in half
0" 5" 10" 15" 20" 25"
Receive&Outputs&
Prepare&Outputs&
Allocate&Outputs&
Exec.&Function&
Search&Cache&
Send&Inputs&
Prepare&Inputs&
Execution&time&(ms)&
Library'call'execution:'resize()Native&
Starfish&Unopt.&
Starfish&
28
First CallNative : 21 ms
Unopt. Starfish : 42 ms Starfish : 24ms
Second CallNative : 21 ms
Unopt. Starfish : 10 ms Starfish : 6 ms
Starfish works well whennative function execution >> 5 ms
0" 5" 10" 15" 20" 25"
Receive&Outputs&
Prepare&Outputs&
Allocate&Outputs&
Exec.&Function&
Search&Cache&
Send&Inputs&
Prepare&Inputs&
Execution&time&(ms)&
Library'call'execution:'resize()Native&
Starfish&Unopt.&
Starfish&
Memory optimizations cut Starfish overhead in half
Places where Starfish fails• 5-‐6 ms performance overhead per-‐call:– Bad for fast computations– Bad with large, deep arguments
• Non-‐cacheable functions– Random, temporal, external dependencies– Functions with specific parameters
29
But great for many high-‐level functions:Face detect, Corner detect, Image resize
Starfish vs. Multi-‐App Workload
30
Scale FaceDetect
FaceRecog
If Lin then ThatSocial LoggerFacebookGoogle+TwitterMySpaceWhatsapp
...
0.3 FPS
0"0.5"1"
1.5"2"
2.5"3"
3.5"4"
4.5"5"
1" 2" 3" 4" 5" 6" 7" 8" 9" 10"
Frame&Rate&(FPS)&
Number&of&App&Instances&
Na/ve" Starfish"
When running multiple apps, Starfish achieves higher performance
31
Higher is
better
When running multiple apps,Starfish draws single-‐app power
32
0"
500"
1000"
1500"
2000"
2500"
1" 2" 3" 4" 5" 6" 7" 8" 9" 10"
Power&Consump-on&
(mW)&
Number&of&App&Instances&
Na.ve" Starfish"
Loweris
better
When running multiple apps,Starfish draws single-‐app power
33
0"
500"
1000"
1500"
2000"
2500"
1" 2" 3" 4" 5" 6" 7" 8" 9" 10"
Power&Consump-on&
(mW)&
Number&of&App&Instances&
Na.ve" Starfish"
Starfish achieves efficient concurrency support!
Starfish Core
FunctionCache
34
StarfishShare computations,Reduce redundancy
Cache Search
Vision Library
Vision LibraryStarfishLibrary
StarfishLibrary
StarfishLibrary
35
Continuingmobile vision
MitigatingAnalog
Signal ChainBandwidth
Preserving
User/Subject
Privacy
DesigningEfficientSystems
Starfish: Share computations, Reduce redundancy
36
Transparent, efficient library call caching• Timing optimizations for
frame freshness and performance preservation• Memory optimizations for
minimal deep copy and small cache footprint
Google Glass Experiments:• Low overhead
Memory optimizations slash overhead in half• Reduced power draw
Multi-‐app workloads draw Single app power
•