Fast and Parallel Webpage Layout
CS722 Advanced System Topics
Fast and Parallel Webpage Layout, by Leo A. Meyerovich and Rastislav Bodik
University of California, Berkeley
CPSC 722: Advanced Systems Seminar Presenter: Tian Pan
NYTimes: Facebook to rewrite their iOS app
BBC: Facebook recodes iOS mobile app to address speed complaints
Guardian: Facebook doubles iPhone app speed by dumping HTML5 for native code
…
Let’s get started with a story… In June 2012, Facebook…
There are 85,000+ iPhone applications in the same situation: refactoring the existing UI or rewriting clients completely
+ downloaded over 2 billion times
- cover less than 1% of online content
So we still need: a browser that supports an emerging and diverse class of mobile devices
A fast and parallel mobile browser
However,
- CPU computational resources are limited.
- The power wall forces hardware architects to apply increases in transistor counts towards improving parallel performance, not sequential performance.
Outline
1. Problem and background
2. Challenges
3. Solutions
4. Conclusion
Data flow in a browser
Lower bounds on CPU times for loading popular pages (Laptop)
Where are the bottlenecks in loading a page?
Layout matching and rendering (34%)
Input: HTML tree, CSS, fonts
Output: absolute element positions
Layout matching and rendering steps
Categories
I. Selector matching: step 1
II. Box and text layout: steps 2, 4, 5, 6
III. Glyph handling: step 3
IV. Painting or rendering: step 7
Where are the bottlenecks in layout matching and rendering?
Challenges:
1. CSS selector matching
2. Box and text layout solving
3. Glyph rendering
3 < 2 < 1
Outline
1. Problem and background
2. Challenges
3. Solutions
3.1. CSS selector matching
3.2. Box and text layout
3.3. Glyph rendering
4. Conclusion
3.1 CSS Selector Matching
Match CSS rules with HTML nodes
Style constraints: p img { margin: 10px; }
Selector: p img
DOM node with CSS rules: <p> <img blahblah></p>
CSS: a list of selector {rules}
Selector    {Rules}
…id1        r1
…id2        r2
…class1     r3
…tag1       r4
…class2     r5
…class3     r6
…           …

id hash table
attributes  rules
id1         r1
id2         r2
…           …

class hash table
attributes  rules
class1      r3
class2      r5
class3      r6
…           …

tag hash table
attributes  rules
tag1        r4
…           …

HTML nodes
node  attributes
n1    id2 class2 class3 tag1
n2    id1 tag1
n3    class1 … …

Map
node  rules
From the id hash table:    n2 r1, n1 r2, … …
From the class hash table: n3 r3, n1 r5, n1 r6, … …
From the tag hash table:   n1 r4, n3 r4

Reduce
node  rules
n1    r2 r5 r6 r4
n2    r1 r4
n3    r4 … …
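The map and reduce steps above can be sketched in a few lines of Python. This is only an illustration built from the slide's example tables, not the authors' implementation: the node tuples, `map_table`, `reduce_pairs`, and `match_all` are hypothetical names, and the elided attributes of n3 are left out, so n3 only picks up r3 here.

```python
from collections import defaultdict

# Rule tables from the slide, one per attribute kind.
ID_TABLE = {"id1": ["r1"], "id2": ["r2"]}
CLASS_TABLE = {"class1": ["r3"], "class2": ["r5"], "class3": ["r6"]}
TAG_TABLE = {"tag1": ["r4"]}

# Hypothetical node records: (name, id, classes, tag).
# n3's elided attributes ("…") are not filled in.
NODES = [
    ("n1", "id2", ["class2", "class3"], "tag1"),
    ("n2", "id1", [], "tag1"),
    ("n3", None, ["class1"], None),
]

def map_table(nodes, table, keys_of):
    """Map step: emit (node, rule) pairs for one hash table.

    Each table can be processed independently of the others.
    """
    pairs = []
    for node in nodes:
        for key in keys_of(node):
            for rule in table.get(key, []):
                pairs.append((node[0], rule))
    return pairs

def reduce_pairs(pair_lists):
    """Reduce step: merge the per-table pairs into one rule list per node."""
    matches = defaultdict(list)
    for pairs in pair_lists:
        for name, rule in pairs:
            matches[name].append(rule)
    return dict(matches)

def match_all(nodes=NODES):
    id_pairs = map_table(nodes, ID_TABLE, lambda n: [n[1]] if n[1] else [])
    class_pairs = map_table(nodes, CLASS_TABLE, lambda n: n[2])
    tag_pairs = map_table(nodes, TAG_TABLE, lambda n: [n[3]] if n[3] else [])
    return reduce_pairs([id_pairs, class_pairs, tag_pairs])
```

The three `map_table` calls share no mutable state, which is what makes the map step a candidate for parallel execution; the sketch runs them sequentially for simplicity.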
Optimizations adopted by WebKit:
• Hashtables.
  [×] check CSS repeatedly for every node
  [√] read only once, build hashmap, and check hash
• Right-to-left matching. Most selectors can be matched by examining only a short suffix of the path.
Other optimizations:
• Hash tiling. Partition the hashtable into idHash, classHash, tagHash, … to reduce cache misses. (This could also have been done in parallel.)
• Tokenization. Store attributes as integer tokens instead of strings to save cache space and comparison time.
• Random load balancing. Assign selector matching randomly instead of in the original sequential order.
• Result pre-allocation. Pre-allocate space for popular sites.
• Delayed set insertion. Preallocate a vector sized to the number of potential matches.
• Non-STL sets. Create the vector with the size of potential matches, add matches one by one, and do linear collision checks.
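Right-to-left matching is easiest to see on the `p img` selector from the earlier slide. The sketch below is a minimal illustration, assuming selectors that consist only of tag names joined by the descendant combinator; the `Node` class and `matches_descendant` are hypothetical names and are not WebKit's API.

```python
class Node:
    """Hypothetical DOM node: a tag name and a link to its parent."""
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent

def matches_descendant(node, selector_tags):
    """Match a descendant selector such as ['p', 'img'] from right to left."""
    # The rightmost part must match the node itself. Most nodes are
    # rejected here after a single comparison.
    if node.tag != selector_tags[-1]:
        return False
    remaining = len(selector_tags) - 2  # index of the next part to find
    ancestor = node.parent
    # Walk up the ancestor path, consuming selector parts as they match.
    while remaining >= 0 and ancestor is not None:
        if ancestor.tag == selector_tags[remaining]:
            remaining -= 1
        ancestor = ancestor.parent
    return remaining < 0

# Usage: body > p > img
body = Node("body")
p = Node("p", body)
img = Node("img", p)
print(matches_descendant(img, ["p", "img"]))
print(matches_descendant(img, ["div", "img"]))
```

Starting from the node rather than from the document root means a failed match usually costs one comparison, which is why only a short suffix of the path needs to be examined.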
3.1 CSS Selector Matching
Evaluation
Cilk++: Overall 13x and 14.8x with and without Gmail
Intel TBB: Overall 55.2x and 64.8x with and without Gmail
Workstation: 204ms -> 3.5ms
Handheld: 3000ms -> 50ms
3.2 Box and text layout
Input: HTML tree nodes with symbolic constraint attributes
Output: actual layout details (size, shape, position) waiting to be painted into pixels
[Figure: layout constraints input and layout constraints output]
Unfortunately, layout is hard to optimize, because CSS is
• Informally written and cross-cutting, e.g. infinite loops
• Confusing for webpage designers
• In need of standards-compliant engines
Berkeley Style Sheets (BSS)
A new, more orthogonal, concise, well-defined intermediate layout language
• Transformed from CSS
• Specified with an attribute grammar (chances for parallelization)
• BSS0 (vertical and horizontal boxes), BSS1 (BSS0 + shrink-to-fit sizing), BSS2 (BSS1 + left floats)
BSS0 (vertical and horizontal boxes)
Attribute Grammars
Potential for parallelization
[Figure: a tree of nodes n1 to n7 annotated with attributes attrA to attrG. Inherited attributes (IattrA, IattrB) are passed down the tree by calcInherited(); synthesized attributes (S1 to S9) are combined up the tree by calcSynthesized().]
attr: attribute
Iattr: inherited attribute
S: synthesized attribute
O(log|tree|)
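The two passes named on the slide can be sketched as follows. This is a simplified illustration and not the BSS0 specification: it assumes vertical stacking only, where every child inherits its parent's width and a parent's height is the sum of its children's heights. The `Box` class, `levels`, and the snake_case function names are hypothetical.

```python
class Box:
    """Hypothetical layout node for a vertical-stacking sketch."""
    def __init__(self, children=(), intrinsic_height=0):
        self.children = list(children)
        self.intrinsic_height = intrinsic_height  # used by leaves only
        self.width = None   # inherited attribute: copied from the parent
        self.height = None  # synthesized attribute: summed from children

def levels(root):
    """Group nodes by depth. Nodes within one level do not depend on each other."""
    result, current = [], [root]
    while current:
        result.append(current)
        current = [child for node in current for child in node.children]
    return result

def calc_inherited(root, page_width):
    """Top-down pass: push widths from parents to children."""
    root.width = page_width
    for level in levels(root):
        for node in level:  # independent work items, could run in parallel
            for child in node.children:
                child.width = node.width

def calc_synthesized(root):
    """Bottom-up pass: pull heights from children to parents."""
    for level in reversed(levels(root)):
        for node in level:  # independent work items, could run in parallel
            if node.children:
                node.height = sum(child.height for child in node.children)
            else:
                node.height = node.intrinsic_height

# Usage: a root with one leaf and one inner box holding two leaves.
root = Box([Box(intrinsic_height=10),
            Box([Box(intrinsic_height=5), Box(intrinsic_height=7)])])
calc_inherited(root, page_width=800)
calc_synthesized(root)
print(root.width, root.height)
```

The sketch runs each level sequentially; the point is that the attribute dependencies only cross levels, so the inner loop over a level is where parallel workers could be applied.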
3.2 Layout Constraint Solving Evaluation
Slashdot.org, BSS1, Cilk++: 3x~4x
So far, the size and position of the text have been calculated. How is this text rendered?
3.3 Glyph Rendering
requests -> request groups -> pull and render
Parallel and locality benefits
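One possible reading of the requests, request groups, and pull-and-render stages is sketched below. It assumes that a glyph request is a (font, size, character) tuple and that identical requests can share one rendered result; `render_glyph` is a placeholder that returns a label rather than calling a real rasterizer such as FreeType2, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def render_glyph(key):
    """Placeholder for a real rasterizer call; returns a label, not a bitmap."""
    font, size, char = key
    return f"{font}:{size}:{char}"

def render_text_runs(requests, workers=4):
    """Group duplicate glyph requests, render each group once, share results."""
    # Group step: keep one entry per distinct (font, size, char) request.
    keys = list(dict.fromkeys(requests))
    # Render step: distinct glyphs are independent, so render them in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        bitmaps = dict(zip(keys, pool.map(render_glyph, keys)))
    # Pull step: every original request picks up the shared result.
    return [bitmaps[key] for key in requests], len(keys)

# Usage: three requests, two of them identical.
requests = [("Arial", 12, "a"), ("Arial", 12, "b"), ("Arial", 12, "a")]
rendered, distinct = render_text_runs(requests)
print(distinct, rendered)
```

Grouping gives the locality benefit (each glyph is rendered once and reused), and the independence of the groups gives the parallel benefit.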
Evaluation
FreeType2 font library, TBB: 3x~4x
4 Conclusion
Address three bottlenecks of loading a page
1. CSS selector matching
• Pre-built hash tables, map-reduce
2. Box and text layout solving
• Specify layout as attribute grammars
3. Glyph rendering
• Combine requests into groups and render in parallel
Milestone in building a parallel and mobile browser
Thanks~