Fast and Parallel Webpage Layout Leo A. Meyerovich, Rastislav Bodik University of California, Berkeley CPSC 722: Advanced Systems Seminar Presenter: Tian Pan

Upload: tian-pan

Post on 10-May-2015

DESCRIPTION

CS722 Advanced System Topics

TRANSCRIPT

Page 1: Fast and Parallel Webpage Layout

Fast and Parallel Webpage Layout
Leo A. Meyerovich, Rastislav Bodik

University of California, Berkeley

CPSC 722: Advanced Systems Seminar Presenter: Tian Pan

Page 2: Fast and Parallel Webpage Layout

NYTimes: Facebook to rewrite their iOS app
BBC: Facebook recodes iOS mobile app to address speed complaints
Guardian: Facebook doubles iPhone app speed by dumping HTML5 for native code
…

Let’s get started with a story… in June, 2012 Facebook…

Page 3: Fast and Parallel Webpage Layout

More than 85,000 iPhone applications are in the same situation: refactoring the existing UI or rewriting clients completely.
+ Downloaded over 2 billion times
- Cover less than 1% of online content

Page 4: Fast and Parallel Webpage Layout

So we still need a browser that supports the emerging and diverse class of mobile devices:

A fast and parallel mobile browser

However,
- CPU computational resources are limited.
- The power wall forces hardware architects to apply increases in transistor counts towards improving parallel performance, not sequential performance.

Page 5: Fast and Parallel Webpage Layout

Outline

1.  Problem and background
2.  Challenges
3.  Solutions
4.  Conclusion

Page 6: Fast and Parallel Webpage Layout

Data flow in a browser

Page 7: Fast and Parallel Webpage Layout

Lower bounds on CPU times for loading popular pages (Laptop)

Where are the bottlenecks in loading a page?

Page 8: Fast and Parallel Webpage Layout

Where are the bottlenecks in loading a page?

Layout matching and rendering (34%)

Lower bounds on CPU times for loading popular pages (Laptop)

Page 9: Fast and Parallel Webpage Layout

Layout matching and rendering (34%)

Input: HTML tree, CSS, fonts
Output: absolute element positions

Page 10: Fast and Parallel Webpage Layout

Layout matching and rendering steps

Categories
I.  Selector matching: step 1
II.  Box and text layout: steps 2, 4, 5, 6
III.  Glyph handling: step 3
IV.  Painting or rendering: step 7

Page 11: Fast and Parallel Webpage Layout

Where are the bottlenecks in layout matching and rendering?

Challenges:
1. CSS selector matching
2. Box and text layout solving
3. Glyph rendering

3 < 2 < 1

Page 12: Fast and Parallel Webpage Layout

Outline

1.  Problem and background
2.  Challenges
3.  Solutions
    3.1. CSS selector matching
    3.2. Box and text layout
    3.3. Glyph rendering
4.  Conclusion

Page 13: Fast and Parallel Webpage Layout

3.1 CSS Selector Matching
Match CSS rules with HTML nodes

Selector and style constraints: p img { margin: 10px; }

DOM node with CSS rules: <p> <img blahblah></p>

Page 14: Fast and Parallel Webpage Layout

CSS: a list of selector {rules}

Selector    {Rules}
…id1        r1
…id2        r2
…class1     r3
…tag1       r4
…class2     r5
…class3     r6
…           …

id hash table

attributes  rules
id1         r1
id2         r2
…           …

class hash table

attributes  rules
class1      r3
class2      r5
class3      r6
…           …

tag hash table

attributes  rules
tag1        r4
…           …
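The table-building step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `#`/`.` prefixes, the `CSS_RULES` list, and the function name `build_hash_tables` are assumptions made for the example, and each selector is reduced to its rightmost part, as the "…id1" notation on the slide suggests.

```python
from collections import defaultdict

# Hypothetical rule list modeled on the slide: (rightmost selector part, rule).
# "#" marks an id, "." marks a class, anything else is treated as a tag.
CSS_RULES = [
    ("#id1", "r1"),
    ("#id2", "r2"),
    (".class1", "r3"),
    ("tag1", "r4"),
    (".class2", "r5"),
    (".class3", "r6"),
]


def build_hash_tables(rules):
    """Read the rule list once and index every rule by id, class, or tag."""
    id_hash = defaultdict(list)
    class_hash = defaultdict(list)
    tag_hash = defaultdict(list)
    for selector, rule in rules:
        if selector.startswith("#"):
            id_hash[selector[1:]].append(rule)
        elif selector.startswith("."):
            class_hash[selector[1:]].append(rule)
        else:
            tag_hash[selector].append(rule)
    return dict(id_hash), dict(class_hash), dict(tag_hash)


id_hash, class_hash, tag_hash = build_hash_tables(CSS_RULES)
```

After this one pass, matching a node only needs lookups by its id, classes, and tag instead of a scan over the whole stylesheet.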

Page 15: Fast and Parallel Webpage Layout

id hash table

attributes  rules
id1         r1
id2         r2
…           …

class hash table

attributes  rules
class1      r3
class2      r5
class3      r6
…           …

tag hash table

attributes  rules
tag1        r4
…           …

HTML nodes

node  attributes
n1    id2 class2 class3 tag1
n2    id1 tag1
n3    class1 … …

Map

node  rules
n2    r1
n1    r2
…     …

n3    r3
n1    r5
n1    r6
…     …

n1    r4
n3    r4

Reduce

node  rules
n1    r2 r5 r6 r4
n2    r1 r4
n3    r4 … …
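The map and reduce steps can be sketched as follows. This is a hedged illustration only: the table contents are copied from the slide, but the node attributes in `NODES` (in particular that n3 carries tag1) and the names `map_node` and `reduce_pairs` are assumptions. In CPython the thread pool mostly shows the structure of the computation rather than delivering a real speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hash tables as shown on the slide.
ID_HASH = {"id1": ["r1"], "id2": ["r2"]}
CLASS_HASH = {"class1": ["r3"], "class2": ["r5"], "class3": ["r6"]}
TAG_HASH = {"tag1": ["r4"]}

# Hypothetical nodes: name -> (id, classes, tag).
NODES = {
    "n1": ("id2", ["class2", "class3"], "tag1"),
    "n2": ("id1", [], "tag1"),
    "n3": (None, ["class1"], "tag1"),
}


def map_node(item):
    """Map step: probe each table and emit (node, rule) pairs."""
    name, (node_id, classes, tag) = item
    pairs = [(name, rule) for rule in ID_HASH.get(node_id, [])]
    for cls in classes:
        pairs += [(name, rule) for rule in CLASS_HASH.get(cls, [])]
    pairs += [(name, rule) for rule in TAG_HASH.get(tag, [])]
    return pairs


def reduce_pairs(pair_lists):
    """Reduce step: collect all rules emitted for the same node."""
    matches = defaultdict(list)
    for pairs in pair_lists:
        for name, rule in pairs:
            matches[name].append(rule)
    return dict(matches)


# Nodes are independent of each other, so the map step can run in parallel.
with ThreadPoolExecutor() as pool:
    matched = reduce_pairs(pool.map(map_node, NODES.items()))
```

Because no node's lookup depends on another node's result, the map step needs no locking; only the reduce step gathers results.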

Page 16: Fast and Parallel Webpage Layout

Optimizations adopted by WebKit:
•  Hashtables.
   [×] check CSS repeatedly for every node
   [√] read only once, build hashmap, and check hash
•  Right-to-left matching. Most selectors can be matched by examining only a short suffix of the path.

Other optimizations:
•  Hash tiling. Partition the hashtable into idHash, classHash, tagHash, … to reduce cache misses. (This could also have been done in parallel.)
•  Tokenization. Store attributes as integer tokens instead of strings to save cache space and comparison time.
•  Random load balancing. Assign selector-matching work randomly instead of sequentially, as was done originally.
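The right-to-left matching bullet above can be sketched for the simplest case, a descendant selector made only of tag names such as `p img`. The `Node` class and `matches_descendant` are names invented for this example; a real engine also handles ids, classes, and other combinators.

```python
class Node:
    """A minimal DOM node: a tag name and a link to its parent."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent


def matches_descendant(node, selector_tags):
    """Match a descendant selector such as ["p", "img"] from right to left."""
    *ancestors, last = selector_tags
    if node.tag != last:
        return False  # most nodes are rejected after this single comparison
    current = node.parent
    for wanted in reversed(ancestors):
        # Walk up until the nearest ancestor with the wanted tag is found.
        while current is not None and current.tag != wanted:
            current = current.parent
        if current is None:
            return False
        current = current.parent
    return True


body = Node("body")
p = Node("p", body)
span = Node("span", p)
img = Node("img", span)
```

Starting from the rightmost part means a node whose own tag does not match is discarded without looking at any ancestor, which is why a short suffix of the path is usually enough.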

Page 17: Fast and Parallel Webpage Layout

Other optimizations:
•  Result pre-allocation. Pre-allocate space for popular sites.
•  Delayed set insertion. Preallocate a vector with the size of the potential matches.
•  Non-STL sets. Create the vector with the size of the potential matches, add matches one by one, and do linear collision checks.
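The non-STL set idea can be sketched as a preallocated array with a linear duplicate check. The class name `SmallMatchSet` and its methods are hypothetical, and a Python sketch only shows the logic: the cache behavior that motivates the optimization belongs to the C++ setting of the original work.

```python
class SmallMatchSet:
    """A fixed-capacity set backed by a list that is allocated once."""

    def __init__(self, capacity):
        self._slots = [None] * capacity  # sized for the potential matches
        self._size = 0

    def add(self, rule):
        """Insert a rule unless it is already present; report whether it was added."""
        for i in range(self._size):  # linear collision check
            if self._slots[i] == rule:
                return False
        if self._size == len(self._slots):
            raise OverflowError("capacity was underestimated")
        self._slots[self._size] = rule
        self._size += 1
        return True

    def items(self):
        return self._slots[: self._size]


matches = SmallMatchSet(capacity=4)
first = matches.add("r2")
second = matches.add("r5")
duplicate = matches.add("r2")
```

For the handful of rules that typically match one node, scanning a small contiguous array can be cheaper than maintaining a general-purpose set.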

Page 18: Fast and Parallel Webpage Layout

3.1 CSS Selector Matching Evaluation

Cilk++: Overall 13x and 14.8x with and without Gmail
Intel TBB: Overall 55.2x and 64.8x with and without Gmail

Workstation: 204ms -> 3.5ms
Handheld: 3000ms -> 50ms

Page 19: Fast and Parallel Webpage Layout

3.2 Box and text layout
Input: HTML tree nodes with symbolic constraint attributes
Output: actual layout details (size, shape, position) waiting to be painted into pixels

[Figure: layout constraints input; layout constraints output]

Page 20: Fast and Parallel Webpage Layout

Unfortunately, layout is hard to optimize because of CSS:
•  It is informally written and cross-cutting, e.g. infinite loops
•  It is confusing for webpage designers
•  Engines need to be standards-compliant

Page 21: Fast and Parallel Webpage Layout

Berkeley Style Sheets (BSS)
A new, more orthogonal, concise, well-defined intermediate layout language
• Transformed from CSS
• Specified with an attribute grammar (chances for parallelization)
• BSS0 (vertical and horizontal boxes), BSS1 (BSS0 + shrink-to-fit sizing), BSS2 (BSS1 + left floats)

Page 22: Fast and Parallel Webpage Layout

BSS0 (vertical and horizontal boxes)

Page 23: Fast and Parallel Webpage Layout

Attribute Grammars
Potential for parallelization

[Figure: attribute trees over nodes n1 … n7. calcInherited() passes inherited attributes (Iattr) down the tree; calcSynthesized() combines synthesized attributes (S1 … S9) up the tree. O(log|tree|)]

attr: attribute
Iattr: inherited attribute
S: synthesized attribute
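The two passes named on this slide can be sketched with a toy layout rule. Everything specific here is an assumption made for illustration and not the BSS specification: the `Box` class, the rule that a child inherits its parent's width, the rule that a box is as tall as its stacked children, and the tree shape (n1 at the root, n2 and n3 below it, n4 to n7 as leaves).

```python
from concurrent.futures import ThreadPoolExecutor


class Box:
    def __init__(self, name, children=(), leaf_height=0):
        self.name = name
        self.children = list(children)
        self.leaf_height = leaf_height
        self.width = None   # inherited attribute
        self.height = None  # synthesized attribute


def levels(root):
    """Group nodes by depth; nodes within one level are independent."""
    result, frontier = [], [root]
    while frontier:
        result.append(frontier)
        frontier = [child for node in frontier for child in node.children]
    return result


def calc_inherited(node):
    # Top-down: each child takes its parent's width (toy rule).
    for child in node.children:
        child.width = node.width


def calc_synthesized(node):
    # Bottom-up: a vertical box is as tall as its children stacked.
    if node.children:
        node.height = sum(child.height for child in node.children)
    else:
        node.height = node.leaf_height


def layout(root, page_width):
    root.width = page_width
    by_level = levels(root)
    with ThreadPoolExecutor() as pool:
        for level in by_level:            # top-down pass, one level at a time
            list(pool.map(calc_inherited, level))
        for level in reversed(by_level):  # bottom-up pass
            list(pool.map(calc_synthesized, level))
    return root


n4, n5 = Box("n4", leaf_height=10), Box("n5", leaf_height=20)
n6, n7 = Box("n6", leaf_height=30), Box("n7", leaf_height=40)
n2, n3 = Box("n2", [n4, n5]), Box("n3", [n6, n7])
root = layout(Box("n1", [n2, n3]), page_width=800)
```

Each pass takes one parallel step per tree level, so for a balanced tree the number of steps is logarithmic in the number of nodes.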

Page 24: Fast and Parallel Webpage Layout

3.2 Layout Constraint Solving Evaluation

Slashdot.org, BSS1, Cilk++: 3x~4x

Page 25: Fast and Parallel Webpage Layout

3.3 Glyph Rendering

So far, the sizes and positions of the text have been calculated. How do we render this text?

requests -> request groups -> pull and render

Parallel and locality benefits
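The requests, request groups, and pull-and-render pipeline can be sketched as below. Grouping by font and size is an assumption chosen to illustrate the locality benefit, and `render_glyph` is a hypothetical stand-in that returns a string where a real font library would return a bitmap.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def render_glyph(font, size, char):
    """Hypothetical stand-in for a font-library call; returns a fake bitmap id."""
    return f"{font}/{size}/{char}"


def render_group(item):
    """Render every distinct character requested for one font and size."""
    (font, size), chars = item
    # One worker handles one font/size pair, so font data stays local to it.
    return {(font, size, ch): render_glyph(font, size, ch) for ch in sorted(chars)}


def render_requests(requests):
    groups = defaultdict(set)
    for font, size, char in requests:   # combine requests into groups
        groups[(font, size)].add(char)  # duplicate requests collapse here
    cache = {}
    with ThreadPoolExecutor() as pool:  # groups are rendered independently
        for rendered in pool.map(render_group, groups.items()):
            cache.update(rendered)
    return cache


requests = [
    ("serif", 12, "a"),
    ("serif", 12, "b"),
    ("serif", 12, "a"),
    ("mono", 10, "a"),
]
glyph_cache = render_requests(requests)
```

Grouping first means each glyph is rendered once no matter how often it is requested, and the groups can then be handed to workers without sharing state.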

Page 26: Fast and Parallel Webpage Layout

3.3 Glyph Rendering Evaluation

FreeType2 font library, TBB: 3x~4x

Page 27: Fast and Parallel Webpage Layout

4 Conclusion

Address three bottlenecks of loading a page
1. CSS selector matching
   •  Pre-built hash tables, map-reduce
2. Box and text layout solving
   •  Specify layout as attribute grammars
3. Glyph rendering
   •  Combine requests into groups and render in parallel

Milestone in building a parallel and mobile browser

Page 28: Fast and Parallel Webpage Layout

Thanks~