Interactive Latency in Big Data Visualization
Zhicheng "Leo" Liu, Research Scientist at the Creative Technologies Lab at Adobe Research
January 22nd, 2014
Big Data Visualization Meetup - South Bay
http://www.meetup.com/Big-Data-Visualisation-South-Bay/
Reducing interactive latency is a central problem in visualizing large datasets. I discuss two inter-related projects in this problem space. First, I present the imMens system and show how we can achieve real-time interaction at 50 frames per second for billions of data points by combining techniques such as data tiling and parallel processing. Second, I discuss an ongoing user study that aims to understand the effect of interactive latency on human cognitive behavior in exploratory visual analysis.
Latency: a measure of time delay experienced in a system
rotational latency
network latency
query latency
interactive latency
Questions
How to reduce interactive latency in big data visualization?
How does interactive latency affect user behavior?
Reducing Latency
More memory: in-memory data store
Clever indexing: cube representation schemes
Parallel processing: multicore, GPGPU, distributed platforms
imMens: a holistic approach
Perceptual scalability: binned aggregation as primary data reduction strategy
Interactive scalability: multivariate data tiles; parallel query processing and rendering on GPU
[Liu et al. 2013]
Guiding Principle
Perceptual & interactive scalability should be limited by the chosen resolution of the visualized data, not the number of records.
[Figure sequence: Data; Alpha-blending; Sampling; Modeling; Binned Aggregation]
Google Fusion Tables: Sampling
[Figure panels: Sampling; Aggregation]
Binned Plots: Design Space
[Figure: grid of binned plot types. Columns: numeric, ordinal/categorical, temporal, geographic. Rows: 1D, 2D]
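A minimal sketch of binned aggregation for a 2D numeric plot, assuming NumPy and synthetic points. The helper name `bin_2d`, the bin count, and the data are illustrative and not taken from imMens. It illustrates the guiding principle: the aggregate has as many cells as the chosen resolution, whatever the number of records.

```python
import numpy as np

def bin_2d(x, y, bins, extent):
    """Count records per cell of a bins x bins grid over extent = (lo, hi)."""
    counts, _, _ = np.histogram2d(x, y, bins=bins, range=[extent, extent])
    return counts.astype(np.int64)

rng = np.random.default_rng(0)
# Two synthetic datasets that differ only in record count (hypothetical data).
small = bin_2d(rng.random(1_000), rng.random(1_000), bins=16, extent=(0.0, 1.0))
large = bin_2d(rng.random(100_000), rng.random(100_000), bins=16, extent=(0.0, 1.0))

# Both aggregates have the same number of cells to store and draw.
print(small.shape, large.shape)
print(int(small.sum()), int(large.sum()))
```

Only the counts inside the cells grow with the data; the grid that has to be queried and rendered stays at 16 x 16.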
Demo
Multivariate Data Tiles
Projections / Materialized database views
Provide data for dynamic visualization
Much faster than a traditional data cube
Brush & Link: A Naïve Approach
[Figure: data cube with dimensions X, Y, Month (0-11), Day (0-30), Hour (0-23)]
12 x 31 x 24 x 512 x 512 = ~2.3 billion cells
Brushing Over January
[Figure: the same cube over X, Y, Month (0-11), Day (0-30), Hour (0-23), brushed over January]
31 x 24 x 512 x 512 = ~195 million cells
Sum Along Day
[Figure: the cube with Day collapsed to the range [0-30]; remaining axes X, Y, Month (0-11), Hour (0-23)]
24 x 512 x 512 = ~6 million cells
Sum Along Hour
[Figure: the cube with Day collapsed to [0-30] and Hour collapsed to [0-23]; remaining axes X, Y, Month (0-11)]
512 x 512 cells
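The naive brushing steps on the preceding slides can be sketched as array operations. This is an illustration under stated assumptions: NumPy, a synthetic cube, and a scaled-down 8 x 8 spatial resolution instead of 512 x 512 so that the example stays small. The axis order (month, day, hour, X, Y) is also an assumption.

```python
import numpy as np

# Hypothetical, scaled-down cube with axes (month, day, hour, X, Y).
rng = np.random.default_rng(1)
cube = rng.integers(0, 5, size=(12, 31, 24, 8, 8))

january = cube[0]               # brush over month 0: day x hour x X x Y
by_hour = january.sum(axis=0)   # sum along day: hour x X x Y
heatmap = by_hour.sum(axis=0)   # sum along hour: X x Y

print(january.size, by_hour.size, heatmap.size)

# Cell count of the full cube at the resolution quoted on the slides.
full_cells = 12 * 31 * 24 * 512 * 512
print(full_cells)
```

Every brush has to touch all cells of the selected slice before the linked heatmap can be redrawn, which is what makes this approach slow at full resolution.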
Decomposing a Data Cube
For any pair of 1D or 2D binned plots, the maximum number of dimensions needed to support brushing & linking is 4.
[Figure: the full 5-D cube (X, Y, Month, Day, Hour) decomposed by summation (Σ) into 3-D cubes: Month-Day-Hour, X-Y-Hour, X-Y-Day, X-Y-Month]
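A small sketch of the decomposition, assuming NumPy, a synthetic cube, and the axis order (month, day, hour, X, Y). Spatial resolution is scaled down to 8 x 8 for the arrays; the cell counts at the end use the 512 x 512 resolution from the slides. It checks that a low-dimensional projection answers the same brush as the full cube.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical scaled-down 5-D cube with axes (month, day, hour, X, Y).
cube = rng.integers(0, 5, size=(12, 31, 24, 8, 8))

# One projection per linked pair: sum out the dimensions the pair does not show.
xy_month = cube.sum(axis=(1, 2))        # month x X x Y
month_day_hour = cube.sum(axis=(3, 4))  # month x day x hour

# Brushing January and linking to the X-Y heatmap needs only the X-Y-Month cube.
from_projection = xy_month[0]
from_full_cube = cube[0].sum(axis=(0, 1))
assert np.array_equal(from_projection, from_full_cube)

# Cell counts at the slide resolution: full cube vs. the four 3-D cubes.
full_cells = math.prod((12, 31, 24, 512, 512))
projected_cells = (
    math.prod((512, 512, 12))
    + math.prod((512, 512, 31))
    + math.prod((512, 512, 24))
    + math.prod((12, 31, 24))
)
print(full_cells, projected_cells)
```

The projections are computed once ahead of time, so a brush becomes a lookup plus a small sum rather than a pass over the full cube.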
Tiles
[Figure: a cube split into tiles. Labels: X: 256-511, X: 512-767; Y: 512-767, Y: 768-1023; Day: 31 bins; Y: 512-1023; day: 0-31]
From Datacube to Data Tiles
[Figure: four data tiles, covering X 512-767 and X 768-1023 by Y 256-511 and Y 512-767, each with Day bins 0-30]
Data Tiles
x1-y1-month
x1-y1-day
x1-y1-hour
x1-y2-month
x1-y2-day
x1-y2-hour
x2-y1-month
x2-y1-day
x2-y1-hour
x2-y2-month
x2-y2-day
x2-y2-hour
month-day-hour
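The tile list above can be reproduced with a short sketch that splits each X-Y projection into a 2 x 2 grid of tiles. It assumes NumPy and synthetic data; the helper `split_xy`, the dictionary layout, and the scaled-down 8 x 8 spatial resolution (tile size 4 rather than 256) are illustrative choices, not the imMens implementation.

```python
from itertools import product
import numpy as np

def split_xy(projection, tile_size):
    """Split an (X, Y, bins) array into square X-Y tiles, keyed by 1-based (i, j)."""
    nx, ny, _ = projection.shape
    tiles = {}
    for i, j in product(range(nx // tile_size), range(ny // tile_size)):
        xs = slice(i * tile_size, (i + 1) * tile_size)
        ys = slice(j * tile_size, (j + 1) * tile_size)
        tiles[(i + 1, j + 1)] = projection[xs, ys, :]
    return tiles

rng = np.random.default_rng(3)
bins = {"month": 12, "day": 31, "hour": 24}

tiles = {}
for name, n in bins.items():
    projection = rng.integers(0, 5, size=(8, 8, n))  # synthetic X-Y-<name> cube
    for (i, j), tile in split_xy(projection, tile_size=4).items():
        tiles[f"x{i}-y{j}-{name}"] = tile
# The non-spatial cube is small enough to keep as a single tile.
tiles["month-day-hour"] = rng.integers(0, 5, size=(12, 31, 24))

print(len(tiles))
print(sorted(tiles))
```

Keying tiles by their X and Y segment means a zoomed or panned view only needs the tiles that intersect the visible range.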
imMens Architecture
[Figure: architecture diagram. Labels: SciDB, Postgres; Client; Server; UI control; Visualization; specify; brush & link; zoom & pan]
Client-Side Processing
Pack data tiles as images (352KB for Brightkite)
Bind to WebGL context as textures
Pass 1: query fragment shader, projections to off-screen FBO
Pass 2: render fragment shader, to canvas
[Figure: data tiles stored as rows of R G B A values, e.g. X [512-767], Y [768-1023], bins 0 to 11]
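One way to picture "packing data tiles as images" is to store each bin count in the four 8-bit channels of an RGBA pixel. The slides do not state the exact encoding, so the layout below (32-bit count, R as the high byte) is an assumption for illustration, written with NumPy rather than WebGL.

```python
import numpy as np

def pack_rgba(counts):
    """Encode non-negative counts below 2**32 as one RGBA pixel each (R = high byte)."""
    c = np.asarray(counts, dtype=np.uint32)
    shifts = np.array([24, 16, 8, 0], dtype=np.uint32)
    return ((c[..., None] >> shifts) & 0xFF).astype(np.uint8)

def unpack_rgba(pixels):
    """Inverse of pack_rgba: recombine the four channels into a count."""
    p = pixels.astype(np.uint32)
    return (p[..., 0] << 24) | (p[..., 1] << 16) | (p[..., 2] << 8) | p[..., 3]

# Hypothetical 2 x 3 tile of bin counts.
tile = np.array([[0, 1, 255], [256, 65536, 4_000_000_000]])
pixels = pack_rgba(tile)

print(pixels.shape, pixels.dtype)
print(np.array_equal(unpack_rgba(pixels), tile))
```

In a shader-based pipeline the unpacking step would run per fragment on the GPU; here it is shown on the CPU only to make the round trip easy to check.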
Performance Benchmarks
Simulate brush & linking across plots in a scatter plot matrix
imMens vs. full data cube
60 synthesized datasets
Parameters:
bin count per dimension (10, 20, 30, 40, 50)
number of records (10K, 100K, 1M, 10M, 100M, 1B)
number of dimensions (4, 5)
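The benchmark parameters multiply out to the 60 datasets mentioned above. A short sketch of that grid follows; the dictionary field names are hypothetical, and `cube_cells` assumes every dimension uses the same bin count, as "bin count per dimension" suggests.

```python
from itertools import product

bin_counts = [10, 20, 30, 40, 50]
record_counts = [10**4, 10**5, 10**6, 10**7, 10**8, 10**9]
dimension_counts = [4, 5]

# One configuration per combination of the three parameters.
configs = [
    {"bins": b, "records": n, "dims": d, "cube_cells": b ** d}
    for b, n, d in product(bin_counts, record_counts, dimension_counts)
]

print(len(configs))
# Size of the largest full data cube in the grid.
largest = max(c["cube_cells"] for c in configs)
print(largest)
```

The full-cube size depends on bins and dimensions only, while the record count varies independently, which is what lets the benchmark separate the two effects.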
Google Chrome v.23.0.1271.95 on a quad-core 2.3 GHz MacBook Pro (OS X 10.8.2) with per-core 256K L2 caches, shared 6MB L3 cache and 8GB RAM. PCI Express NVIDIA GeForce GT 650M graphics card with 1024MB video RAM.
[Chart data labels: 51.9, 52.3, 51.6, 52.0, 53.2, 52.1; 5.5, 3.0, 2.2]
50fps querying and rendering of 1B data points
Speed of Thought?
Newell (1994): Unified Theories of Cognition
Newell (1994) | Card et al. (1983) | Example | Time Range
deliberate act | perceptual fusion | recognize a pattern, track animation | ~100 milliseconds
cognitive operation | unprepared response | click a link, select an object | ~1 second
unit task | unit task | edit a line of text, make a chess move | ~10 seconds
~300ms: The Embodiment Level
Deictic Strategy
Pointing movements bind objects in the world
Small changes in cost of binding cause different cognitive behavior
Latency affects high-level/longitudinal strategies
Block-copying: Ballard et al. (1995, 1997)
8-puzzle solving: O’Hara and Payne (1998, 1999)
Search: Brutlag (2009)
Exploratory Visual Analysis?
Latency Conditions
Operation | Low | High
brush & link | ~20ms | ~20ms + 500ms
select | ~20ms | ~20ms + 500ms
pan | ~100ms | ~100ms + 500ms
zoom | ~1000ms | ~1000ms + 500ms
Datasets
Study Design
16 participants, 32 observations
2 x 2 between subject
interaction logs
audio transcripts
Log Events
System and Mouse Events
brush, select, zoom, pan, clear, color slider, log scale
tiles cached, mouse down, mouse up, mouse move
Trigger vs. Processed System Events
debouncing keeps system usable
timestamp, event type, parameters
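The difference between triggered and processed events can be illustrated with a trailing-edge debounce over a log of (timestamp, event type, parameters) records. The slides do not say how the study system debounced events, so the function `debounce`, the 50 ms window, and the sample log are assumptions.

```python
def debounce(events, wait_ms):
    """Keep only the last event of each burst.

    events: list of (timestamp_ms, event_type, params), sorted by timestamp.
    An event is processed only if no later event follows within wait_ms.
    """
    processed = []
    for current, following in zip(events, events[1:] + [None]):
        if following is None or following[0] - current[0] >= wait_ms:
            processed.append(current)
    return processed

# Hypothetical trigger log: three rapid brush events, then a pan.
triggered = [
    (0, "brush", {"x": 10}),
    (5, "brush", {"x": 11}),
    (9, "brush", {"x": 12}),
    (200, "pan", {"dx": 3}),
]
processed = debounce(triggered, wait_ms=50)

print(len(triggered), len(processed))
```

Logging both lists keeps the raw user intent (triggers) separate from the work the system actually performed (processed events), which matters when latency is added to each processed operation.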
Normalized Processed Events
How to Evaluate Performance?
The purpose of visualization is insight, not pictures.
Counting Insights
What is an insight?
"many new airlines emerged around year 2003"
"HP started in 2001, AS in 2003, PI in 2004, OH in 2003"
"OH started in 2003, and they are doing pretty well in terms of delays"
Questions
How to reduce interactive latency in big data visualization?
imMens: a system supporting real-time interaction
binned aggregation for perceptual scalability
multivariate data tiles & GPU processing for low latency
How does interactive latency affect user behavior?
Comparative study: quantitative & qualitative analysis
Acknowledgment
Jeffrey Heer, Biye Jiang
Thank You