scalable and private media consumption with...

25
Scalable and private media consumption with Popcorn Trinabh Gupta The University of Texas at Austin

Upload: others

Post on 08-Aug-2020

13 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

Scalable and private media consumption with Popcorn

Trinabh GuptaThe University of Texas at Austin

Page 2: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

90 minutes/day The Godfather

give me The Godfather

database of request trace, movie ratings, etc.

User media consumption has increased …

… leading to large centralized datasets …

??

movie 1

movie 2

movie 3

movie 1

movie 2

movie 3

researcher

anonymized dataset of movie ratings

de-anonymized dataset of movie ratings

… subject to risks such as server hacks, accidental disclosures, etc.

Page 3: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each
Page 4: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

How can we build a Netflix-like system that

a) provably hides media diet,

b) has low dollar cost, and

c) is compatible with commercial media streaming?

Page 5: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

Private Information Retrieval (PIR) provably

hides requests but …

wants

The Godfather The Godfather [hidden]

give me [hidden]

• Each request must touch the entire library.

• There is a tension between overhead and content protection.

• PIR assumes fixed-size objects, but media sizes vary.

Page 6: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

Its per-request dollar cost is 3.87x times that of a non-private baseline.

Popcorn tailors PIR for media to meet our three requirements.

6

Page 7: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

Rest of this talk

• Background on PIR.

• Challenges of using PIR (in detail).

• Design (tailoring of PIR) and evaluation of Popcorn.

7

Page 8: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

Pick a subset of

{1, 2, 3, 4, 5}

randomly

No

collusion

Background on information-theoretic PIR (ITPIR)

Server1

Client

M1 = Reply1 Reply2

Reply1 =

M2 M4

M1

M2

M3

M4

M5

01111001…....

010111000….

10101011……

11100000……

0011000.…….

M1

M2

M3

M4

M5

01111001…....

010111000….

10101011……

11100000……

0011000.…….

Ex: {3, 4}Ex: {1, 2, 4, 5}

Server 2

{2, 4}

Reply 2 =

M1 M2 M4

wants M1

Page 9: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

Computational PIR (CPIR) from 10,000 feet

Client

M1

M2

M3

M4

M5

01111001…....

010111000….

10101011……

11100000……

0011000.…….

• one server

• instead of XORs, expensive server-side cryptographic operations

Page 10: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

Given these, how can we build a system that controls content and is low cost?

cheap operations (XORs)

but process entire library

per request

CPIR

expensive operations and

process entire library per

request

ITPIR

assumes fixed-size objects assumes fixed-size objects

content can disseminate in

an uncontrolled manner

content disseminates

in a controlled manner

Challenges of using PIR

Page 11: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

Server 1

(library owner)

Client

Server 2

ITPIR

CPIR

Popcorn composes ITPIR and CPIR to get desirable properties from both

Enc(K1, M1)

Enc(K2, M2)

Enc(K3, M3)

Enc(K4, M4)

Enc(K5, M5)

Enc(K1, M1)

Enc(K2, M2)

Enc(K3, M3)

Enc(K4, M4)

Enc(K5, M5)

different

administrative domainsEnc(K1, M1)

K1

K1 K2 K3 K4 K5

Key library

encrypted movie

key to decrypt movie

Page 12: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

cheap operations (XORs)

but process entire library

per request

CPIR

expensive operations and

process entire library per

request

ITPIR

assumes fixed-size objects assumes fixed-size objects

content can disseminate in

an uncontrolled manner

content disseminates in

a controlled manner

Popcorn

Challenges of using PIR

Page 13: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

Reply = M1 M3 M4 M5

Observation: Very similar disk I/O for each request!

Benefits of batching:

• Disk I/O transfers are amortized.

• CPU cycles are reduced as matrix multiplication algorithms

exploit cache locality.

Popcorn batches requests to amortize the overhead of ITPIR

{1, 3, 5}

{1, 3, 4, 5}

{2, 4}

Reply = M1 M3 M5

Reply = M2 M4

M1

M2

M3

M4

M5

01111001…....

010111000….

10101011……

11100000……

0011000.…….

Server1

Pick a subset of

{1, 2, 3, 4, 5}

randomly

Client 1Client 2

Client 3

Page 14: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

time

client A client B client C

Straw man: Group requests that arrive during an epoch

epoch

client Await for

server to

form batch

start handling

A, B, C

client A’s playback buffer

client perceived delay = epoch + epsilon

first chunk

of movie

Client’s view:

Page 15: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

time

client A client B client C

Straw man: Group requests that arrive during an epoch

epoch start handling

A, B, C

Small batch, small delay Large batch, large delay

Issue: Hard to get both small delay and large batch

Server’s choices:

Page 16: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

t = times at which a client needs movie chunks

Popcorn exploits streaming to form large batches with small startup delay

t = 0

chunks of a movie

t =

t = 2

= time it takes

to consume a

single chunk

Observation: Client needs only the first chunk immediately.

t

t

tt = 3t

Page 17: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

2nd library

column

1st library

column

0101111101101101001010010010010111001101111

Narrow first

column => small

startup delay

3rd library

column

0101111 10110110100 1011010010010010111001101111

1011011

1001001

00100010011

11100011101

Movie 1

Movie 2

Movie 3

0000100011110001110100100100

0011111000000011010101010111

1001001 11100011101Movie 4 0011111000000011010101010111

Wider columns => longer

processing times …

… but bigger batches

Movie 1

Page 18: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

ITPIR CPIR Popcorn

content can

disseminate in an

uncontrolled manner

content disseminates

in a controlled

manner

content disseminates

in a controlled

manner

cheap operations

but process entire

library per request

expensive operations,

process entire

library per request

cheap operations,

process entire

library per batch

assumes fixed-size

objects

assumes fixed-size

objects ?

Page 19: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

Popcorn exploits compression to address fixed-size requirement

Length

Pad

• Small variations in bitrate have limited impact on user satisfaction [SIGCOMM 11, LANC 11, CCNC 12].

• 85% of movies close to the average size.

Length of Oavg

Page 20: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

Outline

Background on PIR.

Design (tailoring of PIR) of Popcorn.

• Evaluation of Popcorn.

Popcorn

Page 21: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

Experiment methodBaselines:

• Non-private system (Apache server)

• State-of-the-art CPIR [XPIR PETS16]

• State-of-the-art ITPIR [Percy++]

• ITPIR++: ITPIR extended with the straw man batching scheme

Netflix-like library: 8000 movies, 90 minutes, 4Mbps

Workload: 10K clients arrive within 90 minutes according to a

Poisson process

Estimate per-request dollar cost using Amazon’s pricing model• CPU: $0.0076/hour

• Disk I/O bandwidth: $0.042/Gbps-hour

• Network: $0.006/GB

Page 22: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

System # of

CPUs

Disk I/O

(Gbps)

Network

(relative to

non-private)

$ relative

to non-

private

Non-private 0 0 1x 1x

CPIR 11.6 64 5x 265x

Popcorn

(delay 15s)

0.74 0.23 2x 3.87x

ITPIR 3.1 64 2x 256x

ITPIR++

(delay 15s)

0.65 3 2x 14x

Page 23: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

Popcorn is private and affordable but …

• Assumes that the ITPIR servers do not collude.

• Incurs costs that are linear in the size of the library.

• Does not support recommendations, aggregate

view statistics.

Solution: Use prior work [Canny S&P ’02, Toubiana et al. NDSS

‘10]

Page 24: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

Related work

• Improving performance of PIR. • Distributing work [FC13, TDSC12], cheaper crypto [PETS16, ESORICS14,

ISC10, TKDE13, WEWoRC07], bucketing [DBSec10, PETS10 ], batching [FC15,

JoC04], secure co-processors [PET03, FAST13, NDSS08, IBM Systems Journal01]

• Protecting library content in ITPIR [RANDOM98, S&P07, WPES13]

• Handling variable-sized objects [CCSW14, NDSS13]

• Prior PIR implementations [Percy++, PETS16, CCSW14]

• Video-on-demand [MMCN95]

Page 25: Scalable and private media consumption with Popcorndimacs.rutgers.edu/Workshops/BigDataHub/Slides/gupta-popcorn.pdf · The Godfather The Godfather [hidden] give me [hidden] •Each

Take-away points from Popcorn

• It is possible to build a private, functional, and low-cost media delivery system …

• … by tailoring PIR to media delivery.

• The per-request cost in Popcorn is 3.87x that of a non-private baseline.