privacy-aware regression modeling of participatory sensing data

32
Hossein Ahmadi, Nam Pham, Raghu Ganti, Tarek Abdelzaher, Suman Nath, Jiawei Han Pallavi Arora Privacy-aware Regression Modeling of Participatory Sensing Data

Upload: renate

Post on 24-Feb-2016

53 views

Category:

Documents


0 download

DESCRIPTION

Privacy-aware Regression Modeling of Participatory Sensing Data. Hossein Ahmadi , Nam Pham, Raghu Ganti , Tarek Abdelzaher , Suman Nath , Jiawei Han Pallavi Arora. Outline. Introduction Problem Formulation Linear regression Privacy Filter Application Server Model Construction - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Privacy-aware Regression Modeling of Participatory Sensing Data

Hossein Ahmadi, Nam Pham, Raghu Ganti, Tarek Abdelzaher, Suman Nath, Jiawei Han

Pallavi Arora

Privacy-aware Regression Modeling of Participatory

Sensing Data

Page 2: Privacy-aware Regression Modeling of Participatory Sensing Data

IntroductionProblem Formulation

Linear regressionPrivacy FilterApplication Server

Model ConstructionPrivacy AnalysisCase StudyDiscussionRelated WorkConclusion

Outline

Page 3: Privacy-aware Regression Modeling of Participatory Sensing Data

Crowdsource aka Participatory SensingPredict Statistics or Extrapolate from collected

data approach in paperPrivate data Public model

Private Data Samples Population density + Eco-friendly behavior Pollution

Model (Public)

Predict Pollution elsewhere.

Introduction

Page 4: Privacy-aware Regression Modeling of Participatory Sensing Data

Analyzes relationship between two variables, X and y

Error (Zero mean const variance)

Output Input Regression CoefficientsGiven X and y estimate β.Regression Model

Data (combination of X and y) Model (β)Given X and β predict y.

Linear Regression

Page 5: Privacy-aware Regression Modeling of Participatory Sensing Data

Private PublicUsage of electricity + Time of year Energy

consumption (Model)Given usage pattern predict energy consumption.Help users save on energy cost.

How much gas a vehicle will spend on a given route? How much energy a household will save if they

installed motion-activated light controls?How much weight a 300lb person might lose if

engaged in a particular diet and exercise routine?

Example

Page 6: Privacy-aware Regression Modeling of Participatory Sensing Data

Ensure anonymitySecurity mechanism users modify data,

PerturbationIrrecoverably alter data Approach in paper.

Sharing private data

Page 7: Privacy-aware Regression Modeling of Participatory Sensing Data

Problem Formulation

Page 8: Privacy-aware Regression Modeling of Participatory Sensing Data

Data (time series) output variables (e.g., household energy consumption)+ input variables (good predictors of output).

Data Neutral FeaturesReconstruction

Compute private data from features.Higher reconstruction error higher privacy.

Problem Formulation

Page 9: Privacy-aware Regression Modeling of Participatory Sensing Data

The model relating user inputs to the outputs is public.

Each data sample collected by an individual is private and may not be revealed.

The models used in the service are linear in coefficients.

The time-series data can be packed into uncorrelated data samples by aggregation (over time for example).

Assumptions

Page 10: Privacy-aware Regression Modeling of Participatory Sensing Data

Minimize the modeling errorAccuracy = No Alteration Accuracy.Perfect modeling

Maximize the reconstruction (breach) errorPerfect Neutrality

Information with shared data = information w/o shared data

Design Goals

Page 11: Privacy-aware Regression Modeling of Participatory Sensing Data

Data SegmentationAggregation over time to remove correlation

Sum/average.Length of time interval a day? a month?

Large enough to remove correlation.Result in accurate prediction.Usable by participatory sensing application.Depends on application.

Privacy Filter

Page 12: Privacy-aware Regression Modeling of Participatory Sensing Data

Segmentation n data points with d input values.Time independent data.yi to denote the value of the output attribute in the

ith segmentxij to denote the value of jth input of segment iEstimate yi using

Does not prevent privacy appliance usage + temperature inside a house each

month show whether a residence is occupied or not in a particular month.

Segmentation

Page 13: Privacy-aware Regression Modeling of Participatory Sensing Data

Input variableOutput variablePredictor variable and

denote

Model of system

Neutral Features

Page 14: Privacy-aware Regression Modeling of Participatory Sensing Data

Neutral Features correlations of data

Size of data independent of number of samples n.

Large n larger privacy.

Neutral Features

Constant O(n2)

Vector of length k O(kn2)

Matrix of size k*k O(k2n2)

Page 15: Privacy-aware Regression Modeling of Participatory Sensing Data

Construct regression modelLeast Square Estimator (LSE)

Let u1, . . . ,um be the m users of the participatory sensing application and provide

Let

The Application Server

Page 16: Privacy-aware Regression Modeling of Participatory Sensing Data

Define

The Application Server

Page 17: Privacy-aware Regression Modeling of Participatory Sensing Data

Model coefficients

Only uses the neutral features….YEAHExact model construction.

Regression Error

Error using neutral features

The Application Server

Page 18: Privacy-aware Regression Modeling of Participatory Sensing Data

Reconstruction Error

Reconstruction Error of mean values

Effective reconstruction If reconstruction err < 1

Privacy Enabling TransformationsIf reconstruction err > 1

Privacy Analysis

Segmented data

Reconstructed data

Variance of reconstructed data

Page 19: Privacy-aware Regression Modeling of Participatory Sensing Data

Optimal Reconstruction find the values Yu and Wu that produce the

given transformed matrices ρu, νu, Θu while maximizing the joint probability of observing such values.

Probability of observing values (known to attacker)

Privacy Enabling Properties

Page 20: Privacy-aware Regression Modeling of Participatory Sensing Data

Constraints and data points If data points < constraints 100%

reconstruction 0% privacyIf n infinity, Optimal solution difficult to construct private data. Constraints ≠ Affine non- convex optimization NP hard Exponential time in number of variables.

Inaccuracy and Inefficiency of Reconstruction

Page 21: Privacy-aware Regression Modeling of Participatory Sensing Data

Assumption Maximum likelihood is obtained if solution is close to the expected value also n is known.

KNITRO non-linear solver.

Conditions to Protect Privacy

Page 22: Privacy-aware Regression Modeling of Participatory Sensing Data

Best value of n?? Number of constraints = number of variables

Simulationn >

k high reconstruction errorn

< k

sin

gle

feas

ible

sol

utio

n

Page 23: Privacy-aware Regression Modeling of Participatory Sensing Data

Vertical correlation correlation among different attributes

Horizontal correlationcorrelation within a single attribute

Correlation

Page 24: Privacy-aware Regression Modeling of Participatory Sensing Data

Conjecture: If n > 2k error 1.

Page 25: Privacy-aware Regression Modeling of Participatory Sensing Data

Predict fuel efficient routeCompare

White noise Perturbation techniqueProposed method

Case Study

Page 26: Privacy-aware Regression Modeling of Participatory Sensing Data

ClientC++Data trace file

Location trace from GPSConfiguration file

Unique application IDSegmentation intervalSegmentation attributes(e.g.

time) Euclidean distance between

valuesPredictor function map X W. Feature Matrices

Transferred as XML to server

Case Study• Server• C++• List of models with unique application ID• Create aggregation matrices

Page 27: Privacy-aware Regression Modeling of Participatory Sensing Data

Data 16 users (different cars), different cars, 3 monthsGeo-tagged engine sensor measurement650 segments each ~ 2miles. Input

w1 = m(ST +v TL) m and v Mass and Velocity of vehicle ST Number of stop signs TL Number of traffic lights

w2 = m v2

w3 = mw4 = Av2 A frontal area of car

Output Fuel consumption

Case Study

Page 28: Privacy-aware Regression Modeling of Participatory Sensing Data

Reconstruction error

Case Study

Page 29: Privacy-aware Regression Modeling of Participatory Sensing Data

Dependence on number of samples

High error for n > 2k

Case Study

Page 30: Privacy-aware Regression Modeling of Participatory Sensing Data

Case Study

Page 31: Privacy-aware Regression Modeling of Participatory Sensing Data

RandomizationPerturbationDifferential PrivacyError in modeling

k-anonymityLoss of useful information

Distributed privacy preservationHorizontal or vertical partition aggregate featuresFine grained control to user to prevent his privacy.

Cryptographic techniquesHomographic encryptionComputationally expensive Limited scope

Related work

Page 32: Privacy-aware Regression Modeling of Participatory Sensing Data

Regression model same as from private data.Derive a safe number of samples.Study privacy.Neutral features high Reconstruction error .Quantification of privacy does not capture all

privacy breachesDistribution of original data is narrowHigher correlation easy reconstruction.

Can not guarantee privacy in theory.

Conclusion