
Posted on 22-May-2020


MLCI 2017 Software Project

Odnan Ref Sanchez

University of Genoa, Italy

Telecommunication Networks and Telematics Laboratory (TNT), (DITEN)

Artificial Intelligence Laboratory, (DIBRIS)

A short presentation on:


Evaluation of IoT Privacy using Bayesian Networks

Introduction

• For a user, managing personal data is difficult because of the complex combination of her/his many devices and third-party apps.

• It has been forecast that an individual would have, on average, approximately 3.4 networked devices by the year 2020 [1].

• Non-Hispanic white users have the fewest third-party apps installed on their phones (25.9 on average), and African-American users the most (30.3).

• Among personal IoT devices, fitness trackers have the most sensors and can collect the most sensitive information.

• In the field of privacy and security, research has mostly focused on protecting users from external attackers or against operating-system vulnerabilities.

• New risks have emerged from threats posed by trusted service providers and third-party applications that the user has authorized.

• The goal of the project: help users cope with the complexity of the IoT paradigm and alert them to possible risks of privacy breaches.

• Inference-related privacy risks were computed through a Bayesian Network inference graph.

Framework Overview

[Figure: General system workflow. Elements: IoT-Tpi app installation, PDM Statement, AID-S Inference, AID-S Transformation, PDM Notification, user accept/refuse, negotiation, and the checks "D(accept) < D(request)?" and "Both parties OK?" with Yes/No branches.]

Data Sets

• Two datasets available on the Web, with a total of 19,817 user data samples from 49 users:

• 14 Fitbit users from the Open Humans Foundation

• 35 users from the crowd-sourced Fitbit dataset generated by respondents to a distributed survey via Amazon Mechanical Turk

• p-values < 2.2e-16.

• The datasets consist of time series data on the user's:

• number of steps

• distance traveled

• minutes of activity

• floors taken

• elevation

• activity

• calories

• weight

• minutes of sleep

• heart rate information

Overview of the Methods

• The R language was used.

• The bnlearn package was used to learn the Bayesian Network.

• The network's probabilities (its parameters) were estimated from the dataset using Maximum Likelihood Estimation.

• The Gaussian log-likelihood loss (also known as negative entropy or negentropy) was used as the loss function for the validation.

• The Mean Squared Error was used to measure accuracy in the continuous-variable case.

• The prediction (classification) error was used to measure accuracy in the discrete-variable case.

• Interval discretization (intervals of equal length) was used to discretize the continuous variables.
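Two of the steps above, interval discretization and Maximum Likelihood estimation of a discrete local distribution, can be illustrated with a minimal Python sketch. It is not bnlearn's implementation. It assumes equal-width intervals between the observed minimum and maximum and a node with a single parent. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def interval_discretize(values: Sequence[float], n_states: int) -> List[int]:
    """Map each value to one of n_states equal-width intervals, labelled 0..n_states-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_states
    if width == 0:  # constant variable: everything falls in one state
        return [0] * len(values)
    # The maximum lies exactly on the upper edge, so clamp it into the last interval.
    return [min(int((v - lo) / width), n_states - 1) for v in values]


def mle_cpt(child: Sequence[int], parent: Sequence[int]) -> Dict[Tuple[int, int], float]:
    """MLE of P(child = c | parent = p) as relative frequencies, keyed by (p, c)."""
    joint = Counter(zip(parent, child))
    parent_counts = Counter(parent)
    return {(p, c): n / parent_counts[p] for (p, c), n in joint.items()}


# Hypothetical usage with four equal-width states.
print(interval_discretize([0.0, 1.0, 2.0, 3.0, 4.0], 4))  # [0, 1, 2, 3, 3]
print(mle_cpt(child=[0, 1, 1, 1], parent=[0, 0, 1, 1]))   # {(0, 0): 0.5, (0, 1): 0.5, (1, 1): 1.0}
```

For discrete data, the Maximum Likelihood estimate is simply the relative frequency of each child state within each parent configuration. Parent configurations that never occur in the data get no entry.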

Bayesian Network

• The validation was repeated 100 times.

• The expected loss values are reported in the Table with their corresponding standard deviation, 𝜎.

• Constraint-based and pairwise mutual-information (local discovery) structure learning algorithms (i.e., Max-Min Parents and Children (MMPC), Semi-Interleaved HITON-PC (SI-HITON-PC), Chow-Liu (CHOW-LIU), and ARACNE (ARACNE))

• Score-based structure learning algorithms (i.e., Hill-Climbing (HC) and Tabu Search (Tabu))

• Hybrid structure learning algorithms (i.e., Max-Min Hill-Climbing (MMHC) and General 2-Phase Restricted Maximization (RSMAX2))
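The repeated validation can be sketched as follows. The sketch makes two assumptions: that the validation is k-fold cross-validation repeated `runs` times, and that 𝜎 is the standard deviation of the per-run expected loss. To keep it self-contained, the "network" is a single Gaussian node, and the loss is the negated average Gaussian log-likelihood of the held-out fold. The names, the default k = 10, and the seed are illustrative choices, not taken from the project.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Model = Tuple[float, float]  # (mean, variance) of a single Gaussian node


def fit_gaussian(train: Sequence[float]) -> Model:
    """Maximum-likelihood mean and variance (the variance divides by n, not n - 1)."""
    mu = statistics.fmean(train)
    return mu, statistics.pvariance(train, mu)


def gaussian_logl_loss(model: Model, test: Sequence[float]) -> float:
    """Negated average Gaussian log-likelihood of the test fold (lower is better)."""
    mu, var = model
    log_liks = [-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var) for x in test]
    return -statistics.fmean(log_liks)


def repeated_cv(data: Sequence[float], k: int = 10, runs: int = 100, seed: int = 0) -> Tuple[float, float]:
    """Repeat k-fold CV `runs` times; return the mean and standard deviation of the per-run loss.

    Requires len(data) >= k (no empty fold) and runs >= 2 (for the standard deviation).
    """
    rng = random.Random(seed)
    run_losses: List[float] = []
    for _ in range(runs):
        shuffled = list(data)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]  # k roughly equal folds
        fold_losses = []
        for i, test in enumerate(folds):
            train = [x for j, fold in enumerate(folds) if j != i for x in fold]
            fold_losses.append(gaussian_logl_loss(fit_gaussian(train), test))
        run_losses.append(statistics.fmean(fold_losses))
    return statistics.fmean(run_losses), statistics.stdev(run_losses)


# Hypothetical usage on synthetic values (not the Fitbit data).
expected_loss, sigma = repeated_cv([float(x) for x in range(1, 51)], k=10, runs=100)
print(f"expected loss = {expected_loss:.3f}, sigma = {sigma:.3f}")
```

Reshuffling before each run is what makes the 100 repetitions differ. 𝜎 therefore reflects how sensitive the expected loss is to the fold assignment.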


Bayesian Network

The chosen Bayesian Network, generated by the CHOW-LIU algorithm
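Chow-Liu learns a tree by keeping the maximum-weight spanning tree of pairwise mutual information. Below is a minimal Python sketch for continuous variables. It assumes they are jointly Gaussian, so that mutual information is -0.5 * ln(1 - rho^2), where rho is the Pearson correlation. The function names and the data are synthetic and hypothetical, not the Fitbit data, and this is not bnlearn's code.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi(x: Sequence[float], y: Sequence[float]) -> float:
    """Mutual information of two jointly Gaussian variables: -0.5 * ln(1 - rho^2)."""
    rho = pearson(x, y)
    # Clamp so that a perfect correlation yields a large finite weight instead of ln(0).
    return -0.5 * math.log(max(1.0 - rho * rho, 1e-12))


def chow_liu_tree(data: Dict[str, Sequence[float]]) -> List[Tuple[str, str]]:
    """Undirected edges of the maximum-weight spanning tree over pairwise MI (Kruskal)."""
    names = list(data)
    edges = sorted(
        ((gaussian_mi(data[a], data[b]), a, b) for a, b in combinations(names, 2)),
        reverse=True,  # strongest dependencies first
    )
    root = {n: n for n in names}  # union-find forest

    def find(n: str) -> str:
        while root[n] != n:
            root[n] = root[root[n]]  # path halving
            n = root[n]
        return n

    tree: List[Tuple[str, str]] = []
    for _, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # (a, b) joins two components, so it cannot close a cycle
            root[ra] = rb
            tree.append((a, b))
    return tree


# Hypothetical synthetic data: elevation is almost exactly 3 units per floor.
data = {
    "floors": [1, 2, 3, 4, 5, 6],
    "elevation": [3.0, 6.1, 9.0, 12.2, 15.0, 18.1],
    "steps": [5000, 4000, 7000, 3000, 8000, 6000],
}
print(chow_liu_tree(data))  # the floors-elevation edge is selected first
```

The result is an undirected tree. Turning it into a Bayesian network requires orienting the edges, for example away from a chosen root node.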

Prediction (Continuous Case)

• The Mean Squared Error (MSE) was used to compute the difference between the observed and predicted values for each node in the network.

• As expected, the prediction for the floors node is almost perfect, because floors has an almost perfect linear correlation with elevation.
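The continuous-case MSE can be sketched for a node with a single parent. The sketch assumes a linear Gaussian local distribution whose mean is fitted by least squares. The floors/elevation numbers are synthetic, an exact 3-units-per-floor relation chosen only for illustration. Evaluating on the training data is a simplification of the cross-validated setting.

```python
from typing import Sequence, Tuple


def fit_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope of y on x (mean of a linear Gaussian node)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared difference between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical synthetic data: floors predicted from elevation, its assumed only parent.
elevation = [3.0, 6.0, 9.0, 12.0]
floors = [1.0, 2.0, 3.0, 4.0]
intercept, slope = fit_linear(elevation, floors)
predicted = [intercept + slope * e for e in elevation]
print(mse(floors, predicted))  # essentially 0 for a perfectly linear relation
```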

Prediction (Discrete Case)

• The frequentist prediction error for a single node in a discrete network was used.

• The values of the target node are predicted using only the information present in its local distribution (i.e., from its parents).

• The best results were obtained by representing our variables with 4 states.
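Below is a minimal sketch of the frequentist prediction error. It assumes the predicted value is the most frequent child state for each parent configuration in the training data. A hypothetical `fallback` state covers configurations never seen during training. The function names and data are illustrative, not bnlearn's code.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def fit_local_mode(parents: Sequence[Hashable], child: Sequence[int]) -> Dict[Hashable, int]:
    """Most frequent child state for each parent configuration seen in training."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for p, c in zip(parents, child):
        counts[p][c] += 1
    return {p: cnt.most_common(1)[0][0] for p, cnt in counts.items()}


def classification_error(
    rule: Dict[Hashable, int], parents: Sequence[Hashable], child: Sequence[int], fallback: int
) -> float:
    """Fraction of test samples whose predicted child state differs from the observed one."""
    wrong = sum(rule.get(p, fallback) != c for p, c in zip(parents, child))
    return wrong / len(child)


# Hypothetical usage: the child has 4 states (0-3); parent configurations are tuples.
rule = fit_local_mode([(0,), (0,), (0,), (1,), (1,)], [2, 2, 1, 3, 3])
print(rule)  # {(0,): 2, (1,): 3}
print(classification_error(rule, [(0,), (1,), (1,)], [2, 3, 0], fallback=0))
```

Only the parents' values are used to make each prediction, which matches the "local distribution" restriction described above.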

Thank You!
