We had a course on this
Stanford lectures: great, base the talk on them
A brief tour of differential privacy (slides): fun opening
Big data and attacks on privacy: examples of attacks + extension to social networks
A Simple and Practical Algorithm for Differentially Private Data Release [Hardt, Ligett, McSherry]
An algorithm
Data mining with differential privacy (slides): nice, must be well done
Differential privacy [Dwork]: a way to add some context and history
Notes on the sources
In an ideal world, we would not need to worry about these issues. Unfortunately, we do not live in such a world, so these issues are important to discuss.
Previous attacks (cf. class): slides 11-15 of Big Data and Attacks on Privacy
Netflix (Austin: include a photo of me at UT)
Motivations
Initially called "indistinguishability"
A trade-off between the societal and economic demand for accurate information and the ethical and legal obligations to protect the privacy of natural and legal persons
Formalising the problem
Google search: terminology widely used by the United Nations
A survival motivation for the data collector
That of the data owner
That of the user
Two other dimensions of privacy (more a matter of property)
Who does this concern?
-> query processing over encrypted data; fully homomorphic encryption
Untrusted server -> encrypt data
An organisation has a trusted database and also runs trusted computations. An adversary can only observe auxiliary information in the world; it can obviously learn the result of any computation run by the trusted party, but is not allowed to see the database contents.
Trusted server (our case): secure release of aggregate information
Maximise the accuracy of queries over "statistical databases" while minimising the possibility of identifying their individual records
A database used for statistical analysis
Concretely: restrict queries to aggregate data; no access to individual records
Statistical database
Concretely
Threat model
Context of differential privacy
Differential Privacy: principles, methods and algorithms, applications, extensions
Saturday, 21 February 2015, 14:29
Ateliers Page 1
Difficult problem, since intelligent users can use a combination of aggregate queries to derive information about a single individual. Examples?
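One concrete example of such a combination is a differencing attack: two perfectly legitimate aggregate queries, subtracted, reveal one person's exact value. A minimal sketch on made-up data (the names, salaries and the `sum_query` interface are all hypothetical):

```python
# Made-up salary table; only aggregate (SUM) queries are allowed.
salaries = {"alice": 52000, "bob": 48000, "carol": 61000}

def sum_query(db, exclude=None):
    """Aggregate-only interface: SUM over all rows, optionally with a
    (too) selective WHERE clause that excludes a single person."""
    return sum(v for k, v in db.items() if k != exclude)

total = sum_query(salaries)                          # SUM over everyone
total_without_carol = sum_query(salaries, exclude="carol")
carol_salary = total - total_without_carol           # exact value recovered
print(carol_salary)  # 61000
```

This is exactly why restricting to aggregate queries is not enough on its own: the mitigations listed below (imprecise counts, banning selective WHERE clauses, auditing) each try to block this pattern.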
d'accès aux "individual records"
OLAP vs. OLTP
What are the other types of databases?
Example
Properties
Anonymisation
k-anonymity (achieving optimal k-anonymity is an NP-hard problem)
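To make the k-anonymity notion concrete, here is a small sketch on made-up medical records: a table is k-anonymous on a set of quasi-identifiers if every combination of their values is shared by at least k rows. The records, attribute names, and the ZIP/age generalisation scheme are all hypothetical:

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier value combination appears in
    at least k rows (so no row is unique on those attributes)."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return all(c >= k for c in counts.values())

# Made-up records; generalise ZIP 7510x -> 7510* and exact age -> decade.
raw = [
    {"zip": "75101", "age": 34, "diagnosis": "flu"},
    {"zip": "75102", "age": 37, "diagnosis": "cold"},
    {"zip": "75103", "age": 31, "diagnosis": "flu"},
]
generalised = [{"zip": r["zip"][:4] + "*", "age": r["age"] // 10 * 10,
                "diagnosis": r["diagnosis"]} for r in raw]

print(is_k_anonymous(raw, ["zip", "age"], 2))          # False: each row unique
print(is_k_anonymous(generalised, ["zip", "age"], 3))  # True: all share (7510*, 30)
```

Finding the generalisation that achieves k-anonymity while destroying the least information is the part that is NP-hard; the check itself is cheap.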
Some naive (false) solutions
Differential Privacy is a condition on the release mechanism (and not on the dataset)
only allow aggregate queries (SUM, COUNT, AVG, STDEV, etc.)
rather than returning exact values for sensitive data like income, only return which partition it belongs to (e.g. 35k-40k)
return imprecise counts (e.g. rather than 141 records met the query, only indicate that 130-150 records met it)
don't allow overly selective WHERE clauses
audit all user queries, so users using the system incorrectly can be investigated
use intelligent agents to automatically detect inappropriate system use
Common approaches (for handling statistical databases)
Generally speaking, differential privacy is an area of research which seeks to provide rigorous, statistical guarantees against what an adversary can infer from learning the results of some randomized algorithm.
Participation changes the probability of any output by at most a factor of e^ε: an individual is at most e^ε times more likely to be affected by the released result than if they had chosen not to participate in the database.
intuitive
Cf. Stanford lecture b
Randomized algorithm?
Mathematical
Definition
Sensitivity
Composition between systems
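In symbols, the standard formulations (as in Dwork & Roth) of these two notions are:

```latex
% L1-sensitivity of a query f : how much one row can change the answer
\Delta f \;=\; \max_{\substack{D,\,D' \\ \text{neighbours}}}
               \bigl\lVert f(D) - f(D') \bigr\rVert_1

% Sequential composition: running an \varepsilon_1-DP mechanism and an
% \varepsilon_2-DP mechanism on the same data is
% (\varepsilon_1 + \varepsilon_2)-differentially private.
```

Sensitivity calibrates how much noise a mechanism must add; composition is what lets a privacy budget be split across several queries.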
This means that no adversary, even with arbitrary auxiliary information, can determine (beyond a factor of e^ε) whether one particular participant submitted their information.
In general, ε-differential privacy is designed to protect privacy with respect to neighboring databases, which differ in only one row.
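Written out, the definition over neighboring databases reads:

```latex
% A randomised mechanism M is \varepsilon-differentially private if, for all
% neighbouring databases D, D' (differing in one row) and for every set of
% outputs S \subseteq \mathrm{Range}(M):
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S]
```

The bound holds in both directions (swap D and D'), which is what makes the two databases nearly indistinguishable from the output.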
Group privacy: for a group of k individuals, the guarantee degrades gracefully to kε-differential privacy
Properties
Laplace mechanism: adding controlled noise (Laplace noise)
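A minimal sketch of the Laplace mechanism using only the standard library (the function names are my own; a real deployment would use a vetted DP library, since naive floating-point sampling has known subtleties):

```python
import math
import random

def sample_laplace(scale):
    """Draw from Lap(0, scale) via the inverse-CDF transform of a uniform."""
    u = random.random() - 0.5  # uniform in [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(true_value, sensitivity, epsilon):
    """epsilon-DP release of a numeric query: add Lap(sensitivity/epsilon)."""
    return true_value + sample_laplace(sensitivity / epsilon)

# e.g. a COUNT query has sensitivity 1 (one row changes the count by at most 1)
noisy_count = laplace_mechanism(141, sensitivity=1, epsilon=0.5)
```

The noise scale Δf/ε ties directly to the sensitivity definition above: higher sensitivity or a stricter (smaller) ε both force more noise.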
MWEM (cf. A Simple and Practical Algorithm for Differentially Private Data Release [Hardt, Ligett, McSherry]) + The Algorithmic Foundations of Differential Privacy [Dwork, Roth]
We will have to think about this part together; it is quite dense
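To get a head start on that dense part, here is a heavily simplified sketch of MWEM as I understand it from the Hardt-Ligett-McSherry paper: linear counting queries over a small discrete domain, a per-round budget of ε/2T split between an exponential-mechanism query selection and a Laplace measurement, then a multiplicative-weights update. Function names and the toy setup are mine, not the paper's:

```python
import math
import random

def sample_laplace(scale):
    """Draw from Lap(0, scale) via the inverse-CDF transform of a uniform."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def mwem(true_hist, queries, T, epsilon):
    """Simplified MWEM sketch.

    true_hist : non-negative counts over a small discrete domain
    queries   : sets of domain indices (linear counting queries)
    Returns a synthetic histogram approximately matching all query answers.
    """
    n = sum(true_hist)
    d = len(true_hist)
    synth = [n / d] * d                      # uniform initial approximation
    eps_round = epsilon / (2 * T)            # budget per round, split in two

    def answer(hist, q):
        return sum(hist[i] for i in q)

    for _ in range(T):
        # 1) Exponential mechanism: pick a badly-answered query,
        #    score = current error, sensitivity 1.
        errors = [abs(answer(true_hist, q) - answer(synth, q)) for q in queries]
        weights = [math.exp(eps_round * e / 2) for e in errors]
        r, acc, qi = random.uniform(0, sum(weights)), 0.0, len(queries) - 1
        for j, w in enumerate(weights):
            acc += w
            if r <= acc:
                qi = j
                break
        q = queries[qi]
        # 2) Laplace mechanism: noisy measurement of the chosen query.
        m = answer(true_hist, q) + sample_laplace(1 / eps_round)
        # 3) Multiplicative weights: nudge the synthetic histogram toward m.
        est = answer(synth, q)
        for i in q:
            synth[i] *= math.exp((m - est) / (2 * n))
        total = sum(synth)
        synth = [v * n / total for v in synth]  # renormalise to n records
    return synth
```

The released synthetic histogram can then answer the whole query class, which is the point of the algorithm: privacy cost is paid once, per round, not per downstream query.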
Algorithms
Showing that, in general, securing statistical databases was an impossible aim
http://en.wikipedia.org/wiki/Statistical_database
doi:10.1145/320128.320138 - Dorothy E. Denning, Jan Schlörer, "A fast procedure for finding a tracker in a statistical database", ACM Transactions on Database Systems, Volume 5, Issue 1 (March 1980), pages 88-102
Research in this area has largely stalled
Summary
ε-differential privacy
Principle
Other