big data and privacy


Post on 16-Jul-2015


Big Data and Privacy

Paul A. Pavlou
Milton F. Stauffer Professor, Fox School of Business
Temple University

70th Annual Meeting of the ORAU Council of Sponsoring Institutions


Big Data is Everywhere!


Big Data Institute @ Temple

• Objectives
  • Connect multiple disciplines, such as computer science, statistics, engineering, medicine, computational social sciences, biology, neuroscience, and business, to harness the potential of big data
  • Foster synergies among these disciplines and reach the critical mass necessary to pursue external funding opportunities and create technology commercialization opportunities


Centers under Big Data Institute

• Center for Data Analytics and Biomedical Informatics
• Center for High Dimensional Statistics
• Center for Web and Social Media Analytics
• Center for Big Data in Oncology
• Center for Big Data in Health Sciences

• Affiliated Centers
  • Center for Neural Decision Making
  • Cancer Genome Institute


NSF Workshop on ‘Privacy in an Era of Big Data’

April 22-23, 2015

• Tradeoff between benefits of big data and privacy protection

• Legal, public policy and regulatory issues on privacy

• Social, behavioral, and economic approaches to encouraging individual privacy protection

• Privacy protection technologies


NSF Workshop on ‘Privacy in an Era of Big Data’

• Diverse perspectives, including law, computer science, the social, behavioral, and economic sciences, and business
• Goal: foster a cross-disciplinary approach to the tradeoff between big data and privacy
• Enabling big data to transform organizations and markets while respecting people’s privacy rights
• Interdisciplinary collaboration among academia, industry, and government, with emphasis on rigor and real-world relevance


Current Research Projects at Big Data Institute

• Project 1: Understanding Information Privacy Concerns in Social Advertising: Eye Tracking and fMRI Study
  • Using neurophysiological methods (i.e., eye tracking and fMRI) to investigate how social advertising affects consumers’ privacy concerns by capturing their affective and cognitive reactions.


• Project 2: Privacy Concerns in Targeted Social Advertising: A Randomized Field Experiment
  • A large-scale randomized field experiment that examines differently structured social ads and tests their impact on privacy concerns.
  • Tests whether highly relevant ads can mitigate privacy concerns about the use of sensitive personal information, and whether privacy concerns reduce an ad’s appeal.

• Project 3: Website Registration on PC versus Mobile
  • How consumers’ website registration behavior differs between PC and mobile interfaces, and how it is influenced by privacy statements and advertising messages.


• Project 4: What Drives User’s Website Registration? Network Externalities versus Information Privacy Dilemma
  • Main finding: The network-externality benefits of displaying website popularity information and word-of-mouth (WOM) information outweigh the negative effects of privacy concerns


Challenge 1: Semi-Supervised Learning for Structured Regression on Partially Observed Attributed Graphs

Problem: Attribute prediction (yellow) on evolving labelled graphs with data-sharing restrictions (blue)
• Noise is not distributed uniformly
• Current state-of-the-art (SOA) methods either impute missing data or do not use the graph structure

Solution (m-GCRF method): Exploit the graph structure AND extract information about missing labels from neighboring structures
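To make the idea of using graph structure to fill in missing labels concrete, here is a minimal sketch. It is not the m-GCRF method from the slide. It assumes that GCRF refers to a Gaussian conditional random field and shows only the generic building block: one baseline predictor `R`, a similarity matrix `S`, and Gaussian conditioning on whichever labels are observed. The function name `gcrf_predict` and the parameters `alpha` and `beta` are illustrative assumptions.

```python
import numpy as np

def gcrf_predict(R, S, observed, y_obs, alpha=1.0, beta=1.0):
    """Predict node labels from a baseline predictor R and graph weights S.

    R        : baseline (unstructured) prediction per node, shape (n,)
    S        : symmetric non-negative similarity matrix, shape (n, n)
    observed : boolean mask of nodes whose true label is known
    y_obs    : known labels, in node order, one per observed node
    """
    R = np.asarray(R, dtype=float)
    S = np.asarray(S, dtype=float)
    observed = np.asarray(observed, dtype=bool)

    # Precision matrix (up to a constant factor): alpha ties each node to its
    # baseline prediction, beta * Laplacian ties neighbors to each other.
    laplacian = np.diag(S.sum(axis=1)) - S
    Q = alpha * np.eye(len(R)) + beta * laplacian

    # Mean of the joint Gaussian when no labels are observed.
    mu = np.linalg.solve(Q, alpha * R)

    y = mu.copy()
    if observed.any():
        y_obs = np.asarray(y_obs, dtype=float)
        y[observed] = y_obs
        hidden = ~observed
        if hidden.any():
            # Gaussian conditioning in precision form:
            # E[y_h | y_o] = mu_h - Q_hh^{-1} Q_ho (y_o - mu_o)
            shift = Q[np.ix_(hidden, observed)] @ (y_obs - mu[observed])
            y[hidden] = mu[hidden] - np.linalg.solve(Q[np.ix_(hidden, hidden)], shift)
    return y

# Path graph 0 - 1 - 2: the ends are labelled, the middle label is missing.
S = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
y = gcrf_predict(R=[0, 0, 0], S=S, observed=[True, False, True], y_obs=[1.0, 1.0])
print(y)
```

In this toy graph the missing middle label is pulled toward its two labelled neighbors and away from the baseline prediction of 0, which is the behavior that imputation-free, structure-aware methods rely on.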


[Figure: results on training data with 80% missing values. Labels: m-GCRF (Temple-GRAPHS), i-GCRF (Temple-GRAPHS), NN, HGF-GCRF, SOA]

Results:
• Quality of regression remains virtually unchanged with up to 60% of the data missing
• With 80% of the data missing, regression is up to 3x better than SOA on synthetic data (explained >70% of variance)
• Validated against 500 spatio-temporal graphs with up to 80% missing values and 7 missing-data models

Applications:
• Tested on precipitation data from NOAA’s National Climatic Data Center
• Reduce data sources by 50% while maintaining predictability

Reference: Stojanovic, J., Gligorijevic, Dj., Obradovic, Z. Proc. 2015 SIAM Int’l Conf. Data Mining, May 2015


Challenge 2: How to learn from multiple unreliable experts whose performance is task dependent (such as in crowdsourcing)?

Solution: Selective aggregation that provides a good forecast while estimating the sensitivity and specificity of each expert on each task (aggregating experts and filtering novices)

Reference: Zhang, P., Cao, W., Obradovic, Z. “Learning by Aggregating Experts and Filtering Novices: A Solution to Crowdsourcing Problems in Bioinformatics,” BMC Bioinformatics, 14, 2013.
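As an illustration of estimating expert sensitivity and specificity while aggregating their votes, the sketch below uses a classic EM scheme in the style of Dawid and Skene for binary labels. It is a simplified stand-in and not the algorithm from the cited paper: it assumes one sensitivity and one specificity per expert across all tasks, so it does not model task-dependent performance. The function name `aggregate_experts` and the toy vote matrix are assumptions made for the example.

```python
import numpy as np

def aggregate_experts(votes, n_iter=50, eps=1e-6):
    """EM aggregation of binary votes.

    votes : (n_tasks, n_experts) array of 0/1 labels
    Returns P(true label = 1) per task, plus per-expert sensitivity and specificity.
    """
    votes = np.asarray(votes, dtype=float)
    p = votes.mean(axis=1)  # start from the share of positive votes per task
    for _ in range(n_iter):
        # M-step: re-estimate the class prior and each expert's reliability,
        # weighting every task by the current belief about its true label.
        prior = np.clip(p.mean(), eps, 1 - eps)
        sens = np.clip((p @ votes) / max(p.sum(), eps), eps, 1 - eps)
        spec = np.clip(((1 - p) @ (1 - votes)) / max((1 - p).sum(), eps), eps, 1 - eps)
        # E-step: posterior of each task's true label given all votes.
        log1 = np.log(prior) + votes @ np.log(sens) + (1 - votes) @ np.log(1 - sens)
        log0 = np.log(1 - prior) + (1 - votes) @ np.log(spec) + votes @ np.log(1 - spec)
        p = 1.0 / (1.0 + np.exp(np.clip(log0 - log1, -500, 500)))
    return p, sens, spec

# Toy data: experts 0 and 1 always agree with the truth, expert 2 is a novice
# whose votes are unrelated to it.
truth = np.array([1, 1, 1, 1, 0, 0, 0, 0])
novice = np.array([1, 0, 1, 0, 1, 0, 1, 0])
votes = np.column_stack([truth, truth, novice])

p, sens, spec = aggregate_experts(votes)
print(np.round(sens, 2), np.round(spec, 2))
```

Experts whose estimated sensitivity and specificity hover near 0.5 contribute almost nothing to the posterior, which is a soft form of filtering novices; a hard filter could drop them below a chosen threshold.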


Challenge 3: How to build a forecasting model when (privacy-related) data is distributed, cannot be shared, and sites may have heterogeneous database schemas?

Solution: Collect statistics about the pertinent data from the distributed sites and build a prediction model that accommodates hybrid data fragmentation without a priori knowledge of the data schemas at participating sites

Reference: Mathew, G., Obradovic, Z. “A Distributed Decision Support Algorithm that Preserves Personal Privacy,” Journal of Intelligent Information Systems, 2014.

Horizontal, vertical and hybrid data fragmentation
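The idea of building a model from per-site statistics rather than shared records can be sketched for the simplest case. The example below is not the algorithm from the cited paper: it assumes horizontal fragmentation only (each site holds different rows of the same columns), a common schema, and an ordinary least-squares model, none of which the slide specifies. The function names `local_statistics` and `fit_from_statistics` are hypothetical.

```python
import numpy as np

def local_statistics(X, y):
    """Run at each site: summarize local records without exposing them."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])  # add intercept
    y = np.asarray(y, dtype=float)
    return X.T @ X, X.T @ y

def fit_from_statistics(stats):
    """Run at the coordinator: add up the summaries and solve least squares."""
    xtx = sum(s[0] for s in stats)
    xty = sum(s[1] for s in stats)
    return np.linalg.solve(xtx, xty)

# Synthetic records split row-wise across three sites.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]
sites = [(X[:10], y[:10]), (X[10:18], y[10:18]), (X[18:], y[18:])]

coef = fit_from_statistics([local_statistics(Xs, ys) for Xs, ys in sites])
print(np.round(coef, 6))
```

Because the sums of the per-site matrices equal the matrices computed on the pooled data, the coordinator recovers the same coefficients as a centralized fit while only aggregate statistics leave each site. Sharing such aggregates is not by itself a formal privacy guarantee, and vertical or hybrid fragmentation would need additional machinery.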