Big Data and Privacy
Paul A. Pavlou
Milton F. Stauffer Professor, Fox School of Business
Temple University
70th Annual Meeting of the ORAU Council of Sponsoring Institutions
Big Data Institute @ Temple
• Objectives
  • Connect multiple disciplines, such as computer science, statistics, engineering, medicine, computational social sciences, biology, neuroscience and business, to harness the potential of big data
  • Foster synergies among these disciplines and reach the critical mass necessary to pursue external funding and create technology commercialization opportunities
Centers under Big Data Institute
• Center for Data Analytics and Biomedical Informatics
• Center for High Dimensional Statistics
• Center for Web and Social Media Analytics
• Center for Big Data in Oncology
• Center for Big Data in Health Sciences
• Affiliated Centers
  • Center for Neural Decision Making
  • Cancer Genome Institute
NSF Workshop on ‘Privacy in an Era of Big Data’
April 22-23, 2015
• Tradeoff between benefits of big data and privacy protection
• Legal, public policy and regulatory issues on privacy
• Social, behavioral, and economic approaches to encouraging individual privacy protection
• Privacy protection technologies
NSF Workshop on ‘Privacy in an Era of Big Data’
• Diverse perspectives, including law, computer science, the social, behavioral, and economic sciences, and business
• Goal: foster a cross-disciplinary approach to the tradeoff between big data and privacy
• Enabling big data to transform organizations and markets, while respecting people’s privacy rights
• Interdisciplinary collaboration among academia, industry, and government, with emphasis on rigor and real-world relevance
Current Research Projects at Big Data Institute
• Project 1: Understanding Information Privacy Concerns in Social Advertising: Eye Tracking and fMRI Study
  • Using neurophysiological methods (i.e., eye tracking and fMRI) to investigate how social advertising affects consumers’ privacy concerns by capturing their affective and cognitive reactions.
• Project 2: Privacy Concerns in Targeted Social Advertising: A Randomized Field Experiment
  • A large-scale randomized field experiment that examines differently structured social ads and tests their impact on privacy concerns.
  • Tests whether highly relevant ads can mitigate privacy concerns about the use of sensitive personal information, and whether privacy concerns reduce an ad’s appeal.
• Project 3: Website Registration on PC versus Mobile
  • How consumers’ website registration behavior differs between PC and mobile interfaces, and how it is influenced by privacy statements and advertising messages.
• Project 4: What Drives User’s Website Registration? Network Externalities versus Information Privacy Dilemma
  • Main finding: The network-externality benefits of displaying website popularity information and word-of-mouth (WOM) information outweigh the negative effects of privacy concerns.
Challenge 1: Semi-Supervised Learning for Structured Regression on Partially Observed Attributed Graphs
Problem: Attribute prediction (yellow) on evolving labelled graphs with data sharing restriction (blue)
• Noise is not distributed uniformly
• The current state of the art (SOA) imputes missing data or does not use graph structure
Solution (m-GCRF method): Combine graph structure AND extract information about missing labels from neighboring structures
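The idea of combining per-node predictions with graph structure can be illustrated with a basic Gaussian conditional random field (GCRF). The sketch below is not the m-GCRF method named on the slide: it assumes a single unstructured predictor `R`, a fully known symmetric similarity matrix `S`, and hand-picked weights `alpha` and `beta` (all hypothetical names), and it does not model missing labels.

```python
import numpy as np

def gcrf_mean(R, S, alpha=1.0, beta=1.0):
    """Closed-form mean of a basic GCRF (illustrative sketch only).

    Minimizes  alpha * sum_i (y_i - R_i)^2
             + beta  * sum_{i<j} S_ij * (y_i - y_j)^2
    R: (n,) predictions of an unstructured regressor (assumed given).
    S: (n, n) symmetric, non-negative similarity matrix (assumed given).
    """
    R = np.asarray(R, dtype=float)
    S = np.asarray(S, dtype=float)
    # Graph Laplacian: y^T L y equals the pairwise smoothness term.
    L = np.diag(S.sum(axis=1)) - S
    # Setting the gradient to zero gives (alpha*I + beta*L) y = alpha*R.
    Q = alpha * np.eye(len(R)) + beta * L
    return np.linalg.solve(Q, alpha * R)

# Usage: a 3-node chain where the third node's base prediction is an outlier.
R_demo = np.array([0.0, 0.0, 3.0])
S_demo = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0]])
print(gcrf_mean(R_demo, S_demo, alpha=1.0, beta=1.0))
```

Larger `beta` pulls neighboring nodes toward each other, while `beta = 0` returns the unstructured predictions unchanged.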
[Figure: regression on training data with 80% missing values; legend: m-GCRF (Temple-GRAPHS), i-GCRF (Temple-GRAPHS), NN, HGF-GCRF, SOA]
Applications:
• Tested on precipitation data from NOAA’s National Climatic Data Center
• Reduces data sources by 50% while maintaining predictability
Reference: Stojanovic, J., Gligorijevic, Dj., Obradovic, Z. Proc. 2015 SIAM Int’l Conf. Data Mining, May 2015
Results:
• Quality of regression remains virtually unchanged up to 60% of missing data
• At 80% of missing data, regression is up to 3x better than SOA for synthetic data (explained >70% of variance)
• Validated against 500 spatio-temporal graphs with up to 80% missing values, and 7 missing data models
Challenge 2: How to learn from multiple unreliable experts whose performance is task dependent (such as crowdsourcing)?
Solution: Selective aggregation to provide a good forecast while estimating the sensitivities and specificities of experts at each task (aggregating experts and filtering novices)
Reference: Zhang, P., Cao, W., Obradovic, Z. “Learning by Aggregating Experts and Filtering Novices: A Solution to Crowdsourcing Problems in Bioinformatics,” BMC Bioinformatics, 14, 2013.
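A simplified illustration of aggregating experts and filtering novices is sketched below. It is not the algorithm from the cited paper: it assumes a small calibration set with known labels, equal class priors, binary votes, and a hypothetical quality threshold `min_quality`. Experts are scored by smoothed sensitivity and specificity, weak ones are dropped, and the rest are combined with a log-likelihood-ratio vote.

```python
import math

def sens_spec(pred, truth):
    """Laplace-smoothed sensitivity and specificity of one expert."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    pos = sum(truth)
    neg = len(truth) - pos
    # Smoothing keeps both rates strictly inside (0, 1).
    return (tp + 1) / (pos + 2), (tn + 1) / (neg + 2)

def aggregate(votes, stats, min_quality=0.6):
    """Combine binary votes from the experts that pass the quality filter.

    votes: dict expert -> 0 or 1 for the current item.
    stats: dict expert -> (sensitivity, specificity).
    min_quality: hypothetical cutoff on the mean of the two rates.
    """
    score = 0.0
    for expert, vote in votes.items():
        sens, spec = stats[expert]
        if (sens + spec) / 2 < min_quality:
            continue  # treat as a novice and ignore the vote
        if vote == 1:
            score += math.log(sens / (1 - spec))
        else:
            score += math.log((1 - sens) / spec)
    return 1 if score > 0 else 0

# Usage: one reliable expert and two unreliable ones on a calibration set.
truth = [1, 1, 0, 0]
calibration = {"a": [1, 1, 0, 0], "b": [0, 0, 1, 1], "c": [0, 0, 1, 1]}
stats = {name: sens_spec(pred, truth) for name, pred in calibration.items()}
print(aggregate({"a": 1, "b": 0, "c": 0}, stats))
```

In this usage example a plain majority vote would follow the two unreliable experts, whereas the filtered aggregate relies on the one expert with good calibration scores.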
Challenge 3: How to build a forecasting model when (privacy-related) data is distributed, cannot be shared, and sites may have heterogeneous database schemas?
Solution: Get statistics about the pertinent data from distributed sites and build a prediction model that can accommodate hybrid data fragmentation without using a priori knowledge of the data schema at participating sites
Reference: Mathew, G., Obradovic, Z. "A Distributed Decision Support Algorithm that Preserves Personal Privacy," Journal of Intelligent Information Systems, 2014.
Horizontal, vertical and hybrid data fragmentation
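The general idea of building a model from site-level statistics rather than raw records can be sketched for the simplest case. The example below is not the cited algorithm: it assumes horizontal fragmentation only (every site holds the same columns for different rows), a linear regression model, and hypothetical helper names `local_stats` and `fit_from_stats`. Each site shares only the aggregates X^T X and X^T y, never individual rows.

```python
import numpy as np

def local_stats(X, y):
    """Computed at each site: aggregate statistics only, no raw rows leave."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return X.T @ X, X.T @ y

def fit_from_stats(stats):
    """Computed at the coordinator: sum site statistics and solve
    the normal equations (sum X^T X) w = (sum X^T y)."""
    A = sum(s[0] for s in stats)
    b = sum(s[1] for s in stats)
    return np.linalg.solve(A, b)

# Usage: two sites hold disjoint rows of the same (synthetic) table.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(20, 3))
y_all = X_all @ np.array([1.0, -2.0, 0.5])
site_a = local_stats(X_all[:12], y_all[:12])
site_b = local_stats(X_all[12:], y_all[12:])
print(fit_from_stats([site_a, site_b]))
```

Because the summed statistics equal those of the pooled table, the coordinator recovers the same coefficients as a fit on the combined data. Vertical and hybrid fragmentation, as well as formal privacy guarantees, need additional machinery that this sketch does not attempt.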