a model for sharing of confidential provenance information in a query based system

20
DOE Scientific Data Management Center – Scientific Process Automation A Model for Sharing of Confidential Provenance Information in a Query Based System Meiyappan Nagappan Mladen A. Vouk North Carolina State University IPAW 2008 June 17 th , 2008 IPAW 2008 1

Upload: galena

Post on 06-Jan-2016

22 views

Category:

Documents


1 download

DESCRIPTION

A Model for Sharing of Confidential Provenance Information in a Query Based System. Meiyappan Nagappan Mladen A. Vouk North Carolina State University. June 17 th , 2008 IPAW 2008. Agenda. Problem Motivation A scenario: Sharing Provenance Research Objective Implementation Model - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

A Model for Sharing of Confidential

Provenance Information in a Query

Based System

Meiyappan Nagappan

Mladen A. Vouk

North Carolina State University

IPAW 2008

June 17th, 2008

IPAW 2008

1

Page 2: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

Agenda

Problem Motivation A scenario: Sharing Provenance Research Objective Implementation Model Discussions Conclusion Future Work

IPAW 2008 2

Page 3: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

Problem Motivation Provenance is increasingly being used as part of analyses

to speed-up the process, extend its scope beyond raw data, and enable handling of very large data sets.

Attendant problem: Sharing of provenance information Keeping this information appropriately but selectively

confidential/protected

Confidentiality: “Ensuring that information is accessible only to those authorized to have access” – ISO/IEC - 17799

IPAW 2008 3

Page 4: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

Unauthorized access of provenance could be used to Reverse engineer a process Compromise the privacy of the user Etc.

On the other hand, lack of sharing for the sake of confidentiality could hinder scientific discovery

Frequent current solution: export and mail the data that is to be shared Duplication of data – large meta-data sets and growing

A typical simulation may generate ~ 1GB of meta data Cannot revoke access

Problem Motivation

IPAW 2008 4

Page 5: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

Scenario: Sharing Provenance

A

B

C

R1

R2

S11S12

S21

R3S31

S32

S33

IPAW 2008 5

Page 6: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

Research Goal

The goal of current work is to develop a model, in the context of provenance for scientific simulations that Enables easy sharing of provenance data

Allows for dynamic changes in the confidentiality levels to serve multiple and different users

Does not compromise the confidentiality of the provenance data (including privacy)

IPAW 2008 6

Page 7: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

Implementation Model - Architecture

Super Computer running

Simulations

Laptop running Kepler

Provenance Store

Web Interface to Query Provenance

API

API

QueryRecord

Authorization Service

MGMT. API

IPAW 2008 7

Page 8: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

Sub Goals

Sub Goal 1: Person who generates simulation data – owner of original provenance data

Sub Goal 2 : Users cannot edit/delete Administrator can but must leave audit trail

Sub Goal 3: Owner can annotate their data Sub Goal 4: Owner can choose collaborators Sub Goal 5: Auditors have full read only access

Goal is to build a model that enables sharing provenance in an environment where the confidentiality level changes dynamically

We attempt to achieve the Goal through the following 5 objectives (sub-goals)

IPAW 2008 8

Page 9: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

What? Person who generates simulation data is owner of original

provenance data

Why? Each dataset is clearly traced to one owner

What is the risk? Dispute on who has the authority to share the data in the first place

Implementation? 3 Tiered: Client – Application Logic – Database Approach

Sub Goal 1

IPAW 2008 9

Page 10: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

What? Editing and Audit Trail No edits/deletes by owner, collaborator, other users Administrator can edit, but must leave audit trail

Why? Consistency of data (particularly shared data) Auditing

Risk? Each time the collaborator may get different results

How? Restrict privileges at DB level Log all super user actions

Sub Goal 2

Provenance Store

MGMT. API

IPAW 2008 10

Page 11: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

Sub Goal 3 What?

Data Annotation

Why? User specified meta data Collaborator may have different interpretation

Risk? Loss of valuable meta data about provenance Cannot flag inaccurate data – therefore need delete privileges

How? Annotation field in all tables of schema. Through WI, annotate Provenance Data and Saved Queries

Provenance Store

WIAPI

Query

IPAW 2008 11

Page 12: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

Sub Goal 4

What? Data Sharing with dynamically changing confidentiality levels

Why? To share data on “What You See Is What You Want To Share” basis Each time a different subset of the data

Risk? Share entire data set or nothing Disk space wasted for saving a separate copy of subset

How? Query Sharing

IPAW 2008 12

Page 13: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

User Authorization API DB

UsernamePassword

AuthenticateRequest Data Execute Query

Return Data

Save data for Collaborator Save the Query

View Queries Saved for me by other Collaborators

View Data Saved in Query for me by other Collaborators

Query Table

•Query ID•Saved by•Saved for•Query•Timestamp•Allow Cascading•Revoke Active

Sub Goal 4(contd.)

Annot Table

•Query ID•User ID•Annotation•Viewable

Annotate the Query

IPAW 2008 13

Page 14: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

Why Query Sharing

Dynamically decide what to share

Size of the set of information to be shared is large

Subset of information rather than individual records

Sub Goal 4(contd.)

IPAW 2008 14

Page 15: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

What? Data Audit and Verification

Why? Prevent tampering by malicious users Maintain Accuracy

Risk? Collaborators may try to break system Administrators may misuse super user privileges

How? Authorized and authenticated auditors Full Read only access to – Original data, Provenance data, Annotations Edit trails and logs of super user actions

Sub Goal 5

IPAW 2008 15

Page 16: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

Issues

The model is Query Centric Automatic run time collection of provenance data required. Restricted to provenance data from scientific workflow

systems. Collaborator can annotate shared subset only as a whole. Does not address issues in long term storage and

scalability

IPAW 2008 16

Page 17: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

Conclusion

With increase in emphasis on provenance data collection in scientific workflows, the issue of its confidentiality becomes more important

Not much research done in this area of provenance This model addresses the confidentiality in a collaborative

environment. Tradeoff – Disk Space:Time :: Query Sharing:Data Sharing

IPAW 2008 17

Page 18: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

Validating our model against other solutions using different threat scenarios

Responsibility of sharing data is with user Privacy of user is at stake Tools required to foresee inferences from provenance data

Large data sets: Provenance data and shared queries grow steadily in size Accessing them will be difficult Tools required to improve the HCI aspect

Future Work

IPAW 2008 18

Page 19: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

Questions?

IPAW 2008 19

Page 20: A Model for Sharing of Confidential Provenance Information in a Query Based System

DOE Scientific Data Management Center – Scientific Process Automation

Related Work: References[1] Hasan, R., Sion, R. and Winslett, M.: Introducing secure provenance: problems and

challenges Proceedings of the 2007 ACM workshop on Storage security and survivability, ACM, Alexandria, Virginia, USA, (2007). pp 13-18

[2] Griffiths, P.P. and Wade, B.W.: An authorization mechanism for a relational database system. ACM Transactions on Database Systems,(Sep 1976)., 1 (3). 242-255.

[3] Sandhu, R. and Samarati, P. 1996.: Authentication, access control, and audit. ACM Computer Survey 28, 1 (Mar. 1996), 241-243. DOI = http://doi.acm.org/10.1145/234313.234412

[4] Tan, V., Groth, P., Miles, S., Jiang, S., Munroe, S., Tsasakou, S. and Moreau, L.: Security Issues in a SOA-Based Provenance System. LNCS, Volume 4145 (Provenance and Annotation of Data). pp. 203-211. Springer Berlin / Heidelberg (2006)

IPAW 2008 20