Dr. Bhavani Thuraisingham
The University of Texas at Dallas (UTD)
June 2013
Assured Cloud Computing
Team Members
• Sponsor: Air Force Office of Scientific Research
• The University of Texas at Dallas
  – Faculty: Dr. Murat Kantarcioglu; Dr. Latifur Khan; Dr. Kevin Hamlen; Dr. Zhiqiang Lin; Dr. Kamil Sarac
• Sub-contractors
  – Prof. Elisa Bertino (Purdue)
  – Ms. Anita Miller, Dr. Bob Johnson (North Texas Fusion Center)
• Collaborators
  – Dr. Steve Barker, King's College, U of London (EOARD)
  – Dr. Barbara Carminati; Dr. Elena Ferrari, U of Insubria (EOARD)
  – Prof. Peng Liu, Penn State
  – Prof. Ting Yu, NC State
Outline
• Objectives
• Layered Framework
• Data Security Issues for Clouds
• Our Research
  – FY11
    • Cloud-based Assured Information Sharing Demonstration
    • RDF-based Policy Engine on the Cloud
    • Secure Query Processing in Hybrid Cloud
    • CloudMask: Purdue University
    • Stream-based Malware Detection on the Cloud
    • Hypervisor (e.g., Xen) Integrity Issues and Forensics in the Cloud
    • Preliminary Investigation of Identity Management
  – FY10
    • Secure Querying and Storing Relational Data with HIVE
    • Secure Querying and Storing RDF in Hadoop with SPARQL
    • XACML Implementation for Hadoop
    • Amazon.com Web Services and Security
    • Accountability and Access Control (Joint with Purdue)
• Acknowledgement: Research Funded by Air Force Office of Scientific Research
Objectives
• Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.
• Our research on Cloud Computing is based on Hadoop, MapReduce, and Xen
• Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
• Xen is a Virtual Machine Monitor developed at the University of Cambridge, England
• Our goal is to build a secure cloud infrastructure for assured information sharing applications
Information Operations Across Infospheres: Assured Information Sharing
Scientific/Technical Approach
• Conduct experiments as to how much information is lost as a result of enforcing security policies in the case of trustworthy partners
• Develop more sophisticated policies based on role-based and usage-control-based access control models
• Develop techniques based on game theoretical strategies to handle partners who are semi-trustworthy
• Develop data mining techniques to carry out defensive and offensive information operations
Accomplishments
• Developed an experimental system for determining information loss due to security policy enforcement
• Developed a strategy for applying game theory for semi-trustworthy partners; simulation results
• Developed data mining techniques for conducting defensive operations for untrustworthy partners
Challenges
• Handling dynamically changing trust levels; scalability
Objectives
• Develop a framework for secure and timely data sharing across infospheres
• Investigate access control and usage control policies for secure data sharing
• Develop innovative techniques for extracting information from trustworthy, semi-trustworthy and untrustworthy partners
Budget
• FY06-8: AFOSR $300K, State Match $150K
Figure: Components hold the Data/Policy for Agency A, Agency B and Agency C; each component publishes its Data/Policy into the Data/Policy for Coalition.
Incentive Issues in Assured Information Sharing
DoD MURI Project 2008 - 2013, AFOSR
Motivation
• Misaligned incentives could be a significant problem in Information Security
  – Software bugs vs. software companies' incentives
• Incentive issues in information sharing have been explored to some extent
  – Incentive issues in file sharing p2p networks
• Assured information sharing creates new challenges
  – Security considerations vs. utility
Technical Approach
• Verify that the other participants do not lie about their data
  – If the data is revealed as it is
    • Trust but verify (Our initial results: DKE '08 paper)
  – If the data is not revealed (e.g., SMC techniques are used)
    • Non-cooperative computing
    • Mechanism design
    • SMC with rational adversaries
Layered Framework
User Interface
Hadoop/MapReduce/Storage
HIVE/SPARQL/Query
XEN/Linux/VMM
Secure Virtual Network Monitor
Policies: XACML
Risks/Costs
QoS, Resource Allocation
Cloud Monitors
Figure 2. Layered Framework for Assured Cloud
Secure Query Processing with Hadoop/MapReduce
• We have studied clouds based on Hadoop
• Query rewriting and optimization techniques designed and implemented for two types of data
  – (i) Relational data: Secure query processing with HIVE
  – (ii) RDF data: Secure query processing with SPARQL
• Demonstrated with XACML policies
• Joint demonstration with King's College and University of Insubria
  – First demo (2011): Each party submits their data and policies
  – Our cloud will manage the data and policies
  – Second demo (2012): Multiple clouds
Fine-grained Access Control with Hive
System Architecture
Table/View definition and loading: Users can create tables as well as load data into tables. Further, they can also upload XACML policies for the table they are creating. Users can also create XACML policies for tables/views. Users can define views only if they have permissions for all tables specified in the query used to create the view. They can also either specify or create XACML policies for the views they are defining.
CollaborateCom 2010
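The view-definition rule on this slide (a user may define a view only with permissions on every table named in the defining query) can be sketched roughly as below. This is a minimal illustration, not the system's code: the function name `can_create_view` and the in-memory `permissions` dictionary are hypothetical stand-ins for decisions that the real system would obtain from XACML policies.

```python
def can_create_view(user, referenced_tables, permissions):
    """Return True only if `user` holds a permission on every referenced table.

    `permissions` is a hypothetical mapping of user name -> set of table names;
    in the system described on the slide, these decisions come from XACML policies.
    """
    allowed = permissions.get(user, set())
    return all(table in allowed for table in referenced_tables)


# Hypothetical permission data, for illustration only.
permissions = {
    "alice": {"orders", "customers"},
    "bob": {"orders"},
}

print(can_create_view("alice", ["orders", "customers"], permissions))
print(can_create_view("bob", ["orders", "customers"], permissions))
```

The check is all-or-nothing: a single table without a permission is enough to refuse the view.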
SPARQL Query Optimizer for Secure RDF Data Processing
Figure components:
• Web Interface
• Server Backend
• Data Preprocessor: N-Triples Converter, Prefix Generator, Predicate Based Splitter, Predicate Object Based Splitter
• MapReduce Framework: Parser, Query Validator & Rewriter, XACML PDP, Plan Generator, Plan Executor, Query Rewriter By Policy
• Flows: New Data, Query, Answer
Goals: build an efficient storage mechanism using Hadoop for large amounts of data (e.g., a billion triples); build an efficient query mechanism for data stored in Hadoop; integrate with Jena.
Developed a query optimizer and query rewriting techniques for RDF data with XACML policies and implemented them on top of Jena.
IEEE Transactions on Knowledge and Data Engineering, 2011
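As a rough illustration of what a predicate-based splitter might do, the sketch below groups N-Triples lines by predicate, so that a query touching one predicate only needs to read the matching group. The slides name the component but do not describe its internals, so the grouping strategy, the function name `split_by_predicate`, and the deliberately naive line parsing are all assumptions made for this example.

```python
from collections import defaultdict


def split_by_predicate(ntriples_lines):
    """Group (subject, object) pairs by predicate.

    Naive parsing, for illustration only: it assumes the subject and the
    predicate contain no whitespace and that each statement ends with " .".
    """
    groups = defaultdict(list)
    for line in ntriples_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, predicate, rest = line.split(None, 2)
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()  # drop the statement terminator
        groups[predicate].append((subject, obj))
    return dict(groups)


# Hypothetical sample data.
sample = [
    '<s1> <worksFor> <utd> .',
    '<s2> <worksFor> <purdue> .',
    '<s1> <name> "Ann Lee" .',
]
for predicate, pairs in split_by_predicate(sample).items():
    print(predicate, pairs)
```

In a Hadoop setting each group would typically be written to its own file; here the groups stay in memory to keep the sketch self-contained.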
Demonstration: Concept of Operation
Figure components:
• User Interface Layer
• Fine-grained Access Control with Hive (Relational Data)
• SPARQL Query Optimizer for Secure RDF Data Processing (RDF Data)
• Agency 1, Agency 2, …, Agency n
RDF-Based Policy Engine
Figure components:
• Policies, Ontologies, Rules (in RDF)
• JENA RDF Engine
• RDF Documents
• Inference Engine/Rules Processor (e.g., Pellet)
• Interface to the Semantic Web
• Technology by UTDallas
RDF-based Policy Engine on the Cloud
Figure components:
• Policy Transformation Layer
• Flows: Result, Query
• Data stores: DB, DB, RDF, . . . RDF, XML
• Policy Parser Layer
• Regular Expression-Query Translator
• Data Controller
• Provenance Controller
• Policy / Graph Transformation Rules
• Access Control/Redaction Policy (Traditional Mechanism)
• User Interface Layer
• High Level Specification Policy Translator
• A testbed for evaluating different policy sets over different data representations; it also supports provenance as a directed graph and viewing policy outcomes graphically
• Determines how access is granted to a resource as well as how a document is shared
• Users specify policies, e.g., Access Control, Redaction, Released Policy
• Parses a high-level policy into a low-level representation
• Supports graph operations and visualization; policies are executed as graph operations
• Executes policies as SPARQL queries over large RDF graphs on Hadoop
• Supports policies over traditional data and its provenance
IFIP Data and Applications Security, 2010, ACM SACMAT 2011
Integration with Assured Information Sharing:
Figure components:
• User Interface Layer
• RDF Data Preprocessor
• Policy Translation and Transformation Layer
• MapReduce Framework for Query Processing
• Hadoop HDFS
• Agency 1, Agency 2, …, Agency n
• Flows: RDF Data and Policies, SPARQL Query, Result
Secure Storage and Query Processing in a Hybrid Cloud: Problem Motivation
• The use of hybrid clouds is an emerging trend in cloud computing
  – Ability to exploit public resources for high throughput
  – Yet, better able to control costs and data privacy
• Several key challenges
  – Data Design: how to store data in a hybrid cloud?
    • Solution must account for data representation used (unencrypted/encrypted), public cloud monetary costs and query workload characteristics
  – Query Processing: how to execute a query over a hybrid cloud?
    • Solution must provide query rewrite rules that ensure the correctness of a generated query plan over the hybrid cloud
Research Results
• Data Design: A user submits data, a query workload, monetary and confidentiality constraints
• Solve the data partitioning problem in four phases
  – Partition the data into several public (Ppu) and private (Ppr) components
  – For each partition, Ppu & Ppr, obtain their associated statistics
  – Estimate the execution cost of the given query workload based on a user's choice of confidentiality level as well as the statistics associated with the partition
  – Select the best partition as the one that minimizes query workload execution cost without violating monetary and confidentiality constraints
• Query Processing: A user submits a query Q
• Solve the query processing problem in four phases
  – Query Rearrangement: Use query rewrite rules to transform an original query Q into public (Qpu) and private (Qpr) query(ies)
  – Public Cloud Execution: Execute Qpu on public cloud
  – Private Cloud Execution: Execute Qpr on private cloud
  – Post-Processing: Combine the results of the execution of Qpu and Qpr into the final result
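The final data-design phase (pick the cheapest partition that respects the constraints) and the four query-processing phases can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary keys `exec_cost`, `monetary_cost` and `exposure`, both function names, and the restriction to a simple selection query over row-partitioned data are hypothetical choices, not the cost model or rewrite rules of the actual work.

```python
def select_partition(candidates, max_monetary_cost, max_exposure):
    """Return the feasible candidate with the lowest estimated execution cost.

    Each candidate is a dict with hypothetical keys: "name", "exec_cost",
    "monetary_cost" and "exposure" (a stand-in for confidentiality risk).
    Returns None when no candidate satisfies both constraints.
    """
    feasible = [
        c for c in candidates
        if c["monetary_cost"] <= max_monetary_cost and c["exposure"] <= max_exposure
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c["exec_cost"])


def run_hybrid_selection(predicate, public_rows, private_rows):
    """Four-phase processing for the simplest case, a selection query."""
    # Phase 1, query rearrangement: for a pure selection, Qpu and Qpr are the
    # same predicate applied to each side's rows.
    def q_pu(rows):
        return [r for r in rows if predicate(r)]
    q_pr = q_pu
    public_result = q_pu(public_rows)      # Phase 2: public cloud execution
    private_result = q_pr(private_rows)    # Phase 3: private cloud execution
    return public_result + private_result  # Phase 4: post-processing (union)


# Hypothetical candidates and rows.
candidates = [
    {"name": "P1", "exec_cost": 10.0, "monetary_cost": 8.0, "exposure": 0.9},
    {"name": "P2", "exec_cost": 14.0, "monetary_cost": 5.0, "exposure": 0.2},
    {"name": "P3", "exec_cost": 12.0, "monetary_cost": 6.0, "exposure": 0.3},
]
best = select_partition(candidates, max_monetary_cost=7.0, max_exposure=0.5)
print(best["name"])

rows = run_hybrid_selection(lambda r: r["age"] > 30,
                            public_rows=[{"id": 1, "age": 25}, {"id": 2, "age": 41}],
                            private_rows=[{"id": 3, "age": 52}])
print([r["id"] for r in rows])
```

Joins and aggregates would need real rewrite rules to keep the combined plan correct, which is exactly the challenge the slide names.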
Hypervisor integrity and forensics in the Cloud
Cloud integrity & forensics
Figure layers:
• Applications
• OS (Linux, Solaris, XP, MacOS)
• Hypervisor: Virtualization Layer (Xen, vSphere)
• Hardware Layer
Integrity: secure control flow of hypervisor code; integrity via in-lined reference monitor
Forensics: forensics data extraction in the cloud; multiple VMs; de-mapping (isolating) each VM memory from physical memory
Cloud-based Malware Detection
Dr. Mehedy
Figure components:
• Stream of known malware or benign executables
• Buffer
• Feature extraction and selection using Cloud
• Training & Model update
• Ensemble of Classification models
• Unknown executable: Feature extraction, then Classify; Class is Malware (Remove) or Benign (Keep)
Cloud-based Malware Detection
• ACM Transactions on Management Information Systems
• Binary feature extraction involves
  – Enumerating binary n-grams from the binaries and selecting the best n-grams based on information gain
  – For training data with 3,500 executables, the number of distinct 6-grams can exceed 200 million
  – On a single machine, this may take hours, depending on available computing resources; not acceptable for training from a stream of binaries
  – We use the Cloud to overcome this bottleneck
• A Cloud MapReduce framework is used
  – To extract and select features from each chunk
  – A 10-node cloud cluster is 10 times faster than a single node
  – Very effective in a dynamic framework, where malware characteristics change rapidly
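The feature-extraction step described above (enumerate byte n-grams, then keep those with the highest information gain) can be sketched on a single machine as follows. This is a small illustrative version, not the MapReduce implementation the slides refer to; the function names, the presence/absence feature encoding, and the toy byte strings are assumptions made for the example.

```python
import math


def byte_ngrams(data, n):
    """Return the set of distinct n-byte substrings of `data`."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def entropy(pos, neg):
    """Binary entropy (in bits) of a group with `pos` and `neg` members."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(present, labels):
    """Gain from splitting `labels` (True = malware) on feature presence."""
    n = len(labels)
    pos = sum(labels)
    with_n = sum(present)
    with_pos = sum(1 for f, l in zip(present, labels) if f and l)
    without_n = n - with_n
    without_pos = pos - with_pos
    conditional = ((with_n / n) * entropy(with_pos, with_n - with_pos)
                   + (without_n / n) * entropy(without_pos, without_n - without_pos))
    return entropy(pos, n - pos) - conditional


def select_top_ngrams(binaries, labels, n, k):
    """Keep the k n-grams with the highest information gain."""
    grams_per_file = [byte_ngrams(b, n) for b in binaries]  # per-file step ("map")
    all_grams = set().union(*grams_per_file)                # merge step ("reduce")
    ranked = sorted(
        all_grams,
        key=lambda g: (-information_gain([g in s for s in grams_per_file], labels), g),
    )
    return ranked[:k]


# Toy data: two "malware" and two "benign" byte strings (hypothetical).
binaries = [b"\x90\x90\xcc", b"\x90\x90\xaa", b"\x01\x02\x03", b"\x01\x02\x04"]
labels = [True, True, False, False]
print(select_top_ngrams(binaries, labels, n=2, k=2))
```

The per-file enumeration is independent across binaries, which is what makes this step a natural fit for distributing over a cluster.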
Fine-grained attribute-based privacy-preserving access control
• Fine-grained access control: different parts of the data can be covered by different access control policies
• Attribute-based access control: access control policies are expressed in terms of identity attributes of subjects accessing the data
• Privacy-preserving: the cloud does not learn anything about the contents of the data and the values of the identity attributes of users
• System developed is CloudMask
• Joint paper at CollaborateCom 2011
Key Features of CloudMask System: Elisa Bertino, Purdue University, and Murat Kantarcioglu, UT Dallas
Identity Management Considerations in a Cloud (with Penn State and NC State)
• Trust model that handles
  – (i) Various trust relationships, (ii) access control policies based on roles and attributes, (iii) real-time provisioning, (iv) authorization, and (v) auditing and accountability.
• Several technologies have to be examined to develop the trust model
  – Service-oriented technologies; standards such as SAML and XACML; and identity management technologies such as OpenID.
• Does one size fit all?
  – Can we develop a trust model that will be applicable to all types of clouds, such as private clouds, public clouds and hybrid clouds?
• Identity architecture has to be integrated into the cloud architecture.
Directions
• Secure VMM (Virtual Machine Monitor) and VNM (Virtual Network Monitor)
  – Exploring XEN VMM and examining security issues
  – Developing automated techniques for VMM introspection
  – Will examine VMM issues January 2012
• Integrate Secure Storage Algorithms into Hadoop (FY 2012)
• Identity Management (FY 2012)
• Technology Transfer through Knowledge and Security Analytics, LLC