accumulo summit 2015: verifiable responses to accumulo queries [security]

Verifiable Responses to Accumulo Queries
Cassandra Sparks
Robert K. Cunningham, Ariel Hamlin, Emily Shen, Mayank Varia, David A. Wilson, Arkady Yerukhimovich
April 29, 2015
This work is sponsored by the Department of Defense under Air Force Contract FA8721-05-C-002. Opinions, interpretations, recommendations and conclusions are those of the authors and are not necessarily endorsed by the United States Government.

Author: accumulo-summit

Post on 15-Jul-2015





Verifiable Queries - #CS 04/29/15

Hi everyone, I'm Cassandra Sparks, from MIT Lincoln Laboratory. There have been a lot of great talks here already, and some of them have highlighted that security is an important aspect of Accumulo. We have been looking at a different threat model for Accumulo, and using cryptography to protect against it. In this talk, I'll discuss our threat model, then talk about the cryptographic techniques we are using to secure against it and some details about the code we have written implementing them.

We are using cryptography to improve Accumulo by letting clients verify that query results have not been tampered with by the server. This talk will discuss the cryptographic techniques we use and some design & implementation details of the code.

Verification key, NOT verifying key
Don't dive right in to talking about integrity & confidentiality: define them!
Feedback from Mike:
  Streamline the signatures intro, if needed
  At least show a picture of an authenticated skip list to show what it looks like/what we're using
Question from Michael Allen: performance effect on the Accumulo server itself?
  Need to figure out an answer to this question

Introduction to MIT Lincoln Laboratory

Established 1951

Lincoln Laboratory is a Department of Defense FFRDC operated by MIT
FFRDC: Federally Funded Research and Development Center

Introduction to MIT and MIT Lincoln Laboratory. MIT is a premier engineering university dedicated to research and education. MIT Lincoln Laboratory extends these requirements to provide technology in support of national security.

MIT Lincoln Laboratory
Purpose: Technology in Support of National Security
Core Work Areas:
  Sensors
  Information Extraction
  Communications
  Integrated Sensing and Decision Support (Secure, Countermeasure Resistant)
Current Mission Areas:
  Space Control
  Intelligence, Surveillance, and Reconnaissance Systems and Technology
  Tactical Systems
  Air and Missile Defense Technology
  Homeland Protection
  Air Traffic Control
  Communication Systems
  Advanced Technology
  Cyber Security and Information Sciences
  Engineering

MIT Lincoln Laboratory is involved with a variety of mission areas through core work areas. Our purpose is to provide technology in support of national security.

Common Big Data Architecture
[Figure labels: Users (Commanders, Operators, Analysts); Ingest & Enrichment]
This talk: cryptographically securing Accumulo

Many government technologies rely on Accumulo; as a result, we want to make sure it is secure.

Image Sources:
Analysts: Comstock
Commanders: US Forces Korea General News
OSINT: Acrobat logo is Adobe Systems Inc., Twitter and Office 2011 logos are Microsoft
Weather: Rebecca van Ommen
HUMINT: U.S. Dept of State
Ground: Staff Sgt. William Tremblay/U.S. Army via Wired
Maritime: This file is a work of a sailor or employee of the U.S. Navy, taken or made as part of that person's official duties. As a work of the U.S. federal government, the image is in the public domain.
Air: This image or file is a work of a U.S. Air Force Airman or employee, taken or made as part of that person's official duties. As a work of the U.S. federal government, the image or file is in the public domain.
Space: This file is in the public domain because it was solely created by NASA. NASA copyright policy states that "NASA material is not protected by copyright unless noted".
Cyber: derrrek

This graphic was previously approved for public release as MS-777054

Threats to Accumulo
Outsourced "cloud" server
  Learn content of data/queries
  Misattribute data to inserting clients
Malicious insider (likely a sysadmin)
  Learn/change data or queries
  Misinform honest users
Malicious clients
  Make unauthorized queries
  Learn stored data
  Learn other clients' queries
External attacker
  Insert malware, hack, etc.
  We won't detect these, but our crypto provides resiliency
Our focus: security against the server

Note that the current PACE work is not protecting simultaneously against a malicious server and malicious client.

Secure Accumulo Overview
[Figure labels: Inserting Clients, Querying Clients, Network, Accumulo, Zookeeper, Hadoop Distributed Filesystem, System administrator]
End-to-end signatures
Attribute-based access control
Cell-level encryption
Verifiable query results
Data at rest encryption
TLS encryption
Accumulo provides no safeguards!
We improve the security of Accumulo with cryptography

High-level overview of Accumulo (how we're thinking about it) to explain the threat model in more detail. Accumulo already has some security features, but others are lacking, and we provide them. This talk is about the integrity component: signatures & verifiable query results.

Define integrity & confidentiality here

Outline
Introduction
End-to-End Signatures
  Digital Signatures
  Design Overview
  Implementation Details
Verifiable Query Results
Conclusion

Inserts in Accumulo
[Figure labels: Inserting Client; Querying Client (marked "?"); Accumulo with three Tablet Servers, each hosting a Tablet]
Row | Column Family | Column Qualifier | Visibility | Timestamp | Value
Patient A | Hospital 1 | Diagnoses | Doctor | 12349857 |

Currently, there is no mechanism in place for Accumulo clients to verify query results.

Digital Signatures
A signature algorithm has three phases: key generation, signing, and verification.
A signature scheme is secure if an adversary cannot forge a signature for a new message without having the signing key.
[Figure labels: Message, Message, Wrong Message]

Phase 1: generate a random key pair. The signing key is private, and the verification key is public.

Phase 2: generate a signature from a message and a signing key.

Phase 3: verify that a message matches its signature (or not).
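The three phases can be sketched in a few lines of Python. This is a toy, not the talk's code: it uses textbook RSA over two fixed, well-known Mersenne primes so that it runs with the standard library alone, whereas a real key pair is generated randomly by a vetted cryptographic library with far larger parameters and proper padding. The function names are hypothetical.

```python
import hashlib
from math import gcd

# ASSUMPTION: fixed toy primes (2**61 - 1 and 2**89 - 1), chosen only so the
# sketch is reproducible. Real key generation picks large random primes.
P = 2**61 - 1
Q = 2**89 - 1
E = 65537


def generate_keypair():
    """Phase 1: return (signing_key, verification_key)."""
    n = P * Q
    phi = (P - 1) * (Q - 1)
    assert gcd(E, phi) == 1
    d = pow(E, -1, phi)  # private exponent; needs Python 3.8+
    return (n, d), (n, E)


def _digest(message: bytes, n: int) -> int:
    """Hash the message and reduce it into the range [0, n)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(signing_key, message: bytes) -> int:
    """Phase 2: produce a signature from a message and the signing key."""
    n, d = signing_key
    return pow(_digest(message, n), d, n)


def verify(verification_key, message: bytes, signature: int) -> bool:
    """Phase 3: check that the signature matches the message."""
    n, e = verification_key
    return pow(signature, e, n) == _digest(message, n)
```

Signing a message with the signing key and then calling `verify` with the verification key returns `True`; calling it with a different message returns `False`, which corresponds to the "Wrong Message" case on the slide.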

Digital Signatures in Accumulo
[Figure labels: Inserting Client; Querying Client; Accumulo with three Tablet Servers, each hosting a Tablet]
Row | Column Family | Column Qualifier | Visibility Field | Timestamp | Value
Patient A | Hospital 1 | Diagnoses | Doctor | 12349857 |

We use signatures to add cell-level security to Accumulo without modifying the server.

Signature Code
Implemented in Python as a client-side wrapper
  Uses the pyaccumulo library
  No server-side modifications needed
Currently in the process of being open-sourced
  Contact [email protected] for updates
Several interesting design choices:
  Where to store the signature metadata?
  There are many signature algorithms: which one to use?

The code exists and is in the process of being open-sourced.

Storing Signature Metadata
How do we store the signature of each cell?
Option 1: Separate table
  Pro: original table is unmodified
  Con: twice as many reads & writes
  Patient Records (Row | Visibility | Value):
    Patient 1 | Doctor | Flu shot
    Patient 2 | Admin | Broken knee
    Patient 3 | Admin | Chicken pox
  Patient Records Signatures (Row | Visibility | Value):
    Patient 1 | Doctor | [signature]
    Patient 2 | Admin | [signature]
    Patient 3 | Admin | [signature]
Option 2: Value field
  Pro: value field is good at storing unstructured data
  Con: interferes with iterators
  Patient Records (Row | Visibility | Value):
    Patient 1 | Doctor | [signature]|Flu shot
    Patient 2 | Admin | [signature]|Broken knee
    Patient 3 | Admin | [signature]|Chicken pox
Option 3: Visibility field
  Pro: all Accumulo functionality still works
  Con: interferes with visibility label evaluation optimizations
  Patient Records (Row | Visibility | Value):
    Patient 1 | Doctor|[signature] | Flu shot
    Patient 2 | Admin|[signature] | Broken knee
    Patient 3 | Admin|[signature] | Chicken pox
We support all three options

Showing the row, visibility label, and value of each cell.
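As a rough illustration of the three storage options, the sketch below shows what a client-side wrapper might write for one signed cell in each case. Everything here is an assumption made for the example: the `Cell` tuple, the table names, the hex encoding of the signature, and the `|` separator are hypothetical and are not taken from the authors' code or from pyaccumulo.

```python
from collections import namedtuple

# ASSUMPTION: a simplified cell with only the three fields shown on the slide.
Cell = namedtuple("Cell", ["row", "visibility", "value"])


def store_in_separate_table(cell, signature: bytes):
    """Option 1: write the cell unchanged, plus a signature cell in a second table."""
    sig_cell = Cell(cell.row, cell.visibility, signature.hex())
    return [("patient_records", cell), ("patient_records_signatures", sig_cell)]


def store_in_value(cell, signature: bytes):
    """Option 2: prefix the value with the hex signature and a '|' separator."""
    signed = cell._replace(value=signature.hex() + "|" + cell.value)
    return [("patient_records", signed)]


def store_in_visibility(cell, signature: bytes):
    """Option 3: append the hex signature to the visibility label after a '|'."""
    signed = cell._replace(visibility=cell.visibility + "|" + signature.hex())
    return [("patient_records", signed)]


def split_value(stored_value: str):
    """Undo Option 2: hex digits never contain '|', so the first '|' is the separator."""
    sig_hex, _, value = stored_value.partition("|")
    return bytes.fromhex(sig_hex), value
```

Each function returns a list of (table, cell) writes, which makes the trade-off on the slide visible: Option 1 doubles the number of writes, Option 2 changes what iterators see in the value, and Option 3 changes the visibility label that the server has to evaluate.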

There is no one-size-fits-all solution, so we support all three options and let users choose what works best for them.

Signature Algorithm Options
We support RSA and ECDSA signatures, and are investigating how to safely use MACs
Option 1: RSA Signatures
  Fast signature verification
  Large signature & key size
Option 2: Elliptic Curve Signatures (ECDSA)
  Fast signature creation
  Relatively small signature & key sizes
Option 3: Message Authentication Codes
  Symmetric key: uses the same key for signing & verification
  Much faster than RSA and ECDSA
  Con: one malicious client has more power to interfere with integrity

Several signature options exist, with different performance characteristics.

Performance

[Chart: signature benchmark results (curve secp256r1)]
Benchmarked on a virtualized single-node Accumulo 1.7.0 instance

Benchmarks are on a virtualized server, but this (and the ensuing poor Accumulo performance) is irrelevant, since the code is all client-side.

Security Summary: Signatures
Signatures allow clients to verify data integrity
  Malicious server cannot modify or fabricate results
Signatures cannot verify data completeness
  Server could omit both data & signature to avoid detection
Signatures can detect:
  Modification: yes
  Insertion: yes
  Omission: no

Outline
Introduction

End-to-End Signatures

Verifiable Query Results
  Merkle Hash Trees
  Authenticated Skip Lists
Conclusion

Authenticated Data Structures
Data structures that allow provably correct queries
  Correctness defined relative to a trusted, well-known source
  Need to support range queries
The digest is a small value (constant size) that represents the entire dataset
[Figure labels: Inserting Client, Accumulo Server, Querying Client, ADS, digest, VO]
ADS: Authenticated Data Structure
VO: Verification Object

ADSs can be used to verify the entirety of Accumulo queries.
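To make the roles of the digest and the verification object concrete, here is a deliberately naive sketch of an authenticated data structure. It is an assumption-laden illustration, not the construction used in the talk: the digest is a single hash over the sorted dataset, and the VO is simply the whole dataset, so it is correct but far too large to be practical. The function names are hypothetical.

```python
import hashlib


def make_digest(sorted_keys):
    """Inserting client: constant-size digest (32 bytes) of the whole dataset."""
    m = hashlib.sha256()
    for key in sorted_keys:
        m.update(str(key).encode() + b"\n")
    return m.digest()


def answer(sorted_keys, lo, hi):
    """Server: return the range result plus a VO (here, the entire dataset)."""
    result = [k for k in sorted_keys if lo <= k <= hi]
    return result, list(sorted_keys)


def check(digest, lo, hi, result, vo):
    """Querying client: accept only if the VO matches the trusted digest
    and the result is exactly the part of the VO inside [lo, hi]."""
    return make_digest(vo) == digest and result == [k for k in vo if lo <= k <= hi]
```

Because the client recomputes the digest from the VO, a server that drops or alters an element is caught. The cost is a VO as large as the dataset, which is the inefficiency that tree-based structures are meant to remove.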

Merkle Hash Trees
Leaves: 2, 4, 6, 8, 10, 12, 14, 16
Leaf hashes: h(2), h(4), h(6), h(8), h(10), h(12), h(14), h(16)
a = h(h(2), h(4))
b = h(h(6), h(8))
c = h(h(10), h(12))
d = h(h(14), h(16))
e = h(a, b)
f = h(c, d)
root = h(e, f)
Digest is the root node's hash value

MHTs use hash functions to allow verified range queries over sorted data.

Merkle Hash Trees
[Figure: the same tree, with the query range(5, 9)]
Legend: part of the range returned; part of the verification object; computed based on returned information
Naïve solution allows a malicious server to omit elements at the ends of ranges

A naïve approach to VO generation still allows a server to omit some elements of the query without detection.

Naïve Merkle Tree Security
Attack | Signatures | Naïve MHTs
Omitting internal query results | not detected | detected
Omitting boundary query results | not detected | not detected
Solution: return boundaries of the range

Even naïve MHTs are more powerful than just signatures, but we can do better if we return the elements just outside the range as part of the VO.

Merkle Hash Trees, Revisited
[Figure: the same tree, with the query range(5, 9)]
Legend: part of the range returned; part of the verification object; computed based on returned information

Here is how the same range query works on a real MHT.

Security Summary: ADSs
Attack | Signatures | Naïve MHTs | MHTs
Omitting internal query results | not detected | detected | detected
Omitting boundary query results | not detected | not detected | detected
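The boundary-returning range query described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the leaf count is a power of two and known to the client alongside the digest, SHA-256 stands in for h, leaf and node hashes carry distinct prefixes, and all function names are hypothetical.

```python
import hashlib
from bisect import bisect_left, bisect_right


def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def leaf_hash(value: int) -> bytes:
    return h(b"leaf:", str(value).encode())


def subtree_hash(leaves, lo, hi):
    """Hash of the subtree over leaves[lo:hi]; hi - lo must be a power of two."""
    if hi - lo == 1:
        return leaf_hash(leaves[lo])
    mid = (lo + hi) // 2
    return h(b"node:", subtree_hash(leaves, lo, mid), subtree_hash(leaves, mid, hi))


def _vo(leaves, start, end, lo, hi):
    """Hashes of the maximal subtrees lying entirely outside leaves[start:end]."""
    if hi <= start or end <= lo:
        return [subtree_hash(leaves, lo, hi)]
    if hi - lo == 1:
        return []
    mid = (lo + hi) // 2
    return _vo(leaves, start, end, lo, mid) + _vo(leaves, start, end, mid, hi)


def query(leaves, a, b):
    """Server: elements in [a, b] plus one neighbour on each side, and the VO."""
    start = max(bisect_left(leaves, a) - 1, 0)
    end = min(bisect_right(leaves, b) + 1, len(leaves))
    return start, leaves[start:end], _vo(leaves, start, end, 0, len(leaves))


def _rebuild(values, start, vo, lo, hi):
    """Recompute the hash of the subtree over positions [lo, hi)."""
    if hi <= start or start + len(values) <= lo:
        return next(vo)  # subtree not returned: take its hash from the VO
    if hi - lo == 1:
        return leaf_hash(values[lo - start])
    mid = (lo + hi) // 2
    left = _rebuild(values, start, vo, lo, mid)
    right = _rebuild(values, start, vo, mid, hi)
    return h(b"node:", left, right)


def verify(digest, n, a, b, start, values, vo):
    """Client: return the verified values in [a, b], or raise ValueError."""
    end = start + len(values)
    ok = bool(values) and 0 <= start and end <= n
    ok = ok and all(x < y for x, y in zip(values, values[1:]))  # strictly sorted
    ok = ok and (start == 0 or values[0] < a)   # left boundary element present
    ok = ok and (end == n or values[-1] > b)    # right boundary element present
    if ok:
        it = iter(vo)
        try:
            ok = _rebuild(values, start, it, 0, n) == digest and next(it, None) is None
        except StopIteration:
            ok = False
    if not ok:
        raise ValueError("query result failed verification")
    return [v for v in values if a <= v <= b]
```

For the slide's tree and range(5, 9), `query` returns the elements 4, 6, 8, 10 together with three VO hashes (the leaf hash of 2, the leaf hash of 12, and the hash d of the 14/16 subtree), and `verify` recomputes the root and returns 6 and 8. Dropping 8 from the response, or returning only 6 without the boundary elements, makes `verify` raise.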

MHTs (and ADSs in general) can prove if any elements were omitted.

Merkle Hash Tree Disadvantages
Mostly used for static data
How to insert elements into MHTs?
  Approach 1: Unbalanced Insert (linear time operations!)
  Approach 2: Balanced Insert (linear time insert!)

MHTs are good for static data, and have poor performance when inserting data into them.

Authenticated Skip Lists
Operation | MHT | Skip List
Lookup | O(log(n)) | O(log(n)) (expected)
Insert | O(n) | O(log(n)) (expected)
Verify | O(log(n)) | O(log(n)) (expected)
Randomized skip lists have empirically better performance than other tree-like data structures

A naïve MHT has bad asymptotic behavior. We switch to Authenticated Skip Lists instead, which have good asymptotics and empirically better performance than data structures such as red-black trees.

Outline
Introduction

End-to-End Signatures

Verifiable Query Results

Conclusion

Additional Work
Confidentiality to hide data from the server & unauthorized users
  Per-cell encryption allows flexible encryption for different use cases
  Cryptographically enforcing Accumulo's visibility labels with key management
Using HMACs for better performance without sacrificing security
Key management and distribution for all cryptographic components

There are other components to our work securing Accumulo.

Conclusion
Signatures for data tampering detection
  Currently implemented in Python
  Client-side library
  Contact [email protected] to be notified when the code is open-sourced
Authenticated Data Structures for full query correctness checks
  Working on embedding in Accumulo for greater efficiency
Questions?

Recap and questions.