Verifiable Responses to Accumulo Queries

Cassandra Sparks

Robert K. Cunningham, Ariel Hamlin, Emily Shen, Mayank Varia, David A. Wilson, Arkady Yerukhimovich

April 29, 2015

This work is sponsored by the Department of Defense under Air Force Contract FA8721-05-C-002. Opinions, interpretations, recommendations and conclusions are those of the authors and are not necessarily endorsed by the United States Government.

Verifiable Queries - 2

CS 04/29/15

Introduction to MIT Lincoln Laboratory

Established 1951

Lincoln Laboratory is a Department of Defense FFRDC operated by MIT

FFRDC: Federally Funded Research and Development Center

Technology in Support of National Security

Sensors Information Extraction Communications

Integrated Sensing and Decision Support

(Secure – Countermeasure Resistant)

Purpose

Core Work Areas

Current Mission Areas

• Space Control
• Intelligence, Surveillance, and Reconnaissance Systems and Technology
• Tactical Systems
• Air and Missile Defense Technology
• Homeland Protection
• Air Traffic Control
• Communication Systems
• Advanced Technology
• Cyber Security and Information Sciences
• Engineering

MIT Lincoln Laboratory

Common Big Data Architecture

Users: Commanders, Operators, Analysts

Data: Maritime, Ground, Space, C2, Cyber, OSINT, Air, HUMINT, Weather

Analytics: A, B, C, D, E

Computing: Web, Files, Scheduler, Ingest & Enrichment, Ingest

This talk: cryptographically securing Accumulo

Threats to Accumulo

• Outsourced "cloud" server

– Learn content of data/queries

– Misattribute data to inserting clients

• Malicious insider (likely a sysadmin)

– Learn/change data or queries

– Misinform honest users

• Malicious clients

– Make unauthorized queries

– Learn stored data

– Learn other clients’ queries

• External attacker

– Insert malware, hack, etc.

– We won’t detect these, but our crypto provides resiliency

Our focus: security against the server

Secure Accumulo Overview

Components: Inserting Clients, Querying Clients, Network, Accumulo, Zookeeper, Hadoop Distributed Filesystem, System administrator

Protections shown:

• TLS encryption
• Data at rest encryption
• End-to-end signatures
• Attribute-based access control
• Cell-level encryption
• Verifiable query results

Accumulo provides no safeguards!

We improve the security of Accumulo with cryptography

Outline

• Introduction

• End-to-End Signatures

– Digital Signatures

– Design Overview

– Implementation Details

• Verifiable Query Results

• Conclusion

Inserts in Accumulo

(Diagram: an inserting client and a querying client both talk to Accumulo, which stores data in tablets hosted on tablet servers.)

Example cell:

Row          Column Family    Column Qualifier    Visibility    Timestamp    Value
Patient A    Hospital 1       Diagnoses           Doctor        12349857     …

Digital Signatures

• A signature algorithm has three phases:
– Key Generation
– Signing
– Verification

(Figure: a signed message passes verification; a wrong message fails.)

A signature scheme is secure if an adversary cannot forge a signature for a new message without having the signing key
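The three phases can be sketched with a deliberately tiny textbook-RSA toy that needs only the Python standard library. The primes, the function names, and the hash-then-exponentiate construction are illustrative assumptions, not the scheme used in the talk's code, and the parameters are far too small for real use.

```python
import hashlib

# Toy parameters (assumption for illustration): two well-known Mersenne primes.
P = 2**61 - 1
Q = 2**89 - 1
E = 65537


def keygen():
    """Phase 1, key generation: return (signing_key, verification_key)."""
    n = P * Q
    d = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse, needs Python 3.8+
    return (n, d), (n, E)


def _digest(message: bytes, n: int) -> int:
    """Hash the message and reduce it into the RSA modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(signing_key, message: bytes) -> int:
    """Phase 2, signing: only the holder of d can compute this value."""
    n, d = signing_key
    return pow(_digest(message, n), d, n)


def verify(verification_key, message: bytes, signature: int) -> bool:
    """Phase 3, verification: anyone with the public key can check."""
    n, e = verification_key
    return pow(signature, e, n) == _digest(message, n)


sk, vk = keygen()
sig = sign(sk, b"Patient A: flu shot")
print(verify(vk, b"Patient A: flu shot", sig))     # the signed message
print(verify(vk, b"Patient A: broken knee", sig))  # a wrong message
```

A real deployment would use a vetted library with standard padding and key sizes rather than this toy.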

Digital Signatures in Accumulo

(Diagram: the inserting client signs each cell before writing it to Accumulo's tablet servers; the querying client verifies the signature on each cell it reads.)

Example cell:

Row          Column Family    Column Qualifier    Visibility Field    Timestamp    Value
Patient A    Hospital 1       Diagnoses           Doctor              12349857     …

Signature Code

• Implemented in Python as a client-side wrapper

– Uses the pyaccumulo library

– No server-side modifications needed

• Currently in the process of being open-sourced

– Contact [email protected] for updates

• Several interesting design choices:

– Where to store the signature metadata?

– There are many signature algorithms—which one to use?

Storing Signature Metadata

• How do we store the signature of each cell?

Option 1: Separate table

Patient Records
Patient 1    Flu shot       Doctor
Patient 2    Broken knee    Admin
Patient 3    Chicken pox    Admin

Patient Records Signatures
Patient 1    <signature 1>    Doctor
Patient 2    <signature 2>    Admin
Patient 3    <signature 3>    Admin

Pro: original table is unmodified
Con: twice as many reads & writes

Option 2: Value field

Patient Records
Patient 1    <signature 1>|Flu shot       Doctor
Patient 2    <signature 2>|Broken knee    Admin
Patient 3    <signature 3>|Chicken pox    Admin

Pro: value field is good at storing unstructured data
Con: interferes with iterators

Option 3: Visibility field

Patient Records
Patient 1    Flu shot       Doctor|“<signature 1>”
Patient 2    Broken knee    Admin|“<signature 2>”
Patient 3    Chicken pox    Admin|“<signature 3>”

Pro: all Accumulo functionality still works
Con: interferes with visibility label evaluation optimizations

We support all three options
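The three layouts can be sketched as plain Python functions over a simplified (row, value, visibility) cell. The function names and the tuple model are assumptions for illustration; real Accumulo cells carry more key fields, and this is not the pyaccumulo wrapper's API.

```python
def store_in_separate_table(cell, signature):
    """Option 1: keep the cell unchanged and write a parallel signature cell."""
    row, value, visibility = cell
    return cell, (row, signature, visibility)


def store_in_value(cell, signature):
    """Option 2: prefix the value with the signature, separated by '|'."""
    row, value, visibility = cell
    return (row, f"{signature}|{value}", visibility)


def split_value(stored_value):
    """Undo option 2, assuming the encoded signature never contains '|'."""
    signature, value = stored_value.split("|", 1)
    return signature, value


def store_in_visibility(cell, signature):
    """Option 3: append the quoted signature to the visibility label."""
    row, value, visibility = cell
    return (row, value, f'{visibility}|"{signature}"')


cell = ("Patient 1", "Flu shot", "Doctor")
print(store_in_separate_table(cell, "<signature 1>"))
print(store_in_value(cell, "<signature 1>"))
print(store_in_visibility(cell, "<signature 1>"))
```

Option 2 shows why iterators are affected: anything reading the value server-side now sees the signature prefix and has to strip it first.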

Signature Algorithm Options

We support RSA and ECDSA signatures, and are investigating how to safely use MACs

Option 1: RSA Signatures
• Fast signature verification
• Large signature & key size

Option 2: Elliptic Curve Signatures (ECDSA)
• Fast signature creation
• Relatively small signature & key sizes

Option 3: Message Authentication Codes
• Symmetric key: uses the same key for signing & verification
• Much faster than RSA and ECDSA
• Con: one malicious client has more power to interfere with integrity
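The trade-off of option 3 can be illustrated with the standard library's HMAC. The key, the cell encoding, and the helper names below are hypothetical; the point is only that every holder of the shared key can both check and create tags.

```python
import hashlib
import hmac


def make_tag(key: bytes, cell: bytes) -> bytes:
    """Compute a MAC tag over a cell with the shared symmetric key."""
    return hmac.new(key, cell, hashlib.sha256).digest()


def check_tag(key: bytes, cell: bytes, tag: bytes) -> bool:
    """Verification recomputes the tag with the same key (constant-time compare)."""
    return hmac.compare_digest(make_tag(key, cell), tag)


shared_key = b"hypothetical key shared by all clients"

honest_tag = make_tag(shared_key, b"Patient 1|Flu shot")
# Any client that can verify also holds the key, so it can tag altered data.
forged_tag = make_tag(shared_key, b"Patient 1|Broken knee")

print(check_tag(shared_key, b"Patient 1|Flu shot", honest_tag))
print(check_tag(shared_key, b"Patient 1|Broken knee", forged_tag))
print(check_tag(b"some other key", b"Patient 1|Flu shot", honest_tag))
```

With RSA or ECDSA the verification key is public and cannot be used to sign, which is why a single malicious client has less power there.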

Performance

(Figure: signature benchmark chart; elliptic curve used: secp256r1)

Benchmarked on a virtualized single-node Accumulo 1.7.0 instance

Security Summary: Signatures

• Signatures allow clients to verify data integrity

– Malicious server cannot modify or fabricate results

• Signatures cannot verify data completeness

– Server could omit both data & signature to avoid detection

Signatures can detect:
• Modification: yes
• Insertion: yes
• Omission: no
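A small sketch of why per-cell signatures miss omissions. To stay within the standard library it uses an HMAC as a stand-in for a real signature, and the key, table contents, and helper names are assumptions for illustration.

```python
import hashlib
import hmac

KEY = b"inserting client's key"  # stand-in for a real signing key


def sign(row: str, value: str) -> str:
    """Tag one cell; a real system would use RSA or ECDSA here."""
    return hmac.new(KEY, f"{row}\x00{value}".encode(), hashlib.sha256).hexdigest()


def all_valid(results) -> bool:
    """Client-side check: every returned cell must carry a valid tag."""
    return all(hmac.compare_digest(sign(row, value), tag)
               for row, value, tag in results)


records = [("Patient 1", "Flu shot"),
           ("Patient 2", "Broken knee"),
           ("Patient 3", "Chicken pox")]
table = [(row, value, sign(row, value)) for row, value in records]

modified = [table[0], ("Patient 2", "Healthy", table[1][2]), table[2]]
inserted = table + [("Patient 4", "Sprained ankle", "0" * 64)]
omitted = [table[0], table[2]]  # server drops the cell and its signature

print(all_valid(table))     # honest answer
print(all_valid(modified))  # modification is caught
print(all_valid(inserted))  # fabricated cell is caught
print(all_valid(omitted))   # omission passes: nothing links one cell to the next
```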

Outline

• Introduction

• End-to-End Signatures

• Verifiable Query Results

– Merkle Hash Trees

– Authenticated Skip Lists

• Conclusion

Authenticated Data Structures

• Data structures that allow provably correct queries
– Correctness defined relative to a trusted, well-known source
– Need to support range queries

The digest is a small value (constant size) that represents the entire dataset

(Diagram: Inserting Client, Accumulo Server, and Querying Client, with the ADS, the digest, and VOs passed between them.)

ADS: Authenticated Data Structure
VO: Verification Object

Merkle Hash Trees

Leaves:      2, 4, 6, 8, 10, 12, 14, 16
Leaf hashes: h(2), h(4), h(6), h(8), h(10), h(12), h(14), h(16)
Level 1:     a = h(h(2), h(4)), b = h(h(6), h(8)), c = h(h(10), h(12)), d = h(h(14), h(16))
Level 2:     e = h(a, b), f = h(c, d)
Root:        root = h(e, f)

Digest is the root node’s hash value
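The tree above can be computed in a few lines. This is a minimal sketch assuming SHA-256 for h, a decimal string encoding of each leaf, and a leaf count that is a power of two; none of these choices come from the slides.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """Hash the concatenation of the given byte strings with SHA-256."""
    return hashlib.sha256(b"".join(parts)).digest()


def merkle_root(values):
    """Hash the leaves, then hash adjacent pairs until one node remains."""
    level = [h(str(v).encode()) for v in values]
    while len(level) > 1:
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


leaves = [2, 4, 6, 8, 10, 12, 14, 16]
digest = merkle_root(leaves)
print(digest.hex())

# Changing any single leaf changes the digest.
print(merkle_root([2, 4, 6, 9, 10, 12, 14, 16]) == digest)
```

The digest stays 32 bytes no matter how many leaves the tree has, which is what lets the querying client keep only that one value.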

Merkle Hash Trees

Query: range(5, 9)

Leaves:      2, 4, 6, 8, 10, 12, 14, 16
Leaf hashes: h(2), h(4), h(6), h(8), h(10), h(12), h(14), h(16)
Level 1:     a = h(h(2), h(4)), b = h(h(6), h(8)), c = h(h(10), h(12)), d = h(h(14), h(16))
Level 2:     e = h(a, b), f = h(c, d)
Root:        root = h(e, f)

Legend:
• Part of the range returned
• Part of the verification object
• Computed based on returned information

Naïve solution allows a malicious server to omit elements at the ends of ranges

Naïve Merkle Tree Security

Detects omission of:    internal query results    boundary query results
Signatures:             no                        no
Naïve MHTs:             yes                       no

Solution: return boundaries of the range

Merkle Hash Trees, Revisited

Query: range(5, 9)

Leaves:      2, 4, 6, 8, 10, 12, 14, 16
Leaf hashes: h(2), h(4), h(6), h(8), h(10), h(12), h(14), h(16)
Level 1:     a = h(h(2), h(4)), b = h(h(6), h(8)), c = h(h(10), h(12)), d = h(h(14), h(16))
Level 2:     e = h(a, b), f = h(c, d)
Root:        root = h(e, f)

Legend:
• Part of the range returned
• Part of the verification object
• Computed based on returned information
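One way to check a range answer that includes its boundary elements is sketched below. It assumes SHA-256 for h, a power-of-two leaf count that the client learns along with the digest, an inclusive range, and hypothetical function names; it illustrates the idea rather than the authors' implementation.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def build_levels(values):
    """Server side: all tree levels, leaf hashes first, root last."""
    levels = [[h(str(v).encode()) for v in values]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def prove(levels, start, end):
    """Server side: sibling hashes (the VO) for leaves[start:end], bottom-up."""
    left, right = [], []
    for level in levels[:-1]:
        if start % 2 == 1:              # run starts at a right child
            left.append(level[start - 1])
            start -= 1
        if (end - start) % 2 == 1:      # run ends at a left child
            right.append(level[end])
            end += 1
        start, end = start // 2, end // 2
    return left, right


def root_from_range(values, start, left, right, n_leaves):
    """Client side: recompute the root from returned leaves plus the VO."""
    level = [h(str(v).encode()) for v in values]
    left, right = list(left), list(right)
    width = n_leaves
    while width > 1:
        if start % 2 == 1:
            level.insert(0, left.pop(0))
            start -= 1
        if len(level) % 2 == 1:
            level.append(right.pop(0))
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        start, width = start // 2, width // 2
    return level[0]


def verify_range(values, start, left, right, n_leaves, digest, lo, hi):
    """Return the in-range values, or None if the answer cannot be trusted."""
    try:
        if root_from_range(values, start, left, right, n_leaves) != digest:
            return None
    except IndexError:                  # VO too short to rebuild the root
        return None
    left_ok = start == 0 or values[0] < lo
    right_ok = start + len(values) == n_leaves or values[-1] > hi
    if not (left_ok and right_ok):      # a boundary element is missing
        return None
    return [v for v in values if lo <= v <= hi]


leaves = [2, 4, 6, 8, 10, 12, 14, 16]
levels = build_levels(leaves)
digest = levels[-1][0]

# Honest answer to range(5, 9): leaves 4, 6, 8, 10 (indices 1..4).
vo_left, vo_right = prove(levels, 1, 5)
honest = verify_range(leaves[1:5], 1, vo_left, vo_right, 8, digest, 5, 9)

# Cheating answer: drop 8 and the right boundary; the root still matches.
bad_left, bad_right = prove(levels, 1, 3)
cheating = verify_range(leaves[1:3], 1, bad_left, bad_right, 8, digest, 5, 9)

print(honest, cheating)
```

In the cheating case the hashes still recompute to the digest, so it is the boundary check, not the hash check, that rejects the answer.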

Security Summary: ADSs

Detects omission of:    internal query results    boundary query results
Signatures:             no                        no
Naïve MHTs:             yes                       no
MHTs:                   yes                       yes

Merkle Hash Tree Disadvantages

• Mostly used for static data

• How to insert elements into MHTs?

Approach 1: Unbalanced Insert
Linear time operations!

Approach 2: Balanced Insert
Linear time insert!

Authenticated Skip Lists

          MHT          Skip List
Lookup    O(log(n))    O(log(n)) (expected)
Insert    O(n)         O(log(n)) (expected)
Verify    O(log(n))    O(log(n)) (expected)

Randomized skip lists have empirically better performance than other tree-like data structures
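For intuition about the expected O(log(n)) costs, here is a plain randomized skip list. It is not authenticated: an authenticated variant would additionally attach hashes to the nodes so that a search path doubles as a proof. The class and method names are illustrative assumptions.

```python
import random


class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # one forward pointer per level


class SkipList:
    MAX_HEIGHT = 16

    def __init__(self, seed=None):
        self.head = Node(None, self.MAX_HEIGHT)
        self.rng = random.Random(seed)

    def _random_height(self):
        """Flip coins: each extra level is kept with probability 1/2."""
        height = 1
        while height < self.MAX_HEIGHT and self.rng.random() < 0.5:
            height += 1
        return height

    def _predecessors(self, key):
        """Last node before `key` on every level, found top-down."""
        preds = [self.head] * self.MAX_HEIGHT
        node = self.head
        for level in reversed(range(self.MAX_HEIGHT)):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            preds[level] = node
        return preds

    def insert(self, key):
        """Splice a new node in after its predecessor on each of its levels."""
        preds = self._predecessors(key)
        node = Node(key, self._random_height())
        for level in range(len(node.next)):
            node.next[level] = preds[level].next[level]
            preds[level].next[level] = node

    def __contains__(self, key):
        candidate = self._predecessors(key)[0].next[0]
        return candidate is not None and candidate.key == key

    def range(self, lo, hi):
        """All keys with lo <= key <= hi, walking the bottom level."""
        node = self._predecessors(lo)[0].next[0]
        found = []
        while node is not None and node.key <= hi:
            found.append(node.key)
            node = node.next[0]
        return found


skip_list = SkipList(seed=2015)
for key in [10, 2, 16, 6, 12, 4, 14, 8]:
    skip_list.insert(key)

print(6 in skip_list, 5 in skip_list)
print(skip_list.range(5, 9))
```

An insert only touches the predecessors of the new node, so no global rebalancing is needed; that is the property the table contrasts with the O(n) MHT insert.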

Outline

• Introduction

• End-to-End Signatures

• Verifiable Query Results

• Conclusion

Additional Work

• Confidentiality to hide data from the server & unauthorized users

– Per-cell encryption allows flexible encryption for different use cases

– Cryptographically enforcing Accumulo’s visibility labels with key management

• Using HMACs for better performance without sacrificing security

• Key management and distribution for all cryptographic components

Conclusion

• Signatures for data tampering detection

– Currently implemented in Python

– Client-side library

– Contact [email protected] to be notified when the code is open-sourced

• Authenticated Data Structures for full query correctness checks

– Working on embedding in Accumulo for greater efficiency

Questions?

