accumulo summit 2014: addressing big data challenges through innovative architecture, databases and...

Addressing Big Data Challenges through Innovative Architecture, Databases and Software. UNCLASSIFIED. Dr. Vijay Gadepally. Accumulo Summit, College Park, MD, June 12, 2014. This work is sponsored by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations, recommendations and conclusions are those of the authors and are not necessarily endorsed by the United States Government.

DESCRIPTION

The ability to collect and analyze large amounts of data is a growing challenge within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges posed by big data volume, velocity and variety. The Massachusetts Institute of Technology Lincoln Laboratory (MIT LL) is not immune to these challenges and has developed a set of tools that address many of them.

Big data volume stresses the storage, memory, and compute capacity of a computing system and requires access to a computing cloud. Choosing the right cloud is problem specific. Currently, four multi-billion dollar ecosystems dominate the cloud computing environment: enterprise clouds, big data clouds, SQL database clouds, and supercomputing clouds. Each cloud ecosystem has its own hardware, software, conferences, and business markets. The broad nature of business big data challenges makes it unlikely that one cloud ecosystem can meet every need; solutions are likely to require tools and techniques from more than one cloud ecosystem. The MIT SuperCloud was developed to address this challenge. To our knowledge, the MIT SuperCloud is the only deployed cloud system that allows all four ecosystems to co-exist without sacrificing performance or functionality.

Big data velocity stresses the rate at which data can be absorbed and meaningful answers produced. Led by the NSA, a Common Big Data Architecture (CBDA) was developed for the U.S. government based on the Google Bigtable NoSQL approach and is now in wide use. MIT LL played a leading role in developing the CBDA and is a leader in adapting it to a variety of big data challenges.

Big data variety may present the largest challenge and the greatest opportunities. The promise of big data is the ability to correlate diverse and heterogeneous data to form new insights. The centerpiece of the CBDA is the NSA-developed Apache Accumulo database (capable of millions of entries/second) and the MIT LL-developed D4M schema. These technologies allow vast quantities of highly diverse data (text, computer logs, social media data, etc.) to be automatically ingested into a common schema that enables rapid query and correlation of every element. The talk will concentrate on how we use these technologies in our mission to apply advanced technology to problems of national security.

TRANSCRIPT

Page 1: Accumulo Summit 2014: Addressing big data challenges through innovative architecture, databases and software

Addressing Big Data Challenges through Innovative Architecture, Databases and Software

UNCLASSIFIED

Dr. Vijay Gadepally

Accumulo Summit
College Park, MD

June 12, 2014

This work is sponsored by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations, recommendations and conclusions are those of the authors and are not necessarily endorsed by the United States Government.

Page 2

Acknowledgements

• Bill Arcand

• Bill Bergeron

• David Bestor

• Chansup Byun

• Matt Hubbell

• Jeremy Kepner

• Pete Michaleas

• Julie Mullen

• Andy Prout

• Albert Reuther

• Tony Rosa

• Charles Yee

And many more …

Page 3

Outline

• Introduction

• Cloud Computing and Challenges

• Innovative Architecture: MIT SuperCloud

• Innovative Databases: Apache Accumulo

• Innovative Software: D4M

• R&D Examples

• Conclusions

Page 4

Introduction to MIT Lincoln Laboratory

Established 1951

Lincoln Laboratory is a Department of Defense FFRDC operated by MIT

Page 5

MIT Lincoln Laboratory

Purpose: Technology in Support of National Security

Core Work Areas

• Sensors

• Information Extraction

• Communications

• Integrated Sensing and Decision Support (Secure – Countermeasure Resistant)

Current Mission Areas

• Space Control

• Intelligence, Surveillance, and Reconnaissance Systems and Technology

• Tactical Systems

• Air and Missile Defense Technology

• Homeland Protection

• Air Traffic Control

• Communication Systems

• Advanced Technology

• Cyber Security and Information Sciences

• Engineering

Page 6

MIT Lincoln Laboratory - Focus Areas -

Rapid Prototyping

Trusted Government Advisor

University Affiliations

Methodology: System Analysis, Technology Prototyping, Testing

• Capabilities against existing & future threats

• Rapid development

• Highly instrumented

• Field / operational testing

Outputs: Assessments to Senior Leadership, Architects and Requirements Definition, Advanced Technology Prototypes

• Operationally relevant

• Validated by testing

Broad Multi-Mission Technology Strength

Architecture Analysis and Test

Conferences, Workshops, Outreach

Page 7

Outline

• Introduction

• Cloud Computing and Challenges

• Innovative Architecture: MIT SuperCloud

• Innovative Databases: Apache Accumulo

• Innovative Software: D4M

• R&D Examples

• Conclusions

Page 8

Common Big Data Challenge

Users: Warfighters, Operators, Analysts

Data: Maritime, Ground, Space, C2, Cyber, Text and Social Media, Air, HUMINT, Weather

Gap between data and users, rapidly increasing in:
- Data volume
- Data velocity
- Data variety

[Figure: World Total Information Stored (MB) and World Total Computing Capacity (millions of instructions per second, MIPS) versus year, 1986 to 2013, on logarithmic axes; annotated values 3 × 10^14 MB and 7 × 10^12 MIPS]

Source: M. Hilbert and P. López, Science, Vol. 332 (2011) and associated online material

Page 9

Common Big Data Architecture

Users: Warfighters, Operators, Analysts

Data: Maritime, Ground, Space, C2, Cyber, OSINT, Air, HUMINT, Weather

Components: Analytics, Computing, Web, Files, Scheduler, Ingest & Enrichment, Ingest, Databases

Page 10

Common Big Data Architecture - Data Volume: Cloud Computing -

[Figure: Common Big Data Architecture diagram (users, data sources, and components), with the MIT SuperCloud highlighted]

MIT SuperCloud merges four clouds: Enterprise Cloud, Big Data Cloud, Database Cloud, Compute Cloud

LLSuperCloud: Sharing HPC Systems for Diverse Rapid Prototyping, Reuther et al., IEEE HPEC 2013

Page 11

Common Big Data Architecture - Data Velocity: Accumulo Database -

[Figure: Common Big Data Architecture diagram (users, data sources, and components), with the databases highlighted]

Lincoln benchmarking validated Accumulo performance

Page 12

Common Big Data Architecture - Data Variety: D4M Schema -

[Figure: Common Big Data Architecture diagram (users, data sources, and components), with schema tables labeled columns, rows, Σ, and raw]

D4M demonstrated a universal approach to diverse data

Intel reports, DNA, health records, publication citations, web logs, social media, building alarms, cyber, … all handled by a common 4 table schema

D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database, Kepner et al., IEEE HPEC 2013

Page 13

The Cloud within the Common Big Data Architecture

[Figure: Common Big Data Architecture diagram (users, data sources, and components), with the component layer labeled The “Cloud”]

Page 14

Four Ecosystems Dominate Large Scale Cloud Computing

Enterprise
- Interactive
- On-demand
- Virtualization

Big Data
- Java
- Distributed
- Easy admin

Database
- Indexing
- Search
- Atomic

Supercomputing
- High performance
- Scientific computing
- Batch jobs

• Each cloud ecosystem supports many multi-$B industries

• Each cloud ecosystem uses different software and hardware

Page 15

MIT SuperCloud

[Figure: the Enterprise, Big Data, Database, and Supercomputing ecosystems, each with the characteristics listed on the previous page, arranged around the MIT SuperCloud]

• MIT SuperCloud adds virtual machines and additional security

• Combines all four ecosystems without sacrificing performance

Page 16

MIT SuperCloud

Core Technologies

Enterprise (VMware)
- Interactive
- On-demand
- Virtualization

Big Data (Hadoop)
- Java
- Distributed
- Easy admin

Supercomputing (MPI)
- High performance
- Scientific computing
- Batch jobs

Database (SQL)
- Indexing
- Search
- Atomic

• VMware is the main enterprise computing virtualization technology

• Message Passing Interface (MPI) is the primary supercomputing API

• Structured Query Language (SQL) is the primary database API

• Hadoop & Accumulo & D4M are core to government big data clouds

D4M = Dynamic Distributed Dimensional Data Model

Page 17

Outline

• Introduction

• Cloud Computing and Challenges

• Innovative Architecture: MIT SuperCloud

• Innovative Databases: Apache Accumulo

• Innovative Software: D4M

• R&D Examples

• Conclusions

Page 18

MIT SuperCloud

• Developed to address the challenges associated with big data volume

• Cloud system allows all four ecosystems of the cloud to exist within the same computational architecture

• Key Innovations:
– Shared HPC cloud capabilities
– High performance
– Reliable

• Brings the power of cloud computing to the HPC community

Page 19

Cloud Ecosystems

[Figure: Enterprise (VMware), Big Data (Hadoop), Supercomputing (MPI), and Database (SQL) ecosystems combined in the MIT SuperCloud]

• Allows different architectures to be dynamically combined and tested

Page 20

MIT SuperCloud

[Figure: MIT SuperCloud (TX-E1) system diagram showing network storage, scheduler, monitoring system, compute nodes, service nodes, cluster switch, LAN switch, project data, and interactive compute, VM, and database jobs]

Page 21

Cloud Computing @ MIT

• The cloud computing infrastructure at Lincoln Laboratory is based on the MIT SuperCloud infrastructure, which allows the different cloud ecosystems to co-exist

• MIT SuperCloud architecture addresses the issues of big data volume

• Centerpiece of MIT SuperCloud: Accumulo database

Page 22

Outline

• Introduction

• Cloud Computing and Challenges

• Innovative Architecture: MIT SuperCloud

• Innovative Databases: Apache Accumulo

• Innovative Software: D4M

• R&D Examples

• Conclusions

Page 23

Apache Accumulo

• Highest performance open source database

• Contributed to Apache project by the NSA in 2011

• Used extensively for government applications

• Requires a schema for storing and organizing data to obtain full benefits

Page 24

Accumulo and the MIT SuperCloud

• Apache Accumulo is a high performance database used for a variety of purposes
– Helps address the big data velocity challenge

• Accumulo is the centerpiece of the Common Big Data Architecture developed by MIT Lincoln Laboratory

• Key features:
– Open Source
– High Performance
– Widely adopted
– Vibrant Developer Community

• MIT Lincoln Laboratory has developed a set of tools, D4M, to help researchers use Accumulo for novel research

Page 25

Common Big Data Architecture - Data Velocity: Accumulo Database -

[Figure: Common Big Data Architecture diagram (users, data sources, and components), with the databases highlighted]

Page 26

Outline

• Introduction

• Cloud Computing and Challenges

• Innovative Architecture: MIT SuperCloud

• Innovative Databases: Apache Accumulo

• Innovative Software: D4M

• R&D Examples

• Conclusions

Page 27

High Level Language: D4M

http://www.mit.edu/~kepner/D4M

[Figure: D4M (Dynamic Distributed Dimensional Data Model) sits between the Accumulo distributed database and associative arrays in a numerical computing environment; a query on Alice, Bob, Cathy, David, and Earl is drawn as a graph with nodes A to E]

A D4M query returns a sparse matrix or a graph for statistical signal processing or graph analysis in MATLAB

D4M binds associative arrays to databases, enabling rapid prototyping of data-intensive cloud analytics and visualization

Page 28

D4M

• The Dynamic Distributed Dimensional Data Model
– Supports database and computation systems that deal with Big Data
– Developed at Lincoln Laboratory

• Key Features:
– Applies linear algebra and signal processing techniques to databases through associative arrays
– D4M data schema offers a one-stop solution for most types of data source for any type of database
– Low barrier to entry: API accessible even to those with minimal database and/or big data background

Page 29

Associative Arrays

• Extends associative arrays to 2D and mixed data types

A('#a1','#b2') = 'same_tweet'

• Key innovation: 2D is 1-to-1 with triple store

('#a1','#b2','same_tweet')

• Enables composable mathematical operations

A + B    A - B    A & B    A|B    A*B

• Enables composable query operations via array indexing

A('#al b2',:)    A('#al,',:)    A('#a* ',:)

A('#al: b2',:)    A(1:2,:)    A == #b2#a1

[Figure: associative array with row key #a1, column key #b2, and value same_tweet]
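
The one-to-one mapping between a 2D associative array and a triple store can be sketched in a few lines. This is a minimal illustration in plain Python, not the D4M API: the `Assoc` class and its `triples` method are hypothetical names introduced here, and only the keys and value come from the slide.

```python
class Assoc:
    """Toy 2D associative array keyed by (row, column) strings.

    Hypothetical stand-in for illustration; this is not D4M's Assoc class.
    """

    def __init__(self, triples=()):
        # Each (row, column, value) triple becomes exactly one array entry.
        self.data = {(row, col): val for row, col, val in triples}

    def __getitem__(self, key):
        row, col = key
        return self.data[(row, col)]

    def __setitem__(self, key, val):
        row, col = key
        self.data[(row, col)] = val

    def triples(self):
        # Each array entry maps back to exactly one triple.
        return sorted((row, col, val) for (row, col), val in self.data.items())


# Array view: A('#a1', '#b2') = 'same_tweet'
A = Assoc()
A['#a1', '#b2'] = 'same_tweet'

# Triple-store view of the same content, and the round trip back to an array.
stored = A.triples()
B = Assoc(stored)
```

Because the two views carry the same information, an array built from the stored triples (`B` above) is indistinguishable from the original.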

Page 30

Data Schema

• A structure described in a language supported by the database management system

• Accumulo supports triples: (row, column) value
– How can we represent heterogeneous data types in a common data schema?
– Use the D4M schema

• D4M schema converts structured or unstructured raw data to the 3-tuple representation supported by Accumulo:
– row is a unique identifier (often some variation of a time stamp)
– column is a unique representation of the data
– value is typically just '1'

• Usually use a 4 table representation
– The Edge Table, the Transpose Table, the Degree Table, the Raw Table

Page 31

Exploded Table

Raw data (row_num values are used as row indices; a column is created for each unique type/value pair):

row_num  col1      col2      col3
001      row1col1  row1col2  word1 word2 word3
002      row2col1  row2col2  word2 word3
003      …         …         word1 word3

Tedge (- marks an empty cell):

             col1|row1col1  col1|row2col1  col2|row1col2  col2|row2col2  col3|word1  col3|word2  col3|word3
row_num|001  1              -              1              -              1           1           1
row_num|002  -              1              -              1              -           1           1
row_num|003  -              -              -              -              1           -           1

TedgeDeg:

               Degree
col1|row1col1  1
col1|row2col1  1
col2|row1col2  1
col2|row2col2  1
col3|word1     2
col3|word2     2
col3|word3     3

TedgeT:

               row_num|001  row_num|002  row_num|003
col1|row1col1  1            -            -
col1|row2col1  -            1            -
col2|row1col2  1            -            -
col2|row2col2  -            1            -
col3|word1     1            -            1
col3|word2     1            1            -
col3|word3     1            1            1

TedgeTxt:

             text
row_num|001  word1 word2 word3
row_num|002  word2 word3
row_num|003  word1 word3
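
The four tables above can be derived mechanically from the raw records. The following is a rough sketch in plain Python under stated assumptions, not D4M's ingest code: the `explode` function and the dictionary layout are hypothetical, the `col3` field is assumed to be the free-text field that gets split into words, and the elided `col1`/`col2` values of row 003 are simply left out.

```python
from collections import Counter

# Raw records from the slide; row 003's col1/col2 are elided there, so omitted here.
records = [
    {"row_num": "001", "col1": "row1col1", "col2": "row1col2", "col3": "word1 word2 word3"},
    {"row_num": "002", "col1": "row2col1", "col2": "row2col2", "col3": "word2 word3"},
    {"row_num": "003", "col3": "word1 word3"},
]


def explode(records, id_field="row_num", text_field="col3"):
    """Build edge, transpose, degree, and raw-text tables as plain dicts."""
    tedge, tedge_txt = {}, {}
    for rec in records:
        row = f"{id_field}|{rec[id_field]}"
        for field, value in rec.items():
            if field == id_field:
                continue
            # The text field is split into words; other fields stay whole.
            parts = value.split() if field == text_field else [value]
            for part in parts:
                # One column per unique field|value pair, with value 1.
                tedge[(row, f"{field}|{part}")] = 1
        if text_field in rec:
            tedge_txt[(row, "text")] = rec[text_field]
    # Transpose table: the same entries with row and column swapped.
    tedge_t = {(col, row): val for (row, col), val in tedge.items()}
    # Degree table: number of rows in which each column key appears.
    tedge_deg = dict(Counter(col for (_, col) in tedge))
    return tedge, tedge_t, tedge_deg, tedge_txt


tedge, tedge_t, tedge_deg, tedge_txt = explode(records)
```

Keeping the transpose table lets a lookup by column key be served as a row lookup, and the degree table gives the number of matches for a key before any query is run.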

Page 32

Composable Associative Arrays

• Key innovation: mathematical closure
– All associative array operations return associative arrays

• Enables composable mathematical operations

A + B    A - B    A & B    A|B    A*B

• Enables composable query operations via array indexing

A('alice bob ',:)    A('alice ',:)    A('al* ',:)

A('alice : bob ',:)    A(1:2,:)    A == 47.0

• Simple to implement in a library (~3500 lines) in programming environments with: 1st class support of 2D arrays, operator overloading, sparse linear algebra

• Complex queries with ~50x less effort than Java/SQL

• Naturally leads to high performance parallel implementation
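
Mathematical closure is what makes these operations chain: every operation takes associative arrays and returns one. The sketch below illustrates the idea in plain Python with dictionaries keyed by (row, column); the function names `add`, `matmul`, and `rows_starting_with` are hypothetical, and the semantics are a simplified reading of `A + B`, `A*B`, and `A('al* ',:)`, not D4M's implementation.

```python
def add(a, b):
    """Element-wise sum over the union of keys, in the spirit of A + B."""
    out = dict(a)
    for key, val in b.items():
        out[key] = out.get(key, 0) + val
    return out


def matmul(a, b):
    """Sparse product: columns of a are matched against rows of b, as in A*B."""
    out = {}
    for (i, k), x in a.items():
        for (k2, j), y in b.items():
            if k == k2:
                out[(i, j)] = out.get((i, j), 0) + x * y
    return out


def rows_starting_with(a, prefix):
    """Row selection by key prefix, in the spirit of A('al* ',:)."""
    return {(r, c): v for (r, c), v in a.items() if r.startswith(prefix)}


A = {("alice", "bob"): 1, ("bob", "cathy"): 1}
B = {("alice", "bob"): 2, ("cathy", "david"): 1}

total = add(A, B)
two_hop = matmul(A, B)
# Every result is again a dict of the same shape, so the calls compose freely.
chained = rows_starting_with(matmul(add(A, B), B), "b")
```

Since each function returns the same kind of object it accepts, a query, a sum, and a product can be nested in any order without conversion steps.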

Page 33

Using D4M for Advanced Analytics

• D4M allows researchers to harness the versatility of the MIT SuperCloud architecture and the speed of Apache Accumulo through the familiarity of high level languages such as MATLAB or GNU Octave

• D4M schema provides an approach to mitigate challenges associated with big data variety

• D4M is used for a variety of applications across the Department of Defense and Intelligence Community

Page 34

Outline

• Introduction

• Cloud Computing and Challenges

• Innovative Architecture: MIT SuperCloud

• Innovative Databases: Apache Accumulo

• Innovative Software: D4M

• R&D Examples

• Conclusions

Page 35

Supporting National Security - Rapid Solution Prototyping -

Sample raw records:

336592592584179712 2013-05-20 21:21:42 20798128 kiefpief web 3b77caf94bfc81fe I am sending love to Oklahoma. And actually -- to everyone who may need it. You are loved. And you are not alone. Promise. #PrayforOklahoma

336600956710027264 2013-05-20 21:54:56 35.99894978 -78.90660222 -8783842.7781526 4300476.86376416 22435220 RyanBLeslie Twitter for iPad348803787 bced47a0c99c71d0 @HaydenBigCntry RT @jiminhofe: The devastation in Oklahoma is …

Step 1: Start an instance of Accumulo and ingest data

Step 2: Find all tweets with keyword:

>> A = Tedge(Row(Tedge(:, 'word|#prayforoklahoma,')),:);

Step 3: Filter tweets by location:

>> B = A(:, 'latlon|+-003934,:,latlon|+-003979,');

Step 4: Visualize results:

>> Assoc2KML(B);
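
The logic of Steps 2 and 3 can be mimicked on a small in-memory edge table. This is a sketch in plain Python, not D4M: the tweet IDs `t1` to `t3`, the extra `latlon|` keys, and the helper names are made up for illustration; only the keyword column and the two range endpoints are taken from the commands above. Visualization (Step 4) is not reproduced.

```python
# Tiny hypothetical edge table: (row key, column key) -> 1.
tedge = {
    ("t1", "word|#prayforoklahoma"): 1,
    ("t1", "latlon|+-003950"): 1,
    ("t2", "word|#prayforoklahoma"): 1,
    ("t2", "latlon|+-009999"): 1,
    ("t3", "word|hello"): 1,
    ("t3", "latlon|+-003960"): 1,
}


def rows_with_column(table, col):
    """Row keys that have an entry in the given column (inner query of Step 2)."""
    return {r for (r, c) in table if c == col}


def select_rows(table, row_keys):
    """All entries of the selected rows (outer query of Step 2)."""
    return {(r, c): v for (r, c), v in table.items() if r in row_keys}


def select_column_range(table, start, stop):
    """Entries whose column key sorts between start and stop, inclusive (Step 3)."""
    return {(r, c): v for (r, c), v in table.items() if start <= c <= stop}


# Step 2: every column of every tweet that contains the keyword.
A = select_rows(tedge, rows_with_column(tedge, "word|#prayforoklahoma"))

# Step 3: keep only location columns that fall inside the key range.
B = select_column_range(A, "latlon|+-003934", "latlon|+-003979")
```

The range filter works because the keys are compared as strings, so a location encoding whose lexicographic order follows spatial order turns a bounding-range query into a simple key-range scan.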

Page 36

Promoting Big Data Discovery - Domain Agnostic Analytics -

Big Data Filtering and Sampling: Detecting Subgraphs of Interest from Large Graphs

Model Background Data to Extract Signal from Observations

Goal: Find subgraph of interest using background model to identify noise

Example background model: Power Law Graph

Observed Data (Signal & Noise) - Background Model of Data (Noise) = Residual Data (Signal)

[Figure: signal and noise regions in N-D space, with a filter pass region labeled dmax]

Page 37

Securing the Cloud - The Lincoln Secure and Resilient Cloud -

[Figure: Common Big Data Architecture components (Analytics, Computing, Web, Files, Scheduler, Ingest & Enrichment, Ingest, Databases) annotated with Secure and Resilient Communication + Provenance, Secure and Resilient Storage, and Secure and Resilient Processing]

• Big Data systems are vulnerable to a variety of attacks

• Improve security of cloud systems by researching:
– Security in Communication and Provenance
– Security in Data Storage
– Security in Processing
– Security in the underlying architecture

Page 38

Ensuring Privacy - Computing On Masked Data -

Big Data Veracity

Challenges: Remote Code Injection, Hypervisor Privilege Escalation, Cross VM Side Channels, Data Loss / Exfiltration, Data Integrity Attack

Current Approaches: Encrypted link, Encrypted storage

Vision: Compute on Encrypted Data

[Figure: three versions of the architecture diagram (Challenges, Current Approaches, Vision) showing attack points, encrypted links and storage, and computation on encrypted data]

Step 1: Mask data and ingest into database

>> put(Tedge, Mask(Aedge, maskcode));

Step 2: Query DB for results with masked queries

>> Aedge_mt = Tedge(Row(Tedge(:,StrMask('word|bieber ', maskcode))),:);

>> Atxt_mt = TedgeTxt(Row(Tedge(:,StrMask('word|bieber ', maskcode))),:);

Step 3: Unmask results

>> Aedge = Unmask(Aedge_mt, maskcode);

>> Atxt = Unmask(Atxt_mt, maskcode);

Use D4M and Computing on Masked Data (CMD) to protect the 4th V of Big Data: Veracity

• Big Data systems are vulnerable to a variety of attacks

• Currently encrypt data at rest but data in flight is in the clear

• Compute on Encrypted Data: data is always protected by encryption throughout the system
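
The mask, query, unmask flow of Steps 1 to 3 can be imitated with a deterministic keyed hash. This is only a sketch under loud assumptions: the slide does not say how `Mask`, `StrMask`, or `Unmask` work, so HMAC-SHA256 plus a client-side code book is a swapped-in stand-in chosen here, not the CMD scheme. The `Masker` class, the secret, and the sample edge entries other than `word|bieber` are all hypothetical.

```python
import hashlib
import hmac


class Masker:
    """Deterministic key masking with a client-side code book (illustrative only)."""

    def __init__(self, secret: bytes):
        self.secret = secret
        self.codebook = {}  # masked key -> plain key; never leaves the trusted client

    def mask(self, key: str) -> str:
        # The same plain key always yields the same masked key, so equality
        # lookups still work on the masked table.
        masked = hmac.new(self.secret, key.encode("utf-8"), hashlib.sha256).hexdigest()
        self.codebook[masked] = key
        return masked

    def mask_table(self, table):
        return {(self.mask(r), self.mask(c)): v for (r, c), v in table.items()}

    def unmask_table(self, table):
        return {(self.codebook[r], self.codebook[c]): v for (r, c), v in table.items()}


# Hypothetical plain edge table held by the client.
aedge = {
    ("tweet|1", "word|bieber"): 1,
    ("tweet|1", "word|concert"): 1,
    ("tweet|2", "word|weather"): 1,
}

masker = Masker(b"demo-secret")

# Step 1: mask and ingest; the database only ever sees masked keys.
tedge = masker.mask_table(aedge)

# Step 2: query with a masked keyword, then fetch the matching rows.
query = masker.mask("word|bieber")
hit_rows = {r for (r, c) in tedge if c == query}
aedge_mt = {(r, c): v for (r, c), v in tedge.items() if r in hit_rows}

# Step 3: unmask the result on the client.
result = masker.unmask_table(aedge_mt)
```

A deterministic mask preserves equality matching but not ordering, so this stand-in would not support the range queries shown earlier; a real scheme would need to address that.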

Page 39

Outline

• Introduction

• Cloud Computing and Challenges

• Innovative Architecture: MIT SuperCloud

• Innovative Databases: Apache Accumulo

• Innovative Software: D4M

• R&D Examples

• Conclusions

Page 40

Summary

Mission Areas: Air and Missile Defense, Homeland Protection, Air Traffic Control, Communication Systems, Advanced Technology, Space Control, ISR Systems and Technology, Tactical Systems, Cyber Security, Engineering

Data Sources: Maritime, Ground, Space, C2, Cyber, OSINT, Air, HUMINT, Weather

• Lincoln Laboratory missions collect and process vast amounts of data from many sources

• MIT Lincoln Laboratory makes use of innovations in system architecture (MIT SuperCloud), database technologies (Apache Accumulo) and software (D4M) to develop technology in support of national security

Lincoln Laboratory is always interested in technical exchange with the big data community!

Page 41

Backup

Page 42

Cyber Security and Information Sciences

Cyber Situational Awareness: Correlation and visualization of cyber alert data make it possible to detect and understand attacks on large enterprise networks.

Cyber Testing and Range Development: Lincoln Laboratory builds, supports, and uses cyber ranges to evaluate the performance of cyber security technology.

Cyber Security Metrics: Metrics are defined and measured to estimate the defensive posture of enterprise-class networks.

Anti-Tamper Hardware: Physically unclonable functions are used to embed cryptographic key material in a coating around a computing module, permitting detection of tampering.

Net-Centric Operations: Research and prototyping of Service-Oriented Architectures that enable the dynamic composition of systems involving complex sensors, processing and decision-support elements.

Human Language Technology: Algorithms are developed and implemented for speech and biometric applications, including language/speaker identification, machine translation, and face comparison.