Data Tactics Unified DataSpace Architecture and Description

WWW.DATA–TACTICS.COM ARCHITECT – ENGINEER – INTEGRATE – SOLUTIONS © 2012 Data Tactics
Data Tactics
Unified DataSpace

Cloud

SYSTEMS ENGINEERING
• Data Ingestion Frameworks (structured, unstructured, semi-structured)
• Semantic DataSpace Enrichment
• Cloud Management Systems (CMS)
• Cloudbase/Accumulo
– Pig (Big Data) Plug-in
• Dissemination and Reporting Tools
• Data Mining, Exploitation, and Correlation Tools
Systems Engineering & Integration
SYSTEM INTEGRATION
• Ingestion
– Generalized Ingest / NiagaraFiles
• Geospatial Capabilities
• Biometric Capabilities

Cloud Experience
17 Enclaves at SECRET//NOFORN
• 3 in Tyson’s
• 1 at GISA, Ft. Bragg
• 2 in Hawaii
• 2 in Germany
• 7 at Aberdeen
• 2 in Afghanistan
6 Enclaves at TS//SCI
• AF TENCAP
• NRL
• DARPA
• INSCOM
• DCGS-A
• DHS OI&A
4 Enclaves for NATO ISAF
• 2 in Afghanistan
• 1 at GISA, Fort Bragg
• 1 in Germany
US BICES Cloud in Germany
Over a dozen at UNCLASS//FOUO
• Supporting real-world missions on contract
• At various levels of complexity
Cloud domains are where we live
Data is the hard problem

Data – The Hard Part

Data Tactics has delivered solutions that manage PETABYTES of data and provide mission relevant analytics, metrics and user interfaces
• DESIGN, DEVELOPMENT AND INTEGRATION OF REFERENCE ARCHITECTURES
– Ghost Machine
– Stratus
• SECURE DATABASE ARCHITECTURES
– Secure Entity Database (SED)
– Defense Cross-Domain Analytic Capability (DCAC)
• DATA MIGRATION, EXTRACTION, TRANSFORM AND PARSING
• FEDERATED DATA MANAGEMENT
– Federated Search, Multi-Source / Multi-Vendor Integration
– Storage Cluster Management
• DATA MINING AND FORENSIC ANALYSIS
• SPATIAL, MULTI-DOMAIN, AND CLOUD DATA SERVICES
BigData Architecture

Data Models
Unified DataSpace
The Wild
• Data sources with rich data & semantic context locked in domain silos
• Data tightly coupled to data-models
• Data-models tightly coupled to storage models
Silos isolated by
• Implementation technology
• Storage structure
• Data representation
• Data modality
Segment 1 - Artifact Description
Segment 2 - Data Description
Segment 3 - Model Description
Unstructured Data
Rich semantic context
Rich data context
Integration, Enrichment, Exploitation, Exploration across all sources
Structured Data


Unified DataSpace
High-Level Conceptual Model of the DataSpace and Ingest/Extraction Flows
[Figure: conceptual model of the DataSpace. Segments shown: Segment 0 - Artifacts; Segment 1 - Artifact Semantics (ARTIFACT); Segment 2 - Data Semantics (TERM, STATEMENT); Segment 3 - Model Semantics (CONCEPT, PREDICATE). Association entities: ARTIFACT_ASSOCIATION, CONCEPT_ASSOCIATION, PREDICATE_ASSOCIATION. Other labels: SOURCE; Ingest and Extraction flows; "Uses" relationships; Metadata, Data + Metadata, Semantics + Metadata.]
• Segment 0 is an artifact store (i.e., the binary representation of artifacts).
• Segment 1 represents artifact semantics and includes artifact metadata and associations between the artifacts. Indexing of Segment 1 supports search on text content, geospatial data, and artifact metadata.
• Segment 2 represents the data and semantics of structured data elements extracted from artifacts. Indexing of Segment 2 supports search for entities (e.g., Person, Location) based on their properties and relationships.
• Segment 3 represents data-models extracted from artifacts, together with the models used for aligning, disambiguating, and enriching the elements of Segments 1 and 2.
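
The four segments can be pictured as layers over the same artifact. The following is only a minimal in-memory sketch in Python, not the UDS implementation: the class name DataSpace, the field names segment0 to segment3, and the methods ingest, extract, find_artifacts, and find_subjects are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class DataSpace:
    """Toy stand-in for the four segments (all names are hypothetical)."""
    segment0: dict = field(default_factory=dict)  # artifact id -> raw bytes
    segment1: dict = field(default_factory=dict)  # artifact id -> artifact metadata
    segment2: list = field(default_factory=list)  # (artifact id, subject, predicate, object)
    segment3: dict = field(default_factory=dict)  # model element name -> kind

    def ingest(self, content: bytes, metadata: dict) -> str:
        """Store the binary artifact (Segment 0) and its metadata (Segment 1)."""
        artifact_id = hashlib.sha256(content).hexdigest()[:12]
        self.segment0[artifact_id] = content
        self.segment1[artifact_id] = dict(metadata)
        return artifact_id

    def extract(self, artifact_id: str, subject: str, predicate: str, obj) -> None:
        """Record a structured element (Segment 2) and its predicate (Segment 3)."""
        self.segment3.setdefault(predicate, "predicate")
        self.segment2.append((artifact_id, subject, predicate, obj))

    def find_artifacts(self, **criteria) -> list:
        """Segment 1 search: artifacts whose metadata matches every criterion."""
        return [aid for aid, meta in self.segment1.items()
                if all(meta.get(k) == v for k, v in criteria.items())]

    def find_subjects(self, predicate: str, obj) -> list:
        """Segment 2 search: subjects that have the given property value."""
        return [s for _, s, p, o in self.segment2 if p == predicate and o == obj]


space = DataSpace()
aid = space.ingest(b"Washington visited the city.", {"source": "report", "lang": "en"})
space.extract(aid, "Washington", "isPerson", True)
```

Here find_artifacts plays the role of a Segment 1 metadata search and find_subjects the role of a Segment 2 entity search; a real deployment would back these with proper indexes rather than linear scans.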

Data Description Framework
• DDF looks at data in the following ways
– Mention: A chunk of data, either physically located within a tangible artifact, or contained within an analyst’s mind
• E.g., “Washington” at offset x in file Y
– Sign: A representation of all disambiguated mentions that are identical except for their indexicality
• E.g., “Washington”
– Concept: An abstract idea, defined explicitly or implicitly by a source data-model
• E.g., City, Person, Name, Address, Photo
– Predicate: An abstract idea used to express a relationship between “things”
• E.g., isCity, isPerson, hasName, hasAddress, hasPhoto
– Term: A disambiguated sign abstracted from the source artifact or asserting analyst
• E.g., Washington Person; Washington Location
– Statement: Encodes a binary relationship between a subject (term) and an object mediated by a predicate
• E.g., [Washington, Person] hasPhoto [GeorgeWashingtonImage.jpg]
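
The DDF vocabulary maps naturally onto a few small record types. The sketch below, in Python, is one possible reading of the definitions and not the framework's actual schema: the class names mirror the DDF terms, while the field names (text, artifact, offset, sign, concept, subject, predicate, obj) and the helper sign_of are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A chunk of data at a specific place in a specific artifact."""
    text: str
    artifact: str
    offset: int


@dataclass(frozen=True)
class Term:
    """A sign disambiguated by a concept, e.g. ("Washington", "Person")."""
    sign: str
    concept: str


@dataclass(frozen=True)
class Statement:
    """A binary relationship: subject term, predicate, object."""
    subject: Term
    predicate: str
    obj: object


def sign_of(mention: Mention) -> str:
    """Drop the indexical part (artifact, offset) and keep the shared sign."""
    return mention.text


# Two mentions at different locations share one sign.
m1 = Mention("Washington", artifact="fileY", offset=10)
m2 = Mention("Washington", artifact="fileZ", offset=250)

# The same sign yields different terms once a concept disambiguates it.
person = Term(sign_of(m1), "Person")
location = Term(sign_of(m2), "Location")

# A statement about the Person term, following the hasPhoto example.
photo = Statement(person, "hasPhoto", "GeorgeWashingtonImage.jpg")
```

Frozen dataclasses are used so that terms and statements compare by value and can be placed in sets or used as dictionary keys.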

Unified DataSpace

Data Model Example

DataSpace Workbench

Elastic Data Ingest
• Processing components: QueueLoader, Artifact Processor, Persistence Manager, Index Manager, Error Manager
• Java Messaging Service queues: Artifact Processor Queue, Persistence Manager Queue, Index Manager Queue, Error Manager Queue
• Modules: Artifact Processor Modules, Persistence Manager Modules
• Storage and indexing: File System, Hadoop DFS, BigTable, Lucene
• Component types: UDS Components, Custom Components
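
A queue-per-stage ingest flow of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the UDS code: the standard-library queue.Queue stands in for the Java Messaging Service queues, plain dictionaries stand in for Hadoop DFS / BigTable and Lucene, and the stage order (processor, then persistence, then indexing, with failures routed to the error queue) is assumed, since the slide names the components but not their wiring. All function and variable names are hypothetical.

```python
from queue import Queue

# One queue per stage, standing in for the JMS queues (assumption).
artifact_q, persistence_q, index_q, error_q = Queue(), Queue(), Queue(), Queue()

store = {}    # stands in for Hadoop DFS / BigTable
index = {}    # stands in for Lucene: word -> set of artifact names
errors = []   # what the error manager has collected


def queue_loader(files):
    """Read (name, bytes) pairs, as if from a file system, onto the first queue."""
    for name, content in files:
        artifact_q.put((name, content))


def artifact_processor():
    """Decode each artifact; route failures to the error queue."""
    while not artifact_q.empty():
        name, content = artifact_q.get()
        try:
            persistence_q.put((name, content.decode("utf-8")))
        except UnicodeDecodeError as exc:
            error_q.put((name, str(exc)))


def persistence_manager():
    """Persist processed artifacts, then hand them on for indexing."""
    while not persistence_q.empty():
        name, text = persistence_q.get()
        store[name] = text
        index_q.put((name, text))


def index_manager():
    """Build a tiny inverted index over the persisted text."""
    while not index_q.empty():
        name, text = index_q.get()
        for word in text.lower().split():
            index.setdefault(word, set()).add(name)


def error_manager():
    """Drain the error queue into a list for later inspection."""
    while not error_q.empty():
        errors.append(error_q.get())


queue_loader([("a.txt", b"Washington report"), ("b.bin", b"\xff\xfe")])
for stage in (artifact_processor, persistence_manager, index_manager, error_manager):
    stage()
```

Each stage is run once, in order, to keep the example deterministic. Because the stages communicate only through queues, each could instead run as its own pool of workers and be scaled independently, which is presumably what "elastic" refers to.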