18th Airborne Big Data Analytics Tech Brief, June 2, 2015
05/01/2023 ****** Warfighter-Support, LLC Confidential******* 1
XVIIIth Airborne Corps -Enterprise Data Management
John Welby, CEO & Chief Strategist/Warfighter-Support, [email protected]
Mobile: +1 919/247.7891
Agenda
• The “New Data World”
• Current Systems (Discuss)
• Define Types of Storage
• Big Data Analytics (BDA) Primer
• BDA for Strategic, Operational & Tactical Intelligence
• Components of BDA/Enterprise Data Management
Army Challenges
• Policy
• Laws
• Culture
• Access to Resources from Secure Mobile Devices
Project Goals & Background
• Design, test and implement a common user experience across echelons, formations and phases [integrate w/ SOCOM’s TACLAN]
• Solutions for supporting smaller combat teams
• Extend services to the tactical edge [integrate w/ Digital Edge program]
• Deploy Small Teams Anywhere in the World in Austere Environments
• Self-Defending Networks
• Everything into the Cloud
Project Optimization w/ MapR
• Most Reliable Hadoop Solution
• Unique Globalization Architecture
• Scales from very large data center deployments [CENTCOM] to smaller deployments [FOB] to very small ones [Forward Deployed Personnel]
• Information is available to harness, store, analyze and use to increase mission performance
• “The Perfect Big Data Platform”
• Hadoop / NoSQL / SQL-on-Hadoop
Project Goals & Background
• Network-Enable:
  • 24/7 Situational Awareness
  • Reachback
  • “Project, People, & Technology”
  • Ramp Up to Support the Warfighter
• Codify Home Station Missions
• Moving Mobility Down to the Field (e.g., A/D running in vehicles)
• Level of Acceptable Risk Assessments
• Always ON Global Infrastructure
• Theater Intelligence Command (6) Combatant Commander Intel feeds to Home
• Military Utility or Internet of Things (IoT) – sensors on everything (vehicles/facilities/soldiers)
System Requirements (per XVIIIth)
Current Tactical Field Communications Kit Upgrade Requirements:
• More powerful / additional capabilities
• Lighter (current system approx. 500 lbs)
• Support up to 20 paratroopers
• Satellite communications
• LMR voice
• Active Directory
• Storage
• Self-contained power
Types of Storage
• Definition of Terms
• Benefits to XVIIIth Airborne
• Questions to Ask
“Hot” / “Warm” / “Cold” Storage
• Hot storage is used for frequently accessed data that must be retrieved very quickly, for example flash array storage.
• Warm storage offers moderate IOPS and moderate bandwidth (BW), for example hard disk drives.
• Cold storage is used for infrequently accessed data, for example magnetic tape.
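The three tiers can be thought of as a placement policy keyed on how often data is read. The sketch below is a minimal illustration under stated assumptions: the thresholds (`HOT_MIN_ACCESSES_PER_DAY`, `WARM_MIN_ACCESSES_PER_DAY`) and the sample data sets are hypothetical, and a real tiering policy would also weigh latency targets, cost and retention rules.

```python
from dataclasses import dataclass

# Hypothetical thresholds (reads per day); not taken from the brief.
HOT_MIN_ACCESSES_PER_DAY = 100
WARM_MIN_ACCESSES_PER_DAY = 1


@dataclass
class DataSet:
    name: str
    accesses_per_day: float


def assign_tier(ds: DataSet) -> str:
    """Map a data set to hot, warm or cold storage by access frequency."""
    if ds.accesses_per_day >= HOT_MIN_ACCESSES_PER_DAY:
        return "hot"   # e.g. flash array
    if ds.accesses_per_day >= WARM_MIN_ACCESSES_PER_DAY:
        return "warm"  # e.g. hard disk drives
    return "cold"      # e.g. magnetic tape


if __name__ == "__main__":
    # Hypothetical examples
    for ds in [DataSet("live UAV feed index", 5000),
               DataSet("last month's SitReps", 3),
               DataSet("archived drop-zone video", 0.01)]:
        print(ds.name, "->", assign_tier(ds))
```

In practice the same rule would run periodically so that data migrates down the tiers as it ages.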
What is Big Data?
Big Data Is…
• Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set.
Source: Wikipedia
Big Data Analytics
• The Army, like any other large organization, generates terabytes and petabytes of data daily.
• U.S. intelligence agencies and the military are increasingly leveraging analytics platforms based on machine learning to sift through data sources such as social media. In Pentagon vernacular, these efforts are generally referred to as open source intelligence initiatives.
• The U.S. intelligence community is spending billions of dollars on geospatial intelligence.
Machine Data (aka log data)
Intelligence Data
• Full-Motion Drop-Zone Video
• Video Analytics
• Logs
• Image Processing
• Geo-Spatial Processing
• Graph Analytics
• Text Processing
• Sentiment Analysis
“Maintenance” Data
• Hardware & Software Inventory
• Software Version
• Patch Updates
• End-of-Life Information
• Supply Levels
• Vehicle Maintenance Records
• Compliance Information
Gleaning Strategic, Operational & Tactical Intelligence from Machine Data
• One of the biggest challenges for intelligence analysts is the soaring volume of unstructured open source data as adversaries turn to Facebook and Twitter to communicate and recruit.
• Employ Splunk and MapR on Cisco UCS to store, analyze, package and disseminate timely, actionable intel to Commanders.
Benefits of Next Generation Army Network
• Turn data into actionable information – unified, timely information and predictions
• Ensure the 18th’s creed, “America’s Contingency, Anywhere in 18 hours,” is fulfilled with maximum impact, consistency, transparency, reliability and effectiveness
• Focus resources intelligently by putting them in the right place, on the right day and at the right time
• Rigorously measure results and hold commanders accountable for their performance
• Interface more effectively with Allies and Conventional Forces
Some info on the following slides is taken from CW5 Rick Pina’s Keynote Address at WWT Geek Day (May 2015).
Benefits of Next Generation Army Network
• Create a “Commanders Risk Reduction Dashboard”
  • Consolidate info from multiple Army databases
  • Soldiers can’t move if “at risk1”
  • Company & Battalion Commanders (28 feeds)
• Cyber Network Security
  • Capture every packet and analyze later
• Tactical Operations Center (TOC)
  • Mobile applications reside in TOC
  • Secure delivery of mobile applications
  • Working with DISA on Army App Store (DISA has its own App Store)
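The risk dashboard idea amounts to joining per-soldier records from several databases into one view and applying a blocking rule. The sketch below assumes everything concrete: the three source extracts, their field names and the `is_at_risk` rule are hypothetical placeholders, not actual Army systems or criteria.

```python
from collections import defaultdict

# Hypothetical extracts from three separate databases (illustrative fields only).
medical = {"S001": {"medical_hold": False}, "S002": {"medical_hold": True}}
training = {"S001": {"quals_current": True}, "S002": {"quals_current": True},
            "S003": {"quals_current": False}}
admin = {"S003": {"legal_flag": False}}


def consolidate(*sources: dict) -> dict:
    """Merge per-soldier records from several sources into one combined view."""
    merged = defaultdict(dict)
    for source in sources:
        for soldier_id, fields in source.items():
            merged[soldier_id].update(fields)
    return dict(merged)


def is_at_risk(record: dict) -> bool:
    """Hypothetical rule: any blocking flag marks the soldier 'at risk'.

    Missing fields are treated as non-blocking here; a real dashboard might
    instead flag missing data for follow-up.
    """
    return (record.get("medical_hold", False)
            or not record.get("quals_current", True)
            or record.get("legal_flag", False))


view = consolidate(medical, training, admin)
at_risk = sorted(sid for sid, rec in view.items() if is_at_risk(rec))
print(at_risk)
```

A soldier listed in `at_risk` would be the one whose movement the dashboard blocks.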
Benefits of Next Generation Army Network
• DISA is partnering with the Army and Air Force to change the way DoD secures and protects information networks
  • Firewall/Intrusion Detection/Enterprise Management/VRF/Big Data Analytics
• Partner with and take advantage of DISA’s upgrade of DISN:
  • Global Infrastructure – 100 Gb/s fiber
  • All Installations – 10 Gb/s connections
  • Major Installations – 20 Gb/s connections
• Security Upgrades/Consolidation w/ Joint Regional Security Stacks [JRSS]
  • 25 Top-Level Architecture [TLA] Stacks (future; now approx. 1,000 stacks)
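To put the DISN link rates in perspective, the time to move a data set is its size in bits divided by the usable line rate. The sketch below uses the slide's 10, 20 and 100 Gb/s figures; the 10 TB data set and the `efficiency` factor are hypothetical, and decimal units (1 TB = 8,000 Gb) are assumed.

```python
# Link rates from the slide, in gigabits per second.
LINKS_GBPS = {"global backbone": 100, "major installation": 20, "installation": 10}


def transfer_hours(size_terabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Idealized time in hours to move a data set over one link.

    Uses decimal units (1 TB = 8,000 Gb). `efficiency` < 1 is a hypothetical
    knob for protocol overhead and contention; 1.0 means full line rate.
    """
    gigabits = size_terabytes * 8_000
    seconds = gigabits / (link_gbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    for name, rate in LINKS_GBPS.items():
        print(f"{name}: {transfer_hours(10, rate):.2f} h for 10 TB")
```

At full line rate, 10 TB takes roughly 2.2 hours over a 10 Gb/s installation link and about 13 minutes over the 100 Gb/s backbone; real throughput will be lower.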
Benefits of Next Generation Army Network
• Big Data Support Down-Range [adding capabilities by leveraging existing infrastructure]
  • Commercial Satellites
  • Wideband Satellites
  • Line-of-Sight Microwave
  • Distributed Nodes
  • 4G Wireless
Questions to Ask
• What Army programs are already in play, and can we leverage them?
• Collaborate with partners that have performed in the past [IGOV]
• Where is XVIIIth/Army in the current program lifecycle?
• What are the key mission challenges to solve?
• What if the XVIIIth had access to data it currently cannot access?
• What “other” data will enhance XVIIIth’s mission?
• What existing capabilities do we have now?
• What is the state of my data, the data I want to “predict” from?
Questions to Ask
• What do we get for the cost, and what do we need?
[Spend more $$$ on a pre-packaged solution, or build what you need?]

OPTION | ADVANTAGE | DISADVANTAGE | COMMENT
Pre-Packaged End-to-End Solution | Requires fewer in-house resources & expertise | More $$$ |
A Platform for Building | Less $$$; more flexibility | Requires resources to build & customize | More legwork, but you are not paying for stuff you don’t need
How Do We Harness BDA?
• Turn data into actionable information – unified, timely information and predictions
• Help missions have greater impact, consistency, transparency, reliability and effectiveness
Enterprise Data Management Components
• Cisco UCS
• Hadoop
• MapR
• Splunk
Cisco UCS Reference Architecture
• UCS C220/C240 M4 Servers
• Nexus 2232 Fabric Extenders
• UCS 6200 Series Fabric Interconnect
MapR Technologies, Inc.
• The MapR Distribution combines innovation from the Apache open source community, in which MapR participates, with MapR’s own data platform and management control system.
MapR Improvements
Hadoop Distributed File System (HDFS)
• Many limitations with HDFS:
  • Java Virtual Machine (JVM) issues
  • Single point-of-failure
  • Read-and-append-only file system (no random read/write)
MapR-FS
• Native NFS support – any application that can read/write to an NFS mount can plug into this architecture
• No single point-of-failure (C++)
• Application data is automatically replicated
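The practical difference between an append-only store and a read/write file system exposed over NFS is that ordinary file I/O can overwrite bytes in the middle of an existing file. The sketch below shows that with standard Python file calls only; the commented-out mount path is a hypothetical placeholder, and the demo runs in a local temporary directory so it works anywhere.

```python
import os
import tempfile


def overwrite_in_place(path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes at an arbitrary offset using ordinary file I/O.

    On a read/write file system mounted over NFS this works unchanged; classic
    HDFS files are write-once/append-only, so this operation is not supported there.
    """
    with open(path, "r+b") as f:  # read/write mode, does not truncate the file
        f.seek(offset)
        f.write(data)


# Hypothetical NFS mount point for a cluster; substitute your own:
# MOUNT = "/mapr/example.cluster"
with tempfile.TemporaryDirectory() as mount:
    path = os.path.join(mount, "sitrep.txt")
    with open(path, "wb") as f:
        f.write(b"STATUS: AMBER; GRID 1234")
    overwrite_in_place(path, 8, b"GREEN")  # replace the 5 bytes at offset 8
    with open(path, "rb") as f:
        result = f.read()

print(result)
```

No cluster-specific client library is involved, which is the point of the “any application that can read/write to an NFS mount” bullet.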
MapR Security
• Pluggable security model
• Linux Pluggable Authentication Modules (PAMs)
• Kerberos is an option1
1Not optimal for long-running jobs
MapR – Zeta Application Architecture
• Simplifies:
  • Data Protection Schemes
  • How to Back Up Data
  • Failure Recovery
  • Running Multiple Instances of Software
• Better hardware utilization = lower OpEx
• Google runs on a Zeta Architecture (over 2 billion container deployments/week)
MapR and Hadoop
• MapR works with the open-source community and incorporates its developments into an integrated solution offering
• Hadoop is a scalable, centralized data hub / distribution solution
• Runs the same problem across multiple computers
• Uses both new, more flexible tools and existing tools
Benefits of MapR and Hadoop
• Faster time-to-value
• Smaller hardware footprint (COTS hardware)
• Reliability
• Real snapshots for data versioning, data protection & mirroring (DR)
• Hadoop is a scalable, centralized data hub
• Runs the same work across multiple computers
• Uses both new, more flexible tools and existing tools
MapR
• “Traditional” Data Warehouse accepts:
  • SQL data
• MapR w/ Hadoop accepts:
  • SQL
  • Machine-learned data
  • Video Analytic data
  • Relational Schemas
  • Files
  • Logs
  • Click Streams
  • Geo-Spatial data
  • Sentiment Analysis
  • WASP scanner data
  • KACE inventory data
Google’s Example
This architecture allows scaling up to Google’s level.
MapR and Big Data Analytics in Action
AADHAAR
In the course of attaining the milestone of 600 million users, the Aadhaar technology backend has become the largest biometric identity repository in the world and the first to provide an online, anytime anywhere, multi-factor authentication service. A strong technology foundation based on open architecture enabled the rapid evolution of the Aadhaar system. It was important to document all aspects of Aadhaar technology and make it available in public domain. The three white papers published by the UIDAI Technology Centre fulfill this need.
AADHAAR Strategy
• A partnership model – The UIDAI approach leverages the existing infrastructure of government and private agencies across India. The UIDAI is the regulatory authority managing a Central Identity Data Repository (CIDR), which will issue Aadhaar numbers, update resident information, and authenticate the identity of residents as required. UIDAI partners with agencies such as central and state departments who are the 'Registrars' for the UIDAI. Registrars conduct the enrollment camps using UIDAI software and procedures, upload the encrypted enrolment data to the CIDR to de-duplicate resident information, and help seed the Aadhaar number into their beneficiary databases.
AADHAAR Strategy
• Process to ensure no duplicates – Registrars send the applicant's encrypted data packet to the UIDAI data centers for de-duplication. Aadhaar enrollment system performs a search on key demographic fields and on the biometrics for each new enrolment, to ensure uniqueness.
• Process to keep data up to date – Incentives in the Aadhaar system are aligned towards a self-cleaning mechanism. The existing patchwork of multiple databases in India gives individuals the incentive to provide different personal information to different agencies. Since de-duplication in the Aadhaar system ensures that residents have only one chance to be in the database, individuals are incentivized to provide accurate data. This incentive becomes especially powerful as benefits and entitlements are linked to the Aadhaar number. Regular usage of identity across many services naturally incentivizes the resident to keep Aadhaar system up to date.
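The demographic half of the de-duplication step can be pictured as normalizing key fields into a lookup key and refusing to issue a second number for a key already on file. The sketch below is only that: the field set, the normalization rules and the `Repository` class are hypothetical, and the biometric matching that Aadhaar also performs is deliberately omitted.

```python
import unicodedata


def demographic_key(name: str, dob: str, gender: str, pincode: str) -> tuple:
    """Normalize key demographic fields into a comparable key (hypothetical scheme)."""
    # Unicode-normalize, case-fold and collapse whitespace in the name.
    norm_name = " ".join(unicodedata.normalize("NFKC", name).casefold().split())
    return (norm_name, dob.strip(), gender.strip().upper(), pincode.strip())


class Repository:
    """Toy central repository that issues one ID per unique demographic key."""

    def __init__(self) -> None:
        self._by_key: dict = {}
        self._next_id = 1

    def enroll(self, name: str, dob: str, gender: str, pincode: str) -> tuple:
        """Return (id, is_new); an existing key returns the existing ID."""
        key = demographic_key(name, dob, gender, pincode)
        if key in self._by_key:
            return self._by_key[key], False
        new_id = self._next_id
        self._next_id += 1
        self._by_key[key] = new_id
        return new_id, True


repo = Repository()
r1 = repo.enroll("Asha  Rao", "1990-01-15", "f", "560001")
r2 = repo.enroll("asha rao", "1990-01-15", "F", "560001")  # same person, re-typed
r3 = repo.enroll("Ravi Kumar", "1985-07-02", "M", "110001")
print(r1, r2, r3)
```

Exact-key matching is the simplest possible choice; production systems use fuzzy demographic matching plus biometrics precisely because exact keys miss variant spellings.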
AADHAAR Strategy
• Online authentication – UIDAI offers a strong form of online authentication. When residents who wish to avail themselves of a service require identity/address verification, agencies can compare demographic and biometric information of the resident with the record stored in the central database.
• Technology undergirds the UIDAI system – Technology systems have a major role across the UIDAI infrastructure. Large-scale biometric de-duplication, online authentication, data security, analytics, etc. require well-designed, secure, and scalable systems.
Splunk
Splunk, The Platform For Machine Data
[Diagram] Real-time machine data flows into Splunk from:
• GPS, RFID, hypervisors, web servers, email, messaging, clickstreams, mobile, telephony, IVR, databases
• Sensors, telematics, storage, servers, security devices, desktops, CDRs
Capabilities: ad hoc search; monitor and alert; report and analyze; custom dashboards; developer platform
External lookups: troop/supply/geo-spatial info; network segments/honeypots; datastores
Splunk App for Enterprise Security
• Pre-built searches, alerts, reports, dashboards, threat intel feeds, workflow
• Incident Investigations & Management
• Dashboards and Reports
• Statistical Outliers
• Asset and Identity Aware
Splunk
• Provides a more complete view of the threat landscape
• US Army authorized Splunk with a Certificate of Networthiness (CoN)
• Real-time search and analysis of terabytes of data across the Army’s IT infrastructure
• Patented Time-Series Indexing Technology (borrowed from MapReduce1)
• The Army and approximately 70% of all US federal agencies rely on Splunk for real-time visibility of their IT data for security, compliance and application availability
• Splunk App for FISMA
  • Used by the DOJ and NASA
1Please reference Appendix for MapReduce information.
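The general idea behind time-series indexing is that events are grouped into time buckets, so a search for a time range only has to open the buckets that overlap it. The toy index below illustrates that idea only; it is not Splunk’s patented implementation, and `BUCKET_SECONDS`, the event strings and the timestamps are hypothetical.

```python
import bisect
from collections import defaultdict

BUCKET_SECONDS = 3600  # hypothetical bucket span: one hour


class TimeSeriesIndex:
    """Toy index: events grouped into fixed time buckets, each kept sorted by time."""

    def __init__(self) -> None:
        self._buckets = defaultdict(list)  # bucket start -> sorted [(ts, event)]

    def add(self, ts: int, event: str) -> None:
        start = ts - ts % BUCKET_SECONDS
        bisect.insort(self._buckets[start], (ts, event))

    def search(self, earliest: int, latest: int) -> list:
        """Return events with earliest <= ts < latest, touching only overlapping buckets."""
        results = []
        first = earliest - earliest % BUCKET_SECONDS
        for start in range(first, latest, BUCKET_SECONDS):
            bucket = self._buckets.get(start, [])
            # (t,) sorts before any (t, event), so these give the [earliest, latest) slice.
            lo = bisect.bisect_left(bucket, (earliest,))
            hi = bisect.bisect_left(bucket, (latest,))
            results.extend(event for _, event in bucket[lo:hi])
        return results


idx = TimeSeriesIndex()
idx.add(100, "login")
idx.add(3700, "firewall deny")
idx.add(7300, "login fail")
print(idx.search(0, 3600))
print(idx.search(0, 10_000))
```

Because buckets outside the requested window are never opened, search cost grows with the time range asked for rather than with the total volume indexed.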
Splunk on SIPR
• Army Enterprise Certificate of Networthiness
  • Networthiness certification applies to all organizations fielding, using, or managing ISs on the Army Enterprise Architecture/LandWarNet (LWN), including Commercial Off-the-Shelf (COTS) and Government Off-the-Shelf (GOTS) products.
  • In accordance with AR 25-1, paragraph 6-8, activities must obtain a Certificate of Networthiness (CoN) before they connect hardware/software to the LWN.
  • Because Splunk already holds an Army CoN, it does not need to go through JITC.
• Over 50 U.S. Army customers using Splunk
Splunk FISMA App1
1Based on NIST SP 800-53 Rev. 3
Licensing Splunk
• Based on how much [new] data Splunk indexes in a 24-hour period
• Data counts against the license once, when it is ingested; indexing, manipulation and modeling of that data afterward is unlimited. There is only one charge per unit of data.
• Overages do not “turn off” your system.
• License enforcement (30-day period):
  • 1st overage: message to admin
  • 2nd overage: message to admin
  • 3rd overage: message to admin
  • 4th overage: message to admin
  • 5th overage: correlator is turned off
• Contact the Splunk Account Manager or Systems Engineer to get a “reset key.” There is no charge; the intent is to spur a conversation between 18th Airborne and Splunk regarding capacity planning.
• Theoretically, the license can be exceeded up to 48 times per year (4 overages in each of twelve 30-day periods) without contacting Splunk.
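The enforcement rule as described on this slide can be modeled as a rolling 30-day window over overage days: four overages inside any window are tolerated, a fifth triggers enforcement. The sketch below implements only that description; actual Splunk license behavior depends on product version and license type, and the schedules in the example are hypothetical.

```python
WINDOW_DAYS = 30
MAX_OVERAGES_IN_WINDOW = 4  # per the slide, the 5th overage in a 30-day window triggers enforcement


def in_violation(overage_days: list[int]) -> bool:
    """True if any rolling 30-day window contains more than 4 overage days."""
    days = sorted(set(overage_days))
    left = 0
    for right, day in enumerate(days):
        # Slide the window start forward until it spans at most 30 days.
        while day - days[left] >= WINDOW_DAYS:
            left += 1
        if right - left + 1 > MAX_OVERAGES_IN_WINDOW:
            return True
    return False


# Four overages at the start of each of twelve 30-day periods: 4 x 12 = 48.
schedule = [period * WINDOW_DAYS + d for period in range(12) for d in range(4)]
print(len(schedule), in_violation(schedule))   # 48 overages, no window holds 5
print(in_violation(schedule + [4]))            # a 5th overage in the first window
```

This is where the slide’s “48x/year” figure comes from: spacing the overages so no 30-day window ever holds a fifth one.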
Leidos
• Information Dominance / Command and Control
• Consistently ranked among the top federal systems integration contractors, Leidos is the company that "pulls it all together" for U.S. forces and allies. As the lead integrator for the Global Command and Control System (GCCS), for example, we help give warfighters an integrated picture of the battlespace and commanders greater capability to deploy a U.S. fighting force around the globe at any time and provide it with the information and direction to complete its mission.
• As the military's key command, control, computing, communications, and intelligence (C4I) system, the GCCS uses the Defense Information Infrastructure Common Operating Environment (DII COE) to support joint warfighting needs. Helping to ensure that C4I maintains its pace with technology, Leidos leads several significant projects to bring leading edge DII COE-compliant technologies to the GCCS community. These include Defense Advanced Research Projects Agency (DARPA) efforts supporting senior levels of command, such as the National Command Authority and Joint Staff, down through Joint Task Force Commanders and service components, such as the Marine Corps' Chemical/Biological Warfare Incident Response Force.
Wrapping it Up…
The Solution
• Cisco UCS is the hardware platform
• MapR – “Hadoop in a box” w/ Zeta architecture
• Hadoop [can be used to] provide the file system and programming platform
• Splunk is the “search engine on steroids.”
• Splunk and MapR “make Hadoop easier.”
• Overcomes the Big Data skills gap
Use Cases
Combat & Command
• Timely & Applicable Intel for cutting orders
• Policies
• Blend troop movement w/ SitReps, historical intel, Sat images, UAV data and provide to commanders on a single pane of glass
Company
• True Numbers
• Data Validity
• Add Info
Wrapping it up…
• Big Data Analytics will enable the 18th Airborne Corps to gather, store and analyze machine data efficiently and effectively to increase mission success and potentially save lives.
• Machine data such as UAV images, HUMINT and social media data is stored in a Cloudera Enterprise Hub, extracted into Splunk [or Hadoop], transformed in Splunk [or Hadoop], searched [by Splunk] and presented in a usable form that becomes actionable intelligence.
Thank You!
Additional Technologies
• LISP – Locator/ID Separation Protocol
• “MAC in MAC” routing
• Natively supports Equal Cost Multi-Path routing (Dijkstra’s SPF algorithm)
• Alternative to running Spanning Tree
• IS-IS for Layer 2 switching – computes the shortest-path tree (SPT)
• Splits Locator info from Identifier
• Locator Endpoint ID Overlay
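The SPF and ECMP bullets rest on link-state shortest-path computation: each switch runs Dijkstra’s algorithm over the topology and, where several paths tie on cost, keeps every tied first hop so traffic can be spread across them. The following is a textbook sketch assuming positive link costs; it is not how any particular IS-IS implementation is coded, and the four-node square topology is hypothetical.

```python
import heapq


def spf_with_ecmp(graph: dict, source: str):
    """Dijkstra SPF that keeps every equal-cost first hop toward each destination.

    graph: {node: {neighbor: link_cost}} with positive costs.
    Returns (dist, next_hops); next_hops[dest] is the set of the source's
    neighbors that lie on some shortest path to dest.
    """
    dist = {source: 0}
    next_hops = {source: set()}
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            # Leaving the source, the first hop is the neighbor itself;
            # otherwise inherit the first hops already found for `node`.
            hops = {nbr} if node == source else next_hops[node]
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                next_hops[nbr] = set(hops)
                heapq.heappush(heap, (nd, nbr))
            elif nd == dist[nbr]:
                next_hops[nbr] |= hops  # equal-cost path: add its first hops
    return dist, next_hops


# Hypothetical square topology: A reaches D via B or via C at equal cost.
graph = {"A": {"B": 1, "C": 1}, "B": {"A": 1, "D": 1},
         "C": {"A": 1, "D": 1}, "D": {"B": 1, "C": 1}}
dist, next_hops = spf_with_ecmp(graph, "A")
print(dist["D"], sorted(next_hops["D"]))
```

Spanning Tree would block one of the two A-to-D paths; with ECMP both B and C stay usable as next hops.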
Locator/ID Split and LISP
• Addresses today combine location and identity semantics in a single 32-bit (IPv4) or 128-bit (IPv6) number
• Separating location and identity changes this:
  • Provide a clear separation at the Network Layer between what we are looking for vs. how best to get there
  • Translation vs. tunneling is a key question
• Network Layer Identifier: WHO you are in the network
  • Long-term binding to the thing that it names; does not change often at all
• Network Layer Locator: WHERE you are in the network
  • Think of the source and destination “addresses” used in routing and forwarding
• WHERE you are can change; WHO you are should stay the same
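The locator/identifier split can be pictured as a map from identifier prefixes (EIDs) to locators (RLOCs): when a host moves, only its locator entry changes while its identifier stays the same. The toy map cache below sketches just that lookup with longest-prefix matching from Python’s standard `ipaddress` module; the real LISP control plane (Map-Request/Map-Reply, map servers, encapsulation) is not modeled, and all addresses are hypothetical documentation ranges.

```python
import ipaddress


class MapCache:
    """Toy LISP-style map cache: EID prefix (WHO) -> RLOC locator (WHERE)."""

    def __init__(self) -> None:
        self._entries: dict = {}  # ip_network -> ip_address

    def register(self, eid_prefix: str, rloc: str) -> None:
        """Add or update the locator for an identifier prefix."""
        self._entries[ipaddress.ip_network(eid_prefix)] = ipaddress.ip_address(rloc)

    def lookup(self, eid: str):
        """Longest-prefix match on the identifier; returns the locator or None."""
        addr = ipaddress.ip_address(eid)
        matches = [net for net in self._entries if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self._entries[best]


cache = MapCache()
cache.register("10.1.0.0/16", "192.0.2.1")
cache.register("10.1.5.0/24", "198.51.100.7")
before = cache.lookup("10.1.5.20")              # matched by the more specific /24
cache.register("10.1.5.0/24", "203.0.113.9")    # hosts move: new locator, same EIDs
after = cache.lookup("10.1.5.20")
print(before, after, cache.lookup("10.1.9.9"), cache.lookup("172.16.0.1"))
```

The host at 10.1.5.20 keeps its identifier throughout; only the locator that traffic is tunneled toward changes.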
Appendix
Training Requirements
Training Recommendations - Splunk
Function | Course Title | Education/Experience | Delivery | Duration (hours)
Administrator | For Splunk Administrators | Some college preferred; some network administration experience preferred | eLearning/WBT/Instructor-led | 51.5
Architect | For Splunk Architects | Associates Degree; network/programming experience | eLearning/WBT/Instructor-led | 78.5
Info Sec | For Enterprise Security Customers | Associates Degree; network/security/programming experience | eLearning/WBT/Instructor-led | 83.5
Training Recommendations - MapR
Function | Course Title | Education/Experience | Delivery | Duration (hours)
Administrator | | Some college preferred; some network administration experience preferred | eLearning/WBT/Instructor-led |
Architect | | Associates Degree; network/programming experience | eLearning/WBT/Instructor-led |
Info Sec | | Associates Degree; network/security/programming experience | eLearning/WBT/Instructor-led |
Digital Forensic Tools
Digital Forensic Tools
• Dshell – Army internal analysis framework using Python running on Linux
  • Purpose – help analysts investigate compromises within their environments
• Cisco’s OpenSOC Security Analytics Framework
  • Designed to consume and monitor massive amounts of network traffic and machine “exhaust” data of a Data Center
  • Network analysis plug-in available to analyze network traffic at multiple layers of the OSI stack
Digital Forensic Tools
• AccessData’s Forensic Toolkit (FTK)
  • Support for Microsoft’s Volume Shadow Copy (VSC)
  • Retrieve metadata for deleted files
  • Chronology of how documents, user activity and programs changed over time
  • Geomapping – data visualization feature
Use Case:
• Retrieving information after a disk has been wiped clean by an anti-forensics tool
• After cleaning, the hard drive showed no evidence of the proprietary data
• Examining VSCs allowed recovery of destroyed Registry files that proved the proprietary data had been accessed
Hadoop’s MapReduce Technology
• Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
• A MapReduce job usually splits the input data set into independent chunks, which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file system. The framework takes care of scheduling tasks, monitoring them and re-executing failed tasks.
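The map, sort/shuffle and reduce flow described above can be traced with the classic word-count example. The sketch below is a single-process Python simulation of the data flow, not the Hadoop Java API; in a real job each split would be mapped by a separate task on the node holding that data, and the input splits here are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk: str) -> list:
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_sort(mapped) -> list:
    """Shuffle/sort: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key: str, values: list) -> tuple:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def run_job(splits: list) -> dict:
    # Each split is mapped independently; this is the step a cluster parallelizes.
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    return dict(reduce_phase(k, vs) for k, vs in shuffle_sort(mapped))


splits = ["convoy departs FOB", "convoy arrives", "FOB secure"]
result = run_job(splits)
print(result)
```

Because `map_phase` never looks at other splits and `reduce_phase` only sees one key’s values, both stages can be distributed; only the shuffle needs to move data between nodes.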
Hadoop
• A Distributed, Fault-Tolerant Framework for Storing and Analyzing Data
Composed of:
1. Hadoop Distributed File System (HDFS)
2. MapReduce Application Engine / Programming Framework
• Allows code to be written that Hadoop can process in a massively parallel way.
• Very broadly distributed, very efficient programming and storage of LARGE datasets.
• Hadoop does the heavy lifting and batch processing of the MASSIVE amounts of data.
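HDFS gets its distribution and fault tolerance by cutting each file into fixed-size blocks and storing several copies of every block on different nodes. The arithmetic below uses the stock Hadoop 2.x/3.x defaults (`dfs.blocksize` of 128 MB and `dfs.replication` of 3); a given cluster may be configured differently, and the 1,000 MB file is a hypothetical example.

```python
import math

BLOCK_SIZE_MB = 128  # default dfs.blocksize in Hadoop 2.x/3.x
REPLICATION = 3      # default dfs.replication


def hdfs_footprint(file_size_mb: float, block_mb: int = BLOCK_SIZE_MB,
                   replication: int = REPLICATION) -> tuple:
    """Return (blocks, block_replicas, raw_storage_mb) for one file.

    The last block is not padded, so raw storage is file size x replication.
    """
    blocks = math.ceil(file_size_mb / block_mb)
    return blocks, blocks * replication, file_size_mb * replication


print(hdfs_footprint(1000))  # a 1,000 MB file
```

A 1,000 MB file becomes 8 blocks (seven full 128 MB blocks plus one partial), stored as 24 block replicas using about 3,000 MB of raw disk; losing any single node still leaves two copies of every block.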
Big Data Extract, Transform & Load (ETL)
Sources: KACE Inventory Data, WASP Scanner Data
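An ETL pass over these two sources would extract each export, normalize the host names so records line up, merge inventory with scanner findings, and load one record per host into the analytics store. The sketch below assumes everything concrete: the CSV layouts and column names are hypothetical stand-ins (real KACE and WASP exports will differ), and a Python list stands in for Splunk or Hadoop as the load target.

```python
import csv
import io

# Hypothetical CSV exports; real KACE and WASP export formats will differ.
KACE_CSV = """Machine Name,OS,Last Patch
tocsrv01,Windows Server 2012 R2,2015-05-20
tocsrv02,RHEL 6.6,2015-04-02
"""
WASP_CSV = """host,finding,severity
TOCSRV01,Missing patch,HIGH
tocsrv03,Default credentials,CRITICAL
"""


def extract(text: str) -> list[dict]:
    """Extract: parse one CSV export into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(kace_rows: list[dict], wasp_rows: list[dict]) -> dict:
    """Transform: normalize host names and merge inventory with scanner findings."""
    assets = {}
    for row in kace_rows:
        host = row["Machine Name"].strip().lower()
        assets[host] = {"os": row["OS"], "last_patch": row["Last Patch"], "findings": []}
    for row in wasp_rows:
        host = row["host"].strip().lower()
        # Hosts seen only by the scanner get an empty inventory record.
        asset = assets.setdefault(host, {"os": None, "last_patch": None, "findings": []})
        asset["findings"].append((row["severity"].upper(), row["finding"]))
    return assets


def load(assets: dict, store: list) -> None:
    """Load: append one flat record per host to the target store (a list here)."""
    for host in sorted(assets):
        store.append({"host": host, **assets[host]})


store: list = []
load(transform(extract(KACE_CSV), extract(WASP_CSV)), store)
for record in store:
    print(record)
```

A host that appears in the scanner data but not in the inventory (tocsrv03 here) surfaces with no OS on record, which is itself a useful finding.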