GPU Technical Conference: Spring 2018 – San Jose, CA
Speakers: Greg McCullough and Aaron Sant-Miller
MARCH 28, 2018
FIGHTING DOMAIN GENERATION ALGORITHMS (DGAS) WITH MACHINE LEARNING
Collaboration space, Alexandria, VA
CYBER ATTACKS ARE HARD TO DETECT AND REQUIRE MULTIPLE MODELS, INFORMED BY CYBER EXPERTISE
Booz Allen Hamilton
The Challenges
1. Increasing reliance on IT systems, and the development of new systems, expands the attack surface every day.
2. The cyber domain and our adversaries are rapidly evolving, so the defenses of yesterday are quickly outdated.
3. The technical depth of the domain is significant, demanding high-end technical talent just to understand the problem.
Today, the average cyber breach is detected more than 250 days after the intrusion. That leaves adversaries 250 days to steal data, compromise the network, and create more open attack vectors to disrupt the mission.
Booz Allen’s Cyber Precog: Network speed alerting through cyber-informed ML model ensembling
1. Optimized DL Edge Models – live at the edge, examine all DNS traffic, and flag logs that may have a malicious domain
2. Bayesian Behavioral Models – develop behavioral baselines for endpoints, and alert analysts when an endpoint navigates to a dangerous domain and deviates from its established behavioral baseline
Effective cyber defense with machine learning and automation is not built on data science skill alone. Cyber expertise must be fused with data science and software development tradecraft.
Proven, deployed, and operational capability and service offering. We’ll walk through an adware campaign we caught last week for one of our partners.
We can effectively fight cyber adversaries with intelligent automation and machine learning, decreasing the time to intrusion detection.
Our DGA use case:
AGENDA
WHO WE ARE: BOOZ ALLEN CYBER AND DATA SCIENCE
THE CHALLENGES OF CYBER DEFENSE
DGAS AND AI-ENABLED DEFENSIVE TACTICS
DEEP LEARNING ON MALICIOUS DOMAINS
ADAPTIVE BAYESIAN LEARNING FOR BETTER ALERTING
CYBER PRECOG: VIDEO DEMONSTRATION
BOOZ ALLEN CYBER: OUR AI-ENABLED FUTURE STATE
BOOZ ALLEN HAMILTON: WHO WE ARE
Greg McCullough: Director of Cyber Machine Intelligence
Aaron Sant-Miller: Lead Data Scientist
Greg McCullough is the Director of Cyber Machine Intelligence Capability Development at Booz Allen Hamilton. He has over ten years of experience developing cyber capabilities across the Defense market, while building, deploying, and scaling government custom products and solutions focused on securing networks and IT systems. Most recently, he has driven compliance automation and key cyber integrations across the entire Federal market. He holds a BS in Computer Science from Butler University, a BS in Electrical Engineering from Purdue University, and an MS in Computer Science from George Washington University.
Aaron Sant-Miller is a Lead Data Scientist at Booz Allen Hamilton with a specialization in applied mathematics, machine learning, and statistical modeling. He has architected, developed, and deployed data science solutions and machine learning suites across a wide range of domains, including tax fraud detection, climate science trend forecasting, cybersecurity risk scoring, and professional athlete performance prediction. Aaron’s current research focuses on Bayesian modeling design, synthetic data generation, and neural network-based time series modeling. He holds a BS and an MS in Applied and Computational Mathematics and Statistics from the University of Notre Dame.
About Booz Allen Hamilton Cyber
For more than 100 years, business, government, and military leaders have turned to Booz Allen Hamilton to solve their most complex problems. We are at the forefront of the cyber frontier, relentlessly pursuing innovative solutions that make the world a safer place to live, serve, and do business. With decades of mission intelligence combined with the most advanced tools available, we protect industry and government against the attacks of today, and prepare them for the threats of tomorrow. To learn more, visit BoozAllen.com.
BOOZ ALLEN DELIVERS SOLUTIONS WITH A FUSION OF CYBER EXPERTISE AND DATA SCIENCE TRADECRAFT
Analytics driven by statistical rigor
Computational optimization
Machine learning model engineering
Cyber defense operations
Cyber engineering and integration
Cybersecurity compliance
Booz Allen Cyber ML Capability Offerings
Cybersecurity Data science
Booz Allen works to fuse capability offerings across domains to maximize solution impact
EVOLVING CHALLENGES IN CYBERSECURITY DEMAND CREATIVE AND INTELLIGENT DEFENSIVE POSTURE
Cyber attacks can cause significant damage
An Evolving Landscape of Challenges
Attack surfaces are rapidly expanding – growing dependence on IT systems and rapidly evolving novel technologies expose our networks in new ways while increasing our dependence on vulnerable systems
The workforce is saturated – adding more bodies to defensive efforts no longer improves defense, due to a lack of cyber talent and diminishing returns from increased human labor and manual defensive tactics
Organizations are inundated with cyber tools – well-funded organizations have the money to buy new cyber tools and do so, but they are unable to effectively manage or integrate the capabilities of these tools
Attackers are talented and increasingly sophisticated – adversaries are getting more creative, developing dynamic attacks that can circumvent existing rules-driven and structurally-defined cyber defenses
Cyber compromises are having real financial and physical impacts at an organizational and individual level. Creative adversaries have the ability to compromise an endpoint, access a network, steal and ransom data or accounts, and dangerously expose personal information to the open market. Many recent high-profile attacks demonstrate this impact.
An evolving landscape demands innovation and creative, new defensive tactics to advance defensive posture in a challenging and impactful cyber warzone.
--- This is the Booz Allen Cyber Mission ---
DGAS EXEMPLIFY TRANSFORMATIVE ADVERSARIAL TACTICS THAT DEMAND INNOVATIVE AND ADAPTIVE CYBER DEFENSE
New tactics demand new defenses
Adversaries have developed creative tactics that easily circumvent rules-based defenses. To counter more adaptive attack methods, we must develop our own adaptive and innovative techniques to prevent attacks that transform every minute. Machine learning and AI enable our defenses to evolve and react to new tactics in real time, hardening our defenses.
[Figure: Adversaries vs. Rules → Compromise; Adversaries vs. AI Defense → Security]
Domain Generation Algorithms (DGAs) are algorithms that can rapidly create a large number of domain names, which act as rendezvous points between malware on a compromised endpoint and the adversary's command-and-control infrastructure.
➢ Ever-changing and adaptive: Algorithms can rapidly generate new domains of new structures with regularity
➢ Inconspicuous at the surface-level: Algorithms can concatenate dictionary words or normative character patterns
➢ Large in number and historically tagged: Large pools of known DGAs are available and have been reverse engineered
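To make the idea concrete, the following is a minimal, purely illustrative sketch of a seed-and-date-driven generator. It is not drawn from any real malware family or from the presenters' data; the function name `toy_dga`, the seed string, and the reserved `.example` suffix are all assumptions chosen for illustration.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12) -> List[str]:
    """Derive `count` pseudo-random domain names from a shared seed and a date.

    Hypothetical toy generator: anyone who knows the seed and the date can
    reproduce the same list, which is what lets both ends of a campaign
    agree on domains without communicating.
    """
    domains = []
    for i in range(count):
        # Hash seed, date, and counter so each (day, i) pair yields a new name.
        digest = hashlib.sha256(f"{seed}|{day.isoformat()}|{i}".encode()).hexdigest()
        # Map each hex digit (0-15) onto the letters a-p to form the label.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:length])
        domains.append(f"{label}.example")
    return domains


# Example: the list changes completely from one day to the next.
for name in toy_dga("campaign-seed", date(2018, 3, 28)):
    print(name)
```

Because the output rotates with the date, a static blocklist of yesterday's names says nothing about today's, which is why the slides argue for models that learn the generator's structure instead.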
To defend against DGAs:
• Defenses must understand underlying domain characteristics, but also evolve and adapt rapidly
We have at our disposal:
• Large amounts of tagged data from uncovered and reverse engineered DGAs
Adaptable defense counters adaptive offense
This is an ideal use case for AI-powered cyber defense
Proven DL capabilities are the building blocks
Academic research and our Booz Allen deployments have proven the efficacy of these models in implementation and test. When trained at scale, deep neural networks can learn the underlying framework used by a DGA to build out a breadth of malicious domains, moving beyond memorization of “known bads” toward an understanding of adversarial toolkits.
CNNS AND LSTMS ARE PROVEN SOLUTIONS, WHERE GPUS ENABLE INLINE MODEL INFERENCE AT NETWORK SPEED
1. Yu et al. (2017). “Inline DGA Detection with Deep Networks.” IEEE International Conference on Data Mining. http://doi.org/10.1109/ICDMW.2017.96
Proven Model Architectures [1]
Both the LSTM and the CNN use simple, lightweight architectures (see Yu et al., 2017)
• Capable of powerful performance in holdout test
• Simplicity allows for rapid inference at network speed
Training Approach
Fuses multiple approaches into a complete learning scheme
1. Offline training: Bambenek DGA Dataset (4M)
2. Automated Update: Open-web intel collection
3. Network Tailoring
Optimized Hardware Deployment
• Lives on one NVIDIA DGX-1, across 8 GPUs
• Deployed and scaled using MXNet framework
• Proven to handle 3.5 GB/s throughput
Performance
• 97 percent holdout balanced accuracy
• Proven detection in network deployments
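For readers unfamiliar with the metric, balanced accuracy is the mean of the recall on each class, so it is not inflated when benign domains vastly outnumber malicious ones. The sketch below uses the standard definition; the function name and the toy labels are illustrative assumptions, not the presenters' evaluation code or data.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of recall on the malicious class (1) and the benign class (0)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    # True positives: malicious domains correctly flagged.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    # True negatives: benign domains correctly passed.
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


# Toy labels (made up): 2 malicious and 8 benign domains, one malicious missed.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(balanced_accuracy(y_true, y_pred))  # 0.75, while plain accuracy is 0.9
```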
If an endpoint is compromised, its behavior will change as a result of the intrusion. Cyber MI must flag potential compromise and alert when behavioral models notice a simultaneous change.
Network traffic off sensor
Database Layer (e.g. Timescale / PostgreSQL & MapD): All traffic is held for a period that depends on its associated risk of malicious action
Analytic Layer (e.g. CNN and Behavioral Model):
Model outputs
Model alerts
Application Layer
Analyst inputs
Existing SIEM (e.g. Splunk)
DGA Detection: Models flag logs that reflect potential compromise
Behavioral Models: Bayesian models flag endpoints that break from endpoint norm
Cyber Precog allows analysts to investigate and flag legitimate alerts
All model outputs and Precog alerts integrate seamlessly with existing SIEMs
Flagged traffic
BOOZ ALLEN’S CYBER PRECOG COMBINES EDGE MODELS WITH ADAPTIVE BEHAVIORAL MODELS TO CURATE TAILORED ALERTS
High false positive rates have stigmatized ML in cyber
Historically, machine learning in cyber has been stigmatized due to high false positive rates of ML-enabled alerting systems. As the adversarial tactics are rapidly changing, models that train offline and are slow to update rarely perform well and often provide outdated and incorrect alerts. The prevalence of poorly deployed systems stigmatized ML among SMEs.
Booz Allen’s Cyber Precog: Combining network speed alerting with adaptive, endpoint behavioral learning
1. Optimized Edge Models – live at the edge, examine all DNS traffic, and flag logs that may have a malicious domain
2. Bayesian Behavioral Models – develop behavioral baselines for endpoints, and alert analysts when an endpoint traverses to a flagged domain and deviates from its established behavioral baseline
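The slides do not specify the form of the Bayesian behavioral model, so the following is only a sketch of how the two signals could be combined. It assumes a Gamma-Poisson baseline over an endpoint's hourly DNS query count; the class `EndpointBaseline`, the function `should_alert`, the priors, and both thresholds are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class EndpointBaseline:
    """Assumed Gamma-Poisson baseline for one endpoint's hourly query count."""
    shape: float = 1.0  # prior pseudo-count of queries (assumption)
    rate: float = 1.0   # prior pseudo-count of observed hours (assumption)

    def update(self, count: int) -> None:
        """Conjugate update: fold one observed hourly count into the posterior."""
        self.shape += count
        self.rate += 1.0

    def tail_probability(self, count: int) -> float:
        """P(X >= count) under the negative binomial posterior predictive."""
        p = self.rate / (self.rate + 1.0)
        below = 0.0
        for k in range(count):
            log_pmf = (math.lgamma(k + self.shape) - math.lgamma(self.shape)
                       - math.lgamma(k + 1)
                       + self.shape * math.log(p) + k * math.log(1.0 - p))
            below += math.exp(log_pmf)
        return max(0.0, 1.0 - below)


def should_alert(dga_score: float, baseline: EndpointBaseline, count: int,
                 score_threshold: float = 0.5, tail_threshold: float = 0.01) -> bool:
    """Alert only if the edge model flags the domain AND behavior deviates."""
    flagged = dga_score >= score_threshold
    deviates = baseline.tail_probability(count) <= tail_threshold
    return flagged and deviates


# Made-up history: 24 hours at 20 queries per hour.
baseline = EndpointBaseline()
for _ in range(24):
    baseline.update(20)
print(should_alert(0.9, baseline, 60))  # flagged domain plus unusual volume
print(should_alert(0.9, baseline, 20))  # flagged domain, normal volume
```

Requiring both conditions is one way to suppress the false positives that a single edge model would raise on its own; a deployed system would likely baseline richer features than a single count.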
DEMONSTRATION
ADVERSARIES ARE SMART – COMBATTING DGAS IS ONLY ONE PIECE OF THE CYBERSECURITY PUZZLE THAT NEEDS ADAPTIVE DEFENSE
Beaconing
PowerShell Scripting
Network Scanning
DNS Exfiltration
Port/Protocol Anomalies
Graph/Network Connection Anomalies
DGA Detection
The solution demands many pieces
Comprehensive, adaptive cyber defensive posture requires collaborative work between ML engineers and SMEs.
1. Cyber talent and domain expertise to shape MI solutions
2. Rapid innovation to keep pace with adversarial advancement
3. Network optimization for large, fast data
4. Integrated, broad suite covering many diverse use cases

[Figure: cycle between Cyber Talent and AI Defense]
1. Cyber talent as the drivers and shapers of MI
2. Rapid prototyping: Required to keep pace with adversaries
3. Network Optimization: Tailored solutions on high velocity data
4. A Scaled, ML-enabled Cyber Defensive Suite (Illustrative)
Proven Prototypes
Optimized, operationalized capabilities
Identified gaps and needs, responses to new adversary tactics

Optimized, MI-informed cyber defensive posture:
➢ Leverages MI and AI
➢ Ensembles models in a cyber-informed manner
➢ Demands domain acumen for both analysis and design
BACK UP
BOOZ ALLEN HAMILTON CYBER
MODEL ARCHITECTURE DEEP DIVE
Model Architectures
Model Layers
• Embedding Layer – Learns a dense vector representation of vectorized domain names. Names that are more similar to each other are closer in vector space.
• Dropout Regularization Layer – Prevents model overfitting by setting a random subset of neurons to zero during training.
• Convolution Layer – Convolves filters over the embedded inputs to form an activation map, which represents the locations of discovered features in the data embedding.
• Long Short-Term Memory (LSTM) Layer – Allows the model to learn relevant features (patterns of characters) from domain names and capture dependencies between non-adjacent characters.
• Dense – Connects all nodes in the preceding layer (also used to perform final classification into two classes).
• Sigmoid Activation – Simple, classical transformation to assign a probability that a domain is malicious.
Training Approach
Adam Neural Network Optimization – Combines the benefits of adaptive gradient optimization (a per-parameter learning rate) and root mean square propagation (per-parameter learning rates adapted using a moving average of recent gradient magnitudes). It does so by maintaining exponential moving averages of the gradient (first moment) and of the squared gradient (second moment, i.e. uncentered variance), and scaling each parameter's step by the first-moment estimate divided by the square root of the second-moment estimate.
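The update rule described above can be sketched in a few lines for a single scalar parameter. This is a generic illustration of Adam with its commonly used default hyperparameters, not the presenters' training code; the function name `adam_minimize` and the quadratic test objective are assumptions.

```python
import math
from typing import Callable


def adam_minimize(grad: Callable[[float], float], x0: float, steps: int = 1000,
                  lr: float = 0.1, beta1: float = 0.9, beta2: float = 0.999,
                  eps: float = 1e-8) -> float:
    """Minimize a 1-D function with Adam, given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g      # moving average of the gradient
        v = beta2 * v + (1.0 - beta2) * g * g  # moving average of the squared gradient
        m_hat = m / (1.0 - beta1 ** t)         # bias correction for the zero start
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter scaled step
    return x


# Illustrative objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
print(adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0))
```

In a real network the same arithmetic is applied elementwise to every weight, so each weight effectively receives its own step size.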
➢ CNN: Vectorized Domain -> Embedding -> Convolution (1D) -> Dropout -> Flatten -> Dense -> Dense -> Sigmoid Activation
➢ LSTM: Vectorized Domain -> Embedding -> LSTM -> Dropout -> Dense -> Dense -> Sigmoid Activation
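Both pipelines start from a "Vectorized Domain". The slides do not say how that vectorization is done, so the sketch below shows one common approach: map each character to an integer index and left-pad to a fixed length so the embedding layer sees uniform input. The vocabulary, the `max_len` of 75, and the names `vectorize_domain`, `CHAR_TO_INDEX`, and `UNKNOWN_INDEX` are all assumptions.

```python
from typing import Dict, List

# Assumed vocabulary: lowercase letters, digits, hyphen, dot, underscore.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._"
# Index 0 is reserved for padding, so real characters start at 1.
CHAR_TO_INDEX: Dict[str, int] = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
# Any character outside the vocabulary shares one extra index.
UNKNOWN_INDEX = len(ALPHABET) + 1


def vectorize_domain(domain: str, max_len: int = 75) -> List[int]:
    """Map a domain name to a fixed-length list of integer character indices."""
    chars = domain.lower()[:max_len]  # normalize case, truncate long names
    indices = [CHAR_TO_INDEX.get(ch, UNKNOWN_INDEX) for ch in chars]
    # Left-pad with zeros so every input has the same length.
    return [0] * (max_len - len(indices)) + indices


print(vectorize_domain("ab.c", max_len=6))  # [0, 0, 1, 2, 38, 3]
```

The resulting integer sequence is what an embedding layer would consume, turning each index into a learned dense vector before the convolution or LSTM stage.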