FIGHTING DOMAIN GENERATION ALGORITHMS (DGAS) WITH MACHINE LEARNING

GPU Technical Conference: Spring 2018 – San Jose, CA
Speakers: Greg McCullough and Aaron Sant-Miller
MARCH 28, 2018

Page 1: FIGHTING DOMAIN GENERATION ALGORITHMS (DGAS) WITH MACHINE ...on-demand.gputechconf.com/gtc/2018/presentation/s8985-cyber-def… · FIGHTING DOMAIN GENERATION ALGORITHMS (DGAS) WITH

Collaboration space, Alexandria, VA

Page 2:

CYBER ATTACKS ARE HARD TO DETECT AND REQUIRE MULTIPLE MODELS, INFORMED BY CYBER EXPERTISE

2Booz Allen Hamilton

The Challenges
1. Increasing reliance on IT systems, and the development of new systems, expands the attack surface every day.
2. The cyber domain and our adversaries are rapidly evolving, so the defenses of yesterday are quickly outdated.
3. The technical depth of the domain is significant, demanding high-end technical talent just to understand the problem.

Today, the average cyber breach is detected more than 250 days after the intrusion. That leaves adversaries 250 days to steal data, compromise the network, and create more open attack vectors to disrupt the mission.

Booz Allen’s Cyber Precog: Network speed alerting through cyber-informed ML model ensembling

1. Optimized DL Edge Models – live at the edge, examine all DNS traffic, and flag logs that may have a malicious domain
2. Bayesian Behavioral Models – develop behavioral baselines for endpoints, and alert analysts when an endpoint navigates to a dangerous domain and deviates from its established behavioral baseline

Effective cyber defense with machine learning and automation is not built on data science skill alone. Cyber expertise must be fused with data science and software development tradecraft.

Proven, deployed, and operational capability and service offering.
We’ll walk through an adware campaign we caught last week for one of our partners.

We can effectively fight cyber adversaries with intelligent automation and machine learning, decreasing the time to intrusion detection.

Our DGA use case:

Page 3:

AGENDA WHO WE ARE: BOOZ ALLEN CYBER AND DATA SCIENCE

THE CHALLENGES OF CYBER DEFENSE

DGAS AND AI-ENABLED DEFENSIVE TACTICS

DEEP LEARNING ON MALICIOUS DOMAINS

ADAPTIVE BAYESIAN LEARNING FOR BETTER ALERTING

CYBER PRECOG: VIDEO DEMONSTRATION

BOOZ ALLEN CYBER: OUR AI-ENABLED FUTURE STATE

Page 4:

BOOZ ALLEN HAMILTON: WHO WE ARE

Greg McCullough: Director of Cyber Machine Intelligence

Aaron Sant-Miller: Lead Data Scientist

Greg McCullough is the Director of Cyber Machine Intelligence Capability Development at Booz Allen Hamilton. He has over ten years of experience developing cyber capabilities across the Defense market, while building, deploying, and scaling government custom products and solutions focused on securing networks and IT systems. Most recently, he has driven compliance automation and key cyber integrations across the entire Federal market. He holds a BS in Computer Science from Butler University, a BS in Electrical Engineering from Purdue University, and an MS in Computer Science from George Washington University.

Aaron Sant-Miller is a Lead Data Scientist at Booz Allen Hamilton with a specialization in applied mathematics, machine learning, and statistical modeling. He has architected, developed, and deployed data science solutions and machine learning suites across a wide range of domains, including tax fraud detection, climate science trend forecasting, cybersecurity risk scoring, and professional athlete performance prediction. Aaron’s current areas of research are focused on Bayesian modeling design, synthetic data generation, and neural network-based time series modeling. He holds a BS and an MS in Applied and Computational Mathematics and Statistics from the University of Notre Dame.

About Booz Allen Hamilton Cyber
For more than 100 years, business, government, and military leaders have turned to Booz Allen Hamilton to solve their most complex problems. We are at the forefront of the cyber frontier, relentlessly pursuing innovative solutions that make the world a safer place to live, serve, and do business. With decades of mission intelligence combined with the most advanced tools available, we protect industry and government against the attacks of today, and prepare them for the threats of tomorrow. To learn more, visit BoozAllen.com.

Page 5:

BOOZ ALLEN DELIVERS SOLUTIONS WITH A FUSION OF CYBER EXPERTISE AND DATA SCIENCE TRADECRAFT

Analytics driven by statistical rigor

Computational optimization

Machine learning model engineering

Cyber defense operations

Cyber engineering and integration

Cybersecurity compliance

Booz Allen Cyber ML Capability Offerings

Cybersecurity Data science

Booz Allen works to fuse capability offerings across domains to maximize solution impact

Page 6:

EVOLVING CHALLENGES IN CYBERSECURITY DEMAND CREATIVE AND INTELLIGENT DEFENSIVE POSTURE

Cyber attacks can cause significant damage

An Evolving Landscape of Challenges

Attack surfaces are rapidly expanding – growing dependence on IT systems and rapidly evolving novel technologies expose our networks in new ways while increasing our dependence on vulnerable systems

The workforce is saturated – adding more bodies to defensive efforts no longer improves defense due to a lack of cyber talent and diminishing returns from increased human labor and manual defensive tactics

Organizations are inundated with cyber tools – well-funded organizations have the money to buy new cyber tools and do so, but they are unable to effectively manage or integrate the capabilities of these tools

Attackers are talented and increasingly sophisticated – adversaries are getting more creative, developing dynamic attacks that can circumvent existing rules-driven and structurally-defined cyber defenses

Cyber compromises are having real financial and physical impacts at an organizational and individual level.
Creative adversaries have the ability to compromise an endpoint, access a network, steal and ransom data or accounts, and dangerously expose personal information to the open market. Many recent high-profile attacks demonstrate this impact.

An evolving landscape demands innovation and creative, new defensive tactics to advance defensive posture in a challenging and impactful cyber warzone.

--- This is the Booz Allen Cyber Mission ---

Page 7:

DGAS EXEMPLIFY TRANSFORMATIVE ADVERSARIAL TACTICS THAT DEMAND INNOVATIVE AND ADAPTIVE CYBER DEFENSE

New tactics demand new defenses

Adversaries have developed creative tactics that easily circumvent rules-based defenses. To counter more adaptive attack methods, we must develop our own adaptive and innovative techniques to prevent attacks that transform every minute.
Machine learning and AI enable our defenses to evolve and react to new tactics in real time, hardening our defenses.

[Diagram labels: Adversaries, Rules, Compromise; Adversaries, AI Defense, Security]

Domain Generation Algorithms (DGAs) are algorithms that can rapidly create a large number of domain names, which act as rendezvous points between malware on a compromised endpoint and the adversary's infrastructure.

➢ Ever-changing and adaptive: Algorithms can rapidly and regularly generate new domains with new structures
➢ Inconspicuous at the surface level: Algorithms can concatenate dictionary words or normative character patterns
➢ Large in number and historically tagged: Large pools of known DGAs are available and have been reverse engineered
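To make the idea concrete, here is a minimal toy sketch of how a DGA can derive a fresh list of names each day from a shared seed. It is not taken from the talk or from any real malware family: the function name `toy_dga`, the seed string, the SHA-256 derivation, and the reserved `.example` suffix are all illustrative assumptions.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12) -> List[str]:
    """Derive `count` pseudo-random domain names from a shared seed and a date."""
    domains = []
    for i in range(count):
        # Hash seed, date, and counter so anyone holding the seed can recompute the list.
        digest = hashlib.sha256(f"{seed}|{day.isoformat()}|{i}".encode("utf-8")).digest()
        # Map the first `length` bytes of the digest onto lowercase letters.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        # ".example" is a reserved TLD, so these names never resolve (assumption for safety).
        domains.append(label + ".example")
    return domains


# The same seed yields a different list of names on each date.
print(toy_dga("hypothetical-seed", date(2018, 3, 28)))
print(toy_dga("hypothetical-seed", date(2018, 3, 29)))
```

Because the list changes with the date, a static blocklist of yesterday's names says nothing about today's, which is why the slide argues for models that learn the character-level structure rather than memorize names.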

To defend against DGAs:
• Defenses must understand underlying domain characteristics, but also evolve and adapt rapidly

We have at our disposal:
• Large amounts of tagged data from uncovered and reverse engineered DGAs

Adaptable defense counters adaptive offense

This is an ideal use case for AI-powered cyber defense

Page 8:

Proven DL capabilities are the building blocks
Academic research and our Booz Allen deployments have proven the efficacy of these models in implementation and test.
When trained at scale, deep neural networks can learn the underlying framework used by a DGA to build out a breadth of malicious domains, moving beyond memorization of “known bads” toward an understanding of adversarial toolkits.

CNNS AND LSTMS ARE PROVEN SOLUTIONS, WHERE GPUS ENABLE INLINE MODEL INFERENCE AT NETWORK SPEED

1. Yu et al. (2017). “Inline DGA Detection with Deep Networks.” IEEE International Conference on Data Mining. http://doi.org/10.1109/ICDMW.2017.96

Proven Model Architectures [1]

Both the LSTM and the CNN use simple, lightweight architectures (see Yu et al. 2017)
• Capable of powerful performance in holdout test
• Simplicity allows for rapid inference at network speed

Training Approach
Fuses multiple approaches into a complete learning scheme
1. Offline training: Bambenek DGA Dataset (4M)
2. Automated Update: Open-web intel collection
3. Network Tailoring

Optimized Hardware Deployment
• Lives on one NVIDIA DGX-1, across 8 GPUs
• Deployed and scaled using MXNet framework
• Proven to handle 3.5 GB/s throughput

Performance
• 97 percent holdout balanced accuracy
• Proven detection in network deployments
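For readers unfamiliar with the metric, balanced accuracy is the mean of the recall on each class, which keeps a rare malicious class from being drowned out by benign traffic. The sketch below is an illustrative implementation of the standard definition, not the evaluation code behind the 97 percent figure; the labels and predictions are made-up data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of recall on the malicious class (1) and recall on the benign class (0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Made-up holdout set: 95 benign domains, 5 malicious ones.
y_true = [0] * 95 + [1] * 5
always_benign = [0] * 100  # a useless detector that never flags anything

plain_accuracy = sum(t == p for t, p in zip(y_true, always_benign)) / len(y_true)
print(plain_accuracy)                            # 0.95
print(balanced_accuracy(y_true, always_benign))  # 0.5
```

The detector that never flags anything looks strong on plain accuracy but scores no better than chance on balanced accuracy, which is why the balanced figure is the more informative one for imbalanced DNS traffic.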

Page 9:

If an endpoint is compromised, its behavior will change as a result of the intrusion.
Cyber MI must flag potential compromise and alert when behavioral models notice a simultaneous change.

Network traffic off sensor

Database Layer (e.g. Timescale / PostgreSQL & MapD): All traffic is held for periods that depend on its associated risk of malicious action

Analytic Layer (e.g. CNN and Behavioral Model):

Model outputs

Model alerts

Application Layer

Analyst inputs

Existing SIEM (e.g. Splunk)

DGA Detection: Models flag logs that reflect potential compromise

Behavioral Models: Bayesian models flag endpoints that break from endpoint norm

Cyber Precog allows analysts to investigate and flag legitimate alerts

All model outputs and Precog alerts integrate seamlessly with existing SIEMs

Flagged traffic

BOOZ ALLEN’S CYBER PRECOG COMBINES EDGE MODELS WITH ADAPTIVE BEHAVIORAL MODELS TO CURATE TAILORED ALERTS

High false positive rates have stigmatized ML in cyber
Historically, machine learning in cyber has been stigmatized due to high false positive rates of ML-enabled alerting systems.
As the adversarial tactics are rapidly changing, models that train offline and are slow to update rarely perform well and often provide outdated and incorrect alerts. The prevalence of poorly deployed systems stigmatized ML among SMEs.

Booz Allen’s Cyber Precog: Combining network speed alerting with adaptive, endpoint behavioral learning
1. Optimized Edge Models – live at the edge, examine all DNS traffic, and flag logs that may have a malicious domain
2. Bayesian Behavioral Models – develop behavioral baselines for endpoints, and alert analysts when an endpoint traverses to a flagged domain and deviates from its established behavioral baseline
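The two-condition alert described above can be sketched as follows. The slides do not say which Bayesian model or which behavioral features Cyber Precog uses, so this is only one plausible reading: a Gamma-Poisson baseline over an endpoint's DNS query count per interval, with an alert raised only when the edge model has flagged a domain and the observed count is improbable under the baseline. The class `EndpointBaseline`, the function `should_alert`, the prior values, and the 0.01 threshold are all hypothetical.

```python
import math


class EndpointBaseline:
    """Gamma-Poisson baseline for one endpoint's DNS query count per interval (assumed feature)."""

    def __init__(self, alpha: float = 1.0, beta: float = 1.0):
        self.alpha = alpha  # Gamma shape: prior pseudo-count of queries
        self.beta = beta    # Gamma rate: prior pseudo-count of intervals

    def update(self, count: int) -> None:
        # Conjugate update: add observed queries to the shape, one interval to the rate.
        self.alpha += count
        self.beta += 1.0

    def tail_probability(self, count: int) -> float:
        """P(X >= count) under the posterior predictive, a negative binomial."""
        p = self.beta / (self.beta + 1.0)
        cdf = 0.0
        for k in range(count):
            log_pmf = (
                math.lgamma(k + self.alpha) - math.lgamma(self.alpha) - math.lgamma(k + 1)
                + self.alpha * math.log(p) + k * math.log(1.0 - p)
            )
            cdf += math.exp(log_pmf)
        return max(0.0, 1.0 - cdf)


def should_alert(baseline, count, domain_flagged, threshold=0.01):
    # Alert only when BOTH signals agree: a flagged domain and a break from baseline.
    return domain_flagged and baseline.tail_probability(count) < threshold


baseline = EndpointBaseline()
for hourly_count in [10, 12, 9, 11, 10, 8, 12, 10]:  # made-up history
    baseline.update(hourly_count)

print(should_alert(baseline, count=40, domain_flagged=True))
print(should_alert(baseline, count=10, domain_flagged=True))
print(should_alert(baseline, count=40, domain_flagged=False))
```

Requiring both conditions is what suppresses false positives: a flagged domain alone, or an unusual hour alone, does not reach the analyst.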

Page 10:

DEMONSTRATION

Page 11:

ADVERSARIES ARE SMART – COMBATTING DGAS IS ONLY ONE PIECE OF THE CYBERSECURITY PUZZLE THAT NEEDS ADAPTIVE DEFENSE

Beaconing

PowerShell Scripting

Network Scanning

DNS Exfiltration

Port/Protocol Anomalies

Graph/Network Connection Anomalies

DGA Detection

The solution demands many pieces
Comprehensive, adaptive cyber defensive posture requires collaborative work between ML engineers and SMEs.
1. Cyber talent and domain expertise to shape MI solutions
2. Rapid innovation to keep pace with adversarial advancement
3. Network optimization for large, fast data
4. Integrated, broad suite covering many diverse use cases

[Diagram steps]
1. Cyber talent as the drivers and shapers of MI
2. Rapid prototyping: Required to keep pace with adversaries
3. Network Optimization: Tailored solutions on high velocity data
4. A Scaled, ML-enabled Cyber Defensive Suite (Illustrative)

[Diagram labels: AI Defense; Cyber Talent; Proven Prototypes; Optimized, operationalized capabilities; Identified gaps and needs, responses to new adversary tactics]

Optimized, MI-informed cyber defensive posture:
➢ Leverages MI and AI
➢ Ensembles models in a cyber-informed manner
➢ Demands domain acumen for both analysis and design

Page 12:

BACK UP

Page 13:

BOOZ ALLEN HAMILTON CYBER

Page 14:

MODEL ARCHITECTURE DEEP DIVE

Model Architectures

Model Layers
Embedding Layer – Learns a dense vector representation of vectorized domain names. Names that are more similar to each other are closer in vector space.
Dropout Regularization Layer – Prevents model overfitting by setting a random subset of neurons to zero during training.
Convolution Layer – Convolves filters over the embedded inputs to form an activation map, which represents the locations of discovered features in the data embedding.
Long Short-Term Memory (LSTM) Layer – Allows the model to learn relevant features (patterns of characters) from domain names and capture dependencies between non-adjacent characters.
Dense Layer – Connects all nodes in the preceding layer (also used to perform final classification into two classes).
Sigmoid Activation – Simple, classical transformation to assign a probability that a domain is malicious.
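The "vectorized domain names" that feed the embedding layer are typically sequences of integer character indices padded to a fixed length. The slides do not give the exact encoding, so the sketch below is an assumption: the alphabet, the reserved indices for padding and unknown characters, the maximum length of 75, and the name `vectorize_domain` are all illustrative.

```python
import string

# Assumed alphabet: lowercase letters, digits, and the separators seen in hostnames.
ALPHABET = string.ascii_lowercase + string.digits + "-._"
PAD, UNK = 0, 1  # reserved indices: padding and unknown character
CHAR_TO_INDEX = {ch: i + 2 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 75  # assumed fixed input length


def vectorize_domain(domain, max_len=MAX_LEN):
    """Map a domain name to a fixed-length list of integer character indices."""
    chars = domain.lower()[:max_len]  # normalize case, truncate long names
    indices = [CHAR_TO_INDEX.get(ch, UNK) for ch in chars]
    return indices + [PAD] * (max_len - len(indices))  # right-pad with PAD


print(vectorize_domain("example.com", max_len=16))
```

Each index then selects one row of the embedding matrix, so a domain becomes a `max_len` by embedding-dimension array before it reaches the convolution or LSTM layer.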

Training Approach
Adam Neural Network Optimization – Combines the benefits of adaptive gradient optimization (AdaGrad: per-parameter learning rates) and root mean square propagation (RMSProp: per-parameter learning rates adapted based on a running average of recent gradient magnitudes). It does so by maintaining exponential moving averages of the gradient (first moment) and of the squared gradient (second moment, i.e. uncentered variance), and scaling each parameter's step by their ratio.
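As an illustration of the update rule, the following is a minimal sketch of one Adam step in plain Python, using the commonly cited default hyperparameters. It is not the deployed training code; the function name `adam_step` and the toy quadratic objective are assumptions for demonstration.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; returns new (params, m, v). `t` is the 1-based step count."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g      # moving average of the gradient
        v_i = beta2 * v_i + (1 - beta2) * g * g  # moving average of the squared gradient
        m_hat = m_i / (1 - beta1 ** t)           # bias correction for zero initialization
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Toy objective: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x, m, v = [0.0], [0.0], [0.0]
for t in range(1, 501):
    grads = [2.0 * (x[0] - 3.0)]
    x, m, v = adam_step(x, grads, m, v, t, lr=0.1)
print(x[0])  # should end close to the minimizer at 3
```

Dividing by the square root of the second-moment estimate is what gives each parameter its own effective learning rate: parameters with consistently large gradients take proportionally smaller steps.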

➢ CNN: Vectorized Domain -> Embedding -> Convolution (1D) -> Dropout -> Flatten -> Dense -> Dense -> Sigmoid Activation
➢ LSTM: Vectorized Domain -> Embedding -> LSTM -> Dropout -> Dense -> Dense -> Sigmoid Activation
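The CNN pipeline above can be traced end to end with a small forward pass in plain Python. This is a shape-level sketch with random, untrained weights, not the deployed MXNet model: the layer sizes, the ReLU activations, the omission of bias terms, and all names are assumptions. Dropout is active only during training, so it is an identity step here. The LSTM variant is not shown.

```python
import math
import random

random.seed(0)  # fixed seed so the random weights are reproducible

VOCAB_SIZE, EMBED_DIM = 41, 4      # assumed: character indices 0..40, 4-d embeddings
NUM_FILTERS, KERNEL_SIZE = 3, 3    # assumed convolution shape
SEQ_LEN, HIDDEN = 8, 5             # assumed input length and dense width
CONV_LEN = SEQ_LEN - KERNEL_SIZE + 1


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


embedding = rand_matrix(VOCAB_SIZE, EMBED_DIM)
filters = [rand_matrix(KERNEL_SIZE, EMBED_DIM) for _ in range(NUM_FILTERS)]
dense1 = rand_matrix(HIDDEN, CONV_LEN * NUM_FILTERS)
dense2 = rand_matrix(1, HIDDEN)


def conv1d(x):
    """Slide each filter along the sequence axis (no padding), then apply ReLU."""
    out = []
    for pos in range(len(x) - KERNEL_SIZE + 1):
        row = []
        for f in filters:
            s = sum(f[k][d] * x[pos + k][d]
                    for k in range(KERNEL_SIZE) for d in range(EMBED_DIM))
            row.append(max(0.0, s))
        out.append(row)
    return out


def dense(weights, vec, relu=False):
    out = [sum(w * a for w, a in zip(row, vec)) for row in weights]
    return [max(0.0, o) for o in out] if relu else out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cnn_forward(indices):
    """Vectorized Domain -> Embedding -> Conv1D -> (Dropout) -> Flatten -> Dense -> Dense -> Sigmoid."""
    if len(indices) != SEQ_LEN:
        raise ValueError("expected a sequence of length %d" % SEQ_LEN)
    x = [embedding[i] for i in indices]     # embedding lookup: SEQ_LEN x EMBED_DIM
    x = conv1d(x)                           # activation map: CONV_LEN x NUM_FILTERS
    flat = [a for row in x for a in row]    # dropout is a no-op at inference; flatten
    hidden = dense(dense1, flat, relu=True)
    return sigmoid(dense(dense2, hidden)[0])  # probability-like score in (0, 1)


print(cnn_forward([2, 3, 4, 5, 6, 7, 0, 0]))
```

With untrained weights the score is meaningless; the point is only to show how each stage transforms the shape of the data on its way to a single malicious-probability output.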