Automata Processing: Accelerating Big Data
DESCRIPTION
Many of today’s most challenging computer science problems, such as those involving very large data structures, unstructured data, random access, or real-time performance requirements, require highly parallel solutions. Current implementations of parallelism can be cumbersome and complex, challenging the capabilities of traditional CPU and memory system architectures and often requiring significant effort on the part of programmers and system designers. For the past seven years, Micron Technology has been developing a hardware co-processor technology that can directly implement large-scale Non-deterministic Finite Automata (NFA) for efficient parallel execution. This new non-von Neumann processor, currently in fabrication, borrows from the architecture of memory systems to achieve massive data parallelism, addressing complex problems in an efficient, manageable way. On September 17, the Wall Street Technology Association (WSTA) hosted a seminar for financial IT professionals entitled Delivering Big Data. As part of that event, Micron’s Dan Skinner delivered an introduction to this revolutionary new technology, its growing ecosystem, and potential applications in computational finance and data analytics.
TRANSCRIPT
1 | ©2014 Micron Technology, Inc. September 26, 2014
• Presented by: Dan Skinner
• Director, Business Development
• Micron Technology, Inc.
Automata Processing: Accelerating Big Data
2 | ©2014 Micron Technology, Inc.
Five Big Technology Trends
BIG DATA, CLOUD, NETWORKING, MOBILE, MACHINE TO MACHINE
Big Data Presents A Unique Challenge for Memory Systems
Customers demand high performance for analytics.
Increasing levels of parallelism drive complexity in system architectures.
Massive scale requires aggressive power targets.
3 | ©2014 Micron Technology, Inc.
A Repetitive Cycle…
The Consistent Message
CPU Vendor, System OEM:
“Memory is the bottleneck!”
“We need faster memory!”
The Response
Memory Industry
“Sure, we can do that!”
Timeline, 1970 to today: Broadside Addressing, Multiplexed Addressing, Fast Page Mode, Extended Data Out, Synchronous DRAM
Innovations in memory interfaces have been critical to improving performance.
4 | ©2014 Micron Technology, Inc. September 26, 2014
The New Standard for Memory Performance: Hybrid Memory Cube
Micron’s revolutionary approach combines logic + memory; breaks through the “Memory Wall”
Provides 15X the bandwidth of a DDR3 module
Uses 70% less energy per bit than existing memory technologies
Reduces the memory footprint by nearly 90% compared to today’s RDIMMs
HMC Consortium: A Growing Ecosystem (OEMs, Enablers, Tools)
5 | ©2014 Micron Technology, Inc. August 2011
The Memory Wall Keeps Getting Higher
Higher speed memory interfaces
Complex algorithms to minimize traffic
Multiple channel memory interfaces
Advanced high speed signaling techniques
And on, and on, and on…
Working harder and faster is the common approach to ‘getting over the wall’.
[Diagram labels: The ‘Store’ (Memory): Instructions, Data & Variables. The ‘Engine’ (Processor): Fixed Computational Pipeline. Memory Bottleneck. Input. Hybrid Engine: Store Becomes a Flexible Computational Engine.]
6 | ©2014 Micron Technology, Inc.
Staying a Step Ahead Requires New Technologies
Fact: The ability to generate and transport information has vastly exceeded our capacity to analyze that same information.
Fast, accurate analysis of data provides the winning edge in financial markets
7 | ©2014 Micron Technology, Inc.
Swamped with Data: Three Examples
Processing complexity and throughput requirements prevent information from being analyzed.
Sentiment Analysis (Speed): Internet to Wall Street
Bioinformatics (Complexity): DNA to Database
Surveillance (Speed & Complexity): Cameras to Monitor
8 | ©2014 Micron Technology, Inc.
Breaking the Cycle
Big Data Pushes Memory to the Limit
CPU Vendor, System OEM:
“Memory is the bottleneck!”
“We need faster memory!”
New Response
Micron Technology
“Let’s rethink the problem”
The modern relationship between processor and memory was conceived to avoid complications associated with physical reconfiguration of ENIAC.
Since the mid-1940s, most computer systems have been built on this basic architectural concept. The role of memory in systems was firmly cast.
Conclusion: important advancements can be made if we challenge this deeply rooted historical concept.
9 | ©2014 Micron Technology, Inc.
Introduction to Automata Processing
Hardware implementation of non-deterministic finite automata or NFA (with additional features)
A massively parallel, scalable, two-dimensional fabric composed of 48K processing elements per chip, each programmed to perform a pattern matching and activation task each cycle
Exploits the very high and natural level of parallelism found in memory devices
Addresses complex computational problems with unprecedented parallelism and performance
Deployable in single-chip, module, and multi-module forms
The Automata Processor (AP) is a programmable silicon device capable of performing very high-speed, comprehensive search and analysis of complex, unstructured data streams.
10 | ©2014 Micron Technology, Inc.
What is an NFA?
• A finite automaton is a set of states and transition rules that respond to input.
Produces a unique computation (or run) of the automaton for each input string
Non-determinism allows multiple concurrent paths through the automaton.
This is very powerful and handles combinatorial problems.
• Micron’s AP adds counters and Boolean elements to handle increased problem complexity without sacrificing capacity
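To make "multiple concurrent paths" concrete, here is a minimal software sketch of NFA execution in Python. It is an illustration only, not Micron's hardware design or API: the function `run_nfa`, the transition table, and the state numbering are all hypothetical. The example automaton reports whenever the input stream contains "AUA" or "AC"; on the symbol A the start state activates two successor states at once, which is the non-determinism the slide describes.

```python
def run_nfa(transitions, start, accepting, text):
    """Simulate an NFA by tracking the set of states active after each symbol.

    transitions maps (state, symbol) to a set of next states.
    Returns a list of (position, accepting states active at that position).
    """
    active = set(start)
    reports = []
    for pos, symbol in enumerate(text):
        # Start states are re-armed on every symbol so matches may begin anywhere.
        nxt = set(start)
        for state in active:
            nxt |= transitions.get((state, symbol), set())
        active = nxt
        hits = active & accepting
        if hits:
            reports.append((pos, sorted(hits)))
    return reports


# Hypothetical automaton: state 3 reports "AUA", state 5 reports "AC".
TRANSITIONS = {
    (0, "A"): {1, 4},  # non-deterministic: two paths start on the same symbol
    (1, "U"): {2},
    (2, "A"): {3},
    (4, "C"): {5},
}

print(run_nfa(TRANSITIONS, {0}, {3, 5}, "GAUACA"))
```

On a CPU the inner loop visits the active states one at a time; the slides' claim is that the AP evaluates all of its elements in the same cycle.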
11 | ©2014 Micron Technology, Inc.
Automata Equivalence
• Any nondeterministic machine can be modeled as deterministic at the expense of exponential growth in the state count.
Today’s supercomputers model NFA as a DFA, traversing every edge to find the solution. This creates an explosion in memory space.
• SNORT example: 100 NFA nodes replace 10,000 DFA nodes
[Figure: state diagrams comparing a Deterministic Finite Automaton (DFA) with the equivalent Nondeterministic Finite Automaton (NFA) over the symbols A, C, and U]
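The exponential growth mentioned on this slide can be reproduced with the textbook subset construction. The sketch below is not the SNORT example from the slide; it uses a standard illustrative language ("the k-th symbol from the end is A" over the alphabet A, C), and the helper names `build_nfa` and `count_dfa_states` are hypothetical. The NFA needs k + 1 states, while the equivalent DFA reached by subset construction needs 2^k.

```python
from collections import deque


def build_nfa(k, alphabet="AC"):
    """NFA with states 0..k accepting strings whose k-th symbol from the end is A."""
    trans = {}
    for sym in alphabet:
        trans[(0, sym)] = {0}          # state 0 loops on every symbol
    trans[(0, "A")] = {0, 1}           # ...and also guesses that this A is the one
    for state in range(1, k):
        for sym in alphabet:
            trans[(state, sym)] = {state + 1}
    return trans


def count_dfa_states(trans, start, alphabet="AC"):
    """Count the DFA states (reachable subsets of NFA states) by breadth-first search."""
    start_set = frozenset(start)
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        subset = queue.popleft()
        for sym in alphabet:
            target = frozenset(t for s in subset for t in trans.get((s, sym), ()))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return len(seen)


for k in (3, 6, 10):
    print(k + 1, "NFA states ->", count_dfa_states(build_nfa(k), {0}), "DFA states")
```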
12 | ©2014 Micron Technology, Inc.
Programmer Productivity
Pattern #1, Pattern #2, …, Pattern ‘n’
Parallelization of automatons requires no special consideration by the user. Each automaton operates independently upon the input data stream.
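The independence described on this slide can be sketched in software as a set of matchers that all receive the same symbol at each step and share no state. This is a conceptual Python model only; `LiteralAutomaton` and `scan` are hypothetical names, and each matcher handles a single non-empty literal string rather than a general automaton. Adding another pattern means appending another object to the list, with no change to the others.

```python
class LiteralAutomaton:
    """Unanchored matcher for one non-empty literal pattern."""

    def __init__(self, name, pattern):
        self.name = name
        self.pattern = pattern
        self.active = set()  # lengths of pattern prefixes matched so far

    def step(self, symbol):
        """Consume one symbol; return True if the pattern ends at this symbol."""
        # Extend every partial match, and also try to begin a new match (prefix length 0).
        nxt = {n + 1 for n in self.active | {0} if self.pattern[n] == symbol}
        matched = len(self.pattern) in nxt
        nxt.discard(len(self.pattern))  # a completed match cannot be extended
        self.active = nxt
        return matched


def scan(automata, stream):
    """Broadcast each input symbol to every automaton and collect reports."""
    reports = []
    for pos, symbol in enumerate(stream):
        for automaton in automata:  # sequential here; concurrent in the hardware model
            if automaton.step(symbol):
                reports.append((pos, automaton.name))
    return reports


print(scan([LiteralAutomaton("buy", "buy"), LiteralAutomaton("sell", "sell")],
           "sell buy sell"))
```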
13 | ©2014 Micron Technology, Inc.
Automata Processor Positioning
[Chart: GPGPU and CPU positioned by level of parallelism (low to high) and by workload type (structured, mathematical, floating point versus unstructured, random, comparison)]
• The Automata Processor excels where the demands for highly parallel processing and unstructured data intersect
Example: String matching from data services (email, Twitter, Facebook, voice communications, etc.) to provide:
Sentiment Analysis (Financial Services)
Evidentiary finding (Legal Services)
Threat detection (Security Services)
14 | ©2014 Micron Technology, Inc.
Example: Bioinformatics
• Massively parallel problem space
Human genome: mapping ~100 base-pair reads to a 3.2 billion base-pair reference genome
Comparisons across genomes
Prosite protein sequence patterns mapped to Micron Automata Processor
Professor Srinivas Aluru is leading research on Automata Processors in bioinformatics applications
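PROSITE patterns, mentioned on this slide, are compact descriptions of protein motifs that translate directly into finite automata. The slides do not show how the mapping to the Automata Processor is done, so the sketch below only illustrates the idea by translating a PROSITE pattern into a Python regular expression. The function name `prosite_to_regex` is hypothetical, and only the common syntax is handled: `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repetition, and `<` or `>` as sequence anchors.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'C-x(2,4)-C-x(3)-[LIVMFYWC]' to a regex."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Separate the residue specification from an optional (n) or (n,m) suffix.
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # excluded residues
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        pieces.append(prefix + core + repeat + suffix)
    return "".join(pieces)


# A zinc-finger-style pattern written in PROSITE syntax.
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
print(regex)
print(bool(re.search(regex, "MK" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H")))
```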
15 | ©2014 Micron Technology, Inc.
Breakthrough Performance
Planted Motif Search Problem
                      Automata Processor        UCONN - BECAT Hornet Cluster
Processors            48 (PCIe Board) + CPU     48 CPU (Cluster/OpenMPI)
Power                 245W-315W [1]             >2,000W [1]
Cost                  TBD                       ~$20,000 [1]
Performance (25,10)   12.26 minutes [2]         20.5 minutes
Performance (26,11)   13.96 minutes [2]         46.9 hours
Performance (36,16)   36.22 minutes [2]         Unsolved
[1] Micron Technology estimates, not including memory of 4GB DRAM per core
[2] Research conducted by Georgia Tech (Roy/Aluru)
Planted Motif Search - a leading “NP Complete” problem in bioinformatics
Solutions involving high match lengths and substitution counts are often presented to HPC clusters for processing
Independent research predicts the Automata Processor significantly outperforms a multi-core HPC cluster in speed, power and estimated cost
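For readers unfamiliar with the problem, the following Python sketch states planted motif search in code, assuming the pairs in the table, such as (25,10), follow the usual (l, d) notation: find every string of length l that occurs in every input sequence with at most d substitutions. This is a naive brute-force enumeration for tiny inputs, not the method used in the Georgia Tech research or on the Hornet cluster; the function names and the toy sequences are hypothetical. Its cost grows as 4^l, which is why instances like (26,11) are hard.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, sequence, d):
    """True if some window of the sequence is within d substitutions of the motif."""
    l = len(motif)
    return any(
        hamming(motif, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def planted_motifs(sequences, l, d, alphabet="ACGT"):
    """Enumerate all 4**l candidates and keep those present in every sequence."""
    found = []
    for letters in product(alphabet, repeat=l):
        motif = "".join(letters)
        if all(occurs_within(motif, seq, d) for seq in sequences):
            found.append(motif)
    return found


# Toy instance: "ACGT" is planted exactly once and with one substitution twice.
toy = ["TTACGTTT", "GGACGAGG", "CCTCGTCC"]
print("ACGT" in planted_motifs(toy, l=4, d=1))
```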
16 | ©2014 Micron Technology, Inc.
Problems Aligned with the Automata Processor
Applications requiring deep analysis of data streams containing spatial and temporal information are often impacted by the memory wall and will benefit from the processing efficiency and parallelism of the Automata Processor.
Network Security: Millions of patterns, Real-time results, Unstructured data
Bioinformatics: Large operands, Complex patterns, Unstructured data
Video Analytics: Highly parallel operation, Real-time results, Unstructured data
Data Analytics: Highly parallel operation, Real-time results, Unstructured data
17 | ©2014 Micron Technology, Inc.
Automata Processor: Support & Tools
PCIe Development Board: Industry-standard PCIe bus interface, capacity for up to 48 APs, large FPGA capacity, DDR3 for local storage
Workbench Tool: Converts schematic automata to Micron ANML description language
Software Development Kit: AP optimization, loading, and debugging tools, plus compiler