The Fusion of Supercomputing and Big Data
AKA: Why We Got Tired of Hearing Talks Like Katie Gave and What We Are Doing About It!!
Peter Ungaro, President & CEO
2 Questions...
1) Will analytics in the era of “big data” look like analytics has looked since the relational database was developed in 1969 by Dr. Codd?
2) Will “big data” look the same in the world of engineering and scientific research as it will in the world of business?
Cray’s Vision: The Fusion of Supercomputing and Big & Fast Data
Modeling The World
Cray Supercomputers solving “grand challenges” in science, engineering and analytics
Compute Store Analyze
Data Models
Integration of datasets and math models for search, analysis, predictive modeling and knowledge discovery
Math Models
Modeling and simulation augmented with data to provide the highest fidelity virtual reality results
Data-Intensive Processing
High throughput event processing & data capture from sensors, data feeds and instruments
Cray Technology Innovations
System Interconnect
Interconnect and optimization software to address the data transfer bottleneck at large scale
Systems Management & Performance Software
Software to productively manage and extract performance out of thousands of processors used as a single system
Packaging
Greenest x86 supercomputers with innovative cooling and upgradability to improve TCO
Combines multiple processing architectures into a single, scalable system
Extending Adaptive Supercomputing to Big Data Workloads
Solutions for Advanced Analytics
Big Data Solutions
Data Warehouses + Extensions
NoSQL Databases
Hadoop / MapReduce
Graph Analytics
These solutions can overlap, but also can be very complementary as each has strengths & weaknesses
System Architecture Differences…
Supercomputing
• Scalable computing w/high BW, low-latency, global memory architectures
• Tightly integrated processor-memory-interconnect & network storage
• Minimize data movement – load the “mesh” into memory
• Move data for loading, check-pointing or archiving
• “Basketball court sized” systems
Large-scale Data Analytics
• Distributed computing at largest scale
• Divide-and-conquer approaches on Service-Oriented Architectures
• Maximize data movement – scan/sort/stream all the data all the time
• Lowest cost processor-memory-interconnect & local storage
• “Warehouse sized” clouds
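The divide-and-conquer, scan-everything style described for large-scale data analytics can be illustrated with a toy MapReduce-style word count. This is a minimal in-process sketch, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the input partitions are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(partitions):
    # Every record in every partition is scanned (the "divide" step);
    # in a real cluster each partition would live on a different node.
    mapped = chain.from_iterable(
        map_phase(record) for part in partitions for record in part
    )
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, already split into two partitions.
partitions = [["big data fast data"], ["fast interconnects", "big systems"]]
counts = word_count(partitions)
print(counts)
```

The whole data set is streamed through the map step on every run, in contrast with the supercomputing approach of loading the working set into memory once and minimizing movement.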
Step 1: Enabling Hadoop Solutions for Analytics
Launched Cray Framework for Hadoop on Cray XC30 & CS300
Hadoop Distro
Supercomputing Performance
High Value Hadoop
Supercomputers & Analytic Appliances
Public & Private Clouds
Step 2: Enabling More Complexity & Capability …Big Data Fast Data
Global Memory + Fast
Interconnects
Fast Data
SAN Interconnects
Enterprise Data
(structured)
GRID
Big Data
CLOUD
LAN/WAN Interconnects
Big Data Appliance for Real-Time Data Discovery
Detecting cyber threats
Customers
Call Center Events
Work Orders
Call Escalations
Truck Rolls Set-Top Box feeds
Supervisor Intervention
3rd Party Service Tech
AVR Failure
CSR Resolution
Cabinet Failure
Residential Accounts
Web Service
Analyze customer churn
Find new fraud patterns
Discover new drug re-purposing opportunities
The Baseball World Has Changed…
Box Score
Play-by-Play
Pitch f/x
Source: MLB.com and Baseball-Reference
New (Big) Data + New Technology
• New Data
– 20X from Moneyball
– Pitch f/x (20/Pitch)
– Hit f/x (5/hit)
• New Technology
– Graph Analytics
– Urika (aka. supercomputer)
Evaluating Batter/Pitcher
Match Ups
Multi-disciplinary, multi-institutional
Data Intensive / Data Driven Science
Computational Simulation of Complex Phenomena
Theoretical Research
Evolution of Science and Knowledge Discovery
Experimental Research
Sensors, devices, simulations, social…
NERSC-8 “Cori” System
● To be installed in 2Q16
● KNL many-core processor
● > 9,300 compute nodes
● > 27 PF
● Data partition of 2,000 Haswell nodes
● Cray Lustre Filesystem
  ● 28 Petabytes
  ● 430 GB/sec performance
● Focus on design is sustained performance and perf/watt
● Cray-NERSC-Intel COE will focus on science applications
Trinity System at LANL/Sandia
● 2015 Delivery: ~9,500 Haswell nodes
● Large Sonexion/Lustre solution
  ● 80 PB capacity
  ● 1.7 TB/sec bandwidth
● Tightly-integrated Burst Buffer SSD nodes
  ● 3.7 PB capacity
  ● 3.3 TB/sec bandwidth
● 2016 Delivery: ~9,500 nodes of KNL
  ● Each with 16GB of fast on-package memory
● All integrated into a single 42 PF system
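To put the storage figures quoted for Cori and Trinity in perspective, the short sketch below estimates how long it would take to read or write each storage tier in full at its quoted bandwidth. It assumes decimal units (1 PB = 10^15 bytes) and that the peak bandwidth is sustained for the whole transfer; both are simplifying assumptions, and the helper name `full_transfer_seconds` is hypothetical.

```python
PB = 10**15  # decimal petabyte (assumption; the slides do not say PB vs PiB)
TB = 10**12
GB = 10**9


def full_transfer_seconds(capacity_bytes, bandwidth_bytes_per_s):
    """Time to move the full capacity once at the quoted bandwidth."""
    return capacity_bytes / bandwidth_bytes_per_s


# Capacity and bandwidth figures as quoted on the slides.
systems = {
    "Cori Lustre": (28 * PB, 430 * GB),
    "Trinity Lustre": (80 * PB, 1.7 * TB),
    "Trinity burst buffer": (3.7 * PB, 3.3 * TB),
}

transfer_hours = {
    name: full_transfer_seconds(capacity, bandwidth) / 3600
    for name, (capacity, bandwidth) in systems.items()
}

for name, hours in transfer_hours.items():
    print(f"{name}: {hours:.1f} hours")
```

Under these assumptions the two Lustre file systems need roughly 18 and 13 hours respectively for a full pass, while the burst buffer tier turns over in about 19 minutes, which is the gap a burst buffer is meant to bridge.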
Integrated HPC Environments are the capability that will turn data into insight and discovery
Analyze
Compute
Store
Our Vision…
Build a world-class integrated supercomputing environment that enables transformational computing
across a broad set of science, engineering and advanced analytics (big data) applications