
Alan D. George, Ph.D. Director, NSF CHREC Center

Professor of ECE, University of Florida

Herman Lam, Ph.D. Assoc. Professor of ECE, University of Florida

2012 NSF I/UCRC Annual Meeting

CHREC and Novo-G: An Innovative and Synergistic Research Project and

The World’s Most Powerful Reconfigurable Supercomputer

Research highlighted in this presentation was supported in part by the I/UCRC Program of the National Science Foundation under Grant No. EEC-0642422.

12 Jan 2012


Outline

CHREC Center overview

CHREC sites, faculty, & students

CHREC members & memberships

Industry impact & technology transfer

Reconfigurable computing

Novo-G Overview

Machine architecture

Application acceleration

International Novo-G forum

Conclusions and Looking Ahead

I/UCRC grant originated in Sep. 2006

• 1 university site, 9 membership commitments

Strong growth in first 5 years (Phase-I)

• Grown to 4 university sites (UF, GW, BYU, VT)

• Grown to 29 members (aerospace, IT, etc.)

• Grown to 42 memberships (all full, $35K/ea)

Strong scholarship record

• >115 refereed journal & conference papers

• Several NSF CAREER awards

• Best-paper awards, keynotes, etc.

Strong graduation record

• Dozens of Ph.D. & M.S. graduates to date

• Many hired by CHREC members

• Dozens more served with members as interns

World-class facilities developed in-house

• Novo-G: world’s top reconfigurable computer

• HokieSpeed: GPU-centric supercomputer

• Pyramid: CPU-centric supercomputer

Center Mission and Theme

• R&D to advance S&T at the nexus of reconfigurable, high-performance, and/or high-performance embedded computing (i.e., RC, HPC, HPEC)

• Computing performance, power, adaptivity, scalability, productivity, cost, size, weight, etc.

• From space satellites to supercomputers!


CHREC Faculty

University of Florida (lead)

Dr. Alan D. George, Professor of ECE – Center Director

Dr. Herman Lam, Associate Professor of ECE

Dr. Ann Gordon-Ross, Assistant Professor of ECE

Dr. Greg Stitt, Assistant Professor of ECE

Dr. Jose Principe, Distinguished Professor of ECE and BME

Dr. Andy Li, Associate Professor of ECE

Dr. Vikas Aggarwal, Research Scientist in ECE

Brigham Young University

Dr. Brent E. Nelson, Professor of ECE – BYU Site Director

Dr. Michael J. Wirthlin, Professor of ECE

Dr. Brad L. Hutchings, Professor of ECE

Dr. Michael Rice, Professor of ECE

George Washington University

Dr. Tarek El-Ghazawi, Professor of ECE – GWU Site Director

Dr. Vikram Narayana, Assistant Research Professor in ECE

Virginia Tech

Dr. Peter Athanas, Professor of ECE – VT Site Director

Dr. Wu-Chun Feng, Associate Professor of CS and ECE

Dr. Patrick Schaumont, Assistant Professor of ECE

Dr. Heshan Lin, Senior Research Associate in CS

Most importantly, CHREC features an exceptional team of >40 graduate students spanning our 4 university sites.


CHREC Members

1. AFRL Munitions Directorate (4)

2. AFRL Sensors Directorate

3. AFRL Space Vehicles Directorate (2)

4. Altera

5. AMD

6. Arctic Region Supercomputing Center (2)

7. Army RD&E Command

8. Boeing Research & Technology

9. GiDEL

10. Harris

11. Honeywell (2)

12. Intel

13. Lockheed Martin MFC

14. Lockheed Martin SSC

15. Lockheed Martin SVIL

16. Los Alamos National Laboratory (2)

17. Mentor Graphics

18. Monsanto

19. NASA Goddard Space Flight Center

20. NASA Marshall Space Flight Center

21. National Instruments (2)

22. National Security Agency (4)

23. Northrop-Grumman Aerospace Systems

24. Oak Ridge National Laboratory (2)

25. Office of Naval Research

26. Sandia National Laboratories

27. SEAKR Engineering

28. Veritomyx

29. Xilinx (2)

42 memberships ($35K/ea) from 29 members in 2011; numbers in parentheses give the membership count for members holding more than one

Industry Impact & Tech Transfer

12 projects spanning broad areas of RC, HPC, HPEC

Performance – optimizing speed, power, scalability, adaptability

Parallel algorithms, applications, architectures (FPGA, GPU, Manycore)

Productivity – reducing design complexity for developers and users

Design concepts, tools, modeling, middleware, compilation, integration

Aerospace – addressing unique needs in this key community

Space-based processing, reliable architectures, partial reconfiguration

Industry impact

CHREC drives & influences many industry programs

Annual surveys routinely cite millions of dollars per year in industry impact

Many very close relationships between sites & members

Technology transfer to date

Dozens of industry personnel hires, dozens of internships

>115 new papers and >30 new tools crafted with/for members


What is Reconfigurable Computing?

General characteristics:

Architecture adapts to match unique needs of each app

e.g., FPGA; “Custom Fit” usage strategy; reconfigurable by task or app

Relatively new and revolutionary paradigm of computing

Limited but growing list of available devices, tools, systems, and apps

Technical advantages:

GREAT performance when app not well suited to fixed processor

Why? Customized hardware parallelism (width, depth), data precision (size, format), operations and units (type, quantity), memory structure, etc.

LOWER energy consumption than fixed processors (CPU, GPU)

Technical disadvantages:

Relatively new and immature paradigm of computing

Programming complexity with adaptive hardware

Causes: inherent in the novelty of the approach; “newness” of the field and tools
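To make the "data precision (size, format)" point concrete: a DNA base carries only 2 bits of information, so a datapath sized for nucleotides can be 4x narrower than one built around 8-bit characters. The sketch below is a software illustration only. It assumes a hypothetical A/C/G/T → 00/01/10/11 code, and the helper names `pack` and `unpack` are likewise hypothetical; it is not drawn from any CHREC design.

```python
# Hypothetical 2-bit code per nucleotide: a software stand-in for a custom-width datapath.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # BASES[code] inverts CODE


def pack(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per base (first base in the low bits)."""
    word = 0
    for i, base in enumerate(seq):
        word |= CODE[base] << (2 * i)
    return word


def unpack(word: int, length: int) -> str:
    """Recover `length` bases from a packed integer."""
    return "".join(BASES[(word >> (2 * i)) & 0b11] for i in range(length))


seq = "GATTACA"
packed = pack(seq)
print(unpack(packed, len(seq)))                                  # GATTACA
print(2 * len(seq), "bits vs", 8 * len(seq), "bits as 8-bit characters")  # 14 bits vs 56 ...
```

On an FPGA the same idea extends further: comparators, registers, and on-chip buffers can all be sized to exactly 2 bits per base instead of a fixed processor's byte or word width.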


What is Novo-G?

Motivation

Growing computational demands in many science and engineering domains becoming the principal bottleneck

Scalable RC systems (e.g., Novo-G) uniquely capable of both high performance and low energy, cooling, and TCO (total cost of ownership)

Goals: Investigate, develop, evaluate, & showcase:

Most powerful RC machine ever fielded for research

Innovative suite of productivity tools for app development

Impactful set of scalable kernels/apps in key science areas

Emphases

Performance (system), Productivity (concepts/tools), Impact (apps)

Theme

Novo-G is an RC-centric machine (not merely CPUs with accelerators!)

Features FPGA/RAM coupling (4.25 or 8.5 GB in 3 banks coupled to each FPGA)

Features FPGA/FPGA coupling (up to 8 coupled; e.g., systolic array, virtual FPGA)

CPUs and GPUs serve in supporting role (e.g., I/O, preprocessing, postprocessing)


Novo-G Machine

1 head-node server (1U) with:

• 2 Xeon E5520 2.26 GHz quad-core CPUs

• 24GB ECC DDR3, 3 x 1TB SATA2

24 compute servers (4U), each with:

• Xeon E5520 quad-core CPU

• 6GB ECC DDR3, 250GB SATA2

• 2 GiDEL ProcStar-III PCIe x8 cards, each with 4 Stratix-III E260 FPGAs and 4x4.25 = 17GB RAM

6 compute servers (4U), each with:

• 2 Xeon E5620 2.4GHz quad-core CPUs

• 16GB ECC DDR3, 2TB SATA2

• 4 GiDEL ProcStar-IV PCIe x8 cards, each with 4 Stratix-IV E530 FPGAs and 4x8.5 = 34GB RAM

• GTX-480 GPU

* Our cluster vendor is Ace Computers

Novo-G Annual Growth

2009: 96 top-end Stratix-III FPGAs, each with 4.25GB SDRAM

2010: 96 more Stratix-III FPGAs, each with 4.25GB SDRAM

2011: 96 top-end Stratix-IV FPGAs, each with 8.5GB SDRAM

2012: 96 more Stratix-IV FPGAs, each with 8.5GB SDRAM

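The yearly additions can be totalled directly. This sketch only accumulates the per-year numbers listed above. It assumes each year's FPGAs are added to the previous ones rather than replacing them, which is consistent with both the 192 Stratix-III plus 96 Stratix-IV machine described above and the 192+192 FPGAs quoted later for Summer 2012. The names `GROWTH` and `cumulative` are hypothetical.

```python
# Per-year additions, copied from the slide: (year, FPGAs added, GB SDRAM per FPGA, family)
GROWTH = [
    (2009, 96, 4.25, "Stratix-III"),
    (2010, 96, 4.25, "Stratix-III"),
    (2011, 96, 8.5, "Stratix-IV"),
    (2012, 96, 8.5, "Stratix-IV"),
]


def cumulative(growth):
    """Running totals: (year, FPGAs installed, GB of FPGA-attached SDRAM)."""
    fpgas, ram_gb, rows = 0, 0.0, []
    for year, added, gb_each, _family in growth:
        fpgas += added
        ram_gb += added * gb_each
        rows.append((year, fpgas, ram_gb))
    return rows


for year, fpgas, ram_gb in cumulative(GROWTH):
    print(f"{year}: {fpgas} FPGAs, {ram_gb:,.0f} GB SDRAM")

# Consistency check with the machine description: 24 servers x 2 ProcStar-III cards
# x 4 FPGAs, plus 6 servers x 4 ProcStar-IV cards x 4 FPGAs, as of 2011.
assert 24 * 2 * 4 + 6 * 4 * 4 == cumulative(GROWTH)[2][1]
```

Because 96 x 4.25 and 96 x 8.5 are both 816 GB, the 2011 Stratix-IV addition alone doubles the FPGA-attached memory while adding only half as many FPGAs as were already installed.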


Impactful Novo-G App Research: BioRC examples

Each 3D chart (for Smith-Waterman and Needleman-Wunsch) illustrates performance of a single FPGA under varying input conditions. Each table shows scaling performance with varying numbers of FPGAs under optimal input conditions.

Jaguar supercomputer @ ORNL: 224,256 cores (2.4 GHz hexa-core Opterons) @ 6.95 MW

K Computer in Japan (largest supercomputer in world): 548,352 cores; “uses enough electricity to power almost 10,000 homes at a cost of about $10 million per year” (New York Times, 06/19/11)

Needleman-Wunsch (NW)

Baseline: 192·2^25 sequence comparisons, length 850
Software runtime: 11,026 CPU·hours on 2.4 GHz Opteron

# FPGAs       Runtime (sec)   Speedup
1             47,616          833
4             12,014          3,304
96            503             78,914
128           391             101,518
192 (est.)    270             147,013

By contrast, with 192+192 FPGAs (Summer 2012), Novo-G's speedup for key BioRC apps approaches the equivalent of 500K CPU cores, at <16 kW

Smith-Waterman (SW) with trace-back

Baseline: database length 2^26 bases vs. 512 sequences of length 500
Software runtime: 7,126 CPU·hours on 2.4 GHz Opteron

# FPGAs       Runtime (sec)   Speedup
1             25,927          989
4             6,482           3,958
96            271             94,639
128           206             124,710
192 (est.)    137             187,492
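The speedup columns can be re-derived from the baselines. Speedup is the single-core software runtime (CPU-hours x 3600 s) divided by the FPGA runtime. The sketch below recomputes it and also computes parallel efficiency relative to one FPGA, a derived figure that the slide does not report. The recomputed speedups match the tables to within about 0.2%, presumably because the listed runtimes are rounded. Variable and function names are hypothetical.

```python
# Needleman-Wunsch table from the slide: (# FPGAs, runtime in seconds, reported speedup)
NW_SOFTWARE_CPU_HOURS = 11_026
NW_ROWS = [(1, 47_616, 833), (4, 12_014, 3_304), (96, 503, 78_914),
           (128, 391, 101_518), (192, 270, 147_013)]  # 192-FPGA row is an estimate

# Smith-Waterman (with trace-back) table from the slide
SW_SOFTWARE_CPU_HOURS = 7_126
SW_ROWS = [(1, 25_927, 989), (4, 6_482, 3_958), (96, 271, 94_639),
           (128, 206, 124_710), (192, 137, 187_492)]  # 192-FPGA row is an estimate


def analyze(cpu_hours, rows):
    """Recompute speedup vs. one CPU core and parallel efficiency vs. one FPGA."""
    software_s = cpu_hours * 3600
    _, t1, _ = rows[0]                    # single-FPGA runtime
    out = []
    for n, t, reported in rows:
        speedup = software_s / t
        efficiency = t1 / (n * t)         # 1.0 means perfectly linear scaling
        out.append((n, speedup, reported, efficiency))
    return out


for name, hours, rows in [("NW", NW_SOFTWARE_CPU_HOURS, NW_ROWS),
                          ("SW", SW_SOFTWARE_CPU_HOURS, SW_ROWS)]:
    for n, speedup, reported, eff in analyze(hours, rows):
        print(f"{name} {n:>3} FPGAs: speedup {speedup:>9,.0f} "
              f"(slide: {reported:,}), efficiency {eff:.1%}")
```

By this measure both kernels stay above 90% efficiency out to 192 FPGAs. That is what one would expect when each FPGA works on independent sequence comparisons and little data moves between boards.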

Smith-Waterman (SW) with trace-back: optimal alg. for local alignment of DNA and RNA sequences

Needleman-Wunsch (NW): optimal alg. for global alignment of DNA and RNA sequences
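Both algorithms fill a dynamic-programming matrix, but they differ in two ways. Smith-Waterman clamps cell scores at zero and traces back from the best cell anywhere in the matrix, which makes it local. Needleman-Wunsch charges gaps along the borders and reports the bottom-right cell, which makes it global. A minimal sketch follows, assuming a simple hypothetical scoring scheme (match +2, mismatch -1, linear gap -1) rather than whatever substitution and gap model the CHREC cores use.

```python
# Hypothetical scoring scheme for illustration (not from the slides): linear gap penalty.
MATCH, MISMATCH, GAP = 2, -1, -1


def score(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def needleman_wunsch(x: str, y: str) -> int:
    """Global alignment score; the borders carry accumulated gap penalties."""
    prev = [j * GAP for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * GAP] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = max(prev[j - 1] + score(x[i - 1], y[j - 1]),
                         prev[j] + GAP,
                         cur[j - 1] + GAP)
        prev = cur
    return prev[-1]


def smith_waterman(x: str, y: str) -> tuple[int, str, str]:
    """Local alignment score plus one optimal alignment recovered by trace-back."""
    n, m = len(x), len(y)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                          H[i - 1][j] + GAP,
                          H[i][j - 1] + GAP)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace-back: walk from the best cell until a zero cell ends the local alignment.
    ax, ay = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + GAP:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(needleman_wunsch("GATTACA", "GCATGCT"))
print(smith_waterman("GGACGTCC", "ACGT"))   # (8, 'ACGT', 'ACGT')
```

The trace-back step needs the stored matrix (or enough of it to re-walk the path). That extra memory and sequential work is why overlapping it with the next sequence, as the architecture bullets below describe, matters for throughput.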

Novel systolic array architecture

Complex-controller performance with simple-controller overhead

Extendable across FPGAs using neighbor bus

Computation of trace-back for SW overlapped with hardware processing of next sequence
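Systolic arrays fit these algorithms because of the recurrence's dependency structure: cell (i, j) needs only its left, upper, and upper-left neighbors. All cells on one anti-diagonal (i + j = d) are therefore independent and can be updated in the same step. The sketch below is a plain-software simulation of that generic wavefront schedule. It assumes one processing element (PE) per query character and the same hypothetical linear-gap scoring as a plain row-by-row reference, and the function names are hypothetical. It is not the CHREC architecture, whose controller and trace-back handling the slide only summarizes.

```python
def sw_matrix_rowmajor(x, y, match=2, mismatch=-1, gap=-1):
    """Reference Smith-Waterman score matrix, filled row by row."""
    H = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
    return H


def sw_matrix_wavefront(x, y, match=2, mismatch=-1, gap=-1):
    """Same matrix, filled one anti-diagonal (i + j = d) per step.

    Every cell on diagonal d depends only on diagonals d-1 and d-2, so a linear
    array with one PE per character of x could update all of them in one clock
    cycle. Returns (matrix, number of steps).
    """
    n, m = len(x), len(y)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    steps = 0
    for d in range(2, n + m + 1):
        # PE i works on database position j = d - i during this step, if it exists.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
        steps += 1
    return H, steps


query, database = "ACGTTGCA", "TTACGTTGCATTAG"
H, steps = sw_matrix_wavefront(query, database)
assert H == sw_matrix_rowmajor(query, database)       # same result, different schedule
print(steps, "steps vs", len(query) * len(database), "sequential cell updates")  # 21 vs 112
```

With n PEs and a database of length m, the array finishes in n + m - 1 steps instead of n x m sequential updates. A longer query simply means a longer PE chain, which is presumably what the neighbor-bus bullet refers to when the chain spans several FPGAs.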

[Figure: Smith-Waterman with trace-back, single-FPGA speedup vs. database base length and sequence length (nucleotides)]

Technology Transfer

CHREC BLAST Toolset (Monsanto)

Computation demand in bioinformatics becoming prohibitive bottleneck

Novo-BLAST: accelerates BLAST’s word matching algorithm up to 19x on a single Stratix III

BLAST-wrapped SW: Smith-Waterman core (previous slide) with BLAST wrapper; SSEARCH-like accuracy with BLAST-like performance

Code transfer & field test in 1st qtr. 2012
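BLAST's first stage finds short exact "word" matches between the query and the database, then extends only those candidate regions. The sketch below shows that seeding step in its simplest nucleotide form: exact k-mer matches, as in BLASTN, whose default word size is 11. It says nothing about how Novo-BLAST maps this step onto the FPGA, and the function names are hypothetical.

```python
from collections import defaultdict


def word_index(query: str, k: int) -> dict:
    """Map every length-k word of the query to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return index


def word_hits(query: str, database: str, k: int = 11) -> list[tuple[int, int]]:
    """Return (query_pos, db_pos) pairs where a length-k word matches exactly."""
    index = word_index(query, k)
    hits = []
    for j in range(len(database) - k + 1):
        for i in index.get(database[j:j + k], ()):  # .get does not insert missing keys
            hits.append((i, j))
    return hits


hits = word_hits("ACGTAC", "TTACGTACTT", k=4)
print(hits)                       # [(0, 2), (1, 3), (2, 4)]
# Hits on the same diagonal (db_pos - query_pos) point to one ungapped match region.
print({j - i for i, j in hits})   # {2}
```

The database scan is a long stream of independent word lookups, so it is a natural candidate for hardware acceleration. The extension stage can then be handed to an alignment core, which is the general shape of the BLAST-wrapped SW idea above.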

Isotopic Pattern Calculator (Veritomyx)

Dominating bottleneck in proteomics app for cancer research

Measured up to 470x speedup on a single Stratix IV FPGA

Code transfer & field test in 1st qtr. 2012
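The Veritomyx calculator itself is not described here. As background, a standard way to compute an isotopic pattern at nominal-mass (unit) resolution treats each atom as an independent random mass shift and convolves the per-element isotope distributions. The sketch below assumes rounded representative natural abundances for C, H, N, O, and S, and the function names are hypothetical.

```python
# Approximate natural isotope abundances, indexed by extra nominal mass (+0, +1, +2, ...).
# Rounded representative values, assumed here for illustration.
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a: list[float], b: list[float]) -> list[float]:
    """Discrete convolution: distribution of the sum of two independent mass shifts."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, p in enumerate(a):
        for j, q in enumerate(b):
            out[i + j] += p * q
    return out


def isotopic_pattern(formula: dict[str, int], peaks: int = 6) -> list[float]:
    """Relative abundance of the M, M+1, M+2, ... peaks for a molecular formula."""
    dist = [1.0]
    for element, count in formula.items():
        for _ in range(count):
            # Truncating is safe: later convolutions never move mass to lower peaks.
            dist = convolve(dist, ABUNDANCE[element])[:peaks]
    return dist


# Example: glucose, C6H12O6
for shift, p in enumerate(isotopic_pattern({"C": 6, "H": 12, "O": 6}, peaks=4)):
    print(f"M+{shift}: {p:.4f}")
```

For a large peptide the work is thousands of small multiply-accumulate convolutions per molecule, repeated across many candidate molecules. That is consistent with the slide's description of this step as a dominating bottleneck.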


Broad Range of Novo-G App Research

Broad range of Novo-G research

BioRC: Smith-Waterman (w/ or w/o traceback), Needleman-Wunsch, Needle-Distance, Isoformic proteomics, BLASTp (collaboration with Boston University), CHREC BLAST Toolset (Novo-BLAST and BSW: BLAST-wrapped SW)

FinRC: e.g., barrier options using Heston model

DSP: e.g., information-theoretic approach to image segmentation

Domain exploration in other science and engineering fields

Very promising results (speed, energy)

50x to 5000x speedup per FPGA vs. fast CPU core
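The FinRC example names barrier options under the Heston stochastic-volatility model. A common way to price such path-dependent options is Monte Carlo simulation, and its independent paths are one reason it suits parallel hardware. The sketch below is a generic pure-Python version, not the Novo-G implementation, and rests on several assumptions:

- the option is an up-and-out call;
- the price uses a log-Euler step and the variance a full-truncation Euler step;
- the barrier is checked only at the simulated time steps;
- the parameter values are hypothetical.

```python
import math
import random


def heston_up_and_out_call(s0, strike, barrier, t, r, v0, kappa, theta, xi, rho,
                           steps=100, paths=5_000, seed=1):
    """Monte Carlo price of an up-and-out European call under the Heston model.

    Price: log-Euler step using the current (non-negative) variance.
    Variance: Euler step with full truncation (negative v treated as 0).
    The barrier is checked only at the simulated time steps (discrete monitoring).
    """
    rng = random.Random(seed)
    dt = t / steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(paths):
        s, v, knocked_out = s0, v0, False
        for _ in range(steps):
            z1 = rng.gauss(0.0, 1.0)
            # Correlate the variance shock with the price shock via rho.
            z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
            v_pos = max(v, 0.0)
            s *= math.exp((r - 0.5 * v_pos) * dt + math.sqrt(v_pos) * sqrt_dt * z1)
            v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos) * sqrt_dt * z2
            if s >= barrier:
                knocked_out = True
                break
        if not knocked_out:
            total += max(s - strike, 0.0)
    return math.exp(-r * t) * total / paths


# Hypothetical market and model parameters, chosen only for illustration.
price = heston_up_and_out_call(s0=100, strike=100, barrier=130, t=1.0, r=0.02,
                               v0=0.04, kappa=1.5, theta=0.04, xi=0.5, rho=-0.7)
print(f"{price:.3f}")
```

Each path needs only its own random stream, so paths can be split across workers or hardware pipelines with a single final reduction. Discrete monitoring misses some barrier crossings between steps, so it slightly overprices a knock-out option relative to continuous monitoring.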

International Novo-G Forum

Founded in January 2010

International community research forum to explore performance, productivity, and sustainability of RC at scale

Consists of 11 academic teams using common platform

Each team working on its own research apps and/or tools

Each team has one or more local Novo-G quad-FPGA boards

Remote access to big Novo-G @ Florida for large-scale runs


Boston University

Clemson University

Federal University of Pernambuco (Brazil)

University of Florida

George Washington University

University of Glasgow (UK)

Imperial College (UK)

Northeastern University

University of South Carolina

University of Tennessee

Washington University in St. Louis

Conclusions and Looking Ahead

RC: revolutionary paradigm of computing

Architecture adapts to match unique needs of each app

CHREC Novo-G reconfigurable supercomputer

Most powerful RC machine ever fielded for research

World-class speedups for key apps in science and engineering

Rivaling the world’s largest conventional supercomputers

But at a tiny fraction of their size, power, cost, and weight

Synergistic activity

Leverages private, state, and federal funding resources

Close partnership with CHREC member organizations: Altera, GiDEL, Monsanto, Veritomyx, et al.

Novo-G Forum: international team of 11 universities

Novo-G future: science and engineering domain exploration

New RC-amenable apps in BioRC, DSP, and FinRC

Explore new promising domains, e.g., computational chemistry, cryptanalysis


CORBI anyone?