High Performance Computing, Clusters, and Productivity

Upload: fox

Post on 25-Feb-2016

Page 1: High Performance Computing, Clusters, and Productivity

High Performance Clusters

LNXI SOS8 Presentation - April 04

High Performance Computing, Clusters, and Productivity

Page 2: High Performance Computing, Clusters, and Productivity

HPC Market Reality

The market represented by high performance computing can represent a sustainable business provided commodity technologies are leveraged effectively

The success of Linux clusters and their rapid growth are due not only to their use in the scientific community but also to their growing use by commercial entities for day-to-day production

History has shown that special-purpose architectures targeted solely at the HPC community are generally not widely adopted by the commercial world due to their high price/productivity

System Productivity must be considered before, during and after installation - designed into the architecture and support models

Vendor involvement, especially for Linux clusters, is critical throughout the life of the system for maintaining productivity

Page 3: High Performance Computing, Clusters, and Productivity

The technology was compelling…

BBC (03/02/69) - The supersonic airliner, Concorde, has made a "faultless" maiden flight. The Anglo-French plane took off from Toulouse and was in the air for just 27 minutes before the pilot made the decision to land.

Page 4: High Performance Computing, Clusters, and Productivity

…but eventually the bottom line won

Associated Press (10/24/03) - It's both a technological marvel and financial failure and today the bottom line wins as the Concorde makes its final flight.

Page 5: High Performance Computing, Clusters, and Productivity

On the other hand…

Page 6: High Performance Computing, Clusters, and Productivity

Southwest is not blinking lights

Bob

Page 7: High Performance Computing, Clusters, and Productivity

Two Different Approaches…

Metric                        Concorde        Southwest
Average roundtrip fare        $10,700         $90.03
Avg. customers served/year    93,000          45,200,000
Average miles/year            11.1 million    72 million

…Two Very Different Measurements of Productivity?

Page 8: High Performance Computing, Clusters, and Productivity

Common Metrics

Top500.org (Linpack): Performance does not equate to productivity

TCO: Something may be inexpensive to own but may not be productive

What matters: Price/Productivity, the effective productivity over time for the total price involved

Page 9: High Performance Computing, Clusters, and Productivity

Productivity

Productivity is the ratio between the amount of goods or services produced and the resource or expense that goes into producing them

For an HPC system, price/productivity implies the ratio Price/Product

Product: A successful run of the code the system was purchased for

Price: The total cost of the system over its lifetime (including hardware and software acquisition, system cost, support, development, infrastructure, and labor)
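
As a rough illustration of the Price/Product ratio, the sketch below sums the lifetime cost categories named on this slide and divides by the number of successful runs. The class name, function name, and all dollar figures are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass, fields


@dataclass
class LifetimeCosts:
    """Lifetime cost categories listed on the slide (values are hypothetical)."""
    acquisition: int      # hardware and software acquisition
    system: int           # system cost
    support: int
    development: int
    infrastructure: int
    labor: int

    def total(self) -> int:
        # Price = the sum of every cost category over the system's lifetime.
        return sum(getattr(self, f.name) for f in fields(self))


def price_per_product(costs: LifetimeCosts, successful_runs: int) -> float:
    """Price/Product: lifetime cost divided by successful runs of the target code."""
    if successful_runs <= 0:
        raise ValueError("successful_runs must be positive")
    return costs.total() / successful_runs


# Hypothetical example: a $2.5M lifetime cost spread over 5,000 successful runs.
example_costs = LifetimeCosts(
    acquisition=1_000_000,
    system=200_000,
    support=300_000,
    development=250_000,
    infrastructure=150_000,
    labor=600_000,
)
example_ratio = price_per_product(example_costs, successful_runs=5_000)
print(f"Price per successful run: ${example_ratio:,.2f}")
```

A lower value is better under this reading: two systems with the same price differ in price/productivity only through how many successful runs they actually deliver.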

Page 10: High Performance Computing, Clusters, and Productivity

Focus on Productivity

Recent Panel at SC2003: HPC Productivity

The productivity of an HPC system is measured by factors that may not be associated with hardware speed. These factors include program execution time, as well as software development time and other direct and indirect costs.

David Kuck, Manager of the Software and Solutions Group at Intel: "PR (getting on to the Top 500 list) tends to make people ignore real productivity issues."

DARPA/IPTO’s High Productivity Computing Systems Goal: Provide a new generation of economically viable high productivity computing systems for the national security and industrial user community (2007 – 2010)

Page 11: High Performance Computing, Clusters, and Productivity

System Architecture and Optimization

It’s all about the application!

The system maps to the application

No system is “one size fits all”

Analysis of application requirements: memory, inter-process communications, bandwidth, latency, floating point/integer needs, existing/new codes, parallelism, …

Identification of bottlenecks to generate optimal system design/selection/price

System design trades need to be optimized for best resulting price/productivity

Cluster or SMP? Why not both!

Page 12: High Performance Computing, Clusters, and Productivity

System Architecture: Linux Clusters Being Used in Large-Scale Production Systems

Lawrence Livermore - MCR: 11.2 TFLOPS, Linux Networx E2, 2,304 Intel processors

Los Alamos - Pink: 10 TFLOPS, Linux Networx E2, 2,048 Intel processors

Los Alamos - Lightning: 11.26 TFLOPS, Linux Networx Evolocity, 2,816 Opteron processors

Page 13: High Performance Computing, Clusters, and Productivity

System Architecture: Linux Clusters Being Used in Multi-Application, Scientific Computing Production Environments

As part of the Technology Insertion 2004 (TI-04) program, the Department of Defense High Performance Computing Modernization Program (HPCMP) selects Linux Networx for the Army Research Laboratory Major Shared Resource Center’s (MSRC) 2,132-processor Linux cluster.

When the system is fully deployed in mid-2004, this Evolocity II cluster will be the HPCMP’s largest deployment of an Intel processor-based Linux cluster, and the solution will adopt Intel 64-bit extension technology.

Page 14: High Performance Computing, Clusters, and Productivity

System Architecture: Leveraging the Commodity

A company focused on scientific computing will need to leverage commodity technology to be viable

The market isn’t large enough to afford non-recurring engineering dedicated to advancing niche technology

Government funding is not what it used to be

Need to take advantage of economies of scale

Focus development and resources to fill critical gaps and/or provide the “glue”

When using commodity technologies, Systems Engineering is absolutely critical: understanding of all subsystems, disciplined engineering approach, system integration and test, keeping up with technology

Leverage both commodity hardware and software

Page 15: High Performance Computing, Clusters, and Productivity

Supporting Software and Tools

Supporting software and tools: in-house and vendor expertise; stability and supportability; integration and compatibility of tools; versions and version compatibility; profilers and tools that provide insight enabling re-structuring of code for optimal performance; system management and administration tools

Programmability: Programming a cluster is probably now as easy as, if not easier than, programming an SMP or vector machine. Programming tools and expertise yield large variations in Price/Productivity

Compilers: Variations in compiling, both in compiler selection and in configuration and optimization, yield 5%-30% differences in performance

Debugging: Having correct tools and proper insight minimizes time to production

Page 16: High Performance Computing, Clusters, and Productivity

Installation and Acceptance

Minimizing impact on ongoing operations

Seamless fit and integration with existing infrastructure: facility, network, storage

Getting the system up, running, and producing

Meeting all acceptance criteria

Example: Vendor “deploys” system but takes 4 months to get it working and producing output (4 months out of a 3-year life cycle = 11% reduction in production)
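
The 4-month example on this slide can be checked with a few lines of arithmetic. The sketch below reproduces the 11% figure and, as an added assumption not stated in the presentation, also shows how much the effective price per productive month rises when the total price stays fixed. Function names are hypothetical.

```python
def lost_production_fraction(months_to_production: float, lifecycle_months: float) -> float:
    """Fraction of the life cycle lost before the system produces output."""
    if lifecycle_months <= 0:
        raise ValueError("lifecycle_months must be positive")
    if not 0 <= months_to_production < lifecycle_months:
        raise ValueError("months_to_production must be in [0, lifecycle_months)")
    return months_to_production / lifecycle_months


def effective_price_multiplier(months_to_production: float, lifecycle_months: float) -> float:
    """Assumption: the price is fixed, so cost per productive month scales by
    lifecycle / (lifecycle - delay)."""
    lost = lost_production_fraction(months_to_production, lifecycle_months)
    return 1.0 / (1.0 - lost)


# Slide example: 4 months of delay in a 3-year (36-month) life cycle.
lost = lost_production_fraction(4, 36)
multiplier = effective_price_multiplier(4, 36)
print(f"Lost production: {lost:.1%}")                   # about 11%
print(f"Price per productive month: x{multiplier:.3f}")  # 36/32 = 1.125
```

Under that fixed-price assumption, an 11% loss of production time corresponds to a 12.5% higher price per productive month, which is why time to production belongs in any price/productivity comparison.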

Page 17: High Performance Computing, Clusters, and Productivity

Hardware and Software System Maintenance

Hardware: architecture should minimize downtime; rapid turnaround from vendor; upgrades; scalability

Software: operating system, middleware, drivers, application level

Open Source: Linux offers no single “throat to choke”

Upgrades: a double-edged sword (performance, risks, downtime, revision synchronization)

Most people can get the system to work the first time…

Page 18: High Performance Computing, Clusters, and Productivity

Porting/Optimizing Applications: in-house codes, third-party commercial codes

Performance Modeling: Examining algorithm structure to enable re-architecture of code. Can yield significant performance deltas (observed up to 1000%)

Optimization: re-structuring, compiling

Hardware/Operations/Applications Ratio: $1 / $0.75 / $1.75

Most people spend $$$ on a system that could have been better spent on optimizing their software.
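
One way to read the Hardware/Operations/Applications ratio is as spending per dollar of hardware. Under that assumed reading (the slide does not spell it out), the sketch below converts the ratio into shares of total spending. The variable and function names are hypothetical.

```python
def spending_shares(ratio: dict[str, float]) -> dict[str, float]:
    """Convert a spending ratio into each category's share of the total."""
    total = sum(ratio.values())
    if total <= 0:
        raise ValueError("ratio values must sum to a positive number")
    return {category: amount / total for category, amount in ratio.items()}


# Ratio from the slide: $1 hardware / $0.75 operations / $1.75 applications.
slide_ratio = {"hardware": 1.00, "operations": 0.75, "applications": 1.75}
shares = spending_shares(slide_ratio)

for category, share in shares.items():
    print(f"{category:<12} {share:.1%}")
```

Read this way, applications work accounts for half of every dollar spent and hardware for under 30%, which is consistent with the slide's point that software optimization deserves a larger part of the budget.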

Page 19: High Performance Computing, Clusters, and Productivity

Education

Who: administrators, application engineers, end users. It is easier to teach Linux to a system administrator than to teach system administration to a “Linux person”

What: operating system, compilers, tools

When: As soon as the decision is made to consider a Linux cluster

Why: Directly impacts productivity.

Page 20: High Performance Computing, Clusters, and Productivity

Criticality of Vendor Involvement On Productivity

More important for HPC than many other market areas: always pushing the leading edge; mainly unique systems and applications

But it is essential for clusters: it’s not just your vendor, it’s your vendor’s vendors, because clusters are an assembly of many components

Knowledge of, experience with, and relationships with all components are critical

Hardware: interconnects, storage, processors

Software: operating system, compilers, tools, management, filesystems

Page 21: High Performance Computing, Clusters, and Productivity

Criticality of Vendor Involvement On Productivity

Vendor needs to be engaged over the lifetime of the system

Pre-Installation: working with customer and component suppliers to provide optimal architecture for target application; facility impact and design; project planning and integration

Installation: minimize impact on ongoing operations; time to production

Post-Installation: training and education; availability of experience and skills; ability to pull together component vendors to resolve issues and provide service; vendor participation to ensure productivity (avoid dump and run)

Page 22: High Performance Computing, Clusters, and Productivity

Linux Networx Value

Linux Networx provides cluster computing systems that deliver maximum sustained performance and high return on investment. We achieve high customer satisfaction by delivering Five Points of Proven Value:

Rigorous System Engineering, Q/A, and validation process with every cluster system

Full pre-ship system build up and testing, followed by rapid on-site installation

Delivering complete systems with optimized applications, the latest cluster technologies and open source tools

Total Cluster Management from one interface

Cluster services, support, and Linux cluster training

Page 23: High Performance Computing, Clusters, and Productivity

Conclusion

Linux clusters’ rapid growth in the high performance computing and commercial communities is due to high productivity at a low price point

System Productivity must be considered before, during and after installation - designed into the architecture and support models

High performance computing production class clusters require a vendor with strong systems engineering, firm component vendor relationships, and committed involvement throughout the life of the system