High Performance Clusters
LNXI SOS8 Presentation - April 04
High Performance Computing, Clusters, and Productivity
HPC Market Reality
The market represented by high performance computing can be a sustainable business, provided commodity technologies are leveraged effectively
The success and rapid growth of Linux clusters are due not only to their use in the scientific community, but also to their growing use by commercial entities for day-to-day production
History has shown that special-purpose architectures targeted solely at the HPC community are generally not widely adopted by the commercial world because of their high price/productivity
System productivity must be considered before, during, and after installation, and designed into the architecture and support models
Vendor involvement, especially for Linux clusters, is critical for maintaining productivity throughout the life of the system
The technology was compelling…
BBC (03/02/69) - The supersonic airliner, Concorde, has made a "faultless" maiden flight. The Anglo-French plane took off from Toulouse and was in the air for just 27 minutes before the pilot made the decision to land.
…but eventually the bottom line won
Associated Press (10/24/03) - It's both a technological marvel and financial failure and today the bottom line wins as the Concorde makes its final flight.
On the other hand…
Southwest is not blinking lights
Two Different Approaches…
Concorde: average roundtrip fare $10,700; average customers served per year 93,000; average miles per year 11.1 million
Southwest: average roundtrip fare $90.03; average customers served per year 45,200,000; average miles per year 72 million
…Two Very Different Measurements of Productivity?
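Only the slide's own figures are needed to make the contrast concrete. The sketch below computes raw ratios and nothing more; it does not attempt revenue or cost per passenger-mile, since the slide does not say how fares, customers, and miles relate to one another. The dictionary keys and the `compare` helper are illustrative names, not from the presentation.

```python
# Annual averages as given on the slide (roundtrip fares in US dollars).
CONCORDE = {"fare": 10_700.00, "customers_per_year": 93_000, "miles_per_year": 11.1e6}
SOUTHWEST = {"fare": 90.03, "customers_per_year": 45_200_000, "miles_per_year": 72e6}


def compare(premium: dict, commodity: dict) -> dict:
    """Express the gap between two operators as simple ratios."""
    return {
        # How many times more a premium fare costs.
        "fare_ratio": premium["fare"] / commodity["fare"],
        # How many times more customers the commodity operator serves.
        "customer_ratio": commodity["customers_per_year"] / premium["customers_per_year"],
        # How many times more miles the commodity operator flies.
        "mileage_ratio": commodity["miles_per_year"] / premium["miles_per_year"],
    }


if __name__ == "__main__":
    for name, value in compare(CONCORDE, SOUTHWEST).items():
        print(f"{name}: {value:.1f}")
```

On these numbers, Southwest's average fare is roughly 1/119 of Concorde's, while it serves about 486 times as many customers and flies about 6.5 times as many miles per year.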
Common Metrics
Top500.org (Linpack): performance does not equate to productivity
TCO: something may be inexpensive to own but may not be productive
What matters: Price/Productivity, the effective productivity over time for the total price involved
Productivity
Productivity is the ratio between the amount of goods or services produced and the resource or expense that goes into producing them
In these terms, price/productivity is the ratio Price/Product:
Product: a successful run of the code the system was purchased for
Price: the total cost of the system over its lifetime, including hardware and software acquisition, system cost, support, development, infrastructure, and labor
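The Price/Product ratio can be made concrete with a small calculation. This is a minimal sketch: the cost categories mirror the Price bullet, but the field names and every dollar and run figure are hypothetical, chosen only to show that a system with a lower total price can still cost more per successful run.

```python
from dataclasses import dataclass


@dataclass
class LifetimeCost:
    """Lifetime cost of a system in dollars (hypothetical breakdown).

    Categories follow the slide's list; they are not a standard cost model.
    """
    acquisition: float      # hardware and software acquisition
    system: float           # other system costs
    support: float
    development: float
    infrastructure: float
    labor: float

    def total(self) -> float:
        return (self.acquisition + self.system + self.support
                + self.development + self.infrastructure + self.labor)


def price_per_product(cost: LifetimeCost, successful_runs: int) -> float:
    """Price/Product: lifetime cost divided by successful runs of the target code."""
    if successful_runs <= 0:
        raise ValueError("successful_runs must be positive")
    return cost.total() / successful_runs


# Hypothetical comparison: the cheaper system completes fewer production runs.
cheap = LifetimeCost(400_000, 50_000, 150_000, 100_000, 100_000, 200_000)
tuned = LifetimeCost(600_000, 50_000, 250_000, 250_000, 100_000, 250_000)

if __name__ == "__main__":
    print(price_per_product(cheap, 500))    # $1.0M over 500 runs
    print(price_per_product(tuned, 1_000))  # $1.5M over 1,000 runs
```

With these made-up inputs the $1.0M system costs $2,000 per successful run and the $1.5M system $1,500, which is the slide's point that total price alone does not measure productivity.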
Focus on Productivity
Recent SC2003 panel, “HPC Productivity”: The productivity of an HPC system is measured by factors that may not be associated with hardware speed. These factors include program execution time, as well as software development time and other direct and indirect costs.
David Kuck, Manager of the Software and Solutions Group at Intel: “PR (getting on to the top 500 list) tends to make people ignore real productivity issues.”
DARPA/IPTO’s High Productivity Computing Systems Goal:
Provide a new generation of economically viable high productivity computing systems for the national security and industrial user community (2007 – 2010)
System Architecture and Optimization
It’s all about the application! The system maps to the application; no system is “one size fits all”
Analysis of application requirements: memory, inter-process communications, bandwidth, latency, floating-point/integer needs, existing/new codes, parallelism, …
Identification of bottlenecks to generate the optimal system design, selection, and price
System design trade-offs need to be optimized for the best resulting price/productivity
Cluster or SMP? Why not both!
System Architecture: Linux Clusters Being Used in Large-Scale Production Systems
Lawrence Livermore: MCR, 11.2 TFLOPS, Linux Networx E2, 2,304 Intel processors
Los Alamos: Pink, 10 TFLOPS, Linux Networx E2, 2,048 Intel processors
Los Alamos: Lightning, 11.26 TFLOPS, Linux Networx Evolocity, 2,816 Opteron processors
System Architecture: Linux Clusters Being Used in Multi-Application, Scientific Computing Production Environments
As part of the Technology Insertion 2004 (TI-04) program, the Department of Defense High Performance Computing Modernization Program (HPCMP) selects Linux Networx for the Army Research Laboratory Major Shared Resource Center’s (MSRC) 2,132-processor Linux cluster.
When the system is fully deployed in mid-2004, this Evolocity II cluster will be the HPCMP’s largest deployment of an Intel processor-based Linux cluster, and the solution will adopt Intel 64-bit extension technology.
System Architecture: Leveraging the Commodity
A company focused on scientific computing will need to leverage commodity technology to be viable
The market isn’t large enough to afford non-recurring engineering dedicated to advancing niche technology
Government funding is not what it used to be
Need to take advantage of economies of scale
Focus development and resources on filling critical gaps and/or providing the “glue”
When using commodity technologies, systems engineering is absolutely critical: understanding of all subsystems, a disciplined engineering approach, system integration and test, keeping up with technology
Leverage both commodity hardware and software
Supporting Software and Tools
In-house and vendor expertise
Stability and supportability
Integration and compatibility of tools; versions and version compatibility
Profilers and tools that provide insight enabling restructuring of code for optimal performance
System management and administration tools
Programmability
Programming a cluster is probably now as easy as, if not easier than, programming an SMP or vector machine
Programming tools and expertise yield large variations in price/productivity
Compilers: variations in compiler selection, configuration, and optimization yield 5%-30% differences in performance
Debugging: having the correct tools and proper insight minimizes time to production
Installation and Acceptance
Minimizing impact on ongoing operations
Seamless fit and integration with existing infrastructure: facility, network, storage
Getting the system up, running, and producing
Meeting all acceptance criteria
Example: the vendor “deploys” the system but takes 4 months to get it working and producing output (4 months out of a 3-year life cycle = 11% reduction in production)
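The 11% figure in the installation example is the delay divided by the life cycle. A minimal sketch, assuming (as the slide's arithmetic implicitly does) that production is spread evenly over the system's life, so every month of delay removes an equal share; the function name is hypothetical.

```python
def production_lost_fraction(delay_months: float, lifecycle_months: float) -> float:
    """Share of a system's production life lost to a delayed start.

    Assumes output is spread evenly over the life cycle, so a delay
    removes a proportional slice of total production.
    """
    if lifecycle_months <= 0:
        raise ValueError("lifecycle_months must be positive")
    if not 0 <= delay_months <= lifecycle_months:
        raise ValueError("delay_months must lie within the life cycle")
    return delay_months / lifecycle_months


if __name__ == "__main__":
    # The slide's example: 4 months lost out of a 3-year (36-month) life cycle.
    lost = production_lost_fraction(4, 3 * 12)
    print(f"{lost:.1%}")  # prints 11.1%
```

The same ratio applies to the system's total lifetime price: under the even-spread assumption, a 4-month slip idles about one-ninth of what was paid for.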
Hardware and Software System Maintenance
Hardware: architecture should minimize downtime; rapid turnaround from the vendor; upgrades; scalability
Software: operating system, middleware, drivers, application level
Open source: Linux offers no single “throat to choke”
Upgrades are a double-edged sword: performance versus risks (downtime, revision synchronization)
Most people can get the system to work the first time…
Porting/Optimizing Applications
In-house codes, third-party commercial codes, performance modeling
Examining algorithm structure to enable re-architecture of code can yield significant performance deltas (observed up to 1000%)
Optimization: restructuring, compiling
Hardware/Operations/Applications ratio: $1 / $0.75 / $1.75
Most people spend $$$ on the system that would have been better spent on optimizing their software.
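The slide gives the Hardware/Operations/Applications ratio without saying whether it is an observed or a recommended split. Reading it, as an assumption, as a division of one total budget, each category's share follows from dividing by the ratio's sum (1 + 0.75 + 1.75 = 3.5), so applications receive half. A minimal sketch with a hypothetical budget; `RATIO` and `split_budget` are illustrative names.

```python
from fractions import Fraction

# Hardware/Operations/Applications ratio from the slide: $1 / $0.75 / $1.75.
# Treating it as a split of one budget is an interpretation, not a stated rule.
RATIO = {
    "hardware": Fraction(1),
    "operations": Fraction(3, 4),
    "applications": Fraction(7, 4),
}


def split_budget(total: float, ratio: dict[str, Fraction] = RATIO) -> dict[str, float]:
    """Divide a total budget proportionally to the given ratio."""
    denominator = sum(ratio.values())  # 7/2 for the slide's ratio
    # Exact rational arithmetic avoids rounding drift before the final float.
    return {name: float(Fraction(total) * part / denominator) for name, part in ratio.items()}


if __name__ == "__main__":
    # Hypothetical $3.5M lifetime budget.
    for name, amount in split_budget(3_500_000).items():
        print(f"{name:>12}: ${amount:,.0f}")
```

Under this reading, hardware is about 29% of the spend, operations about 21%, and application work 50%.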
Education
Who: administrators, application engineers, end users
It is easier to teach Linux to a system administrator than to teach system administration to a “Linux person”
What: operating system, compilers, tools
When: as soon as the decision is made to consider a Linux cluster
Why: it directly impacts productivity
Criticality of Vendor Involvement On Productivity
More important for HPC than for many other market areas
Always pushing the leading edge; mainly unique systems and applications
But it is essential for clusters: it’s not just your vendor, it’s your vendor’s vendors, because clusters are assemblies of many components
Knowledge, experience, and relationships with all components are critical
Hardware: interconnects, storage, processors
Software: operating system, compilers, tools, management, filesystems
Criticality of Vendor Involvement On Productivity
Vendor needs to be engaged over the lifetime of the system
Pre-installation: working with the customer and component suppliers to provide the optimal architecture for the target application; facility impact and design; project planning and integration
Installation: minimizing impact on ongoing operations; time to production
Post-installation: training and education; availability of experience and skills; ability to pull together component vendors to resolve issues and provide service; vendor participation to ensure productivity (avoid “dump and run”)
Linux Networx Value
Linux Networx provides cluster computing systems that deliver maximum sustained performance and high return on investment. We achieve high customer satisfaction by delivering Five Points of Proven Value:
Rigorous System Engineering, Q/A, and validation process with every cluster system
Full pre-ship system build-up and testing, followed by rapid on-site installation
Delivering complete systems with optimized applications, the latest cluster technologies and open source tools
Total cluster management from one interface
Cluster services, support, and Linux cluster training
Conclusion
Linux clusters’ rapid growth in the high performance computing and commercial communities is due to high productivity at a low price point
System productivity must be considered before, during, and after installation, and designed into the architecture and support models
High performance computing production class clusters require a vendor with strong systems engineering, firm component vendor relationships, and committed involvement throughout the life of the system