Linux High Availability Cluster Selection

Tim Burke

[email protected]

Which cluster product is right for me?

• There is no one-size-fits-all winner
• Rapidly evolving marketplace
• The good news: There is a lot to choose from
• The bad news: There is a lot to choose from
• Strategy - be an informed consumer

Selection Process / Presentation Outline

• Identify target applications - usage model
• Identify required cluster feature set
• Open source vs proprietary, product vs project
• Cost factors
• Vendor evaluation
• OEM & ISV endorsements

Identify Target Applications

• Clustering categories
  • High Availability Clusters
    • Databases
    • Fileservers
    • Off-the-shelf applications
  • Load Balancing Clusters
    • Dispatching web traffic
  • High Performance Computing
    • Large computational problems

High Performance Computing

HPC, HPTC cluster attributes

1. Large number of systems working together to solve a common problem - scalability

2. Performance, not reliability, is of utmost importance

3. Requires custom parallelized applications

4. Tends to be bleeding edge, early adopters

5. Example deployments: genetics, pharmaceutical, weather, seismic analysis, modeling

Load Balancing Clusters

• Front-end dispatching node (or 2 for redundancy)
• Pool of inexpensive back-end servers
• Redirect transactions so no one system is overloaded
• Balancing algorithms: round robin, weighted, load based
• Typically used for web server traffic (Apache front end)
• Useful for static content
• Not applicable for dynamic content
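
The three balancing algorithms named on this slide can be sketched in a few lines of Python. This is a minimal illustration and not any product's dispatcher: the server names, weights and connection counts are invented, and "weighted" is implemented here as a weighted random pick, which is only one of several ways to honor weights.

```python
import itertools
import random

def round_robin(servers):
    """Return an endless rotation over the servers, ignoring their load."""
    return itertools.cycle(servers)

def weighted_choice(weights, rng=random):
    """Pick a server with probability proportional to its weight."""
    servers = list(weights)
    return rng.choices(servers, weights=[weights[s] for s in servers], k=1)[0]

def least_loaded(active_connections):
    """Pick the server currently handling the fewest connections."""
    return min(active_connections, key=active_connections.get)

# Hypothetical back-end pool.
rr = round_robin(["web1", "web2", "web3"])
first_four = [next(rr) for _ in range(4)]  # wraps back to web1 on the fourth pick
busy = least_loaded({"web1": 12, "web2": 3, "web3": 7})
```

Round robin needs no feedback from the back ends, while the load-based pick only works if the dispatcher can see a current load figure for each server.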

High Availability Clusters

• The need for high availability (HA)
• Overview of high availability features

Reliability, Availability, Serviceability (RAS)

Users & businesses have high expectations

1. Reliability - high degree of protection for corporate data. Information is a crucial business asset.

2. Availability - near continuous data access

3. Serviceability - procedures to correct problems with minimal business impact

Sources of Downtime (The Standish Group, 2001)

[Figure: sources of downtime by cause]

• Application bug or error
• Main-system hardware failure
• Database error
• Main-server system bug
• Network
• Operator error
• Other server's hardware failure
• Other server's system bug
• Environmental conditions
• Planned outage
• Other

Downtime Costs - The Standish Group

[Figure: cost per minute of downtime (dollars), axis from 0 to 13,000, by application type: Enterprise resource planning (ERP), Supply chain management, E-commerce, Internet banking, Customer service center, Messaging]
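
To relate a per-minute cost to an availability target, the arithmetic can be sketched as follows. The cost figure in the example is a placeholder chosen for illustration and is not one of the Standish Group values from the chart.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability):
    """Expected downtime per year for an availability fraction (0.999 = 99.9%)."""
    return (1.0 - availability) * MINUTES_PER_YEAR

def annual_downtime_cost(availability, cost_per_minute):
    """Yearly cost of downtime at a given availability and per-minute cost."""
    return downtime_minutes_per_year(availability) * cost_per_minute

# Hypothetical figure, not taken from the Standish Group chart.
EXAMPLE_COST_PER_MINUTE = 1000.0

for availability in (0.99, 0.999, 0.9999):
    minutes = downtime_minutes_per_year(availability)
    cost = annual_downtime_cost(availability, EXAMPLE_COST_PER_MINUTE)
    print(f"{availability:.2%} available: {minutes:,.0f} min/year, about ${cost:,.0f}")
```

Each additional "nine" of availability cuts the expected yearly downtime, and therefore the cost, by a factor of ten.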

No Single Point of Failure (NSPF)

Hardware Redundancy - increased overall reliability and availability

1. Multiple paths between systems

2. Storage - mirrored, RAID5

3. Multiple power sources

4. Multiple external networks
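
Why redundancy raises availability can be shown with a short calculation. The sketch below assumes component failures are independent, which real hardware only approximates, and the 99% component availability is an invented example value.

```python
def parallel_availability(component_availability, copies):
    """Availability of redundant copies where any single copy suffices."""
    return 1.0 - (1.0 - component_availability) ** copies

def series_availability(*availabilities):
    """Availability of a chain in which every component is required."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

# Hypothetical system needing both power and a network link, each 99% available.
single_path = series_availability(0.99, 0.99)
dual_path = series_availability(
    parallel_availability(0.99, 2),  # two power sources
    parallel_availability(0.99, 2),  # two network paths
)
```

Under these assumptions, duplicating each component moves the system from roughly 98% to roughly 99.98% availability, which is the effect the slide describes as removing single points of failure.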

High Availability Clusters

• Redundancy for fault tolerance

• Failover - if one node shuts down or fails, another node takes over the application load

• Facilitates planned maintenance

Failover

• Involves selecting a target node & moving resources - failover policies

• Example resource types

1. Physical disk ownership

2. Filesystems

3. Applications

4. Databases

5. IP addresses
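
A failover policy of the "ordered preference list" kind can be sketched as below. This is a simplified model and not how any particular cluster product works: the `Node` class, the node names and the resource labels are all hypothetical, and "moving" a resource is reduced to moving a list entry.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    online: bool = True
    resources: list = field(default_factory=list)

def pick_target(nodes, preferred_order):
    """Failover policy: the first online node in an administrator-defined order."""
    by_name = {n.name: n for n in nodes}
    for name in preferred_order:
        node = by_name.get(name)
        if node is not None and node.online:
            return node
    return None

def fail_over(failed, nodes, preferred_order):
    """Mark the node failed and move all of its resources to the chosen target."""
    failed.online = False
    target = pick_target(nodes, preferred_order)
    if target is None:
        raise RuntimeError("no surviving node available for takeover")
    # List order is preserved, so dependencies should be listed first:
    # disk before filesystem before database before IP address.
    target.resources.extend(failed.resources)
    failed.resources = []
    return target

a = Node("node-a", resources=["disk0", "fs:/data", "db:orders", "ip:10.0.0.50"])
b = Node("node-b")
target = fail_over(a, [a, b], preferred_order=["node-a", "node-b"])
```

Keeping the resources in dependency order matters because a database cannot start before its filesystem is mounted, and clients should not reach the IP address before the service behind it is ready.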

Failover Configurations

• Active / Passive
  • One node runs application(s)
  • Other node on standby for takeover
  • Idle node can take over with no performance degradation
• Active / Active
  • All nodes actively running application(s)
  • Workload moves to survivor on failure
  • Effectively utilizes capacity (TCO)

Data Integrity Provisions

• Crucial for safe failover of data-centric services (filesystem / database)
• In failure scenarios (e.g. a hung node), ensure the failed node cannot access storage - I/O Barriers, I/O Fencing
• Lack of I/O Fencing can result in
  • Loss of data (backups?)
  • System crashes
• Common mechanisms
  • Power switches
  • SCSI reservations
  • Watchdog timers
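
The ordering that fencing enforces, cut off the failed node first and only then start services, can be sketched as follows. The `fence` and `start_services` callables are hypothetical stand-ins; a real cluster would drive a power switch or SCSI reservation at that point.

```python
class FencingError(Exception):
    """Raised when a failed node cannot be confirmed as cut off from storage."""

def take_over(failed_node, fence, start_services):
    """Start services locally only after the failed node is confirmed fenced.

    `fence` returns True when the node has verifiably lost access to storage.
    """
    if not fence(failed_node):
        # Unknown state: starting services now could let two nodes write to
        # the same storage, so refuse rather than risk corruption.
        raise FencingError(f"could not fence {failed_node}; takeover aborted")
    start_services()

# Illustrative run with stub actions that only record what would happen.
log = []
take_over(
    "node-a",
    fence=lambda node: log.append(f"power-cycle {node}") or True,
    start_services=lambda: log.append("start services"),
)
```

Refusing to proceed when fencing fails trades availability for integrity, which matches the priority this deck gives to data integrity.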

Application Monitoring

• All HA clusters monitor node state
• Most monitor key cluster resources - network, disk
• Many monitor application health
  • Process existence
  • Application check scripts
    • HTTP GET on web server
    • Record retrieval on database
    • Filesystem directory listing
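
Three of the checks listed above might look like this as a standalone check script. It is a sketch using only the Python standard library on a POSIX system; the function names are invented here, and a database record-retrieval check is left out because it depends on the database driver in use.

```python
import os
import urllib.error
import urllib.request

def process_exists(pid):
    """True if a process with this PID exists (POSIX signal-0 probe)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def http_ok(url, timeout=5.0):
    """True if an HTTP GET on the URL completes with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False

def directory_listable(path):
    """True if the directory can be listed, e.g. on a mounted filesystem."""
    try:
        os.listdir(path)
    except OSError:
        return False
    return True
```

A process-existence check is cheap but only shows that the program is running; the HTTP and directory checks exercise the service itself and can catch a hung application.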

Failover Times

• Don't get too hung up on this
• Remember that data integrity is paramount
• Quoted failover times only include cluster overhead; they don't include application recovery
  • Application startup time
  • Filesystem consistency checks
  • Database recovery - transaction replay
• Example
  • Product literature cites 5 second failover time
  • Can be several minutes for database recovery (size & activity dependent)
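
The gap between a quoted failover time and what users experience is simple addition, sketched below. Only the 5-second cluster overhead comes from the slide; the other durations are made-up inputs to show the shape of the calculation.

```python
from dataclasses import dataclass

@dataclass
class FailoverEstimate:
    cluster_overhead_s: float          # the figure product literature quotes
    filesystem_check_s: float = 0.0
    database_recovery_s: float = 0.0
    application_startup_s: float = 0.0

    def total_seconds(self):
        """Outage as seen by users: cluster overhead plus application recovery."""
        return (self.cluster_overhead_s + self.filesystem_check_s
                + self.database_recovery_s + self.application_startup_s)

# Hypothetical recovery durations for a database service.
estimate = FailoverEstimate(
    cluster_overhead_s=5,
    filesystem_check_s=30,
    database_recovery_s=180,
    application_startup_s=20,
)
print(f"quoted: {estimate.cluster_overhead_s} s, seen by users: {estimate.total_seconds()} s")
```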

Open Source vs Proprietary, Project vs Product

• Open source facilitates self-support & customization
• Support is a key determinant
• Products are generally well tested
• Some products are also open source
• If you care enough about high availability & solution stacks, you're likely to go the product route

Heterogeneous HA Products

• Proprietary offerings that run on Linux, W2K, UNIX
• Unifies user training
• May compromise flexibility, adaptability or data integrity (ouch!)
• Some are Linux products with GUIs that run on other platforms
• Virtually none allow heterogeneous platforms within the same cluster

Cost Factors

• Beware of hidden charges
  • Product base fee
  • Application-specific charges (Oracle, DB2, NFS, etc.)
  • Support
• Some only come with bundled service offerings
• Hardware requirements
• Proprietary UNIX offerings typically cost several times more

Vendor Evaluation

• Company vision - do their cluster offerings complement or distract? Future roadmap.
• Financial stability
• Ability to impact the marketplace
• Responsiveness - ability to provide ongoing feature enhancements
• Proprietary vs open source
• Product integration - fit with distribution, kernel patches, compatibility & support implications
• New Linux technology vs large monolithic legacy ports
• How long it's been on the market

Open Source Projects

• FailSafe - from SGI & SuSE
  • Optional data integrity provisions (power switch)
  • Supports 16 nodes
  • Good set of application kits
• Red Hat Cluster Manager
  • Also offered as a product
  • Described later in presentation

HA Cluster Product Comparisons

• The ground rules
  • Trying to remain objective
  • Highlight product strengths
  • Listed in alphabetical order
  • Based on web site content as of 10/2002

HP - MC/Serviceguard

• Proprietary - ported from HP-UX
• Only supported on HP hardware
• Dynamic online addition/removal of members
• Worldwide support services
• Quorum voting membership
• Up to 8 nodes using Fibre Channel storage, 2 nodes using SCSI
• Compaq Alpha line targeted at HPC clusters
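
Quorum voting in general can be illustrated with a strict-majority test. This is a generic sketch and not a description of how MC/Serviceguard implements membership; the function name and the one-vote-per-node assumption are illustrative.

```python
def has_quorum(votes_present, total_votes):
    """A partition holds quorum only with a strict majority of all votes."""
    return votes_present > total_votes // 2

# Assuming one vote per node in an 8-node cluster: a 5/3 split leaves
# exactly one side with quorum, while a 4/4 split leaves neither side
# with a majority, so some tie-breaker would be needed.
majority_side = has_quorum(5, 8)
even_split_side = has_quorum(4, 8)
```

Requiring a strict majority ensures that two halves of a partitioned cluster can never both believe they are the surviving membership.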

Legato - Availability Manager

• Proprietary
• Heterogeneous (Linux, W2K, Solaris, HP-UX)
• Strong data-centric services
  • Well integrated with SAN environments
  • Replication
  • Storage management, volume management, backup
• Application monitoring
• Extensive set of application-specific modules

PolyServe - Application Manager

• Proprietary
• Application monitoring
• Up to 16 nodes
• Multiple platforms - Linux, W2K, Solaris
• Doesn't require shared storage
• Dynamic member addition/removal
• Centralized management

PolyServe - Matrix Server

• Tailored for Oracle 9i Real Application Clusters
• Concurrent read + write access to data on shared storage SAN
• Cluster filesystem with lock manager + distributed cache
• Allows incremental growth by adding servers + storage
• Proprietary

Red Hat - Cluster Manager

• Bundled with RHL Advanced Server 2.1
• Both open source & product
• Data integrity provisions
  • Power switches (optional)
  • Watchdog timer software
• Application monitoring
• Heterogeneous fileserving via NFS + Samba
• Web monitoring GUI
• Also integrated Piranha load balancing cluster

Steeleye - LifeKeeper

• Proprietary - UNIX port
• Multi-platform - Linux, W2K
• Wide set of application kits (separately purchased)
• Established OEM relationships
• Data integrity provisions - via SCSI reservations, requiring kernel patches
• Application monitoring

IBM

• Focusing on HPC
  • Rackmounted Intel servers
  • Custom solutions
  • (older) XCAT software for management, parallel operations, and installation
  • (newer) Cluster Systems Mgt (CSM) for Linux
    • Remote monitoring, resets, BIOS console
    • Parallel shell
    • Requires IBM hardware for embedded service processor
• High availability via partnering

Veritas Cluster Server

• Recent Linux port
• 16 nodes, wide range of supported apps
• Also runs on Windows, AIX, UNIX, Solaris
• Integrates with their storage offerings (volume management, backup, data replication)
• Proprietary

Other Vendors

• Dell
  • Strategic partnering for HA software
• Penguin Computing
  • HPC offering via partnership with Scyld Beowulf

Consolidated Solutions

• Egenera
  • BladeFrame hardware, backplane eliminates cabling
  • Management software, HA, provisioning
• Linux NetworX
  • Turnkey solution, preintegrated hardware + management tools
  • Custom hardware, dense racks

Summary

• Know what category of cluster is right for you
• Be knowledgeable of required cluster features
• Weigh your cost criteria
• Choose a vendor you can trust to safeguard your corporate assets
• Be wary of marketing collateral