-- OSS for High-Availability April, 2005

Linux in High-Availability Environments

Alan Robertson

IBM Linux Technology Center

alanr@unix.sh

OSS in HA Environments

Why OSS for High Availability Environments?

What is High-Availability (HA) Clustering?

What can HA do for me?

DRBD Data Replication

The Linux Virtual Server Load Balancer

The Linux-HA project

Linux-HA applications and customers

Thoughts about cluster security

Why OSS In High-Availability Environments?

Openness

Broad Range Of Environments

Breadth of Support Options

Lack of Vendor Lock-In

Openness

Extensive Peer Review System

Source code freely available

Source code reviewed by outside parties

Changes discussed openly – often in great detail

Ability to obtain uncensored product information

Mailing list archives contain uncensored comments from:

Users with deep expertise

Users with little expertise

Users who are very happy

Users with problems

Broad Range of Environments

OSS typically runs on many platforms, often on different OSes too

Users often find very creative uses for the software

Freedom to try something at low cost decreases perceived risks and encourages this behavior

Creative uses find their way into mailing list (archives) and sometimes into the OSS product

Users help with testing – providing more breadth in test environment than might otherwise occur

Support for OSS Systems

Mailing lists consist of hundreds to thousands of users who are very knowledgeable and helpful – usually regarded as very responsive – typically located in most time zones across the world

Can choose support vendor freely:

Hardware, OS or OSS supplier

Independent consulting/support organizations

In-house expertise (most motivated)

OSS mailing lists

Any combination of the above

No Vendor Lock-In

Does not rely on a vendor's future plans being compatible with yours (risk mitigation)

Obsolescence more readily manageable

Does not rely on a single vendor in another company or country

Contributing to the product (or paying someone else to) provides you a voice in future direction

Compatibility with other systems typically better

What Is HA Clustering?

A group of computers which cooperate and trust each other to provide a service even when cluster components fail

When one machine goes down, others take over its work

This involves IP address takeover, service takeover, etc.

New work comes to the “takeover” machine

Not primarily designed for high performance

What Can HA Clustering Do For You?

It cannot achieve 100% availability – nothing can.

HA clustering is designed to recover from single faults

It can make your outages very short

From about a second to a few minutes

It is like a Magician's (Illusionist's) trick:

When it goes well, the hand is faster than the eye

When it goes not-so-well, it can be reasonably visible

A good HA clustering system adds a “9” or two to your availability

99->99.9, 99.9->99.99, 99.99->99.999, etc.

Complexity is the enemy of reliability!
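
To make the effect of adding a "9" concrete, the arithmetic can be sketched as follows. This is a minimal illustration, assuming a 365-day year; the function name is made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected unavailable minutes per year at the given availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Each added nine divides the yearly downtime budget by ten.
    for pct in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}% available: about {minutes:.1f} minutes down per year")
```

Each added nine cuts the yearly downtime budget tenfold: from roughly 87.6 hours at 99% to roughly 5.3 minutes at 99.999%.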

The Desire for HA systems

Who wants low-availability systems?

Why are so few systems High-Availability?

Why isn't everything HA?

Cost

Complexity

Single Points of Failure (SPOFs)

A single point of failure is a component whose failure will cause near-immediate failure of an entire system or service

Good HA design eliminates single points of failure
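
A first pass at finding single points of failure can be as simple as counting redundant instances of each component. The sketch below assumes a hand-written inventory; the component names and the function are hypothetical, not part of any tool mentioned in this talk.

```python
def single_points_of_failure(inventory: dict[str, int]) -> list[str]:
    """Return the component types that have fewer than two instances."""
    return sorted(name for name, count in inventory.items() if count < 2)


# Hypothetical inventory: component type -> number of independent instances
inventory = {
    "server": 2,
    "power feed": 2,
    "network switch": 1,
    "shared disk array": 1,
}
spofs = single_points_of_failure(inventory)  # the switch and the disk array
```

A real audit would also have to follow dependencies (two servers on one power strip are not redundant), which this count does not capture.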

How Does HA work?

Manage redundancy to improve service availability

Like a cluster-wide-super-init on steroids

Even complex services are now “respawn”

on node (computer) death

on “impairment” of nodes

on loss of connectivity

for services that aren't working (not necessarily stopped)

managing very complex dependency relationships
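
The "respawn on node death" idea can be sketched in a few lines. This is a toy model under stated assumptions, not how heartbeat is implemented: a node counts as dead when its last heartbeat is older than a timeout, and its resources move to the first surviving node. All names and the 30-second default are illustrative.

```python
def fail_over(placement, last_heartbeat, now, deadtime=30.0):
    """Return a new resource -> node mapping with dead nodes evacuated.

    placement:      resource name -> node currently running it
    last_heartbeat: node name -> time (seconds) its last heartbeat arrived
    """
    alive = sorted(n for n, t in last_heartbeat.items() if now - t <= deadtime)
    if not alive:
        raise RuntimeError("no surviving node to take over")
    # Resources on live nodes stay put; the rest move to a survivor.
    return {
        resource: node if node in alive else alive[0]
        for resource, node in placement.items()
    }


placement = {"service-ip": "node1", "webserver": "node1", "database": "node2"}
heartbeats = {"node1": 100.0, "node2": 128.0}
after = fail_over(placement, heartbeats, now=135.0)  # node1 silent for 35 s
```

Here node1 has been silent for longer than the timeout, so its IP address and web server are taken over by node2.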

DRBD – RAID over the LAN

Block-device (filesystem) level replication

Clever synchronization methods make resyncs faster, decrease latency, preserve integrity

Useful for both HA and Disaster Recovery

NO single point of failure

Extremely cost-effective

$200 (max) instead of $20,000 (min) (USD)

Probably not suitable for some high-end write-intensive applications

Supportable by IBM Support Line
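
One way a resync can be made fast is to remember which blocks changed while the peer was unreachable and copy only those. The sketch below is a toy illustration of that idea, not DRBD's actual implementation; the class and its fields are invented for this example.

```python
class ReplicatedDevice:
    """Toy model of a block device mirrored to a peer over the network."""

    def __init__(self, num_blocks: int):
        self.local = [b""] * num_blocks
        self.peer = [b""] * num_blocks  # stands in for the remote disk
        self.connected = True
        self.dirty = set()  # blocks the peer has not received yet

    def write(self, block: int, data: bytes) -> None:
        self.local[block] = data
        if self.connected:
            self.peer[block] = data  # replicate the write immediately
        else:
            self.dirty.add(block)  # remember it for the later resync

    def reconnect(self) -> int:
        """Copy only the blocks written while disconnected; return the count."""
        self.connected = True
        copied = len(self.dirty)
        for block in self.dirty:
            self.peer[block] = self.local[block]
        self.dirty.clear()
        return copied
```

With a 1000-block device and two blocks written during an outage, `reconnect()` copies two blocks rather than all 1000.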

LVS – The Linux Virtual Server Project

LVS is the standard Linux Load Balancer

Called "ipvs" in the standard Linux kernel

Stable, fast, flexible

Especially suitable for large "server farms"
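
As a rough picture of what a load balancer does for a server farm, the sketch below hands incoming connections to real servers in rotation. Round-robin is only one of several scheduling policies, and this is plain Python for illustration, not ipvs code; the addresses are made up.

```python
from itertools import cycle


def round_robin(real_servers):
    """Return an endless iterator that yields the real servers in rotation."""
    return cycle(real_servers)


servers = round_robin(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
# Five incoming connections are spread across the three servers in turn.
assignments = [next(servers) for _ in range(5)]
```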

LVS in Action

“Plays Well With Others”

Each of these independent services can work together to scale to large systems

All single points of failure can be eliminated

High availability and load balancing work together nicely

Linux Virtual Server, Linux-HA and DRBD

The Linux-HA Project

Linux-HA is the oldest high-availability project for Linux, with the largest associated community

The core piece of Linux-HA is called “heartbeat” (though it does much more than heartbeat)

Linux-HA has been in production since 1999, and is currently in use at about ten thousand sites

Linux-HA also runs on FreeBSD and Solaris, and is being ported to OpenBSD and others

Linux-HA is shipped with every major Linux distribution except one.

Linux-HA Release 1 Applications

Database Servers

Load Balancers

Web Servers

Custom Applications

Firewalls, routers, DNS, DHCP

Retail Point of Sale Solutions

Authentication

File Servers

Proxy Servers

Medical Imaging

Almost any type of server application you can think of – except SAP

Selected Linux-HA customers

Los Alamos (US) National Labs – linear accelerator badge reader

Emageon – medical imaging for hospitals and clinics

ISO New England manages power grid using ≈ 20 Linux-HA clusters

Various Firewall, DNS, DHCP products use Linux-HA basically embedded

Karstadt, Circuit City, Autozone use Linux-HA in each of several hundred stores

MAN Nutzfahrzeuge AG – truck manufacturing division of Man AG

Autostrada – 230 clusters across Italy

BBC – Internet Infrastructure

Citysavings Bank in Munich (infrastructure)

Bavarian Radio Station (Munich) coverage of 2002 Olympics in Salt Lake City

The Weather Channel (weather.com)

Sony (manufacturing)

Incredimail bases their mail service on Linux-HA on IBM hardware

University of Toledo (US) – 20k student Computer Aided Instruction system

Linux-HA Release 1 capabilities

Supports 2-node clusters

Can communicate over serial links and UDP broadcast, multicast, or unicast

Fails over on node failure

Fails over on loss of IP connectivity

Capability for failing over on loss of SAN connectivity

Limited command line administrative tools to fail over, query current status, etc.

Active/Active or Active/Passive

Simple resource group dependency model

Requires external tool for resource monitoring

SNMP monitoring

Linux-HA Release 2 capabilities

Built-in resource monitoring

Support for the OCF resource standard

Much Larger clusters supported (>= 8 nodes)

Sophisticated dependency model with rich constraint support (resources, groups, incarnations, master/slave) (needed for SAP)

XML-based resource configuration

Configuration and monitoring GUI

Support for GFS cluster filesystem

Multi-state (master/slave) resource support

Initially, no IP or SAN connectivity monitoring

Resource Objects in Release 2

Release 2 supports “resource objects” which can be any of the following:

Primitive resources – OCF, heartbeat-style, or LSB resource agent scripts

Resource incarnations – need “n” copies of a resource object running somewhere in the cluster

Resource groups – a group of resources with implied co-location and linear ordering constraints

Multi-state resources (master/slave) – designed to model master/slave (replication) resources (DRBD, et al.)
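
The "implied" constraints of a resource group can be spelled out explicitly. The sketch below assumes a group is just an ordered list of resource names; the function and the pair representation are illustrative, not the Release 2 configuration format.

```python
def expand_group(members):
    """Expand an ordered group into explicit pairwise constraints.

    Returns (ordering, colocation):
      ordering   - (a, b) pairs meaning "start a before b"
      colocation - (a, b) pairs meaning "b must run on the same node as a"
    """
    pairs = list(zip(members, members[1:]))  # each member paired with the next
    return pairs, list(pairs)


ordering, colocation = expand_group(["filesystem", "service-ip", "webserver"])
```

For this three-member group, the result is two ordering pairs and two co-location pairs: the filesystem starts first and everything lands on one node.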

Basic Dependencies in Release 2

Ordering Dependencies

start before (implies stop after)

start after (implies stop before)

Mandatory Co-location Dependencies

must be co-located with

cannot be co-located with
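
These two kinds of dependency can be sketched as follows, assuming Python 3.9 or later for `graphlib`. Ordering constraints become a topological sort (with the stop order being the reverse), and co-location constraints become a check on a proposed placement. The function names and data shapes are assumptions made for this example.

```python
from graphlib import TopologicalSorter


def start_order(start_before):
    """start_before: (first, second) pairs meaning first starts before second."""
    sorter = TopologicalSorter()
    for first, second in start_before:
        sorter.add(second, first)  # second depends on first
    return list(sorter.static_order())


def stop_order(start_before):
    """Stopping happens in the reverse of the start order."""
    return list(reversed(start_order(start_before)))


def placement_ok(placement, together, apart):
    """Check a resource -> node mapping against co-location constraints."""
    return (all(placement[a] == placement[b] for a, b in together)
            and all(placement[a] != placement[b] for a, b in apart))


constraints = [("filesystem", "database"), ("database", "webserver")]
placement = {"filesystem": "node1", "database": "node1", "webserver": "node2"}
ok = placement_ok(placement,
                  together=[("database", "filesystem")],
                  apart=[("database", "webserver")])
```

Here the filesystem starts first and stops last, and the placement satisfies both the "must be co-located" and the "cannot be co-located" constraint.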

Resource Incarnations

Resource Incarnations allow one to have a resource which runs multiple (“n”) times on the cluster

This is useful for managing

load balancing clusters where you want “n” of them to be slave servers

Cluster filesystems

Cluster Alias IP addresses
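
Running a resource “n” times can be pictured as handing out numbered copies across the available nodes. This is a minimal sketch; the `name:index` naming and the rotation policy are assumptions for illustration, not a description of how Release 2 places incarnations.

```python
def place_incarnations(resource, n, nodes):
    """Assign n copies of a resource to nodes in rotation.

    Returns a list of (instance_name, node) pairs.
    """
    if not nodes:
        raise ValueError("need at least one node")
    return [(f"{resource}:{i}", nodes[i % len(nodes)]) for i in range(n)]


copies = place_incarnations("cluster-fs", 3, ["node1", "node2", "node3"])
```

With three copies and three nodes, each node mounts the cluster filesystem once; with more copies than nodes, the rotation wraps around.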

Security Considerations

Cluster: A computer whose backplane is the Internet

If this isn't scary, you don't understand...

You may think you have a secure cluster network

You're probably mistaken now

You will be in the future

Secure Networks are Difficult Because...

Security is not often well-understood by admins

Security is well-understood by “black hats”

Network security is easy to breach accidentally

Users bypass it

Hardware installers don't fully understand it

Most security breaches come from “trusted” staff

Staff turnover is often a big issue

Virus/Worm/P2P technologies will create new holes especially for Windows machines

Security Advice

Good HA software should be designed to assume insecure networks

Not all HA software assumes insecure networks

Good HA installation architects use dedicated (secure?) networks for intra-cluster HA communication

Crossover cables are reasonably secure – all else is suspect
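
Assuming an insecure network means, at a minimum, that cluster messages are authenticated with a shared key, so forged or altered packets are rejected. The sketch below illustrates that with Python's standard `hmac` module. The packet layout, key, and function names are invented for this example and are not heartbeat's wire format.

```python
import hashlib
import hmac


def sign(message: bytes, key: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed with the shared cluster key."""
    tag = hmac.new(key, message, hashlib.sha256).hexdigest().encode()
    return message + b"|" + tag


def verify(packet: bytes, key: bytes):
    """Return the message if its tag is valid, otherwise None."""
    message, sep, tag = packet.rpartition(b"|")
    if not sep:
        return None  # no tag present at all
    expected = hmac.new(key, message, hashlib.sha256).hexdigest().encode()
    # compare_digest avoids leaking information through timing differences
    return message if hmac.compare_digest(tag, expected) else None


key = b"hypothetical-shared-secret"
packet = sign(b"node1 alive seq=42", key)
```

Authentication alone does not stop an attacker from replaying an old, validly signed packet, so a real protocol would also have to check something like the sequence number.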

References

http://linux-ha.org/

http://linux-ha.org/download/

http://wiki.linux-ha.org/NewHeartbeatDesign

New Web site content (a work in progress)

http://wwnew.linux-ha.org/ (prettier)

http://wiki.linux-ha.org/ (editable)

http://wwnew.linux-ha.org/SuccessStories

www.linux-mag.com/2003-11/availability_01.html

http://www.linuxvirtualserver.org/

http://drbd.org/

Legal Statements

IBM is a trademark of International Business Machines Corporation.

Linux is a registered trademark of Linus Torvalds.

Other company, product, and service names may be trademarks or service marks of others.

This work represents the views of the author and does not necessarily reflect the views of the IBM Corporation.