17/09/2004 John Kewley Grid Technology Group Introduction to Condor

17/09/2004

John Kewley

Grid Technology Group

Introduction to Condor

Outline

o What is Condor?

o What can it be used for?

o Status of DL Condor Pool(s)

What is Condor?

o A job submission framework which utilises spare computing power within a heterogeneous computer network (Condor pool)

o It supports High-Throughput Computing (HTC), maximising the amount of processing capacity that is utilised over long periods of time.

o Developed over many years at University of Wisconsin – Madison

Basic Features

o A Condor pool is a set of resources (clusters, servers and networked workstations), managed by a Central Manager

o The Central Manager matches requests for resources with those resources available within the pool

o Users do not need an account on the machine where a job runs; they can submit jobs to the pool from their own workstation.

o Highly extensible resource description and job requirements language which is used to classify/advertise the resources in the pool.

o Available on multiple platforms.
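The matching step performed by the Central Manager can be pictured with a small Python sketch. This is only an illustration of the idea of pairing requests with advertised resources; it is not Condor's ClassAd language or its actual negotiation algorithm, and the attribute names (`os`, `memory_mb`) and the helper `match_jobs` are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Machine = Dict[str, object]
Requirement = Callable[[Machine], bool]


def match_jobs(jobs: Dict[str, Requirement],
               machines: List[Machine]) -> Dict[str, Optional[str]]:
    """Give each job the first free machine that satisfies its requirement."""
    free = list(machines)  # machines not yet claimed by a job
    assignment: Dict[str, Optional[str]] = {}
    for job_name, requirement in jobs.items():
        chosen = next((m for m in free if requirement(m)), None)
        if chosen is None:
            assignment[job_name] = None  # no suitable resource right now
        else:
            free.remove(chosen)
            assignment[job_name] = str(chosen["name"])
    return assignment


# Hypothetical resource descriptions advertised by pool members.
machines = [
    {"name": "ws1", "os": "WINDOWS", "memory_mb": 512},
    {"name": "ws2", "os": "LINUX", "memory_mb": 1024},
]

# Hypothetical job requirements, expressed as predicates over a machine.
jobs = {
    "build": lambda m: m["os"] == "LINUX",
    "render": lambda m: m["memory_mb"] >= 256,
    "big": lambda m: m["memory_mb"] >= 4096,
}

print(match_jobs(jobs, machines))
```

In this toy pool, "build" is placed on the Linux machine, "render" takes the remaining workstation, and "big" stays unmatched until a large enough resource appears.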

Supported platforms

Architecture                                    Operating System

Hewlett Packard PA-RISC (PA7000 + PA8000)       HPUX 10.20

Sun SPARC Sun4m, Sun4c, Sun UltraSPARC          Solaris 2.6, 2.7, 2.8, 2.9

Silicon Graphics MIPS (R5000, R8000, R10000)    IRIX 6.5 (clipped)

Intel x86                                       Red Hat Linux 7.1, 7.2, 7.3, 8.0
                                                Red Hat Linux 9
                                                Windows 2000 Prof + Server, 2003 Server (clipped)
                                                Windows XP Professional (clipped)

ALPHA                                           Digital Unix 4.0
                                                Red Hat Linux 7.1, 7.2, 7.3 (clipped)
                                                Tru64 5.1 (clipped)

PowerPC                                         Macintosh OS X (clipped)
                                                AIX 5.2L (clipped)

Itanium                                         Red Hat Linux 7.1, 7.2, 7.3 (clipped)
                                                SuSE Linux Enterprise 8.1 (clipped)

Job Startup

[Diagram: Condor job startup. Central Manager: Collector, Negotiator. Submit Machine: Submit, Schedd, Shadow. Execute Machine: Startd, Starter, Job linked with the Condor Syscall Lib.]

Slide courtesy of University of Wisconsin-Madison

Additional Features

o Checkpointing and migration of jobs

o Shared filestore is not required, but can be utilised

o Interworking with Globus

o Security: GSI, Kerberos

o Use of MPI and PVM

o Workflow using DAGMan (Directed Acyclic Graph Manager)

o Windows + Unix + Linux + …
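To make the workflow idea concrete, the sketch below orders a small set of dependent jobs using Python's standard `graphlib` module. It illustrates only the directed-acyclic-graph concept behind DAGMan; it does not use DAGMan's input format or any Condor API, and the job names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each job maps to the set of jobs it depends on.
dependencies = {
    "compile": set(),
    "test_linux": {"compile"},
    "test_windows": {"compile"},
    "package": {"test_linux", "test_windows"},
}


def run_order(deps):
    """Return batches of jobs; every job in a batch has all its dependencies met."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs that could run concurrently
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished, releasing dependants
    return batches


print(run_order(dependencies))
```

Jobs within one batch are independent of each other, so a scheduler would be free to hand them to different machines at the same time.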

Execution Environments

standard

o Must be relinked under Condor

o System calls occur on the submitting resource

o Jobs may checkpoint, so a job can be stopped and later restarted from its last checkpoint, and may migrate to another resource

o Not available on some platforms (e.g. Windows)

o Some restrictions on what can be run.

vanilla

o Any executable or script, no need for relinking or access to object files

o System calls happen on the executing resource

o No checkpointing, not so good for long-running jobs. If a job is stopped it will be rescheduled (i.e. compute time is lost).

o Works on all supported platforms (incl Windows)

o Some opening of file permissions may be required
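The cost of running without checkpoints can be estimated with a little arithmetic. The sketch below is a simplified model, not a description of how Condor schedules checkpoints: it assumes checkpoints are taken at a fixed interval after each (re)start and ignores checkpoint overhead. The function name and the figures are hypothetical.

```python
def lost_hours(run_lengths, checkpoint_interval=None):
    """Total compute time lost when a job is stopped several times.

    run_lengths: hours the job had run since its last (re)start at each stop.
    checkpoint_interval: hours between checkpoints, or None for no checkpointing.
    """
    if checkpoint_interval is None:
        # No checkpointing: everything done in the interrupted run is redone.
        return sum(run_lengths)
    # With checkpointing: only work since the last checkpoint is redone.
    return sum(run % checkpoint_interval for run in run_lengths)


stops = [5.0, 3.5]  # hypothetical: stopped after 5 h, then again after 3.5 h

print(lost_hours(stops))                           # without checkpoints
print(lost_hours(stops, checkpoint_interval=1.0))  # hourly checkpoints
```

Under these assumptions the job without checkpoints repeats 8.5 hours of work, while the checkpointed job repeats only half an hour.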

Possible Uses

o Use the vanilla universe for jobs which comprise many comparatively small, independent tasks.

o Use standard universe for jobs which will run for long periods.

o Utilise the “odds and ends” of the pool for compilation and build tests.
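A job made of many small, independent tasks is typically described to Condor in a submit description file and queued with `condor_submit`. The Python sketch below generates such a file as text. The submit commands shown (`universe`, `executable`, `arguments`, `output`, `error`, `log`, `queue`) and the `$(Process)` macro are standard, but the helper function, the executable name `analyse.sh` and the file names are hypothetical and would need adapting to a real pool.

```python
def vanilla_submit_description(executable: str, n_tasks: int) -> str:
    """Build the text of a submit description for n_tasks independent tasks."""
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        # $(Process) expands to 0, 1, ..., n_tasks - 1, one value per task.
        "arguments  = $(Process)",
        "output     = task_$(Process).out",
        "error      = task_$(Process).err",
        "log        = tasks.log",
        f"queue {n_tasks}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: 100 tasks, each told its own index on the command line.
print(vanilla_submit_description("analyse.sh", 100))
```

Each queued task receives its own index as an argument, so one script can pick a different input per task without any coordination between tasks.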

Condor Pools at DL

o Internal Pool

  5 Windows
  • 3x Windows XP Professional
  • 2x Windows 2000 Professional

  18 Linux
  • 6x SuSE Linux 9.0
  • 2x SuSE Linux 8.0
  • 5x White Box Enterprise Linux 3.0
  • 3x Red Hat Linux 9
  • 1x Mandrake Linux 10.0
  • 1x Gentoo Linux 1.4

o External Pool

  6 Linux
  • 2x Red Hat Linux 7.3
  • 4x White Box Enterprise Linux 3.0

Build and Test

o Our External Pool is being used by the OMII (Open Middleware Infrastructure Institute) for building and testing their latest Grid middleware.

o We intend to extend this pool to serve as a build and test pool for other institutions on the UK Grid.

o Our internal users are also keen to utilise this build technology to build release packages of their software for many different platforms.

User Status

We are currently at an early stage with our user community and are helping them set up their code so that it can be run conveniently under Condor.

These users are from the following computational science communities:

o CCP1 - The electronic structure of molecules

o CCP4 - Protein crystallography

Summary

o Condor can utilise otherwise unused resources (e.g. Windows workstations overnight)

o Use the vanilla universe for jobs which comprise many comparatively small, independent tasks

o Use standard universe for jobs which will run for long periods (although not on Windows)

o Can be used for compilation and build tests