Use of Condor on the Open Science Grid
Chris Green, OSG User Group / FNAL
Condor Week, April 30 2008
What is OSG?
• Collection of mostly US-based scientific / academic sites sharing computing and storage resources via a common software stack.
• Job submission and management based around Globus / CondorG.
• Works with multiple underlying batch systems (Condor, PBS family, LSF, SGE).
• "Virtual Organizations" (VOs): trust point for authorization; role-based personalities.
Links
• OSG home page.
• VORS resource map and information.
• VDT (Virtual Data Toolkit) home page.
• Current use of OSG.
OSG facts and figures
• 83 registered computing resources.
• 30 registered VOs.
• Usage breakdown for 2008/04/19 – 2008/04/25:
[Chart: Wall Time (d); legend: Condor, PBS/LSF, SGE; values shown: 65014, 30582, 165]
[Chart: Computing Resource Batch Managers; legend: Condor, SGE, PBS, LSF; values shown: 534, 22, 2]
Survey of Condor use on OSG
• Out of the box:
  - CondorG for inter-site job transfer via Globus/GRAM: GT2 submissions via CondorG still (by far) the most common method of grid job submission on OSG.
  - Task scheduling for site health monitoring.
  - One of several batch systems supported on OSG.
  - "ManagedFork" job management.
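As a rough illustration of the first item, a GT2 submission through CondorG is driven by a grid-universe submit description. The Python sketch below only renders such a file as text and does not submit anything. The gatekeeper host `gatekeeper.example.edu`, the `jobmanager-condor` suffix, and the output file names are placeholders chosen for the example, not values from the talk.

```python
def render_gt2_submit(gatekeeper, jobmanager, executable, arguments=""):
    """Render a minimal Condor-G submit description for a GT2 (pre-WS GRAM) resource."""
    lines = [
        "universe      = grid",
        # "gt2" selects pre-web-services GRAM; host and jobmanager are site specific.
        f"grid_resource = gt2 {gatekeeper}/{jobmanager}",
        f"executable    = {executable}",
    ]
    if arguments:
        lines.append(f"arguments     = {arguments}")
    lines += [
        "output        = job.out",  # placeholder file names
        "error         = job.err",
        "log           = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical gatekeeper; replace with a real OSG site contact string.
    print(render_gt2_submit("gatekeeper.example.edu", "jobmanager-condor", "/bin/hostname"))
```

The rendered text would normally be saved to a file and handed to `condor_submit` on a host with a valid grid proxy.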
Survey of Condor use on OSG
• External projects:
  - Glidein / WMS: "pilot" job submission and management.
  - FermiGrid: job forwarding, "campus grid" management.
  - OSGMM / ReSS: job forwarding and attribute-based matchmaking across multiple OSG sites.
  - "condorview": enhanced job monitoring and control (not the web-based statistics client of the same name).
  - Complex workflows (e.g. LIGO: Pegasus/DAGMan).
  - Gratia: accounting system leverages features of Condor where available: condor_history, PER_JOB_HISTORY_DIR, DN.
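To make "attribute-based matchmaking across multiple OSG sites" concrete, here is a toy Python sketch of the idea: each site advertises attributes, a job states requirements on them, and the matchmaker keeps the sites that satisfy every requirement. This is not how OSGMM or ReSS are implemented and it does not use the ClassAd language; the attribute names (`free_slots`, `supported_vos`, `batch_system`) and site names are invented for illustration.

```python
def match_sites(job_requirements, sites):
    """Return names of sites whose advertised attributes satisfy every requirement.

    job_requirements maps an attribute name to a predicate on that attribute's value.
    """
    matches = []
    for site in sites:
        attrs = site["attributes"]
        if all(name in attrs and pred(attrs[name])
               for name, pred in job_requirements.items()):
            matches.append(site["name"])
    return matches


# Hypothetical site advertisements.
sites = [
    {"name": "site_a", "attributes": {"batch_system": "condor", "free_slots": 120,
                                      "supported_vos": {"cdf", "cms"}}},
    {"name": "site_b", "attributes": {"batch_system": "pbs", "free_slots": 0,
                                      "supported_vos": {"cms"}}},
    {"name": "site_c", "attributes": {"batch_system": "sge", "free_slots": 8,
                                      "supported_vos": {"ligo"}}},
]

# A CMS job that needs at least one free slot.
requirements = {
    "free_slots": lambda n: n > 0,
    "supported_vos": lambda vos: "cms" in vos,
}

print(match_sites(requirements, sites))  # -> ['site_a']
```

Note that the batch system does not appear in the requirements: the job is matched on advertised attributes only, whichever batch manager the site runs.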
More detail: Glidein/WMS
• Workload Management System (Igor Sfiligoi, FNAL) uses Condor glideins: a startd submitted as a grid job (a "pilot") makes remote batch nodes look like local ones.
• Two main components:
  - One or more glidein factories: manage available grid sites and submit pilot jobs.
  - One or more VO frontends: receive payload submissions from users for distribution to sites.
• Pilots receive user payloads as distributed by VO frontends.
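The division of labour described above can be sketched as a small in-memory model in Python. It is only a toy: the class names `Factory`, `Frontend`, and `Pilot`, the round-robin policy, and the site names are assumptions made for the example and do not reflect the actual Glidein/WMS code or its scheduling.

```python
from collections import deque


class Frontend:
    """Toy VO frontend: queues user payloads until a pilot asks for one."""

    def __init__(self):
        self.queue = deque()

    def submit(self, payload):
        self.queue.append(payload)

    def next_payload(self):
        return self.queue.popleft() if self.queue else None


class Pilot:
    """Toy pilot: stands in for a startd running as a grid job at a site."""

    def __init__(self, site):
        self.site = site


class Factory:
    """Toy glidein factory: knows the grid sites and submits pilots to them."""

    def __init__(self, sites):
        self.sites = sites

    def submit_pilots(self, per_site=1):
        return [Pilot(site) for site in self.sites for _ in range(per_site)]


def distribute(frontend, pilots):
    """Hand payloads to pilots one at a time, round-robin, until the queue is empty."""
    placements = []
    if not pilots:
        return placements  # nothing can run without pilots
    while frontend.queue:
        for pilot in pilots:
            payload = frontend.next_payload()
            if payload is None:
                break
            placements.append((payload, pilot.site))
    return placements


frontend = Frontend()
for name in ["job1", "job2", "job3"]:
    frontend.submit(name)

pilots = Factory(["site_a", "site_b"]).submit_pilots()
print(distribute(frontend, pilots))
# -> [('job1', 'site_a'), ('job2', 'site_b'), ('job3', 'site_a')]
```

The point of the model is that users only ever talk to the frontend; where a payload lands is decided once a pilot is already running at a site.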
More detail: Glidein/WMS
• Uses GCB for firewall / NAT management.
• Intra-VO priority management.
• Works with gLExec: an application running on worker nodes which handles authorization and UID mapping for payloads, giving per-user accountability to the site.
• Unaffected by grid site batch manager choice.
• v1.0 released Dec. '07; v1.1 Jan. '08.
• In use by: CDF; Minos (FNAL); being commissioned for CMS.
More detail: "condorview"
• Michael Thomas, Caltech.
• Graphical tool for browsing and managing a condor queue.
• Hooks to vacate and kill jobs.
• Hooks to ssh into job directory on worker node and print out process tree.
• Uses condor_q, condor_config_val, and condor_fetchlog.
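A queue browser of this kind needs machine-readable output from `condor_q`. The Python sketch below shows one plausible way to get it, by chaining `-format` options and parsing the result; it is an assumption about the general approach, not the actual "condorview" code. The sample text stands in for real `condor_q` output so the parsing can be tried without a Condor installation.

```python
import subprocess

# Each "-format <fmt> <attr>" pair prints one job attribute; chained, they give one line per job.
CONDOR_Q_CMD = [
    "condor_q",
    "-format", "%d.", "ClusterId",
    "-format", "%d ", "ProcId",
    "-format", "%s ", "Owner",
    "-format", "%d\n", "JobStatus",
]

# Standard JobStatus codes used by Condor.
JOB_STATUS = {1: "Idle", 2: "Running", 3: "Removed", 4: "Completed", 5: "Held"}


def parse_queue(text):
    """Parse lines of the form '<cluster>.<proc> <owner> <status code>'."""
    jobs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or not parts[2].isdigit():
            continue  # skip blank or unexpected lines
        job_id, owner, status = parts
        jobs.append({
            "id": job_id,
            "owner": owner,
            "status": JOB_STATUS.get(int(status), "Unknown"),
        })
    return jobs


def read_queue():
    """Run condor_q (must be on PATH) and return the parsed jobs."""
    result = subprocess.run(CONDOR_Q_CMD, capture_output=True, text=True, check=True)
    return parse_queue(result.stdout)


# Made-up sample output for demonstration.
sample = "101.0 alice 2\n101.1 alice 1\n102.0 bob 5\n"
for job in parse_queue(sample):
    print(job)
```

The vacate and kill hooks mentioned above could then call `condor_vacate_job` or `condor_rm` with the selected job id.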
Concluding statements
• Condor is essential to the OSG.
• Condor use underpins connectivity of sites within the OSG.
• Close ties: Miron is OSG PI; VDT team at Wisconsin; new Condor features are often a result of OSG needs.
• Widely used on OSG; many novel uses of and applications building on Condor features.
• More details in later talks!