Unclipped Condor in Windows® via coLinux
Henry Neeman, Horst Severini, Chris Franklin, Josh Alexander
University of Oklahoma
Sumanth J.V.
University of Nebraska-Lincoln
Condor Week, University of Wisconsin, Tuesday May 1 2007


Condor: Linux vs Windows

- Condor inside Linux: full featured
- Condor inside Windows®: “clipped”
  - No autocheckpointing
  - No job automigration
  - No remote system calls
  - No Standard Universe

http://www.our-picks.com/archives/2006/10/page/2/

Lots of PCs in IT Labs

At many institutions, there are lots of PC labs managed by a central IT organization.

If the head of IT (e.g., CIO) is on board, then all of these PCs can be Condorized.

But, these labs tend to be Windows® labs, not Linux. So you can’t take the Windows® desktop experience away from the desktop users, just to get Condor.

So, how can we have Linux Condor AND Windows® desktop on the same PC at the same time?

Solution Attempt #1: VMware

- Attempted solution: VMware
  - Linux as native host OS
  - Condor inside Linux
  - VMware inside Linux
  - Windows® inside VMware
- Tested on ~200 PCs in IT PC labs (Union, library, dorms, Physics Dept)
- In production for over a year

VMware Disadvantages

- Attempted solution: VMware
  - Linux as native host OS
  - Condor inside Linux
  - VMware inside Linux
  - Windows® inside VMware
- Disadvantages
  - VMware costs money! (Less so now than then.)
  - Crashy
  - VMware performance tuning (straight to disk) was unstable
  - Sensitive to hardware heterogeneity
  - Painful to manage
  - CD/DVD burners and USB drives didn’t work in some PCs.

A Better Solution: coLinux

- Cooperative Linux (coLinux)
  - http://www.colinux.org/
  - FREE!
  - Runs inside native Windows®
  - No sensitivity to hardware type
  - Better performance
  - Easier to customize
  - Smaller disk footprint and lower CPU usage in idle
  - Minimal management required (~10 hours/month)

Compatibility Issue

About 30 of the 200 lab PCs we installed coLinux on had problems with it, so those PCs now run a prerelease version of coLinux.

We have no idea why the production version of coLinux was a miserable failure on these 30 PCs, nor why the prerelease version succeeded.

Preventing BSOD

The Data Execution Prevention feature inside Windows®, when running on some newer processors, can conflict with coLinux and cause system failure. The solution to this problem is to add the /NOEXECUTE switch to the Windows® boot.ini.
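
As a rough illustration of the boot.ini change, the sketch below adds a DEP policy switch to every operating-system entry in a boot.ini file's text. The slide names only the /NOEXECUTE switch; the value AlwaysOff (which turns DEP off entirely), the function name set_dep_switch, and the sample file contents in the usage note are assumptions for illustration, not the authors' procedure.

```python
import re

# Assumption: the slide names only the /NOEXECUTE switch; "AlwaysOff" is one
# documented policy value for it and is used here purely as an example.
DEP_SWITCH = "/NOEXECUTE=AlwaysOff"


def set_dep_switch(boot_ini_text, switch=DEP_SWITCH):
    """Return boot.ini text with `switch` on every OS entry line.

    Any existing /NOEXECUTE[=value] switch on the line is replaced, so the
    function can be applied repeatedly without stacking duplicates.
    """
    out = []
    in_os_section = False
    for line in boot_ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track whether we are inside the [operating systems] section.
            in_os_section = stripped.lower() == "[operating systems]"
        elif in_os_section and "=" in stripped:
            # Drop any existing DEP switch, then append the requested one.
            line = re.sub(r"\s*/noexecute(=\w+)?", "", line, flags=re.IGNORECASE)
            line = line.rstrip() + " " + switch
        out.append(line)
    return "\n".join(out)
```

Applied to an entry ending in `/fastdetect /NoExecute=OptIn`, the function leaves `/fastdetect` in place and swaps the DEP switch; lines in the `[boot loader]` section are left untouched.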

Network Issue

- Networking options
  - Bridged: Each PC has to have a second IP address, so the institution has to have plenty of spare IP addresses available. (Oklahoma solution)
  - NAT: The Condor pool requires a Generic Connection Broker (GCB) on a separate, dedicated PC (hardware $), and has some instability. Switched to OpenVPN. (Nebraska solution)
  - Nebraska experimented with port forwarding in Windows®, but abandoned it for OpenVPN because of security and usability.

Traversing NATs and Firewalls

- What is GCB (Connection Broker)?
  - Socket level approach.
  - A broker arranges connections between machines inside the firewall and machines outside the firewall.
- What is OpenVPN (Open Virtual Private Network)?
  - A network within a network.
  - Virtual network adapter.
  - Virtual IP (static/dynamic).
  - TCP within UDP.
  - Client/Server architecture.
  - All to All communication.
  - All traffic is encrypted by default.

OpenVPN

- When using GCB, each machine is represented by a unique port on the broker.
  - Central Manager sees all the machines as <GCB_IP:port 1>, <GCB_IP:port 2>, etc.
  - Only applications linked against GCB work. (Condor is already linked.)
- When using OpenVPN, each machine has a unique virtual IP address in the VPN.
  - Simplifies troubleshooting.
  - Central Manager is also part of the OpenVPN and runs in server mode.
- ClientConnect.py
  - Determines virtual IP of a new client based on its real IP. E.g., node-25-55 with real IP 129.93.25.55 gets virtual IP 10.1.25.55.
  - Pushes this configuration to the clients.
  - Updates /etc/hosts.
- OpenVPN lockups can be fixed with mssfix 1200 and fragment 1200 options?
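
The address mapping described for ClientConnect.py can be sketched as follows. This is not the authors' script. It assumes, from the single example on the slide, that the last two octets of the real address are kept and the first two are replaced by 10.1, and that hostnames follow the node-<third octet>-<fourth octet> pattern. The function names and the hosts-line layout are hypothetical.

```python
import ipaddress

# Assumed from the slide's one example (129.93.25.55 -> 10.1.25.55):
# keep the last two octets and swap the first two for this prefix.
VPN_PREFIX = (10, 1)


def virtual_ip(real_ip):
    """Map a client's real IPv4 address to its virtual address in the VPN."""
    octets = ipaddress.IPv4Address(real_ip).packed  # also validates the input
    return str(ipaddress.IPv4Address(bytes(VPN_PREFIX) + octets[2:]))


def node_name(real_ip):
    """Build the node-<third>-<fourth> hostname seen in the slide's example."""
    octets = ipaddress.IPv4Address(real_ip).packed
    return "node-{}-{}".format(octets[2], octets[3])


def hosts_entry(real_ip):
    """Format one /etc/hosts line for the client's virtual address."""
    return "{}\t{}".format(virtual_ip(real_ip), node_name(real_ip))


print(hosts_entry("129.93.25.55"))
```

In an OpenVPN client-connect hook, the computed address would typically be handed back to the server as an ifconfig-push directive; that wiring is left out here because the slide does not show it.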

OpenVPN

- No modification of application required to use OpenVPN.
  - We have successfully mounted NFS shares (CMS stack).
  - Inbound SSH access.
  - Since all-to-all communication is present, even MPICH works.
  - Remember all-to-all communication still has to go through the OpenVPN server.
- Secure
  - No firewall required in coLinux.

Monitoring Issue

Condor inside Linux monitors keyboard and mouse usage to decide when to suspend a job.

In coLinux, this is tricky.

We had to set up a Visual Basic script on the Windows® side to send the keyboard and mouse information to coLinux.

UNL implements a similar idea in C++, and OU is now doing likewise.

UNL collects all the keyboard and mouse data on a server, while OU does it on each local machine. But the result is the same.
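
The forwarding idea above can be sketched as a small report sent from the Windows side to coLinux. The slides do not say how the data travels or how it is handed to Condor, so everything concrete here is an assumption: the UDP transport, the 4-byte message format, the function names, and the 300-second threshold are all hypothetical.

```python
import socket
import struct

# Hypothetical wire format: one unsigned 32-bit integer (network byte order)
# holding the seconds since the last keyboard or mouse event on Windows.
REPORT = struct.Struct("!I")


def send_idle_report(sock, address, idle_seconds):
    """Windows side: send the current idle time to the coLinux guest."""
    sock.sendto(REPORT.pack(int(idle_seconds)), address)


def receive_idle_report(sock):
    """coLinux side: block until a report arrives and return idle seconds."""
    data, _sender = sock.recvfrom(REPORT.size)
    (idle_seconds,) = REPORT.unpack(data)
    return idle_seconds


def desktop_in_use(idle_seconds, threshold_seconds=300):
    """Treat the desktop as in use if input was seen within the threshold."""
    return idle_seconds < threshold_seconds
```

Both socket arguments are ordinary `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` objects. Whether reports are handled on each PC (as at OU) or gathered on a server (as at UNL) only changes which address the sender targets.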

Monitoring coLinux Labs

- How to determine whether all the machines in each lab are up and running?
  - condor_status only displays working machines. What about missing machines?
  - We need a list of expected but missing nodes per lab.
  - We need a physical layout of the nodes in each lab.
- MySQL database to store lab info.
  - Need to separately handle static and dynamic IP labs.
  - Static IPs are easy to handle: store IP and relative coordinates of the node.
  - Dynamic IP: store a lambda function expressing how to determine if a machine belongs to a lab. E.g., lambda x: '18-' in x matches node-18-2, node-18-3, …
  - Store expected number of machines per lab, and known hardware/software issues as notes per machine.
- Compare output of condor_status and MySQL database.
  - Demo: http://mindspawn.unl.edu/condor/stats
  - Web front-end developed for mod_python.
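
The comparison between condor_status output and the lab database can be sketched as below. A plain dictionary stands in for the MySQL tables, and the list of running nodes is passed in directly rather than parsed from condor_status. The lab names, the static lab's node names, and the report layout are hypothetical; only the lambda for lab 18 comes from the slide.

```python
# Hypothetical lab table standing in for the MySQL database on the slide.
# Static-IP labs list their expected nodes; dynamic-IP labs store a
# predicate (the slide's lambda) plus the expected machine count.
LABS = {
    "static-lab": {"expected": {"node-7-1", "node-7-2", "node-7-3"}},
    "lab-18": {"matches": lambda x: "18-" in x, "expected_count": 3},
}


def lab_report(labs, up_nodes):
    """Compare the nodes reported as up against what each lab expects.

    Returns {lab: {"up": [...], "missing": [...] or None, "missing_count": n}}.
    Names of missing machines are only known for static-IP labs.
    """
    report = {}
    for lab, info in labs.items():
        if "expected" in info:
            up = sorted(n for n in up_nodes if n in info["expected"])
            missing = sorted(info["expected"] - set(up))
            count = len(missing)
        else:
            up = sorted(n for n in up_nodes if info["matches"](n))
            missing = None
            count = max(info["expected_count"] - len(up), 0)
        report[lab] = {"up": up, "missing": missing, "missing_count": count}
    return report


# Example input, as might be parsed from condor_status output.
up_now = ["node-7-1", "node-7-3", "node-18-2", "node-18-3"]
for lab, row in lab_report(LABS, up_now).items():
    print(lab, row)
```

For dynamic-IP labs only a count of missing machines can be reported, since the database holds a membership test rather than a list of names.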

How to Build a Multistate Grid

To make a prairie

It takes a clover and one bee.

One clover, and a bee, and reverie.

The reverie alone will do,

If bees are few.

– Emily Dickinson, 1858 http://magickcanoe.com/blog/2006/08/24/on-our-walk/

OU’s NSF CI-TEAM Project

OU recently received a grant from the National Science Foundation’s Cyberinfrastructure Training, Education, Advancement, and Mentoring for Our 21st Century Workforce (CI-TEAM) program.

Objectives:

- Teach general HPC concepts to a broad audience
- Provide Condor resources to the national community
- Teach users to use Condor and sysadmins to deploy and administer it
- Teach bioinformatics students to use BLAST over Condor

OU NSF CI-TEAM Project

Cyberinfrastructure Education for Bioinformatics and Beyond

Objectives:

- teach students and faculty to use FREE Condor middleware, stealing computing time on idle PCs;
- teach system administrators to deploy and maintain Condor on PCs;
- teach bioinformatics students to use BLAST on Condor;
- provide Condor Cyberinfrastructure to the national community (FREE).

OU will provide:

- Condor pool of 750 desktop PCs (already part of the Open Science Grid);
- Supercomputing in Plain English workshops via videoconferencing;
- Cyberinfrastructure rounds (consulting) via videoconferencing;
- drop-in CDs for installing full-featured Condor on a Windows PC (Cyberinfrastructure for FREE);
- sysadmin consulting for installing and maintaining Condor on desktop PCs.

OU’s team includes: High School, Minority Serving, 2-year, 4-year, masters-granting; 11 of the 15 institutions are in 4 EPSCoR states (AR, KS, NE, OK).

OU NSF CI-TEAM Project

Participants at OU (29 faculty/staff in 16 depts)

- Information Technology
  - OSCER: Neeman (PI)
- College of Arts & Sciences
  - Botany & Microbiology: Conway, Wren
  - Chemistry & Biochemistry: Roe (Co-PI), Wheeler
  - Mathematics: White
  - Physics & Astronomy: Kao, Severini (Co-PI), Skubic, Strauss
  - Zoology: Ray
- College of Earth & Energy
  - Sarkeys Energy Center: Chesnokov
- College of Engineering
  - Aerospace & Mechanical Engr: Striz
  - Chemical, Biological & Materials Engr: Papavassiliou
  - Civil Engr & Environmental Science: Vieux
  - Computer Science: Dhall, Fagg, Hougen, Lakshmivarahan, McGovern, Radhakrishnan
  - Electrical & Computer Engr: Cruz, Todd, Yeary, Yu
  - Industrial Engr: Trafalis
- OU Health Sciences Center, Oklahoma City
  - Biochemistry & Molecular Biology: Zlotnick
  - Radiological Sciences: Wu (Co-PI)
  - Surgery: Gusev

Participants at other institutions (19 faculty/staff at 14 institutions)

- California State U Pomona (masters-granting, minority serving): Lee
- Contra Costa College (2-year, minority serving): Murphy
- Earlham College (4-year): Peck
- Emporia State U (masters-granting, EPSCoR): Pheatt, Ballester
- Kansas State U (EPSCoR): Andresen, Monaco
- Langston U (masters-granting, minority serving, EPSCoR): Snow
- Oklahoma Baptist U (4-year, EPSCoR): Chen, Jett, Jordan
- Oklahoma School of Science & Mathematics (high school, EPSCoR): Samadzadeh
- St. Gregory’s U (4-year, EPSCoR): Meyer
- U Arkansas (EPSCoR): Apon
- U Central Oklahoma (masters-granting, EPSCoR): Lemley, Wilson
- U Kansas (EPSCoR): Bishop
- U Nebraska-Lincoln (EPSCoR): Swanson
- U Northern Iowa (masters-granting): Gray

How to Create a Multistate Grid?

- Grids aren’t primarily about technology!
- You need to recruit people, by offering them more than you ask them to provide.

1. Go to their institution.
2. Give a really fun and interesting talk about your stuff.
3. Tell them that they can use your stuff for free.
4. Make them commit to using your stuff.
5. Help them use your stuff.
6. If possible, get them to visit you and see your stuff.

Thanks for your attention!

Questions?