

Page 1: History of Handling of the Year 2000 Problem at CERN CERN Computer Newsletter 228 (July-September) –Problem stated by Mike Metcalf, CERN Y2K coordinator

History of Handling of the “Year 2000” Problem at CERN

• CERN Computer Newsletter 228 (July-September)

– Problem stated by Mike Metcalf, CERN Y2K coordinator in 1997

• Management Board of Sept. 1, 1997

– M. Metcalf informs M.B. of action currently being taken at CERN to anticipate and pre-empt the impact of the Millennium Bug. At that time the main activity is on the administrative side, where AS started early to attack date-interval problems. Proposal to appoint Y2K coordinators for each division. Legal problems to be addressed.

• EP-division - questionnaire sent to all experiments. Optimistic Replies (Abstract -> Mike M.)

– Experiments have time to solve problems during shutdown Nov. 99 -> Mar. 00

– Provided that: CERN standard software is made Y2K compliant.

• Admin. Packages, CERN libraries (databases), Windows, Novell Network, OS9 expertise.

• Y2K Bulletin no. 1 July 1998 by Sverre Jarp (new coordinator)

– Implementation of WEBsite … + first pages with valuable information.

• Start of “monthly” meetings of the Y2K coordinators Sept. 28, 1998

• Web-pages for each division (via “Y2K” on CERN home page)

– These pages now contain a great deal of information on the present situation in the divisions and are a good introduction to the problems encountered.


1999: Handling of the “Year 2000” Problem at CERN

• Prof. Maiani started new mandate by discussing issue in Directorate on 8 Jan.

– Management wishes to see all services and systems of each division/group listed on Web with Y2K readiness easily highlighted. 100% awareness to be achieved.

• Status Report to the Management Board on Feb. 22 (after preparation in Directorate) – (next slide)

• Short status message to the Finance Committee in March

• Full Status Report to the Finance Committee in June

– Full planning of job to be done and how the divisions will go about it until 31.12.


CERN policy (from talk given by S. Jarp (CERN Y2k Coordinator) to the Management Board on February 22)

• Three fundamental principles of global CERN policy:

– The Divisions are fully responsible

• Each Service Unit must be requested to work on the Y2K issue with high priority

– The problems must be solved with existing resources

• Solving the Y2k issues may mean that other activities are pushed into next year.

– Highest priority must be given to:

• Safety of personnel

• General infrastructure

======================================================================

• It follows for the experiments of EP Division:

– The Experiments are fully responsible

– The problems must be solved with existing resources

• And for each Experiment:

– The Sub-units (online, offline, controls) are fully responsible

– The problems must be solved with existing resources

======================================================================

• For EP services (this is a less voluminous activity):

– Highest priority must be given to Safety of personnel and general infrastructure


Major CERN wide Activities

• Administrative Divisions

– The Administrative Sector has bought a machine - similar to those running the administrative tasks now - dedicated exclusively to testing problems connected with Y2K. Full-scale simulation. This includes Purchasing, Salaries, Pension Fund, Claims, Human Resource management, etc. All their programs are being tested and corrected.

• IT Division

– A “Y2KPLUS cluster” is being set up with old workstations, an AFS server, a UNIX time server, etc. This is to test intercommunication and interdependencies.

• EP Division

– as requested for this talk =>


EP Division: Status

• General Situation:

– As major end-user of CERN services rather than as a supplier, the division is less vulnerable to immediate effects of the year 2000 roll-over.

• Awareness:

– In January, mail to all experiments/units with request for “Y2K controller” for each experiment. Have received nominations from most of the experiments that will be on the floor in 2000. Discussions going on. No replies from new experiments (or “data-taking finished”), but these are of less urgency at the moment.

• EP Division Web page:

– First version available. To be enlarged when more data available.

• General Tendency:

– Do not risk advancing dates in the online system. Get through 1999 first, then get through 2000 (last year for LEP experiments) by installing patches, updates. But do testing and preparing things offline already now.


EP Division: services EP-EOS and EP-PES

• GSS (general surveillance system) for LEP experiments [E. Sbrissa]:

– data acquisition based on OS9, no date management. Date handled by the exploitation system on 8 VAXes. Patch by DIGITAL, seems ok.

– Possible problem with expert system in connection with Oracle. Best solution(?): generate “final” system for 2000 by end 1999.

• Equipment Safety Crates (Aleph & Delphi):

– Systems so old (6800 processors under FLEX) that they do not know how to do things with the date; therefore Y2k-hard. But no recompilation!

• ALEPH and DELPHI Magnet controls:

– Aleph: controls being made Y2K compliant during 1998/99 maintenance. Delphi: no date = ok.

• PC support for EP Division [H. Wendler]

– Clarifications needed from IT and from the PC shop concerning the acquired types and the procedures needed for each PC coming from the PC shop to make it Y2K compatible (slide).


EP Division: experiments

• ALEPH [W. Tejessy]

– Optimistic view, mainly due to the major software developers still being members of the collaboration. Aleph intends to first complete 1999 running and then concentrate efforts on the shutdown period 99/00 as the most cost/manpower efficient method to tackle the problem.

• NEVERTHELESS: possible problems are being analyzed.

• AND ALEPH relies on the IT/CO group for advice/help/furnishing of updates for OS9.

• DELPHI [A. Augustinus]

– Slow Controls: ACTIS G96: one crate ran in “March 2000” without problems.

– Equipment safety crates: moved one clock to “2000” without problems.

– Rutherford Solenoid crate: to avoid a database hiccup it will run in “1950”.

– DAQ: now run OS9 V2.4, will eventually revert to V2.9 or similar patch. They will rely on IT/CO group to furnish OS9 support.

– Offline: Into a raw data tape a date of “2000” was patched. Several bugs in processing chain found - formatting, “wrong-date” forks, etc - and repaired.

– RICH: This sub-detector must stay heated over New Year: must be guaranteed.


EP Division: experiments, continued

• OPAL [Matthias Schroeder (Per Scharff-Hansen)]

– Online computers (OS9 and HPUX) will be made Y2K compliant BEFORE this year’s running. Tests have been undertaken on a separate OS9 server to evaluate the situation (they use MVME167, MVME147, MVME165 and FIC8230 boards).

– Search for Y2K bugs in OPAL programs started.

– Offline computers: HPUX become compliant in 1999. (VMS cluster will be stopped end of 1999.)

• Other experiments

– No up-to-date information has been provided to me as of today.

• Personal Conclusions:

– LEP experiments are aware of the problems and seem confident that they have sufficient expertise to eradicate the bug during the shutdown. It is felt that it is one more problem of the magnitude of the summer->winter-time change that can be coped with.

– Other experiments - needs renewed effort on my part to increase awareness.


PC Support for EP Division (H. Wendler)

• Some 3600 PCs have been sold by the CERN PC shop over the years, of these a substantial part is in EP division. Large diversity of systems. A plan is being discussed with IT to check the makes of these PCs and give some guidelines to users : what type of problems must they expect from their personal PC?

• Situation is not clear (to my understanding):

– Are there PCs that you put OFF on Dec. 20 and that when turning ON on Jan. 5 will no longer come up, i.e. really block, so they can only be thrown away?

• (It is clear that non-compliant PCs will give you all sorts of headaches with wrong dates, but if you are active there are work-arounds (date-changing) to be able to limp on.)

– It is said that keeping the PC running over the 1999->2000 rollover may cause it to irremediably block and become a useless piece of scrap-metal. Is this true and on which PCs is it likely to happen? Would this not have happened if OFF on Dec. 31?

• General Conclusion:

– It would be nice to have central answers to these questions, rather than to let every experiment find out for itself what will happen…

• Afterthought:

– Are there other standard systems with CERNwide problems (e.g. OS9)?


OS9 Support by IT Division (Martin Merkel)

• Most experiments incorporate OS9-based boards mainly running V2.4, but it's a zoo of species with eating-habits that almost nobody can remember; e.g.:

– Microware ships software only for Motorola platforms. Others: see hardware vendors.

– C.E.S. sold FIC8230/32/34, but no longer supports OS-9/68k. Work-around to be found!

– Themis 131 (used in SL division) runs OS-9 V2.3, considered obsolete by Microware. Bye.(?)

– ELTEC offers Year 2000 update for old OS-9 v2.4.5 (includes Eurocom-5 SBC.) In German.

– PEP has V3.03 for VM40 and a patch for VM20.

– Struck/ALEPH Event Builders (no longer used by ALEPH, but maybe by NA49(?)) run in-house port of OS-9 v2.3. Correction might be possible...

• So, do not think that you can buy standard food to survive the roll-over.

• CONCLUSION:

– The hierarchy of responsibility CERN -> Divisions -> Experiments is acceptable in the sense that the experiments must analyze their specific problems.

– BUT: centralized help in the form of solutions for specific problems of each experiment must be provided by CERN on request by each experiment.

– IN FACT: Martin Merkel of IT/CO is busy on this task. He has bought the source of V2.9 (for Motorola) from which work-arounds for other systems should be buildable.

– ERGO: we must make sure that he is fully available for these tasks.


Examples of Problems encountered

• Expert advances clock to year 2000 to test software. License expires irrecoverably.

• ADVANCES & CLAIMS: entry of Feb. 29, 2000 translated to Feb. 29, 1900 which doesn’t exist producing error, before user program can do (IF 1900 THEN SET 2000).

• OS-9 similar problem: Jan 1, 2000 becomes Jan 1, 1900, but asctime -> Feb. 6, 2106.

• Database entries: CAS Console accepts 99 as 1999 and 100 as 2000 (you must know it!)

• Offline reconstruction program: one version decided that every event after Dec 31, 99 was crap to be discarded. This is typical of the difficult-to-predict faults.

• Embedded Systems:

• Little experience available. But the Herald Tribune of Jan 2/3, 1999 reports in connection with statements by John Koskinen, chief year 2000 adviser to Clinton, that “technicians have found remarkably few date-related problems with the electronic circuitry in a host of other everyday devices, from subway cars to elevators”.

– Few bugs with Boeing

– Few or no bugs with major automakers
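Two of the date faults listed above (Feb. 29, 2000 rejected after translation to 1900, and the console convention where 99 means 1999 and 100 means 2000) can be reproduced in a few lines. The following is only an illustrative sketch in Python, not the code of any CERN system: all function names and the pivot value of 50 are assumptions introduced for the example.

```python
from datetime import date


def expand_two_digit_naive(yy: int) -> int:
    """Hypothetical buggy expansion: every two-digit year is placed in the 1900s."""
    return 1900 + yy


def expand_two_digit_windowed(yy: int, pivot: int = 50) -> int:
    """Hypothetical windowed expansion: values below the pivot map to 20xx.

    The pivot of 50 is an assumption chosen for illustration only.
    """
    century = 2000 if yy < pivot else 1900
    return century + yy


def from_year_offset(offset: int) -> int:
    """Year stored as an offset from 1900, so 99 -> 1999 and 100 -> 2000."""
    return 1900 + offset


def is_valid_date(year: int, month: int, day: int) -> bool:
    """Return True if the calendar date exists (Gregorian leap-year rules)."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # Feb. 29 with year "00": invalid if read as 1900, valid if read as 2000.
    print(is_valid_date(expand_two_digit_naive(0), 2, 29))
    print(is_valid_date(expand_two_digit_windowed(0), 2, 29))
    # Offset-from-1900 convention, as described for the database console.
    print(from_year_offset(99), from_year_offset(100))
```

The point of the sketch is that the validity check fails before any later correction step (the "IF 1900 THEN SET 2000" of the slide) gets a chance to run, because 1900 was not a leap year while 2000 was.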