PEP-II Reliability and Uptime
![Page 1: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/1.jpg)
PEP-II Reliability and Uptime
Roger Erickson
10 October 2003
With thanks to C.W. Allen, W. Colocho, P. Schuh, M. Stanek, and the Operations staff members who collected the data.
![Page 2: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/2.jpg)
Excludes “long” downtimes and holiday shut-downs.
![Page 3: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/3.jpg)
Statistics: Causes of Unscheduled Down Time
• 3 PEP-II running periods considered: January 2000 through June 2003.
• 22,936 total scheduled operating hours.
• 2,994 hours unscheduled down time.
• 5,469 reported malfunctions (“events”).
• 1,317 events directly tied to lost hours.
We can sort the data by area of the machine (HER, linac, etc.), by system categories (RF, vacuum, etc.), by date, and by details of resolution.
![Page 4: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/4.jpg)
Accelerator Performance Statistics
Definitions:
Revealed failures: malfunctions resulting in lost beam time. Also called “events”.
Unscheduled down time: hours lost from scheduled program due to malfunctions.
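Mean Time to Fail:
$$\mathrm{MTTF} = \frac{\text{Scheduled beam time}}{\text{Events}}$$
Mean Time to Repair:
$$\mathrm{MTTR} = \frac{\text{Unscheduled down time}}{\text{Events}}$$
Availability:
$$\text{Availability} = 1 - \frac{\text{Unscheduled down time}}{\text{Scheduled beam time}}$$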
NOTE: PEP-II aborts are not counted as downtime, unless the event is reported; i.e., unless we stop to fix something and make a database entry.
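The three definitions can be applied directly to the combined totals quoted earlier in the talk. The following is a minimal sketch, assuming that "Events" in the formulas means the 1,317 events directly tied to lost hours (the slides also use "events" for all 5,469 reported malfunctions). The function and constant names are illustrative only.

```python
def mttf(scheduled_hours, events):
    """Mean time to fail: scheduled beam time divided by number of events."""
    return scheduled_hours / events


def mttr(downtime_hours, events):
    """Mean time to repair: unscheduled down time divided by number of events."""
    return downtime_hours / events


def availability(scheduled_hours, downtime_hours):
    """Fraction of scheduled beam time not lost to unscheduled down time."""
    return 1.0 - downtime_hours / scheduled_hours


# Combined totals for the three runs, as quoted on the statistics slide.
SCHEDULED_HOURS = 22936.0
DOWNTIME_HOURS = 2994.0
EVENTS = 1317  # assumption: only events tied to lost hours are counted

overall_mttf = mttf(SCHEDULED_HOURS, EVENTS)
overall_mttr = mttr(DOWNTIME_HOURS, EVENTS)
overall_availability = availability(SCHEDULED_HOURS, DOWNTIME_HOURS)

print(f"MTTF         = {overall_mttf:.2f} hours")
print(f"MTTR         = {overall_mttr:.2f} hours")
print(f"Availability = {100 * overall_availability:.1f} percent")
```

Because both MTTF and MTTR divide by the same event count, availability can equally be written as 1 - MTTR/MTTF.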
![Page 5: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/5.jpg)
![Page 6: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/6.jpg)
![Page 7: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/7.jpg)
![Page 8: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/8.jpg)
PEP-II Run Totals
Run 1: 1/12/00 – 10/31/00
Run 2: 2/4/01 – 6/30/02
Run 3: 11/15/02 – 6/30/03
Long annual downtimes and holiday shut-downs are not included.
![Page 9: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/9.jpg)
Hardware Availability by Run
| Run | MTTF (hours) | MTTR (hours) | Availability (percent) |
|---|---|---|---|
| Run 1 | 18.57 | 2.39 | 87.1 |
| Run 2 | 17.88 | 2.02 | 88.7 |
| Run 3 | 15.28 | 2.63 | 82.8 |
MTTF has been getting shorter (worse) each run.
MTTR improved from Run 1 to Run 2, but got worse during Run 3.
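Since MTTF and MTTR share the same event count, the quoted availability should equal 1 - MTTR/MTTF for each run. A short sketch of that consistency check follows, using only the per-run figures quoted on this slide (the variable names are illustrative).

```python
# Per-run figures transcribed from the slide: (MTTF hours, MTTR hours, availability %).
RUNS = {
    "Run 1": (18.57, 2.39, 87.1),
    "Run 2": (17.88, 2.02, 88.7),
    "Run 3": (15.28, 2.63, 82.8),
}


def availability_percent(mttf_hours, mttr_hours):
    """Availability implied by MTTF and MTTR when both use the same event count."""
    return 100.0 * (1.0 - mttr_hours / mttf_hours)


computed = {}
for run, (mttf_hours, mttr_hours, quoted) in RUNS.items():
    computed[run] = availability_percent(mttf_hours, mttr_hours)
    print(f"{run}: computed {computed[run]:.1f}%, quoted {quoted:.1f}%")
```

The computed values agree with the quoted ones to within rounding of the inputs.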
![Page 10: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/10.jpg)
Unscheduled Downtime by Major System
| System | Run 1 | Run 2 | Run 3 |
|---|---|---|---|
| Injection | 5.6 | 5.0 | 4.2 |
| PEP Rings | 6.8 | 4.6 | 10.7 |
| BaBar | 0.3 | 1.2 | 0.8 |
| PG&E | 0.2 | 0.5 | 1.5 |
| Availability | 87.1 | 88.7 | 82.8 |
| Total | 100.0 | 100.0 | 100.0 |
Unscheduled down time (percentage), sorted by responsible system.
![Page 11: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/11.jpg)
![Page 12: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/12.jpg)
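MTTR: PEP-II Rings

| System | MTTR Run 1 (h) | MTTR Run 2 (h) | MTTR Run 3 (h) | Run 1 events | Run 1 DT (h) | Run 2 events | Run 2 DT (h) | Run 3 events | Run 3 DT (h) |
|---|---|---|---|---|---|---|---|---|---|
| Power Supplies | 2.37 | 1.52 | 1.50 | 61 | 144.7 | 97 | 147 | 83 | 124.9 |
| Magnets | 3.05 | 2.50 | 4.80 | 2 | 6.1 | 3 | 7.5 | 3 | 14.4 |
| RF | 2.47 | 1.80 | 2.71 | 55 | 135.8 | 58 | 104.2 | 47 | 127.6 |
| Vacuum | 10.58 | 3.82 | 28.68 | 5 | 52.9 | 26 | 99.4 | 6 | 172.1 |
| Utilities | 3.29 | 1.93 | 1.88 | 14 | 46 | 28 | 53.9 | 12 | 22.6 |
| Controls | 1.39 | 1.45 | 1.69 | 42 | 58.5 | 63 | 91.3 | 32 | 54.0 |
| Safety | 0.70 | | | 1 | 0.7 | | | | |
| Other | 2.85 | 1.69 | 4.13 | 2 | 5.7 | 8 | 13.5 | 6 | 24.8 |
| Totals | | | | 182 | 450.4 | 283 | 516.8 | 189 | 540.4 |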
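Each MTTR entry in this table is the down time in hours divided by the event count for that system and run. The sketch below recomputes the MTTR values and the run totals from the event and down-time columns. One assumption is labeled in the code: the single Safety event is placed in Run 1, because that is the only placement that reproduces the quoted totals. The data structure and function names are illustrative.

```python
# (events, down-time hours) for Run 1, Run 2, Run 3, transcribed from the table.
# None marks a run with no entry for that system.
RING_DATA = {
    "Power Supplies": [(61, 144.7), (97, 147.0), (83, 124.9)],
    "Magnets": [(2, 6.1), (3, 7.5), (3, 14.4)],
    "RF": [(55, 135.8), (58, 104.2), (47, 127.6)],
    "Vacuum": [(5, 52.9), (26, 99.4), (6, 172.1)],
    "Utilities": [(14, 46.0), (28, 53.9), (12, 22.6)],
    "Controls": [(42, 58.5), (63, 91.3), (32, 54.0)],
    # Assumption: the lone Safety event belongs to Run 1 (matches the run totals).
    "Safety": [(1, 0.7), None, None],
    "Other": [(2, 5.7), (8, 13.5), (6, 24.8)],
}


def system_mttr(entry):
    """MTTR in hours for one (events, hours) entry, or None if there is no entry."""
    if entry is None:
        return None
    events, hours = entry
    return hours / events


def run_totals(data, run_index):
    """Total events and total down-time hours for one run (0-based index)."""
    entries = [runs[run_index] for runs in data.values() if runs[run_index] is not None]
    return sum(e for e, _ in entries), sum(h for _, h in entries)


for system, runs in RING_DATA.items():
    cells = ["  n/a" if r is None else f"{system_mttr(r):5.2f}" for r in runs]
    print(f"{system:15s}", *cells)

for i in range(3):
    events, hours = run_totals(RING_DATA, i)
    print(f"Run {i + 1}: {events} events, {hours:.1f} hours down")
```

The Run 3 vacuum entry stands out: only 6 events, but more down time than any other system in any run.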
![Page 13: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/13.jpg)
Time Required for Repairs
| Beam time lost | Events | Percent of total events | Hours down | % of total DT |
|---|---|---|---|---|
| > 0 to 1.0 hours | 641 | 48.7% | 383.4 | 12.8% |
| > 1.0 to 2.0 hours | 286 | 21.7% | 463.6 | 15.5% |
| > 2.0 to 4.0 hours | 241 | 18.3% | 723.0 | 24.1% |
| > 4.0 to 8.0 hours | 85 | 6.5% | 485.8 | 16.2% |
| > 8.0 to 24.0 hours | 56 | 4.3% | 686.0 | 22.9% |
| > 24.0 hours | 8 | 0.6% | 252.7 | 8.4% |
| Total | 1317 | 100.0% | 2994.5 | 100.0% |
Combined data set from all three runs.
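The binned data make it easy to ask what share of events and of down time comes from repairs longer than some threshold. The sketch below does this for the combined data set with a 2-hour threshold; the bin list is transcribed from the table, and the helper name is illustrative.

```python
# (lower bound of bin in hours, events, hours down) for the combined data set.
REPAIR_BINS = [
    (0.0, 641, 383.4),
    (1.0, 286, 463.6),
    (2.0, 241, 723.0),
    (4.0, 85, 485.8),
    (8.0, 56, 686.0),
    (24.0, 8, 252.7),
]


def share_above(bins, threshold_hours):
    """Fraction of events and of down time in bins starting at or above the threshold."""
    total_events = sum(events for _, events, _ in bins)
    total_hours = sum(hours for _, _, hours in bins)
    long_events = sum(events for low, events, _ in bins if low >= threshold_hours)
    long_hours = sum(hours for low, _, hours in bins if low >= threshold_hours)
    return long_events / total_events, long_hours / total_hours


event_share, downtime_share = share_above(REPAIR_BINS, 2.0)
print(f"Repairs > 2 hours: {100 * event_share:.1f}% of events, "
      f"{100 * downtime_share:.1f}% of down time")
```

For all systems and all three runs combined, roughly 30% of events take more than 2 hours to repair, and they account for roughly 72% of the down time.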
![Page 14: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/14.jpg)
PEP Rings Events Requiring > 2 hours to Repair
Run 3 Data:
33% of PEP ring events require > 2 hours to repair.
These account for 81% of PEP ring down time.
![Page 15: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/15.jpg)
Problems Requiring > 24 hours to Fix
January 2000 – June 2003:
• 5 vacuum chamber failures in PEP rings. Some known vulnerabilities were already receiving attention. Vacuum task force is studying options for upgrading some chambers.
• 2 site-wide electrical power outages. These were outside SLAC’s control.
• SLTR quadrupoles overheated when cooling water pump stopped, but power remained on.
![Page 16: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/16.jpg)
Recent Problems Requiring > 24 hours to Fix
August 20, 2003:
VVS transformer failure in linac.
• Failure occurred during E158; no impact on PEP. Two days for full recovery.
• Failure was in the only dry-type transformer among 16 VVS’s. Oil-filled, fixed-ratio replacement options being investigated.
September 12, 2003:
Site-wide power failure when tree grew too close to 230 kV line. Time lost to PEP program > 47 hours.
• Tree trimming had not been done on established schedule.
• SLAC now has new contract with tree-trimmer company, with option to renew for five years.
![Page 17: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/17.jpg)
Underlying Problems Sometimes Cross Technical and Jurisdictional Boundaries
• Seasonal high ambient temperatures cause drift, jitter, timing-shifts, spurious trips, and sometimes component failures in power supplies and sensitive electronics.
• Plan to air-condition the electronics alcove at Linac Sector 0, which houses the master oscillator and electronics critical to accelerator timing. A contract has been awarded.
• Several PEP support buildings have temperature control problems on hot days. More needs to be done to identify cost-effective improvements.
An example of a problem not easily identified by counting malfunction reports.
![Page 18: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/18.jpg)
Injection and Tuning
Normal top-off:
Typically 4 to 5 minutes to fill at intervals of 40 to 50 min. Approx. 10% of scheduled run time.
Why is 21% spent injecting and tuning?
Beam aborts require fill from scratch; typically 15 to 25 minutes each time.
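The 10% figure for normal top-off follows from the fill time and fill interval quoted above. A rough sketch is given below, assuming the 40 to 50 minute interval is measured from the start of one fill to the start of the next (the slide does not say); the function name is illustrative.

```python
def topoff_fraction(fill_minutes, interval_minutes):
    """Fraction of run time spent filling, for one fill per interval.

    Assumption: the interval is measured start-to-start, so it includes the fill.
    """
    return fill_minutes / interval_minutes


# Shortest fill with longest interval, and longest fill with shortest interval.
best_case = topoff_fraction(4.0, 50.0)
worst_case = topoff_fraction(5.0, 40.0)
typical = topoff_fraction(4.5, 45.0)

print(f"Top-off share of run time: {100 * best_case:.1f}% to {100 * worst_case:.1f}% "
      f"(mid-range {100 * typical:.1f}%)")
```

The mid-range value matches the approximately 10% quoted for normal top-off, so the remaining injection and tuning time has to come from something else, namely refills after aborts.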
![Page 19: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/19.jpg)
Beware of double counting: an abort in one ring usually leads to an abort in the other.
![Page 20: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/20.jpg)
HER RF Aborts
| Station | Run 2 (aborts/day) | Run 3 (aborts/day) |
|---|---|---|
| 12-1 | 0.33 | 1.1 |
| 12-3 | 0.50 | 0.34 |
| 8-1 | 0.22 | 0.57 |
| 8-3 | 0.50 | 0.68 |
| 8-5 | 0.51 | 0.66 |
| 12-6 | | 1.65* |
| Total | 2.1 | 5.0 |
– All stations were worse in 2003, except 12-3.
* 12-6 fault accounting only available since 10-May-2003.
![Page 21: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/21.jpg)
LER RF Aborts
Station Run 3
– 4-3: 0.88 aborts/day*
– 4-4: 0.55 (was 0.56 in 2002)
– 4-5: 0.55 (was 0.53 in 2002)
Total = 2 aborts per day
* 4-3 fault accounting only available since 10-May-2003.
![Page 22: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/22.jpg)
BaBar Radiation Aborts
3-year trend, based on data latched by accelerator control system:
– 2000: 5.6 aborts/day
– 2001: 4.1
– 2002: 3.6
– 2002/3: 2.8
![Page 23: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/23.jpg)
Injection and Tuning Summary
Percentages of scheduled operating hours:
• Normal top-offs: 10%
Fill from scratch following:
• RF aborts: 6.3%
• BaBar radiation aborts: 3.5%
Approximate total: 20%
Trickle charging could have significant beneficial impact!
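The refill percentages can be sanity-checked against the abort rates and the 15 to 25 minute fill-from-scratch time quoted on the earlier slides. The sketch below is only a plausibility check: the slides do not state which abort rates or fill time were used to obtain 6.3% and 3.5%, so pairing them with the Run 3 HER RF total (5.0 aborts/day) and the 2002/3 BaBar rate (2.8 aborts/day), and assuming 24 scheduled hours per day, are assumptions made here.

```python
MINUTES_PER_DAY = 24 * 60  # assumption: round-the-clock scheduled operation


def refill_percent(aborts_per_day, minutes_per_fill):
    """Percent of scheduled time spent refilling from scratch after aborts."""
    return 100.0 * aborts_per_day * minutes_per_fill / MINUTES_PER_DAY


def refill_range(aborts_per_day, fill_minutes=(15.0, 25.0)):
    """Low and high refill percentages for the quoted 15 to 25 minute fill time."""
    low, high = fill_minutes
    return refill_percent(aborts_per_day, low), refill_percent(aborts_per_day, high)


babar_range = refill_range(2.8)   # assumption: 2002/3 BaBar radiation abort rate
her_rf_range = refill_range(5.0)  # assumption: Run 3 HER RF total, LER not added

# Sum of the three contributions listed in the summary.
summary_total = 10.0 + 6.3 + 3.5

print(f"BaBar aborts:  {babar_range[0]:.1f}% to {babar_range[1]:.1f}% (quoted 3.5%)")
print(f"HER RF aborts: {her_rf_range[0]:.1f}% to {her_rf_range[1]:.1f}% (quoted 6.3%)")
print(f"Summary total: {summary_total:.1f}%")
```

Under these assumptions both quoted percentages fall inside the computed ranges. Leaving the LER RF aborts out of the RF estimate is consistent with the double-counting warning, since an abort in one ring usually takes the other ring down too.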
![Page 24: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/24.jpg)
Scheduled Off Time
• No routine scheduled maintenance days.
• Repair Opportunity Days (“RODs”) are launched when needed for show-stoppers or upgrade projects (typically 1/month).
• As many ROD and SML jobs as possible are completed during program interruption (typically 50 to 100 identified jobs).
![Page 25: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/25.jpg)
Personnel Protection System (PPS) Testing
• Formerly required approx 3 months of beam-off, most of which was folded into long downtimes, but “verifications” were required at 6-month intervals.
• Net impact on PEP program depended on interval between long downtimes. Typically about 2 weeks/year.
• New policies and procedures have reduced testing to about 3 weeks once each year to coincide with long downtimes, plus operator interlock checks.
![Page 26: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/26.jpg)
Opportunities for Further PPS Testing Improvements
• Add switches and indicators to further decouple zones/subsections/systems for testing purposes.
• Further streamline test procedures (much progress made last year).
• Train/authorize more staff members, so that testing can be done 24 hours/day when opportunities arise.
Additional uptime to be gained?
Possibly 1 week/year, depending on long downtime schedule and “opportunistic” down days.
Long-range proposal: Replace linac and BSY PPS with modern system to facilitate testing and minimize downtime for diagnosing problems.
![Page 27: PEP-II Reliability and Uptime](https://reader036.vdocuments.net/reader036/viewer/2022062321/56813982550346895da114f1/html5/thumbnails/27.jpg)
How to Increase PEP-II Up Time: Challenges to Ourselves
• Allocate resources among hardware projects to achieve optimal improvement in MTTF.
• Identify common-mode or infrastructure projects that will improve overall uptime and stability.
• Find ways to reduce frequency of aborts.
• Minimize scheduled off time through policy and procedure changes and aggressive scheduling.
• Reduce MTTR with improved procedures, diagnostic tools, and organizational efficiency.