Introduction
Using Work-Orders (WOs) as a reliability data source to calculate equipment MTBF (mean time between failures) is common practice in many large organizations. Many software vendors cater to this demand with applications that extract WO data to generate MTBF values for the corresponding equipment.
This presentation describes the flaws in using WOs captured in a CMMS to derive equipment MTBF. First, I state the assumptions that are implicit when WOs are used to derive MTBF. I then explain the issues that arise with these assumptions in practice.
The Work-Order is not meant to be a data source for reliability analysis.
Assumptions
To derive MTBF from Work-Orders, the software developer makes the following assumptions:
- Each Corrective Maintenance Work-Order (CM-WO) for a piece of equipment corresponds to one failure of that equipment.
- The interval from repair start time to repair end time of a CM-WO is the equipment downtime.
- The equipment is operational for the entire duration between two consecutive CM-WOs.
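To make the three assumptions concrete, here is a minimal Python sketch of the calculation they imply for a single piece of equipment. The `WorkOrder` record, its field names, and the numbers in the usage note are hypothetical illustrations, not the schema or logic of any particular CMMS or vendor tool.

```python
from dataclasses import dataclass


@dataclass
class WorkOrder:
    """Hypothetical CM-WO record; times are day numbers from the start of observation."""
    equipment: str
    repair_start: float
    repair_end: float


def naive_mtbf(work_orders, observation_days):
    """MTBF under the three assumptions listed above.

    Assumption 1: every CM-WO is exactly one failure.
    Assumption 2: downtime is repair start to repair end.
    Assumption 3: all remaining time is operating time.
    """
    failures = len(work_orders)
    if failures == 0:
        raise ValueError("no failures recorded; MTBF is undefined")
    downtime = sum(wo.repair_end - wo.repair_start for wo in work_orders)
    return (observation_days - downtime) / failures
```

For example, two hypothetical CM-WOs with repair windows of day 10 to 12 and day 50 to 53 over a 105-day observation period give (105 - 5) / 2 = 50 days. The result is only as good as the three assumptions behind it.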
The Issues...
The following problems undermine the above assumptions in practice.
Problem 1: A failure may have multiple Work-Orders
It is common to see multiple Notification-Orders/Work-Orders issued for a single failure, which can cause that failure to be counted more than once.
The analyst needs to ensure that WOs referring to the same failure are combined into a single event, so that the event is not counted repeatedly. This is not an easy task!
If this problem is not handled properly, the first assumption is violated.
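One possible first pass at combining WOs is sketched below: repair windows for the same equipment that overlap, or that are separated by no more than a chosen gap, are merged into one event. The function name and the `max_gap_days` threshold are assumptions made for illustration. Closeness in time is only a heuristic, so a real clean-up would still need an engineer to review the WO descriptions.

```python
def merge_work_orders(intervals, max_gap_days=1.0):
    """Merge (repair_start, repair_end) windows of ONE piece of equipment.

    Windows that overlap or are separated by at most max_gap_days are
    treated as the same failure event. This is a heuristic, not proof
    that the Work-Orders really describe the same failure.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_gap_days:
            # Extend the previous event instead of counting a new failure.
            previous_start, previous_end = merged[-1]
            merged[-1] = (previous_start, max(previous_end, end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical data: the first two WOs are half a day apart.
events = merge_work_orders([(70, 75), (75.5, 80), (200, 204)])
print(len(events))  # number of failure events after merging
```

The failure count used for MTBF would then be the number of merged events rather than the number of WOs.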
Problem 2: Error in downtime calculation
Consider a Crude-Oil-Transport-Pump (COTP) system that consists of three pumps (A, B and C), with a “2 active and 1 standby” operating policy.
On day 50, Pump A failed. A Work-Order was raised; repair started on day 70 and was completed on day 80. The actual downtime for Pump A was 30 days (day 50 to day 80), but the repair time was only 10 days (day 70 to day 80).
If the analyst relies on the Work-Order record as the data source, the downtime would be only 10 days! To correct this error, the analyst needs to look up the malfunction start time in the corresponding Notification Order.
This problem makes the second assumption difficult to satisfy.
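The Pump A figures can be checked with a short Python sketch. The function and parameter names are illustrative; the only inputs are the day numbers from the example (failure on day 50, repair from day 70 to day 80).

```python
def repair_time(repair_start, repair_end):
    """Duration recorded on the Work-Order."""
    return repair_end - repair_start


def actual_downtime(malfunction_start, repair_end):
    """Duration from the malfunction start (Notification Order) to repair completion."""
    return repair_end - malfunction_start


# Pump A: failed on day 50, repair started on day 70 and finished on day 80.
wo_downtime = repair_time(70, 80)        # 10 days, what the Work-Order shows
true_downtime = actual_downtime(50, 80)  # 30 days, the actual downtime
waiting_time = true_downtime - wo_downtime  # 20 days before repair began
print(wo_downtime, true_downtime, waiting_time)
```

In this example the Work-Order alone hides the 20 days that passed between the failure and the start of repair.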
Problem 3: Standby duration cannot be obtained from CMMS
The following shows the tri-state plot of three identical Gas-Turbines (with “2 active and 1 standby” operating policy) of a power generation system from 1 September 2016 to 18 February 2019 (900 days). There were 5 failures.
If we rely on the CMMS, we see 5 failures over 900 days for the 3 identical units, with a combined downtime of 80 days for all Gas-Turbines. The CMMS cannot tell us how long each unit spent on standby.
The MTBF of an individual Gas-Turbine would then be (900 x 3 - 80) / 5 = 524 days.
The problem with this calculation is that it uses the 900 calendar days for each unit (minus downtime), and therefore gives credit for standby time as if the unit had been running.
Standby duration should be removed from the calendar time.
The following shows the correct calculation:
From the operating profile, determine the run time of each unit:
GT-7510A: 368 days
GT-7510B: 304 days
GT-7510C: 328 days
Total running time: 368 + 304 + 328 = 1000 days
Total number of failures: 5
MTBF = 1000/5 = 200 days
Relying on CMMS information alone overestimates the reliability of the assets: in this case, 524 days instead of 200 days.
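The two calculations above can be reproduced side by side in a few lines of Python. All numbers are taken from the Gas-Turbine example; the variable names are illustrative.

```python
CALENDAR_DAYS = 900
UNITS = 3
TOTAL_DOWNTIME_DAYS = 80
FAILURES = 5

# Run time per unit, read from the operating profile.
run_days = {"GT-7510A": 368, "GT-7510B": 304, "GT-7510C": 328}

# CMMS-only view: calendar time minus downtime, standby counted as running.
calendar_mtbf = (CALENDAR_DAYS * UNITS - TOTAL_DOWNTIME_DAYS) / FAILURES

# Run-time view: only the days each unit was actually running.
total_run_days = sum(run_days.values())
runtime_mtbf = total_run_days / FAILURES

print(calendar_mtbf)                 # 524.0
print(runtime_mtbf)                  # 200.0
print(calendar_mtbf / runtime_mtbf)  # overestimation factor, about 2.62
```

The only difference between the two results is the exposure time in the numerator; the failure count is the same in both.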
Problem 4: Non-Failure event (process event)
Work-Orders do not capture downtime events caused by process upsets. Normally, no Work-Order is generated for a downing event that is not due to equipment failure, so the corresponding downtime is not captured.
Consider a situation where equipment A is down because another piece of equipment, B, has failed. A Work-Order is issued for equipment B but not for A, so the downtime of A is not captured.
Conclusion
CMMS systems are typically designed by accountants and serve as excellent tools for accounting and financial purposes. However, it is unlikely that a reliability engineer is involved in the system's design, and as a result, the requirements for reliability analysis may not be adequately considered. Relying solely on work orders as the primary source of reliability data can be a recipe for failure in an enterprise reliability program.
A data collection system for the process industry should, at a minimum, capture the following information:
- Downtimes with associated failure modes.
- Downtimes due to process events with root causes.
- Equipment standby time.
- Daily production.
Analysts should avoid using calendar time to derive the equipment operating profile, unless the equipment operates 24/7 and non-operating times are negligible.
The objective of the data collection system is to capture events in such a way that historical operating profiles can be reproduced with reasonable precision, as shown in Figure 2. This capability enables analysts to accurately derive the failure rate behavior and downtime distribution of each critical piece of equipment.
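As a rough illustration of what reproducing an operating profile involves, the Python sketch below takes a log of state changes for one unit and totals the time spent in each state. The log format, the state names ("running", "standby", "down"), and the sample data are hypothetical. The sketch also assumes that every transition into "down" is a failure, which would not hold for process events.

```python
from collections import defaultdict


def state_durations(transitions, end_day):
    """Total days per state.

    transitions: list of (day, state) sorted by day; each state is
    assumed to hold until the next transition or until end_day.
    """
    totals = defaultdict(float)
    following = transitions[1:] + [(end_day, None)]
    for (day, state), (next_day, _) in zip(transitions, following):
        totals[state] += next_day - day
    return dict(totals)


def count_failures(transitions):
    """Count entries into the 'down' state (assumed here to be failures)."""
    return sum(
        1
        for previous, current in zip(transitions, transitions[1:])
        if current[1] == "down" and previous[1] != "down"
    )


# Hypothetical 100-day log for one unit.
log = [(0, "running"), (40, "down"), (50, "standby"), (70, "running")]
durations = state_durations(log, end_day=100)
failures = count_failures(log)
mtbf = durations["running"] / failures  # run time per failure
print(durations, failures, mtbf)
```

With standby and down time recorded explicitly, the run time used for MTBF comes directly from the log instead of being inferred from calendar time.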