Why Did it Fail? Breaking Down Asset Failures
KEEP, TRACK, AND TREND FAILURE CAUSES TO OPTIMIZE YOUR MAINTENANCE PROGRAM.
Studying failure to improve asset reliability in a maintenance program starts with the relationship between failure and function. At the highest level in the reliability structure, equipment and components have a specific function or user-defined task to perform.
An asset’s function can be broken down into two elements: its purpose and the process it will perform. Any undesirable deviation or change in these elements results in a functional failure of the user-defined function. A functional failure is the inability to perform the expected function or meet the acceptable performance standards required by the user. In reliability tools such as a failure modes and effects analysis (FMEA), it is important to identify those acceptable performance standards, because they establish what counts as a failed state of the function. Failed states fall into several categories. The complete loss of an asset’s function is a total failure. When an asset still operates but performs outside of acceptable limits, it is a partial failure.
A functional failure may result in manufactured goods or products that are outside specification limits or tolerances. The devices used to measure performance against the standard, such as gauges and indicators, can also have a functional failure. It is important to recognize that the failed state or the failure mode may be different for identical assets depending upon their operating context.
The failure mode is the specific way in which the functional failure manifests itself. For example, when a liquid pump has a functional failure and it will not pump the required amount of liquid, the failure mode of the pump is that the output flow is low. There is no reason or cause identified, only that the pump has low flow output. There will be more than one failure mode for equipment assets and components, so we should build a list of the ways something could fail from the desired function. In the case of the liquid pump, its function is to pump liquid at a user-specified rate. If it fails to do that function, the failure modes can be too little flow, no flow, or too much flow.
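The pump example can be sketched in a few lines of Python. This is only an illustration: the names `FailureMode` and `classify_flow`, and the idea of expressing the user-specified rate as a low and a high limit, are assumptions made here and do not come from any particular reliability tool.

```python
from enum import Enum


class FailureMode(Enum):
    """Failure modes for the liquid pump example (hypothetical names)."""
    NONE = "within acceptable limits"
    NO_FLOW = "no flow"
    LOW_FLOW = "too little flow"
    HIGH_FLOW = "too much flow"


def classify_flow(measured, low_limit, high_limit):
    """Compare a measured flow rate with the user-defined acceptable range.

    The result names only the manner of failure; it says nothing about
    the cause. All three values must be in the same unit.
    """
    if measured <= 0:
        return FailureMode.NO_FLOW      # complete loss of function (total failure)
    if measured < low_limit:
        return FailureMode.LOW_FLOW     # outside limits (partial failure)
    if measured > high_limit:
        return FailureMode.HIGH_FLOW    # outside limits (partial failure)
    return FailureMode.NONE
```

Because the limits are passed in rather than hard-coded, two identical pumps in different operating contexts can be given different acceptable ranges and therefore reach different failed states.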
An interesting point about failure modes is that they have to be caused by something. The failure mode is the manner in which the equipment or component failed, but what caused it? There needs to be a change of state from the desired to the undesired condition. We probably won’t know the root cause of this change until corrective activities have been completed, but we might be able to identify the mechanisms that could create the change that becomes the failure mode.
We can group failure mechanisms into two major categories: overstress and wear-out. An overstress failure arises because of a single load (stress) condition that exceeds a fundamental strength property. A wear-out failure arises as a result of cumulative damage related to loads (stresses) applied over an extended time. These two major categories of the failure mechanism can come from different sources that are the result of physical, chemical, thermodynamic, or other processes that lead to the failure.
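The difference between the two categories can be illustrated with a short Python sketch. It rests on simplifying assumptions that are not part of the discussion above: damage is taken to accumulate linearly, failure is declared when the accumulated fraction of life reaches 1.0, and `classify_mechanism` and `damage_per_load` are hypothetical names.

```python
def classify_mechanism(loads, strength, damage_per_load):
    """Walk through a load history and report which mechanism ends it.

    loads           -- load (stress) values in the order they were applied
    strength        -- load that causes failure in a single application
    damage_per_load -- hypothetical function mapping one load to the
                       fraction of life it consumes (assumed to add up
                       linearly, which is a simplification)

    Returns (mechanism, index of the failing load) or (None, None).
    """
    cumulative_damage = 0.0
    for index, load in enumerate(loads):
        # Overstress: one load exceeds the strength property.
        if load > strength:
            return ("overstress", index)
        # Wear-out: damage from many smaller loads adds up over time.
        cumulative_damage += damage_per_load(load)
        if cumulative_damage >= 1.0:
            return ("wear-out", index)
    return (None, None)
```

A history of moderate loads followed by one spike above `strength` is reported as overstress, while a long run of moderate loads alone eventually ends in wear-out.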
The different types of potential mechanisms include mechanical, electrical, material, instrumentation, or external influences. Mechanical mechanisms could include excessive deflection, buckling, ductile fracture, brittle fracture, impact, creep, relaxation, thermal shock, wear, and overheating, just to name a few. Electrical mechanisms could include short-circuiting, open circuit, no power or voltage, and faulty power. A material mechanism could take the form of corrosion, erosion, wear, breakage, or fatigue. The list is quite long but always points back to an overstress or wear-out condition for the equipment item or component.
As stated earlier, a functional failure is the inability of an equipment item or component to perform the expected function or meet the acceptable performance standards. This undesirable performance or failure will have an impact on the user-defined purpose of that function. This is called the failure effect, defined as the evidence, condition, or resulting influence on the equipment, system, or process that the failure mode creates.
A failure effect can be a visible or physical result that is easy to detect. Often, however, the failure effect is not immediately evident and requires special methods to identify it. The primary failure effect can also cause secondary or collateral damage.
A failure cause is the initiating event or root cause in the sequence leading up to an equipment item’s failure. If the cause element can be removed, prevented, mitigated, or kept within normal limits, the failure event will not happen. A maintenance plan will list common failure causes so that preventive or predictive controls can be implemented to minimize or eliminate the possibility of a failure mode occurring.
With historical data and record-keeping for assets, it is possible to track and trend failure causes to optimize the maintenance program. By reviewing the collected data, the value of current preventive maintenance (PM) and predictive maintenance (PdM) practices can be analyzed, adjusted, and improved.
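As a rough sketch of what tracking and trending can look like, the Python below counts failure causes overall and per reporting period. The record layout (a period label paired with a cause code) and the function names are assumptions for illustration; real history would normally come from an export of the maintenance records.

```python
from collections import Counter, defaultdict


def trend_failure_causes(records):
    """Count occurrences of each cause per period.

    records -- list of (period, cause_code) tuples, for example
               ("2023-01", "LUBRICATION"); this layout is hypothetical.
    Returns a mapping {cause_code: Counter({period: count})}.
    """
    trend = defaultdict(Counter)
    for period, cause in records:
        trend[cause][period] += 1
    return trend


def top_causes(records, n=3):
    """Return the n most frequent causes as (cause_code, count) pairs."""
    return Counter(cause for _, cause in records).most_common(n)
```

A cause whose count rises from period to period, or that stays at the top of the list despite an existing PM or PdM task, is a candidate for reviewing that task.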
There are multiple strategies for recording equipment historical data. One of the best is failure coding within a computerized maintenance management system (CMMS). Using failure codes limits the database size, so searches will be quicker. A menu-driven pick-list code structure standardizes data entry and provides uniformity within the data records. In addition to the coded fields, a free-text field describing the event adds unique details and enhances the data record’s quality.
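A coded record with a free-text field might be modeled as follows. This is a minimal sketch, not the schema of any particular CMMS: the code values, the field names, and the `FailureRecord` class are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical pick lists; a real site would define its own short lists.
FAILURE_MODE_CODES = {"NO_FLOW", "LOW_FLOW", "HIGH_FLOW"}
FAILURE_CAUSE_CODES = {"WEAR", "CORROSION", "OVERLOAD", "LUBRICATION"}


@dataclass(frozen=True)
class FailureRecord:
    """One failure event: coded fields plus a free-text description."""
    asset_id: str
    mode_code: str
    cause_code: str
    notes: str = ""  # free text for details the codes cannot capture

    def __post_init__(self):
        # Reject anything that is not on the pick lists so that
        # records stay uniform and searchable.
        if self.mode_code not in FAILURE_MODE_CODES:
            raise ValueError(f"unknown failure mode code: {self.mode_code}")
        if self.cause_code not in FAILURE_CAUSE_CODES:
            raise ValueError(f"unknown failure cause code: {self.cause_code}")
```

Validating against the pick lists at entry time keeps the coded fields consistent, while the `notes` field carries the event-specific detail.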
Annex B of ISO 14224:2016 provides very good insight and a baseline for failure coding, with examples of each element in the failure taxonomy. The standard, which can be adapted to many manufacturing organizations, provides a template for creating a custom failure code structure. Note that for a code pick list to be effective, it must be easy to use and not so long that a technician has to scroll through it looking for the best choice.
Keep the failure coding list short and generic enough to provide the needed level of detail to create relevant historical records. Rather than adding more menu choices, use the free-text fields for additional record detail.
Written by: Monroe Blanton, a Reliability Technician with Life Cycle Engineering, for Plant Services.