Alarm management

Quality vs. quantity of alarm messages - How to measure the performance of an alarm system

Despite significant efforts to measure and assess the performance of alarm systems, no silver bullet has been found to this day. The majority of the existing standards and guidelines focus on the alarm load of the operators, either during normal or upset plant conditions, and only a small fraction take into consideration the actions performed by the operators. In this study, an overview of the evolution of alarm system performance metrics is presented, and the current data-based approaches are grouped into seven categories based on the goals of and the methodologies associated with each metric. Building on this categorical overview, we reflect on the terminological differences between the academic and industrial approaches to alarm system performance measurement. Moreover, we highlight how heavily the performance measurement of alarm systems is skewed towards quantitative metrics at the expense of qualitative assessment, and we point out the threat of excessive alarm reduction that results from such a one-sided approach. The critical aspects of the qualitative performance measurement of alarm systems are demonstrated by comparing the alarm system of an industrial hydrofluoric acid alkylation unit before and after the alarm rationalization process. The quality of the alarm messages is measured via their informativeness and actionability, in other words, how appropriate the parameter settings are for everyday work and how actionable the alarms are for the operators of the process.

Decision trees for informative process alarm definition and alarm-based fault classification

Alarm messages in industrial processes are designed to draw attention to abnormalities that require timely assessment or intervention. However, in practice, alarms are arbitrarily and excessively defined by process operators, resulting in numerous nuisance and chattering alarms that are simply a source of distraction. Countless techniques are available for the retrospective filtering of alarm data, e.g., adding time delays and deadbands to existing alarm settings. As an alternative, instead of filtering or modifying existing alarms, the present paper proposes a method for designing alarm messages that are informative for fault detection, on the premise that alarm messages should be optimal for fault detection and identification from the outset. This methodology utilizes a machine learning technique, the decision tree classifier, which provides linguistically well-interpretable models without modifying the measured process variables. Furthermore, an online application of the defined alarm messages for fault identification is presented using a sliding window-based data preprocessing approach. The effectiveness of the proposed methodology is demonstrated through the analysis of a well-known benchmark simulator of a vinyl acetate production technology, where the complexity of the simulator is considered to be sufficient for the testing of alarm systems.

Note to practitioners: Process-specific knowledge can be used to label historical process data as normal operating or fault-specific periods. Alarm generation should be designed to detect and isolate faulty states. Using decision trees, optimal "cuts", i.e., alarm limits for the purpose of fault classification, can be defined from a labelled dataset. The results apply to a variety of industries operating with online control systems and are especially timely in the chemical industry.
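The idea of reading an alarm limit off a decision tree can be illustrated with a single split. The following is a minimal sketch, not the implementation used in the paper: it searches for the one threshold that best separates labelled normal and fault samples of a single variable by weighted Gini impurity, which is what the root node of a decision tree classifier does. The variable name, the temperature values and the labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_cut(values, labels):
    """Return (threshold, weighted_gini) of the best single split x <= threshold."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [label for _, label in pairs]
    n = len(xs)
    best = (None, gini(ys))  # no split at all is the baseline
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # no threshold fits between identical values
        threshold = (xs[i] + xs[i - 1]) / 2.0
        score = (i * gini(ys[:i]) + (n - i) * gini(ys[i:])) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical labelled history of one process variable (temperature, °C).
temperature = [150.2, 151.0, 149.8, 150.5, 158.9, 160.1, 159.4, 161.0]
state = ["normal"] * 4 + ["fault"] * 4

limit, impurity = best_cut(temperature, state)
print(f"candidate high-alarm limit: {limit:.2f} (weighted Gini {impurity:.2f})")
```

A full decision tree repeats this search recursively over all variables, so each internal node yields one candidate alarm limit and each leaf one fault class.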


Learning and predicting operation strategies by sequence mining and deep learning (full paper)

The operators of chemical technologies are frequently faced with the problem of determining optimal interventions. Our aim is to develop data-driven models by exploring the consequential relationships in the alarm and event-log database of industrial systems. Our motivation is twofold: (1) to facilitate the work of the operators by predicting future events and (2) to analyse how consistent the event series is. The core idea is that machine learning algorithms can learn sequences of events by exploring connected events in databases. First, frequent sequence mining is utilised to determine how the event sequences evolve during the operation. Second, a sequence-to-sequence deep learning model is proposed for their prediction. The long short-term memory (LSTM) unit-based model is capable of evaluating rare operation situations and their consequential events. The performance of this methodology is demonstrated through the analysis of the alarm and event-log database of an industrial delayed coker unit.

Dörgő Gy., Abonyi J.: "Learning and predicting operation strategies by sequence mining and deep learning", Computers & Chemical Engineering, Volume 128, 2 September 2019, Pages 174-187
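The two steps described in this abstract, mining frequent event sequences and predicting the events that follow, can be sketched with a simple frequency-based baseline. This is an illustrative stand-in under stated assumptions, not the sequence mining algorithm or the LSTM model of the paper: patterns are restricted to contiguous events, prediction is a plain lookup of the most frequent follower, and all event tags are hypothetical.

```python
from collections import Counter

def frequent_sequences(logs, length, min_support):
    """Contiguous event patterns of the given length that occur in at
    least min_support logs (each log is counted at most once)."""
    counts = Counter()
    for log in logs:
        seen = {tuple(log[i:i + length]) for i in range(len(log) - length + 1)}
        counts.update(seen)
    return {p: c for p, c in counts.items() if c >= min_support}

def predict_next(logs, prefix):
    """Most frequent event that directly follows `prefix`, or None."""
    k = len(prefix)
    followers = Counter()
    for log in logs:
        for i in range(len(log) - k):
            if tuple(log[i:i + k]) == tuple(prefix):
                followers[log[i + k]] += 1
    return followers.most_common(1)[0][0] if followers else None

# Hypothetical event logs: alarms, returns to normal (RTN), operator actions (OP).
logs = [
    ["FI101_LO", "OP_OPEN_V1", "FI101_RTN"],
    ["FI101_LO", "OP_OPEN_V1", "FI101_RTN", "TI205_HI"],
    ["TI205_HI", "OP_CLOSE_V2", "TI205_RTN"],
    ["FI101_LO", "OP_OPEN_V1", "TI205_HI"],
]

patterns = frequent_sequences(logs, length=2, min_support=3)
suggestion = predict_next(logs, ["FI101_LO"])
print(patterns, suggestion)
```

A learned sequence-to-sequence model replaces the lookup in `predict_next` and can generalise to prefixes that never occurred verbatim in the log, which a frequency table cannot do.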

Hierarchical frequent sequence mining algorithm for the analysis of alarm cascades in chemical processes

Faults and malfunctions in complex chemical production systems generate alarm cascades that hinder the work of the operators and make fault diagnosis a complex and challenging task. The core concept of the present work is the incorporation of the hierarchical structure of the technology into a multi-temporal sequence mining algorithm to group the large number of variables. The spreading of the effect of malfunctions over the plant can be traced thoroughly on the higher levels of the hierarchy, while the critical elements of the spillover effect can be detected on the lower levels. Confidence-based goal-oriented measures are proposed to describe the orientation of fault propagation, providing good insight into the causality on a local level of the process, while the network-based representation yields a global view of causal connections. The effectiveness of the proposed methodology is demonstrated through the analysis of the alarm and event-log database of an industrial delayed coker plant, where the complexity of the problem and the size of the event-log database require a hierarchical constraint-based representation.
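How a plant hierarchy and a confidence measure interact can be sketched as follows. The abstract does not define the measures, so the confidence used here (the fraction of events of one kind that are followed by an event of another kind within a time window) is an assumption, as are the tag names, the two-level hierarchy and the window length.

```python
def lift(events, hierarchy):
    """Replace each tag with its parent unit in the plant hierarchy."""
    return [(t, hierarchy[tag]) for t, tag in events]

def confidence(events, a, b, window):
    """Fraction of `a` events followed by a `b` event within `window` time units."""
    a_times = [t for t, e in events if e == a]
    b_times = [t for t, e in events if e == b]
    if not a_times:
        return 0.0
    hits = sum(any(0 < tb - ta <= window for tb in b_times) for ta in a_times)
    return hits / len(a_times)

# Hypothetical hierarchy (tag -> unit) and alarm log of (time, tag) pairs.
hierarchy = {"TI101": "furnace", "PI102": "furnace",
             "LI201": "drum", "FI202": "drum"}
events = [(0, "TI101"), (5, "LI201"), (60, "PI102"), (66, "FI202"),
          (120, "LI201"), (200, "TI101"), (204, "FI202")]

units = lift(events, hierarchy)
forward = confidence(units, "furnace", "drum", window=10)
backward = confidence(units, "drum", "furnace", window=10)
tag_level = confidence(events, "TI101", "LI201", window=10)
print(forward, backward, tag_level)
```

In this toy log the propagation from furnace to drum is unambiguous at the unit level (forward 1.0, backward 0.0), whereas the same relationship is diluted at the tag level (0.5), which illustrates why grouping variables by hierarchy can make cascades easier to trace.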


Understanding the importance of process alarms based on the analysis of deep recurrent neural networks trained for fault isolation

The identification of process faults is a complex and challenging task due to the large number of alarms and warnings generated by control systems. To extract information about the relationships between these discrete events, we utilise multi-temporal sequences of alarm and warning signals as inputs of a recurrent neural network (RNN) based classifier and visualise the network by principal component analysis. The similarity of the events and their applicability in fault isolation can be evaluated based on the linear embedding layer of the network, which maps the input signals into a continuous-valued vector space. The method is demonstrated on a simulated vinyl acetate production technology. The results illustrate that RNN-based sequence learning not only enables accurate fault classification but also yields a model whose visualisation can give useful hints for hazard analysis.

The manuscript and the Python code can be downloaded from our GitHub repository.

Towards Operator 4.0, Increasing Production Efficiency and Reducing Operator Workload by Process Mining of Alarm Data

A methodology to extract temporal patterns of alarm sequences and operator actions from the log files of alarm management systems is proposed. First, the algorithm identifies time segments that are informative from the viewpoint of operator interventions. These segments include series of alarms that initiate operator actions, sets of operator actions, and a period that potentially covers the effects of the corrective actions of the operators. In the second step of the methodology, the sets of operator actions that are frequently applied in the same situations are determined. For this purpose, the FP-Growth algorithm, one of the fastest tools of frequent item-set mining, is utilized. It generates well-structured action trees that are not only suitable for the visualization of interventions but also lend themselves to building association rules that could be directly applied in decision support systems. Finally, multi-temporal sequence mining is applied to reveal which alarms led to the sets of operator actions and what the effects of these interventions were. The applicability of the methodology is illustrated by presenting results from the analysis of the delayed coker plant at the Danube Refinery of the MOL Group.
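The second step, finding frequently co-occurring operator actions and turning them into association rules, can be illustrated on a handful of intervention segments. The sketch below deliberately uses brute-force enumeration instead of FP-Growth; it returns the same frequent item sets but scales exponentially with the number of actions per segment, so it is only suitable for small examples. The action names and the support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Count every subset of each action set and keep the frequent ones.
    Brute force: a stand-in for FP-Growth on small data only."""
    counts = Counter()
    for actions in transactions:
        items = sorted(set(actions))
        for size in range(1, len(items) + 1):
            counts.update(combinations(items, size))
    return {s: c for s, c in counts.items() if c >= min_support}

def rule_confidence(itemsets, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent).
    Raises KeyError if the combined item set is not frequent."""
    both = tuple(sorted(set(antecedent) | set(consequent)))
    return itemsets[both] / itemsets[tuple(sorted(antecedent))]

# Hypothetical sets of operator actions, one per intervention segment.
segments = [
    {"open_V1", "raise_SP_FC1"},
    {"open_V1", "raise_SP_FC1", "start_P2"},
    {"open_V1", "start_P2"},
    {"close_V3"},
    {"open_V1", "raise_SP_FC1"},
]

frequent = frequent_itemsets(segments, min_support=3)
conf = rule_confidence(frequent, ["open_V1"], ["raise_SP_FC1"])
print(frequent, conf)
```

Here the rule "open_V1 implies raise_SP_FC1" holds in three of the four segments that contain open_V1, giving a confidence of 0.75; rules of this kind are what a decision support system could present as a suggested follow-up action.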

Learning operation strategies from alarm management systems by temporal pattern mining and deep learning

We introduce a sequence-to-sequence deep learning algorithm to learn and predict sequences of process alarms and warnings. The proposed recurrent neural network model utilizes an encoder layer of Long Short-Term Memory (LSTM) units to map the input sequence of discrete events into a vector of fixed dimensionality, and a decoder LSTM layer to form a prediction of the sequence of future events. We demonstrate that the information extracted by this model from alarm log databases can be used to suppress alarms with low information content, which reduces the operator workload. To generate easily reproducible results and stimulate the development of alarm management algorithms, we define an alarm management benchmark problem based on the simulator of a vinyl acetate production technology. The results confirm that sequence-to-sequence learning is a useful tool in alarm rationalization and, more generally, for process engineers interested in predicting the occurrence of discrete events.

Sequence Mining based Alarm Suppression

Despite the rapid improvement of industrial process automation, the management of abnormal events still requires human actions. Alarm systems are becoming crucial in providing situation-specific information to the decreasing number of operators. The key role of an alarm management system is to ensure that only the currently significant alarms are annunciated. The design of alarm suppression rules requires the systematic analysis of the process and its control system. We give an overview of the recently developed data-driven techniques and show that the widely applied correlation-based methods utilize a static view of the system. To provide more insight into the process dynamics and represent the temporal relationships among faults, control actions, and process variables, we propose a multi-temporal sequence mining-based algorithm. The methodology starts with the generation of frequent temporal patterns of the alarm signals. We transform the multi-temporal sequences into Bayes classifiers. The obtained association rules can be used to define the alarm suppression rules. We analyze the data set of a laboratory-scale water treatment testbed to illustrate that multi-temporal sequences are applicable for the description of operation patterns. We extended the benchmark simulator of a vinyl acetate production technology to generate easily reproducible results and stimulate the development of alarm management algorithms. The results of detailed sensitivity analyses confirm the benefits of applying temporal alarm suppression rules, which reflect the dynamical behavior of the process.

For the extended simulator of the vinyl acetate production technology and the source code of the Bayes' theorem-based evaluation of sequences, see: HTTPS://GITHUB.COM/ABONYILAB/VACSIMULATOR

The MATLAB implementation of the sequence mining algorithm is available at: HTTPS://GITHUB.COM/ABONYILAB/MULTI-TEMPORAL-SEQUENCE-MINING
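The principle of a temporal suppression rule can be sketched in a few lines. This is a simplified illustration and not the MATLAB implementation linked above: the conditional probability is estimated by simple counting over a fixed time window, a rule is accepted above an assumed threshold of 0.9, and the alarm tags and timestamps are hypothetical.

```python
def conditional_probability(events, a, b, window):
    """Estimate P(b occurs within `window` after a | a) from (time, tag) events."""
    a_times = [t for t, tag in events if tag == a]
    b_times = [t for t, tag in events if tag == b]
    if not a_times:
        return 0.0
    hits = sum(any(0 < tb - ta <= window for tb in b_times) for ta in a_times)
    return hits / len(a_times)

def suppress(stream, rules, window):
    """Drop an alarm when a rule (cause, effect) names it as the effect and
    the cause was seen at most `window` time units earlier."""
    last_seen = {}
    shown = []
    for t, tag in stream:
        predictable = any(
            cause in last_seen and 0 < t - last_seen[cause] <= window
            for cause, effect in rules if effect == tag
        )
        last_seen[tag] = t
        if not predictable:
            shown.append((t, tag))
    return shown

# Hypothetical historical log used to derive the rule.
history = [(0, "PUMP_TRIP"), (3, "FLOW_LO"), (50, "PUMP_TRIP"), (52, "FLOW_LO"),
           (90, "FLOW_LO"), (130, "PUMP_TRIP"), (134, "FLOW_LO")]
WINDOW = 5
THRESHOLD = 0.9  # assumed acceptance level for a suppression rule

rules = set()
if conditional_probability(history, "PUMP_TRIP", "FLOW_LO", WINDOW) >= THRESHOLD:
    rules.add(("PUMP_TRIP", "FLOW_LO"))

stream = [(200, "PUMP_TRIP"), (203, "FLOW_LO"), (260, "FLOW_LO")]
shown = suppress(stream, rules, WINDOW)
print(shown)
```

The low-flow alarm at time 203 is suppressed because it is a predictable consequence of the pump trip, while the one at time 260 is still annunciated because no cause occurred within the window. A static correlation-based rule could not make this distinction.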

Detection of Safe Operating Regions - a Novel Dynamic Process Simulator Based Predictive Alarm Management Approach

The operation of complex production processes is one of the most important research and development problems in process engineering. A Safety Instrumented System (SIS) performs specified functions to achieve or maintain a safe state of the process when unacceptable or dangerous process conditions are detected. The safe state is a state of the process operation where the hazardous event cannot occur. The set of safe states defines the safe operating regions. A logic solver is required to receive the sensor input signal(s), make appropriate decisions based on the nature of the signal(s), and change its outputs according to user-defined logic. Next, the change of the logic solver output(s) results in the final element(s) taking action on the process (e.g. closing a valve) to bring it (back) to a safe state. Alarm management is a powerful tool to support the work of the operators in keeping the process within safe operating regions and in detecting process malfunctions. Predictive alarm management systems should not only detect alarms early but also advise process operators which safety action (or safety element(s)) must be applied. The aim of this paper is to develop a novel methodology and toolkit to support these tasks. The essence of the proposed methodology is the simulation of the effect of safety elements over a prediction horizon. Since different manipulations require different amounts of time to prevent an unsafe situation from evolving (safety time), the process operators should know which safety action(s) should be taken within a given time. For this purpose, a method for model-based predictive stability analysis has been developed based on Lyapunov's stability analysis of simulated state trajectories. The introduced algorithm can be applied to explore the stable and unstable operating regimes of a process (set of safe states), and this information can be used for predictive alarm management. The developed methodology has been applied to two industrial benchmark problems related to the thermal runaway of reactors.
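The notion of classifying initial states as safe or unsafe by simulating trajectories and checking a Lyapunov function can be sketched on a toy model. Everything below is an assumption made for illustration: the scalar model dx/dt = -x + x^3 (a cooling term against a self-accelerating heat-release term, with x the deviation from the set point), the Lyapunov candidate V(x) = x^2/2, the explicit Euler integrator and the prediction horizon. It is not the reactor model or the algorithm of the paper.

```python
def toy_reactor(x):
    """Toy dynamics: linear cooling (-x) against cubic heat release (x**3)."""
    return -x + x ** 3

def is_safe(x0, rhs, dt=0.01, steps=500):
    """True if V(x) = x**2 / 2 strictly decreases at every step of the
    simulated trajectory over the prediction horizon dt * steps."""
    x = x0
    v = 0.5 * x * x
    for _ in range(steps):
        x = x + dt * rhs(x)       # explicit Euler step
        v_next = 0.5 * x * x
        if v_next >= v:           # V failed to decrease: flag as unsafe
            return False
        v = v_next
    return True

# Scan a hypothetical grid of initial deviations from the set point.
grid = [0.2, 0.5, 0.8, 1.2, 1.5]
safe_region = [x0 for x0 in grid if is_safe(x0, toy_reactor)]
print(safe_region)
```

For this model the scan recovers the analytical result that deviations below 1 decay while larger ones run away. Stopping at the first increase of V also prevents the integration from overflowing once a runaway has started. In a predictive alarm setting, the same scan would be repeated with each candidate safety action applied in the model to see which action still returns the state to the safe region in time.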