# Time series mining

## Energy monitoring of process systems: time-series segmentation-based targeting models

Energy monitoring systems calculate actual energy use, estimate energy needs under normal operation, track energy metrics, and highlight issues related to the energy efficiency of process plants. Analysis of key energy indicators (KEIs) allows the efficiency of a process to be compared across different operating regimes, and realistic KEI targets can be determined from the extracted knowledge. The performance of data-driven targeting models depends on how effectively the operating regimes are characterized. Until now, this modeling task has been performed manually, based on heuristic and subjective evaluation of the operation. A goal-oriented time-series segmentation technique has been developed to automate the selection of the data used to identify targeting models. With the proposed novel segmentation algorithm, targeting models for different operating regions can be determined automatically. The concept of the resulting energy monitoring system is demonstrated on the Heavy Naphtha Hydrotreater and CCR Reforming Units of MOL Hungarian Oil and Gas Company.
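
The sketch below illustrates a segment-wise targeting model. It assumes that segment boundaries are already known, for example from a segmentation step. It also assumes that energy use depends linearly on throughput within each operating regime. The function names `fit_targeting_models` and `kei_gap`, the linear model form, and the use of specific energy consumption (energy per unit throughput) as the KEI are illustrative assumptions, not the published model.

```python
import numpy as np


def fit_targeting_models(load, energy, boundaries):
    """Fit energy ~ a + b * load separately in each segment [start, end).

    Assumption (hypothetical): one linear targeting model per operating regime.
    """
    models = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        X = np.column_stack([np.ones(end - start), load[start:end]])
        coef, *_ = np.linalg.lstsq(X, energy[start:end], rcond=None)
        models.append((start, end, coef))
    return models


def kei_gap(load, energy, models):
    """Actual minus target KEI, with KEI = energy / load (specific consumption).

    Positive values flag samples that use more energy than their regime's target.
    """
    gap = np.zeros(len(energy))
    for start, end, (a, b) in models:
        target_energy = a + b * load[start:end]
        gap[start:end] = (energy[start:end] - target_energy) / load[start:end]
    return gap


# Hypothetical data: two operating regimes with different energy-load relations
load = np.concatenate([np.linspace(80, 100, 50), np.linspace(120, 140, 50)])
energy = np.where(np.arange(100) < 50, 10 + 0.5 * load, 30 + 0.4 * load)
models = fit_targeting_models(load, energy, [0, 50, 100])
gap = kei_gap(load, energy, models)
```

In practice the models would be fitted on reference periods of good operation and then applied to new data, so that a positive gap highlights an efficiency issue.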

J. Abonyi, T. Kulcsar, M. Balaton, L. Nagy, Energy monitoring of process systems: time-series segmentation-based targeting models, Clean Technologies and Environmental Policy 16 (2014) 1245–1253

## Fisher information matrix based time-series segmentation of process data

Advanced chemical process engineering tools, such as model predictive control or soft sensors, require proper process models. Parameter identification of these models needs input–output data with high information content. When model-based optimal experimental design techniques cannot be applied, extracting informative segments from historical data can also support system identification. We developed a goal-oriented, Fisher information based time-series segmentation algorithm aimed at selecting informative segments from historical process data. It builds on the standard bottom-up algorithm, which is widely used in the off-line analysis of process data. Different segments can support the identification of different parameter sets. Hence, instead of using D- or E-optimality as the criterion for comparing the information content of two input sequences (neighbouring segments), we propose using Krzanowski's similarity coefficient between the eigenvectors of the Fisher information matrices obtained from the sequences. The efficiency of the proposed methodology is demonstrated by two application examples. The algorithm is capable of extracting segments with parameter-set-specific information content from historical process data.
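
The comparison of neighbouring segments can be sketched for a model that is linear in its parameters, y = φᵀθ + e. For such a model, the Fisher information of a segment is proportional to ΦᵀΦ, where Φ stacks the regressor vectors. The sketch below makes several assumptions:

- the noise variance is one;
- the k leading eigenvectors are compared with Krzanowski's coefficient;
- the most similar neighbouring pair is merged greedily until the requested number of segments remains.

The cost 1 − similarity, the fixed-length initial segmentation and all function names are illustrative assumptions. The published algorithm may define these quantities differently.

```python
import numpy as np


def fisher_information(phi, noise_var=1.0):
    """Fisher information of theta for y = phi @ theta + e, e ~ N(0, noise_var)."""
    return phi.T @ phi / noise_var


def krzanowski_similarity(F1, F2, k):
    """(1/k) * trace(U^T V V^T U) for the k leading eigenvectors U, V of F1, F2."""
    U = np.linalg.eigh(F1)[1][:, ::-1][:, :k]  # eigh sorts eigenvalues ascending
    V = np.linalg.eigh(F2)[1][:, ::-1][:, :k]
    return float(np.trace(U.T @ V @ V.T @ U)) / k


def bottom_up(phi, n_segments, init_len=10, k=1):
    """Greedy bottom-up segmentation of the regressor matrix phi (rows = samples)."""
    bounds = list(range(0, len(phi), init_len)) + [len(phi)]
    while len(bounds) - 1 > n_segments:
        costs = []
        for i in range(len(bounds) - 2):
            F_left = fisher_information(phi[bounds[i]:bounds[i + 1]])
            F_right = fisher_information(phi[bounds[i + 1]:bounds[i + 2]])
            # Low cost: both segments inform about the same parameter directions
            costs.append(1.0 - krzanowski_similarity(F_left, F_right, k))
        del bounds[int(np.argmin(costs)) + 1]  # merge the most similar pair
    return bounds


# Hypothetical data: the first 50 samples mainly excite theta_1, the rest theta_2
phi = np.vstack([np.tile([1.0, 0.01], (50, 1)), np.tile([0.01, 1.0], (50, 1))])
print(bottom_up(phi, n_segments=2))
```

Because the cost only compares eigenvector directions, two segments are merged when they are informative about the same parameter combinations, regardless of how much data each contains.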

L. Dobos, J. Abonyi, Fisher information matrix based time-series segmentation of process data, Chemical Engineering Science 101 (2013) 99–108

The Matlab® implementation of the proposed method can be downloaded from here: Fisher_MATLAB_sources

## On-line detection of homogeneous operation ranges by dynamic principal component analysis based time-series segmentation

Development of chemical process technologies should be based on the analysis of process data. In process monitoring, recursive Principal Component Analysis (PCA) is widely applied to detect any misbehavior of the technology. The investigation of transient states requires dynamic PCA to describe the dynamic behavior more accurately. By integrating recursive and dynamic PCA into time-series segmentation techniques, efficient multivariate segmentation methods were obtained for detecting homogeneous operation ranges from process data. The similarity of time-series segments is evaluated with the Krzanowski similarity factor, which compares the hyperplanes determined by the PCA models. With the developed time-series segmentation framework, operating regimes can be separated to support process monitoring and control. The performance of the proposed methodology is demonstrated on a linear process and on the commonly applied Tennessee Eastman process.
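
The sketch below covers two of the ingredients named above, using NumPy: dynamic PCA applied to a time-lagged data matrix, and the Krzanowski similarity factor between the PCA hyperplanes of two segments. The lag order, the number of retained components k, the toy process and the function names are illustrative choices. The recursive (on-line) updating of the PCA models is not shown.

```python
import numpy as np


def lagged(X, lags):
    """Dynamic PCA data matrix: row t holds [x(t), x(t-1), ..., x(t-lags)]."""
    n = X.shape[0]
    return np.hstack([X[lags - l:n - l] for l in range(lags + 1)])


def pca_hyperplane(X, k):
    """Orthonormal basis of the k leading principal directions (covariance eigenvectors)."""
    eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))[1]
    return eigvecs[:, ::-1][:, :k]


def krzanowski(U, V):
    """S_PCA = trace(U^T V V^T U) / k: 1 for identical hyperplanes, 0 for orthogonal ones."""
    return float(np.trace(U.T @ V @ V.T @ U)) / U.shape[1]


def segment(x, gain, rng):
    """Hypothetical two-variable process y(t) = gain * x(t) plus small measurement noise."""
    return np.column_stack([x, gain * x + 0.01 * rng.standard_normal(len(x))])


rng = np.random.default_rng(0)
A = lagged(segment(rng.standard_normal(500), 2.0, rng), lags=1)
A2 = lagged(segment(rng.standard_normal(500), 2.0, rng), lags=1)
B = lagged(segment(rng.standard_normal(500), -2.0, rng), lags=1)
same = krzanowski(pca_hyperplane(A, 2), pca_hyperplane(A2, 2))
different = krzanowski(pca_hyperplane(A, 2), pca_hyperplane(B, 2))
print(round(same, 3), round(different, 3))
```

Two segments generated by the same process give a similarity close to one, while a change in the correlation structure lowers it. A segmentation algorithm can use this contrast to decide where one operation range ends.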

L. Dobos, J. Abonyi, On-line detection of homogeneous operation ranges by dynamic principal component analysis based time-series segmentation, Chemical Engineering Science 75 (2012) 96–105

## Correlation based dynamic time warping of multivariate time series

In recent years, dynamic time warping (DTW) has become one of the most widely used techniques for comparing time series when extensive a priori knowledge is not available. However, a multivariate comparison method is often expected to take the correlation between the variables into account, as this correlation carries the real information in many cases. Thus, principal component analysis (PCA) based similarity measures, such as the PCA similarity factor (SPCA), are used in many industrial applications.
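
For reference, the Euclidean distance based multivariate DTW mentioned above can be written as the classic dynamic-programming recursion. This is a textbook sketch without a warping window or step-pattern weights, not code from the paper. Series are NumPy arrays with one row per time step.

```python
import numpy as np


def dtw(A, B):
    """Classic DTW distance between multivariate series A (n x d) and B (m x d)."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(A[i - 1] - B[j - 1])  # Euclidean local distance
            # Extend the cheapest of the three allowed predecessor cells
            D[i, j] = local + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


A = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
B = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])  # A with its first sample repeated
print(dtw(A, B))
```

Every time step is compared only through a Euclidean distance, so this baseline ignores the correlation between the variables.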

In this paper, we present a novel algorithm called correlation based dynamic time warping (CBDTW), which combines DTW and PCA based similarity measures. To preserve correlation, multivariate time series are segmented, and the local dissimilarity function of DTW is derived from SPCA. The segments are obtained by bottom-up segmentation using special, PCA-related costs. Our technique is evaluated on two databases: the database of the Signature Verification Competition 2004 and the commonly used AUSLAN dataset. We show that CBDTW outperforms standard SPCA and the most commonly used, Euclidean distance based multivariate DTW on datasets with complex correlation structure.
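
The core idea can be sketched in three steps:

1. Cut each series into segments.
2. Summarise each segment by its leading principal directions.
3. Let DTW align the two segment sequences, using 1 − S_PCA as the local dissimilarity.

For brevity, this sketch uses fixed-length segments instead of the PCA-cost bottom-up segmentation described above. The function names and parameters (`seg_len`, `k`) are hypothetical.

```python
import numpy as np


def segment_directions(X, seg_len, k):
    """Leading k principal directions of consecutive fixed-length segments of X."""
    dirs = []
    for s in range(0, len(X) - seg_len + 1, seg_len):
        eigvecs = np.linalg.eigh(np.cov(X[s:s + seg_len], rowvar=False))[1]
        dirs.append(eigvecs[:, ::-1][:, :k])
    return dirs


def spca(U, V):
    """PCA similarity factor of two orthonormal bases with k columns each."""
    return float(np.trace(U.T @ V @ V.T @ U)) / U.shape[1]


def cbdtw(X, Y, seg_len=20, k=1):
    """DTW over segment sequences with 1 - S_PCA as the local dissimilarity."""
    P, Q = segment_directions(X, seg_len, k), segment_directions(Y, seg_len, k)
    D = np.full((len(P) + 1, len(Q) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(P) + 1):
        for j in range(1, len(Q) + 1):
            local = 1.0 - spca(P[i - 1], Q[j - 1])
            D[i, j] = local + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[-1, -1])


rng = np.random.default_rng(0)
x = rng.standard_normal(100)
X = np.column_stack([x, x + 0.1 * rng.standard_normal(100)])
x = rng.standard_normal(100)
Y = np.column_stack([x, x + 0.1 * rng.standard_normal(100)])   # same correlation as X
Z = np.column_stack([x, -x + 0.1 * rng.standard_normal(100)])  # opposite correlation
print(round(cbdtw(X, Y), 3), round(cbdtw(X, Z), 3))
```

Y and Z share the same first variable, yet Z is far from X because the sign of the correlation differs. A Euclidean local distance would not isolate this kind of difference.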

Z. Bankó, J. Abonyi, Correlation based dynamic time warping of multivariate time series, Expert Systems with Applications 39 (2012) 12814–12823

The Matlab® implementation of the proposed method can be downloaded from here: CbDTW_MATLAB_sources

## Dynamic Principal Component Analysis in Multivariate Time-Series Segmentation

Principal Component Analysis (PCA) based time-series analysis methods have become basic tools of every process engineer in the past few years, thanks to their efficiency and solid statistical basis. However, these methods have two drawbacks that must be taken into account. First, linear relationships are assumed between the process variables. Second, process dynamics are not considered. The authors previously presented a PCA based multivariate time-series segmentation method that addressed the first problem: nonlinear processes were split into locally linear segments by using the T² and Q statistics as cost functions. Building on this solution, we demonstrate how homogeneous operation ranges and changes in process dynamics can also be detected in dynamic processes. Our approach is examined in detail on simple theoretical processes and on the well-known pH process.

Z. Bankó, L. Dobos, J. Abonyi, Dynamic Principal Component Analysis in Multivariate Time-Series Segmentation, Conservation, Information, Evolution - Towards a Sustainable Engineering and Economy 1 (1) (2011) 11–24

## Fuzzy clustering based time-series segmentation

Partitioning a time series into internally homogeneous segments is an important data mining problem. The proposed method can effectively solve this problem. The changes of the variables of a multivariate time series are usually vague and do not focus on any particular time point, so it is not practical to define crisp segment bounds. Although fuzzy clustering algorithms are widely used to group overlapping and vague objects, they cannot be applied directly to time-series segmentation, because the clusters need to be contiguous in time. This paper proposes a clustering algorithm for the simultaneous identification of local Probabilistic Principal Component Analysis (PPCA) models and fuzzy sets. The PPCA models measure the homogeneity of the segments, and the fuzzy sets represent the segments in time. The algorithm favors clusters that are contiguous in time and is able to detect changes in the hidden structure of multivariate time series. A fuzzy decision-making algorithm based on a compatibility criterion of the clusters has been worked out to determine the required number of segments. The required number of principal components is determined from the scree plots of the eigenvalues of the fuzzy covariance matrices. The application example shows that this new technique is a useful tool for the analysis of historical process data.

J. Abonyi, B. Feil, S. Nemeth, P. Arva, Modified Gath–Geva clustering for fuzzy segmentation of multivariate time-series, Fuzzy Sets and Systems 149 (2005) 39–56

The Matlab® implementation of the proposed method can be downloaded from here: ppcatss_MATLAB_sources.

The help page of PPCA-TSS software is available here: Html help for PPCA-TSS in Matlab
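
The sketch below is a heavily simplified version of the idea, using NumPy. Each cluster combines a Gaussian membership function in time (centre and width) with a Gaussian model of the process variables. As a result, memberships are pushed to be contiguous in time while still reflecting the correlation structure. The sketch departs from the published method in three ways:

- it uses full covariance matrices instead of PPCA models;
- it uses plain probabilistic (EM-style) updates instead of the modified Gath–Geva iterations;
- it fixes the number of segments `c` instead of selecting it.

The function names and parameters are hypothetical.

```python
import numpy as np


def log_gauss(X, mean, cov):
    """Row-wise log-density of a multivariate normal distribution."""
    d = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d)
    return -0.5 * (maha + logdet + X.shape[1] * np.log(2 * np.pi))


def fuzzy_segment(X, c, n_iter=30, reg=1e-6):
    """Return an (n x c) fuzzy membership matrix of time-contiguous segments."""
    n, d = X.shape
    t = np.arange(n, dtype=float)
    # Start from evenly spaced Gaussian membership functions in time
    centres = (np.arange(c) + 0.5) * n / c
    U = np.exp(-0.5 * ((t[:, None] - centres) / (n / (2.0 * c))) ** 2)
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        w = U.sum(axis=0)
        log_p = np.empty((n, c))
        for i in range(c):
            ui = U[:, i]
            # Weighted statistics of cluster i in time ...
            vt = ui @ t / w[i]
            st = np.sqrt(ui @ (t - vt) ** 2 / w[i])
            # ... and in the space of the process variables (fuzzy covariance)
            mean = ui @ X / w[i]
            D = X - mean
            F = (ui[:, None] * D).T @ D / w[i] + reg * np.eye(d)
            log_time = -0.5 * ((t - vt) / st) ** 2 - np.log(st * np.sqrt(2 * np.pi))
            log_p[:, i] = np.log(w[i] / n) + log_time + log_gauss(X, mean, F)
        # Normalise in the log domain to avoid underflow
        U = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        U /= U.sum(axis=1, keepdims=True)
    return U


rng = np.random.default_rng(1)
x = rng.standard_normal(100)
y = np.where(np.arange(100) < 50, x, -x) + 0.1 * rng.standard_normal(100)
labels = fuzzy_segment(np.column_stack([x, y]), c=2).argmax(axis=1)
print(labels)
```

The eigenvalues of each cluster's weighted covariance matrix `F` (for example via `np.linalg.eigvalsh`) are the quantities a scree plot would display when choosing the number of principal components.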

## Monitoring process transitions by Kalman filtering and time-series segmentation

The analysis of historical process data of technological systems plays an important role in process monitoring, modelling and control. Time-series segmentation algorithms are often used to detect homogeneous periods of operation based on input–output process data. However, historical process data alone may not be sufficient for monitoring complex processes. This paper incorporates the first-principles model of the process into the segmentation algorithm. The key idea is to use a model-based nonlinear state-estimation algorithm to detect changes in the correlation among the state variables. The homogeneity of the time-series segments is measured using a PCA similarity factor calculated from the covariance matrices given by the state-estimation algorithm. The whole approach is applied to the monitoring of an industrial high-density polyethylene plant.
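
The monitoring idea can be illustrated with a linear Kalman filter standing in for the nonlinear, first-principles state estimator. The filter's posterior covariance matrices are averaged within each segment. Neighbouring segments are then compared with the PCA similarity factor of their leading eigenvectors. The following are all hypothetical:

- the two-state model;
- the switch in process-noise covariance `Q`, which plays the role of an operating-regime change;
- the fixed segment length;
- all names.

For a linear filter the covariance does not depend on the measurements. With a nonlinear estimator, the model linearisation makes it state dependent.

```python
import numpy as np


def kalman_step(x, P, z, A, H, Q, R):
    """One predict/update cycle of a linear Kalman filter; returns the posterior (x, P)."""
    x, P = A @ x, A @ P @ A.T + Q              # predict
    S = H @ P @ H.T + R                        # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
    x = x + K @ (z - H @ x)                    # measurement update
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P


def spca(C1, C2, k=1):
    """PCA similarity factor between the k leading eigenvectors of two covariance matrices."""
    U = np.linalg.eigh(C1)[1][:, ::-1][:, :k]
    V = np.linalg.eigh(C2)[1][:, ::-1][:, :k]
    return float(np.trace(U.T @ V @ V.T @ U)) / k


rng = np.random.default_rng(0)
A, H, R = 0.5 * np.eye(2), np.eye(2), 0.1 * np.eye(2)
Q_regimes = [np.diag([1.0, 0.01]), np.diag([0.01, 1.0])]  # hypothetical regime switch at step 50
x_true, x_est, P = np.zeros(2), np.zeros(2), np.eye(2)
covs = []
for step in range(100):
    Q = Q_regimes[0] if step < 50 else Q_regimes[1]
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    z = H @ x_true + rng.multivariate_normal(np.zeros(2), R)
    x_est, P = kalman_step(x_est, P, z, A, H, Q, R)
    covs.append(P)

# Average the posterior covariance over fixed segments of 25 steps and compare neighbours
seg_cov = [np.mean(covs[s:s + 25], axis=0) for s in range(0, 100, 25)]
sims = [spca(seg_cov[i], seg_cov[i + 1]) for i in range(3)]
print([round(s, 3) for s in sims])
```

A drop in similarity between neighbouring segments marks the transition, which is the signal a segmentation algorithm would use to place a boundary.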

B. Feil, J. Abonyi, S. Nemeth, P. Arva, Monitoring process transitions by Kalman filtering and time-series segmentation, Computers and Chemical Engineering 29 (2005) 1423–1431