Modified Gath-Geva Clustering for Fuzzy Segmentation of Multivariate Time-series

PPCA-TSS method

Partitioning a time-series into internally homogeneous segments is an important data mining problem.

The PPCA-TSS method addresses this problem with a modified Gath–Geva fuzzy clustering algorithm.

The changes of the variables of a multivariate time-series are usually vague and do not focus on any particular time point. Therefore, it is not practical to define crisp bounds of the segments. Although fuzzy clustering algorithms are widely used to group overlapping and vague objects, they cannot be directly applied to time-series segmentation, because the clusters need to be contiguous in time. This paper proposes a clustering algorithm for the simultaneous identification of local Probabilistic Principal Component Analysis (PPCA) models, used to measure the homogeneity of the segments, and fuzzy sets, used to represent the segments in time. The algorithm favors clusters that are contiguous in time and is able to detect changes in the hidden structure of multivariate time-series. A fuzzy decision-making algorithm based on a compatibility criterion of the clusters has been worked out to determine the required number of segments, while the required number of principal components is determined from the scree plots of the eigenvalues of the fuzzy covariance matrices. The application example shows that this new technique is a useful tool for the analysis of historical process data.
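To make the idea of fuzzy sets representing segments in time more concrete, the following is a minimal Python sketch, not the Matlab implementation referred to on this page. It assumes, for illustration only, that each segment is described by a Gaussian membership function over time with a center and a width, and that memberships are normalized so they sum to one at every time point. The function names, the three segment centers, and the widths are hypothetical values chosen for the example.

```python
import math


def gaussian_membership(t, center, width):
    """Unnormalized Gaussian membership of time t in one segment (assumed form)."""
    return math.exp(-0.5 * ((t - center) / width) ** 2)


def fuzzy_segments(times, centers, widths):
    """Return memberships[k][i]: degree to which times[k] belongs to segment i.

    Each row is normalized so that the memberships of one time point sum to 1.
    """
    memberships = []
    for t in times:
        raw = [gaussian_membership(t, c, w) for c, w in zip(centers, widths)]
        total = sum(raw)
        memberships.append([r / total for r in raw])
    return memberships


# Hypothetical example: 101 time points and three segments.
times = list(range(0, 101))
centers = [20.0, 50.0, 80.0]   # assumed segment centers in time
widths = [10.0, 10.0, 10.0]    # assumed segment widths

U = fuzzy_segments(times, centers, widths)

# Inspect a point inside the first segment, a point halfway between the
# first two segments, and a point inside the second segment.
for t in (20, 35, 50):
    print(t, [round(u, 3) for u in U[t]])
```

In this sketch a time point near a segment center belongs almost entirely to that segment, while a point halfway between two centers is shared between them, which is the sense in which the segment bounds are fuzzy rather than crisp. Because each membership function has a single peak in time, the resulting segments are contiguous.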

J. Abonyi, B. Feil, S. Nemeth, P. Arva, Modified Gath–Geva clustering for fuzzy segmentation of multivariate time-series, Fuzzy Sets and Systems 149 (2005) 39–56

The Matlab® implementation of the proposed method can be downloaded from here.

HTML help for PPCA-TSS in Matlab