Frequent Sequence, Itemset and Association Rule Mining

Sequence Mining based Alarm Suppression

To provide more insight into process dynamics and to represent the temporal relationships among faults, control actions and process variables, we propose a multi-temporal sequence mining based algorithm. The methodology starts with the generation of frequent temporal patterns of the alarm signals. The multi-temporal sequences are then transformed into Bayes classifiers, and the resulting association rules can be used to define alarm suppression rules. We analyzed the dataset of a laboratory-scale water treatment testbed to illustrate that multi-temporal sequences are applicable to the description of operation patterns. We extended the benchmark simulator of a vinyl acetate production technology to generate easily reproducible results and to stimulate the development of alarm management algorithms. The results of detailed sensitivity analyses confirm the benefits of applying temporal alarm suppression rules that reflect the dynamic behaviour of the process.
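The idea of turning frequent temporal alarm patterns into suppression rules can be illustrated with a much simpler stand-in than the multi-temporal algorithm itself. The sketch below is an assumption-laden toy in Python, not the published MATLAB implementation. It only counts how often one alarm is followed by another within a fixed time window and estimates the conditional frequency of the follower. The function name `mine_follow_rules`, its parameters and the toy alarm log are all hypothetical.

```python
from collections import defaultdict

def mine_follow_rules(events, window, min_support, min_confidence):
    """Find pairwise rules 'alarm a is followed by alarm b within window'.

    events: list of (timestamp, alarm_tag) tuples sorted by timestamp.
    Returns {(a, b): confidence}, where confidence is the fraction of
    occurrences of a that were followed by b inside the window.
    """
    count_a = defaultdict(int)    # occurrences of each antecedent alarm
    count_ab = defaultdict(int)   # occurrences of a followed by b in the window
    for i, (t_a, a) in enumerate(events):
        count_a[a] += 1
        seen = set()              # count each follower tag once per antecedent
        for t_b, b in events[i + 1:]:
            if t_b - t_a > window:
                break
            if b != a and b not in seen:
                seen.add(b)
                count_ab[(a, b)] += 1
    rules = {}
    for (a, b), n_ab in count_ab.items():
        confidence = n_ab / count_a[a]
        if n_ab >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules

# Hypothetical alarm log: (time, tag)
events = [(0, "A"), (2, "B"), (10, "A"), (11, "B"),
          (20, "A"), (30, "C"), (31, "B")]
rules = mine_follow_rules(events, window=5, min_support=2, min_confidence=0.6)
for (a, b), conf in rules.items():
    print(f"{a} -> {b} within 5 time units, confidence {conf:.2f}")
```

A rule with high confidence, such as "A is followed by B", marks B as a candidate for suppression while A is active, because B then carries little new information for the operator. The actual method works on multi-temporal sequences and evaluates them with Bayes' theorem, which this pairwise count does not reproduce.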

These files are the supplementary materials of our paper, to be published in IEEE Access, 2018. For the extended simulator of the vinyl acetate production technology and the source code of the Bayes’ theorem-based evaluation of sequences, see: HTTPS://GITHUB.COM/ABONYILAB/VACSIMULATOR

The MATLAB implementation of the sequence mining algorithm is available at: HTTPS://GITHUB.COM/ABONYILAB/MULTI-TEMPORAL-SEQUENCE-MINING

Fuzzy association rule mining for feature and model structure selection

Effective methods for feature and model structure selection are very important for data-driven modeling and system identification tasks. A new method was developed for selecting important variables in nonlinear (dynamic) models with mixed discrete (categorical, fuzzy) and continuous inputs and outputs. The method applies fuzzy association rule mining, and the selection of important variables (the model structure) is based on two rule interestingness measures. The method is able to select the most relevant variables in nonlinear feature selection problems. Moreover, it selects the right model order of strongly nonlinear dynamical systems, so it can be a very efficient tool for process modeling.
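To make the idea of ranking variables by rule interestingness concrete, the following Python sketch computes fuzzy support and fuzzy confidence for rules of the form "input is high -> output is high". It is only an illustration under stated assumptions. The text above does not name the two interestingness measures, so support and confidence are used here as common stand-ins. The product t-norm, the triangular membership function, the helper names and the toy data are likewise hypothetical choices.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_support(memberships):
    """memberships: one list of membership degrees per item of the itemset.

    Combines the degrees of each record with the product t-norm and
    averages the result over all records.
    """
    n_records = len(memberships[0])
    total = 0.0
    for record in zip(*memberships):
        degree = 1.0
        for m in record:
            degree *= m
        total += degree
    return total / n_records

def fuzzy_confidence(antecedent, consequent):
    """Confidence of antecedent -> consequent: supp(A and C) / supp(A)."""
    supp_a = fuzzy_support(antecedent)
    if supp_a == 0:
        return 0.0
    return fuzzy_support(antecedent + consequent) / supp_a

def high(values):
    """Degrees of membership in a hypothetical fuzzy set 'high' on [0, 1]."""
    return [triangular(v, 0.0, 1.0, 2.0) for v in values]

# Toy data: y follows x1 closely and is unrelated to x2.
x1 = [0.1, 0.2, 0.8, 0.9]
x2 = [0.9, 0.1, 0.8, 0.2]
y = [0.15, 0.25, 0.85, 0.8]

conf_x1 = fuzzy_confidence([high(x1)], [high(y)])
conf_x2 = fuzzy_confidence([high(x2)], [high(y)])
print(f"x1 high -> y high: {conf_x1:.4f}")
print(f"x2 high -> y high: {conf_x2:.4f}")
```

In this toy case the rule built on x1 has the higher confidence, which is the kind of signal a rule-based selection procedure can use to prefer x1 over x2 as a model input.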

F. P. Pach, A. Gyenesei and J. Abonyi, MOSSFARM: Model structure selection by fuzzy association rule mining, Journal of Intelligent and Fuzzy Systems, pp. 399-407 (2008)


Compact and accurate fuzzy classifiers can be constructed by fuzzy association rule mining

Interpretability and accuracy are critical issues in many classification applications. Associative classifiers can achieve high accuracy, but their predictions are based on very large sets of rules. In contrast, a new method was developed that produces fuzzy classifier systems that are both very compact and accurate. It therefore helps to understand the relationships in the data and the prediction mechanism in several types of classification problems.

F. P. Pach, A. Gyenesei and J. Abonyi, Compact fuzzy association rule-based classifier, Expert Systems with Applications, 34(4), pp. 2406-2416 (2008)

Bit-Table Based Biclustering and Frequent Closed Itemset Mining in High-Dimensional Binary Data

During the last decade, various algorithms have been developed and proposed for discovering overlapping clusters in high-dimensional data. The two most prominent application fields in this research, proposed independently, are frequent itemset mining (developed for market basket data) and biclustering (applied to gene expression data analysis). The common limitation of both methodologies is their limited applicability to very large binary data sets. In this paper we propose a novel and efficient method to find both frequent closed itemsets and biclusters in high-dimensional binary data. The method is based on simple but very powerful matrix and vector multiplication approaches that ensure that all patterns can be discovered quickly. The proposed algorithm has been implemented in the commonly used MATLAB environment.
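The role of matrix multiplication in bit-table based mining can be shown with a small Python sketch. It is not the Bittable_TID algorithm. It only illustrates the counting step for item pairs, assuming a binary table B with one row per transaction and one column per item, so that entry (i, j) of the product of B transposed and B is the number of transactions containing both items. Closedness checking and bicluster extraction are left out, and the function names and the toy table are hypothetical.

```python
def cooccurrence(bit_table):
    """Return B^T B for a binary transactions-by-items table B.

    Entry [i][j] counts the transactions that contain both item i and
    item j; the diagonal holds the support of each single item.
    """
    columns = list(zip(*bit_table))   # one bit vector per item
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]

def frequent_pairs(bit_table, min_support):
    """Return {(i, j): support} for item pairs with support >= min_support."""
    counts = cooccurrence(bit_table)
    n_items = len(counts)
    return {(i, j): counts[i][j]
            for i in range(n_items)
            for j in range(i + 1, n_items)
            if counts[i][j] >= min_support}

# Hypothetical market basket data: rows are baskets, columns are items 0..3.
bit_table = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
]
pairs = frequent_pairs(bit_table, min_support=3)
print(pairs)
```

Because each item is stored as a bit vector, the support of a larger itemset can be obtained in the same spirit by multiplying the vectors of its items element-wise and summing the result.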

A. Király, A. Gyenesei, J. Abonyi, Bit-Table Based Biclustering and Frequent Closed Itemset Mining in High-Dimensional Binary Data, The Scientific World Journal, vol. 2014, Article ID 870406, 7 pages

Bittable_TID - a Bit-Table based biclustering tool

Bittable_TID is a biclustering tool written in MATLAB. It provides a fast solution for finding all biclusters within a binary data matrix.

Quick Download and running guide

You can download the MATLAB source code, the other software tools used for comparison and the data sets from here:

  • Bittable_TID program package: Bittable_TID
  • BiMAX software package: BiMAX
  • Supplementary data for the related publication (input data sets): Bittable_TID datasets

After downloading and unpacking the program package, the program can be run by opening the file bittable_TID.m in MATLAB. The resulting closed itemsets are stored in the variable itemsCell. Each row represents a closed itemset: the first column contains the rows involved and the second column contains the columns involved.

Download Bittable_TID

Bit-table representation of market basket data.