Frequent Sequence, Itemset and Association Rule Mining
Sequence Mining based Alarm Suppression
To provide more insight into the process dynamics and to represent the temporal relationships among faults, control actions and process variables, we propose a multi-temporal sequence mining-based algorithm. The methodology starts with the generation of frequent temporal patterns of the alarm signals. The multi-temporal sequences are then transformed into Bayes classifiers, and the obtained association rules can be used to define alarm suppression rules. We analyzed the dataset of a laboratory-scale water treatment testbed to illustrate that multi-temporal sequences are applicable for describing operation patterns. We extended the benchmark simulator of a vinyl acetate production technology to generate easily reproducible results and to stimulate the development of alarm management algorithms. The results of detailed sensitivity analyses confirm the benefits of applying temporal alarm suppression rules that reflect the dynamic behaviour of the process.
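The core quantity behind a temporal suppression rule can be illustrated with a much simpler two-event case. The sketch below estimates, from a time-stamped alarm log, the conditional probability that a consequent alarm follows an antecedent alarm within a time window, and keeps pairs whose confidence exceeds a threshold as candidate suppression rules. It is a hypothetical illustration only: the function names, the `(timestamp, tag)` log format, the window and the threshold are all assumptions, and it does not reproduce the multi-temporal sequences or the Bayes classifiers of the paper.

```python
def temporal_rule_confidence(events, antecedent, consequent, window):
    """Estimate P(consequent within `window` after antecedent | antecedent).

    `events` is an assumed list of (timestamp, alarm_tag) tuples.
    """
    events = sorted(events)
    antecedent_times = [t for t, tag in events if tag == antecedent]
    consequent_times = [t for t, tag in events if tag == consequent]
    if not antecedent_times:
        return 0.0
    # Count antecedent occurrences that are followed by the consequent in time.
    hits = sum(
        1
        for ta in antecedent_times
        if any(ta < tc <= ta + window for tc in consequent_times)
    )
    return hits / len(antecedent_times)


def suppression_rules(events, window, min_confidence):
    """Return (antecedent, consequent, confidence) triples above a threshold.

    A rule A -> B is a candidate for suppressing alarm B while A is active.
    """
    tags = sorted({tag for _, tag in events})
    rules = []
    for a in tags:
        for b in tags:
            if a == b:
                continue
            conf = temporal_rule_confidence(events, a, b, window)
            if conf >= min_confidence:
                rules.append((a, b, conf))
    return rules


# Hypothetical alarm log (seconds, tag), for illustration only.
EVENTS = [(0, "LEVEL_HI"), (5, "PUMP_TRIP"), (100, "LEVEL_HI"),
          (104, "PUMP_TRIP"), (200, "LEVEL_HI"), (260, "PUMP_TRIP")]
```

With a 10 s window, `LEVEL_HI` is followed by `PUMP_TRIP` in two of its three occurrences, so `suppression_rules(EVENTS, 10, 0.6)` keeps only the rule `LEVEL_HI -> PUMP_TRIP` with confidence 2/3.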
These files are the supplementary materials of our paper, to be published in IEEE Access, 2018. For the extended simulator of the vinyl acetate production technology and the source code of the Bayes’ theorem-based evaluation of sequences, see: HTTPS://GITHUB.COM/ABONYILAB/VACSIMULATOR
The MATLAB implementation of the sequence mining algorithm is available at: HTTPS://GITHUB.COM/ABONYILAB/MULTI-TEMPORAL-SEQUENCE-MINING
Fuzzy association rule mining for feature and model structure selection
Effective methods for feature and model structure selection are very important for data-driven modeling and system identification tasks. A new method was developed for selecting important variables in nonlinear (dynamic) models with mixed discrete (categorical, fuzzy) and continuous inputs and outputs. The method applies fuzzy association rule mining, and the selection of important variables (model structure) is based on two rule interestingness measures. The method is able to select the most relevant variables in nonlinear feature selection problems. Moreover, it selects the correct model order of strongly nonlinear dynamical systems; therefore, it can be a very efficient tool for process modeling.
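The idea of ranking input variables by the fuzzy rules they generate can be sketched as follows. The text above does not name its two interestingness measures, so this sketch assumes fuzzy support and fuzzy confidence (product t-norm) and combines them into a hypothetical support-weighted confidence score for the rules "x is A -> y is B". The three-set triangular partition of [0, 1] and all function names are illustrative assumptions, not the published method.

```python
def triangular(v, center, half_width):
    """Triangular membership degree of v for a fuzzy set centred at `center`."""
    return max(0.0, 1.0 - abs(v - center) / half_width)


# Assumed fuzzy partition of [0, 1]; memberships of the three sets sum to 1.
PARTITION = {"low": 0.0, "mid": 0.5, "high": 1.0}


def memberships(values):
    """Membership degrees of every value in every fuzzy set of the partition."""
    return {name: [triangular(v, c, 0.5) for v in values]
            for name, c in PARTITION.items()}


def fuzzy_relevance(x, y):
    """Hypothetical relevance score of input x for output y.

    Sums support * confidence over all rules 'x is A -> y is B'.
    """
    n = len(x)
    mx, my = memberships(x), memberships(y)
    score = 0.0
    for a in PARTITION:
        antecedent_mass = sum(mx[a])
        if antecedent_mass == 0.0:
            continue
        for b in PARTITION:
            # Product t-norm for the fuzzy conjunction of antecedent and consequent.
            joint = sum(ua * ub for ua, ub in zip(mx[a], my[b]))
            support = joint / n
            confidence = joint / antecedent_mass
            score += support * confidence
    return score


# Toy data: y depends only on x1; x2 is a scrambled (irrelevant) copy.
X1 = [i / 20 for i in range(21)]
X2 = [((8 * i) % 21) / 20 for i in range(21)]
Y = list(X1)
```

On this toy data `fuzzy_relevance(X1, Y)` is 0.5435 while the scrambled input scores lower (about 0.37), so ranking inputs by the score would select `X1`.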
Compact and accurate fuzzy classifiers can be constructed by fuzzy association rule mining
Interpretability and accuracy are critical issues in many classification applications. Associative classifier methods can achieve high accuracy, but their predictions rely on excessively large rule sets. In contrast, a new method was developed that produces fuzzy classifier systems that are both very compact and accurate. It therefore helps in understanding the relationships in the data and the prediction mechanism in several types of classification problems.
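To show what a compact fuzzy rule base looks like at prediction time, the sketch below implements a generic single-winner fuzzy classifier: each rule fires with the product of its antecedent memberships times a rule weight (for example, the rule's fuzzy confidence), and the strongest rule decides the class. This is an assumed, textbook-style inference scheme; the rule generation and selection that make the published classifiers compact are not reproduced, and the rules, fuzzy sets and names are hypothetical.

```python
from dataclasses import dataclass


def triangular(v, center, half_width=0.5):
    """Triangular membership degree of v for a fuzzy set centred at `center`."""
    return max(0.0, 1.0 - abs(v - center) / half_width)


# Assumed fuzzy sets on the normalized range [0, 1].
CENTERS = {"low": 0.0, "mid": 0.5, "high": 1.0}


@dataclass
class FuzzyRule:
    antecedent: dict  # feature index -> fuzzy set name; other features are "don't care"
    label: str
    weight: float     # e.g. the rule's fuzzy confidence

    def firing(self, sample):
        """Product of antecedent memberships, scaled by the rule weight."""
        degree = 1.0
        for idx, fset in self.antecedent.items():
            degree *= triangular(sample[idx], CENTERS[fset])
        return degree * self.weight


def classify(rules, sample):
    """Single-winner inference: the label of the strongest firing rule, or None."""
    best = max(rules, key=lambda r: r.firing(sample))
    return best.label if best.firing(sample) > 0 else None


# A hypothetical three-rule classifier for two features.
RULES = [
    FuzzyRule({0: "low"}, "A", 0.9),
    FuzzyRule({0: "high"}, "B", 0.8),
    FuzzyRule({0: "mid", 1: "high"}, "C", 0.7),
]
```

Because unlisted features act as "don't care", a few short rules can cover the input space, which is what keeps such a classifier readable. For instance, `classify(RULES, [0.5, 0.9])` returns `"C"`, while a sample that no rule covers, such as `[0.5, 0.2]`, yields `None`.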
Bit-Table Based Biclustering and Frequent Closed Itemset Mining in High-Dimensional Binary Data
During the last decade, various algorithms have been developed and proposed for discovering overlapping clusters in high-dimensional data. The two most prominent application fields in this research, proposed independently, are frequent itemset mining (developed for market basket data) and biclustering (applied to gene expression data analysis). A common limitation of both methodologies is their limited applicability to very large binary data sets. In this paper we propose a novel and efficient method to find both frequent closed itemsets and biclusters in high-dimensional binary data. The method is based on simple but very powerful matrix and vector multiplication approaches that ensure that all patterns can be discovered quickly. The proposed algorithm has been implemented in the commonly used MATLAB environment.
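The matrix-vector multiplication idea can be illustrated on a small bit-table. In the sketch below, multiplying the binary data matrix by the 0/1 indicator vector of an itemset counts, for every row, how many of the items it contains; rows whose count equals the itemset size support it. Taking the columns shared by all supporting rows then gives the closure, and an itemset is closed when it equals its closure. This is a plain-Python sketch of the principle under assumed function names, not the Bittable_TID implementation.

```python
def supporting_rows(bit_table, items):
    """Rows containing every item in `items`, via a matrix-vector product.

    bit_table[r][c] == 1 means row (transaction) r contains column (item) c.
    """
    indicator = [1 if c in items else 0 for c in range(len(bit_table[0]))]
    # Each count is the dot product of a row with the itemset indicator vector.
    counts = [sum(b * x for b, x in zip(row, indicator)) for row in bit_table]
    return [r for r, count in enumerate(counts) if count == len(items)]


def closure(bit_table, rows):
    """Columns that are 1 in all of the given rows."""
    n_cols = len(bit_table[0])
    return [c for c in range(n_cols) if all(bit_table[r][c] for r in rows)]


def is_closed(bit_table, items):
    """An itemset is closed if adding any item would lose a supporting row."""
    return closure(bit_table, supporting_rows(bit_table, items)) == sorted(items)


# Small hypothetical binary data matrix (4 rows x 4 columns).
TABLE = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]
```

Here `{0}` is supported by rows 0, 1 and 3, whose shared columns are `[0, 1]`; so `{0}` is not closed, while `{0, 1}` together with rows `[0, 1, 3]` forms a closed itemset, i.e. a bicluster of ones.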
A. Király, A. Gyenesei, J. Abonyi, Bit-Table Based Biclustering and Frequent Closed Itemset Mining in High-Dimensional Binary Data, The Scientific World Journal, vol. 2014, Article ID 870406, 7 pages
Bittable_TID - a Bit-Table based biclustering tool
Bittable_TID is a biclustering tool written in MATLAB. It provides a fast solution for finding all biclusters within a binary data matrix.
Quick Download and Running Guide
You can download the MATLAB source code, the other software tools used for comparison, and the data sets here:
- Bittable_TID program package: Bittable_TID
- BiMAX software package: BiMAX
- Supplementary data for the related publication (input data sets): Bittable_TID datasets
After downloading and unpacking the program package, run the program by opening the file bittable_TID.m in MATLAB. The resulting closed itemsets are stored in the variable itemsCell. Each row represents a closed itemset: the first column contains the involved rows and the second column the involved columns.
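For checking results on toy inputs, the sketch below builds a Python analogue of the itemsCell layout (one `(rows, columns)` pair per closed itemset) by brute force, and verifies that a pair is a maximal all-ones submatrix. It is a hypothetical cross-check, not part of the program package: the names are assumptions, rows and columns are 0-based (MATLAB indices are 1-based), and the enumeration is exponential in the number of rows, so it only suits tiny matrices.

```python
from itertools import combinations


def enumerate_closed(bit_table):
    """Brute-force list of (rows, columns) pairs, one per closed itemset."""
    n_rows, n_cols = len(bit_table), len(bit_table[0])
    found = set()
    for k in range(1, n_rows + 1):
        for subset in combinations(range(n_rows), k):
            # Columns shared by every row in the subset.
            cols = tuple(c for c in range(n_cols)
                         if all(bit_table[r][c] for r in subset))
            if not cols:
                continue
            # All rows that contain those columns (the maximal row set).
            rows = tuple(r for r in range(n_rows)
                         if all(bit_table[r][c] for c in cols))
            found.add((rows, cols))
    return sorted(found)


def is_maximal_bicluster(bit_table, rows, cols):
    """True if rows x cols is all ones and no row or column can be added."""
    n_rows, n_cols = len(bit_table), len(bit_table[0])
    all_ones = all(bit_table[r][c] for r in rows for c in cols)
    no_extra_row = all(not all(bit_table[r][c] for c in cols)
                       for r in range(n_rows) if r not in rows)
    no_extra_col = all(not all(bit_table[r][c] for r in rows)
                       for c in range(n_cols) if c not in cols)
    return all_ones and no_extra_row and no_extra_col


# Small hypothetical binary data matrix.
TABLE = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]
```

For this matrix `enumerate_closed(TABLE)` yields seven pairs, e.g. `((0, 1, 3), (0, 1))`, and every pair passes `is_maximal_bicluster`; a pair such as rows `(0, 1, 3)` with only column `(0,)` fails, because column 1 could still be added.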