Novel techniques and an efficient algorithm for closed pattern mining
Frequent closed itemset mining and biclustering, the two most prominent application fields in pattern discovery, can be reduced to the same problem when dealing with binary (0–1) data. We introduced the FCPMiner pattern mining method to mine such data efficiently. What distinguishes the proposed method is that it can be extended to non-binary data. The mining method is coupled with a novel visualization technique and a pattern aggregation method to detect the most meaningful, non-overlapping patterns. The proposed methods are rigorously tested on both synthetic and real data sets.
Kiraly, A.; Abonyi, J.; Laiho, A.; Gyenesei, A., "Biclustering of High-throughput Gene Expression Data with Bicluster Miner," Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on, pp. 131-138, 10 Dec. 2012
The algorithm is implemented in the FCPMiner toolbox, a biclustering tool written in Java.
It provides a fast solution for finding all biclusters within a discretized data matrix. The method can distinguish oppositely changing values within the data. Three input data formats are available, and the allowed values include 1, 0 and -1.
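A discretized matrix of this kind could be produced along the following lines. This is only a minimal Python sketch and not part of FCPMiner: the function name `discretize`, the symmetric `threshold` parameter and the reading of 1 and -1 as values changing in opposite directions are assumptions made for illustration.

```python
def discretize(values, threshold=1.0):
    """Map each numeric value to 1, -1 or 0.

    Assumption (not taken from FCPMiner): values at or above `threshold`
    become 1, values at or below `-threshold` become -1, the rest become 0.
    """
    return [
        [1 if v >= threshold else -1 if v <= -threshold else 0 for v in row]
        for row in values
    ]


# Hypothetical input, e.g. log fold changes of two genes in two conditions.
example = [[1.5, -0.2], [-2.0, 0.9]]
print(discretize(example))  # [[1, 0], [-1, 0]]
```

The threshold would have to be chosen for the data at hand; the value 1.0 is a placeholder.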
- FCPMiner program package: FCPMiner
- BiMAX software package: BiMAX
- SPMF software package (used for DCI-Closed): DCI_Closed
- IBMQuest software package (for synthetic input data generation): IBMQuest
- DCI-IBM-Binary-Converter software package (for conversion between input data types): DCI-IBM-Binary-Converter
- Supplementary data for the related publication (biclustering results, input data and results of functional enrichment analysis): SupplementaryData Part1, Part2
- Additional supplementary data files for the related publication (additional tables): SupplementaryTables
After downloading and unpacking the program package, the program with a graphical user interface (GUI) can be run by double-clicking the executable ‘FCPMiner.jar’ file, or by typing the following command at the command line: java -Xmx1024M -jar FCPMiner.jar
Use the ‘Browse’ buttons to specify the paths of the necessary files. After setting the required parameters (the default values are appropriate in most cases), load the data by clicking the ‘Load data’ button, then start the mining process by clicking the ‘Run FCPMiner’ button.
Bit-Table Based Biclustering and Frequent Closed Itemset Mining in High-Dimensional Binary Data
During the last decade, various algorithms have been developed and proposed for discovering overlapping clusters in high-dimensional data. The two most prominent application fields in this research, proposed independently, are frequent itemset mining (developed for market basket data) and biclustering (applied to gene expression data analysis). The common limitation of both methodologies is their limited applicability to very large binary data sets. In this paper we propose a novel and efficient method to find both frequent closed itemsets and biclusters in high-dimensional binary data. The method is based on simple but very powerful matrix and vector multiplication approaches that ensure that all patterns can be discovered in a fast manner. The proposed algorithm has been implemented in the commonly used MATLAB environment.
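The link between closed itemsets and biclusters in binary data can be illustrated on a toy matrix. The Python sketch below is a brute-force illustration that stores each column as a bit mask over the rows. It is not the Bittable_TID algorithm, which is implemented in MATLAB and relies on matrix and vector multiplications. The function names and the `min_rows` parameter are hypothetical.

```python
from itertools import combinations


def column_masks(matrix):
    """Encode each column as an integer whose bit r is set when matrix[r][c] == 1."""
    masks = [0] * len(matrix[0])
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value == 1:
                masks[c] |= 1 << r
    return masks


def closed_patterns(matrix, min_rows=1):
    """Brute-force list of (rows, columns) pairs, one per closed itemset.

    Exponential in the number of columns, so suitable for toy inputs only.
    """
    n_rows = len(matrix)
    masks = column_masks(matrix)
    n_cols = len(masks)
    found = set()
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # Rows supporting the itemset: bitwise AND of its column masks.
            rows = (1 << n_rows) - 1
            for c in cols:
                rows &= masks[c]
            if bin(rows).count("1") < min_rows:
                continue
            # Closure: every column that has a 1 in all supporting rows.
            closure = tuple(c for c in range(n_cols) if (masks[c] & rows) == rows)
            row_ids = tuple(r for r in range(n_rows) if (rows >> r) & 1)
            found.add((row_ids, closure))
    return sorted(found)


data = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
for rows, cols in closed_patterns(data, min_rows=2):
    print(rows, cols)
# (0, 1) (0, 1)
# (0, 1, 2) (1,)
# (1, 2) (1, 2)
```

Each printed pair is an all-ones submatrix of `data` that cannot be extended by another row or column. In other words, every closed itemset together with its supporting rows is an inclusion-maximal bicluster.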
Bittable_TID - a Bit-Table based biclustering tool
Bittable_TID is a biclustering tool written in MATLAB. It provides a fast solution for finding all biclusters within a binary data matrix.
Quick Download and running guide
You can download the MATLAB source code, the other software tools used for comparison, and the data sets from here:
- Bittable_TID program package: Bittable_TID
- BiMAX software package: BiMAX
- Supplementary data for the related publication (input data sets): Bittable_TID datasets
After downloading and unpacking the program package, the program can be run by opening the file bittable_TID.m in MATLAB. The resulting closed itemsets are stored in the variable itemsCell. Each row represents a closed itemset: the first column contains the rows involved and the second column contains the columns involved.