Fuzzy Clustering and Data Analysis Toolbox

About the Toolbox

The Fuzzy Clustering and Data Analysis Toolbox is a collection of Matlab functions. Its purpose is to divide a given data set into subsets (called clusters). In hard partitioning the transitions between the subsets are crisp; in fuzzy partitioning they are gradual. The toolbox provides four categories of functions:

Clustering algorithms.

These functions group the given data set into clusters using different approaches: the functions Kmeans and Kmedoid are hard partitioning methods, while FCMclust, GKclust and GGclust are fuzzy partitioning methods with different distance norms.
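To make the idea of hard partitioning concrete, here is a minimal sketch of the textbook K-means (Lloyd) iteration in plain Python. It is not the toolbox's Kmeans function: the name `kmeans`, its arguments and the sample points are assumptions made only for illustration.

```python
import math

def kmeans(points, centers, iterations=20):
    """Textbook Lloyd iteration (illustrative; not the toolbox's Kmeans).

    points  -- list of coordinate tuples
    centers -- initial cluster centers, one tuple per cluster
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Hard assignment: every point belongs to exactly one cluster,
        # namely the one with the nearest center (Euclidean distance).
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centers.append(old)  # empty cluster: keep its center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels, centers

# Hypothetical sample data: two well-separated groups in 2-D.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
labels, centers = kmeans(points, centers=[(0.0, 0.0), (6.0, 6.0)])
print(labels)
print(centers)
```

The fuzzy methods replace the single label per point with a vector of membership degrees, one per cluster, so that a point between two groups can belong partly to both.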

Evaluation with cluster prototypes.

Based on the clustering results of a data set, this set of functions can calculate membership values for "unseen" data sets. In the 2-dimensional case the functions draw a contour map in the data space to visualize the results.
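As a rough illustration of how memberships for unseen data can be derived from cluster prototypes, the sketch below applies the standard fuzzy C-means membership formula with a Euclidean norm. This is an assumption: the toolbox's evaluation functions may use cluster-specific norms (for example for the Gustafson-Kessel or Gath-Geva results), and the name `fuzzy_memberships`, the fuzziness exponent `m = 2` and the prototype values are hypothetical.

```python
import math

def fuzzy_memberships(x, prototypes, m=2.0):
    """Membership of point x in each cluster (standard FCM formula).

    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)),
    where d_i is the Euclidean distance from x to prototype i
    and m > 1 is the fuzziness exponent.
    """
    dists = [math.dist(x, v) for v in prototypes]
    # A point that coincides with a prototype belongs fully to it
    # (shared equally if several prototypes coincide).
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
        for d_i in dists
    ]

# Hypothetical prototypes, e.g. taken from an earlier clustering run.
prototypes = [(1.0, 1.0), (5.0, 5.0)]
u = fuzzy_memberships((2.0, 2.0), prototypes)
print(u)  # memberships sum to 1; the nearer prototype gets the larger share
```

Evaluating such a function on a grid of points in the plane gives the values from which a contour map of the memberships can be drawn.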

Validation.

The validity function provides cluster validity measures for each partition. It is useful when the number of clusters is unknown a priori. The optimal partition can be identified from the extrema of the validation indexes as a function of the number of clusters. The indexes calculated are: Partition Coefficient (PC), Classification Entropy (CE), Partition Index (SC), Separation Index (S), Xie and Beni's Index (XB), Dunn's Index (DI) and Alternative Dunn Index (ADI).
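Two of these indexes depend only on the membership matrix and are easy to sketch. The code below follows the usual definitions, PC as the mean of the squared memberships and CE as the mean of -u log(u); the function names and the layout of `U` (one row per data point, one column per cluster) are assumptions and not the toolbox's interface.

```python
import math

def partition_coefficient(U):
    """PC = (1/N) * sum over points and clusters of u**2.

    Equals 1 for a crisp partition and 1/c for a fully fuzzy one.
    """
    return sum(u * u for row in U for u in row) / len(U)

def classification_entropy(U):
    """CE = -(1/N) * sum over points and clusters of u * log(u).

    Equals 0 for a crisp partition and log(c) for a fully fuzzy one.
    Zero memberships are skipped, using the convention 0 * log(0) = 0.
    """
    return -sum(u * math.log(u) for row in U for u in row if u > 0.0) / len(U)

# Hypothetical membership matrices for N = 2 points and c = 2 clusters.
crisp = [[1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]

print(partition_coefficient(crisp), classification_entropy(crisp))
print(partition_coefficient(fuzzy), classification_entropy(fuzzy))
```

When choosing the number of clusters, one would compute such values for partitions obtained with different cluster counts and compare them: a higher PC and a lower CE indicate a less fuzzy partition.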

Visualization.

The Visualization part of this toolbox provides the modified Sammon mapping of the data. This mapping method is a multidimensional scaling method described by Sammon. The original method is computationally expensive when a new data point has to be mapped, so a modified method described by Abonyi is included in this toolbox.
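For orientation, the quantity that the original Sammon mapping minimizes is the Sammon stress, which compares all pairwise distances in the original space with those in the projection. The sketch below computes this stress for a given projection. It covers only the original criterion, not the modified method included in the toolbox, and the name `sammon_stress` and the sample points are assumptions.

```python
import math
from itertools import combinations

def sammon_stress(high, low):
    """Sammon stress of a projection (original criterion).

    E = (1 / sum d*_ij) * sum (d*_ij - d_ij)**2 / d*_ij   over all i < j,
    where d*_ij are distances in the original space (high) and
    d_ij are distances in the projected space (low).
    """
    total = 0.0
    error = 0.0
    for i, j in combinations(range(len(high)), 2):
        d_high = math.dist(high[i], high[j])
        d_low = math.dist(low[i], low[j])
        if d_high == 0.0:
            continue  # coincident points are skipped in this sketch
        total += d_high
        error += (d_high - d_low) ** 2 / d_high
    return error / total if total > 0.0 else 0.0

# Hypothetical data: three 3-D points that lie in the plane z = 0.
high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
exact = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]    # distances preserved
shrunk = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]   # all distances halved

print(sammon_stress(high, exact))
print(sammon_stress(high, shrunk))
```

Because every pair of points enters the sum, adding one new point to an existing projection means optimizing against all previous points again, which is the cost mentioned above.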

Example.

An example based on an industrial data set demonstrates the usefulness of the toolbox and its algorithms.

The toolbox provides

- 5 different clustering algorithms (2 hard and 3 fuzzy methods);

- 7 different validity measures;

- 3 types of visualization tools for high dimensional data;

in the following structure:

- Theoretical Introduction;

- References;

- Case Studies.

Clustering algorithms

Hard clustering methods

- K-means algorithm

- K-medoid algorithm

Fuzzy clustering algorithms

- Fuzzy C-means algorithm

- Gustafson-Kessel algorithm

- Gath-Geva algorithm

Validation indices

- PC: Partition Coefficient

- CE: Classification Entropy

- SC: Partition Index

- S: Separation Index

- XB: Xie and Beni’s Index

- DI: Dunn’s Index

- ADI: Alternative Dunn Index

Projection into 2-D for visualization

- Principal Component Analysis

- Sammon’s Mapping

- Fuzzy Sammon’s Mapping

Installation

The installation is straightforward and does not require any changes to your system settings. To use these functions, just copy the directory "FUZZCLUST" with its files to where the directory "toolbox" is situated (...MATLAB\ TOOLBOX\ ...).

The documentation that contains the detailed description of the toolbox is available from here.

The Matlab implementation of the proposed method can be downloaded from here.