Help for PPCA-TSS in Matlab
The main program of the proposed Probabilistic Principal Component Analysis for Time-Series Segmentation technique is main_ppcatss.m.
Part 1 Set the initial parameters
1. The method selects the number of segments by compatible cluster merging. To do so, it needs an initial number of segments (inic). If you have no prior estimate, set it to a value that is "large" relative to the particular problem, since merging can only reduce the number of segments.
2. Another important parameter is the threshold (thres) for compatible cluster merging. Different problems or datasets can need very different threshold values for acceptable results, so it is advisable to run the program several times with different thres values.
3. The fuzziness parameter (m) and the iteration counts can also be set: stop1 is the number of iterations between two evaluations of the cluster merging criterion, and stop2 is the number of iterations used for the final refinement. The suggested values are m = 2 and stop1, stop2 = 100...1000.
4. The program lets you compare the results of this method with the PCA-based Bottom-Up algorithm (described in detail in a separate downloadable paper). That algorithm can use either the Hotelling T^2 or the Q model error as its cost, and the program reports the results for both. The final number of segments for the Bottom-Up algorithm (inicBU) has to be chosen.
5. The PPCA-TSS method can be initialized from the results of the Bottom-Up algorithm: set inicfromBU = 1, and use inicfromT2 to select the Hotelling T^2 result (1) or the Q reconstruction error result (0).
inic=5;        % Initial number of segments
m=2;           % Fuzziness parameter
thres=0.75;    % Threshold for compatible cluster merging
stop1=100;     % After every "stop1" number of iterations
               % the cluster merging criterion will be analyzed
stop2=100;     % Number of iterations after there are
               % no mergeable clusters
inicBU=7;      % Number of segments for Bottom-Up algorithm
inicfromBU=0;  % Initialize the PPCA-TSS based on the results of Bottom-Up
inicfromT2=0;  % If inicfromBU==1, then based on Hotelling T^2 (1) or Q reconstruction error (0)
Part 2 Generate or load the data
You can load your own dataset (note: samples in the rows, time in the first column) or use our synthetic dataset; see the datagen.m function. The observed variables are then plotted in Figure 2.
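To make the expected layout concrete, here is a minimal sketch that builds a small dataset by hand. It does not reproduce datagen.m; the variable names (data, time, X), the loading vectors and the change point at sample 150 are all assumptions chosen for illustration.

```matlab
% Hypothetical example data, NOT the output of datagen.m.
rng(0);                              % reproducible random numbers
N = 300;                             % number of samples
t = (0:N-1)';                        % one time stamp per sample
z = randn(N, 1);                     % a single latent variable

A1 = [1.0  0.5 -0.3];                % assumed loadings, first regime
A2 = [0.2 -1.0  0.8];                % assumed loadings, second regime

Y = zeros(N, 3);
Y(1:150, :)   = z(1:150)   * A1;     % correlation structure before the change
Y(151:end, :) = z(151:end) * A2;     % correlation structure after the change
Y = Y + 0.05 * randn(N, 3);          % measurement noise

data = [t, Y];                       % samples in rows, time in the first column

% Splitting such a matrix back into time and observed variables:
time = data(:, 1);
X    = data(:, 2:end);
```

A dataset of your own would need the same shape: one row per sample, with the time stamps in column 1 and the observed variables in the remaining columns.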
Part 3 Choose the number of principal components
For this purpose the program draws a scree plot, which shows the eigenvalues ordered by their contribution to the variance of the data. Alternatively, the number of principal components (q) can be chosen from the desired accuracy (loss of variance) of the PPCA models, which is given by the cumulative sum of the eigenvalues relative to their total. Both plots are shown in Figure 3. Once you have chosen the number of principal components, type it in the Matlab Command Window and press Enter.
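The two criteria can be tried outside the program with a few lines. The following is a sketch under stated assumptions, not the code used by main_ppcatss.m: the data, the loading matrix W and the 95% variance target are hypothetical.

```matlab
% Hypothetical data: four observed variables driven by two latent ones.
rng(1);
N = 200;
Z = randn(N, 2);
W = [1 0; 0.8 0.3; -0.5 1; 0.2 -0.7];        % assumed 4 x 2 loading matrix
X = Z * W' + 0.05 * randn(N, 4);

C = cov(X);                                  % sample covariance matrix
C = (C + C') / 2;                            % enforce exact symmetry
lambda = sort(eig(C), 'descend');            % eigenvalues, largest first
cumRatio = cumsum(lambda) / sum(lambda);     % cumulative share of variance

target = 0.95;                               % assumed accuracy target
q = find(cumRatio >= target, 1);             % smallest q that reaches it

figure;
subplot(2, 1, 1);
plot(lambda, 'o-');                          % scree plot
xlabel('Component'); ylabel('Eigenvalue');
subplot(2, 1, 2);
plot(cumRatio, 'o-');                        % cumulative variance ratio
xlabel('Number of components'); ylabel('Cumulative ratio');
```

In the scree plot, a sharp drop after the q-th eigenvalue suggests that the remaining components mostly describe noise. The cumulative ratio gives the same answer in terms of retained variance.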
Part 4 Bottom-Up segmentation
The basic algorithm starts from a fine segmentation and repeatedly merges the lowest-cost pair of adjacent segments until only the desired number of segments remains. The cost function can be the Hotelling T^2 or the Q model error measure of the PCA models. The pcaseg.m function implements this technique, and the results are shown in Figure 4. A separate downloadable paper contains the detailed description of this technique.
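The merging loop can be sketched as follows. This is a simplified illustration, not the code of pcaseg.m: it uses only the Q error (the squared residual left after projecting a segment onto its first q principal components), recomputes every merge cost in each pass, and runs on hypothetical data with an assumed initial segment length w.

```matlab
% Hypothetical data with one change in correlation structure at sample 121.
rng(2);
N = 240;
z = randn(N, 1);
X = [z(1:120) * [1 0.5 -0.3]; z(121:end) * [0.2 -1 0.8]];
X = X + 0.02 * randn(N, 3);

q = 1;      % number of principal components per segment model
K = 2;      % desired final number of segments
w = 10;     % assumed length of the initial fine segments

% Q error of a segment: energy in the singular values beyond the first q.
tailsq = @(s, q) sum(s(q+1:end).^2);
qcost  = @(S) tailsq(svd(S - mean(S, 1)), q);

% Segment i covers rows bounds(i) : bounds(i+1)-1.
bounds = (1:w:N+1)';

while numel(bounds) - 1 > K
    nSeg = numel(bounds) - 1;
    mergeCost = zeros(nSeg - 1, 1);
    for i = 1:nSeg - 1
        rows = bounds(i) : bounds(i + 2) - 1;    % segments i and i+1 together
        mergeCost(i) = qcost(X(rows, :));
    end
    [~, best] = min(mergeCost);                  % cheapest adjacent pair
    bounds(best + 1) = [];                       % drop their shared boundary
end
```

After the loop, bounds holds the first row of each remaining segment followed by N+1. Replacing qcost with a Hotelling T^2 based cost would give the other variant mentioned above.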
Part 5 PPCA-TSS segmentation
The ppcamod.m function contains the algorithm. It runs stop1 iterations and then evaluates the compatibility criterion; see the compat.m function for more details. If the compatibility of two clusters that are adjacent in time exceeds the pre-defined threshold thres, the most compatible pair is merged by the mergeclust.m function. This repeats as long as compatible clusters remain. After that, the algorithm runs stop2 more iterations for the final refinement.
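The control flow of this merge loop can be illustrated with a simplified sketch. It makes several assumptions that are not taken from the program: crisp segments stand in for the fuzzy clusters, the clustering iterations are only marked by comments, and the compatibility of two adjacent segments is measured by the PCA similarity factor of their first q principal directions. The measure actually computed by compat.m may differ.

```matlab
% Hypothetical data: the first two blocks share one correlation structure.
rng(3);
z = randn(300, 1);
X = [z(1:100)   * [1 0.5 -0.3];
     z(101:200) * [1 0.5 -0.3];
     z(201:300) * [0.2 -1 0.8]] + 0.02 * randn(300, 3);

q = 1;                          % principal components per segment
thres = 0.75;                   % merging threshold
bounds = [1; 101; 201; 301];    % three initial segments (crisp, for illustration)

while true
    % (the stop1 clustering iterations would run here)
    nSeg = numel(bounds) - 1;
    if nSeg < 2
        break;                  % nothing left to merge
    end

    % Principal directions of every segment.
    U = cell(nSeg, 1);
    for i = 1:nSeg
        S = X(bounds(i) : bounds(i + 1) - 1, :);
        [~, ~, V] = svd(S - mean(S, 1), 'econ');
        U{i} = V(:, 1:q);
    end

    % Assumed compatibility: PCA similarity factor in [0, 1].
    sim = zeros(nSeg - 1, 1);
    for i = 1:nSeg - 1
        sim(i) = trace(U{i}' * U{i + 1} * U{i + 1}' * U{i}) / q;
    end

    [best, idx] = max(sim);
    if best <= thres
        break;                  % no adjacent pair is compatible enough
    end
    bounds(idx + 1) = [];       % merge the most compatible adjacent pair
end
% (the stop2 refinement iterations would run here)
```

In this toy setting a lower thres lets more adjacent pairs merge and a higher one keeps more segments, which is why trying several threshold values is recommended in Part 1.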