This tutorial shows how to estimate a Gaussian mixture model (GMM) using the VLFeat implementation of the Expectation Maximization (EM) algorithm.
A GMM is a collection of Gaussian distributions. Each distribution is called a mode of the GMM and represents a cluster of data points. In computer vision applications, GMMs are often used to model dictionaries of visual words. One important application is the computation of Fisher vector encodings.
Learning a GMM with expectation maximization
Consider a dataset containing 1000 randomly sampled 2D points:
The estimated parameters (the last of them stored in priors) are respectively the means $\mu_k$, the diagonal covariance matrices $\Sigma_k$, and the prior probabilities $\pi_k$ of the numClusters Gaussian modes.
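To make the roles of these three quantities concrete, the following is a minimal pure-Python sketch of EM for a diagonal-covariance GMM. It is an illustration only, not the VLFeat implementation: the function names (log_gauss_diag, em_diag_gmm), the choice of 5 modes, the fixed 20 iterations, the variance floor, and the simple random initialization are all assumptions made for this example.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log-density at x of a Gaussian with diagonal covariance diag(var)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def em_diag_gmm(data, num_clusters, num_iters=20, seed=0, min_var=1e-6):
    """Fit a diagonal-covariance GMM to a list of points with plain EM."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Assumed initialization: random data points as means, the per-dimension
    # variance of the whole data set for every mode, and equal priors.
    means = [list(p) for p in rng.sample(data, num_clusters)]
    centre = [sum(p[d] for p in data) / n for d in range(dim)]
    data_var = [sum((p[d] - centre[d]) ** 2 for p in data) / n
                for d in range(dim)]
    variances = [list(data_var) for _ in range(num_clusters)]
    priors = [1.0 / num_clusters] * num_clusters

    for _ in range(num_iters):
        # E-step: posterior probability of each mode given each point,
        # computed in the log domain for numerical stability.
        posteriors = []
        for x in data:
            logs = [math.log(priors[k]) + log_gauss_diag(x, means[k], variances[k])
                    for k in range(num_clusters)]
            top = max(logs)
            weights = [math.exp(v - top) for v in logs]
            total = sum(weights)
            posteriors.append([w / total for w in weights])
        # M-step: re-estimate priors, means, and variances from the posteriors.
        for k in range(num_clusters):
            mass = max(sum(post[k] for post in posteriors), 1e-12)
            priors[k] = mass / n
            for d in range(dim):
                mean = sum(post[k] * x[d]
                           for post, x in zip(posteriors, data)) / mass
                var = sum(post[k] * (x[d] - mean) ** 2
                          for post, x in zip(posteriors, data)) / mass
                means[k][d] = mean
                variances[k][d] = max(var, min_var)
    return means, variances, priors


# 1000 random 2D points in the unit square, as in the example above.
point_rng = random.Random(0)
data = [(point_rng.random(), point_rng.random()) for _ in range(1000)]
means, variances, priors = em_diag_gmm(data, num_clusters=5)
```

Each returned list has one entry per mode; variances[k] holds only the diagonal of the covariance of mode k, which is all a diagonal model needs to store.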
These modes can be visualized on the 2D plane by plotting the ellipses corresponding to the equation $\{x : (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) = 1\}$ for each of the modes. To this end, we can use the
This results in the figure:
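Assuming each ellipse is the set of points at unit Mahalanobis distance from the mode mean, the outline of one axis-aligned ellipse can be sampled with a few lines of Python. This is a hedged sketch for a single 2D mode with diagonal covariance; ellipse_points and the example mean and variances are hypothetical and not part of VLFeat.

```python
import math


def ellipse_points(mean, variances, num_points=100):
    """Sample points x with (x - mean)^T diag(variances)^-1 (x - mean) = 1."""
    # For a diagonal covariance the semi-axes are the standard deviations.
    radius_x = math.sqrt(variances[0])
    radius_y = math.sqrt(variances[1])
    points = []
    for i in range(num_points):
        angle = 2.0 * math.pi * i / num_points
        points.append((mean[0] + radius_x * math.cos(angle),
                       mean[1] + radius_y * math.sin(angle)))
    return points


# Example mode: mean (0.5, 0.25), variances 0.04 and 0.01 (illustrative values).
outline = ellipse_points(mean=(0.5, 0.25), variances=(0.04, 0.01))
```

The resulting list of (x, y) pairs can be passed to any plotting tool to draw the closed curve for that mode.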
Diagonal covariance restriction
Note that the ellipses in the previous example are axis-aligned. This is a restriction of the vl_gmm implementation, which requires the covariance matrices to be diagonal.
This is suitable for most computer vision applications, where estimating a full covariance matrix would be prohibitive due to the relatively high dimensionality of the data. For example, when clustering SIFT features, the data has dimension 128, and each full covariance matrix would contain more than 8k parameters (128 · 129 / 2 = 8256 for a symmetric matrix).
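The count quoted above can be checked with a short calculation: a symmetric d-by-d covariance matrix has d(d+1)/2 free parameters, while a diagonal one has only d. The helper name below is hypothetical and serves only this illustration.

```python
def covariance_parameter_counts(dimension):
    """Free parameters of one full symmetric covariance vs. one diagonal one."""
    full = dimension * (dimension + 1) // 2
    diagonal = dimension
    return full, diagonal


# SIFT descriptors have dimension 128.
full, diagonal = covariance_parameter_counts(128)
print(full, diagonal)  # prints "8256 128"
```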
For this reason, it is sometimes desirable to globally decorrelate the data before learning a GMM. This can be done by pre-multiplying the data by the inverse of a square root of its covariance.
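One way to realize this decorrelation, sketched here in plain Python under the assumption that the data covariance is positive definite, is to take the Cholesky factor L of the covariance (so that L L^T equals the covariance) as the square root. The function names and the synthetic data are illustrative assumptions; the sketch also centres the data first, which the text does not require.

```python
import math
import random


def covariance(points):
    """Mean and covariance matrix (normalized by n) of a list of points."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
            for j in range(dim)] for i in range(dim)]
    return mean, cov


def cholesky(matrix):
    """Lower-triangular L with L * L^T == matrix (positive definite input)."""
    dim = len(matrix)
    lower = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def whiten(points):
    """Centre the data, then pre-multiply every point by the inverse of L."""
    mean, cov = covariance(points)
    lower = cholesky(cov)
    whitened = []
    for p in points:
        y = []
        for i in range(len(p)):
            # Forward substitution solves L * y = (p - mean) without
            # forming the inverse of L explicitly.
            acc = p[i] - mean[i] - sum(lower[i][k] * y[k] for k in range(i))
            y.append(acc / lower[i][i])
        whitened.append(y)
    return whitened


# Correlated 2D data: the second coordinate depends on the first.
rng = random.Random(0)
raw = []
for _ in range(500):
    a = rng.gauss(0.0, 1.0)
    raw.append((a, 0.8 * a + 0.3 * rng.gauss(0.0, 1.0)))

white = whiten(raw)
_, white_cov = covariance(white)  # close to the 2x2 identity matrix
```

After this transformation the coordinates are uncorrelated with unit variance, so a diagonal covariance per mode is a less restrictive assumption than on the raw data.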
Initializing a GMM model before running EM
The EM algorithm is a local optimization method, and hence particularly sensitive to the initialization of the model. The simplest way to initialize the GMM is to pick numClusters data points at random as mode means, initialize the individual covariances as the covariance of the data, and assign equal prior probabilities to the modes. This is the default initialization method of the VLFeat implementation.
Alternatively, a user can manually specify the initial parameters of the GMM by using the custom initialization method. To do so, set the 'Initialization' option to 'Custom' and set the accompanying initial-value options, such as 'InitPriors', to the desired values.
A common approach to obtain an initial value for these parameters is to run KMeans first, as demonstrated in the following code snippet:
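A minimal Python sketch of this idea, independent of VLFeat, is given below. It runs a few Lloyd (k-means) iterations and derives initial means, diagonal covariances, and priors from the resulting clusters. The function name kmeans_gmm_init, the 5 iterations, the 4 clusters, and the fallback to the global variance for clusters with fewer than two points are assumptions made for this example.

```python
import random


def kmeans_gmm_init(data, num_clusters, num_iters=5, seed=0):
    """Run a few Lloyd iterations and derive initial GMM parameters."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    centres = [list(p) for p in rng.sample(data, num_clusters)]
    assignments = [0] * n
    for _ in range(num_iters):
        # Assignment step: nearest centre in squared Euclidean distance.
        for i, x in enumerate(data):
            assignments[i] = min(
                range(num_clusters),
                key=lambda k: sum((x[d] - centres[k][d]) ** 2
                                  for d in range(dim)))
        # Update step: each non-empty cluster's centre moves to its mean.
        for k in range(num_clusters):
            members = [x for x, a in zip(data, assignments) if a == k]
            if members:
                centres[k] = [sum(x[d] for x in members) / len(members)
                              for d in range(dim)]

    # Per-dimension variance of the whole data set, used as a fallback.
    centre_all = [sum(x[d] for x in data) / n for d in range(dim)]
    var_all = [sum((x[d] - centre_all[d]) ** 2 for x in data) / n
               for d in range(dim)]

    init_means, init_variances, init_priors = [], [], []
    for k in range(num_clusters):
        members = [x for x, a in zip(data, assignments) if a == k]
        init_means.append(list(centres[k]))
        # Prior of a mode: fraction of the data assigned to its cluster.
        init_priors.append(len(members) / n)
        if len(members) < 2:
            init_variances.append(list(var_all))
        else:
            init_variances.append([
                sum((x[d] - centres[k][d]) ** 2 for x in members) / len(members)
                for d in range(dim)])
    return init_means, init_variances, init_priors


point_rng = random.Random(1)
data = [(point_rng.random(), point_rng.random()) for _ in range(1000)]
init_means, init_variances, init_priors = kmeans_gmm_init(data, num_clusters=4)
```

The three returned lists play the role of the custom initial means, diagonal covariances, and priors that EM would then refine.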
The demo scripts, including vl_demo_gmm_3d, also produce cute colorized figures such as these: