F. Perronnin and C. Dance. Fisher kernels on visual vocabularies for image categorization. In Proc. CVPR, 2006.
F. Perronnin, J. Sánchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. In Proc. ECCV, 2010.
The Fisher Vector (FV) is an image representation obtained by pooling local image features. It is frequently used as a global image descriptor in visual classification.
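While the FV can be derived as a special, approximate, and improved case of the general Fisher Kernel framework, it is easy to describe directly. Let $I = (x_1, \dots, x_N)$ be a set of $D$-dimensional feature vectors (e.g. SIFT descriptors) extracted from an image. Let $\Theta = (\mu_k, \Sigma_k, \pi_k : k = 1, \dots, K)$ be the parameters of a Gaussian Mixture Model (GMM) fitting the distribution of descriptors, with diagonal covariances $\Sigma_k = \operatorname{diag}(\sigma_{1k}^2, \dots, \sigma_{Dk}^2)$. The GMM associates each vector $x_i$ to a mode $k$ in the mixture with a strength given by the posterior probability:
$$ q_{ik} = \frac{\pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)}{\sum_{t=1}^{K} \pi_t \, \mathcal{N}(x_i; \mu_t, \Sigma_t)} $$
For each mode $k$, consider the mean and covariance deviation vectors
$$ u_{jk} = \frac{1}{N \sqrt{\pi_k}} \sum_{i=1}^{N} q_{ik} \, \frac{x_{ji} - \mu_{jk}}{\sigma_{jk}}, \qquad v_{jk} = \frac{1}{N \sqrt{2 \pi_k}} \sum_{i=1}^{N} q_{ik} \left[ \left( \frac{x_{ji} - \mu_{jk}}{\sigma_{jk}} \right)^2 - 1 \right] $$
where $j = 1, 2, \dots, D$ spans the vector dimensions. The FV of image $I$ is the stacking of the vectors $u_k$ and then of the vectors $v_k$ for each of the $K$ modes in the Gaussian mixtures:
$$ \Phi(I) = \begin{bmatrix} u_1^\top & \cdots & u_K^\top & v_1^\top & \cdots & v_K^\top \end{bmatrix}^\top $$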
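
The construction above can be sketched in a few lines of NumPy. This is a minimal illustration and not the reference implementation: the function names `gmm_posteriors` and `fisher_vector` are hypothetical, the GMM is assumed to have diagonal covariances passed as per-dimension variances, and the GMM parameters are assumed to have been fitted beforehand.

```python
import numpy as np

def gmm_posteriors(X, means, variances, priors):
    """Posterior q[i, k] of each mode k for each row x_i of X.

    X: (N, D) features; means, variances: (K, D); priors: (K,).
    Diagonal covariances are an assumption of this sketch.
    """
    diff = X[:, None, :] - means[None, :, :]                  # (N, K, D)
    # Log of the diagonal Gaussian density for every (feature, mode) pair.
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_q = np.log(priors) + log_gauss                        # (N, K)
    log_q -= log_q.max(axis=1, keepdims=True)                 # numerical stability
    q = np.exp(log_q)
    return q / q.sum(axis=1, keepdims=True)

def fisher_vector(X, means, variances, priors):
    """Stack the mean deviations u_k, then the covariance deviations v_k."""
    N = X.shape[0]
    q = gmm_posteriors(X, means, variances, priors)           # (N, K)
    # Whitened differences (x_ji - mu_jk) / sigma_jk.
    z = (X[:, None, :] - means[None, :, :]) / np.sqrt(variances)
    u = np.einsum('ik,ikd->kd', q, z) / (N * np.sqrt(priors)[:, None])
    v = np.einsum('ik,ikd->kd', q, z ** 2 - 1.0) / (N * np.sqrt(2.0 * priors)[:, None])
    return np.concatenate([u.ravel(), v.ravel()])             # length 2 * K * D

# Toy usage with made-up GMM parameters (K = 3 modes, D = 4 dimensions).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
means = rng.normal(size=(3, 4))
variances = np.ones((3, 4))
priors = np.array([0.5, 0.3, 0.2])
fv = fisher_vector(X, means, variances, priors)
print(fv.shape)  # (24,)
```

The resulting vector has dimension $2KD$, so its size depends on the number of modes and on the descriptor dimension but not on the number of features extracted from the image.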
The improved Fisher Vector (IFV) improves the classification performance of the representation by using two ideas:
- Non-linear additive kernel. The Hellinger kernel (or Bhattacharyya coefficient) can be used instead of the linear one at no cost by signed square-rooting. This is obtained by applying the function $\operatorname{sign}(z) \sqrt{|z|}$ to each dimension of the vector $\Phi(I)$. Other additive kernels can also be used at an increased space or time cost.
- Normalization. Before using the representation in a linear model (e.g. a support vector machine), the vector is further normalized by its $\ell^2$ norm (note that the standard Fisher vector is normalized by the number of encoded feature vectors).
After square-rooting and normalization, the IFV is often used in a linear classifier such as an SVM.
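
The two IFV steps can be sketched as a single post-processing function applied to an already computed FV. The name `improve_fisher_vector` and the `eps` guard against an all-zero vector are assumptions of this sketch and do not come from any particular library.

```python
import numpy as np

def improve_fisher_vector(phi, eps=1e-12):
    """Signed square root of each dimension, then l2 normalization."""
    phi = np.asarray(phi, dtype=float)
    # sign(z) * sqrt(|z|), applied independently to each dimension.
    phi = np.sign(phi) * np.sqrt(np.abs(phi))
    # Divide by the l2 norm; eps avoids a division by zero (sketch-only choice).
    return phi / max(np.linalg.norm(phi), eps)

ifv = improve_fisher_vector([4.0, -9.0, 0.0])
print(ifv)  # proportional to [2, -3, 0], with unit l2 norm
```

The signed square root keeps the sign of each component, so negative deviations remain negative after the transformation.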
In practice, several data-to-cluster assignments (the posteriors $q_{ik}$) are likely to be very small or even negligible. The fast version of the FV sets to zero all but the largest assignment for each input feature $x_i$.
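
The sparsification used by the fast version can be sketched as follows. The function name `keep_largest_assignment` is hypothetical, and the sketch assumes the retained assignment keeps its original value; an implementation may handle the retained value differently.

```python
import numpy as np

def keep_largest_assignment(q):
    """Zero all but the largest posterior in each row of q (shape (N, K))."""
    q = np.asarray(q, dtype=float)
    rows = np.arange(q.shape[0])
    best = np.argmax(q, axis=1)            # index of the strongest mode per feature
    sparse = np.zeros_like(q)
    sparse[rows, best] = q[rows, best]     # retained value left unchanged (assumption)
    return sparse

q = np.array([[0.70, 0.20, 0.10],
              [0.05, 0.15, 0.80]])
print(keep_largest_assignment(q))
# [[0.7 0.  0. ]
#  [0.  0.  0.8]]
```

With at most one non-zero assignment per feature, each feature contributes to the deviation vectors of a single mode, which reduces the accumulation cost from all $K$ modes to one mode per feature.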