Mapping/Projection Algorithms

 

back to notes

 

Linear Mapping

Linear mapping or -projection is a mapping from a multidimensional input space to a lower-dimensional output space, where a linear dependency exists between the different axes of the input space. The linear principal and independent component analysis (PCA and ICA) model the data as having been generated by independent sources through a linear mapping. The difference between the two is that PCA restricts the distribution of the sources to be Gaussian, whereas ICA does not, in general, restrict the distribution of the sources.  

 

   

PCA (Jolliffe, 1986) / Karhunen-Loeve transform / Eigenanalysis / Hotelling transform

Principal component analysis (or PCA) is a mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. Objectives are to reduce the dimensionality of a data set, and to identify new underlying variables that are now orthogonal.

It can also be seen as a translation, followed by a rotation, so that the data fits better around the axes.

It can be formulated in a recursive way: the direction of the first component is found by maximizing the variance of all variables, and all others are found by using the previous ones (by maximizing the variance of all variables in the space orthogonal to w1).

Usually, the components are found by computing the sample correlation matrix and selecting its eigenvectors (=principal components) for the n biggest eigenvalues (=variances). This is often done by making sure the matrix is square, symmetric, and positive definite (then its eigenvalue decomposition and singular value decomposition are the same) and decomposing it in U, S, V matrices using singular value decomposition. The singular values on the diagonal of the S matrix are then also the eigenvalues. There are, however, faster ways to calculate the eigenvectors with the largest eigenvalues.

PCA doesn't reveal everything, classical examples (such as the "horseshoe") end up with all components having the same weight.

PCA has been implemented in some neural networks by Oja and Sanjer.

I.T. Jolliffe. Principal Component Analysis, Springer-Verlag, New-York, 1988.

 

   

Independent Component Analysis

The aim in ICA is to find independent components (aka sources, or factors) that are non-gaussian and mutually independent. It could be seen as a more powerful extension of PCA or factor analysis (see previous sections).

In many cases, the measurements are given as a set of parallel signals or time series; the term blind source separation is used to characterize this problem. Typical examples are mixtures of simultaneous speech signals that have been picked up by several microphones, brain waves recorded by multiple sensors, interfering radio signals arriving at a mobile phone, or parallel time series obtained from some industrial process. [hut]

 

   

Non-linear Mapping

The problem of nonlinear projection can be stated as follows.  Suppose we have a numerical dataset of N vectors embedded in an n-dimensional space.  Such database may originate from a set of n sensors.  If there exist dependencies between the features of the vector, data are not randomly distributed and the database is said to have a structure.  For example, a plane in the 3-dimensional space is a 2-dimensional structure since there is one dependency linking the 3 coordinates x, y and z.  Very often, the dimension of a structure (say p) is lower than the dimension of its embedding n-dimensional space, so the structure can be projected to a p-dimensional space.  A well known projection method is the Principal Component Analysis (PCA - see previous section): it works pretty well, but only when the dependencies are strictly linear.  Once you have nonlinear dependencies, another method is needed, in order to find a nonlinear mapping between the n-dimensional embedding space and the p-dimensional projection space.  

 

   

Sammon's mapping

One way to achieve Topology Preservation consists in preserving distances between samples of the numerical database before and after projection:
  • Two samples that are close to each other have to stay close when projected;
  • Two projected samples that are close to each other have to originate from two samples that were close two each other.

This kind of topology conservation is also used in the well known Kohonen's maps. In order to guarantee topology preservation, Sammon defines an error function:


 

where dni,j and dpi,j are the distances between the i-th and j-th vector, respectively in the n-dimensional embedding space, and in the p-dimensional projection space.  Given this error function, an optimal projection can be computed using a gradient descent algorithm:

 

   

Curvilinear Component Analysis

Curvilinear Component Analysis brings some improvements to Sammon's mapping.  Actually, when unfolding a nonlinear structure, Sammon's mapping cannot reproduce all distances.  One way to face this problem consists in favouring local topology: CCA tries to reproduce short distances firstly, long distances being secondary.  Formally, this reasoning leads to the following error function(without normalization):

By comparison with ESammon, ECCA has an additional weighting function F depending on di,jp and on parameter l.  The factor F is a decreasing function of its argument, so it is used to favour local topology preservation.  For example, F could be a step function of (l-d
Another improvement brought by CCA to Sammon's mapping is the use of Vector Quantization (VQ).  VQ decreases the computational load of the distance calculation (there 0.5*N*(N-1) distance to compute!).  Unfortunately, the use of VQ implies the use of an interpolation in order to project vectors that are not prototypes of the codebook.  
 

   

Curvilinear Distance Analysis

Curvilinear Distances Analysis is the last development in the Sammon/CCA family.  The novelty in CDA is the use of true curvilinear distances.  The curvilinear distance di,jn between the i-th and j-th points of a structure is the distance measured along the structure and not through the object, like the Euclidean distance.  In practice, the curvilinear distance is computed as the shortest path between two prototypes of the codebook, after quantization and linking of the prototypes. The curvilinear distance is only a little change in the error function:

But it has great consequences.  Hard nonlinear objects like spirals can now be perfectly unfolded!

-from the excellent page http://www.dice.ucl.ac.be/neural-nets/Research/CDA/CDA.htm by John A. Lee

 

Projection of a spiral  (2D to 1D).  

Projection of a ribbon (3D to 2D).  

 

 

Compiled by Kristof Van Laerhoven.