This seriation method is fairly analogous to the homonymous
statistical method (Hubert et Arabie 1986).
It searches for the permutation of rows and columns from D which
minimizes the following criterion:
where a is a multiplicative scaling constant and the
permutation.
In PermutMatrix, we propose two heuristics, one combinatory and one numerical.
The first one is greedy and optimizes the criterion by iteratively
maximizing the gradient.
At each step, an inversion of two elements is sought so that the criterion
will be as improved as possible.
This rough heuristic quickly yields a solution that is often close to the
global optimum.
This is usually sufficient for exploratory work conducted in PermutMatrix.
However, in this algorithm, calculations at each stage are long
O(n2) and the number of stage indefined.
This algorithm could be impracticable for large dataset but it can be very
useful for small data set or for intra-class reorganization in a clustering
tree not totally ramified (see further).
The second heuristic is numerical.
In the first step, it involves searching for the best unidimensional representation
of dissimilarities via the method of Torgerson (1958).
If D is computed with usual Euclidian metrics, then this method consists of
finding the first principal component of the dataset.
This is equivalent to defining a coordinate system x in such a way
that the criterion
is minimum.
The approximate seriation is the enumeration of elements in the order of projections
on the first principal component of the dataset.
Références :
Hubert, L. J. and Arabie, P. (1986) Unidimensional Scaling and Combinatorial Optimization. In J. De Leeuw, W. J. H., J. Meulman, and F. Critchley (ed.), Multidimensional Data Analysis. DSWO Press, Leiden, The Netherlands, pp. 181-196.
Torgerson, W. S. (1958) Multidemensional Scaling, Theory and Methods of Scaling. John Wiley & Sons, New York, NY, pp. 247-297.
PermutMatrix |
|
TOP |