

Data

Example	Type	Numeric	The input file contains numerical data.
		Binary	The input file contains binary data. (Option not implemented yet.)


	Delimiter	In the data file, blank characters are interpreted as delimiters.

	Input File	The first record contains the column identifiers.
		The first column contains the row identifiers.
	Weighted	Rows	Rows are weighted. Place the weights of each row in the first numerical column, just after the row identifiers.
		Columns	Columns are weighted. Place the weights of each column in the first numerical row, just after the column identifiers.
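
The layout described above can be read with a few lines of Python. This is only a minimal sketch, not PermutMatrix's own reader: the function names are hypothetical, and it assumes that the header line may or may not start with a corner label and that the weight column carries its own column identifier.

```python
def parse_matrix(text):
    """Parse a blank-delimited table with column and row identifiers."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, body = lines[0], lines[1:]
    width = len(body[0]) - 1              # numeric values per row
    # Assumption: keep the last `width` header tokens, which drops an
    # optional corner label in front of the column identifiers.
    col_ids = header[-width:]
    row_ids = [tokens[0] for tokens in body]
    values = [[float(v) for v in tokens[1:]] for tokens in body]
    return row_ids, col_ids, values


def split_row_weights(col_ids, values):
    """Treat the first numerical column as row weights (Weighted > Rows)."""
    weights = [row[0] for row in values]
    data = [row[1:] for row in values]
    return weights, col_ids[1:], data


sample = """id a b c
r1 1 2 3
r2 4 5 6
"""
row_ids, col_ids, values = parse_matrix(sample)
```

Column weights can be handled the same way by taking the first numerical row instead of the first numerical column.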


What to display?

1 - Numerical Data

	Original values	The graphical representation is based on the raw data of the input file, without any transformation.
	Mean center Columns	The graphical representation is based on column-centered data: the column mean is subtracted from each value, so that the mean of each column is 0.
	Mean center Rows	The graphical representation is based on row-centered data: the row mean is subtracted from each value, so that the mean of each row is 0.
	Normalize Columns (Z-score)	The graphical representation is based on Z-scores. Columns are standardized to have mean 0 and standard deviation 1.
	Normalize Rows (Z-score)	The graphical representation is based on Z-scores. Rows are standardized to have mean 0 and standard deviation 1.
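	Row-column interactions	The graphical representation is based on the data after subtraction of the row and column deviations. The sums of each row and each column are equal to zero. These values correspond to the interaction term of a two-factor analysis of variance (rows x columns). This term can be expressed as: $e_{ij} = x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x}_{\cdot\cdot}$, where $x_{ij}$ is the value in row i and column j, $\bar{x}_{i\cdot}$ is the mean of row i, $\bar{x}_{\cdot j}$ is the mean of column j, and $\bar{x}_{\cdot\cdot}$ is the grand mean.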
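
The transformations listed above can be sketched in plain Python as follows. The function names are hypothetical, and the use of the population standard deviation (division by n) for the Z-scores is an assumption, since this page does not say which estimator PermutMatrix uses.

```python
from statistics import mean, pstdev


def transpose(m):
    return [list(col) for col in zip(*m)]


def center_rows(m):
    """Subtract each row mean, so that every row has mean 0."""
    return [[x - mean(row) for x in row] for row in m]


def center_columns(m):
    """Subtract each column mean, so that every column has mean 0."""
    return transpose(center_rows(transpose(m)))


def zscore_rows(m):
    """Standardize rows to mean 0 and standard deviation 1.

    Assumption: population standard deviation. A constant row has a
    standard deviation of 0 and would raise ZeroDivisionError here.
    """
    out = []
    for row in m:
        mu, sd = mean(row), pstdev(row)
        out.append([(x - mu) / sd for x in row])
    return out


def zscore_columns(m):
    return transpose(zscore_rows(transpose(m)))


def interactions(m):
    """Value minus row mean minus column mean plus grand mean."""
    row_means = [mean(row) for row in m]
    col_means = [mean(col) for col in zip(*m)]
    grand = mean(row_means)   # equals the grand mean for a full table
    return [[x - row_means[i] - col_means[j] + grand
             for j, x in enumerate(row)]
            for i, row in enumerate(m)]


table = [[1, 2, 3], [4, 6, 8]]
residuals = interactions(table)   # every row and column sums to zero
```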

2 - Nominal (categorical) Data

	Same Color Levels	The graphical representation is based on the raw data of the input file, without any transformation.
	Specific Color Levels	The graphical representation is based on column-centered data: column mean is subtracted from each value, so that mean value of each column is 0.


Dissimilarity

1 - Numerical Data, 2 - Binary Data, 3 - Nominal (categorical) Data		Distances or dissimilarity indices used for clustering and seriation
	Euclidean distance		http://en.wikipedia.org/wiki/Euclidean_distance
	Manhattan distance		http://en.wikipedia.org/wiki/Taxicab_geometry
	Chebyshev distance		http://en.wikipedia.org/wiki/Chebyshev_distance
	Pearson distance	Clustering and seriation methods are based on the Pearson dissimilarity s = 1 - r, where r is the usual Pearson correlation coefficient. This metric varies between 0 and 2. s = 0 when two curves have an "identical" shape, even if their magnitudes differ. s = 1 when the variations of the curves are completely independent (r = 0). s = 2 when two curves have opposite shapes.
	Squared Pearson distance	Clustering and seriation methods are based on the squared Pearson dissimilarity s = 1 - r^2, where r is the usual Pearson correlation coefficient. This metric varies between 0 and 1. s = 0 when two curves have "identical" or "opposite" patterns, even if their magnitudes differ. s = 1 when the variations of the curves are completely independent (r = 0). This dissimilarity is useful only in very specific situations: clustering and seriation results can be difficult to read, because two very different colour patterns may be placed together.
	Jaccard's distance		http://en.wikipedia.org/wiki/Jaccard_coefficient
	Dice's dissimilarity index		http://en.wikipedia.org/wiki/Dice's_coefficient
	Read dissimilarity values from a file	If you need a special distance or dissimilarity, you can compute the values with another application and import them into PermutMatrix from a file. This file must contain a triangular matrix in standard text format (ASCII), as for the data file. In this case, you have to provide two files: one for the distances or dissimilarities between rows, and one for those between columns.
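
For reference, the dissimilarities listed above can be written out as a minimal Python sketch. These are the textbook definitions; whether PermutMatrix computes the Jaccard and Dice indices exactly this way is an assumption, and the function names are hypothetical.

```python
from math import sqrt


def euclidean(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


def pearson_r(x, y):
    """Usual Pearson correlation coefficient (undefined for constant input)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pearson_distance(x, y):
    return 1 - pearson_r(x, y)            # ranges from 0 to 2


def squared_pearson_distance(x, y):
    return 1 - pearson_r(x, y) ** 2       # ranges from 0 to 1


def jaccard_distance(x, y):
    """Binary vectors: 1 - |both present| / |present in either|."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return 1 - both / either


def dice_dissimilarity(x, y):
    """Binary vectors: 1 - 2 * |both present| / (|x present| + |y present|)."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    total = sum(1 for a in x if a) + sum(1 for b in y if b)
    return 1 - 2 * both / total
```

Two profiles with opposite shapes, such as [1, 2, 3] and [3, 2, 1], are at Pearson distance 2 but at squared Pearson distance 0, which is why the squared version can group very different colour patterns together.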


Hierarchical clustering

		Aggregation criteria
		Complete linkage	The distance between two classes A and B is the maximum of all pairwise distances between items contained in A and B.
		Single linkage	The distance between two classes A and B is the minimum of all pairwise distances between items contained in A and B.
		McQuitty's method (WPGMA)	The distance between two classes A and B is a weighted mean of all pairwise distances between items contained in A and B. The distance from a merged group of two classes A and B to a class C is the unweighted average of the distance between A and C and the distance between B and C, regardless of class sizes.
		Average linkage (UPGMA)	The distance between two classes A and B is the unweighted mean of all pairwise distances between items contained in A and B. So the distance from a merged group of two classes A and B to a class C is the average of the distance between A and C and the distance between B and C, weighted by the sizes of A and B.
		Ward's minimum variance	At each aggregation step, the two classes whose merger produces the smallest increase in intra-class inertia are merged.
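
The aggregation criteria above can be illustrated with a short Python sketch. The function names are hypothetical, items are assumed to be unweighted, and the Ward formula is the standard one for squared Euclidean distances between class centroids; it is not taken from PermutMatrix itself.

```python
def pairwise(A, B, d):
    """All distances d(a, b) between items of class A and items of class B."""
    return [d(a, b) for a in A for b in B]


def complete_linkage(A, B, d):
    return max(pairwise(A, B, d))


def single_linkage(A, B, d):
    return min(pairwise(A, B, d))


def average_linkage(A, B, d):
    values = pairwise(A, B, d)
    return sum(values) / len(values)


def upgma_update(d_ac, d_bc, n_a, n_b):
    """Distance from the merged class (A, B) to C, weighted by class sizes."""
    return (n_a * d_ac + n_b * d_bc) / (n_a + n_b)


def wpgma_update(d_ac, d_bc):
    """McQuitty: simple average of the two distances, sizes ignored."""
    return (d_ac + d_bc) / 2


def ward_increase(A, B):
    """Increase in intra-class inertia (sum of squares) if A and B merge."""
    def centroid(points):
        return [sum(coord) / len(points) for coord in zip(*points)]
    ca, cb = centroid(A), centroid(B)
    squared = sum((u - v) ** 2 for u, v in zip(ca, cb))
    return len(A) * len(B) / (len(A) + len(B)) * squared


# One-dimensional example: A = {0, 2}, B = {10}, C = {20}
d = lambda a, b: abs(a - b)
A, B, C = [0, 2], [10], [20]
d_ac = average_linkage(A, C, d)                         # 19.0
d_bc = average_linkage(B, C, d)                         # 10.0
upgma = upgma_update(d_ac, d_bc, len(A), len(B))        # 16.0
wpgma = wpgma_update(d_ac, d_bc)                        # 14.5
```

In this example the UPGMA update (16.0) equals the plain average of the three distances from 0, 2 and 10 to 20, while WPGMA (14.5) gives class B the same influence as the larger class A.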


Seriation

		Seriation Criteria:
		Multiple-fragment heuristic (MF)
		Unidimensional Scaling (UDS)
		Inertia around diagonal
		Bipolarization
		Minimum path length (TSP)


Improve seriation solution

The Multiple-fragment heuristic does not guarantee an optimal solution, but it quickly gives a "reasonably good" one. In PermutMatrix, the solution found by the Multiple-fragment heuristic can be improved with the "2-Opt" local search algorithm.
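
The idea behind 2-Opt can be sketched as follows: reverse a segment of the current ordering and keep the change whenever the total path length decreases, until no reversal helps. This is a simplified illustration with hypothetical names, assuming an open path (no return to the first item) and recomputing the full length at each step for clarity; it is not PermutMatrix's implementation.

```python
def path_length(order, dist):
    """Sum of distances between consecutive items of an open path."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def two_opt(order, dist):
    """Improve an ordering by segment reversals until none shortens the path."""
    best = list(order)
    best_len = path_length(best, dist)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reverse the segment between positions i and j (inclusive).
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = path_length(candidate, dist)
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improved = True
    return best, best_len


# Four items located at positions 0, 1, 2, 3 on a line
positions = [0, 1, 2, 3]
dist = [[abs(p - q) for q in positions] for p in positions]
start = [0, 2, 1, 3]                      # initial ordering, length 5
order, length = two_opt(start, dist)
```

Like any local search, this stops at a local optimum, so the result is not guaranteed to be the shortest possible path.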


Automation

	Check these boxes to automatically run the specified seriation method after hierarchical clustering. When these boxes are checked, hierarchical clustering and seriation are performed automatically after any change to the option parameters.
At any time, you can run the cluster analysis, automatically followed by seriation, with a keyboard shortcut: F6 to reorganise rows, Shift + F6 to reorganise columns, F7 for both.

