How do you know when to use a particular type of similarity index? Euclidean distance versus Pearson correlation coefficient

What are some of the decisive factors to consider when choosing a similarity index? In what cases is Euclidean distance preferable to Pearson correlation, and vice versa?

+8
statistics artificial-intelligence machine-learning nlp
2 answers

Correlation is unit-independent: if you scale one of the objects by a factor of ten, you get a different Euclidean distance but the same correlation distance. Correlation measures are therefore excellent when you want to measure the distance between objects such as genes defined by their expression profiles.
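To illustrate the scaling point, here is a minimal sketch using NumPy. The vectors `a` and `b` and the helper names are made up for illustration, and "correlation distance" is assumed here to mean 1 minus the Pearson coefficient.

```python
import numpy as np

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.linalg.norm(x - y))

def correlation_distance(x, y):
    """Assumed convention: 1 - Pearson correlation coefficient."""
    r = np.corrcoef(x, y)[0, 1]
    return float(1.0 - r)

# Hypothetical profiles, e.g. two genes measured under five conditions.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
b_scaled = 10.0 * b  # same profile expressed in a different unit

print("Euclidean:  ", euclidean_distance(a, b), euclidean_distance(a, b_scaled))
print("Correlation:", correlation_distance(a, b), correlation_distance(a, b_scaled))
```

The Euclidean distance changes a lot after rescaling, while the two correlation distances agree up to floating-point error.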

Often the absolute or squared correlation is used as a distance metric, because we are more interested in the strength of the relationship than in its sign.
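The answer does not give a formula, so the following is only a sketch under one common convention: take 1 - |r| or 1 - r² as the distance, where r is the Pearson coefficient. The function names and sample data are hypothetical.

```python
import numpy as np

def abs_correlation_distance(x, y):
    """1 - |r|: perfectly anti-correlated objects count as close."""
    r = np.corrcoef(x, y)[0, 1]
    return float(1.0 - abs(r))

def squared_correlation_distance(x, y):
    """1 - r^2: same idea, but weak correlations are penalized more."""
    r = np.corrcoef(x, y)[0, 1]
    return float(1.0 - r * r)

up = np.array([1.0, 2.0, 3.0, 5.0])
down = -up  # mirror image: r is -1

# The signed version 1 - r puts these two as far apart as possible,
# while both sign-free versions put them at (almost exactly) zero.
print(1.0 - np.corrcoef(up, down)[0, 1])
print(abs_correlation_distance(up, down))
print(squared_correlation_distance(up, down))
```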

However, correlation is only suitable for high-dimensional data; there is little point in calculating it for two- or three-dimensional data points.
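One way to see the problem with very low dimensions: when each object has only two coordinates, the Pearson coefficient between any two objects is always +1 or -1 (or undefined if an object is constant), so it carries almost no information. A small sketch with random data; the function name and seed are arbitrary choices for this example.

```python
import numpy as np

def two_point_correlations(n_pairs=5, seed=0):
    """Pearson r between random pairs of 2-dimensional points."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_pairs):
        p = rng.normal(size=2)  # an object with only two coordinates
        q = rng.normal(size=2)
        results.append(float(np.corrcoef(p, q)[0, 1]))
    return results

# Every value printed is +1 or -1 up to floating-point rounding.
for r in two_point_correlations():
    print(round(r, 6))
```

With three coordinates the coefficient can take other values, but it still rests on very few numbers.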

Also note that the “Pearson distance” is a weighted type of Euclidean distance, not the “correlation distance” using the Pearson correlation coefficient.

+13

It really depends on your application scenario. Very briefly: if you are dealing with data in which the actual difference in attribute values is important, go with Euclidean distance. If you are looking for trend or shape similarity, go with correlation. Also note that if you apply z-score normalization to each object, Euclidean distance behaves similarly to the Pearson correlation coefficient. Pearson is insensitive to linear transformations of the data. There are other types of correlation coefficients that take into account only the ranks of the values, which makes them insensitive to both linear and monotonic nonlinear transformations. Note that the usual way to use correlation as a dissimilarity is 1 - correlation, which does not satisfy all the rules of a metric distance.
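Two of the claims above can be checked numerically. The sketch below uses made-up vectors and assumes the population standard deviation (NumPy's default, `ddof=0`) for the z-scores. Under that assumption the squared Euclidean distance between two z-scored objects of length n equals 2n(1 - r). The second part builds three vectors for which 1 - r breaks the triangle inequality.

```python
import numpy as np

def zscore(v):
    """Standardize one object: zero mean, unit population std (ddof=0)."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()

def corr_dissimilarity(x, y):
    """The usual 1 - Pearson correlation."""
    return float(1.0 - np.corrcoef(x, y)[0, 1])

# Part 1: z-scored Euclidean distance versus Pearson correlation.
x = np.array([2.0, 4.0, 5.0, 9.0, 12.0])
y = np.array([1.0, 3.0, 2.0, 8.0, 7.0])
n = len(x)
sq_dist = float(np.sum((zscore(x) - zscore(y)) ** 2))
expected = 2 * n * corr_dissimilarity(x, y)  # 2n(1 - r)
print(sq_dist, expected)  # the two numbers agree up to rounding

# Part 2: 1 - r is not a metric. a and c are uncorrelated, and b lies
# "between" them, at 45 degrees to each once the vectors are centered.
a = np.array([1.0, -1.0, 0.0])
c = np.array([1.0, 1.0, -2.0])
b = a / np.linalg.norm(a) + c / np.linalg.norm(c)

direct = corr_dissimilarity(a, c)
via_b = corr_dissimilarity(a, b) + corr_dissimilarity(b, c)
print(direct, via_b)  # direct > via_b, so the triangle inequality fails
```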

There are some studies on selecting a proximity measure for a specific application, for example:

Pablo A. Jaskowiak, Ricardo J. G. B. Campello, Ivan G. Costa Filho, “Proximity Measures for Clustering Gene Expression Microarray Data: A Validation Methodology and a Comparative Analysis”, IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 99, no. PrePrints, p. 1, 2013

+6