Determine how different some vectors are

I want to distinguish data vectors to find similar ones. For example:

 A = [4,5,6,7,8];
 B = [4,5,6,6,8];
 C = [4,5,6,7,7];
 D = [1,2,3,9,9];
 E = [1,2,3,9,8];

In the example above, I want to show that vectors A, B, C are similar (not identical) to each other, and that D, E are similar to each other, but that the group A, B, C is not similar to the group D, E. Can Matlab do this? I thought of using some sort of classification algorithm, or Kmeans, ROC, etc., but I'm not sure which one would be best.

Any suggestions? Thanks in advance.

+6
math vector matlab classification
4 answers

Here is the solution I propose, based on your example:

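 Z = [A; B; C; D; E];           % one vector per row
 Y = pdist(Z);                  % pairwise Euclidean distances (condensed vector)
 matrix = squareform(Y);        % full symmetric distance matrix
 matrix_round = round(matrix);  % rounded for easier reading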

Now that we have the distance matrix, we can set a threshold based on its maximum value and decide which cut-off suits the data best.
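As a rough sketch of that thresholding step: the snippet below treats two vectors as "similar" when their distance is under half of the largest distance, then groups vectors that are linked by such pairs. The variable names `threshold`, `adj` and `groups`, and the factor 0.5, are assumptions made for illustration, not part of the original answer. `graph` and `conncomp` are base MATLAB functions (R2015b or later); `pdist` and `squareform` need the Statistics and Machine Learning Toolbox.

```matlab
A = [4,5,6,7,8]; B = [4,5,6,6,8]; C = [4,5,6,7,7];
D = [1,2,3,9,9]; E = [1,2,3,9,8];

Z = [A; B; C; D; E];
matrix = squareform(pdist(Z));          % full Euclidean distance matrix

% Assumed cut-off: half of the largest pairwise distance.
threshold = 0.5 * max(matrix(:));

% Logical adjacency matrix: true where two vectors count as "similar".
adj = matrix < threshold;

% Vectors connected through similar pairs end up in the same group.
G = graph(adj, 'omitselfloops');
groups = conncomp(G);                   % for this data: [1 1 1 2 2]
disp(groups)
```

For this data the within-group distances are at most sqrt(2) and the between-group distances are all above 5.5, so any cut-off between those values gives the same grouping.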

It would be nice to create a cluster graph showing the differences between them.

Best wishes

+1

One of my new favorite methods for this kind of thing is agglomerative clustering.

First, combine all your vectors into a matrix in which each row is a separate vector. This makes such methods easier to apply:

 F = [A; B; C; D; E]; 

Then you can compute the linkages:

 Z = linkage(F, 'ward', 'euclidean'); 

This can be plotted using:

 dendrogram(Z); 

[Image: dendrogram of the five vectors]

This shows a tree in which each leaf at the bottom is one of the original vectors. The branch heights show how similar or different the vectors are.

As you can see, 1, 2 and 3 are shown very close together, as are 4 and 5. The tree even gives a measure of proximity: vectors 1 and 3 are considered closer than vectors 2 and 3, because A and C differ in only one component, while B and C differ in two.
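If explicit group labels are wanted rather than a plot, the tree can be cut into a fixed number of clusters. The following is a minimal sketch, assuming two groups are expected; the variable name `T` and the choice of 2 are illustrative. `cluster` with the `'maxclust'` option is part of the Statistics and Machine Learning Toolbox.

```matlab
A = [4,5,6,7,8]; B = [4,5,6,6,8]; C = [4,5,6,7,7];
D = [1,2,3,9,9]; E = [1,2,3,9,8];

F = [A; B; C; D; E];                  % one vector per row
Z = linkage(F, 'ward', 'euclidean');  % hierarchical cluster tree

% Cut the tree so that at most 2 clusters remain (assumed group count).
T = cluster(Z, 'maxclust', 2);

% T(i) is the cluster index of row i; the label numbering itself is arbitrary.
disp(T')
```

Rows 1 to 3 should share one label and rows 4 and 5 the other; which group gets label 1 is up to the function.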

+10

If all the vectors you are comparing have the same length, then a norm of the pairwise differences may be quite sufficient. The norm you choose will, of course, depend on your specific criteria for closeness, but for the examples you show, simply summing the absolute values of the components of the pairwise differences gives:

      A   B   C   D   E
  A   0   1   1  12  11
  B       0   2  13  12
  C           0  13  12
  D               0   1
  E                   0

which does not need a specially tuned threshold to work: the distances within each group are at most 2, while the distances between the groups are at least 11.
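One way to reproduce that table, sketched here with `pdist` and its `'cityblock'` metric (the sum of absolute differences); the variable name `L1` is made up for this example, and the answer itself does not say how its table was computed.

```matlab
A = [4,5,6,7,8]; B = [4,5,6,6,8]; C = [4,5,6,7,7];
D = [1,2,3,9,9]; E = [1,2,3,9,8];

Z = [A; B; C; D; E];                       % one vector per row

% City-block (L1) distance: sum of absolute component differences.
L1 = squareform(pdist(Z, 'cityblock'));    % full symmetric 5-by-5 matrix

disp(L1)
%  0   1   1  12  11
%  1   0   2  13  12
%  1   2   0  13  12
% 12  13  13   0   1
% 11  12  12   1   0
```

The upper triangle of `L1` matches the table above; `squareform` just mirrors it below the diagonal.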

+2

You can use pdist(); this function gives you the pairwise distances.

Various distance metrics (as opposed to similarity measures) are already implemented. Euclidean seems appropriate for your situation, although you may want to experiment with different metrics to see how they affect the result.
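A small sketch of such an experiment, assuming three of the metric names that `pdist` accepts (`'euclidean'`, `'cityblock'`, `'chebychev'`); the list and the variable names `metrics` and `Dk` are chosen only for illustration.

```matlab
A = [4,5,6,7,8]; B = [4,5,6,6,8]; C = [4,5,6,7,7];
D = [1,2,3,9,9]; E = [1,2,3,9,8];

Z = [A; B; C; D; E];                        % one vector per row
metrics = {'euclidean', 'cityblock', 'chebychev'};

for k = 1:numel(metrics)
    % Full distance matrix for the k-th metric.
    Dk = squareform(pdist(Z, metrics{k}));
    fprintf('%s:\n', metrics{k});
    disp(Dk);
end
```

Comparing the printed matrices side by side shows how clearly each metric separates the two groups.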

+1