Computing the degree of similarity among a group of sets

Suppose there are 4 sets:

s1 = {1,2,3,4};
s2 = {2,3,4};
s3 = {2,3,4,5};
s4 = {1,3,4,5};

Is there any standard metric to represent the degree of similarity of this group of 4 sets?

Thank you for suggesting the Jaccard method. However, it seems to work only on pairs of sets. How can I calculate the degree of similarity of the whole group of sets?

+4
5 answers

Pairwise, you can calculate the Jaccard distance between two sets. It’s just the distance between the two sets when they are treated as Boolean vectors in a space where each of the elements 1, 2, 3, ... is a unit vector.
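A minimal sketch of the pairwise case, assuming the usual definition of the Jaccard distance as one minus the intersection size over the union size. The function names and the choice of universe (the elements 1 to 5) are illustrative assumptions, not part of the original answer. The second function does the same computation on Boolean vectors to show the two views agree.

```python
def jaccard_distance(a: set, b: set) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def to_boolean_vector(s: set, universe: list) -> list:
    """Encode a set as a 0/1 vector with one coordinate per universe element."""
    return [1 if x in s else 0 for x in universe]


def jaccard_distance_vectors(u: list, v: list) -> float:
    """Same distance, computed on two equal-length Boolean vectors."""
    both = sum(1 for x, y in zip(u, v) if x and y)
    either = sum(1 for x, y in zip(u, v) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


s1 = {1, 2, 3, 4}
s2 = {2, 3, 4}
universe = [1, 2, 3, 4, 5]  # assumed universe: every element used in the question

print(jaccard_distance(s1, s2))  # 0.25
print(jaccard_distance_vectors(to_boolean_vector(s1, universe),
                               to_boolean_vector(s2, universe)))  # 0.25
```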

+8

Your question is not very specific. But I suppose you mean something like an “edit distance” between them? That is, how much do you need to change s1 to turn it into s2?

Check out Wikipedia's article on edit distance.
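For plain sets, one way to read this idea is to count how many single elements must be added or removed to turn one set into the other, which is the size of the symmetric difference. This is a sketch under that assumption (unit cost per insertion or deletion); the name `set_edit_distance` is hypothetical, and classic string edit distances such as Levenshtein work differently because order matters there.

```python
def set_edit_distance(a: set, b: set) -> int:
    """Number of single-element insertions and deletions needed to turn a into b."""
    return len(a ^ b)  # symmetric difference: elements in exactly one of the sets


s1 = {1, 2, 3, 4}
s2 = {2, 3, 4}
s4 = {1, 3, 4, 5}

print(set_edit_distance(s1, s2))  # 1 (remove 1)
print(set_edit_distance(s2, s4))  # 3 (remove 2, add 1 and 5)
```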

+2

As Tobu said, I would use the Jaccard index, which is just the size of the intersection divided by the size of the union of the sets.
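To get a single number for the whole group, two simple extensions of the Jaccard index come to mind: divide the size of the intersection of all sets by the size of their union, or average the index over every pair. Neither is claimed here to be the standard group metric; this is a sketch with hypothetical function names, assuming a non-empty list of sets.

```python
from itertools import combinations


def jaccard_index(a: set, b: set) -> float:
    """|a & b| / |a | b|; two empty sets count as fully similar."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def group_jaccard(sets: list) -> float:
    """Elements common to all sets divided by elements present in any set."""
    union = set.union(*sets)
    return 1.0 if not union else len(set.intersection(*sets)) / len(union)


def mean_pairwise_jaccard(sets: list) -> float:
    """Average Jaccard index over all unordered pairs (needs at least two sets)."""
    pairs = list(combinations(sets, 2))
    return sum(jaccard_index(a, b) for a, b in pairs) / len(pairs)


sets = [{1, 2, 3, 4}, {2, 3, 4}, {2, 3, 4, 5}, {1, 3, 4, 5}]

print(group_jaccard(sets))          # 0.4, since only {3, 4} is shared by all four
print(mean_pairwise_jaccard(sets))  # about 0.617
```

The first measure is strict: one outlier set can drive it to zero. The pairwise average degrades more gradually, so which one fits depends on what "similar as a group" should mean.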

+2

You can calculate the size of the intersection between each pair of sets.

0

You can calculate the Euclidean distance between them and build a dendrogram to visualize the similarities.
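A rough sketch of the distance step, assuming each set is first encoded as a Boolean vector over the elements 1 to 5 (that encoding and the helper names are assumptions, since the answer does not say how the sets become points). The pairs are produced in row-major order, which is the condensed layout that SciPy's `scipy.cluster.hierarchy.linkage` accepts, so the list could then be fed to `linkage` and `dendrogram` for the plot.

```python
import math
from itertools import combinations


def to_vector(s: set, universe: list) -> list:
    """Encode a set as a 0/1 vector with one coordinate per universe element."""
    return [1 if x in s else 0 for x in universe]


def euclidean(u: list, v: list) -> float:
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


sets = [{1, 2, 3, 4}, {2, 3, 4}, {2, 3, 4, 5}, {1, 3, 4, 5}]
universe = sorted(set.union(*sets))  # [1, 2, 3, 4, 5]
vectors = [to_vector(s, universe) for s in sets]

# Condensed distance list: pairs (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
condensed = [euclidean(u, v) for u, v in combinations(vectors, 2)]

for (i, j), d in zip(combinations(range(len(sets)), 2), condensed):
    print(f"s{i + 1} - s{j + 1}: {d:.3f}")
```

On Boolean vectors this distance is the square root of the number of elements that belong to exactly one of the two sets, so it ranks pairs the same way as the size of the symmetric difference.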

0
