Clustering a sparse binary vector dataset

Question

I have a sparse dataset in which each data point is described by a vector of 1000 elements, and each element is either 0 or 1 (many 0s and a few 1s). Do you know of any distance function that could help me cluster them? Is something like Euclidean distance suitable in this case? I would like to know whether there is a simple, suitable distance metric for this situation that I can try on my data.

Thanks

+5

sparse-matrix cluster-analysis distance

shn Dec 20 '11 at 8:40

4 answers

There is no single answer to your question. The best practice depends on the domain.

Once you decide on a similarity metric, clustering is usually done by averaging or by finding medoids. See these papers on clustering binary data for sample algorithms:

Carlos Ordonez. Clustering Binary Data Streams with K-means.
Tao Li. A General Model for Clustering Binary Data.

Toit, du S.H.C.; Steyn, A.G.W.; Stumpf, R.H.; ; 3, . 77, 1986; Springer-Verlag.
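
To make the medoid approach mentioned above concrete, here is a minimal sketch of an alternating k-medoids loop in plain Python. It is not taken from either of the papers listed; the function names (`assign`, `k_medoids`), the choice of Jaccard distance, the naive "first k points" initialization, and the toy data are all assumptions made for illustration. Each binary vector is stored as the set of indices where it has a 1, which is a natural fit for sparse data.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two binary vectors given as sets of 1-indices."""
    if not a and not b:
        return 0.0  # convention: two all-zero vectors are treated as identical
    return 1.0 - len(a & b) / len(a | b)


def assign(points, medoids, dist):
    """Map each medoid index to the list of point indices closest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: dist(p, points[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(points, k, dist, max_iter=100):
    """Hypothetical helper: alternate between assignment and medoid update."""
    medoids = list(range(k))  # naive initialization (assumption): first k points
    for _ in range(max_iter):
        clusters = assign(points, medoids, dist)
        # New medoid of a cluster = member with the smallest total distance
        # to the other members; an empty cluster keeps its old medoid.
        new_medoids = [
            min(members, key=lambda c: sum(dist(points[c], points[j]) for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return assign(points, medoids, dist)


# Toy data: two obvious groups, using indices 0-2 and 10-12 respectively.
points = [{0, 1, 2}, {10, 11, 12}, {0, 1}, {10, 11}, {1, 2}, {11, 12}]
clusters = k_medoids(points, k=2, dist=jaccard_distance)
print(clusters)
```

Because the medoid is always one of the actual data points, this works with any distance function you pick, which is convenient while you are still experimenting with metrics. For real data you would want a better initialization than the first k points.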


+10

cyborg Dec 20 '11 at 10:39

If there really are many 0s and only a few 1s, you can try clustering on the position of the first or last 1; see http://aggregate.org/MAGIC/#Least Significant 1 bit
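
One possible reading of this suggestion, as a rough sketch: pack each vector into an integer bitmask and bucket the vectors by the index of their lowest (or highest) 1 bit. The helper names (`lowest_one`, `highest_one`, `group_by`) and the 7-bit toy vectors are assumptions for illustration; the `x & -x` trick isolates the least significant 1 bit and works on Python's arbitrary-size integers, so 1000-bit vectors are fine.

```python
from collections import defaultdict


def lowest_one(x):
    """Index of the least significant 1 bit, or None for an all-zero vector."""
    return (x & -x).bit_length() - 1 if x else None


def highest_one(x):
    """Index of the most significant 1 bit, or None for an all-zero vector."""
    return x.bit_length() - 1 if x else None


def group_by(vectors, key):
    """Bucket vectors by the value of key(vector)."""
    groups = defaultdict(list)
    for v in vectors:
        groups[key(v)].append(v)
    return dict(groups)


# Toy 7-bit vectors written as binary literals (bit 0 is the rightmost digit).
vectors = [0b0011000, 0b0101000, 0b1000001, 0b0000011]
by_low = group_by(vectors, lowest_one)
by_high = group_by(vectors, highest_one)
print(by_low)
print(by_high)
```

This is a very coarse grouping: it looks at a single bit position per vector, so it is more of a cheap pre-bucketing step than a full clustering.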

0

Eugen ieck Dec 20 '11 at 8:45


What you need is a distance/similarity function for binary vectors.

In A Survey of Binary Similarity and Distance Measures (Choi, Cha, Tappert, 2010), the authors list 76 such functions.

0

Lior Kogan Jul 2 '16 at 8:29


Anony-mousse · Accepted Answer · 2011-12-21T08:10:13+0000

Take a look at the distance functions used for sparse text vectors, such as cosine distance, and at those used to compare sets, such as Jaccard distance.
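
For binary vectors both of these distances reduce to simple set arithmetic on the positions of the 1s, so they are cheap to compute on sparse data. The following is a small sketch in plain Python; the helper names and the handling of all-zero vectors are my own conventions (assumptions), not part of any standard definition.

```python
import math


def to_index_set(vec):
    """Sparse representation: the set of positions where vec has a 1."""
    return {i for i, v in enumerate(vec) if v}


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of 1-indices."""
    if not a and not b:
        return 0.0  # convention (assumption): two all-zero vectors are identical
    return 1.0 - len(a & b) / len(a | b)


def cosine_distance(a, b):
    """1 - |A ∩ B| / sqrt(|A| * |B|); for 0/1 vectors the dot product is |A ∩ B|."""
    if not a and not b:
        return 0.0  # convention (assumption), as above
    if not a or not b:
        return 1.0  # convention (assumption): zero vector shares nothing
    return 1.0 - len(a & b) / math.sqrt(len(a) * len(b))


x = to_index_set([1, 1, 0, 0, 1, 0])
y = to_index_set([1, 0, 0, 0, 1, 1])
print(jaccard_distance(x, y))  # 2 shared ones out of 4 distinct positions -> 0.5
print(cosine_distance(x, y))   # 1 - 2/3
```

Note that both measures ignore positions where the two vectors are both 0. With many 0s and few 1s, that is usually what you want: Euclidean distance between 0/1 vectors only counts the positions that differ, so two vectors with few 1s look close even when they share no 1 at all.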
