Why is the cosine similarity between the two vectors negative?

Question

I have two vectors, each of length 11.

 a <- c(-0.012813841, -0.024518383, -0.002765056, 0.079496744, 0.063928973, 0.476156960, 0.122111977, 0.322930189, 0.400701256, 0.454048860, 0.525526219)
 b <- c(0.64175768, 0.54625694, 0.40728261, 0.24819750, 0.09406221, 0.16681692, -0.04211932, -0.07130129, -0.08182200, -0.08266852, -0.07215885)
 cosine_sim <- cosine(a,b)

which returns:

 -0.05397935

I used cosine() from the lsa package.

For some values I get a negative cosine_sim, as shown above. I'm not sure how a similarity can be negative; I expected it to be between 0 and 1.

Can anyone explain what is happening here?

+7

r negative-number similarity cosine

Robin Jul 6 '11 at 13:16

4 answers

Ben Bolker · Answer 1 · 2011-07-06T13:36:35+0000

The nice thing about R is that you can often dig into the functions and see for yourself what is going on. If you type cosine (without parentheses, arguments, etc.), R prints the body of the function. Reading through it (which takes some practice), you can see that there is a lot of machinery for computing pairwise similarities of matrix columns (i.e., the bit wrapped in the condition if (is.matrix(x) && is.null(y))), but the key line of the function is

 crossprod(x, y)/sqrt(crossprod(x) * crossprod(y))

Pull this out and apply to your example:

 > crossprod(a,b)/sqrt(crossprod(a)*crossprod(b))
             [,1]
 [1,] -0.05397935
 > crossprod(a)
      [,1]
 [1,]    1
 > crossprod(b)
      [,1]
 [1,]    1

So your vectors are already normalized, which leaves only crossprod(a, b) to look at. In your case, this is equivalent to

 > sum(a*b) [1] -0.05397935

(for real matrix operations, crossprod is much more efficient than writing the equivalent operation by hand).

As @Jack Maney says, the dot product of two vectors (length(a) * length(b) * cos(a, b), where length means the Euclidean norm) can be negative ...
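A minimal base-R sketch of that point, using made-up two-dimensional vectors. The names u, v, w and the helper cos_sim are only illustrative and are not part of lsa. Because the norms are always positive, the sign of the cosine follows the sign of the dot product:

```r
# Hypothetical helper: cosine of the angle between two numeric vectors,
# i.e. the dot product divided by the product of the Euclidean norms.
cos_sim <- function(x, y) sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))

u <- c(1, 0)
v <- c(0, 1)    # at a right angle to u
w <- c(-1, 0)   # points in the opposite direction to u

cos_sim(u, u)   # same direction: 1
cos_sim(u, v)   # right angle: 0
cos_sim(u, w)   # opposite direction: -1
```

Any pair of vectors more than 90 degrees apart gives a negative value, which is what happens (only just) with a and b in the question.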

For what it's worth, I suspect that the cosine function in lsa could be implemented more simply and efficiently for matrix arguments as something like as.dist(crossprod(x)) ...
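A rough sketch of what that might look like in base R, assuming the columns are first scaled to unit length (the matrix m and the names m_unit and sims are made up for illustration; this is not the lsa implementation):

```r
# Made-up 3 x 3 matrix; each column is one vector to compare.
m <- cbind(x1 = c(1, 2, 2), x2 = c(2, 4, 4), x3 = c(-1, -2, -2))

# Scale every column to unit Euclidean length.
m_unit <- sweep(m, 2, sqrt(colSums(m^2)), "/")

# With unit-length columns, crossprod() alone gives all pairwise cosines.
sims <- crossprod(m_unit)
sims

# Keep only the lower triangle, one value per pair of columns.
as.dist(sims)
```

Here x2 is a positive multiple of x1 (cosine 1) and x3 is a negative multiple (cosine -1), so negative entries show up in the matrix case too.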

Edit: in the comments on the now-deleted answer below, I suggested that the square of the cosine measure may be appropriate if you want a similarity measure in [0,1]. This would be analogous to using the coefficient of determination (r^2) rather than the correlation coefficient (r). It may also be worth going back and thinking more carefully about the purpose and meaning of the similarity measures being used ...
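To illustrate the squaring idea, here is a small sketch with made-up vectors p and q (cos_sim is a hypothetical helper, not a function from lsa):

```r
# Hypothetical helper: cosine of the angle between two numeric vectors.
cos_sim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

p <- c(1, 2)
q <- c(-2, -1)   # made-up vector pointing roughly away from p

s <- cos_sim(p, q)
s      # -0.8: negative
s^2    # 0.64: inside [0, 1], but the sign is lost
```

Note the trade-off: after squaring, two vectors pointing in exactly opposite directions score 1, the same as two identical vectors, so whether this is acceptable depends on what the similarity is meant to capture.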

Richie Cotton · Answer 2 · 2011-07-06T13:46:43+0000

The cosine function returns

 crossprod(a, b)/sqrt(crossprod(a) * crossprod(b))

In this case, both terms in the denominator are 1, but crossprod(a, b) is -0.05.

user554546 · Answer 3 · 2011-07-06T13:31:48+0000

The cosine function can take negative values.

Surjan · Answer 4 · 2017-01-13T13:27:45+0000

While the cosine of two vectors can take any value from -1 to +1, cosine similarity (in document retrieval) normally takes values in the interval [0,1]. The reason is simple: there are no negative values in the Word x Document matrix, so the maximum angle between two vectors is 90 degrees, for which the cosine is 0.
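A small sketch of that case, using invented term counts for three documents over a four-word vocabulary (doc1, doc2, doc3 and the helper cos_sim are illustrative names only):

```r
# Hypothetical helper: cosine of the angle between two numeric vectors.
cos_sim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Made-up term counts; counts are never negative.
doc1 <- c(3, 0, 1, 0)
doc2 <- c(0, 2, 0, 5)   # shares no words with doc1
doc3 <- c(1, 0, 2, 0)   # shares two words with doc1

cos_sim(doc1, doc2)   # 0: no overlap, a 90 degree angle
cos_sim(doc1, doc3)   # strictly between 0 and 1
```

With non-negative entries every term in the dot product is non-negative, so the result cannot drop below 0. The vectors in the question contain negative entries, which is why that bound does not apply to them.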
