Why is the cosine similarity between the two vectors negative?

I have two vectors, each of length 11.

 library(lsa)  # provides cosine()

 a <- c(-0.012813841, -0.024518383, -0.002765056, 0.079496744,
        0.063928973, 0.476156960, 0.122111977, 0.322930189,
        0.400701256, 0.454048860, 0.525526219)
 b <- c(0.64175768, 0.54625694, 0.40728261, 0.24819750,
        0.09406221, 0.16681692, -0.04211932, -0.07130129,
        -0.08182200, -0.08266852, -0.07215885)
 cosine_sim <- cosine(a, b)

which returns:

 -0.05397935 

I used cosine() from the lsa package.

For some values, I get a negative cosine_sim, as shown above. I don't understand how a similarity can be negative; I thought it had to be between 0 and 1.

Can anyone explain what is happening here?

+7
4 answers

The nice thing about R is that you can often dig into functions and see for yourself what is going on. If you type cosine (without parentheses, arguments, etc.), R prints the body of the function. Reading through it (which takes some practice), you can see that there is a fair amount of machinery for computing the pairwise similarities of the columns of a matrix (i.e., the bit wrapped in the condition if (is.matrix(x) && is.null(y))), but the key line of the function is

 crossprod(x, y)/sqrt(crossprod(x) * crossprod(y)) 

Pulling this out and applying it to your example:

 > crossprod(a,b)/sqrt(crossprod(a)*crossprod(b))
             [,1]
 [1,] -0.05397935
 > crossprod(a)
      [,1]
 [1,]    1
 > crossprod(b)
      [,1]
 [1,]    1

So you are using vectors that are already normalized, which means crossprod(a, b) is the only term left to look at. In your case, this is equivalent to

 > sum(a*b)
 [1] -0.05397935

(for real matrix operations, crossprod is much more efficient than constructing the equivalent operation by hand).

As @Jack Maney says, the dot product of two vectors (which is length(a) * length(b) * cos(a, b), where "length" means the Euclidean norm, not R's length()) can be negative ...
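To make the geometry concrete, here is a minimal base-R sketch. It does not use lsa, and the helper name cos_sim is made up for illustration. It computes the cosine from the dot product and the Euclidean norms, then converts it to an angle:

```r
a <- c(-0.012813841, -0.024518383, -0.002765056, 0.079496744,
       0.063928973, 0.476156960, 0.122111977, 0.322930189,
       0.400701256, 0.454048860, 0.525526219)
b <- c(0.64175768, 0.54625694, 0.40728261, 0.24819750,
       0.09406221, 0.16681692, -0.04211932, -0.07130129,
       -0.08182200, -0.08266852, -0.07215885)

# Hypothetical helper: dot product divided by the product of the norms
cos_sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

cos_ab <- cos_sim(a, b)
angle_deg <- acos(cos_ab) * 180 / pi  # angle between the vectors, in degrees

cos_ab     # negative
angle_deg  # a little more than 90 degrees
```

A negative cosine simply means the two vectors are more than 90 degrees apart.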

For what it's worth, I suspect that the cosine function in lsa might be more easily / efficiently implemented for matrix arguments as as.dist(crossprod(x)) ...
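A rough sketch of that idea, under the assumption that the columns of x are first scaled to unit length (that normalization step is my addition, and the matrix below is made-up data; this is not the actual lsa implementation):

```r
# Made-up example matrix: each column is one vector
x <- cbind(v1 = c(1, 2, 2),
           v2 = c(2, 0, 1),
           v3 = c(0, 3, 4))

# Scale every column to Euclidean norm 1
x_unit <- sweep(x, 2, sqrt(colSums(x^2)), "/")

# With unit-length columns, crossprod() gives all pairwise cosines at once
cos_mat <- crossprod(x_unit)

# as.dist() keeps only the lower triangle (one value per pair of columns)
cos_dist <- as.dist(cos_mat)
cos_dist
```

Without the normalization, crossprod(x) would give raw dot products rather than cosines.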

edit: in the comments on the now-deleted answer below, I suggested that the square of the cosine measure might be appropriate if you want a similarity measure on [0,1]. This would be analogous to using the coefficient of determination (r^2) rather than the correlation coefficient (r). But it is perhaps also worth going back and thinking more carefully about the purpose / meaning of the similarity measures being used ...
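A quick illustration of what squaring does, using the value from the question and a made-up vector x (cos_sim is a hypothetical helper, not part of lsa):

```r
# Hypothetical helper: plain cosine of two numeric vectors
cos_sim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

cos_ab <- -0.05397935  # value reported in the question
cos_ab^2               # small and non-negative

x <- c(1, 2, 3)
cos_sim(x, -x)         # -1: the vectors point in opposite directions
cos_sim(x, -x)^2       # 1: squaring discards the direction
```

Note the trade-off: after squaring, exactly opposite vectors count as maximally similar, which may or may not be what you want.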

+14

The cosine function returns

 crossprod(a, b)/sqrt(crossprod(a) * crossprod(b)) 

In this case, both terms in the denominator are 1, but crossprod(a, b) is -0.05.

+2

The cosine function can take negative values.

+1

While the cosine of two vectors can take any value from -1 to +1, cosine similarity (in information retrieval) usually takes values in the interval [0,1]. The reason is simple: there are no negative values in the WordxDocument matrix, so the maximum angle between two vectors is 90 degrees, for which the cosine is 0.
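A small sketch of that point with made-up term counts (the documents, terms, and the cos_sim helper are all hypothetical):

```r
# Hypothetical helper: plain cosine of two numeric vectors
cos_sim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Made-up term counts; counts can never be negative
doc1 <- c(cat = 2, dog = 0, fish = 1)
doc2 <- c(cat = 0, dog = 3, fish = 0)  # shares no terms with doc1
doc3 <- c(cat = 1, dog = 1, fish = 1)

cos_sim(doc1, doc2)  # 0: no shared terms, so the vectors are at 90 degrees
cos_sim(doc1, doc3)  # strictly between 0 and 1
```

The vectors a and b in the question contain negative entries, so this [0,1] guarantee does not apply to them.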

0