I use TF-IDF to calculate similarities. For example, suppose I have the following two documents:
Doc A => cat dog
Doc B => dog sparrow
I would expect the similarity to be 50%, but when I calculate the TF-IDF values, this is what I get:
TF values for Doc A
dog tf = 0.5
cat tf = 0.5
TF values for Doc B
dog tf = 0.5
sparrow tf = 0.5
IDF values for Doc A
dog idf = -0.4055
cat idf = 0
IDF values for Doc B
dog idf = -0.4055 (without the +1 formula: 0.6931)
sparrow idf = 0
TF-IDF value for Doc A
0.5 × (-0.4055) + 0.5 × 0 = -0.20275
TF-IDF value for Doc B
0.5 × (-0.4055) + 0.5 × 0 = -0.20275
Now it looks like the similarity is -0.20275. Is that right? Or am I missing something? Or is there some further step? Please tell me so that I can calculate it too.
I used the TF-IDF formula given on Wikipedia.
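For reference, the numbers above can be reproduced with a short Python sketch. It assumes the natural logarithm, tf = term count / document length, and the smoothed variant idf = ln(N / (1 + df)), since that is the variant that yields -0.4055 for dog. The helper names (tf, idf, tfidf_sum) are only illustrative and do not come from any library.

```python
import math

# The two example documents, already tokenized.
docs = {
    "A": ["cat", "dog"],
    "B": ["dog", "sparrow"],
}


def tf(term, doc):
    """Term frequency: occurrences of the term divided by document length."""
    return doc.count(term) / len(doc)


def idf(term, corpus):
    """Smoothed IDF, ln(N / (1 + df)). Assumed variant, see the note above."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / (1 + df))


def tfidf_sum(doc, corpus):
    """Sum of tf * idf over the distinct terms of one document."""
    return sum(tf(t, doc) * idf(t, corpus) for t in sorted(set(doc)))


corpus = list(docs.values())
for name, doc in docs.items():
    for term in doc:
        print(name, term, tf(term, doc), round(idf(term, corpus), 4))
    print(name, "sum", round(tfidf_sum(doc, corpus), 5))
```

With the unrounded IDF the sum comes out near -0.20273. The -0.20275 above results from rounding the IDF to -0.4055 before multiplying.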