I want to calculate the cosine similarity of two lists, for example:
A = [u'home (private)', u'bank', u'bank', u'building(condo/apartment)','factory']
B = [u'home (private)', u'school', u'bank', u'shopping mall']
I know that the cosine similarity of A and B should be
3/(sqrt(7)*sqrt(4)).
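To spell out where that number comes from, here is a minimal sketch that uses hand-counted occurrence vectors, treating each whole list element (spaces and brackets included) as a single token. The vocabulary order and the variable names are only assumptions made for this illustration; the counts are typed in by hand, which is exactly the step I would like to automate.

```python
from math import sqrt

# Hand-counted occurrences, treating each whole list element as one token.
# Vocabulary order (an arbitrary choice for this illustration):
# home (private), bank, building(condo/apartment), factory, school, shopping mall
a_counts = [1, 2, 1, 1, 0, 0]  # counts for list A
b_counts = [1, 1, 0, 0, 1, 1]  # counts for list B

dot = sum(x * y for x, y in zip(a_counts, b_counts))  # 1*1 + 2*1 = 3
norm_a = sqrt(sum(x * x for x in a_counts))           # sqrt(1 + 4 + 1 + 1) = sqrt(7)
norm_b = sqrt(sum(y * y for y in b_counts))           # sqrt(1 + 1 + 1 + 1) = sqrt(4)

expected = dot / (norm_a * norm_b)
print(expected)  # roughly 0.567
```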
I tried joining each list into a single string such as "home bank bank building factory", which looks like a sentence, so that I could count words. However, some elements (like home (private)) contain whitespace themselves, and some contain brackets, so it is hard for me to count the occurrences of each element.
Do you know how to count the occurrences of each element in lists like these, so that for list B the counts can be represented as
{'home (private)': 1, 'school': 1, 'bank': 1, 'shopping mall': 1}?
Or do you know how to calculate the cosine similarity of these two lists?
Many thanks