I am calculating the cosine similarity between feature vectors and wonder whether anyone has a clean solution to the problem below involving categorical features.
I currently have (example):
cosineSim <- function(x){
  # pairwise dot products of the rows, divided by the outer product of the row norms
  as.matrix(x %*% t(x) / (sqrt(rowSums(x^2) %*% t(rowSums(x^2)))))
}
A <- c(1,1,0,0.5)
B <- c(1,1,0,0.5)
C <- c(1,1,0,1.2)
D <- c(1,0,0,0.7)
dataTest <- data.frame(A,B,C,D)
dataTest <- data.frame(t(dataTest))
dataMatrix <- as.matrix(dataTest)
cosineSim(dataMatrix)
which works great.
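For reference, the same matrix can also be obtained by first scaling every row to unit length and then taking all pairwise dot products with base R's `tcrossprod()`. The name `cosineSimNorm` is hypothetical; like the original, this sketch returns `NaN` for an all-zero row.

```r
# Sketch: cosine similarity between the rows of a numeric matrix.
# Each row is divided by its Euclidean norm, then all pairwise dot products are taken.
cosineSimNorm <- function(x) {
  x <- as.matrix(x)
  unit <- x / sqrt(rowSums(x^2))  # the length-nrow vector recycles down columns, so row i is divided by norm i
  tcrossprod(unit)                # same as unit %*% t(unit)
}

dataMatrix <- rbind(A = c(1, 1, 0, 0.5),
                    B = c(1, 1, 0, 0.5),
                    C = c(1, 1, 0, 1.2),
                    D = c(1, 0, 0, 0.7))
sim <- cosineSimNorm(dataMatrix)
```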
But say I want to add a categorical variable, such as a city, and create a feature that is 1 when the two cities are equal and 0 otherwise.
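One way to make this precise, assuming the city should enter exactly as a one-hot encoding would: write x_i for the numeric part of vector i and c_i for its city. The match indicator then adds 1 to the dot product, and each one-hot block adds 1 to a vector's squared norm, giving

```latex
\operatorname{sim}(i,j) =
  \frac{\mathbf{x}_i^\top \mathbf{x}_j + \mathbb{1}[c_i = c_j]}
       {\sqrt{\left(\lVert \mathbf{x}_i \rVert^2 + 1\right)\left(\lVert \mathbf{x}_j \rVert^2 + 1\right)}}
```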
In that case, example feature vectors would be:
A <- c(1,1,0,0.5,"Dublin")
B <- c(1,1,0,0.5,"London")
C <- c(1,1,0,1.2,"Dublin")
D <- c(1,0,0,0.7,"New York")
Is there a neat way to compute the pairwise equality of the last feature on the fly inside the function, so that the implementation stays vectorized?
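A minimal sketch of the on-the-fly approach, assuming the numeric features and the city are passed separately (the function name `cosineSimCat` is hypothetical). `outer(category, category, "==")` builds the whole pairwise match matrix in one vectorized call; adding it to the numeric dot products and adding 1 to each squared norm reproduces what a one-hot encoding of the city would give.

```r
# Sketch: cosine similarity over numeric features plus one categorical feature.
# The categorical part contributes 1 to the dot product when the categories match,
# 0 otherwise, and adds 1 to every squared norm (as a one-hot block would).
cosineSimCat <- function(x, category) {
  x <- as.matrix(x)
  stopifnot(nrow(x) == length(category))
  same  <- outer(category, category, "==")  # logical n x n, TRUE where categories match
  dots  <- tcrossprod(x) + same             # numeric dot products plus the 1/0 match term
  norms <- sqrt(rowSums(x^2) + 1)           # norm including the one-hot block
  dots / outer(norms, norms)
}

num <- rbind(A = c(1, 1, 0, 0.5),
             B = c(1, 1, 0, 0.5),
             C = c(1, 1, 0, 1.2),
             D = c(1, 0, 0, 0.7))
city <- c("Dublin", "London", "Dublin", "New York")
sim <- cosineSimCat(num, city)
```

Here A and B share all numeric values but not the city, so their similarity drops below 1, while A and C gain from the shared "Dublin".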
I tried preprocessing to create binary flags for each category, so that the above example becomes something like:
A <- c(1,1,0,0.5,1,0,0)
B <- c(1,1,0,0.5,0,1,0)
C <- c(1,1,0,1.2,1,0,0)
D <- c(1,0,0,0.7,0,0,1)
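The preprocessing step itself can be done with base R's `model.matrix()`, a sketch assuming the city is stored as a factor. With `~ city - 1` it produces one indicator column per level (here Dublin, London, New York, in that order), matching the flags above. Because the dot product of two one-hot blocks is 1 exactly when the cities match, the unchanged `cosineSim` then already includes the 1/0 city term.

```r
# The question's cosineSim, unchanged.
cosineSim <- function(x){
  as.matrix(x %*% t(x) / (sqrt(rowSums(x^2) %*% t(rowSums(x^2)))))
}

num <- rbind(A = c(1, 1, 0, 0.5),
             B = c(1, 1, 0, 0.5),
             C = c(1, 1, 0, 1.2),
             D = c(1, 0, 0, 0.7))
city <- factor(c("Dublin", "London", "Dublin", "New York"))

onehot  <- model.matrix(~ city - 1)  # one 0/1 column per city level, no intercept
encoded <- cbind(num, onehot)        # rows A..D: 4 numeric + 3 indicator columns
sim <- cosineSim(encoded)
```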
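Ideally, though, the city would enter the similarity as a single term, [is_same_city] = 1 when the two cities match and 0 otherwise, without expanding it into flag columns first. Is there an idiomatic R way to do this?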