Categorical features in the distance matrix

I am calculating the cosine similarity between feature vectors and wonder whether anyone has a neat solution to the problem below, which concerns categorical features.

I currently have (example):

# define the similarity function
cosineSim <- function(x){
  as.matrix(x%*%t(x)/(sqrt(rowSums(x^2) %*% t(rowSums(x^2))))) 
}

# define some feature vectors
A <- c(1,1,0,0.5)
B <- c(1,1,0,0.5)
C <- c(1,1,0,1.2)
D <- c(1,0,0,0.7)

dataTest <- data.frame(A,B,C,D)
dataTest <- data.frame(t(dataTest))
dataMatrix <- as.matrix(dataTest)

# get similarity matrix
cosineSim(dataMatrix)

which works great.

But say I want to add a categorical variable, such as city, to create a feature that is 1 when two cities are equal and 0 otherwise.

In that case, example feature vectors would be:

A <- c(1,1,0,0.5,"Dublin")
B <- c(1,1,0,0.5,"London")
C <- c(1,1,0,1.2,"Dublin")
D <- c(1,0,0,0.7,"New York")

Is there a neat way to generate the pairwise equality for the last feature on the fly inside the function, so that the implementation stays vectorized?
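One possible sketch, assuming the city labels are kept in a separate character vector rather than mixed into the numeric vectors: `outer(city, city, "==")` produces the full pairwise equality matrix in one vectorized call, and that matrix can be folded into the cosine computation as one extra dimension. The names `numFeatures`, `city` and `cosineSimWithCity`, and the choice to add 1 to each squared norm, are assumptions for illustration and not part of the original code.

```r
# Numeric features, one row per item (same values as the example above)
numFeatures <- rbind(A = c(1, 1, 0, 0.5),
                     B = c(1, 1, 0, 0.5),
                     C = c(1, 1, 0, 1.2),
                     D = c(1, 0, 0, 0.7))

# Hypothetical: cities held separately so the numeric matrix stays numeric
city <- c(A = "Dublin", B = "London", C = "Dublin", D = "New York")

cosineSimWithCity <- function(x, category) {
  # 1 where two rows share a category, 0 otherwise (vectorized, no loops)
  same <- outer(category, category, "==") * 1
  # Assumption: the category counts as one extra unit-length dimension per row
  sqNorm <- rowSums(x^2) + 1
  (x %*% t(x) + same) / sqrt(sqNorm %*% t(sqNorm))
}

simCity <- cosineSimWithCity(numFeatures, city)
```

With this weighting, A and C (both Dublin) come out more similar than A and B, even though A and B have identical numeric features. Whether the category should carry that much weight is a design choice; scaling `same` by a constant would adjust it.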

I tried preprocessing to create binary flags for each category, so that the above example becomes something like:

A <- c(1,1,0,0.5,1,0,0)
B <- c(1,1,0,0.5,0,1,0)
C <- c(1,1,0,1.2,1,0,0)
D <- c(1,0,0,0.7,0,0,1)
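A minimal sketch of that preprocessing step, assuming the cities are stored in a factor (the names `city`, `flags` and `sameCity` are illustrative, not from the original). `model.matrix` with the intercept removed builds one 0/1 column per level. The dot product of two such flag rows is 1 when the cities match and 0 otherwise, so `flags %*% t(flags)` is itself a pairwise equality matrix.

```r
# Numeric part of the feature vectors, one row per item
numFeatures <- rbind(c(1, 1, 0, 0.5),
                     c(1, 1, 0, 0.5),
                     c(1, 1, 0, 1.2),
                     c(1, 0, 0, 0.7))

# Hypothetical city labels for rows A, B, C, D
city <- factor(c("Dublin", "London", "Dublin", "New York"))

# One 0/1 column per city level; "- 1" drops the intercept so no level is omitted
flags <- model.matrix(~ city - 1)

# Numeric features followed by the binary flags (7 columns in total)
dataMatrix <- cbind(numFeatures, flags)

# Pairwise equality recovered from the flags: 1 if same city, 0 otherwise
sameCity <- flags %*% t(flags)
```

The resulting `dataMatrix` is fully numeric, so it can be passed straight to `cosineSim` as defined earlier.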


Ideally this would be a single pairwise feature, e.g. [is_same_city] = 1/0: 1 when the two cities match, 0 otherwise.
