I read about the Weighted Slope One algorithm (and more formally here (PDF)), which is supposed to take item ratings from different users and, given a user vector containing at least one rating and one missing value, predict the missing ratings.
I found a Python implementation of the algorithm, but I'm having a hard time porting it to R (which I'm more comfortable with). Below is my attempt. Any suggestions on how to make it work?
Thanks in advance, guys.
    # take a 'training' set, tr.set and a vector with some missing ratings, d
    pred=function(tr.set,d) {
      tr.set=rbind(tr.set,d)
      n.items=ncol(tr.set)
      # tally frequencies to use as weights
      freqs=sapply(1:n.items, function(i) {
        unlist(lapply(1:n.items, function(j) {
          sum(!(i==j)&!is.na(tr.set[,i])&!is.na(tr.set[,j]))
        }))
      })
      # estimate product-by-product mean differences in ratings
      diffs=array(NA, dim=c(n.items,n.items))
      diffs=sapply(1:n.items, function(i) {
        unlist(lapply(1:n.items, function(j) {
          diffs[j,i]=mean(tr.set[,i]-tr.set[,j],na.rm=T)
        }))
      })
      # create an output vector with NAs for all the items the user has already rated
      pred.out=as.numeric(is.na(d))
      pred.out[!is.na(d)]=NA
      a=which(!is.na(pred.out))
      b=which(is.na(pred.out))
      # calculated the weighted slope one estimate
      pred.out[a]=sapply(a, function(i) {
        sum(unlist(lapply(b,function (j) {
          sum((d[j]+diffs[j,i])*freqs[j,i])/rowSums(freqs)[i]
        })))
      })
      names(pred.out)=colnames(tr.set)
      return(pred.out)
    } # end function

    # test, using example from [3]
    alice=c(squid=1.0, octopus=0.2, cuttlefish=0.5, nautilus=NA)
    bob=c(squid=1.0, octopus=0.5, cuttlefish=NA, nautilus=0.2)
    carole=c(squid=0.2, octopus=1.0, cuttlefish=0.4, nautilus=0.4)
    dave=c(squid=NA, octopus=0.4, cuttlefish=0.9, nautilus=0.5)
    tr.set2=rbind(alice,bob,carole,dave)
    lucy2=c(squid=0.4, octopus=NA, cuttlefish=NA, nautilus=NA)
    pred(tr.set2,lucy2)
    # not correct
    # correct(?): {'nautilus': 0.10, 'octopus': 0.23, 'cuttlefish': 0.25}
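For comparison, here is a minimal sketch of how the weighted estimate might be computed so that it reproduces the expected values quoted above. It is not a definitive port of the Python code. The function name `slope.one` and the variables `rated` and `to.predict` are hypothetical, and the sketch assumes two things: the weights in the denominator should be summed only over the items the user has actually rated (the attempt above divides by `rowSums(freqs)[i]`, which also counts co-ratings with items the user has not rated), and the user's own vector is left out of the training matrix.

```r
# Hypothetical helper (not from the original post): weighted Slope One for one
# user vector `d`, given a ratings matrix `tr.set` (users in rows, items in
# columns, NA for a missing rating).
slope.one <- function(tr.set, d) {
  n.items <- ncol(tr.set)
  freqs <- matrix(0, n.items, n.items)         # freqs[j, i]: users who rated both j and i
  diffs <- matrix(NA_real_, n.items, n.items)  # diffs[j, i]: mean of (rating i - rating j)
  for (i in seq_len(n.items)) {
    for (j in seq_len(n.items)) {
      if (i == j) next
      both <- !is.na(tr.set[, i]) & !is.na(tr.set[, j])
      freqs[j, i] <- sum(both)
      if (any(both)) diffs[j, i] <- mean(tr.set[both, i] - tr.set[both, j])
    }
  }
  rated      <- which(!is.na(d))   # items the user has rated
  to.predict <- which(is.na(d))    # items to estimate
  pred.out <- rep(NA_real_, n.items)
  names(pred.out) <- colnames(tr.set)
  for (i in to.predict) {
    w <- freqs[rated, i]
    if (sum(w) > 0) {
      est <- d[rated] + diffs[rated, i]
      # Assumption: normalise by the weights of the rated items only.
      pred.out[i] <- sum(est * w, na.rm = TRUE) / sum(w)
    }
  }
  pred.out
}

# Same example data as in the question.
alice  <- c(squid = 1.0, octopus = 0.2, cuttlefish = 0.5, nautilus = NA)
bob    <- c(squid = 1.0, octopus = 0.5, cuttlefish = NA,  nautilus = 0.2)
carole <- c(squid = 0.2, octopus = 1.0, cuttlefish = 0.4, nautilus = 0.4)
dave   <- c(squid = NA,  octopus = 0.4, cuttlefish = 0.9, nautilus = 0.5)
tr.set2 <- rbind(alice, bob, carole, dave)
lucy2   <- c(squid = 0.4, octopus = NA, cuttlefish = NA, nautilus = NA)

round(slope.one(tr.set2, lucy2), 2)
# squid NA, octopus 0.23, cuttlefish 0.25, nautilus 0.10
```

With only squid rated, each estimate reduces to 0.4 plus the mean difference between that item and squid, e.g. 0.4 + mean(-0.8, -0.5, 0.8) ≈ 0.23 for octopus. Items that are already rated stay NA in the output, as in the attempt above.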
python r recommendation-engine prediction
Nj torrance