Interpolate Product Attributes

Question

Interpolate Product Attributes

I have a data set from a discrete choice task set that includes two alternatives with three attributes (brand, price, performance). From these data, I took 1,000 draws from the rear distribution, which I will then use to calculate utility and, ultimately, the share of preferences for each person and each draw.

Price and performance were tested at discrete levels (-.2, 0, .2) and (-.25, 0, .25), respectively. I need to be able to interpolate utility between tested attribute levels. Now suppose that linear interpolation is a reasonable statistical problem. In other words, what is the most efficient way to interpolate the price utility if I want to test a scenario with a price 10% lower? I could not come up with a smooth or efficient way to do interpolation. I applied the mapply () approach with the mdply function from plyr

Here are some data and my current approach:

library(plyr) #draws from posterior, 2 respondents, 2 draws each draw <- list(structure(c(-2.403, -2.295, 3.198, 1.378, 0.159, 1.531, 1.567, -1.716, -4.244, 0.819, -1.121, -0.622, 1.519, 1.731, -1.779, 2.84), .Dim = c(2L, 8L), .Dimnames = list(NULL, c("brand_1", "brand_2", "price_1", "price_2", "price_3", "perf_1", "perf_2", "perf_3"))), structure(c(-4.794, -2.147, -1.912, 0.241, 0.084, 0.31, 0.093, -0.249, 0.054, -0.042, 0.248, -0.737, -1.775, 1.803, 0.73, -0.505), .Dim = c(2L, 8L), .Dimnames = list(NULL, c("brand_1", "brand_2", "price_1", "price_2", "price_3", "perf_1", "perf_2", "perf_3")))) #define attributes for each brand: brand constant, price, performance b1 <- c(1, .15, .25) b2 <- c(2, .1, .2) #Create data.frame out of attribute lists. Wil use mdply to go through each interpolateCombos <- data.frame(xout = c(b1,b2), atts = rep(c("Brand", "Price", "Performance"), 2), i = rep(1:2, each = 3), stringsAsFactors = FALSE) #Find point along line. Tried approx(), but too slow findInt <- function(x1,x2,y1,y2,reqx) { range <- x2 - x1 diff <- reqx - x1 out <- y1 + ((y2 - y1)/range) * diff return(out) } calcInterpolate <- function(xout, atts, i){ if (atts == "Brand") { breaks <- 1:2 cols <- 1:2 } else if (atts == "Price"){ breaks <- c(-.2, 0, .2) cols <- 3:5 } else { breaks <- c(-.25, 0, .25) cols <- 6:8 } utils <- draw[[i]][, cols] if (atts == "Brand" | xout %in% breaks){ #Brand can't be interpolated or if level matches a break out <- data.frame(out = utils[, match(xout, breaks)]) } else{ #Must interpolate mi <- min(which(breaks <= xout)) ma <- max(which(breaks >= xout)) out <- data.frame(out = findInt(breaks[mi], breaks[ma], utils[, mi], utils[,ma], xout)) } out$draw <- 1:nrow(utils) return(out) } out <- mdply(interpolateCombos, calcInterpolate)

To provide the context of what I'm trying to do without interpolating attribute levels, here's how I do it. Note that brands are now defined in terms of column references. p1 and p2 refer to the product definition, u1 and u2 are utility, and s1, s2 are preferred shares for this draw.

Any push in the right direction would be appreciated. My real case has 10 products with 8 attributes each. In 10-kilo draws, my 8 GB ram crawls out, but I can’t get out of this rabbit hole that I dug myself.

 p1 <- c(1,2,1) p2 <- c(2,1,2) FUN <- function(x, p1, p2) { bases <- c(0,2,5) u1 <- rowSums(x[, bases + p1]) u2 <- rowSums(x[, bases + p2]) sumExp <- exp(u1) + exp(u2) s1 <- exp(u1) / sumExp s2 <- exp(u2) / sumExp return(cbind(s1,s2)) } lapply(draw, FUN, p1 = p1, p2 = p2) [[1]] s1 s2 [1,] 0.00107646039 0.9989235 [2,] 0.00009391749 0.9999061 [[2]] s1 s2 [1,] 0.299432858 0.7005671 [2,] 0.004123175 0.9958768

+58

r

Chase Oct 26 '11 at 4:50

source share

1 answer

Cris Stringfellow · Answer 1 · 2012-02-08 17:44

A somewhat unconventional way to get what you want is to build a global ranking of all your products using your 10k draws.

Use each draw as a source of binary competitions between 10 products and summarize the results of these competitions for all the draws.

This will give you the final leaderboard for your 10 products. From this, you have relative utility for all consumers, or you can assign an absolute value based on the number of winnings (and, optionally, the "strength" of the alternative in each competition) for each product.

If you want to test a new product with a different attribute profile, find its sparse (st) representation as a vector sum of (weighted) other product samples, and you can start the contest again with a probability of winning weighted by the contribution of the weight of the component attribute vectors.

The advantage of this is that the simulation of the competition is effective, and the global ranking, combined with the presentation of new products as sparse vector sums of existing data, allows you to think and interpret the results a lot, which is useful when you consider strategies that beat the attributes of the competition product.

To find the sparse (descriptive) representation of your new product (y), solve Ax = y, where A is your matrix of existing products (rows as their attribute vectors), and y is the contribution weight vector from your existing products. You want to minimize nonzero entries in y. Have a look at Donoho DL's article on the fast homotopy method (for example, power iteration) to quickly solve the minimization problem l0-l1 to find sparse representations.

If you have this (or the weighted average of sparse views), you can reasonably reason about the performance of your new product based on the model created by your existing preferences.

The advantage of sparseness as a representation is that it allows you to reason reasonably, plus the more functions or products you have, the better, because most likely the product should rarely be presented by them. This way you can scale to large matrices and get very useful results using a fast algorithm.

Interpolate Product Attributes

More articles: