Estimating a probability given other probabilities from a prior

I have a bunch of data coming in (calls to an automated call center) about whether or not a person buys a particular product: 1 for a purchase, 0 for no purchase.

I want to use this data to estimate the probability that a person will buy a given product, but the problem is that I may need to do it with relatively little historical data about how many people bought or didn't buy that product.

A friend recommended that with Bayesian probability you can “help” your probability estimate by coming up with a “prior probability distribution”, which is essentially information about what you expect to see before taking the actual evidence into account.

So, I would like to create a method that has something like this signature (Java):

double estimateProbability(double[] priorProbabilities, int buyCount, int noBuyCount);

priorProbabilities is an array of probabilities I have seen for previous products, which this method will use to construct the prior distribution for this estimate. buyCount and noBuyCount are the actual data specific to this product, from which I want to estimate the probability of a user buying, taking both the data and the prior into account. The method returns the estimate as a double.

I don't need a mathematically perfect solution, just something that does better than a uniform or flat prior (i.e., probability = buyCount / (buyCount + noBuyCount)). Since I'm far more familiar with source code than with mathematical notation, I'd appreciate it if answers used code in their explanations.
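
For concreteness, here is that flat baseline as a sketch in Python (to match the answers below); the flatEstimate name and the 0.5 fallback for zero observations are my own choices, not part of any answer:

def flatEstimate(buyCount, noBuyCount):
    # no data at all: fall back to 0.5 instead of dividing by zero
    if buyCount + noBuyCount == 0:
        return 0.5
    return float(buyCount) / (buyCount + noBuyCount)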

Here's how I would compute it, in Python:

def estimateProbability(priorProbs, buyCount, noBuyCount):
  # first, estimate the prob that the actual buy/nobuy counts would be observed
  # given each of the priors (times a constant that's the same in each case and
  # not worth the effort of computing;-)
  condProbs = [p**buyCount * (1.0-p)**noBuyCount for p in priorProbs]
  # the normalization factor for the above-mentioned neglected constant
  # can most easily be computed just once
  normalize = 1.0 / sum(condProbs)
  # so here's the (meta)probability of each prior, starting from a uniform
  # metaprior
  priorMeta = [normalize * cp for cp in condProbs]
  # so the result is the sum of prior probs weighted by their metaprobs
  return sum(pm * pp for pm, pp in zip(priorMeta, priorProbs))

def example(numProspects=4):
  # the a priori prob of buying is either 0.3 or 0.7; how does the estimate
  # change depending on how many of 4 prospects bought?
  for bought in range(0, numProspects+1):
    result = estimateProbability([0.3, 0.7], bought, numProspects-bought)
    print('b=%d, p=%.2f' % (bought, result))

example()

The output is:

b=0, p=0.31
b=1, p=0.36
b=2, p=0.50
b=3, p=0.64
b=4, p=0.69
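
To trace one row by hand, take b=4 (buyCount=4, noBuyCount=0):

condProbs = [0.3**4, 0.7**4]               = [0.0081, 0.2401]
priorMeta = [0.0081/0.2482, 0.2401/0.2482] = [0.033, 0.967]
estimate  = 0.033*0.3 + 0.967*0.7          ≈ 0.69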

That is, the estimate always stays strictly between the lowest and highest probabilities in the prior set; the observed data can never pull it all the way to an extreme. If you also want to allow for the possibility that nobody ever buys (p = 0.0) or that everybody always buys (p = 1.0), include those values among the priors, and extreme observations can then dominate the estimate. With the same four prospects, the output becomes:

b=0, p=0.06
b=1, p=0.36
b=2, p=0.50
b=3, p=0.64
b=4, p=0.94
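
For reference, one prior set that reproduces these exact numbers is [0.0, 0.3, 0.7, 1.0]; the specific extreme values are my reconstruction, but you can verify them with the function above:

for bought in range(5):
    print('b=%d, p=%.2f' % (bought,
        estimateProbability([0.0, 0.3, 0.7, 1.0], bought, 4 - bought)))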

Intermediate approaches between these two extremes are easy to imagine (give each prior its own weight; the weights must be positive numbers between 0.0 and 1.0 that sum to 1.0, e.g., an extra priorWeights argument to estimateProbability).
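
A minimal sketch of that weighted variant (the priorWeights name follows the suggestion above; everything else mirrors estimateProbability):

def estimateProbabilityWeighted(priorProbs, buyCount, noBuyCount, priorWeights):
    # same likelihoods as before, but each prior starts from its own weight
    # (positive, summing to 1.0) instead of a uniform metaprior
    condProbs = [w * p**buyCount * (1.0-p)**noBuyCount
                 for p, w in zip(priorProbs, priorWeights)]
    normalize = 1.0 / sum(condProbs)
    return sum(normalize * cp * pp for cp, pp in zip(condProbs, priorProbs))

With priorWeights = [0.5, 0.5] this reduces exactly to the uniform-metaprior version above.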

Incidentally, this kind of thing is a big part of what I do all day now that I work in Business Intelligence, and I just can't get enough of it...!-)

A very simple way to do this without any difficult math is to increase buyCount and noBuyCount artificially by adding virtual customers who either bought or didn't buy the product. You can tune how strongly you believe each particular prior probability in terms of how many virtual customers you think it is worth.

For example:

def estimateProbability(priorProbs, buyCount, noBuyCount, faithInPrior=None):
    # faithInPrior[i] is the number of virtual customers prior i is worth
    if faithInPrior is None: faithInPrior = [10 for x in buyCount]
    # each prior contributes p*f virtual buyers and (1-p)*f virtual non-buyers
    adjustedBuyCount = [b + p*f for b, p, f in
                                zip(buyCount, priorProbs, faithInPrior)]
    adjustedNoBuyCount = [n + (1-p)*f for n, p, f in
                                zip(noBuyCount, priorProbs, faithInPrior)]
    # then the plain flat estimate on the adjusted counts
    return [b/(b+n) for b, n in zip(adjustedBuyCount, adjustedNoBuyCount)]
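
For instance, one product with prior 0.5, two observed buys, no observed no-buys, and faith worth 10 virtual customers (a hypothetical call; note this version takes parallel lists, one entry per product, rather than the scalar counts in the question's signature):

print(estimateProbability([0.5], [2], [0], [10]))   # [0.5833...]: 7 buys / 12 total

This is a pseudo-count scheme: faith f with prior p acts like a Beta(p*f, (1-p)*f) prior, so f plays the role of the prior's equivalent sample size.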

As I see it, the best you can do is use a uniform distribution if you have no hints about the distribution. Or are you talking about creating a relationship between these products and products previously bought by the same person, in the Amazon fashion of “people who buy this product also buy…”?
