Effectively determine the likelihood of a user clicking a hyperlink

Question

Effectively determine the likelihood of a user clicking a hyperlink

So, I have a bunch of hyperlinks on a web page. From past observations, I know the probability that a user clicks on each of these hyperlinks. Therefore, I can calculate the mean and standard deviation of these probabilities.
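A minimal sketch of that first step, assuming the observed per-link click rates are available as a plain list. The function name `prior_from_rates` and the sample rates are hypothetical, for illustration only:

```python
import statistics


def prior_from_rates(click_rates):
    """Return (mean, standard deviation) of the observed per-link click rates."""
    mean = statistics.mean(click_rates)
    # pstdev treats the list as the whole population of existing links;
    # statistics.stdev would give the sample standard deviation instead.
    sd = statistics.pstdev(click_rates)
    return mean, sd


# Hypothetical click-through rates of the existing links (illustrative only).
rates = [0.10, 0.20, 0.30, 0.20]
prior_mean, prior_sd = prior_from_rates(rates)
print(prior_mean, prior_sd)
```

Whether to use the population or the sample standard deviation is a judgment call; with many links on the page the difference is small.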

Now I am adding a new hyperlink to this page. After a few tests, I find that of the 20 users who see this hyperlink, 5 click on it.

Taking into account the known mean and standard deviation of the click-through probabilities of the other hyperlinks (this forms a prior expectation), how can I best estimate the probability that a user clicks the new hyperlink?

A naive approach would be to ignore the other probabilities, in which case my estimate is simply 5/20, or 0.25. However, this throws away relevant information, namely our prior expectation of the click-through probability.

So I'm looking for a function that looks something like this:

double estimate(double priorMean, double priorStandardDeviation, int clicks, int views);

Since I am more familiar with code than with mathematical notation, I would ask that any answers use code or pseudo-code in preference to math.

0

math probability

sanity Jul 15 '09 at 18:44

source share

4 answers

P/N is actually correct from a frequentist perspective.

You can also use the Bayesian approach to incorporate prior knowledge, but since you don't seem to have that kind of knowledge, I think P / N is the way to go.

If you want, you can also use Laplace's rule of succession, which, iirc, comes down to a uniform prior. Just give each link on the page a starting count of 1 instead of 0. (So if you count the number of times each link was clicked, give each one a +1 bonus and reflect that in your N.)
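A minimal sketch of both estimates for a single link, assuming each view is a yes/no outcome (clicked or not). In that binary case the +1 bonus amounts to one pseudo-click plus one pseudo-view without a click, i.e. (clicks + 1) / (views + 2), which is the posterior mean under a uniform Beta(1, 1) prior. The function names are hypothetical:

```python
def frequentist_estimate(clicks, views):
    """Plain P/N estimate (maximum likelihood)."""
    return clicks / views


def laplace_estimate(clicks, views):
    """Laplace's rule of succession for a click / no-click outcome.

    Adds one pseudo-click and one pseudo-non-click, which equals the
    posterior mean under a uniform Beta(1, 1) prior.
    """
    return (clicks + 1) / (views + 2)


print(frequentist_estimate(5, 20))  # 0.25
print(laplace_estimate(5, 20))      # 6 / 22, about 0.273
```

Note that the Laplace estimate is still defined when a link has not been shown yet (it returns 0.5), whereas P/N divides by zero.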

[UPDATE] Here is a Bayesian approach:

Let p(W) be the probability that a person is in a particular group W. Let p(L) be the probability that a particular link L is clicked. Then the probability you are looking for is p(L | W). By Bayes' theorem, you can calculate it as

 p(L | W) = p(W | L) * p(L) / p(W)

You can estimate p(L) by the fraction of users who clicked L, p(W) by the size of that group relative to all users, and p(W | L) = p(W and L) / p(L) by the fraction of all users who are in group W and clicked L, divided by the probability that L is clicked.
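The recipe above can be sketched from raw user counts as follows. The function name, its parameters and the example counts are hypothetical; the sketch assumes every count in a denominator is positive:

```python
def p_click_given_group(n_users, n_group, n_clicked, n_group_clicked):
    """Estimate p(L | W) via Bayes' theorem from raw counts.

    n_users         -- total number of users observed (must be > 0)
    n_group         -- users in group W (must be > 0)
    n_clicked       -- users who clicked link L (must be > 0)
    n_group_clicked -- users in group W who clicked L
    """
    p_l = n_clicked / n_users              # p(L)
    p_w = n_group / n_users                # p(W)
    p_w_and_l = n_group_clicked / n_users  # p(W and L)
    p_w_given_l = p_w_and_l / p_l          # p(W | L) = p(W and L) / p(L)
    return p_w_given_l * p_l / p_w         # p(L | W) = p(W | L) * p(L) / p(W)


# Hypothetical counts: 1000 users, 200 in group W, 100 clicked L, 30 of those in W.
print(p_click_given_group(1000, 200, 100, 30))  # about 0.15
```

The counts cancel, so the result is just n_group_clicked / n_group (here 30 / 200), the fraction of group W that clicked L. That makes a handy sanity check on the estimates.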

+2

bayer Jul 15 '09 at 19:00

source share

Proof of Bayes' theorem:

 P(A,B) = P(A | B) * P(B)    (1)

Also,

 P(A,B) = P(B,A)    (2)

Applying (1) with A and B swapped gives P(B,A) = P(B | A) * P(A), so substituting (2) into (1):

 P(A | B) * P(B) = P(B | A) * P(A)

Hence (Bayes' theorem):

 P(A | B) = P(B | A) * P(A) / P(B)

where

 P(A)     -- prior/marginal probability of A; may or may not take B into account
 P(A | B) -- conditional/posterior probability of A, given B
 P(B | A) -- conditional probability of B, given A
 P(B)     -- prior/marginal probability of B

Consequences: if

 P(A | B) = P(A),

then A and B are independent, and then also

 P(B | A) = P(B),

which is the definition of independence:

 P(A,B) = P(A | B) * P(B) = P(A) * P(B)
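A small numerical check of the identities above, assuming a hypothetical joint distribution over two yes/no events A and B. Exact fractions are used so that Bayes' theorem can be compared with `==` rather than a floating-point tolerance:

```python
from fractions import Fraction as F

# Hypothetical joint distribution over two binary events A and B.
# Keys are (a, b) truth values; the probabilities sum to 1.
joint = {
    (True, True): F(1, 10),
    (True, False): F(3, 10),
    (False, True): F(2, 10),
    (False, False): F(4, 10),
}


def p_a(a):
    """Marginal P(A = a)."""
    return sum(p for (x, _), p in joint.items() if x == a)


def p_b(b):
    """Marginal P(B = b)."""
    return sum(p for (_, y), p in joint.items() if y == b)


def p_a_given_b(a, b):
    return joint[(a, b)] / p_b(b)  # from (1): P(A | B) = P(A,B) / P(B)


def p_b_given_a(b, a):
    return joint[(a, b)] / p_a(a)  # (1) with A and B swapped


# Bayes' theorem holds exactly for every cell of the table.
for a in (True, False):
    for b in (True, False):
        assert p_a_given_b(a, b) == p_b_given_a(b, a) * p_a(a) / p_b(b)

# A and B are independent iff P(A,B) = P(A) * P(B) for every cell.
independent = all(joint[(a, b)] == p_a(a) * p_b(b)
                  for a in (True, False) for b in (True, False))
print(independent)  # False
```

Here P(A) = 4/10 and P(B) = 3/10, but P(A,B) = 1/10 rather than 12/100, so this particular table is not independent.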

Note that it is easy to manipulate probabilities to suit your needs by changing the priors and how the problem is framed; take a look at this discussion of the Anthropic Principle and Bayes' Theorem.

0

nlucaroni Jul 15 '09 at 19:15

source share

You need to know how much X correlates with W.

You will most likely also want a more sophisticated mathematical model if you are building a large website. If you run a site such as Digg, you have a lot of prior knowledge that you should factor into your calculations. This leads to multivariate statistics.

0

Christian Jul 15 '09 at 20:08

source share

bayer · Accepted Answer · 2009-07-25T08:39:27+0000

I made this a new answer since it is fundamentally different.

This is based on Christopher Bishop, “Pattern Recognition and Machine Learning”, chapter 2, “Probability Distributions”, p. 71 ff., and http://en.wikipedia.org/wiki/Beta_distribution .

First, we fit a beta distribution to the given mean and variance to obtain a prior distribution over the click probability. Then we update it with the observed clicks and views and return the mode of the resulting posterior, which serves as the point estimate (the MAP estimate) of the Bernoulli parameter.

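    def estimate(prior_mean, prior_variance, clicks, views):
        # Fit a Beta(a, b) prior to the given mean and variance (method of moments).
        c = (prior_mean * (1 - prior_mean)) / prior_variance - 1
        a = prior_mean * c
        b = (1 - prior_mean) * c
        # Mode of the posterior Beta(a + clicks, b + views - clicks).
        return (a + clicks - 1) / (a + b + views - 2)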
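The question's signature takes a standard deviation rather than a variance, so a wrapper only needs to square it first. A minimal sketch, with the name `estimate_from_sd` and the two validity checks as my own additions (not part of the answer): a beta distribution with mean μ exists only if its variance is below μ(1 − μ), and the mode formula (α − 1) / (α + β − 2) is only valid when both posterior parameters exceed 1.

```python
def estimate_from_sd(prior_mean, prior_sd, clicks, views):
    """Hypothetical wrapper matching the question's signature.

    Fits Beta(a, b) to the prior by the method of moments, updates it with
    the observed clicks/views, and returns the posterior mode (MAP estimate).
    """
    prior_variance = prior_sd ** 2
    # A beta distribution with this mean exists only if var < mean * (1 - mean).
    if not 0 < prior_mean < 1 or not 0 < prior_variance < prior_mean * (1 - prior_mean):
        raise ValueError("prior mean/variance cannot be matched by a beta distribution")
    c = prior_mean * (1 - prior_mean) / prior_variance - 1
    a = prior_mean * c
    b = (1 - prior_mean) * c
    alpha = a + clicks           # posterior is Beta(alpha, beta)
    beta = b + (views - clicks)
    # The mode formula only holds when both posterior parameters exceed 1.
    if alpha <= 1 or beta <= 1:
        raise ValueError("posterior mode lies on the boundary; use the mean instead")
    return (alpha - 1) / (alpha + beta - 2)


# Hypothetical prior: links average 10% click-through with SD 0.05; new link got 5 of 20.
print(estimate_from_sd(0.10, 0.05, 5, 20))  # about 0.1415 (= 7.5 / 53)
```

The result lands between the prior mean (0.10) and the raw rate (0.25), pulled toward the prior because this prior is worth a + b = 35 pseudo-views against only 20 real ones.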

However, I am quite sure that a prior mean/variance will not work well for you, since it throws away information about how many samples you have and how good your prior is.

Instead: given a set of (web page, link_clicked) pairs, you can count the number of pages on which a specific link was clicked. Call it m. Let l be the number of times that link was not clicked.

Now let a be the number of clicks on your new link, and b the number of times it was shown without being clicked (like l, a count of non-clicks). Then the probability for your new link is

    def estimate(m, l, a, b):
        return (m + a) / (m + l + a + b)

This looks pretty trivial, but it actually has a valid probabilistic foundation: m and l act as the parameters of a beta prior, and the formula is the mean of the resulting posterior. In terms of implementation, you can store m and l globally.
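A minimal sketch of the count-based estimate, assuming (as with l) that b counts views of the new link that did not lead to a click. The name `estimate_counts` and the history counts are hypothetical:

```python
def estimate_counts(m, l, a, b):
    """Posterior mean of the click probability under a Beta(m, l) prior.

    m, l -- clicks / non-clicks observed for the existing link(s)
    a, b -- clicks / non-clicks observed for the new link
    """
    return (m + a) / (m + l + a + b)


# Hypothetical history: 10 clicks and 90 non-clicks on existing links.
# New link: 5 clicks out of 20 views, i.e. 5 clicks and 15 non-clicks.
print(estimate_counts(10, 90, 5, 15))  # 0.125 (= 15 / 120)

# With m = l = 1 this reduces to Laplace's rule, (clicks + 1) / (views + 2).
print(estimate_counts(1, 1, 5, 15))    # 6 / 22, about 0.273
```

The larger m + l is, the more weight the history gets relative to the new link's own data; here the history counts as 100 views against 20. Scaling m and l down (keeping their ratio) expresses a weaker prior.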

Effectively determine the likelihood of a user clicking a hyperlink

More articles: