Python: calculating the probability that a point will fit a curve

I have a situation where I am given the total number of tickets available and cumulative ticket sales data, as follows:

Total Tickets Available: 300
Day 1: 15 tickets sold to date
Day 2: 20 tickets sold to date
Day 3: 25 tickets sold to date
Day 4: 30 tickets sold to date
Day 5: 46 tickets sold to date

Ticket sales are non-linear, and I am asked: if someone plans to buy a ticket on the 23rd day, what is the likelihood that they will get one?

I have been looking through the many libraries used for curve fitting, such as numpy, PyLab, and Sage, but I am a little overwhelmed because statistics is not my background. How can I easily calculate this probability from this data set? If it helps, I also have ticket sales data for other locations; the curves should be slightly different.

1 answer

A better answer to this question would require more information about the problem: are people more or less likely to buy a ticket as the date approaches (and by how much)? Are there promotional events that will temporarily affect sales? And so on.

We do not have access to this information, so let me assume, as a first approximation, that the rate of ticket sales is constant. Since sales are more or less random, they are best modeled as a Poisson process. Note that this does not account for the fact that many people will buy more than one ticket, but I do not think that will make much difference to the results; perhaps a real statistician could chime in here. Also: I am going to discuss only the constant-rate Poisson process, but since you mentioned that the rate is clearly not constant, you could look into variable-rate Poisson processes as a next step.

To model a Poisson process, all you need is the average rate of ticket sales. In your example data, sales per day are [15, 5, 5, 5, 16], so the average rate is 9.2 tickets per day. 46 tickets have already been sold, so 254 remain.
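As a minimal sketch of that arithmetic, the per-day sales, the average rate, and the remaining tickets can be derived from the cumulative figures in the question. The variable names here are only illustrative.

```python
# Cumulative tickets sold at the end of each day (from the question).
cumulative = [15, 20, 25, 30, 46]
total_available = 300

# Per-day sales: the first day's total, then day-to-day differences.
daily = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

# Average sales rate in tickets per day, assuming a constant rate.
rate = sum(daily) / len(daily)

# Tickets still unsold after the last observed day.
remaining = total_available - cumulative[-1]

print(daily, rate, remaining)
```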

From here it is easy to ask: "Given a rate of 9.2 tickets per day, what is the probability of selling fewer than 254 tickets in 23 days?" (ignoring the fact that you cannot sell more than 300 tickets). One way to compute this is with the cumulative distribution function (CDF) of the Poisson distribution.

On average, we expect to sell 23 * 9.2 = 211.6 tickets in 23 days, so in the language of probability distributions, the expected value is 211.6. The CDF tells us: "given the expected value λ, what is the probability of seeing a value <= x?" You can do the math yourself or ask scipy to do it for you:

    >>> import scipy.stats
    >>> scipy.stats.poisson(9.2 * 23).cdf(254-1)
    0.99747286634158705
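If you would rather do the math yourself, the Poisson CDF is just the sum of the probability mass function from 0 up to x. The sketch below uses only the standard library; the function name `poisson_cdf` is my own, and each term is computed in log space so that large factorials do not overflow. Under those assumptions it should agree with the scipy result to several decimal places.

```python
import math

def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Poisson(lam), summed term by term."""
    # Each term is exp(-lam) * lam**i / i!, evaluated via logarithms.
    terms = (
        math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
        for i in range(x + 1)
    )
    return math.fsum(terms)

# Probability of selling fewer than 254 tickets in 23 days at 9.2 per day.
print(poisson_cdf(254 - 1, 9.2 * 23))
```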

So this tells us: IF ticket sales can be accurately modeled as a Poisson process, and IF the average rate of ticket sales really is 9.2 tickets per day, then the probability that at least one ticket is still available after 23 days is 99.7%.

Now suppose someone wants to bring a group of 50 friends and wants to know the probability of getting all 50 tickets if they buy them in 25 days. Rephrase the question as: "If we expect to sell 9.2 * 25 tickets on average, what is the probability of selling <= (254-50) tickets?"

    >>> scipy.stats.poisson(9.2 * 25).cdf(254-50)
    0.044301801145630537

So the probability that 50 tickets are still available after 25 days is about 4%.
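As a rough cross-check, the same constant-rate Poisson process can be simulated directly instead of using the CDF. This is only a sketch under the same assumptions as above: the helper names are hypothetical, sales are generated as events with exponentially distributed gaps, and the estimate is noisy, so expect it to land near the CDF figure rather than match it exactly.

```python
import random

def simulate_sales(rate, days, rng):
    """Count sales in `days` days for a Poisson process with the given rate."""
    elapsed, sold = 0.0, 0
    while True:
        # Gaps between consecutive sales are exponential with mean 1/rate.
        elapsed += rng.expovariate(rate)
        if elapsed > days:
            return sold
        sold += 1

def estimate_availability(rate, days, remaining, wanted, trials=5000, seed=0):
    """Estimate the chance that at least `wanted` tickets are still unsold."""
    rng = random.Random(seed)
    hits = sum(
        simulate_sales(rate, days, rng) <= remaining - wanted
        for _ in range(trials)
    )
    return hits / trials

# The group of 50 buying after 25 days, at 9.2 tickets per day.
print(estimate_availability(9.2, 25, 254, 50))
```

Raising `trials` tightens the estimate at the cost of run time.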
