Accurate real-time update to predict distribution based on current statistics

As part of my current project, I need to find a beautiful Excel solution for the following: there is an online game in which people gain experience (similar to SO rep) and go to the next level as soon as enough experience is achieved (again, this is very similar to SO privileges ) Of course, this takes a different time for everyone: some get enough experience during the 1st week, others need 5 weeks to complete the level. The statistics are as follows:

Week from join: Week 1 Week 2 Week 3 Week 4 Week 5 Level up: 5% 15% 40% 30% 10% 

That is, if this week we joined the game, after 5 weeks they will all go to the next level (some will refuse and leave, but for simplicity I do not include this factor in the list of conditions).

As for every game, the number of current players matters. For a game to be popular and exciting, there must be a certain minimum number of players at each level. Obviously, this value is constantly changing: new people appear, while others move to the next level. However - and where the action begins! - this is the minimum value of current players is NOT constant and can be afraid depending on many factors. For simplicity, this means that for the next 10-15 weeks there is a set of predicted minimum amounts of players for each week that must be achieved (this is based on collected statistics, but for this task it is a known set).

So far so good, the picture looks like this:

Sample 1

In the far left column, the number of people joined the game every week , the row of bold numbers in each week is the number of people who have passed during this week to the next level .

Having the number of newbies, calculating the final totals for each week is trivial. The opposite is not so easy (well, at least for me) - and this is the question: how to predict the required number of newcomers for each week (leftmost column) - to get the desired number of people who will move to the next level in the next weeks ( bold row below)?

I still can’t find a suitable and accurate solution within 2 weeks. I tried all possible options - from complex and massive formulas to Solver addin, but with no luck: I always have more unknowns than equations. Nevertheless, I am sure that this is not an easy task: for some solutions, Solver can provide numbers that are close to the desired ones, but not for each set of desired values.

Some assumptions and (possibly) important conditions:

  • Although this should not be a problem, the actual values ​​of the weekly distribution %%% are updated upon request by the database request and further processing by Excel.
  • Actual values ​​of beginners, still available from the database, of course, are considered "strong" known values ​​for the equations. Obviously, they cannot be changed, and, for example, if in a week less than desired number of people joined the game, this should affect the estimated forecasts for the coming weeks to achieve a common goal.
  • The total amount of players actually moving to the next level should remain unchanged for a certain period of time: for example. if we want at least 100 people to join the game during each winter week, starting on January 1, which gives us 8 full weeks and, consequently, 800 gamers. However, if in January we actually got 80 new players every week - this is hard work in February). To be sure, the “milestone” subtotals are calculated every 10 weeks . In any case, any low values ​​should be compensated for in the coming weeks.
  • And vice versa: if we have more newcomers than expected for this week, we can attract fewer players over the next week to meet the next interim goal.
  • The flow of players is endless - the sample image is provided only for the purpose of clarity and better understanding. Desired values ​​can be calculated as far as possible in the future as necessary - the forecasting pattern is repeated. If it matters to the decision - let it be within 10 weeks from the current one .
  • A pure solution to the formula is preferable - even very difficult. However, VBA is still an acceptable option. In any case, an accurate update of the calculated values ​​in real time / on demand is absolutely better than a simple, but semi-manual or approximate solution.

As for my question here - I am not asking for a ready to play solution, but the right guide or valuable advice would be greatly appreciated. I intentionally do not post this on https://stats.stackexchange.com/ - this is more like a programming question, since the theory is obviously simple ... or maybe too simple, I see a very obvious solution.

Please feel free to request any further details or explanations with the help of comments: although I tried to be as clear and simple as possible, some aspects that are not important to me can really be important for an accurate solution.

Example file (with the above example): https://www.dropbox.com/s/q3bnagyet86i34r/StatPrognosis01.xlsx

+4
source share
1 answer

There is no need for VBA formulas or complex arrays - you will need simple simple math converted to formulas:

The basic idea is that you can determine the number for each week by “reversing” the calculation. The logic is as follows: (Note: cell links are based on a screenshot, i.e. yellow 5% will be B2):

  • If 5% of the weekly "joiners" go to the next level, and for 1 week (in total) there must be 10 people to do this, 10/5% = 200 people should join the first week.
    As a formula, this means: B5: =B12 , A5: =B5/B2
  • From there you can calculate how the joiners from the first week are distributed as you already do (i.e. no changes in the rest of line 5)
  • In the second week, you want 35 people to become the next level. From the first step, you know that from the 1st century you have 30 people who have reached this level. Thus, 5 people should come from 2 week joiners. Since they again make up 5% of this wave, the total wave should be 5/5% = 100
    In the formulas: C6: =C12-C5 , A6: =C6/B2
  • ...

Now, instead of creating an individual formula for each week, you can generalize it using some INDEX functions. To do this, I expanded your example with the week numbers of both the header and row columns:

enter image description here

To calculate the amount per "carpentry wave per week", that is, the range C5: O17, use the following formula:

  = IF ($ A5> C $ 4 + N ("Nobody from this wave joined yet:"), 0,
     IF (C $ 4 = $ A5 + N ("The first week for the new wave:"),
         IF ($ A5 = 1 + N ("Very first wave must be expected number in this week:"), C18,
         N ("All other weeks, this must be expected number minus 
           'levelups' from former weeks: ") + (C $ 18-SUM (C4: C $ 5))
     ),
     N ("For any later week after the joining, apply weekly uplevel ratio
        to total number of joiners: ") + INDEX ($ C $ 2: $ O $ 2, C $ 4- $ A5 + 1) * $ B5)
 )

and for the desired number of joiners per week, that is, the range B5: B17:

  = INDEX ($ C5: $ O5, $ A5) / $ C $ 2

File updated here

+1
source

All Articles