There is a derivation here:
http://www.deepfriedbrainproject.com/2010/07/magical-formula-of-pert.html
In case the link goes dead, I'll provide a brief summary here.
So, taking a step back from the question for a moment, the goal here is to come up with a single mean (average) figure that we can call the expected figure for any given 3-point estimate. That is, if I attempted the project X times and added up the costs of all the attempts for a total of $Y, then I would expect the cost of one attempt to be $Y / X. Note that this number may or may not be the same as the mode (the most likely result), depending on the probability distribution.
The expected result is useful because we can do things like add up a whole list of expected results to get the expected result for the project, even if we calculated each individual expected result differently.
The mode, on the other hand, is not even necessarily unique for a given estimate, which is one reason it may be less useful than the expected result. For example, every number from 1 to 6 is the "most likely" result of a die roll, but 3.5 is the (only) expected average result.
The rationale / research behind the 3-point estimate is that in many (most?) real-world scenarios, people can estimate these three numbers more accurately / intuitively than a single expected value:
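The die example can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are made up for the example.

```python
from fractions import Fraction

# One fair six-sided die: every face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
probabilities = {face: Fraction(1, 6) for face in faces}

# Expected value: the probability-weighted sum of the results.
expected = sum(face * p for face, p in probabilities.items())

# Mode(s): every result that attains the highest probability.
highest = max(probabilities.values())
modes = [face for face, p in probabilities.items() if p == highest]

print(float(expected))  # 3.5
print(modes)            # [1, 2, 3, 4, 5, 6]
```

There is exactly one expected value, but six equally valid modes.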
- Pessimistic result (P)
- Optimistic result (O)
- Most likely result (M)
However, in order to convert these three numbers into an expected value, we need a probability distribution that interpolates all the other (potentially infinite) possible results besides the three we came up with.
The fact that we are even doing a 3-point estimate suggests that we do not have enough historical data to simply look up / calculate the expected value for what we are about to do, so we probably don't know what the actual probability distribution is for what we are estimating.
The idea behind the PERT estimate is that if we don't know the actual curve, we can plug some sane default values into a beta distribution (which is basically just a curve that we can configure into many different shapes) and use those defaults for every problem we may run into. Of course, if we know the real distribution, or have reason to believe that the default beta distribution prescribed by PERT is wrong for the problem at hand, we should NOT use the PERT equations for our project.
The beta distribution has two parameters, A and B, which set the shape of the left and right sides of the curve, respectively. Conveniently, we can calculate the mode, mean and standard deviation of a beta distribution simply by knowing the minimum / maximum values of the curve, as well as A and B.
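As a rough sketch of that calculation, the standard closed-form expressions for a beta distribution rescaled to an interval can be written as below. The function name `beta_stats` and its argument names are invented for this example, and the mode formula assumes A > 1 and B > 1 (a single peak inside the interval).

```python
import math

def beta_stats(a, b, lo, hi):
    """Mode, mean and standard deviation of a beta distribution with
    shape parameters a and b, rescaled to the interval [lo, hi].

    Assumes a > 1 and b > 1, so the curve has one interior peak.
    """
    width = hi - lo
    mode = lo + width * (a - 1) / (a + b - 2)
    mean = lo + width * a / (a + b)
    std = width * math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mode, mean, std

mode, mean, std = beta_stats(2, 3, 0.0, 10.0)
print(mode, mean, std)
```

For A = 2, B = 3 on the interval 0 to 10, this gives a mode of about 3.33, a mean of 4 and a standard deviation of 2.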
PERT sets A and B as follows for every project / estimate:
If M > (O + P) / 2 then A = 3 + √2 and B = 3 - √2, otherwise the values of A and B are swapped.
Now, it just so happens that if you make this specific assumption about the shape of your beta distribution, then the following formulas hold exactly:
Average (expected value) = (O + 4M + P) / 6
Standard Deviation = (P - O) / 6
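A minimal sketch of the two formulas, together with a numerical check against the beta curve with A = 3 + √2 and B = 3 - √2, might look like this. The function name `pert_estimate` and the sample figures (O = 10, P = 40) are assumptions made up for illustration. The spread is taken as pessimistic minus optimistic, assuming P > O, so that the standard deviation comes out positive.

```python
import math

def pert_estimate(o, m, p):
    """PERT mean and standard deviation from a 3-point estimate
    (o = optimistic, m = most likely, p = pessimistic, with o < p)."""
    mean = (o + 4 * m + p) / 6
    std = (p - o) / 6
    return mean, std

# Hypothetical range: optimistic 10, pessimistic 40.
o, p = 10.0, 40.0
a, b = 3 + math.sqrt(2), 3 - math.sqrt(2)
width = p - o

# Mode, mean and standard deviation of the beta curve on [o, p].
m = o + width * (a - 1) / (a + b - 2)
beta_mean = o + width * a / (a + b)
beta_std = width * math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Feed the curve's own mode into the PERT formulas and compare.
mean, std = pert_estimate(o, m, p)
print(math.isclose(mean, beta_mean), math.isclose(std, beta_std))  # True True
```

The check only passes because M is set to the mode of that particular curve. For any other M, the formulas give an approximation rather than the exact moments of this beta distribution.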
So, in summary:
- The PERT formulas are not based on a normal distribution; they are based on a beta distribution with a very specific shape
- If your project's probability distribution matches the PERT beta distribution, then the PERT formulas are exactly correct; they are not approximations
- It is highly unlikely that the particular curve chosen for PERT matches any given arbitrary project, so in practice the PERT formulas will be approximations
- If you don't know anything about the probability distribution of your estimate, you may as well use PERT, because it is documented, understood by many people, and relatively easy to use
- If you know something about the probability distribution of your estimate that suggests something about PERT is inappropriate (for example, the 4x weighting towards the mode), then don't use it; use whatever you think is appropriate instead
- The reason you multiply by 4 to get the Average (rather than 5, 6, 7, etc.) is that the number 4 is tied to the shape of the underlying probability curve
- Of course, PERT could have been based on a beta distribution that yields 5, 6, 7 or any other number when calculating the average, or even on a normal distribution, a uniform distribution, or pretty much any other probability curve, but I would suggest that the question of why they chose the curve they did is beyond the scope of this answer, and possibly quite open-ended / subjective anyway