Unfortunately, you cannot do machine learning without knowing some basic mathematics. It's like asking someone to help you with programming while refusing to learn what "variables", "subroutines" and all the rest are.
The best way to do this is full Bayesian integration, but there is a simpler approximation called maximum a posteriori (MAP). This is pretty much the usual reasoning, except that you get to bring in a prior distribution.
Strange words, you may say, but where did the formula h/(h+t) come from? Of course it looks obvious, but it turns out to be the answer you get when you have no prior at all. The method below is the next level of difficulty, where you add a prior. Going all the way to Bayesian integration would follow the same lines, but is more complex and possibly unnecessary.
As I understand it, the problem has two stages: first you draw a coin from a bag of coins. This coin has a bias, call it theta, such that it comes up heads a fraction theta of the flips. But theta for this coin comes from some parent distribution, which I will take to be Gaussian with mean P and standard deviation S.
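To make that setup concrete, here is a minimal sketch of the generative story in Python (the function and parameter names are mine; the clipping line is my own addition, since a Gaussian sample can fall outside [0, 1], a problem the last paragraph comes back to):

import random

def simulate_coin(P=0.5, S=0.1, n_flips=20):
    """Draw a coin's bias theta from Gaussian(P, S), then flip it n_flips times."""
    theta = random.gauss(P, S)
    theta = min(max(theta, 0.0), 1.0)  # clip: a Gaussian can stray outside [0, 1]
    flips = [random.random() < theta for _ in range(n_flips)]
    h = sum(flips)       # number of heads
    t = n_flips - h      # number of tails
    return theta, h, t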
Next, you write down the total unnormalized probability (the likelihood) of seeing the whole shebang, all the data (h heads, t tails):
L = theta^h * (1-theta)^t * Gaussian(theta; P, S)
Gaussian(theta; P, S) = exp(-(theta-P)^2 / (2*S^2)) / sqrt(2*pi*S^2)
This is the probability of first drawing one theta value from the Gaussian, and then drawing h heads and t tails from the coin using that theta.
The idea of MAP is: if you don't know theta, find the value of theta that maximizes L, given the data you do know. You do this with calculus. The trick that makes it easier is to take logarithms first. Define LL = log(L). Wherever L is at its maximum, LL will be too.
so LL = h*log(theta) + t*log(1-theta) - (theta-P)^2 / (2*S^2) - (1/2)*log(2*pi*S^2)
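In code (using the same names h, t, P, S as above), the objective to maximize looks roughly like this, with the constant last term already dropped:

import math

def LL(theta, h, t, P, S):
    """Unnormalized log-posterior: log-likelihood of h heads and t tails,
    plus the log of the Gaussian(P, S) prior on theta (constants dropped)."""
    return (h * math.log(theta)
            + t * math.log(1.0 - theta)
            - (theta - P) ** 2 / (2.0 * S ** 2))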
By the usual calculus for finding extrema, you look for the value of theta such that dLL/dtheta = 0. Since the last log term has no theta in it, you can ignore it.
dLL/dtheta = h/theta - t/(1-theta) + (P-theta)/S^2 = 0
If you solve this equation for theta, you get the answer: a MAP estimate of theta given the number of heads h and the number of tails t.
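Multiplying through by theta*(1-theta) turns this into a cubic in theta, so rather than grinding out the algebra you can just find the root numerically. A sketch using plain bisection (dLL/dtheta is strictly decreasing on (0, 1), so there is at most one crossing; the function name and tolerances are mine):

def map_theta(h, t, P, S, lo=1e-9, hi=1.0 - 1e-9, iters=100):
    """Bisection on dLL/dtheta = h/theta - t/(1-theta) + (P-theta)/S^2 = 0."""
    def dLL(theta):
        return h / theta - t / (1.0 - theta) + (P - theta) / S ** 2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dLL(mid) > 0:
            lo = mid   # LL still climbing: the maximum is to the right
        else:
            hi = mid   # LL already descending: the maximum is to the left
    return 0.5 * (lo + hi)

For example, map_theta(7, 3, 0.5, 0.1) comes out around 0.56, pulled from the raw 7/10 toward the prior mean 0.5.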
If you want a quick approximation, try one step of Newton's method, starting from the obvious (maximum-likelihood) guess theta = h/(h+t).
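That single Newton step, written out (same dLL as above plus its derivative; this assumes h and t are both nonzero so the starting guess lies strictly inside (0, 1)):

def map_theta_newton(h, t, P, S):
    """One Newton step on dLL/dtheta = 0 from the maximum-likelihood guess."""
    theta0 = h / (h + t)
    # at theta0 the likelihood part of dLL cancels, leaving only the prior pull
    dLL = h / theta0 - t / (1.0 - theta0) + (P - theta0) / S ** 2
    d2LL = -h / theta0 ** 2 - t / (1.0 - theta0) ** 2 - 1.0 / S ** 2
    return theta0 - dLL / d2LL

On the same 7 heads / 3 tails example this also gives roughly 0.56, already close to the exact root.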
And where does that "obvious" estimate come from? If you do the same as above but leave out the Gaussian prior, you get h/theta - t/(1-theta) = 0, and out comes theta = h/(h+t).
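Spelled out: multiply through by theta*(1-theta) to get h*(1-theta) - t*theta = 0, i.e. h = (h+t)*theta, hence theta = h/(h+t).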
If your prior probabilities are really small, as often happens, rather than near 0.5, then a Gaussian prior on theta probably doesn't fit, because it puts some weight on negative probabilities, which is clearly wrong. More appropriate is a Gaussian prior on log(theta) (a lognormal prior). Plug it in the same way and do the calculus.
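As a sketch of what that swap looks like (mu and sigma here are the mean and spread of log(theta), my notation, nothing from the original), only the prior term in LL changes; note the extra -log(theta), which comes from the lognormal density itself:

import math

def LL_lognormal(theta, h, t, mu, sigma):
    """Unnormalized log-posterior with a lognormal prior on theta,
    i.e. a Gaussian(mu, sigma) on log(theta); constants dropped."""
    return (h * math.log(theta)
            + t * math.log(1.0 - theta)
            - (math.log(theta) - mu) ** 2 / (2.0 * sigma ** 2)
            - math.log(theta))  # the 1/theta factor in the lognormal density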