Calculation of the generalized mean for extreme values of p

5 answers

I think the answer here is to use a recursive solution. In the same way that average(1, 2, 3, 4) = average(average(1, 2), average(3, 4)), you can apply the same recursion to generalized means. What you gain is that you never accumulate sums of very many large numbers at once, which reduces the likelihood of overflow. Another danger when working with floating-point numbers is adding numbers of very different magnitudes (or subtracting numbers of very similar magnitudes), so to reduce such rounding errors it can also help to sort your data before computing the generalized mean.
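A minimal Python sketch of this pairwise idea (the function name and the weighting of unequal halves are my own choices, not from the answer):

    import numpy as np

    def gen_mean_pairwise(x, p):
        """Generalized mean ((x_1**p + ... + x_n**p) / n) ** (1/p) by pairwise splitting."""
        x = np.sort(np.asarray(x, dtype=float))  # sorting groups values of similar magnitude

        def recurse(v):
            n = len(v)
            if n == 1:
                return v[0]
            half = n // 2
            left, right = recurse(v[:half]), recurse(v[half:])
            # weight each half by its size so unequal splits still give the exact result
            return ((half * left ** p + (n - half) * right ** p) / n) ** (1.0 / p)

        return recurse(x)

For example, gen_mean_pairwise([1, 2, 3, 4], 2) matches the direct value (30 / 4) ** 0.5 ≈ 2.739.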


As noted in your reference, the limit as p goes to 0 is the geometric mean, for which bounds exist.

The limit as p goes to infinity is the maximum.
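A short numpy sketch of these two limiting cases (the function name is mine):

    import numpy as np

    def gen_mean_limits(x):
        """Return the p -> 0 and p -> +inf limits of the generalized mean of x."""
        x = np.asarray(x, dtype=float)
        geometric = np.exp(np.mean(np.log(x)))  # p -> 0: geometric mean, computed via logs
        maximum = np.max(x)                     # p -> +inf: the largest element
        return geometric, maximum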


Here is a hunch:

First convert all your numbers to a representation in base p. Then raising to the power p or 1/p only involves shifting digits, so you can carry out those operations without losing accuracy.

Work out your mean in base p, and then convert the result back to the original base.


If that does not work, here is an even less practical hunch:

Try working out the discrete Fourier transform of the result and relating it to the discrete Fourier transform of the input vector.


I struggled with the same problem. Here is how I dealt with it. Let gmean_p(x1, ..., xn) be the generalized mean, where p is real but nonzero and x1, ..., xn are non-negative. For M > 0 we have gmean_p(x1, ..., xn) = M * gmean_p(x1/M, ..., xn/M), and the right-hand side can be used to reduce the computational error. For large p I use M = max(x1, ..., xn), and for p close to 0 I use M = mean(x1, ..., xn). If M = 0, just add a small positive constant to it. This works well for me.
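A small Python sketch of this rescaling trick (the cut-over point between the two choices of M is an arbitrary detail of mine):

    import numpy as np

    def gen_mean_scaled(x, p):
        """Generalized mean via the identity gmean_p(x) = M * gmean_p(x / M)."""
        x = np.asarray(x, dtype=float)
        # the answer uses M = max(x) for large p and M = mean(x) for p near 0;
        # the cut-over at p = 1 here is an arbitrary choice for this sketch
        M = np.max(x) if p > 1 else np.mean(x)
        if M == 0:
            M += 1e-12  # the answer's suggestion: add a small positive constant
        return M * np.mean((x / M) ** p) ** (1.0 / p)

With M = max(x), the scaled values lie in [0, 1], so (x / M) ** p cannot overflow for large positive p.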


I suspect that if you are interested in very large or very small values of p, it is best to do some algebraic manipulation of the generalized mean formula before plugging in numerical values.

For example, in the small-p limit it can be shown that the generalized mean tends to the nth root of the product x_1 * x_2 * ... * x_n. The higher-order terms in p involve sums and products of log(x_i), which should also be relatively stable to compute numerically. In fact, I believe the first-order expansion in p has a simple connection with the variance of log(x_i):

gmean_p(x_1, ..., x_n) ≈ exp( mean(log x_i) + (p/2) * var(log x_i) )
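A numpy sketch of this approximation, assuming the first-order expansion written above (the function name is mine):

    import numpy as np

    def gen_mean_small_p(x, p):
        """First-order-in-p approximation exp(mean(log x) + p/2 * var(log x))."""
        logs = np.log(np.asarray(x, dtype=float))
        # at p = 0 this is exactly the geometric mean; the variance term is the
        # leading correction, and only log(x_i) ever enters the computation
        return np.exp(np.mean(logs) + 0.5 * p * np.var(logs))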

If we apply this formula to a set of 100 random numbers drawn uniformly from the range [0.2, 2], we get the following trend:

(Figure: comparison of the simple formula with the asymptotic approximation)

The plot shows that the asymptotic formula becomes quite accurate for p below about 0.3, while the simple formula only fails once p drops below about 1e-10.

For large p, the x_i with the largest value dominates (call its index i_max). You can rearrange the generalized mean formula into the following form, which behaves much less pathologically for large p:

gmean_p(x_1, ..., x_n) = x_max * ( (1 + sum_{i != i_max} (x_i / x_max)^p) / n )^(1/p)
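A numpy sketch of this rearrangement, working in log space with numpy.log1p (the exact form above is my reconstruction of the answer's formula, and the function name is mine):

    import numpy as np

    def gen_mean_large_p(x, p):
        """Generalized mean rearranged around the largest element, stable for large p."""
        x = np.asarray(x, dtype=float)
        x_max = np.max(x)
        # ratios are <= 1, so ratios ** p underflows harmlessly instead of overflowing
        ratios = np.sort(x / x_max)[:-1]  # drop one copy of the maximum; it is the 1 inside log1p
        log_mean = np.log(x_max) + (np.log1p(np.sum(ratios ** p)) - np.log(len(x))) / p
        return np.exp(log_mean)

As p grows, the sum inside log1p goes to zero and the result approaches x_max, as it should.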

Applying this (with standard numpy routines, including numpy.log1p) to another 100 uniformly distributed samples over [0.2, 2.0] shows that the rearranged formula agrees exactly with the simple formula, but remains valid for much larger values of p, where the simple formula overflows when computing the powers of x_i.

(Figure: generalized mean for large p)

(Note that in the left-hand plot the blue curve for the simple formula is shifted by 0.1 so you can see where it terminates due to overflow; for p less than about 1000 the two curves would otherwise be indistinguishable.)

