Calculating the average of two values while minimizing floating-point error

I am doing floating-point calculations and the results are not as accurate as I would like them to be.

This is the algorithm:

 ...
 center = (max_x + min_x) / 2
 distance = old_x - center
 new_x = center + (distance * factor)

 return new_x

min_x, max_x and old_x are all floats. I believe the biggest error occurs when I take the average of max and min; that error is then multiplied by a factor (which can also be a float).

How can I minimize the error due to FP calculation so that new_x is as accurate as possible?
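
To make the problem concrete, here is a minimal reproduction of the effect (the values are invented, and numpy's float32 is used only to make the rounding easy to see; float64 suffers the same way at a smaller scale):

 import numpy as np

 # Invented sample values; float32 makes the cancellation visible.
 min_x, max_x = np.float32(1000.1), np.float32(1000.3)
 old_x, factor = np.float32(1000.2), np.float32(10.0)

 center = (max_x + min_x) / np.float32(2)
 distance = old_x - center            # nearly equal operands: few significant bits survive
 new_x = center + distance * factor   # the lost bits are then amplified by factor

 reference = 1000.2 + (1000.2 - (1000.1 + 1000.3) / 2) * 10.0  # same math in float64
 print(new_x, reference)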

+8
floating-point algorithm numerical-analysis
5 answers

If old_x and center are close, you lose accuracy.

This is called loss of significance (catastrophic cancellation).

You can rearrange the calculation so that the subtraction happens last:

 center = (max_x + min_x) / 2
 new_x = (center + (old_x * factor)) - (center * factor)
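
As a quick sanity check (my own sketch, not part of the answer's code; numpy's float32 is used so any difference between the two orderings is observable), you can evaluate both side by side:

 import numpy as np

 def original(min_x, max_x, old_x, factor):
     center = (max_x + min_x) / np.float32(2)
     return center + (old_x - center) * factor

 def rearranged(min_x, max_x, old_x, factor):
     # algebraically identical, but the subtraction now happens last
     center = (max_x + min_x) / np.float32(2)
     return (center + old_x * factor) - center * factor

 args = [np.float32(v) for v in (1000.1, 1000.3, 1000.2, 10.0)]
 print(original(*args), rearranged(*args))   # compare both against the exact 1000.2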
+4

Depending on your language, there is probably a fixed- or arbitrary-precision numeric type you can use, such as decimal in Python or BigDecimal in Java.
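
For instance, with Python's standard-library decimal module (sample values invented for illustration):

 from decimal import Decimal, getcontext

 getcontext().prec = 50                # 50 significant digits, far beyond float64

 min_x, max_x = Decimal("1000.1"), Decimal("1000.3")
 old_x, factor = Decimal("1000.2"), Decimal("10")

 center = (max_x + min_x) / 2
 new_x = center + (old_x - center) * factor
 print(new_x)                          # exactly equal to 1000.2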

+2

All of the previous implementations skip rounding and therefore carry a larger error. Here is how to do it in fixed-point math. I use X.1u precision (1 LSB is used for the fraction).

 //center = (max_x + min_x) / 2
 center = max_x + min_x                           // zero error here
 //distance = old_x - center
 distance = (old_x << 1) - center                 // zero error here
 //new_x = center + (distance * factor)
 new_x = (1 + center + (distance * factor)) >> 1  // the +1 rounds the final shift
 return new_x

If factor is also a fixed-point (integer) value, with N bits describing the fraction, then new_x can be calculated as:

 new_x = ( (1 << N) + (center << N) + (distance * factor) ) >> (N + 1) 
  • (center << N) has N + 1 fraction bits
  • distance * factor has N + 1 fraction bits
  • (1 << N) represents "half", since 1 << (N + 1) is "one" in the above fixed-point precision.

After understanding each part, the above line can be simplified to:

 new_x = ( ((1 + center) << N) + (distance * factor) ) >> (N + 1) 

The integer type used must be large enough, of course. If the valid range of the inputs is unknown, you should validate them at the entry to this function. In most cases this is not required.

This is as good as fixed-point math gets. It is how hardware circuits perform integer math operations.
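
A direct transcription into Python (my sketch of the scheme above; Python's integers are arbitrary precision, so the "large enough integer type" caveat disappears):

 def scaled_offset(min_x, max_x, old_x, factor, n):
     """min_x, max_x, old_x are plain integers; factor is an integer
     with n fraction bits, i.e. the real factor is factor / 2**n."""
     center = max_x + min_x            # X.1 fixed point: twice the real center, exact
     distance = (old_x << 1) - center  # also X.1, exact
     # X.(n+1) accumulator; (1 << n) adds one half so the final shift rounds
     return (((1 + center) << n) + distance * factor) >> (n + 1)

 # real inputs min=0, max=10, old=7, factor=0.5 (8/2**4) -> center=5, result 6
 print(scaled_offset(0, 10, 7, factor=8, n=4))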

+2

This fixes at least one source of the error from your original algorithm:

 # Adding min and max can produce a value of larger magnitude, losing some low-order bits
 center = min_x + (max_x - min_x) / 2
 distance = old_x - center
 new_x = center + (distance * factor)
 return new_x

If you have more knowledge of the relationship between old_x, min_x and max_x, you can probably do better.
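
One concrete effect of this form (my example; numpy's float32 is used because its limits are easy to reach) is that the intermediate sum can no longer overflow:

 import numpy as np

 lo, hi = np.float32(3.1e38), np.float32(3.2e38)   # near the float32 maximum (~3.4e38)

 center_sum = (hi + lo) / np.float32(2)            # the sum overflows to inf
 center_offset = lo + (hi - lo) / np.float32(2)    # stays in range
 print(center_sum, center_offset)                  # inf vs. ~3.15e38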

+1

As Yohai says, your problem is probably caused by the subtraction old_x - center. If old_x and center are close together, you lose accuracy.

A simple solution would be to use double instead of float, but I assume that is not possible here. In that case, you need to get rid of the subtraction. One possibility is:

 distance_max = max_x - center
 distance_min = min_x - center
 distance = (distance_max + distance_min) / 2
 new_x = center + factor * distance

This helps if max_x, min_x and center are far apart while the average of max_x and min_x is close to center. If this does not help, perhaps you can adapt the calculation of max_x so that you actually compute max_x - center, but that requires changes in the part you did not show us.

+1
