Calculating a weighted average with large numbers

I am trying to compute the weighted average of several numbers. Basically I have pairs like:

Price - 134.42 Quantity - 15236545 

There may be as few as one or two pairs, or as many as fifty or sixty pairs of prices and quantities. I need to find the weighted average price. In principle, the weighted average should give very little weight to pairs like

 Price - 100000000.00 Quantity - 3 

and much more weight to the pair above.

The formula I currently have is:

 ((price)(quantity) + (price)(quantity) + ...)/totalQuantity 

This is what I have so far:

  double optimalPrice = 0;
  int totalQuantity = 0;
  double rolling = 0;
  System.out.println(rolling);

  Iterator it = orders.entrySet().iterator();
  while (it.hasNext()) {
      System.out.println("inside");
      Map.Entry order = (Map.Entry) it.next();
      double price = (Double) order.getKey();
      int quantity = (Integer) order.getValue();
      System.out.println(price + " " + quantity);
      rolling += price * quantity;
      totalQuantity += quantity;
      System.out.println(rolling);
  }
  System.out.println(rolling);
  return rolling / totalQuantity;

The problem is that I very quickly max out the rolling variable.

How can I get the weighted average?

java average weighted-average
7 answers

One solution is to use java.math.BigInteger for both rolling and totalQuantity, and only divide them at the end. This has better numerical stability, because in the end you have only one floating-point division and everything else is integer operations.

BigInteger is basically unlimited, so you should not run into overflows.

EDIT: Sorry, only on re-reading did I notice that your price is a double. It might be worth working around this by multiplying it by 100 and converting to BigInteger - since your example has exactly 2 digits to the right of the decimal point - and then dividing by 100 at the end, although that is a bit of a hack.
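A minimal sketch of that approach (my own helper method, not from the answer), assuming prices always have at most two decimal places and the same Map<Double, Integer> of price to quantity as in the question:

 import java.math.BigInteger;
 import java.util.Map;

 static double weightedAverage(Map<Double, Integer> orders) {
     BigInteger rolling = BigInteger.ZERO;       // sum of (price in cents) * quantity
     BigInteger totalQuantity = BigInteger.ZERO; // sum of quantities

     for (Map.Entry<Double, Integer> order : orders.entrySet()) {
         // Scale the price to whole cents so everything stays in integer arithmetic.
         BigInteger priceCents = BigInteger.valueOf(Math.round(order.getKey() * 100));
         BigInteger quantity = BigInteger.valueOf(order.getValue());
         rolling = rolling.add(priceCents.multiply(quantity));
         totalQuantity = totalQuantity.add(quantity);
     }

     // A single floating-point division at the end, then undo the *100 scaling.
     return rolling.doubleValue() / totalQuantity.doubleValue() / 100.0;
 }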


A double can hold a fairly large number (up to about 1.7 x 10^308, according to the docs), but you probably shouldn't use it for values where exact accuracy is required (e.g. monetary values).

Take a look at the BigDecimal class. This SO question discusses it in more detail.


For maximum flexibility, use BigDecimal for rolling and BigInteger for totalQuantity. After dividing (note that you have it the other way around; it should be rolling / totalQuantity), you can either return the BigDecimal or use doubleValue() with some loss of precision.
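A minimal sketch of this combination (my own method name; it assumes the same Map<Double, Integer> of price to quantity as in the question):

 import java.math.BigDecimal;
 import java.math.BigInteger;
 import java.math.MathContext;
 import java.util.Map;

 static BigDecimal weightedAverage(Map<Double, Integer> orders) {
     BigDecimal rolling = BigDecimal.ZERO;       // sum of price * quantity
     BigInteger totalQuantity = BigInteger.ZERO; // sum of quantities

     for (Map.Entry<Double, Integer> order : orders.entrySet()) {
         BigDecimal price = BigDecimal.valueOf(order.getKey());
         BigDecimal quantity = BigDecimal.valueOf(order.getValue());
         rolling = rolling.add(price.multiply(quantity));
         totalQuantity = totalQuantity.add(BigInteger.valueOf(order.getValue()));
     }

     // DECIMAL64 avoids an ArithmeticException when the exact quotient does not
     // terminate; call doubleValue() on the result if a double is acceptable.
     return rolling.divide(new BigDecimal(totalQuantity), MathContext.DECIMAL64);
 }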


At any given point you are keeping track of both the total value ax + by + cz + ... = pq and the total weight a + b + c + ... = p. Knowing both gives you the average pq/p = q. The problem is that pq and p are large sums that overflow, even though you only want the moderately sized q.

The next step adds, say, weight r and value s. You want to find the new average (pq + rs) / (p + r) using only the value q, which could only work if p and pq somehow "cancelled" while sitting in the numerator and denominator of the same fraction. That is not possible, as I will show.

The amount that needs to be added in this iteration is naturally

 (pq + rs) / (p + r) - q 

which cannot be simplified to the point where pq and p disappear. You could also compute

 (pq + rs) / q(p + r) 

the factor by which you multiply q to get the next average; but again, pq and p remain. So there is no clever shortcut.

Others have mentioned arbitrary-precision arithmetic, and that is a good solution here. The size of p and pq grows linearly with the number of records, and the memory usage and speed of integer/floating-point operations grow logarithmically with the size of the values. So performance is O(log(n)), in contrast to the disaster that would occur if p were somehow a product of many numbers.


First, I don't see how you could "max out" the rolling variable. As @Ash points out, it can represent values up to about 1.7 x 10^308. The only possibility I can think of is that you have some bad values in your input. (Perhaps the real problem is that you are losing precision...)

Secondly, your use of a Map to represent the orders is strange and probably broken. The way you are using it, you cannot represent orders involving two or more line items with the same price.
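For illustration only (the Order class and names below are hypothetical, not from the question), one way to avoid that limitation is a list of (price, quantity) pairs instead of a Map keyed by price:

 import java.util.ArrayList;
 import java.util.List;

 // Hypothetical value class: a List<Order> keeps two line items with the
 // same price distinct, which a Map<Double, Integer> keyed by price cannot.
 class Order {
     final double price;
     final int quantity;

     Order(double price, int quantity) {
         this.price = price;
         this.quantity = quantity;
     }
 }

 class OrderListExample {
     public static void main(String[] args) {
         List<Order> orders = new ArrayList<>();
         orders.add(new Order(134.42, 15236545));
         orders.add(new Order(134.42, 1000)); // same price, still a separate order
         System.out.println(orders.size());  // prints 2
     }
 }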


Your final result is only a weighted average, and presumably does not need the exact-to-the-cent accuracy required when calculating account balances and the like. If I'm right about that, you don't need BigDecimal; double will be enough.

The overflow problem can be solved by keeping a "running average" and updating it with each new record. Namely, let

 a_n = (sum_{i=1}^n x_i * w_i) / (sum_{i=1}^n w_i)

for n = 1, ..., N. You start with a_1 = x_1 and then, at each step, add

 d_n := a_{n+1} - a_n

to it. The formula for d_n is

 d_n = w_{n+1} * (x_{n+1} - a_n) / W_{n+1}

where W_n := sum_{i=1}^n w_i. You do need to keep track of W_n, but that is not a problem if you store it as a double (this will be fine, since we are only interested in the average). You can also normalize the weights: if you know that all your weights are multiples of 1000, just divide them by 1000.
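A minimal sketch of this running update in Java (my own variable names; it assumes parallel arrays of prices and quantities):

 // The running sum never grows beyond the magnitude of the prices themselves.
 static double runningWeightedAverage(double[] prices, int[] quantities) {
     double average = 0.0;     // a_n
     double totalWeight = 0.0; // W_n

     for (int i = 0; i < prices.length; i++) {
         double w = quantities[i];
         totalWeight += w;                                    // W_{n+1}
         average += w * (prices[i] - average) / totalWeight;  // add d_n
     }
     return average;
 }

The first iteration sets average to prices[0], matching the a_1 = x_1 starting point above.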

To get extra accuracy, you can use compensated summation.

Pre-emptive note: yes, you can use floating-point arithmetic here. double has a relative precision of about 2E-16. The OP is averaging positive numbers, so there will be no cancellation error. What proponents of arbitrary-precision arithmetic won't tell you is that, setting aside the cases where it genuinely gives you a lot of extra precision over IEEE 754 floating-point arithmetic, it comes at a significant cost in memory and performance. Floating-point arithmetic was designed by very smart people (Professor Kahan, among others), and if there were a cheap way to gain more accuracy than floating point offers, they would have done it.

Disclaimer: if your weights are completely wild (one is 1, another is 10,000,000), then I'm not 100% sure you will get satisfactory accuracy, but you can check it against an example where you know what the answer should be.


Use two loops: compute totalQuantity in the first loop, then accumulate price * (quantity / totalQuantity) in the second loop, doing the division in floating point so each term stays small.
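A minimal sketch of the two-pass idea (my own method name; it assumes the same Map<Double, Integer> of price to quantity as in the question):

 import java.util.Map;

 static double weightedAverageTwoPass(Map<Double, Integer> orders) {
     long totalQuantity = 0;
     for (int quantity : orders.values()) {
         totalQuantity += quantity;
     }

     double average = 0.0;
     for (Map.Entry<Double, Integer> order : orders.entrySet()) {
         // Each term is scaled down before accumulating, so the running
         // sum stays near the magnitude of a single price.
         average += order.getKey() * (order.getValue() / (double) totalQuantity);
     }
     return average;
 }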

