Is it possible to force the exponent or mantissa of a float to match another float (Python)?

This is an interesting question that I tried to work out the other day. Is it possible to make the mantissa or exponent of one float the same as another float's in Python?

The question arises because I tried to rescale some data so that its min and max match those of another data set. However, my rescaled data was slightly off (at around the 6th decimal place), and that was enough to cause problems down the line.

To give an idea, I have f1 and f2 (type(f1) == type(f2) == numpy.ndarray). I want np.max(f1) == np.max(f2) and np.min(f1) == np.min(f2). To do this, I use:

    import numpy as np
    f2 = (f2-np.min(f2))/(np.max(f2)-np.min(f2))  # f2 is now between 0.0 and 1.0
    f2 = f2*(np.max(f1)-np.min(f1)) + np.min(f1)  # f2 is now between min(f1) and max(f1)

Result (as an example):

    np.max(f1)  # 5.0230593
    np.max(f2)  # 5.0230602, but I need 5.0230593

My initial thought was that forcing the mantissa or exponent would be the right solution. I could not find much on it, so I made a workaround for my needs:

    exp = 0
    mm = np.max(f1)
    # find where the decimal point is
    while int(10**exp*mm) == 0:
        exp += 1
    # add 4 digits of precision
    exp += 4
    scale = 10**exp
    f2 = np.round(f2*scale)/scale
    f1 = np.round(f1*scale)/scale

Now np.max(f2) == np.max(f1).

However, is there a better way? Did I do something wrong? Is it possible to force one float's value to match another's (via the exponent or by other means)?

EDIT: as suggested, I now use:

 scale = 10**(-np.floor(np.log10(np.max(f1))) + 4) 

While my solution above works (for my application), I am interested to know whether there is a way to make one float take on the same mantissa and/or exponent as another, so that the numbers become identical.

+8
python floating-point numpy floating-accuracy
4 answers

TL;DR

Use

 f2 = f2*np.max(f1)-np.min(f1)*(f2-1) # f2 is now between min(f1) and max(f1) 

and make sure you use double precision. Compare floating-point numbers by looking at absolute or relative differences, avoid rounding to adjust (or compare) floating-point numbers, and don't set the underlying components of floating-point numbers manually.

More details

This is not a very simple mistake to reproduce, as you have discovered. However, working with floating-point numbers is error prone. For example, adding 1 000 000 000 + 0.000 000 000 1 gives 1 000 000 000.000 000 000 1, but this is too many significant digits even for double precision (which supports about 15 significant digits), so the trailing decimal is discarded. Moreover, some "short" numbers cannot be represented exactly, as noted in @Kevin's answer. See, for example, here for more information. (Or search for something like "floating-point rounding errors".)
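As a quick illustration of that first point (not from the original answer, just ordinary double-precision behaviour):

    # The small term is far below the ~15-16 significant digits a double
    # can hold, so it is lost entirely when the two are added.
    print(1e9 + 1e-10)          # 1000000000.0
    print(1e9 + 1e-10 == 1e9)   # True: the 1e-10 contribution was discarded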

Here is an example that demonstrates the problem:

    import numpy as np
    np.set_printoptions(precision=16)
    dtype = np.float32

    f1 = np.linspace(-1000, 0.001, 3, dtype=dtype)
    f2 = np.linspace(0, 1, 3, dtype=dtype)
    f2 = (f2-np.min(f2))/(np.max(f2)-np.min(f2))  # f2 is now between 0.0 and 1.0
    f2 = f2*(np.max(f1)-np.min(f1)) + np.min(f1)  # f2 is now between min(f1) and max(f1)

    print(f1)
    print(f2)

Output

    [ -1.0000000000000000e+03  -4.9999951171875000e+02   1.0000000474974513e-03]
    [ -1.0000000000000000e+03  -4.9999951171875000e+02   9.7656250000000000e-04]

Following @Mark Dickinson's comment, I used 32-bit floating point here. This is consistent with the error you reported: a relative error of about 10^-7, around the 7th significant digit:

    In:  (5.0230602 - 5.0230593) / 5.0230593
    Out: 1.791736760621852e-07

Switching to dtype=np.float64 does better, but it is still not perfect. The program above then gives

    [ -1.0000000000000000e+03  -4.9999950000000001e+02   1.0000000000000000e-03]
    [ -1.0000000000000000e+03  -4.9999950000000001e+02   9.9999999997635314e-04]

This is not perfect, but usually close enough. When comparing floating-point numbers, you almost never want to use strict equality, because of the possibility of small errors as mentioned above. Instead, subtract one number from the other and check that the absolute difference is less than some tolerance, and/or look at the relative error. See, for example, numpy.isclose.
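For instance, a minimal sketch using the two values from the question (np.isclose is the standard NumPy helper for this; the tolerance shown is just an illustrative choice):

    import numpy as np

    a = 5.0230593
    b = 5.0230602

    print(a == b)                       # False: strict equality is too fragile
    print(np.isclose(a, b, rtol=1e-6))  # True: equal within a relative tolerance of 1e-6
    print(abs(a - b) / abs(a))          # ~1.8e-07, the relative error itself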

Returning to your problem, it does seem like it should be possible to do better. After all, f2 has a range from 0 to 1, so you should be able to replicate the maximum of f1 exactly. The problem occurs in the line

 f2 = f2*(np.max(f1)-np.min(f1)) + np.min(f1) # f2 is now between min(f1) and max(f1) 

because when an element of f2 is 1, you are doing much more than just multiplying 1 by the maximum of f1, which introduces floating-point arithmetic errors. Note that you can expand the brackets in f2*(np.max(f1)-np.min(f1)) to f2*np.max(f1) - f2*np.min(f1), and then combine - f2*np.min(f1) + np.min(f1) into -np.min(f1)*(f2-1), giving

 f2 = f2*np.max(f1)-np.min(f1)*(f2-1) # f2 is now between min(f1) and max(f1) 

So when an element of f2 is 1, we have 1*np.max(f1) - np.min(f1)*0. Conversely, when an element of f2 is 0, we have 0*np.max(f1) - np.min(f1)*(-1). The numbers 1 and 0 can be represented exactly, so there should be no errors.

The modified program displays

    [ -1.0000000000000000e+03  -4.9999950000000001e+02   1.0000000000000000e-03]
    [ -1.0000000000000000e+03  -4.9999950000000001e+02   1.0000000000000000e-03]

i.e. exactly as desired.

However, I would still strongly recommend using inexact floating-point comparisons (with tight tolerances if you need them) unless you have a good reason not to. There are all sorts of subtle errors that can occur in floating-point arithmetic, and the easiest way to avoid them is never to rely on exact comparisons.

An alternative approach, which may be preferable, would be to rescale both arrays to between 0 and 1. That may be the most suitable form to use within the program. (And both arrays could be multiplied by a scaling factor, such as the original range of f1, if necessary.)
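A minimal sketch of that alternative, with made-up arrays standing in for f1 and f2:

    import numpy as np

    def rescale01(x):
        # Map x linearly onto [0, 1]; the endpoints come out exact because
        # (min - min) is exactly 0 and (max - min)/(max - min) is exactly 1.
        return (x - np.min(x)) / (np.max(x) - np.min(x))

    f1 = np.array([2.5, 3.7, 5.0230593])
    f2 = np.array([0.1, 0.6, 0.9])

    f1s, f2s = rescale01(f1), rescale01(f2)
    print(np.min(f1s), np.max(f1s))  # 0.0 1.0
    print(np.min(f2s), np.max(f2s))  # 0.0 1.0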

Regarding using rounding to solve your problem: I would not recommend it. The problem with rounding, aside from unnecessarily reducing the accuracy of your data, is that numbers that are very close can round in different directions. For example:

    f1 = np.array([1.000049])
    f2 = np.array([1.000051])
    print(f1)
    print(f2)
    scale = 10**(-np.floor(np.log10(np.max(f1))) + 4)
    f2 = np.round(f2*scale)/scale
    f1 = np.round(f1*scale)/scale
    print(f1)
    print(f2)

Output

    [ 1.000049]
    [ 1.000051]
    [ 1.]
    [ 1.0001]

This relates to the fact that, although it is common to talk about numbers agreeing to so many significant figures, people do not actually compare them that way on a computer: you calculate the difference and then divide by the correct number (for the relative error).

Regarding mantissas and exponents, see math.frexp and math.ldexp, documented here. I would not recommend setting them yourself (consider two numbers that are very close but have different exponents, for example: do you really want to set the mantissa?). It is much better to just set the maximum of f2 explicitly to the maximum of f1 if you want the numbers to be exactly the same (and similarly for the minimum).
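For reference, a small sketch of what those two standard-library functions do (the value is just the example number from the question):

    import math

    x = 5.0230593
    m, e = math.frexp(x)           # x == m * 2**e, with 0.5 <= m < 1
    print(m, e)                    # roughly 0.6278824125 and 3
    print(math.ldexp(m, e) == x)   # True: ldexp reverses frexp exactly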

+2

It depends on what you mean by mantissa.

Internally, floats are stored using scientific notation in base 2. So if you mean the base-2 mantissa, it is actually very simple: just multiply or divide by powers of two (not powers of 10), and the mantissa stays the same (provided the exponent does not fall outside its range; if it does, you will get clamped to infinity or zero, or possibly enter denormal numbers, depending on the architectural details). The important thing to understand is that the decimal expansions will not match up when you rescale by powers of two: it is the binary expansion that is preserved by this method.
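A small sketch of that point, using math.frexp to expose the base-2 mantissa (the value 1.1 is just an arbitrary example):

    import math

    x = 1.1
    print(math.frexp(x))         # (0.55, 1): base-2 mantissa and exponent
    print(math.frexp(x * 4.0))   # (0.55, 3): multiplying by 2**2 only shifts the exponent
    print(math.frexp(x / 10.0))  # the mantissa is no longer 0.55: scaling by 10 changes the bits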

But if you mean the base-10 mantissa, no, this is not possible with floats, because the rescaled value cannot be represented exactly. For example, 1.1 cannot be represented exactly in base 2 (with a finite number of digits), in much the same way that 1/3 cannot be represented in base 10 (with a finite number of digits). So scaling 11 down by a factor of 10 cannot be done exactly:

    >>> print("%1.29f" % (11 * 0.1))
    1.10000000000000008881784197001

You can, however, do the latter with Decimals. Decimal numbers work in base 10 and behave as expected with respect to base-10 scaling. They also provide a fairly large set of specialized features for detecting and handling various kinds of precision loss. But Decimals don't benefit from NumPy's speed, so if you have a very large volume of data to work with, they may not be efficient enough for your use case. Since NumPy relies on hardware support for floating point, and most (all?) modern architectures provide no hardware support for base 10, this is not easily remedied.
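A brief sketch of the Decimal behaviour described above (the value 1.1 is just an illustrative choice):

    from decimal import Decimal

    a = Decimal('1.1')
    print(a / 10)        # 0.11: base-10 scaling is exact for Decimal
    print(a * 10)        # 11.0
    print(Decimal(1.1))  # 1.10000000000000008881784197001... (the float already carries base-2 error)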

+7

Try replacing the second line with

 f2 = f2*np.max(f1) + (1.0-f2)*np.min(f1) 

Explanation: there are two places where a difference can creep in:

Step 1) f2 = (f2-np.min(f2))/(np.max(f2)-np.min(f2))

When you check np.min(f2) and np.max(f2), do you get exactly 0 and 1, or something like 1.0000003?

Step 2) f2 = f2*(np.max(f1)-np.min(f1)) + np.min(f1)

An expression of the form (a-b)+b does not always produce exactly a, due to rounding error. The proposed expression is somewhat more stable.
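To make that concrete, here is a small, hypothetical sketch (the values 0.1 and 1.0 are arbitrary, not taken from the question):

    a, b = 0.1, 1.0

    print((a - b) + b)        # 0.09999999999999998 on typical IEEE-754 doubles, not exactly 0.1
    print((a - b) + b == a)   # False: the rounding in (a - b) is not undone by adding b back

    t = 1.0                   # plays the role of f2 at its maximum
    print(t*a + (1.0 - t)*b == a)   # True: the suggested form hits the endpoint exactly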

For a very detailed explanation, see What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg.

+3

Here is one approach using Decimal:

    from decimal import Decimal, ROUND_05UP
    num1 = Decimal('{:.5f}'.format(5.0230593))       ## Decimal('5.02306')
    num2 = Decimal('{}'.format(5.0230602))            ## Decimal('5.0230602')
    print(num2.quantize(num1, rounding=ROUND_05UP))  ## 5.02306

EDIT: I'm a little confused about why I'm getting so much negative feedback, so here is another solution that doesn't use Decimal:

    a = 5.0230593
    b = 5.0230602
    if abs(a - b) < 1e-6:
        b = a
-2
