Mysterious behavior of seq and the == operator: a precision problem?

I came across some strange (or just unexpected?) behavior of the seq function. When I create a simple sequence, some values cannot be matched correctly with the == operator. See this minimal example:

    my.seq <- seq(0, 0.4, len = 5)
    table(my.seq)           # ok! returns
    #   0 0.1 0.2 0.3 0.4
    #   1   1   1   1   1
    which(my.seq == 0.2)    # ok! returns 3
    which(my.seq == 0.3)    # !!! returns integer(0)

When creating my sequence manually, it seems to work:

    my.seq2 <- c(0.00, 0.10, 0.20, 0.30, 0.40)
    which(my.seq2 == 0.3)   # ok! returns 4

Do you have an explanation? I worked around the problem with which(round(my.seq, 2) == 0.3), but I would like to understand what is going on.

Thanks in advance for your comments.

3 answers

Computers cannot represent most floating-point numbers exactly. Spreadsheets, which are the main way most people deal with numbers on a computer, tend to hide this, and that leads to a lot of problems.

Never test floating-point values for exact equality. R has functions to handle this (e.g. all.equal), but I prefer the following.

Let's say you have an unknown floating point variable A, and you want to see if it is equal to 0.5.

 abs(A - 0.5) < tol 

Set the tolerance to however close to 0.5 the value needs to be. For example, tol <- 0.0001 may be good enough for you.
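A minimal sketch of the tolerance approach applied to the sequence from the question. The helper name near_equal and its default tol of 1e-4 are illustrative assumptions, not part of base R:

```r
# Hypothetical helper (not in base R): TRUE where a and b differ by less than tol
near_equal <- function(a, b, tol = 1e-4) {
  abs(a - b) < tol
}

my.seq <- seq(0, 0.4, len = 5)

which(my.seq == 0.3)            # exact comparison finds nothing: integer(0)
which(near_equal(my.seq, 0.3))  # tolerant comparison finds element 4
```

An absolute tolerance like this works when you know the scale of your values; for very large or very small numbers a relative tolerance (which is what all.equal uses by default) is more robust.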

If your values are really meant to be whole numbers, store them as integers. Or, if you know how many decimal places you want to check, round to that many places before comparing.
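Both alternatives, sketched on the sequence from the question: rounding before comparing (the same workaround the question uses), and keeping the values as whole numbers of tenths. The scale factor of 10 is an assumption based on the data having one decimal place:

```r
my.seq <- seq(0, 0.4, len = 5)

# Round to the number of decimal places you care about, then compare
which(round(my.seq, 2) == 0.3)

# Or store the values as integer counts of tenths, which compare exactly
tenths <- as.integer(round(my.seq * 10))
which(tenths == 3L)
```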


Computers have a hard time storing exact decimal values.

    > options(digits=22)
    > seq(0, .4, len = 5)
    [1] 0.0000000000000000000000 0.1000000000000000055511 0.2000000000000000111022
    [4] 0.3000000000000000444089 0.4000000000000000222045
    > .4
    [1] 0.4000000000000000222045
    > c(0, .1, .2, .3, .4)
    [1] 0.0000000000000000000000 0.1000000000000000055511 0.2000000000000000111022
    [4] 0.2999999999999999888978 0.4000000000000000222045

Since we are using a binary floating-point representation, we cannot represent the values of interest exactly. The value stored for .4 is slightly higher than 0.4, and the value seq() produces for .3 is slightly higher than the one you get by typing .3 directly, so the two are not equal. I'm sure someone else will provide a better explanation, but hopefully this sheds light on the problem.
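A quick way to see this without changing options(digits): print the stored values with more digits using sprintf(), and compare the fourth element of the sequence against the literal directly. The same effect shows up with plain arithmetic, since 0.1 * 3 is not the same double as 0.3:

```r
my.seq <- seq(0, 0.4, len = 5)

# Show 20 decimal places of what is actually stored
sprintf("%.20f", my.seq[4])  # the fourth element computed by seq()
sprintf("%.20f", 0.3)        # the literal 0.3

my.seq[4] == 0.3  # FALSE
my.seq[4] > 0.3   # TRUE: the computed value is slightly larger
0.1 * 3 == 0.3    # FALSE: the same effect with plain arithmetic
```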


This is R FAQ 7.31 ("Why doesn't R think these numbers are equal?"), which also links to a longer discussion of the problem in general.
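Base R's all.equal(), mentioned in the first answer, compares numbers with a default tolerance of about 1.5e-8 (the square root of .Machine$double.eps). Because it returns either TRUE or a character string describing the difference, it should be wrapped in isTRUE() before being used as a condition. A sketch applying it element by element to the sequence from the question:

```r
my.seq <- seq(0, 0.4, len = 5)

# all.equal() returns TRUE or a character description of the difference,
# so isTRUE() turns each result into a plain TRUE/FALSE
matches <- vapply(my.seq, function(x) isTRUE(all.equal(x, 0.3)), logical(1))

which(matches)
```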
