In general, there are two main reasons why you get this error message:
1. The data frame contains a character column instead of a factor. Just convert the character column to a factor.
2. The data contains bad values (NA, NaN, or Inf); fitting a random forest to it raises the same error. Note that head() does not reveal the bad values in the example, because they sit at the end of the column. For example:
x = rep(x = sample(c(0, 1)), times = 24)
# 48 values to match x: 44 finite ones followed by four Inf
y = c(sample.int(n = 50, size = 44), rep(Inf, 4))
df = data.frame(col1 = x, col2 = y)
head(df)
  col1 col2
1    1   26
2    0   33
3    1   23
4    0   21
5    1   45
6    0   27
Now applying randomForest to df will result in the same error:
library(randomForest)
model = randomForest(data = df, col2 ~ col1, ntree = 10)
Error in randomForest.default(m, y, ...) : NA/NaN/Inf in foreign function call (arg 2)
Solution: let's identify the bad values in df. As stated above, the is.finite() function checks, element by element, whether the input vector holds proper finite values. For example:
is.finite(c(5, 6, 1000000, NaN, Inf))
[1] TRUE TRUE TRUE FALSE FALSE
Now let's identify the columns containing bad values in our data frame and count them.
sum(!is.finite(as.vector(df[, names(df) %in% c("col2")])))
[1] 4
sum(!is.finite(as.vector(df[, names(df) %in% c("col1")])))
[1] 0
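With more than a couple of columns, checking them one by one gets tedious. Here is a minimal sketch that counts the bad values in every numeric column at once. The data frame is a deterministic stand-in for the one above (same shape, four Inf values in col2), and the names numeric_cols and bad_counts are made up for illustration:

```r
# Stand-in for df: 48 rows, four Inf values at the end of col2 (assumed layout)
df <- data.frame(
  col1 = rep(c(1, 0), times = 24),
  col2 = c(1:44, rep(Inf, 4))
)

# is.finite() is only meaningful for numeric columns, so select those first
numeric_cols <- sapply(df, is.numeric)

# Number of NA / NaN / Inf entries in each numeric column
bad_counts <- sapply(df[numeric_cols], function(col) sum(!is.finite(col)))
print(bad_counts)
```

Any column with a non-zero count is a candidate for cleaning before the model is fitted.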
Let's drop the rows with bad values and keep only the good ones:
df1 = df[is.finite(as.vector(df[, names(df) %in% c("col2")])) &
         is.finite(as.vector(df[, names(df) %in% c("col1")])), ]
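The same row filter can probably be written without naming each column. The sketch below is one way to do it in base R, assuming every numeric column should be finite; df_all, keep and df_clean are hypothetical names, and the data frame is again a deterministic stand-in for the example:

```r
# Stand-in data: 48 rows, four Inf values at the end of col2 (assumed layout)
df_all <- data.frame(
  col1 = rep(c(1, 0), times = 24),
  col2 = c(1:44, rep(Inf, 4))
)

numeric_cols <- sapply(df_all, is.numeric)

# TRUE for rows in which every numeric column holds a finite value
keep <- Reduce(`&`, lapply(df_all[numeric_cols], is.finite))

# drop = FALSE keeps the result a data frame even with a single column
df_clean <- df_all[keep, , drop = FALSE]
nrow(df_clean)
```

Dropping rows is the simplest fix, though it discards data; whether that is acceptable depends on how many rows are affected.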
And run randomForest again:
model1 = randomForest(data = df1, col2 ~ col1, ntree = 10)
Call:
randomForest(formula = col2 ~ col1, data = df1, ntree = 10)
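For the first cause in the list at the top (character columns), a minimal sketch of the conversion could look like this. The data frame df_chr and its columns are invented for illustration; only base R functions are used:

```r
# Hypothetical data frame with one character column
df_chr <- data.frame(
  group = c("a", "b", "a", "c"),
  value = c(1.5, 2.0, 3.2, 4.1),
  stringsAsFactors = FALSE
)

# Find the character columns and convert each of them to a factor
chr_cols <- sapply(df_chr, is.character)
df_chr[chr_cols] <- lapply(df_chr[chr_cols], factor)

str(df_chr)
```

After the conversion, group is a factor and can be used as a predictor or as a classification response.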