Getting the "(subscript) logical subscript too long" error when using an SVM from the e1071 package in R

I am training an SVM on my training data with the e1071 package in R. Below is information about my data:

 > str(train)
 'data.frame': 891 obs. of 10 variables:
  $ survived: int 0 1 1 1 0 0 0 0 1 1 ...
  $ pclass  : int 3 1 3 1 3 3 1 3 3 2 ...
  $ name    : Factor w/ 15 levels "capt","col","countess",..: 12 13 9 13 12 12 12 8 13 13
  $ sex     : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
  $ age     : num 22 38 26 35 35 ...
  $ ticket  : Factor w/ 533 levels "110152","110413",..: 516 522 531 50 473 276 86 396
  $ fare    : num 7.25 71.28 7.92 53.1 8.05 ...
  $ cabin   : Factor w/ 9 levels "a","b","c","d",..: 9 3 9 3 9 9 5 9 9 9 ...
  $ embarked: Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...
  $ family  : int 1 1 0 1 0 0 0 4 2 1 ...

I train the model as follows:

 library(e1071)
 model1 <- svm(survived ~ ., data = train, type = "C-classification")

Training runs without problems. But when I predict like this:

 pred <- predict(model1,test) 

I get the following error:

 Error in newdata[, object$scaled, drop = FALSE] : (subscript) logical subscript too long 

I also tried removing the ticket predictor from both the training and the test data, but I still get the same error. What is the problem?

3 answers

One of the factors in the test data set probably has a different number of levels than the same factor in the training data set.
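Why a level mismatch produces this particular message can be sketched in base R. Judging by the error text, `predict()` indexes the columns of a model matrix built from the test data with the logical vector `object$scaled` saved at training time. In the sketch, `scaled` and `newdata` are made-up stand-ins for those two objects, with the test matrix one column short:

```r
# Stand-in for object$scaled: the training model matrix had 5 columns,
# so the saved logical vector has 5 entries.
scaled <- rep(TRUE, 5)
# Stand-in for the test model matrix: one factor level fewer -> 4 columns.
newdata <- matrix(1:8, nrow = 2, ncol = 4)
# Indexing 4 columns with a length-5 logical vector fails for a matrix.
msg <- tryCatch(newdata[, scaled, drop = FALSE],
                error = function(e) conditionMessage(e))
print(msg)
```

Unlike a plain vector, a matrix does not silently extend a too-long logical subscript, which is why the mismatch surfaces as an error instead of wrong predictions.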

Run `str(test)` and check that every factor variable has the same levels as the corresponding variable in the training data set.
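Instead of comparing `str()` output by eye, the check can be automated. A minimal sketch; `mismatched_factors()` is a hypothetical helper, and the toy `train`/`test` values are made up:

```r
# Hypothetical helper: names of factor columns in `train` whose levels
# differ from those of the same-named column in `test`.
mismatched_factors <- function(train, test) {
  cols <- intersect(names(train), names(test))
  Filter(function(col) is.factor(train[[col]]) &&
           !identical(levels(train[[col]]), levels(test[[col]])), cols)
}

# Toy data: the training embarked column contains a blank entry.
train <- data.frame(age = c(22, 38, 26),
                    sex = factor(c("male", "female", "female")),
                    embarked = factor(c("S", "", "C")))
test <- data.frame(age = c(30, 41, 19),
                   sex = factor(c("female", "male", "male")),
                   embarked = factor(c("S", "C", "Q")))

print(mismatched_factors(train, test))
```

Any column it reports needs its levels reconciled before calling `predict()`.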

For example, in the output below `my.test$foo` has only 4 levels, while `my.train$foo` has 5:

 str(my.train)
 'data.frame': 554 obs. of 7 variables:
 ....
  $ foo: Factor w/ 5 levels "C","Q","S","X","Z": 2 2 4 3 4 4 4 4 4 4 ...
 str(my.test)
 'data.frame': 200 obs. of 7 variables:
 ...
  $ foo: Factor w/ 4 levels "C","Q","S","X": 3 3 3 3 1 3 3 3 3 3 ...
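One common fix is to re-encode each test factor with the training factor's levels, so both data sets expand into the same dummy columns. A minimal sketch, assuming the column names match between the two data frames; `align_factor_levels()` is a hypothetical helper and the toy values are made up to mirror the output above:

```r
# Hypothetical helper: give every factor in `test` the levels of the
# same-named factor in `train`, so model.matrix() builds the same columns.
align_factor_levels <- function(train, test) {
  for (col in intersect(names(train), names(test))) {
    if (is.factor(train[[col]])) {
      # Test values that never occur in train become NA.
      test[[col]] <- factor(as.character(test[[col]]),
                            levels = levels(train[[col]]))
    }
  }
  test
}

my.train <- data.frame(y = c(0, 1, 0, 1, 1),
                       foo = factor(c("C", "Q", "S", "X", "Z")))
my.test <- data.frame(y = c(1, 0, 1, 0),
                      foo = factor(c("S", "Q", "C", "X")))

print(ncol(model.matrix(~ foo, my.train)))  # intercept + 4 dummies = 5
print(ncol(model.matrix(~ foo, my.test)))   # one level short: 4
my.test <- align_factor_levels(my.train, my.test)
print(ncol(model.matrix(~ foo, my.test)))   # now 5, matching training
```

With the question's data this would mean running `test <- align_factor_levels(train, test)` before `predict(model1, test)`, then checking the test data for new `NA`s caused by unseen values.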

The problem is that the `embarked` column in the training data contains 2 blank entries. Because of them, `embarked` has one extra level for the blanks compared with the test data, and you get this error:

  $ embarked: Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...

The first level is the empty one.
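A minimal sketch of removing the blank level before training, assuming the blanks are filled with the most frequent port (dropping those two rows or setting them to `NA` are equally reasonable choices). The toy vector stands in for `train$embarked`; its values are made up:

```r
# Toy stand-in for train$embarked, including two blank entries.
embarked <- factor(c("S", "C", "S", "", "Q", "S", ""),
                   levels = c("", "C", "Q", "S"))

# Most frequent non-blank value.
non_blank <- embarked[embarked != ""]
most_common <- names(which.max(table(droplevels(non_blank))))

# Fill the blanks, then drop the now-unused "" level.
embarked[embarked == ""] <- most_common
embarked <- droplevels(embarked)
print(levels(embarked))
```

After applying the same steps to the real `train$embarked`, the model has to be retrained so that it no longer expects the extra level.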


I also played with this dataset. I know this question is old, but one thing you can do is explicitly include only the columns you want in the model, for example:

 fit <- svm(Survived~Pclass + Sex + Age + SibSp + Parch + Fare + Embarked, data=train) 

This fixed the problem for me, because it excludes columns that add nothing useful (like the ticket number) and have no matching values in the test data.
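With the lowercase column names from the question, the same idea can be written by building the formula from a character vector. The selection below is only an example, and the `svm()` call is left as a comment so the snippet runs without e1071 installed:

```r
# Example predictor selection using the question's column names;
# ticket, name and cabin are deliberately left out.
keep <- c("pclass", "sex", "age", "fare", "embarked", "family")
# reformulate() turns the strings into survived ~ pclass + sex + ...
f <- reformulate(keep, response = "survived")
print(f)
# With e1071 loaded, the model would then be trained as:
# model1 <- svm(f, data = train, type = "C-classification")
```

If `embarked` still carries the blank level, it can trigger the same error, so that column may need cleaning as well.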

