KNN in R: "train and class have different lengths"?

Here is my code:

train_points <- read.table("kaggle_train_points.txt", sep="\t")
train_labels <- read.table("kaggle_train_labels.txt", sep="\t")
test_points <- read.table("kaggle_test_points.txt", sep="\t")
# uses package 'class'
library(class)
knn(train_points, test_points, train_labels, k = 5)

dim(train_points) - 42000 x 784
dim(train_labels) - 42000 x 1

I do not see the problem, but I get the error:

Error in knn(train_points, test_points, train_labels, k = 5) :
  'train' and 'class' have different lengths

What is the problem?

5 answers

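Without access to the data it is hard to say for certain. However, I suspect train_labels needs to be a vector. read.table() returns a data frame, and the length of a one-column data frame is 1 (its number of columns), not 42000, so it does not match the number of rows in train_points. Therefore try: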

 cl = train_labels[,1]
 knn(train_points, test_points, cl, k = 5)

Also double check:

 dim(train_points)
 dim(test_points)
 length(cl)
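
To see why the conversion matters, here is a minimal sketch on made-up data. The random points and the 0/1 labels are assumptions standing in for the Kaggle files, which are not available here. It reproduces the failure with a one-column data frame and then passes the extracted column instead.

```r
library(class)

set.seed(1)
# Hypothetical stand-ins for the real data: 10 training rows, 3 test rows, 4 features
train_points <- as.data.frame(matrix(rnorm(40), nrow = 10))
test_points  <- as.data.frame(matrix(rnorm(12), nrow = 3))
train_labels <- data.frame(V1 = rep(c(0, 1), each = 5))

length(train_labels)   # 1: the number of columns, not the number of rows
nrow(train_points)     # 10

# Passing the data frame itself fails; capture the error message instead of stopping
res <- tryCatch(
  knn(train_points, test_points, train_labels, k = 3),
  error = function(e) conditionMessage(e)
)
print(res)

# Extracting the column gives a vector with one label per training row
cl <- train_labels[, 1]
length(cl)             # 10
pred <- knn(train_points, test_points, cl, k = 3)
print(pred)
```

The result is a factor with one predicted label per test row.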

I recently ran into a very similar problem. I wanted to use only one column as a predictor. In that case, when selecting the column, remember the drop argument and set it to FALSE. knn() expects matrices or data frames for the train and test arguments, not vectors; a plain vector passed as test is interpreted as a single case (one row), and selecting one column without drop = FALSE gives you exactly such a vector.

knn(train = trainSet[, 2, drop = FALSE], test = testSet[, 2, drop = FALSE], cl = trainSet$Direction, k = 5)
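
As a rough illustration of what drop = FALSE changes, the sketch below builds two small made-up data frames. The names trainSet, testSet and Direction follow the call above; the Lag1 column and all values are assumptions for the example only.

```r
library(class)

set.seed(42)
# Hypothetical data: column 1 is the class label, column 2 the single predictor
trainSet <- data.frame(
  Direction = factor(rep(c("Down", "Up"), each = 10)),
  Lag1      = c(rnorm(10, mean = -1), rnorm(10, mean = 1))
)
testSet <- data.frame(
  Direction = factor(c("Down", "Up", "Up")),
  Lag1      = c(-1.2, 0.9, 1.5)
)

# Default extraction of one column returns a plain vector with no dimensions
is.null(dim(trainSet[, 2]))         # TRUE

# drop = FALSE keeps a one-column data frame
dim(trainSet[, 2, drop = FALSE])    # 20 1

pred <- knn(train = trainSet[, 2, drop = FALSE],
            test  = testSet[, 2, drop = FALSE],
            cl    = trainSet$Direction,
            k     = 5)
print(pred)
```

Note that cl is still taken with $, which yields a vector; only train and test need to keep their two-dimensional shape.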


Try converting the data to data frames using as.data.frame(). I had the same problem, and after that everything worked fine:

 train_pointsdf <- as.data.frame(train_points)
 train_labelsdf <- as.data.frame(train_labels)
 test_pointsdf <- as.data.frame(test_points)

Just set drop = TRUE when you extract cl from the data frame. This drops the dimension that has only one level, so the result is a vector:

 cl = train_labels[,1, drop = TRUE]
 knn(train_points, test_points, cl, k = 5)
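
For comparison, a small sketch on a made-up one-column data frame (the values and the column name V1 are assumptions). When a single column is selected from a data frame with [, 1], drop = TRUE is already the default, so the explicit argument mainly documents the intent; [[1]] and $ return the same vector.

```r
# Hypothetical label data frame, shaped like the result of read.table()
train_labels <- data.frame(V1 = c(3, 1, 4, 1, 5))

a <- train_labels[, 1, drop = TRUE]   # explicit drop
b <- train_labels[, 1]                # default for a single selected column
c1 <- train_labels[[1]]               # list-style extraction
d <- train_labels$V1                  # by column name

identical(a, b)    # TRUE
identical(a, c1)   # TRUE
identical(a, d)    # TRUE

# By contrast, drop = FALSE keeps the data frame, whose length is 1
length(train_labels[, 1, drop = FALSE])   # 1
length(a)                                 # 5
```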

I had the same problem when trying to use knn to diagnose breast cancer from the Wisconsin dataset. The problem was that the cl argument must be a factor vector. My mistake was writing cl = labels: I thought labels was a vector, but it was actually a data frame with one column. The solution was to use the following syntax: knn(train, test, cl = labels$diagnosis, k = 21), where diagnosis is the header of the single column in the labels data frame. It worked well. Hope this helps!

