I'm trying to use kNN in R (I've tried multiple packages: knnflex, class) to predict the probability of default based on 8 variables. The data set is about 100,000 rows of 8 columns, but my machine seems to have difficulty with a sample of even 10,000 rows. Any suggestions for doing kNN on a dataset larger than ~50 rows (i.e., bigger than iris)?
EDIT:
To clarify, there are several problems.
1) The examples in the class and knnflex packages are a little unclear, and I was curious whether there is something similar to the randomForest implementation, where you give it the variable you want to predict and the data you want to use to train the model:
RF <- randomForest(x, y, ntree, type,...)
and then turn around and use the model to predict outcomes for a test data set:
pred <- predict(RF, testData)
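For contrast, here is a minimal sketch of what the same workflow looks like with class::knn, which has no separate fit/predict step; the object and column names (trainData, testData, default) are illustrative assumptions, not from my actual data:

library(class)

# class::knn is lazy: training data, test data, and training labels all
# go into one call, and it returns predictions directly, with no reusable
# model object. Names below are placeholders.
pred <- knn(train = trainData[, 1:8],
            test  = testData[, 1:8],
            cl    = trainData$default,
            k     = 5,
            prob  = TRUE)   # also record the proportion of winning votes
probs <- attr(pred, "prob") # vote share for the predicted class, per row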
2) I don't understand why knn wants both the training and the test data to build the model. From what I can tell, the package creates a distance matrix of roughly nrows(trainingData)^2 entries, which also appears to be the upper limit on the size of the data to be predicted. I created a model using 5,000 rows (above that number I got memory allocation errors) and was unable to predict a test set of more than 5,000 rows. So I would need to either:
a) find a way to use more than 5,000 rows in the training set,
or
b) find a way to use the model on all 100k rows (see the chunked sketch after this list).
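To frame option (b): since class::knn recomputes distances on each call, one workaround is to keep the training set at a size the machine can handle and score the 100k rows in chunks, so only a chunk-by-train distance computation is ever needed at once. A minimal sketch, again with illustrative names (allData, trainData, default):

library(class)

# Option (b) sketch: score all 100k rows in chunks against a fixed
# training set, so memory only has to hold one chunkSize-by-nrow(train)
# distance computation at a time. All names here are assumptions.
chunkSize <- 5000
chunks <- split(seq_len(nrow(allData)),
                ceiling(seq_len(nrow(allData)) / chunkSize))
pred <- unlist(lapply(chunks, function(i)
  as.character(knn(train = trainData[, 1:8],
                   test  = allData[i, 1:8],
                   cl    = trainData$default,
                   k     = 5))))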