I'm trying to use kNN in R (I've tried multiple packages: knnflex, class) to predict the probability of default based on 8 variables. The data set is about 100,000 rows of 8 columns, but my machine seems to have difficulty with a sample of even 10,000 rows. Any suggestions for doing kNN on a dataset larger than ~50 rows (i.e., bigger than iris)?
EDIT:
To clarify, there are several problems.
1) The examples in the class and knnflex packages are a little unclear, and I was curious whether there is something similar to the randomForest implementation, where you give it the variable you want to predict and the data you want to use to train the model:
RF <- randomForest(x, y, ntree, type,...)
and then turn around and use the model to predict outcomes for a test data set:
pred <- predict(RF, testData)
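For contrast, here is a minimal sketch of what the same workflow looks like with class::knn, which has no separate fit/predict step; the object and column names (trainData, testData, default) are illustrative assumptions, not from my actual data:

library(class)

# class::knn is lazy: training data, test data, and training labels all
# go into one call, and it returns predictions directly, with no reusable
# model object. Names below are placeholders.
pred <- knn(train = trainData[, 1:8],
            test  = testData[, 1:8],
            cl    = trainData$default,
            k     = 5,
            prob  = TRUE)   # also record the proportion of winning votes
probs <- attr(pred, "prob") # vote share for the predicted class, per row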
2) I don't understand why knn wants both the training and the test data to build the model. From what I can tell, the package creates a distance matrix of roughly nrows(trainingData)^2 entries, which also appears to be the upper limit on the size of the data to be predicted. I created a model using 5,000 rows (above that number I got memory allocation errors) and was unable to predict a test set of more than 5,000 rows. So I would need to either:
a) find a way to use more than 5,000 rows in the training set,
or
b) find a way to use the model on all 100k rows (see the chunked sketch after this list).
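To frame option (b): since class::knn recomputes distances on each call, one workaround is to keep the training set at a size the machine can handle and score the 100k rows in chunks, so only a chunk-by-train distance computation is ever needed at once. A minimal sketch, again with illustrative names (allData, trainData, default):

library(class)

# Option (b) sketch: score all 100k rows in chunks against a fixed
# training set, so memory only has to hold one chunkSize-by-nrow(train)
# distance computation at a time. All names here are assumptions.
chunkSize <- 5000
chunks <- split(seq_len(nrow(allData)),
                ceiling(seq_len(nrow(allData)) / chunkSize))
pred <- unlist(lapply(chunks, function(i)
  as.character(knn(train = trainData[, 1:8],
                   test  = allData[i, 1:8],
                   cl    = trainData$default,
                   k     = 5))))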