NumPy arrays from a CSV file for Lasagne

I started learning how to use Theano with Lasagne, beginning with the MNIST example. Now I want to try my own example: I have a train.csv file in which each line starts with a 0 or 1, which is the correct answer, followed by 773 values (each 0 or 1) that are the input. I don't understand how to turn this file into the NumPy arrays that the load_database() function needs to return. This is part of the original function for the MNIST database:

    ...
    with gzip.open(filename, 'rb') as f:
        data = pickle_load(f, encoding='latin-1')

    # The MNIST dataset we have here consists of six numpy arrays:
    # Inputs and targets for the training set, validation set and test set.
    X_train, y_train = data[0]
    X_val, y_val = data[1]
    X_test, y_test = data[2]
    ...
    # We just return all the arrays in order, as expected in main().
    # (It doesn't matter how we do this as long as we can read them again.)
    return X_train, y_train, X_val, y_val, X_test, y_test

I need to get X_train (the inputs) and y_train (the first value of each line) from my CSV file.

Thanks!

1 answer

You can use numpy.genfromtxt() or numpy.loadtxt() as follows:

    import numpy
    # sklearn.cross_validation was removed in newer scikit-learn releases;
    # KFold now lives in sklearn.model_selection
    from sklearn.model_selection import KFold

    Xy = numpy.genfromtxt('yourfile.csv', delimiter=",")

    # the next section provides the required
    # training-validation set splitting but
    # you can do it manually too, if you want
    skf = KFold(n_splits=3)
    # keep only the first fold
    ind_train, ind_valid = next(skf.split(Xy))

    Xy_train, Xy_valid = Xy[ind_train], Xy[ind_valid]
    X_train = Xy_train[:, 1:]   # inputs: every column after the first
    y_train = Xy_train[:, 0]    # targets: the first column
    X_val = Xy_valid[:, 1:]
    y_val = Xy_valid[:, 0]

    ...
    # you can simply ignore the test sets in your case
    return X_train, y_train, X_val, y_val  #, X_test, y_test

The snippet above does not return a test set.

You can now load your data set in the main module or script, but keep in mind that you must also remove every part of that code that uses the test set.

Or you can just pass the validation set as the test set:

    # you can simply pass the valid sets as `test` set
    return X_train, y_train, X_val, y_val, X_val, y_val

In the latter case, you do not need to touch the sections of the main module that use the test set, but any test scores it reports will simply be the validation scores a second time.
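If a separate test set is preferred over reusing the validation set, the rows can also be split manually. The following is only a minimal sketch: the helper name `split_rows`, the 20% fractions and the tiny in-memory CSV are assumptions made for illustration, not part of the MNIST example.

```python
import io

import numpy as np


def split_rows(Xy, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into train/validation/test parts.

    Column 0 holds the target; the remaining columns hold the inputs.
    (Hypothetical helper, not part of Lasagne or the MNIST example.)
    """
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(Xy))
    n_val = int(len(Xy) * val_fraction)
    n_test = int(len(Xy) * test_fraction)
    val_idx = idx[:n_val]
    test_idx = idx[n_val:n_val + n_test]
    train_idx = idx[n_val + n_test:]
    parts = []
    for part in (train_idx, val_idx, test_idx):
        parts.append(Xy[part, 1:])  # inputs
        parts.append(Xy[part, 0])   # targets
    return tuple(parts)


# Tiny stand-in for train.csv: 10 rows, each with 1 label and 4 inputs.
csv_text = "\n".join(
    ",".join(str((r + c) % 2) for c in range(5)) for r in range(10)
)
Xy = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

X_train, y_train, X_val, y_val, X_test, y_test = split_rows(Xy)
print(X_train.shape, X_val.shape, X_test.shape)
```

With a real file, `io.StringIO(csv_text)` would be replaced by the path to the CSV. Shuffling before splitting is a design choice here; it matters when the rows of the file are ordered in some way.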

Note: I don't know which MNIST example this is, but after you have prepared your data as shown above, you will probably also need to change your training module to match your data, for example the input shape and the output shape (that is, the number of classes). In your case the input has 773 features and the output has 2 classes.
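As one possible illustration of such an adjustment: `numpy.genfromtxt()` returns float64 values by default, while Theano-based code is often set up for float32 inputs and integer targets. The sketch below assumes exactly that setup, which the question does not confirm; the helper `prepare_for_training` and the random stand-in data are hypothetical.

```python
import numpy as np


def prepare_for_training(X, y, n_features=773, n_classes=2):
    """Cast the arrays and check their shapes before training.

    Assumption: the trainer wants float32 inputs of shape
    (n_samples, n_features) and int32 class indices of shape (n_samples,).
    """
    X = np.asarray(X, dtype=np.float32)
    y = np.asarray(y, dtype=np.int32)
    if X.ndim != 2 or X.shape[1] != n_features:
        raise ValueError("expected inputs of shape (n_samples, %d), got %r"
                         % (n_features, X.shape))
    if y.shape != (X.shape[0],):
        raise ValueError("expected one target per input row")
    if y.min() < 0 or y.max() >= n_classes:
        raise ValueError("targets must be in the range [0, %d)" % n_classes)
    return X, y


# Random stand-in data in the float64 form that genfromtxt would produce.
rng = np.random.RandomState(0)
X_demo = rng.randint(0, 2, size=(5, 773)).astype(np.float64)
y_demo = rng.randint(0, 2, size=5).astype(np.float64)

X_ready, y_ready = prepare_for_training(X_demo, y_demo)
print(X_ready.dtype, X_ready.shape, y_ready.dtype, y_ready.shape)
```

The network definition itself would still need its input layer set to 773 units and its output layer set to 2 units.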
