Suppose your data is a list of rows (here, strings), e.g.
data = ["....", "...", ]
Then you can split it into training (80%) and test (20%) sets using train_test_split, for example:
from sklearn.model_selection import train_test_split  # older releases: sklearn.cross_validation (removed in 0.20)
train, test = train_test_split(data, train_size=0.8)
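A minimal runnable sketch of the split above, assuming a hypothetical list of 2500 placeholder strings in place of your real rows. It uses sklearn.model_selection (the current home of train_test_split) and fixes random_state so the shuffle is reproducible:

```python
from sklearn.model_selection import train_test_split

# Hypothetical data: 2500 placeholder strings standing in for your rows.
data = [f"row {i}" for i in range(2500)]

# Shuffle, then split 80/20; random_state makes the split reproducible.
train, test = train_test_split(data, train_size=0.8, random_state=42)

print(len(train), len(test))  # 2000 500
```

Because the input is a plain list, both halves come back as lists, and every original row lands in exactly one of them.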
Before rushing into this, read these documents. 2500 samples is not a large dataset, and you probably want to do something like k-fold cross-validation rather than a single train/test split.
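A rough sketch of 5-fold cross-validation with scikit-learn. Since the original data and model are not shown, this assumes synthetic numeric features from make_classification and a LogisticRegression classifier as stand-ins; swap in your own features, labels and estimator:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for a 2500-sample dataset (assumption: numeric features, binary labels).
X, y = make_classification(n_samples=2500, n_features=20, random_state=0)

# 5 folds: every sample is used for testing exactly once and for training 4 times.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Each fold trains on 2000 samples and tests on the remaining 500.
fold_sizes = [(len(tr), len(te)) for tr, te in kf.split(X)]
print(fold_sizes)  # [(2000, 500), (2000, 500), (2000, 500), (2000, 500), (2000, 500)]

# Fit and score the model once per fold, then summarize.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Averaging over five folds gives a less noisy estimate than one 80/20 split, which matters when the dataset is this small.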
KT.