How to set up a very simple LSTM with Keras / Theano for regression

I am trying to configure a Keras LSTM for a simple regression task. There are some key explanations on the official page: Keras RNN Documentation

But to fully understand it, example configurations with sample data would be extremely helpful.

I have barely found any examples of regression with Keras LSTM. Most examples relate to classification (of text or images). I studied the LSTM examples that come with the Keras distribution, and one example that I found through a Google search: http://danielhnyk.cz/ It offers some insight, although the author himself notes that the approach is quite memory-inefficient, since the data samples must be stored highly redundantly.

Although an improvement was suggested by a commenter (Taha), the data storage is still redundant, and I doubt this is the way the Keras developers intended it to be done.

I downloaded a simple example of serial data, which turned out to be stock data from Yahoo Finance. It is freely available as Yahoo Finance Data.

Date, Open, High, Low, Close, Volume, Adj Close
2016-05-18, 94.160004, 95.209999, 93.889999, 94.559998, 41923100, 94.559998
2016-05-17, 94.550003, 94.699997, 93.010002, 93.489998, 46507400, 93.489998
2016-05-16, 92.389999, 94.389999, 91.650002, 93.879997, 61140600, 93.879997
2016-05-13, 90.00, 91.669998, 90.00, 90.519997, 44188200, 90.519997

The table consists of over 8900 such rows of Apple stock data. For each day there are 7 columns of data. The value to forecast is "Adj Close", the adjusted price at the end of the day.

Thus, the goal would be to predict the next day's Adj Close, based on the sequence of the previous few days. (This is probably nearly impossible, but it is always good to see how a tool behaves under difficult conditions.)

I think this should be a very standard case of prediction / regression for LSTM and easily ported to other problem areas.

So, how should the data be formatted (X_train, y_train) for minimal redundancy, and how do I initialize a Sequential model with just one LSTM layer and a few hidden neurons?

Regards, Theo

PS: I started to code this:

...
X_train
Out[6]:
array([[  2.87500000e+01,   2.88750000e+01,   2.87500000e+01,
          2.87500000e+01,   1.17258400e+08,   4.31358010e-01],
       [  2.73750019e+01,   2.73750019e+01,   2.72500000e+01,
          2.72500000e+01,   4.39712000e+07,   4.08852011e-01],
       [  2.53750000e+01,   2.53750000e+01,   2.52500000e+01,
          2.52500000e+01,   2.64320000e+07,   3.78845006e-01],
       ...,
       [  9.23899994e+01,   9.43899994e+01,   9.16500015e+01,
          9.38799973e+01,   6.11406000e+07,   9.38799973e+01],
       [  9.45500031e+01,   9.46999969e+01,   9.30100021e+01,
          9.34899979e+01,   4.65074000e+07,   9.34899979e+01],
       [  9.41600037e+01,   9.52099991e+01,   9.38899994e+01,
          9.45599976e+01,   4.19231000e+07,   9.45599976e+01]], dtype=float32)

y_train
Out[7]:
array([  0.40885201,   0.37884501,   0.38822201, ...,  93.87999725,
        93.48999786,  94.55999756], dtype=float32)

So the data is ready, and no redundancy has been introduced. The question now is how to describe the Keras LSTM model / training process for this data.
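For reference, one way to keep the windowed training data non-redundant is numpy's stride tricks, which expose overlapping windows as views into the original array instead of copies. Below is a minimal sketch, assuming `prices` is a float32 array of shape (nb_days, nb_features) with the target in the last column; the helper name is illustrative and not part of the original post:

import numpy as np
from numpy.lib.stride_tricks import as_strided

def sliding_windows(data, nb_timesteps):
    # overlapping windows as views into `data` -- no copies made
    nb_days, nb_features = data.shape
    nb_samples = nb_days - nb_timesteps
    s0, s1 = data.strides
    return as_strided(data,
                      shape=(nb_samples, nb_timesteps, nb_features),
                      strides=(s0, s0, s1))

# X[i] covers days i .. i+nb_timesteps-1; y[i] is the target of day i+nb_timesteps
# X = sliding_windows(prices, 4)
# y = prices[4:, -1]

Note that Keras may still copy the batches it draws from X during training, so the saving applies to the stored dataset, not necessarily to the training loop.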

EDIT 3:

Here is the updated code with the three-dimensional data structure needed for recurrent networks (see Lorrit's answer). However, it still does not work.

EDIT 4: removed the extra comma after Activation('sigmoid') and shaped Y_train correctly. Still the same error.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, LSTM

nb_timesteps = 4
nb_features = 5
batch_size = 32

# load file
X_train = np.genfromtxt('table.csv', delimiter=',', names=None, unpack=False, dtype=None)

# delete the first row with the names
X_train = np.delete(X_train, (0), axis=0)

# invert the order of the rows, so that the oldest
# entry is in the first row and the newest entry
# comes last
X_train = np.flipud(X_train)

# the last column is our Y
Y_train = X_train[:, 6].astype(np.float32)
Y_train = np.delete(Y_train, range(0, 6))
Y_train = np.array(Y_train)
Y_train.shape = (len(Y_train), 1)

# we don't use the timestamps. convert the rest to float32
X_train = X_train[:, 1:6].astype(np.float32)

# shape X_train
X_train.shape = (1, len(X_train), nb_features)

# Now comes Lorrit's code for shaping the 3D input data
# http://stackoverflow.com/questions/36992855/keras-how-should-i-prepare-input-data-for-rnn
flag = 0
for sample in range(X_train.shape[0]):
    tmp = np.array([X_train[sample, i:i + nb_timesteps, :]
                    for i in range(X_train.shape[1] - nb_timesteps + 1)])
    if flag == 0:
        new_input = tmp
        flag = 1
    else:
        new_input = np.concatenate((new_input, tmp))

X_train = np.delete(new_input, len(new_input) - 1, axis=0)
X_train = np.delete(X_train, 0, axis=0)
X_train = np.delete(X_train, 0, axis=0)
# X successfully shaped

# free some memory
tmp = None
new_input = None

# split data for training, validation and test
# 50:25:25
X_train, X_test = np.split(X_train, 2, axis=0)
X_valid, X_test = np.split(X_test, 2, axis=0)
Y_train, Y_test = np.split(Y_train, 2, axis=0)
Y_valid, Y_test = np.split(Y_test, 2, axis=0)

print('Build model...')
model = Sequential([
    Dense(8, input_dim=nb_features),
    Activation('softmax'),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])
model.compile(loss='mse', optimizer='RMSprop', metrics=['accuracy'])

print('Train...')
print(X_train.shape)
print(Y_train.shape)
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=15,
          validation_data=(X_test, Y_test))
score, acc = model.evaluate(X_test, Y_test, batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)

There is still a data problem, Keras says:

Using Theano backend.
Using gpu device 0: GeForce GTX 960 (CNMeM is disabled, cuDNN not available)
Build model...
Traceback (most recent call last):
  File "<ipython-input-1-3a6e9e045167>", line 1, in <module>
    runfile('C:/Users/admin/Documents/pycode/lstm/lstm5.py', wdir='C:/Users/admin/Documents/pycode/lstm')
  File "C:\Users\admin\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)
  File "C:\Users\admin\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "C:/Users/admin/Documents/pycode/lstm/lstm5.py", line 79, in <module>
    Activation('sigmoid')
  File "d:\git\keras\keras\models.py", line 93, in __init__
    self.add(layer)
  File "d:\git\keras\keras\models.py", line 146, in add
    output_tensor = layer(self.outputs[0])
  File "d:\git\keras\keras\engine\topology.py", line 441, in __call__
    self.assert_input_compatibility(x)
  File "d:\git\keras\keras\engine\topology.py", line 382, in assert_input_compatibility
    str(K.ndim(x)))
Exception: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=2
3 answers

ANSWER 1:

In the model definition, you placed a Dense layer before the LSTM layer. To apply a Dense layer to every timestep of a sequence, you must wrap it in a TimeDistributed layer.

Try to change

model = Sequential([
    Dense(8, input_dim=nb_features),
    Activation('softmax'),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])

to

model = Sequential([
    # requires: from keras.layers import TimeDistributed
    TimeDistributed(Dense(8, activation='softmax'),
                    input_shape=(nb_timesteps, nb_features)),
    LSTM(4, dropout_W=0.2, dropout_U=0.2),
    Dense(1),
    Activation('sigmoid')
])
ANSWER 2:

Before feeding the data to the LSTM, one preprocessing step is still missing. You need to decide how many previous data samples (previous days) you want to include in the calculation of the current AdjClose. See my answer here on how to do this. Your data should then be 3-dimensional, with shape (nb_samples, nb_included_previous_days, nb_features).
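As a sketch of what that shaping step might look like (the names are illustrative, not from the linked answer; the input is assumed to be a 2D array with the oldest day first):

import numpy as np

def make_xy(data, nb_included_previous_days, target_col=-1):
    # data: 2D array (nb_days, nb_features), oldest day first
    X = np.array([data[i:i + nb_included_previous_days]
                  for i in range(len(data) - nb_included_previous_days)])
    # target: the day immediately after each window
    y = data[nb_included_previous_days:, target_col]
    return X, y  # X has shape (nb_samples, nb_included_previous_days, nb_features)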

Then you can feed the 3D data to a standard LSTM layer with one output. You can compare that output with y_train and try to minimize the error. Remember to choose a loss function that is suitable for regression, e.g. mean squared error.
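A minimal model along those lines might look like the following sketch (Keras 1.x-style arguments to match the question; the layer size is arbitrary, and the names reuse the make_xy sketch above):

from keras.models import Sequential
from keras.layers import Dense, LSTM

model = Sequential()
# input is 3D: (nb_samples, nb_included_previous_days, nb_features)
model.add(LSTM(16, input_shape=(nb_included_previous_days, nb_features)))
model.add(Dense(1))  # single linear output -- no sigmoid for an unbounded regression target
model.compile(loss='mse', optimizer='rmsprop')
model.fit(X, y, batch_size=32, nb_epoch=15)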

ANSWER 3:

Not sure if this is still relevant, but there is a great example of how to use LSTM networks for time-series forecasting on Dr. Jason Brownlee's blog here

I prepared an example with three noisy sine waves of different phases and amplitudes. It is not market data, but it mimics your assumption that one stock says something about another.

import numpy
import matplotlib.pyplot as plt
import pandas
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Reshape
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

# generate sine wave
def make_sine_with_noise(_start, _stop, _step, _phase_shift, gain):
    x = numpy.arange(_start, _stop, step=_step)
    noise = numpy.random.uniform(-0.1, 0.1, size=len(x))
    y = gain * 0.5 * numpy.sin(x + _phase_shift)
    y = numpy.add(noise, y)
    return x, y

# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1, look_ahead=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - look_ahead - 1):
        a = dataset[i:(i + look_back), :]
        dataX.append(a)
        b = dataset[(i + look_back):(i + look_back + look_ahead), :]
        dataY.append(b)
    return numpy.array(dataX), numpy.array(dataY)

# fix random seed for reproducibility
numpy.random.seed(7)

# generate sine waves
x1, y1 = make_sine_with_noise(0, 200, 1/24, 0, 1)
x2, y2 = make_sine_with_noise(0, 200, 1/24, math.pi/4, 3)
x3, y3 = make_sine_with_noise(0, 200, 1/24, math.pi/2, 20)
# plt.plot(x1, y1)
# plt.plot(x2, y2)
# plt.plot(x3, y3)
# plt.show()

# transform to pandas dataframe
dataframe = pandas.DataFrame({'y1': y1, 'y2': y2, 'y3': y3})
dataset = dataframe.values
dataset = dataset.astype('float32')

# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)

# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]

# reshape into X=t and Y=t+1
look_back = 10
look_ahead = 5
trainX, trainY = create_dataset(train, look_back, look_ahead)
testX, testY = create_dataset(test, look_back, look_ahead)
print(trainX.shape)
print(trainY.shape)

# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], trainX.shape[1], trainX.shape[2]))
testX = numpy.reshape(testX, (testX.shape[0], testX.shape[1], testX.shape[2]))

# create and fit the LSTM network
model = Sequential()
model.add(LSTM(look_ahead, input_shape=(trainX.shape[1], trainX.shape[2]), return_sequences=True))
model.add(LSTM(look_ahead, input_shape=(look_ahead, trainX.shape[2])))
model.add(Dense(trainY.shape[1] * trainY.shape[2]))
model.add(Reshape((trainY.shape[1], trainY.shape[2])))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=1, batch_size=1, verbose=1)

# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

# save model
model.save('my_sin_prediction_model.h5')

# create a single trainPredict array concatenating every 'look_ahead' prediction array
trainPredictPlottable = trainPredict[::look_ahead]
trainPredictPlottable = [item for sublist in trainPredictPlottable for item in sublist]
trainPredictPlottable = scaler.inverse_transform(numpy.array(trainPredictPlottable))

# create a single testPredict array concatenating every 'look_ahead' prediction array
testPredictPlottable = testPredict[::look_ahead]
testPredictPlottable = [item for sublist in testPredictPlottable for item in sublist]
testPredictPlottable = scaler.inverse_transform(numpy.array(testPredictPlottable))
# testPredictPlottable = testPredictPlottable[:-look_ahead]

# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredictPlottable) + look_back, :] = trainPredictPlottable

# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(dataset) - len(testPredictPlottable):len(dataset), :] = testPredictPlottable

# plot baseline and predictions
dataset = scaler.inverse_transform(dataset)
plt.plot(dataset, color='k')
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
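One small follow-up: the script imports sklearn's mean_squared_error but never actually uses it. If you want a numeric score in addition to the plot, something like the following sketch should work with the arrays as defined above (not part of the original answer):

# RMSE over the portion of the series covered by the test predictions,
# in original units (both arrays are already inverse-transformed here)
testTruth = dataset[len(dataset) - len(testPredictPlottable):, :]
testScore = math.sqrt(mean_squared_error(testTruth, testPredictPlottable))
print('Test RMSE: %.3f' % testScore)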
