Keras with TensorFlow backend: how to mask the loss function

I am trying to implement a sequence-to-sequence task using a Keras LSTM with the TensorFlow backend. The inputs are English sentences of variable length. To build a dataset with the two-dimensional shape [batch_number, max_sentence_length], I append an EOF marker to each sentence and pad it with enough placeholder characters, e.g. '#'. Each character is then converted into a one-hot vector, so the dataset has the three-dimensional shape [batch_number, max_sentence_length, character_number]. After the LSTM encoder and decoder layers, the softmax cross entropy between the output and the target is computed.
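
For concreteness, here is a minimal sketch of that preprocessing (the vocabulary and the helper `encode` below are made up for illustration, and the EOF marker is omitted for brevity). Note that in the data used later in this question, the padding positions of the input X are all-zero vectors, so that the Masking layer can skip them, while in the target y_true they are the one-hot vector for '#':

    import numpy as np

    char_to_index = {'a': 0, 'b': 1, '#': 2}   # hypothetical vocabulary
    max_sentence_length = 5

    def encode(sentence, pad_with_zeros):
        # left-pad with '#' up to max_sentence_length, then one-hot encode;
        # padding positions become all-zero rows when pad_with_zeros is True
        padded = '#' * (max_sentence_length - len(sentence)) + sentence
        one_hot = np.zeros((max_sentence_length, len(char_to_index)))
        for t, ch in enumerate(padded):
            if ch == '#' and pad_with_zeros:
                continue
            one_hot[t, char_to_index[ch]] = 1.0
        return one_hot

    X = np.stack([encode(s, True) for s in ['abb', 'babb']])        # model input
    y_true = np.stack([encode(s, False) for s in ['abb', 'babb']])  # target, '#' one-hot encoded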

To eliminate the padding effect when training the model, masking can be used on both the input and the loss function. Masking the input in Keras can be done with keras.layers.core.Masking. In TensorFlow, masking the loss function can be done as described in: the custom masked loss function in TensorFlow.

However, I have not found a way to do this in Keras, since a loss function in Keras only accepts the parameters y_true and y_pred. So how can I pass the true sequence_lengths into the loss function and apply the mask?

In addition, I found the function _weighted_masked_objective(fn) in keras/engine/training.py. Its docstring says: "Adds support for masking and sample-weighting to an objective function." But it seems the wrapped objective still only accepts fn(y_true, y_pred). Is there a way to use this function to solve my problem?

To be specific, I modified the example from Yu-Yang's answer.

    from keras.models import Model
    from keras.layers import Input, Masking, LSTM, Dense, RepeatVector, TimeDistributed, Activation
    import numpy as np
    from numpy.random import seed as random_seed
    random_seed(123)

    max_sentence_length = 5
    character_number = 3   # valid characters 'a', 'b' and placeholder '#'

    input_tensor = Input(shape=(max_sentence_length, character_number))
    masked_input = Masking(mask_value=0)(input_tensor)
    encoder_output = LSTM(10, return_sequences=False)(masked_input)
    repeat_output = RepeatVector(max_sentence_length)(encoder_output)
    decoder_output = LSTM(10, return_sequences=True)(repeat_output)
    output = Dense(3, activation='softmax')(decoder_output)

    model = Model(input_tensor, output)
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    model.summary()

    X = np.array([[[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]],
                  [[0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])
    y_true = np.array([[[0, 0, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 0]],   # the batch is ['##abb', '#babb'], padded with '#'
                       [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]])

    y_pred = model.predict(X)
    print('y_pred:', y_pred)
    print('y_true:', y_true)
    print('model.evaluate:', model.evaluate(X, y_true))

    # See if the loss computed by model.evaluate() is equal to the masked loss
    import tensorflow as tf
    logits = tf.constant(y_pred, dtype=tf.float32)
    target = tf.constant(y_true, dtype=tf.float32)
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(target * tf.log(logits), axis=2))
    losses = -tf.reduce_sum(target * tf.log(logits), axis=2)
    sequence_lengths = tf.constant([3, 4])
    mask = tf.reverse(tf.sequence_mask(sequence_lengths, maxlen=max_sentence_length), [0, 1])
    losses = tf.boolean_mask(losses, mask)
    masked_loss = tf.reduce_mean(losses)

    with tf.Session() as sess:
        c_e = sess.run(cross_entropy)
        m_c_e = sess.run(masked_loss)
        print("tf unmasked_loss:", c_e)
        print("tf masked_loss:", m_c_e)

The results from Keras and TensorFlow are compared below: [screenshot of the printed output]

As shown above, masking is disabled after certain types of layers. So how can I mask the loss function in Keras when these layers are added?

keras masking lstm loss-function
2 answers

If there is a mask in your model, it will be propagated layer by layer and eventually applied to the loss. So if you pad and mask the sequences correctly, the loss on the padding placeholders will be ignored.

Some details:

It takes a while to explain the whole process, so I will just break it down into a few steps:

  • In compile(), the mask is collected by calling compute_mask() and applied to the loss(es) (irrelevant lines are omitted for clarity):
    weighted_losses = [_weighted_masked_objective(fn) for fn in loss_functions]

    # Prepare output masks.
    masks = self.compute_mask(self.inputs, mask=None)
    if masks is None:
        masks = [None for _ in self.outputs]
    if not isinstance(masks, list):
        masks = [masks]

    # Compute total loss.
    total_loss = None
    with K.name_scope('loss'):
        for i in range(len(self.outputs)):
            y_true = self.targets[i]
            y_pred = self.outputs[i]
            weighted_loss = weighted_losses[i]
            sample_weight = sample_weights[i]
            mask = masks[i]
            with K.name_scope(self.output_names[i] + '_loss'):
                output_loss = weighted_loss(y_true, y_pred, sample_weight, mask)
  • Inside Model.compute_mask(), run_internal_graph() is called.
  • Inside run_internal_graph(), the masks are propagated stage by stage from the model inputs to the outputs by calling Layer.compute_mask() on each layer iteratively (a simplified sketch of the default implementation is shown right after this list).
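
For reference, the default Layer.compute_mask() behaves roughly as follows (this is a condensed paraphrase with the error handling omitted; see the Keras source for the exact code):

    # Simplified paraphrase of the default Layer.compute_mask() in Keras 2.x.
    def compute_mask(self, inputs, mask=None):
        if not self.supports_masking:
            # layers that do not support masking drop the mask
            # (the real implementation raises a TypeError if a non-None
            #  mask is passed to such a layer)
            return None
        # layers that do support masking pass the incoming mask through unchanged
        return mask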

So, if you use a Masking layer in your model, you don't have to worry about the loss on the padding placeholders. The loss on those entries will be masked out, as you have probably already seen inside _weighted_masked_objective().
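
The masking part of that wrapper looks roughly like this (a condensed paraphrase with sample weighting omitted; see keras/engine/training.py for the exact code):

    from keras import backend as K

    # Condensed paraphrase of _weighted_masked_objective(); sample weighting omitted.
    def _weighted_masked_objective(fn):
        def weighted(y_true, y_pred, weights, mask=None):
            score_array = fn(y_true, y_pred)
            if mask is not None:
                # cast the boolean mask to floats
                mask = K.cast(mask, K.floatx())
                # zero out the loss on masked timesteps ...
                score_array *= mask
                # ... and rescale so the loss is proportional to the
                # number of unmasked entries
                score_array /= K.mean(mask)
            return K.mean(score_array)
        return weighted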

A small example:

    max_sentence_length = 5
    character_number = 2

    input_tensor = Input(shape=(max_sentence_length, character_number))
    masked_input = Masking(mask_value=0)(input_tensor)
    output = LSTM(3, return_sequences=True)(masked_input)
    model = Model(input_tensor, output)
    model.compile(loss='mae', optimizer='adam')

    X = np.array([[[0, 0], [0, 0], [1, 0], [0, 1], [0, 1]],
                  [[0, 0], [0, 1], [1, 0], [0, 1], [0, 1]]])
    y_true = np.ones((2, max_sentence_length, 3))
    y_pred = model.predict(X)
    print(y_pred)

    [[[ 0.          0.          0.        ]
      [ 0.          0.          0.        ]
      [-0.11980877  0.05803877  0.07880752]
      [-0.00429189  0.13382857  0.19167568]
      [ 0.06817091  0.19093043  0.26219055]]

     [[ 0.          0.          0.        ]
      [ 0.0651961   0.10283815  0.12413475]
      [-0.04420842  0.137494    0.13727818]
      [ 0.04479844  0.17440712  0.24715884]
      [ 0.11117355  0.21645413  0.30220413]]]

    # See if the loss computed by model.evaluate() is equal to the masked loss
    unmasked_loss = np.abs(1 - y_pred).mean()
    masked_loss = np.abs(1 - y_pred[y_pred != 0]).mean()

    print(model.evaluate(X, y_true))
    0.881977558136
    print(masked_loss)
    0.881978
    print(unmasked_loss)
    0.917384

As you can see from this example, the loss on the masked part (the zeros in y_pred) is ignored, and the output of model.evaluate() equals masked_loss.


EDIT:

If there is a recurrent layer with return_sequences=False, the mask stops propagating (i.e., the returned mask is None). See RNN.compute_mask():

    def compute_mask(self, inputs, mask):
        if isinstance(mask, list):
            mask = mask[0]
        output_mask = mask if self.return_sequences else None
        if self.return_state:
            state_mask = [None for _ in self.states]
            return [output_mask] + state_mask
        else:
            return output_mask

In your case, if I understand correctly, you want a mask that is based on y_true: whenever the value of y_true is [0, 0, 1] (the one-hot encoding of '#'), you want the loss to be masked. If so, you need to mask the loss values in a way somewhat similar to Daniel's answer.

The main difference is the final average. The average should be taken over the number of unmasked values, which is K.sum(mask). Also, y_true can be compared directly to the one-hot encoded vector [0, 0, 1].

    def get_loss(mask_value):
        mask_value = K.variable(mask_value)
        def masked_categorical_crossentropy(y_true, y_pred):
            # find out which timesteps in `y_true` are not the padding character '#'
            mask = K.all(K.equal(y_true, mask_value), axis=-1)
            mask = 1 - K.cast(mask, K.floatx())

            # multiply categorical_crossentropy with the mask
            loss = K.categorical_crossentropy(y_true, y_pred) * mask

            # take the average w.r.t. the number of unmasked entries
            return K.sum(loss) / K.sum(mask)
        return masked_categorical_crossentropy

    masked_categorical_crossentropy = get_loss(np.array([0, 0, 1]))
    model = Model(input_tensor, output)
    model.compile(loss=masked_categorical_crossentropy, optimizer='adam')

The output of the above code shows that the loss is computed only over the unmasked values:

    model.evaluate:    1.08339476585
    tf unmasked_loss:  1.08989
    tf masked_loss:    1.08339

The value is different from yours because I changed the axis argument in tf.reverse from [0,1] to [1] .
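
To see why only the time axis should be reversed here, consider what tf.sequence_mask() produces for left-padded sequences. A small standalone check, in the same TF 1.x session style as the code above:

    import tensorflow as tf

    sequence_lengths = tf.constant([3, 4])
    # tf.sequence_mask puts the True entries at the *start* of each row:
    # [[ True,  True,  True, False, False],
    #  [ True,  True,  True,  True, False]]
    mask_front = tf.sequence_mask(sequence_lengths, maxlen=5)

    # Since the sentences are left-padded ('##abb', '#babb'), the valid positions are
    # at the *end* of each row, so only the time axis (axis 1) should be reversed.
    # Reversing axis 0 as well would additionally swap the two samples' masks.
    mask = tf.reverse(mask_front, [1])

    with tf.Session() as sess:
        print(sess.run(mask))
        # [[False False  True  True  True]
        #  [False  True  True  True  True]]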


If you are not using masks as in Yu-Yang's answer, you can try the following.

If your target data Y has a length dimension and is padded with the mask value, you can:

    import keras.backend as K

    def custom_loss(yTrue, yPred):
        # find which values in yTrue (target) are the mask value
        isMask = K.equal(yTrue, maskValue)  # true for all mask values

        # since y is shaped as (batch, length, features), we need all features to be mask values
        isMask = K.all(isMask, axis=-1)  # the entire output vector must be true
        # this second line is only necessary if the output features are more than 1

        # transform to float (0 or 1) and invert
        isMask = K.cast(isMask, dtype=K.floatx())
        isMask = 1 - isMask  # now mask values are zero, and others are 1

        # multiply this by the inputs:
        # you might need K.expand_dims(isMask) to add back the dimension removed by K.all
        yTrue = yTrue * isMask
        yPred = yPred * isMask

        return someLossFunction(yTrue, yPred)

If only the input is padded, or if Y has no length dimension, you can keep your own mask outside the function:

    masks = [
        [1, 1, 1, 1, 1, 1, 0, 0, 0],
        [1, 1, 1, 1, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 0],
    ]
    # shape (samples, length). If it fails, make it (samples, length, 1).

    import keras.backend as K
    masks = K.constant(masks)

Since the masks depend on your input data, you can use your mask value to know where to put the zeros, for example:

    # reduce over the feature axis so the result keeps shape (samples, length)
    masks = np.array((X_train == maskValue).all(axis=-1), dtype='float64')
    masks = 1 - masks

    # here too, if you have a problem with dimensions in the multiplications below,
    # expand the masks dimensions by adding a last dimension = 1

And make your loss function take the masks from the outside (you have to recreate the loss function if you change the input data):

    def customLoss(yTrue, yPred):
        yTrue = masks * yTrue
        yPred = masks * yPred
        return someLossFunction(yTrue, yPred)
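
A possible way to wire this up, assuming someLossFunction is just a plain categorical cross-entropy (the names below are illustrative and not part of the original answer):

    import keras.backend as K

    def someLossFunction(yTrue, yPred):
        # any elementwise loss works here; plain categorical cross-entropy as an example
        return K.mean(K.categorical_crossentropy(yTrue, yPred))

    model.compile(loss=customLoss, optimizer='adam')
    # note: `masks` is captured inside customLoss, so the loss (and the compiled model)
    # must be recreated whenever the batch of input data changes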

Does anyone know if Keras automatically masks the loss function? Since it provides a Masking layer and says nothing about the outputs, maybe it does it automatically?

