If your model has a mask, it is propagated layer by layer and eventually applied to the loss. So if you pad and mask your sequences correctly, the loss on the padding placeholders is ignored.
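For example, padding with zeros pairs naturally with a Masking(mask_value=0) layer. A small illustration (pad_sequences is just one common way to pad; it is not used in the example further down):

    # Illustrative only: zero-pad variable-length sequences so that a
    # Masking(mask_value=0) layer can later skip the padded timesteps.
    from keras.preprocessing.sequence import pad_sequences

    sequences = [[1, 2, 3], [4, 5]]
    padded = pad_sequences(sequences, maxlen=5, padding='post', value=0)
    # padded == [[1, 2, 3, 0, 0],
    #            [4, 5, 0, 0, 0]]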
Some details:
It is a bit involved to explain the whole process, so I'll just break it down into a few steps:
- In compile(), the mask is collected by calling compute_mask() and applied to the loss (irrelevant lines are omitted for clarity).
    weighted_losses = [_weighted_masked_objective(fn) for fn in loss_functions]
- Inside Model.compute_mask(), run_internal_graph() is called.
- Inside run_internal_graph(), the masks propagate layer by layer from the model inputs to its outputs by calling Layer.compute_mask() on each layer iteratively (see the pass-through layer sketch below).
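For a custom layer, taking part in this propagation just means implementing compute_mask(). A minimal pass-through sketch (illustrative only, assuming the Keras 2 Layer API; PassThrough is not a real Keras layer):

    from keras.layers import Layer

    class PassThrough(Layer):
        """Illustrative layer that forwards its input and its mask unchanged."""
        def __init__(self, **kwargs):
            super(PassThrough, self).__init__(**kwargs)
            self.supports_masking = True

        def call(self, inputs, mask=None):
            return inputs

        def compute_mask(self, inputs, mask=None):
            # Returning the incoming mask keeps it propagating to later
            # layers and, eventually, to the loss.
            return mask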
So if you use a Masking layer in your model, you don't have to worry about the loss on the padding placeholders. The loss on those entries will be masked out, as you have probably already seen inside _weighted_masked_objective().
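Conceptually, the masking inside _weighted_masked_objective() boils down to something like the following (a simplified sketch, not the actual Keras source; sample weights are left out):

    from keras import backend as K

    # Simplified sketch: fold a boolean mask into a per-timestep loss tensor
    # so that the final mean is taken over unmasked timesteps only.
    def masked_mean(score_array, mask):
        mask = K.cast(mask, K.floatx())
        score_array = score_array * mask          # zero the loss on masked timesteps
        score_array = score_array / K.mean(mask)  # rescale by the fraction of unmasked steps
        return K.mean(score_array)                # == sum(loss * mask) / sum(mask)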
A small example:
    import numpy as np
    from keras.models import Model
    from keras.layers import Input, Masking, LSTM

    max_sentence_length = 5
    character_number = 2

    input_tensor = Input(shape=(max_sentence_length, character_number))
    masked_input = Masking(mask_value=0)(input_tensor)
    output = LSTM(3, return_sequences=True)(masked_input)
    model = Model(input_tensor, output)
    model.compile(loss='mae', optimizer='adam')

    X = np.array([[[0, 0], [0, 0], [1, 0], [0, 1], [0, 1]],
                  [[0, 0], [0, 1], [1, 0], [0, 1], [0, 1]]])
    y_true = np.ones((2, max_sentence_length, 3))
    y_pred = model.predict(X)
    print(y_pred)

    [[[ 0.          0.          0.        ]
      [ 0.          0.          0.        ]
      [-0.11980877  0.05803877  0.07880752]
      [-0.00429189  0.13382857  0.19167568]
      [ 0.06817091  0.19093043  0.26219055]]

     [[ 0.          0.          0.        ]
      [ 0.0651961   0.10283815  0.12413475]
      [-0.04420842  0.137494    0.13727818]
      [ 0.04479844  0.17440712  0.24715884]
      [ 0.11117355  0.21645413  0.30220413]]]
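One way to check this numerically, reusing X, y_true and y_pred from above (masked_loss and unmasked_loss are just helper names for this check):

    # The masked timesteps have y_pred == 0, so comparing model.evaluate()
    # against a mean absolute error computed with and without those timesteps
    # shows which one Keras actually reports.
    unmasked_loss = np.abs(1 - y_pred).mean()              # MAE over all timesteps
    masked_loss = np.abs(1 - y_pred[y_pred != 0]).mean()   # MAE over unmasked timesteps only

    print(model.evaluate(X, y_true))
    print(masked_loss)
    print(unmasked_loss)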
As you can see from this example, the loss on the masked part (the zeros in y_pred) is ignored, and the output of model.evaluate() is equal to masked_loss.
EDIT:
If there is a recurrent layer with return_sequences=False, the mask stops propagating (i.e. the returned mask is None). See RNN.compute_mask():
    def compute_mask(self, inputs, mask):
        if isinstance(mask, list):
            mask = mask[0]
        output_mask = mask if self.return_sequences else None
        if self.return_state:
            state_mask = [None for _ in self.states]
            return [output_mask] + state_mask
        else:
            return output_mask
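For example (an illustrative variant of the model above, not part of the original question): if the LSTM returns only its last output, the mask returned for that output is None, and the loss on it is no longer masked automatically:

    # Illustrative variant: return_sequences defaults to False, so
    # RNN.compute_mask() returns None and the 'mae' loss below sees no mask.
    last_output = LSTM(3)(masked_input)
    last_model = Model(input_tensor, last_output)
    last_model.compile(loss='mae', optimizer='adam')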
In your case, if I understand correctly, you want a mask that is based on y_true: whenever the value of y_true is [0, 0, 1] (the one-hot encoding of "#"), you want the loss for that entry to be masked. If so, you need to mask the loss values in a way similar to Daniel's answer.
The main difference is the final average: the average should be taken over the number of unmasked values, which is just K.sum(mask). Also, y_true can be compared with the one-hot encoded vector [0, 0, 1] directly.
    from keras import backend as K

    def get_loss(mask_value):
        mask_value = K.variable(mask_value)
        def masked_categorical_crossentropy(y_true, y_pred):
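            # The body below is a sketch reconstructed from the description above
            # (mask where y_true equals the padding vector, average by K.sum(mask)):
            mask = K.all(K.equal(y_true, mask_value), axis=-1)  # True where y_true == [0, 0, 1]
            mask = 1 - K.cast(mask, K.floatx())                 # 1.0 for real entries, 0.0 for '#'

            # standard categorical crossentropy, zeroed out on masked timesteps
            loss = K.categorical_crossentropy(y_true, y_pred) * mask

            # average w.r.t. the number of unmasked entries, i.e. K.sum(mask)
            return K.sum(loss) / K.sum(mask)
        return masked_categorical_crossentropy

    # Hypothetical usage, assuming `model` is the sequence model from your setup:
    masked_loss = get_loss(np.array([0, 0, 1]))
    model.compile(loss=masked_loss, optimizer='adam')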
The output of the code above shows that the loss is computed on the unmasked values only:
    model.evaluate:    1.08339476585
    tf unmasked_loss:  1.08989
    tf masked_loss:    1.08339
The value differs from yours because I changed the axis argument in tf.reverse from [0, 1] to [1].