How can I implement a max-norm constraint on the weights of an MLP in TensorFlow, the one Hinton and Dean describe in their work on dark knowledge? That is, does tf.nn.dropout implement the weight constraint by default, or do we need to do it explicitly, as in
https://arxiv.org/pdf/1207.0580.pdf
"All of these networks share the same weights for the hidden units that are present. We use the standard, stochastic gradient descent procedure for training the dropout neural networks on mini-batches of training cases, but we modify the penalty term that is normally used to prevent the weights from growing too large. Instead of penalizing the squared length (L2 norm) of the whole weight vector, we set an upper bound on the L2 norm of the incoming weight vector for each individual hidden unit. If a weight-update violates this constraint, we renormalize the weights of the hidden unit by division."
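For reference, here is a minimal sketch of what I imagine doing this explicitly in TensorFlow would look like (the variable names, shapes, and `max_norm` value are mine, just for illustration):

```python
import tensorflow as tf

max_norm = 3.0  # illustrative upper bound on each unit's incoming L2 norm

# Dense-layer weight matrix of shape [n_inputs, n_hidden];
# column j is the incoming weight vector of hidden unit j.
W = tf.Variable(tf.random.normal([784, 256], stddev=0.05))

def apply_max_norm(w, max_norm):
    # axes=[0] takes the L2 norm over the input dimension, i.e. per hidden unit,
    # and rescales only those columns whose norm exceeds max_norm.
    return w.assign(tf.clip_by_norm(w, max_norm, axes=[0]))

# Intended use: call this right after every optimizer step, e.g.
#   optimizer.apply_gradients(zip(grads, variables))
#   apply_max_norm(W, max_norm)
```

Is something like this the intended approach, or does dropout in TensorFlow already take care of it?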
Keras seems to have this
http://keras.io/constraints/
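If that is the right idea, it looks like in Keras you can attach it directly to a layer, something along these lines (assuming the MaxNorm constraint described on that page, with axis=0 constraining each unit's incoming weight vector):

```python
from tensorflow import keras

# kernel_constraint is applied to the layer's weight matrix after each update;
# MaxNorm(axis=0) bounds the L2 norm of each hidden unit's incoming weights.
layer = keras.layers.Dense(
    256,
    activation="relu",
    kernel_constraint=keras.constraints.MaxNorm(max_value=3.0, axis=0),
)
```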