Weight decay does not necessarily improve performance. In my own experience, I have quite often found that my models perform worse (as measured by some metric on a held-out set) with any significant amount of weight decay. It is a useful form of regularization that you should know about, but you don't need to add it to every model without considering whether it seems necessary, or without comparing performance with and without it.
As for whether weight decay on only part of the model can be better than weight decay on the whole model: it seems to be less common to decay only some of the weights this way, though I don't know of a theoretical reason for it. In general, neural networks already have plenty of hyperparameters to tune. Whether to use weight decay at all is one question, and how strongly to regularize the weights is another if you do. If you also ask which layers it should apply to, you quickly run out of time to test the performance of all the different ways you could turn it on and off for each layer.
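To make the idea concrete, here is a minimal sketch (in plain numpy, with hypothetical parameter names) of an SGD step where L2 weight decay is applied only to a selected subset of parameters, e.g. weight matrices but not biases; most frameworks expose the same idea via per-parameter optimizer options:

```python
import numpy as np

def sgd_step(params, grads, lr=0.1, weight_decay=0.01, decay_keys=()):
    """One SGD update. The L2 decay term is added only for parameters
    whose name appears in decay_keys; everything else gets plain SGD."""
    new_params = {}
    for name, w in params.items():
        g = grads[name]
        if name in decay_keys:
            g = g + weight_decay * w  # decay only the selected parameters
        new_params[name] = w - lr * g
    return new_params

# Toy example: decay the weight matrix "W" but not the bias "b".
params = {"W": np.ones(2), "b": np.ones(2)}
grads = {"W": np.zeros(2), "b": np.zeros(2)}
out = sgd_step(params, grads, decay_keys=("W",))
# With zero gradients, only "W" shrinks (1 - 0.1 * 0.01 = 0.999); "b" is unchanged.
```

Excluding biases (and normalization parameters) from decay is one of the few selective schemes that is actually common in practice; decaying some layers but not others is the rarer case discussed above.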
I suspect there are models that would benefit from weight decay on only part of the model; I just don't think it is done often because it's hard to check all the possibilities and figure out which one works best.
Nathan