Batch Normalization uses the mini-batch mean and variance to normalize the layer's output. If I train a network with a batch size of, say, 100, but then want to use the trained network for single-sample predictions (batch size 1), should I expect problems? Do I have to penalize the batch norm layers so that they converge toward the identity transform during training in order to avoid this?
No, there are no problems in doing this. At test time, a batch normalization layer does not compute statistics from the incoming batch; it simply scales and shifts its inputs using the running mean and variance accumulated during training, so a batch size of 1 works just like any other.
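A minimal sketch of this in PyTorch (the model and sizes here are hypothetical, just to illustrate the point): switching the network to evaluation mode makes BatchNorm use its stored running statistics, so a single sample passes through without any batch statistics being needed.

```python
import torch
import torch.nn as nn

# Toy model containing a BatchNorm layer (illustrative only)
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.BatchNorm1d(32),
    nn.ReLU(),
    nn.Linear(32, 1),
)

# ... train with mini-batches (e.g. batch size 100) ...

# Switch to inference mode: BatchNorm now applies the running
# mean/variance gathered during training instead of batch statistics.
model.eval()

with torch.no_grad():
    single_sample = torch.randn(1, 10)   # batch size 1
    prediction = model(single_sample)    # works fine, no special handling needed
```

Note that in training mode the same forward pass with a single sample would fail (or behave poorly), since the layer would try to estimate a variance from one example; evaluation mode is what makes batch-size-1 inference safe.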