Definition
Let's start with a strict definition of both:
Batch normalization:

$$y_{tijk} = \frac{x_{tijk} - \mu_i}{\sqrt{\sigma_i^2 + \epsilon}}, \qquad \mu_i = \frac{1}{HWT}\sum_{t=1}^{T}\sum_{l=1}^{W}\sum_{m=1}^{H} x_{tilm}, \qquad \sigma_i^2 = \frac{1}{HWT}\sum_{t=1}^{T}\sum_{l=1}^{W}\sum_{m=1}^{H} \left(x_{tilm} - \mu_i\right)^2$$

Instance normalization:

$$y_{tijk} = \frac{x_{tijk} - \mu_{ti}}{\sqrt{\sigma_{ti}^2 + \epsilon}}, \qquad \mu_{ti} = \frac{1}{HW}\sum_{l=1}^{W}\sum_{m=1}^{H} x_{tilm}, \qquad \sigma_{ti}^2 = \frac{1}{HW}\sum_{l=1}^{W}\sum_{m=1}^{H} \left(x_{tilm} - \mu_{ti}\right)^2$$

Here $x$ is a 4D activation tensor, with $t$ indexing the image in the batch of size $T$, $i$ the feature channel, and $(j, k)$ / $(l, m)$ the spatial positions.
As you can see, they do the same thing, except for the number of input tensors that are normalized jointly. The batch version normalizes all images across the batch and across spatial locations (this is the CNN case; for an ordinary fully connected layer the statistics are computed across the batch only); the instance version normalizes each element of the batch independently, i.e. across spatial locations only.
In other words, where batch norm computes one mean and one standard deviation (making the distribution of the whole layer Gaussian), instance norm computes T of them, so that each individual image distribution looks Gaussian, but not jointly.
A simple analogy: at the data preprocessing stage, you can either normalize the data per image or normalize the whole dataset.
Credit: formulas from here.
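To make the axis difference concrete, here is a minimal NumPy sketch (shapes and values are made up for illustration) that computes the statistics exactly as in the formulas above, without the learnable scale and shift:

```python
import numpy as np

# Hypothetical 4D activation tensor: T images, C channels, H x W spatial maps.
T, C, H, W = 8, 16, 32, 32
x = np.random.randn(T, C, H, W).astype(np.float32)
eps = 1e-5

# Batch norm: one mean/std per channel, computed over the batch AND spatial axes,
# so a single pair of statistics is shared by all T images.
mu_bn = x.mean(axis=(0, 2, 3), keepdims=True)   # shape (1, C, 1, 1)
var_bn = x.var(axis=(0, 2, 3), keepdims=True)
y_bn = (x - mu_bn) / np.sqrt(var_bn + eps)

# Instance norm: one mean/std per (image, channel) pair, computed over the
# spatial axes only, so each of the T images is normalized independently.
mu_in = x.mean(axis=(2, 3), keepdims=True)      # shape (T, C, 1, 1)
var_in = x.var(axis=(2, 3), keepdims=True)
y_in = (x - mu_in) / np.sqrt(var_in + eps)

print(mu_bn.shape, mu_in.shape)  # 1 set of statistics per channel vs T of them
```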
Which normalization is better?
The answer depends on the network architecture, in particular on what happens after the normalization layer. Image classification networks usually pool the feature maps together and feed them into an FC layer, whose weights are shared across the whole batch (the modern way is to use a CONV layer instead of FC, but the argument still applies).
This is where the nuances of the distributions start to matter: the same neuron will receive input from all images. If the variance across the batch is high, the gradient from the small activations will be completely suppressed by the high activations, which is exactly the problem batch norm tries to solve. Therefore it is quite possible that per-instance normalization will not improve network convergence at all.
On the other hand, batch normalization adds extra noise to training, because the result for a particular instance depends on its neighbors in the batch. As it turns out, this kind of noise can be either good or bad for the network. This is well explained in the "Weight Normalization" paper by Tim Salimans et al., where recurrent neural networks and reinforcement-learning DQNs are named as noise-sensitive applications. I am not entirely sure, but I think the same noise sensitivity was the main issue in the stylization task that instance norm was trying to fight. It would be interesting to check whether weight norm performs better for that particular task.
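For reference, a minimal PyTorch sketch of what swapping in weight normalization could look like (the layer sizes here are arbitrary): weight norm reparameterizes the weight vector into a direction and a magnitude instead of normalizing activations, so it adds no batch-dependent noise.

```python
import torch
import torch.nn as nn

# Weight normalization rewrites w as g * v / ||v||; statistics of other batch
# elements never enter the computation, unlike batch norm.
conv = nn.utils.weight_norm(nn.Conv2d(3, 16, kernel_size=3, padding=1))

x = torch.randn(4, 3, 32, 32)   # arbitrary input batch for illustration
print(conv(x).shape)            # torch.Size([4, 16, 32, 32])
```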
Can you combine batch and instance normalization?
Although it would make a valid neural network, there is no practical use for it. The batch-normalization noise either helps the learning process (in which case batch norm is preferable) or hurts it (in which case it is better to omit it). In either case, leaving the network with only one type of normalization is likely to improve performance.
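For completeness, a hedged sketch of what such a combination could look like in PyTorch (a made-up block, not a recommended design), with both normalizations applied back to back:

```python
import torch
import torch.nn as nn

# A block that applies instance norm and then batch norm to the same feature
# maps; the network is valid, but as argued above the combination has no
# practical benefit over picking one of the two.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.InstanceNorm2d(16, affine=True),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

x = torch.randn(4, 3, 32, 32)   # arbitrary batch for illustration
print(block(x).shape)           # torch.Size([4, 16, 32, 32])
```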