The rendering will depend on the exact fonts the browsers use, and on the exact ascent and descent metrics they choose to use for those fonts. Note that your fiddle does not even tell them to use the same font family.
But even beyond that, here is what CSS2.1 has to say on the matter (from http://www.w3.org/TR/CSS21/visudet.html#inline-non-replaced ):
"The height of the content area should be based on the font, but this specification does not specify how. A UA may, e.g., use the em-box or the maximum ascender and descender of the font."
I suspect that the browsers you are looking at are actually just using different definitions of the inline content area.
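To make the difference concrete, here is a rough sketch of the two definitions the spec mentions, applied to the same font size. The metric values are made up for illustration and do not come from any real font, and the function names are mine, not anything a browser exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FontMetrics:
    """Hypothetical font metrics in font design units (not from a real font)."""
    units_per_em: int
    ascent: int   # distance above the baseline
    descent: int  # distance below the baseline, stored as a positive number


def em_box_height(font_size_px: float) -> float:
    # Definition 1: the content area is simply the em-box,
    # so its height equals the computed font size.
    return font_size_px


def ascent_descent_height(metrics: FontMetrics, font_size_px: float) -> float:
    # Definition 2: the content area spans the font's ascent plus descent,
    # scaled from design units to pixels.
    return (metrics.ascent + metrics.descent) * font_size_px / metrics.units_per_em


# Assumed example values: a font whose ascent + descent exceeds its em-box.
font = FontMetrics(units_per_em=1000, ascent=900, descent=300)
size = 20.0

print("em-box:", em_box_height(size))
print("ascent + descent:", ascent_descent_height(font, size))
```

With these assumed numbers the two definitions disagree by a few pixels for the same font-size, which is the kind of discrepancy you would see in backgrounds and borders on inline elements.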