I don't have an answer, but I can point you in the direction I would take myself.
The MIDI file format is well standardized and documented. Wikipedia does not link to the specification, but I remember finding it on the net a good 10 years ago (before Google was even born!), so I don't expect you will have any trouble finding it today.
The format is "chunked", which means the karaoke information is most likely stored in some special form inside it. The rest is reverse engineering. Take a karaoke file (.kar; as I understand it, this is a .mid with lyrics embedded), skip over the pieces you already know, and you will soon find the text fragments. The information there should not be too hard to decode.
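To make that concrete, here is a minimal Python sketch of the "skip the known pieces" idea. It is not taken from any player's source. It assumes the text turns up as text (0x01) or lyric (0x05) meta events inside the MTrk track chunks, which is worth checking against the specification and your own files. The function names (iter_chunks, read_vlq, iter_text_events, extract_text) are made up for illustration, and the latin-1 decoding is a guess at the encoding.

```python
import struct


def iter_chunks(data):
    """Yield (chunk_id, payload) for every chunk in the file bytes."""
    pos = 0
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (length,) = struct.unpack(">I", data[pos + 4:pos + 8])
        yield chunk_id, data[pos + 8:pos + 8 + length]
        pos += 8 + length


def read_vlq(buf, pos):
    """Read a MIDI variable-length quantity; return (value, new_pos)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def iter_text_events(track):
    """Yield (meta_type, payload) for text/lyric meta events in one track."""
    pos = 0
    status = 0  # last channel status byte, needed for running status
    while pos < len(track):
        _, pos = read_vlq(track, pos)  # delta time, not needed here
        byte = track[pos]
        if byte == 0xFF:  # meta event: FF <type> <length> <data>
            meta_type = track[pos + 1]
            length, pos = read_vlq(track, pos + 2)
            payload = track[pos:pos + length]
            pos += length
            if meta_type in (0x01, 0x05):  # assumption: lyrics live here
                yield meta_type, payload
        elif byte in (0xF0, 0xF7):  # sysex: F0/F7 <length> <data>
            length, pos = read_vlq(track, pos + 1)
            pos += length
        else:  # channel event, possibly using running status
            if byte & 0x80:
                status = byte
                pos += 1
            # program change and channel pressure carry one data byte
            pos += 1 if (status & 0xF0) in (0xC0, 0xD0) else 2


def extract_text(data, encoding="latin-1"):
    """Return the text fragments found in all MTrk chunks, in file order."""
    fragments = []
    for chunk_id, payload in iter_chunks(data):
        if chunk_id == b"MTrk":
            for _, text in iter_text_events(payload):
                fragments.append(text.decode(encoding))
    return fragments
```

Reading a file with open(path, "rb").read() and passing the bytes to extract_text should give you the raw fragments. Working out how they split into syllables and lines is the remaining reverse-engineering step.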
Added: KMid reportedly supports these files as well, so its source code may shed some light.
Vilx-