Not the best answer, I know, but a very similar solution was discussed in a WWDC 2011 video on Core Animation.
Find the video called Core Animation Essentials (it should be session 421 in the WWDC 2011 videos on iTunes U). The example should be at about the 29-minute mark. Basically, the scenario was a ball bouncing over the lyrics of a song as it played. The project talks about how to get the timing and position over the words using an interesting idea ...
Although it will take a little time to develop, build a project with an NSTimer and your song, both starting at the same time. Set it up so that every time you tap the screen, the offset from the previous tap is appended to an NSMutableArray, which is then written to a file. Now run the project and create the timestamps by tapping as each word is sung. (This assumes that you know in advance which songs you are going to support, and that the singing is not too fast.) OK ... now you have your metadata.
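The tap-recording step could be sketched roughly like this in Swift. This is only a minimal sketch under my own assumptions: `TapRecorder`, `recordTap(at:)`, `cumulativeTimes()` and `save(to:)` are hypothetical names, not anything from the WWDC sample, and the times are plain seconds that you would read from whatever clock you start together with the song.

```swift
import Foundation

/// Hypothetical helper (not from the WWDC sample): collects the
/// interval between successive taps while the song plays.
struct TapRecorder {
    /// Offset of each tap from the previous one, in seconds.
    private(set) var offsets: [TimeInterval] = []
    private var lastTime: TimeInterval

    /// `startTime` is the moment playback begins, in seconds.
    init(startTime: TimeInterval) {
        lastTime = startTime
    }

    /// Call once per tap, passing the current time in seconds.
    mutating func recordTap(at now: TimeInterval) {
        offsets.append(now - lastTime)
        lastTime = now
    }

    /// Time of each word measured from the start of the song,
    /// obtained by summing the recorded offsets.
    func cumulativeTimes() -> [TimeInterval] {
        var total: TimeInterval = 0
        return offsets.map { offset in
            total += offset
            return total
        }
    }

    /// Writes the offsets to `url` as a JSON array.
    func save(to url: URL) throws {
        let data = try JSONEncoder().encode(offsets)
        try data.write(to: url)
    }
}
```

In an app you would call `recordTap(at:)` from your tap handler and `save(to:)` once the song ends. Storing offsets matches the description above; `cumulativeTimes()` is there in case absolute word times turn out to be easier to play back.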
I would recommend trying the bouncing ball first, because its implementation is already described, and I see a couple of problems with what you are trying to do. First of all (maybe I'm wrong), I don't think UILabel / NSString has any methods for highlighting substrings. This means you may have to create a label for each individual word, which can become really tedious ... So check out that video and hopefully you can get something working. Good luck.
Lolgrep