I would like to add timestamps to ebooks, matching each one to the corresponding audiobook, ideally in several languages.
Here is an example:
Pride and Prejudice
text from Project Gutenberg
audio from LibriVox
My idea was to find a speech recognition tool that puts timestamps on sentences (step 1), and then match the messy transcription to the source text using Levenshtein distances (step 2).
The website https://speechlogger.appspot.com/ offers a solution for step 1, but it is limited in the number of characters it outputs. I could theoretically use web automation to do the job, starting a new transcription every minute or so, but that is really dirty.
I implemented step 2 in R and tested it on the sample I got from Speechlogger, and it works fine, but it could be greatly improved if the program knew the text beforehand, as when you read a known text aloud to train speech recognition software. By transcribing first and matching afterwards, I am not using all of the information I have.
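To make step 2 concrete, here is a minimal sketch of the matching idea. It is written in Python for illustration only (it is not the R code mentioned above), and it assumes the recognizer returns one `(start_seconds, text)` segment per sentence, in audio order. The names `levenshtein`, `normalize`, `align`, and the `window` parameter are hypothetical, and the timestamps in the usage example are made up.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def align(segments, sentences, window=5):
    """Match each recognized segment to a source sentence.

    segments:  list of (start_seconds, recognized_text), in audio order
    sentences: list of source sentences, in reading order
    window:    how many upcoming sentences to consider (an assumption)
    Returns a list of (start_seconds, sentence_index).
    """
    result = []
    cursor = 0  # first source sentence not yet matched
    for start, heard in segments:
        candidates = range(cursor, min(cursor + window, len(sentences)))
        if not candidates:
            break  # ran out of source text
        heard_n = normalize(heard)

        def score(k):
            target = normalize(sentences[k])
            # Normalize by length so long sentences are not penalized
            return levenshtein(heard_n, target) / max(len(heard_n), len(target), 1)

        best = min(candidates, key=score)
        result.append((start, best))
        cursor = best + 1  # assumes one segment per sentence
    return result


sentences = [
    "It is a truth universally acknowledged,",
    "that a single man in possession of a good fortune,",
    "must be in want of a wife.",
]
# Made-up recognizer output with typical recognition errors
segments = [
    (0.0, "it is a truth universally acknowledge"),
    (2.8, "that a single man in possession of a good fortune"),
    (6.1, "must be in one of a wife"),
]
print(align(segments, sentences))
```

Restricting the search to a forward window is one simple way to use the fact that the text is known and read in order. It does not handle skipped or merged sentences, which would need a proper sequence alignment.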
So my questions are: what alternative methods could I use to timestamp the audio file, and is there a way to make my process smarter by letting the recognition engine know what it is supposed to recognize?
Moody_Mudskipper