How can I generate timed text (e.g. for subtitles) synchronized with text-to-speech (TTS), one word at a time?
I would like to do this using high-quality SAPI5 voices (for example, those available from IVONA) on Windows 10.
On Windows, we already have good free TTS programs:
- Read4Me - Open Source
- Balabolka - closed source
- TTSApp - Microsoft's own very simple GUI; it appears to date from 2001.
TTSApp can create audio files in WAV format. Chatterbox creates MP3 files along with synchronized timed text in the form of LRC files, as used in karaoke, but only on a line-by-line basis. However, both highlight the text on screen in real time while they speak it aloud.
If I had the TTS / SAPI5 source code, I could just check the clock every time a new word starts to be generated and write the time and that word to a file. Does anyone know of a project that exposes this level of programming, so that I could start from there?
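To make the idea concrete, here is a minimal sketch in Python of the "check the clock on every new word" bookkeeping. It assumes the speech engine can call a handler at each word boundary and pass the word's character offset and length in the input text; SAPI5 does raise a word-boundary event, but wiring it up is not shown here. The names `WordTimer`, `on_word`, `lrc_timestamp` and `to_lrc` are hypothetical, and writing one word per LRC-style line is only one possible output layout.

```python
import time


class WordTimer:
    """Collects (elapsed_seconds, word) pairs as the engine reports each word."""

    def __init__(self, text, clock=time.perf_counter):
        self.text = text      # the full text handed to the TTS engine
        self.clock = clock    # injectable so the logic can be tested without audio
        self.start_time = None
        self.entries = []

    def start(self):
        # Call this immediately before asking the engine to speak.
        self.start_time = self.clock()

    def on_word(self, location, length):
        # Assumption: the engine passes the character offset and length
        # of the word it is about to speak.
        elapsed = self.clock() - self.start_time
        word = self.text[location:location + length]
        self.entries.append((elapsed, word))


def lrc_timestamp(seconds):
    """Format seconds as an LRC-style [mm:ss.xx] tag."""
    total_cs = int(round(seconds * 100))      # whole centiseconds
    minutes, rest = divmod(total_cs, 6000)    # 6000 centiseconds per minute
    return "[%02d:%02d.%02d]" % (minutes, rest // 100, rest % 100)


def to_lrc(entries):
    """One line per word: the timestamp followed by the word."""
    return "\n".join(lrc_timestamp(t) + word for t, word in entries)
```

In use, one would create `WordTimer(text)`, register a small wrapper around `on_word` with the engine's word event, call `start()`, speak the text, and then write `to_lrc(timer.entries)` to a file. If the engine renders to a file instead of the sound card, the events may arrive faster than real time, so wall-clock timing would only be meaningful for live playback.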
UPDATE SEPT 2016
Since then, I have found that TTSApp was reimplemented in AutoHotkey by a user named jballi in 2012.
, onWord.
:
2.
BTW VisualBasic .