How can I encode and segment audio files without gaps (or pops) between segments when they are stitched back together?

I am working on a web application that requires streaming and synchronizing several audio files. For this, I use the Web Audio API rather than HTML5 audio tags, because precise audio timing is important.

I am currently using FFmpeg's segment feature to encode and split audio files into smaller pieces. The reason I segment them is so that I can start streaming from the middle of a file rather than from the very beginning (otherwise I would just split the files using UNIX split). The problem is that when I stitch the segments back together, I get an audible pop between them.

If I encode the segments as PCM (pcm_s24le) in a .wav file, playback is seamless, which makes me think that the encoder pads either the beginning or the end of each file. Since I will be dealing with many different audio files, using .wav would require too much bandwidth.

I am looking for one of the following solutions to the problem:

  • How can I seamlessly segment encoded audio files,
  • How can I force the encoder NOT to pad audio frames, using ffmpeg (or another utility), or
  • What is a better way to stream audio (starting from an arbitrary track time) without using an audio tag?

System Information

  • Custom node.js server
  • After downloading the audio file, node.js transfers the data to the ffmpeg encoder
  • Must use an encoding supported by the HTML5 Web Audio API
  • The server sends audio segments one at a time over a WebSocket

Thanks in advance. I tried to be as clear as possible, but if you need clarification, I would be more than happy to provide it.

html5 ffmpeg audio-streaming web-audio
1 answer

Since PCM is an uncompressed format, smooth playback is expected. There is nothing that could create a glitch. The same holds if you use a lossless codec, for example flac. On the other hand, if you use any lossy codec, such as mp3, wma, etc., there is no way to avoid glitches without some intervention. A WMA decoder, for example, will always give you more PCM than you originally provided during encoding. These extra bytes cause a glitch at each segment boundary, and they also make the stitched playlist longer than the original. You can try to smooth out the glitch with some DSP filtering. You can even try something simple, such as crossfading between segments. Perhaps this will give useful results.
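As a rough illustration of the crossfading idea, here is a minimal sketch of a linear crossfade between two decoded PCM segments. The function name `crossfade` and the `overlap` parameter are hypothetical, and the sketch assumes mono `Float32Array` samples at the same sample rate. Note that overlapping the segments also shortens the result by `overlap` samples, which may or may not match the amount of padding the decoder added.

```typescript
// Linear crossfade: blend the last `overlap` samples of `a`
// with the first `overlap` samples of `b`.
// Assumption: both arrays hold mono PCM at the same sample rate.
function crossfade(a: Float32Array, b: Float32Array, overlap: number): Float32Array {
  // Clamp the overlap so it never exceeds either segment.
  const n = Math.max(0, Math.min(Math.floor(overlap), a.length, b.length));
  const out = new Float32Array(a.length + b.length - n);
  const aStart = a.length - n;

  // Copy the untouched head of `a`.
  out.set(a.subarray(0, aStart), 0);

  // Blend the overlapping region; the two gains always sum to 1.
  for (let i = 0; i < n; i++) {
    const g = (i + 1) / (n + 1);
    out[aStart + i] = a[aStart + i] * (1 - g) + b[i] * g;
  }

  // Copy the untouched tail of `b`.
  out.set(b.subarray(n), a.length);
  return out;
}
```

In a Web Audio setting the inputs could come from `AudioBuffer.getChannelData()`, applied per channel. A short overlap of a few milliseconds is a reasonable starting point to experiment with.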

If a lossless codec is unacceptable because of bandwidth, another approach would be to create compressed files with a lossy codec, such as mp3, and start streaming from a calculated position. Of course, you cannot seek with sample accuracy, as you can in PCM, and you will get a small amount of useless PCM when decoding, because you will start decoding the compressed data in the middle, without the “previous data” required by the decoder. I would suggest a constant bitrate when encoding such files, because then you can quickly calculate the seek position in the compressed file before streaming begins.
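The seek calculation for a constant-bitrate file can be sketched as follows. This assumes MPEG-1 Layer III, where each frame carries 1152 samples and the average frame length is 144 × bitrate / sample rate bytes. The function `cbrSeekPoint` and the `headerBytes` parameter (for a leading tag such as ID3v2, if any) are hypothetical names. Because padded frames differ by one byte, the offset is only an estimate, so the decoder should still scan forward for the next frame sync.

```typescript
// MPEG-1 Layer III carries 1152 PCM samples per frame.
const SAMPLES_PER_FRAME = 1152;

interface SeekPoint {
  frameIndex: number;   // frame that contains the requested time
  byteOffset: number;   // estimated offset of that frame in the file
  startTimeSec: number; // actual start time of that frame
}

// Estimate where to start streaming in a CBR MP3 file.
// Assumptions: constant bitrate, MPEG-1 Layer III, and `headerBytes`
// bytes of non-audio data (e.g. a tag) before the first frame.
function cbrSeekPoint(
  timeSec: number,
  bitrateBps: number,
  sampleRateHz: number,
  headerBytes: number = 0
): SeekPoint {
  const frameIndex = Math.floor((timeSec * sampleRateHz) / SAMPLES_PER_FRAME);
  // Average frame length in bytes; individual frames may be one byte longer.
  const avgFrameBytes = (144 * bitrateBps) / sampleRateHz;
  const byteOffset = headerBytes + Math.floor(frameIndex * avgFrameBytes);
  const startTimeSec = (frameIndex * SAMPLES_PER_FRAME) / sampleRateHz;
  return { frameIndex, byteOffset, startTimeSec };
}
```

For example, at 128 kbit/s and 48 kHz each frame is exactly 384 bytes, so seeking to 24 seconds lands on frame 1000 at byte 384000. The difference between the requested time and `startTimeSec` tells the client how much decoded PCM to skip.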

As for the glitches: if you encode the whole track to mp3 in one pass and create the segment files WITHOUT stopping the encoder, there will be no glitches when switching files, because you are simply splitting one continuous stream of compressed data across several files. Of course, you will probably have to implement this yourself.
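A minimal sketch of that splitting step, assuming the whole track has already been encoded in a single pass and is available as one byte array. The names `splitEncodedStream` and `joinSegments` are hypothetical, and the sketch assumes every frame has the same length `frameBytes` (true for some CBR settings, not all); with variable frame lengths the split points would have to come from parsing the frame headers instead.

```typescript
// Cut one continuous encoded stream into segments on frame boundaries.
// Assumption: every frame is exactly `frameBytes` long.
function splitEncodedStream(
  stream: Uint8Array,
  frameBytes: number,
  framesPerSegment: number
): Uint8Array[] {
  const segmentBytes = frameBytes * framesPerSegment;
  if (!(segmentBytes > 0)) {
    throw new RangeError("frameBytes and framesPerSegment must be positive");
  }
  const segments: Uint8Array[] = [];
  for (let off = 0; off < stream.length; off += segmentBytes) {
    // subarray() creates a view, so no encoded bytes are copied or altered.
    segments.push(stream.subarray(off, Math.min(off + segmentBytes, stream.length)));
  }
  return segments;
}

// Client side: concatenating the segments restores the original stream.
function joinSegments(segments: Uint8Array[]): Uint8Array {
  const total = segments.reduce((n, s) => n + s.length, 0);
  const out = new Uint8Array(total);
  let off = 0;
  for (const s of segments) {
    out.set(s, off);
    off += s.length;
  }
  return out;
}
```

Because the encoder never restarts, no per-segment padding is introduced, and joining any run of consecutive segments yields a valid continuation of the same stream.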
