The easiest and fastest way to detect sound activity?

I have an array of 320 elements (int16) holding an audio signal (16-bit LPCM) that lasts 20 ms. I am looking for the simplest and fastest method to decide whether this array contains active sound (such as speech or music) rather than noise or silence. I do not need a very high quality solution, but it must be very fast.

At first I thought of summing the squares or absolute values of the elements and comparing the sum with a threshold, but this method is very slow on my system, even though it is O(n).
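For reference, the baseline described above might look roughly like the following. This is only a sketch: the function names and the threshold parameter are made up for illustration, and the absolute-value variant is used because it avoids a multiply per sample.

```c
#include <stdint.h>
#include <stddef.h>

/* Sum of absolute sample values over one frame.
 * For 320 samples the sum is at most 320 * 32768, which fits in 32 bits. */
static uint32_t frame_abs_sum(const int16_t *frame, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t s = frame[i];              /* widen so that -(-32768) is safe */
        sum += (uint32_t)(s < 0 ? -s : s);
    }
    return sum;
}

/* Returns 1 if the mean absolute level of the frame exceeds the threshold.
 * mean_abs_threshold is a hypothetical tuning value, not from the question.
 * Comparing sum > threshold * n avoids a division. */
static int frame_is_active(const int16_t *frame, size_t n,
                           uint32_t mean_abs_threshold)
{
    return frame_abs_sum(frame, n) > mean_abs_threshold * (uint32_t)n;
}
```

Calling `frame_is_active(buf, 320, 100)` would flag frames whose average magnitude is above 100; the right value depends entirely on the noise floor of the input.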

+7
c algorithm embedded audio signal-processing
4 answers

You won’t get much faster than the sum of squares approach.

One optimization you may not be using yet is to keep a running total. That is, at each time step, instead of summing the squares of the last n samples, keep a running total and update it with the square of the most recent sample. To stop the total from growing without bound over time, apply an exponential decay. In pseudocode:

    decay_constant = 0.999;   // some suitable value smaller than 1
    total = 0;
    for t = 1, ...
        // exponential decay
        total = total * decay_constant;
        // add in the square of the latest sample
        total += current_sample^2;
        if total > threshold
            // do something
        end
    end

Of course, you will need to tune the decay constant and the threshold for your application. If this is not fast enough to run in real time, you have a seriously underpowered DSP...
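On a target without floating point, the decayed running total can be approximated with shifts. The sketch below is one possible integer variant, not the answer's exact formula: it keeps a leaky average of the squared samples (the decayed total scaled by 1 minus the decay constant), and `DECAY_SHIFT`, `energy_tracker` and `energy_update` are names invented for this example.

```c
#include <stdint.h>

/* Assumed value: decay constant = 1 - 2^-6, roughly 0.984. */
#define DECAY_SHIFT 6

typedef struct {
    uint32_t energy;   /* leaky running average of sample^2 */
} energy_tracker;

/* Per-sample update: energy = energy * (1 - 2^-k) + sample^2 * 2^-k.
 * Only one multiply, two shifts and two additions per sample. */
static uint32_t energy_update(energy_tracker *t, int16_t sample)
{
    int32_t s = sample;
    uint32_t sq = (uint32_t)(s * s);        /* at most 2^30, fits in 32 bits */
    t->energy -= t->energy >> DECAY_SHIFT;  /* exponential decay */
    t->energy += sq >> DECAY_SHIFT;         /* add in the latest sample */
    return t->energy;
}
```

The caller would compare the returned value with a threshold after each sample or once per frame. Because of the truncating shifts, the value can settle at a small non-zero residue (below 2^DECAY_SHIFT) during digital silence, so the threshold should sit well above that.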

+4

You can try to calculate two simple "statistics". The first is the spread (max - min); for silence it will be very low. The second is the distribution: divide the range of possible values into 16 bins (sub-ranges of values), and as you go through the elements, determine which bin each element falls into. Noise should give roughly equal counts in all bins, while music or speech should favor some bins and neglect others.

This can be done in just one pass through the array, and you do not need complicated arithmetic, just adding and comparing values.

You could also consider an approximation, for example looking at only every fourth value, which reduces the number of tested elements to 80. For an audio signal this should be good enough.
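A single-pass version of these two statistics could be sketched as follows. The struct, the function name and the choice to bin by the top four bits of each sample (16 equal bins across the full int16 range) are assumptions made for this example; how to turn the resulting numbers into an active/inactive decision is left to the caller.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_BINS 16

typedef struct {
    int32_t  range;      /* max - min over the inspected samples */
    uint16_t peak_bin;   /* count in the fullest histogram bin */
    uint16_t inspected;  /* number of samples actually looked at */
} frame_stats;

/* Scans every stride-th sample once, tracking min, max and a 16-bin histogram. */
static frame_stats frame_scan(const int16_t *frame, size_t n, size_t stride)
{
    uint16_t bins[NUM_BINS] = {0};
    int16_t lo = INT16_MAX, hi = INT16_MIN;
    frame_stats st = {0, 0, 0};

    if (stride == 0) stride = 1;            /* guard against an endless loop */
    for (size_t i = 0; i < n; i += stride) {
        int16_t s = frame[i];
        if (s < lo) lo = s;
        if (s > hi) hi = s;
        /* Flip the sign bit, then keep the top four bits: -32768..32767 -> 0..15. */
        bins[((uint16_t)s ^ 0x8000u) >> 12]++;
        st.inspected++;
    }
    if (st.inspected == 0) return st;

    st.range = (int32_t)hi - (int32_t)lo;
    for (int b = 0; b < NUM_BINS; b++)
        if (bins[b] > st.peak_bin) st.peak_bin = bins[b];
    return st;
}
```

With `stride` set to 4 this inspects 80 of the 320 samples. A small `range` suggests silence, and a `peak_bin` close to `inspected / 16` suggests a flat, noise-like distribution. Note that with full-range bins a quiet signal lands almost entirely in the two middle bins, so binning over a narrower range may work better in practice.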

+2

I did something like this a while ago. After some experimentation, I came up with a solution that worked well enough in my case.

I used the rate of change of the cube of a running average taken over about 120 ms. When there is silence (only noise), the expression should hover around zero. Once the rate starts increasing over several consecutive runs, you most likely have activity.

 rate = cur_avg^3 - prev_avg^3 

I used the cube because the square was simply not aggressive enough. If the cube is too slow for you, try using a square and a bit shift instead. Hope this helps.
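One way to read this in C is sketched below. The answer does not say what is being averaged, so the sketch assumes a per-frame level (for example the mean absolute amplitude of each 20 ms frame) averaged over 6 frames, which gives 120 ms. All names, `min_rate` and `runs` are hypothetical tuning knobs.

```c
#include <stdint.h>

#define HISTORY 6   /* 6 frames x 20 ms = 120 ms */

typedef struct {
    uint32_t level[HISTORY];  /* recent per-frame levels (ring buffer) */
    uint32_t sum;             /* sum of level[] */
    int      pos;             /* next slot to overwrite */
    int64_t  prev_cube;       /* previous running average, cubed */
    int      rising;          /* consecutive frames with rate > min_rate */
} onset_detector;

/* Feed one per-frame level in the range 0..32768.
 * Returns 1 once rate = cur_avg^3 - prev_avg^3 has exceeded min_rate
 * for at least `runs` frames in a row. */
static int onset_update(onset_detector *d, uint32_t frame_level,
                        int64_t min_rate, int runs)
{
    /* Replace the oldest level in the running sum. */
    d->sum += frame_level - d->level[d->pos];
    d->level[d->pos] = frame_level;
    d->pos = (d->pos + 1) % HISTORY;

    int64_t avg  = d->sum / HISTORY;
    int64_t cube = avg * avg * avg;     /* up to 2^45, so 64 bits are needed */
    int64_t rate = cube - d->prev_cube;
    d->prev_cube = cube;

    d->rising = (rate > min_rate) ? d->rising + 1 : 0;
    return d->rising >= runs;
}
```

A zero-initialised `onset_detector` is a valid starting state. The 64-bit cube may be costly on small targets, which is where the square-plus-shift suggestion above would come in.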

+1

Obviously, the complexity has to be at least O(n). Some simple algorithm that computes a range of values is probably good enough for now, but I would search the Internet for "Voice Activity Detection" and for related code samples.

0