Sound spectrogram

I made an application that draws an FFT on the screen in real time (from the microphone). Time runs along the x axis, frequency along the y axis, and the color of each pixel represents the amplitude (essentially a vanilla FFT spectrogram).
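A minimal sketch of how such a display is usually computed (a short-time Fourier transform): cut the signal into overlapping frames, taper each frame with a window, and take the FFT magnitude of each frame as one column of the image. The function names, the periodic Hann window, and the frame/hop sizes are assumptions for illustration, and a naive DFT stands in for a real FFT library to keep it self-contained.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Naive O(N^2) DFT; returns |X[k]| for the non-negative bins k = 0..N/2."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_size, hop):
    """One column per frame (time axis); each column holds one amplitude per bin (frequency axis)."""
    # Periodic Hann window to taper each frame before the DFT
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_size) for t in range(frame_size)]
    columns = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + t] * window[t] for t in range(frame_size)]
        columns.append(dft_magnitudes(frame))
    return columns


# Usage: a 1 kHz test tone sampled at 8 kHz (hypothetical values)
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(512)]
cols = spectrogram(tone, frame_size=64, hop=32)
```

With 64-sample frames at 8 kHz each bin is 125 Hz wide, so the 1 kHz tone shows up as the brightest row at bin 8 in every column.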

My problem is that although I can see a pattern from the music, there is also a lot of noise. Looking around, I see people applying a logarithm to the amplitude. Should I do this? And if so, what would the formula look like? (I use C#, but I can translate the math into code, so any form is fine.)

I can work around this problem by using a color scheme that shows lower values as darker colors. I'm just not sure whether the sound is represented correctly without applying a logarithm to it.

2 answers

Representing the amplitude on a logarithmic scale approximates the sensitivity of the human auditory system and therefore gives a better picture of what you actually hear than a linear scale does. Mathematically, all you have to do is:

Alog = 20*log10 (abs (A)) 

where A is the amplitude of the FFT data and Alog is the output. The factor 20 is merely a convention and does not affect the image, since you will probably rescale the values in your color scheme anyway.
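A minimal sketch of the formula plus the color-scheme scaling, assuming 8-bit grayscale pixels. The floor value (to avoid log10(0)) and the -100..0 dB display range are hypothetical choices; in practice you would pick a range that fits how your amplitudes are normalized.

```python
import math


def to_db(amplitude, floor=1e-12):
    # Clamp to a tiny floor so log10 never sees zero (log10(0) is undefined)
    return 20.0 * math.log10(max(abs(amplitude), floor))


def db_to_pixel(db, db_min=-100.0, db_max=0.0):
    # Map [db_min, db_max] linearly onto 0..255; quieter values become darker
    clamped = min(max(db, db_min), db_max)
    return round(255 * (clamped - db_min) / (db_max - db_min))


# Usage: convert one column of FFT magnitudes to pixel brightness
column = [1.0, 0.1, 0.001, 0.0]
pixels = [db_to_pixel(to_db(a)) for a in column]
```

Because the dB range is clamped, very quiet bins all map to black instead of stretching the scale, which usually makes the musical structure stand out from the background.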

EDIT

Explanation regarding the factor 20: the dB (decibel) is a logarithmic unit for measuring ratios. It is a scale on which the distance between 10 and 100 is the same as between 100 and 1000 (since they have the same ratio: 1000/100 = 100/10). If you measure it in dB, you get:

 10*log10 (1000/100) = 10*log10 (100/10) = 10 
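A quick numeric check of the statement above: equal ratios give equal distances on the dB scale. The helper name is hypothetical.

```python
import math


def ratio_db(a, b):
    # Decibels of the (power) ratio a/b: 10 * log10(a / b)
    return 10.0 * math.log10(a / b)


step_high = ratio_db(1000, 100)  # 1000 vs 100
step_low = ratio_db(100, 10)     # 100 vs 10
print(step_high, step_low)
```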

The factor of 10 comes from "deci" meaning one tenth: 1 bel is 10 decibels (just as 1 kilogram is 1000 grams).

Since the human auditory system also (approximately) responds to ratios, it makes sense to measure the sound level on a logarithmic scale, i.e. to measure the ratio of the sound level to some reference value. Since the sound level is related to the power (in watts) of the sound wave, you are actually measuring the power ratio P/Pref. In addition, power is proportional to the square of the amplitude, so what you get is:

 10*log10 (P/Pref) = 10*log10 (A^2 / Aref^2) = 20*log10 (A/Aref) 

according to the rules of logarithms. That is where the factor 20 comes from: remember that in a computer, sound is represented by the instantaneous amplitude of the sound wave, not by its power.
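A small sketch confirming the derivation numerically: using 10*log10 on the squared amplitudes (power) gives the same number as 20*log10 on the amplitudes directly. The amplitude values are arbitrary examples.

```python
import math


def power_db(p, p_ref):
    # 10 * log10 of a power ratio
    return 10.0 * math.log10(p / p_ref)


def amplitude_db(a, a_ref):
    # 20 * log10 of an amplitude ratio (power is proportional to amplitude squared)
    return 20.0 * math.log10(a / a_ref)


a, a_ref = 0.5, 1.0
via_power = power_db(a ** 2, a_ref ** 2)
via_amplitude = amplitude_db(a, a_ref)
print(via_power, via_amplitude)
```

Both values come out at about -6.02 dB, which is why halving the amplitude is commonly described as "minus 6 dB".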


Viewing your spectrogram on a logarithmic scale is indeed the best way to view audio signals. Also keep in mind that you need good resolution in both the time direction and the frequency direction. If you have too few bins in one or the other, the result can look strange.

Another important point is that viewing your STFT on a log scale is not a noise-reduction method. What you see as "noise" may be actual noise, or it may be things like transients, spectral leakage and other effects that are expected to be there. Depending on your application, if you need short-time analysis of the signal, a wavelet transform may suit your needs better. It removes some of the drawbacks of the STFT, such as the fixed window size.
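To illustrate spectral leakage, here is a sketch (not from the answer) that puts a pure tone halfway between two DFT bins and measures how much energy shows up in a distant bin, with no window (rectangular) versus a Hann window. The frame length, tone position, and far bin are hypothetical values chosen for the demonstration.

```python
import cmath
import math


def dft_bin(frame, k):
    # Magnitude of a single DFT bin, computed directly
    n = len(frame)
    return abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))


def leakage_db(window_kind, n=64, tone_bin=10.5, far_bin=30):
    # A tone halfway between two bins leaks energy into every other bin
    tone = [math.sin(2 * math.pi * tone_bin * t / n) for t in range(n)]
    if window_kind == "hann":
        w = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
    else:  # rectangular, i.e. no tapering at all
        w = [1.0] * n
    frame = [x * wt for x, wt in zip(tone, w)]
    peak = max(dft_bin(frame, k) for k in range(n // 2 + 1))
    # Level of the far bin relative to the spectral peak, in dB
    return 20 * math.log10(dft_bin(frame, far_bin) / peak)


rect_level = leakage_db("rect")
hann_level = leakage_db("hann")
```

The far bin sits only around 30 dB below the peak without a window, but far lower with the Hann window. On a log-scale display, that kind of unwindowed leakage looks like a haze of "noise" spreading away from every strong tone.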

