For my final-year project, I am trying to identify sounds (a dog bark / a bird) in real time from recorded sound clips. I am using MFCCs as the audio features. Initially, I extracted a total of 12 MFCC values from a sound clip using the jAudio library. Now I am trying to train a machine learning algorithm (I have not settled on an algorithm yet, but it will most likely be SVM). Each sound clip is approximately 3 seconds long. I need to clarify a few points about this process:
Should I train the algorithm on frame-level MFCCs (12 coefficients per frame) or on clip-level MFCCs (12 coefficients for the whole clip)?
To train the algorithm, should I treat the 12 MFCCs as 12 separate attributes, or as one single attribute?
These are the clip-level MFCCs for one clip:
-9.598802712290967 -21.644963856237265 -7.405551798816725 -11.638107212413201 -19.441831623156144 -2.780967392843105 -0.5792847321137902 -13.14237288849559 -4.920408873192934 -2.7111507999281925 -7.336670942457227 2.4687330348335212
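For reference, this is how I am currently thinking of collapsing frame-level MFCCs into a single clip-level feature vector, where each of the 12 averaged coefficients would be its own attribute for the SVM. This is only a minimal sketch in plain Java; the frames x 12 array shape is my assumption about how jAudio returns the data, and the class and method names are just placeholders:

```java
// Minimal sketch, no external dependencies: average each MFCC coefficient
// across all frames to get one 12-dimensional clip-level feature vector.
// Assumption: frameMfccs has shape [numFrames][12].
public class MfccAggregator {

    /** Mean of each coefficient across all frames of the clip. */
    public static double[] clipLevelMfcc(double[][] frameMfccs) {
        int numCoeffs = frameMfccs[0].length;   // 12 in this case
        double[] mean = new double[numCoeffs];
        for (double[] frame : frameMfccs) {
            for (int c = 0; c < numCoeffs; c++) {
                mean[c] += frame[c];
            }
        }
        for (int c = 0; c < numCoeffs; c++) {
            mean[c] /= frameMfccs.length;
        }
        return mean;   // one training instance with 12 attributes
    }

    public static void main(String[] args) {
        // Toy data: 3 frames x 12 coefficients (values are made up).
        double[][] frames = new double[3][12];
        for (int f = 0; f < 3; f++)
            for (int c = 0; c < 12; c++)
                frames[f][c] = f + 0.1 * c;

        double[] features = clipLevelMfcc(frames);
        // Each of the 12 values printed here would be a separate
        // attribute/column in the SVM training data.
        System.out.println(java.util.Arrays.toString(features));
    }
}
```

Is averaging over frames like this a sensible way to get clip-level features, or should the per-frame vectors be kept separate?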
Any help with these questions would be truly appreciated. I could not find good guidance on Google. :)