Speex Echo Cancellation Configuration

I am building an Android-to-Android VoIP (loudspeaker) application using Android's AudioRecord and AudioTrack classes, with Speex (via the NDK) for echo cancellation. I can successfully pass data into Speex's speex_echo_cancellation() function and get data back, but the echo remains.

Here is the relevant Android thread code that records/sends and receives/plays audio:

    // constructor
    public MyThread(DatagramSocket socket, int frameSize, int filterLength) {
        this.socket = socket;
        nativeMethod_initEchoState(frameSize, filterLength);
    }

    public void run() {
        short[] audioShorts, recvShorts, recordedShorts, filteredShorts;
        byte[] audioBytes, recvBytes;
        int shortsRead;
        DatagramPacket packet;

        // initialize recorder and player
        int samplingRate = 8000;
        int managerBufferSize = 2000;
        AudioTrack player = new AudioTrack(AudioManager.STREAM_MUSIC, samplingRate,
                AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
                managerBufferSize, AudioTrack.MODE_STREAM);
        recorder = new AudioRecord(MediaRecorder.AudioSource.MIC, samplingRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT,
                managerBufferSize);
        recorder.startRecording();
        player.play();

        // record first packet
        audioShorts = new short[1000];
        shortsRead = recorder.read(audioShorts, 0, audioShorts.length);

        // convert shorts to bytes to send
        audioBytes = new byte[shortsRead * 2];
        ByteBuffer.wrap(audioBytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().put(audioShorts);

        // send bytes
        packet = new DatagramPacket(audioBytes, audioBytes.length);
        socket.send(packet);

        while (!this.isInterrupted()) {
            // receive packet/bytes (received audio data should have echo cancelled already)
            recvBytes = new byte[2000];
            packet = new DatagramPacket(recvBytes, recvBytes.length);
            socket.receive(packet);

            // convert bytes to shorts
            recvShorts = new short[packet.getLength() / 2];
            ByteBuffer.wrap(packet.getData(), 0, packet.getLength())
                    .order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(recvShorts);

            // play shorts
            player.write(recvShorts, 0, recvShorts.length);

            // record shorts
            recordedShorts = new short[1000];
            shortsRead = recorder.read(recordedShorts, 0, recordedShorts.length);

            // send played and recorded shorts into speex,
            // returning audio data with the echo removed
            filteredShorts = nativeMethod_speexEchoCancel(recordedShorts, recvShorts);

            // convert filtered shorts to bytes
            audioBytes = new byte[shortsRead * 2];
            ByteBuffer.wrap(audioBytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().put(filteredShorts);

            // send off bytes
            packet = new DatagramPacket(audioBytes, audioBytes.length);
            socket.send(packet);
        } // end of while loop
    }

Here is the corresponding NDK / JNI code:

    void nativeMethod_initEchoState(JNIEnv *env, jobject jobj, jint frameSize, jint filterLength) {
        echo_state = speex_echo_state_init(frameSize, filterLength);
    }

    jshortArray nativeMethod_speexEchoCancel(JNIEnv *env, jobject jObj,
                                             jshortArray input_frame, jshortArray echo_frame) {
        // create native shorts from java shorts
        jshort *native_input_frame = (*env)->GetShortArrayElements(env, input_frame, NULL);
        jshort *native_echo_frame = (*env)->GetShortArrayElements(env, echo_frame, NULL);

        // allocate memory for output data
        jint length = (*env)->GetArrayLength(env, input_frame);
        jshortArray temp = (*env)->NewShortArray(env, length);
        jshort *native_output_frame = (*env)->GetShortArrayElements(env, temp, 0);

        // call echo cancellation
        speex_echo_cancellation(echo_state, native_input_frame, native_echo_frame, native_output_frame);

        // convert native output to java layer output
        jshortArray output_shorts = (*env)->NewShortArray(env, length);
        (*env)->SetShortArrayRegion(env, output_shorts, 0, length, native_output_frame);

        // cleanup and return
        (*env)->ReleaseShortArrayElements(env, input_frame, native_input_frame, 0);
        (*env)->ReleaseShortArrayElements(env, echo_frame, native_echo_frame, 0);
        (*env)->ReleaseShortArrayElements(env, temp, native_output_frame, 0);
        return output_shorts;
    }

This code runs fine: audio data is sent, received, processed, and played from Android to Android. Given the 8000 Hz sampling rate and a packet size of 2000 bytes (1000 shorts), I found that a frameSize of 1000 is needed for the audio to play back smoothly. Most filterLength values (the tail length, in the Speex documentation's terms) will run, but none of them seems to have any effect on the echo.
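To make the sizes above concrete, here is a minimal sketch of the arithmetic, assuming 16-bit mono PCM as in the question. The class name FrameMath and its helper methods are hypothetical and only illustrate the unit conversions; they are not part of Android or Speex.

```java
final class FrameMath {
    private FrameMath() {}

    /** Number of 16-bit mono samples carried by a packet of the given byte size. */
    static int bytesToSamples(int bytes) {
        return bytes / 2; // ENCODING_PCM_16BIT: 2 bytes per sample
    }

    /** Duration in milliseconds covered by the given number of samples. */
    static int samplesToMillis(int samples, int sampleRateHz) {
        return samples * 1000 / sampleRateHz;
    }

    public static void main(String[] args) {
        int sampleRateHz = 8000;            // sampling rate from the question
        int samples = bytesToSamples(2000); // packet size from the question
        // prints "1000 samples = 125 ms"
        System.out.println(samples + " samples = "
                + samplesToMillis(samples, sampleRateHz) + " ms");
    }
}
```

So each call to the echo canceller in the question processes 125 ms of audio at once.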

Does anyone understand AEC well enough to give me some guidance on implementing or configuring Speex? Thank you for reading.

2 answers

Your code is right, but something is missing. I modified the init method and added the Speex preprocessor after the echo cancellation step; with those changes your code worked fine (I tried it on Windows). Here is the native code:

    #include <jni.h>
    #include "speex/speex_echo.h"
    #include "speex/speex_preprocess.h"
    #include "EchoCanceller_jniHeader.h"

    SpeexEchoState *st;
    SpeexPreprocessState *den;

    JNIEXPORT void JNICALL Java_speex_EchoCanceller_open
      (JNIEnv *env, jobject jObj, jint jSampleRate, jint jBufSize, jint jTotalSize)
    {
        // init
        int sampleRate = jSampleRate;
        st = speex_echo_state_init(jBufSize, jTotalSize);
        den = speex_preprocess_state_init(jBufSize, sampleRate);
        speex_echo_ctl(st, SPEEX_ECHO_SET_SAMPLING_RATE, &sampleRate);
        speex_preprocess_ctl(den, SPEEX_PREPROCESS_SET_ECHO_STATE, st);
    }

    JNIEXPORT jshortArray JNICALL Java_speex_EchoCanceller_process
      (JNIEnv *env, jobject jObj, jshortArray input_frame, jshortArray echo_frame)
    {
        // create native shorts from java shorts
        jshort *native_input_frame = (*env)->GetShortArrayElements(env, input_frame, NULL);
        jshort *native_echo_frame = (*env)->GetShortArrayElements(env, echo_frame, NULL);

        // allocate memory for output data
        jint length = (*env)->GetArrayLength(env, input_frame);
        jshortArray temp = (*env)->NewShortArray(env, length);
        jshort *native_output_frame = (*env)->GetShortArrayElements(env, temp, 0);

        // call echo cancellation
        speex_echo_cancellation(st, native_input_frame, native_echo_frame, native_output_frame);

        // preprocess output frame
        speex_preprocess_run(den, native_output_frame);

        // convert native output to java layer output
        jshortArray output_shorts = (*env)->NewShortArray(env, length);
        (*env)->SetShortArrayRegion(env, output_shorts, 0, length, native_output_frame);

        // cleanup and return
        (*env)->ReleaseShortArrayElements(env, input_frame, native_input_frame, 0);
        (*env)->ReleaseShortArrayElements(env, echo_frame, native_echo_frame, 0);
        (*env)->ReleaseShortArrayElements(env, temp, native_output_frame, 0);
        return output_shorts;
    }

    JNIEXPORT void JNICALL Java_speex_EchoCanceller_close
      (JNIEnv *env, jobject jObj)
    {
        // close
        speex_echo_state_destroy(st);
        speex_preprocess_state_destroy(den);
    }

You can find useful examples (encoding, decoding, echo cancellation) in the Speex library source (http://www.speex.org/downloads/).


Are you lining up the far-end signal (what you call recv) and the near-end signal (what you call recorded) correctly? There is always some playback/recording latency to account for. This typically means buffering the far-end signal in a ring buffer for a certain period of time. On a PC, that delay is usually about 50 to 120 ms. On Android, I suspect it is much higher, probably in the range of 150 to 400 ms. I would recommend using a 100 ms tail length with Speex and adjusting the size of your far-end buffer until the AEC converges. These changes should allow the AEC to converge with or without the preprocessor, which is not required here.
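The far-end buffering described above can be sketched as follows. This is only an illustration under stated assumptions: the class FarEndDelayBuffer and its methods are hypothetical names (not part of Speex or Android), the delay is a fixed value you tune by hand, and the buffer starts out filled with silence.

```java
final class FarEndDelayBuffer {
    private final short[] ring; // holds the most recent delaySamples of far-end audio
    private int pos = 0;        // slot holding the oldest sample, overwritten next

    /** delayMillis is a tuning knob; the 150 to 400 ms range above is only a starting guess. */
    FarEndDelayBuffer(int delayMillis, int sampleRateHz) {
        // Java zero-fills the array, so silence comes out until the delay has elapsed
        ring = new short[millisToSamples(delayMillis, sampleRateHz)];
    }

    /** Converts a duration to a sample count, e.g. a 100 ms tail at 8000 Hz is 800 samples. */
    static int millisToSamples(int millis, int sampleRateHz) {
        return millis * sampleRateHz / 1000;
    }

    /** Stores the frame just handed to the player and returns the audio from one delay earlier. */
    short[] delay(short[] playedFrame) {
        short[] out = new short[playedFrame.length];
        if (ring.length == 0) { // zero delay: pass the frame through unchanged
            System.arraycopy(playedFrame, 0, out, 0, playedFrame.length);
            return out;
        }
        for (int i = 0; i < playedFrame.length; i++) {
            out[i] = ring[pos];         // oldest buffered sample
            ring[pos] = playedFrame[i]; // replace it with the newest one
            pos = (pos + 1) % ring.length;
        }
        return out;
    }
}
```

In the question's loop, the idea would be to pass the result of delay(recvShorts) as the echo frame instead of recvShorts itself, then vary the delay until the echo starts to fade.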


Source: https://habr.com/ru/post/927035/

