I am building an Android-to-Android VoIP (speakerphone) application using Android's AudioRecord and AudioTrack classes, with Speex via the NDK to cancel the echo. I am able to pass data into Speex's speex_echo_cancellation() function and get data back from it, but the echo remains.
Here is the relevant Android thread code that records/sends and receives/plays audio:
//constructor
public MyThread(DatagramSocket socket, int frameSize, int filterLength){
    this.socket = socket;
    nativeMethod_initEchoState(frameSize, filterLength);
}

public void run(){
    short[] audioShorts, recvShorts, recordedShorts, filteredShorts;
    byte[] audioBytes, recvBytes;
    int shortsRead;
    DatagramPacket packet;

    //initialize recorder and player
    int samplingRate = 8000;
    int managerBufferSize = 2000;
    AudioTrack player = new AudioTrack(AudioManager.STREAM_MUSIC, samplingRate,
            AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
            managerBufferSize, AudioTrack.MODE_STREAM);
    recorder = new AudioRecord(MediaRecorder.AudioSource.MIC, samplingRate,
            AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT,
            managerBufferSize);
    recorder.startRecording();
    player.play();

    //record first packet
    audioShorts = new short[1000];
    shortsRead = recorder.read(audioShorts, 0, audioShorts.length);

    //convert shorts to bytes to send
    audioBytes = new byte[shortsRead*2];
    ByteBuffer.wrap(audioBytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().put(audioShorts);

    //send bytes
    packet = new DatagramPacket(audioBytes, audioBytes.length);
    socket.send(packet);

    while (!this.isInterrupted()){
        //receive packet/bytes (received audio data should have echo cancelled already)
        recvBytes = new byte[2000];
        packet = new DatagramPacket(recvBytes, recvBytes.length);
        socket.receive(packet);

        //convert bytes to shorts
        recvShorts = new short[packet.getLength()/2];
        ByteBuffer.wrap(packet.getData(), 0, packet.getLength()).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(recvShorts);

        //play shorts
        player.write(recvShorts, 0, recvShorts.length);

        //record shorts
        recordedShorts = new short[1000];
        shortsRead = recorder.read(recordedShorts, 0, recordedShorts.length);

        //send played and recorded shorts into speex,
        //returning audio data with the echo removed
        filteredShorts = nativeMethod_speexEchoCancel(recordedShorts, recvShorts);

        //convert filtered shorts to bytes
        audioBytes = new byte[shortsRead*2];
        ByteBuffer.wrap(audioBytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().put(filteredShorts);

        //send off bytes
        packet = new DatagramPacket(audioBytes, audioBytes.length);
        socket.send(packet);
    }//end of while loop
}
Here is the corresponding NDK / JNI code:
void nativeMethod_initEchoState(JNIEnv *env, jobject jobj, jint frameSize, jint filterLength){
    echo_state = speex_echo_state_init(frameSize, filterLength);
}

jshortArray nativeMethod_speexEchoCancel(JNIEnv *env, jobject jObj, jshortArray input_frame, jshortArray echo_frame){
    //create native shorts from java shorts
    jshort *native_input_frame = (*env)->GetShortArrayElements(env, input_frame, NULL);
    jshort *native_echo_frame = (*env)->GetShortArrayElements(env, echo_frame, NULL);

    //allocate memory for output data
    jint length = (*env)->GetArrayLength(env, input_frame);
    jshortArray temp = (*env)->NewShortArray(env, length);
    jshort *native_output_frame = (*env)->GetShortArrayElements(env, temp, 0);

    //call echo cancellation
    speex_echo_cancellation(echo_state, native_input_frame, native_echo_frame, native_output_frame);

    //convert native output to java layer output
    jshortArray output_shorts = (*env)->NewShortArray(env, length);
    (*env)->SetShortArrayRegion(env, output_shorts, 0, length, native_output_frame);

    //cleanup and return
    (*env)->ReleaseShortArrayElements(env, input_frame, native_input_frame, 0);
    (*env)->ReleaseShortArrayElements(env, echo_frame, native_echo_frame, 0);
    (*env)->ReleaseShortArrayElements(env, temp, native_output_frame, 0);
    return output_shorts;
}
This code runs without errors, and audio is sent, received, processed, and played between the two Android devices. With a sampling rate of 8000 Hz and a packet size of 2000 bytes (1000 shorts), I found that a frame size of 1000 is required for the audio to play back smoothly. Most filterLength values (called the tail length in the Speex docs) run without problems, but none of them seem to have any effect on the echo.
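To make those numbers concrete, here is the arithmetic behind them as a small sketch. FrameMath, samplesToMs and packetBytes are hypothetical names used only for illustration (they are not part of my app or of Speex), and the sketch assumes 16-bit mono PCM as in the code above.

```java
public class FrameMath {
    // Converts a length in samples to milliseconds at the given sampling rate.
    // Hypothetical helper: works for the frame size and for filterLength alike.
    public static double samplesToMs(int samples, int samplingRateHz) {
        return 1000.0 * samples / samplingRateHz;
    }

    // Payload size in bytes for 16-bit mono PCM: two bytes per sample.
    public static int packetBytes(int samples) {
        return samples * 2;
    }

    public static void main(String[] args) {
        int samplingRate = 8000; // Hz, as in the recorder/player setup
        int frameSize = 1000;    // shorts per packet, as in the question

        System.out.println(samplesToMs(frameSize, samplingRate)); // 125.0
        System.out.println(packetBytes(frameSize));               // 2000
    }
}
```

So each frame I hand to speex_echo_cancellation() covers 125 ms of audio, and a filterLength given in samples can be converted to milliseconds the same way.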
Does anyone understand AEC well enough to give me some guidance on implementing or configuring Speex? Thank you for reading.