I had a similar problem. I work with a lot of audio over the Internet and have to automate checks for sound degradation, sound playback, etc. I could not find a library for this in Groovy or Ruby (I did not check Python), so I shelled out to an installed program (sox), which checks in real time whether sound was heard or not.
This was my Groovy script call:
def audioCheck = "sox -t coreaudio Soundflower /Users/me/project/record.wav silence 1 0.1 1% 1 .1 1%".execute()
audioCheck.waitFor()
println "EXIT VALUE FOR SOX IS: ${audioCheck.exitValue()}"
if (audioCheck.exitValue() == 0) {
    // some stuff would happen now, if exit code is 0
}
About Core Audio and "Soundflower"
The sox call specifies the audio driver with -t coreaudio (on Linux, you will probably use alsa instead of coreaudio; coreaudio is the OS X audio interface). The "Soundflower" parameter is the channel I created in Soundflower. It was important for me to record the sound in isolation (since I was doing sound quality checks). If you used the default channel instead, for example alsa default or coreaudio default, it would pick up your microphone ... so if the guy next to you sneezes ... he corrupts the test.
Using a virtual audio channel (for example, Soundflower on OS X), you can route all audio output to "Soundflower" and then run the sox command on that channel ... and you will only be listening to the system sound coming through the virtual channel.
Alternatively, you can use your default channel (but it will end up hearing other sounds in the room).
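To make the coreaudio/alsa choice concrete, here is a minimal sketch that builds the same sox call as a list of arguments. buildSoxCommand is a hypothetical helper name, not something from my original script, and picking the driver from the os.name property is my assumption about how you would tell the platforms apart. The comments on the silence arguments reflect how I read the sox manual, so double-check them against your sox version.

```groovy
// Hypothetical helper: build the sox argument list for the current platform.
// "coreaudio" and "alsa" are the driver names discussed above; choosing
// between them by OS name is an assumption made for this sketch.
List<String> buildSoxCommand(String osName, String device, String outFile) {
    String driver = osName.toLowerCase().contains('mac') ? 'coreaudio' : 'alsa'
    // silence 1 0.1 1% : wait until audio above 1% has lasted 0.1 s
    // 1 .1 1%          : stop once audio stays below 1% for 0.1 s
    return ['sox', '-t', driver, device, outFile,
            'silence', '1', '0.1', '1%', '1', '.1', '1%']
}

def cmd = buildSoxCommand(System.getProperty('os.name'),
                          'Soundflower', '/Users/me/project/record.wav')
println cmd.join(' ')
```

Calling cmd.execute() on the list starts the process the same way as the string version, but a list is not split on whitespace, so device names or paths containing spaces stay intact.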
I ran this code asynchronously using Groovy tasks ... so the code was wrapped in something like:
def listener = task { ... my script above ... }
Since it listened asynchronously, it did not block the rest of the test.
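If you do not have a task library on the classpath, a plain JDK CompletableFuture gives the same non-blocking behaviour. This is a minimal sketch under that assumption, not the code I ran: listenAsync is a hypothetical helper name, and the closure below is a stand-in that returns 0 where the blocking sox call would go.

```groovy
import java.util.concurrent.CompletableFuture
import java.util.concurrent.TimeUnit
import java.util.function.Supplier

// Hypothetical helper: run a blocking check on a background thread and
// hand back a future that will hold its exit value.
CompletableFuture<Integer> listenAsync(Closure<Integer> check) {
    return CompletableFuture.supplyAsync({ check.call() } as Supplier<Integer>)
}

// Stand-in closure; in a real test its body would start sox, call
// waitFor() and return exitValue().
def listener = listenAsync({ 0 })

// ... the rest of the test keeps running here while the listener waits ...

// Collect the result, waiting at most 30 seconds for it.
int exitValue = listener.get(30, TimeUnit.SECONDS)
println "EXIT VALUE: ${exitValue}"
```

The timeout on get is worth keeping: if no sound is ever played, sox keeps waiting, and without a limit the test would hang at that point.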
Sox Command
The actual real-time sox command that I used came from this guy (check it out, as he uses several different parameters):
https://www.youtube.com/watch?v=Q5ntlKE0ze4
Expansion
I took this a bit further, automating the recording of sound played through the browser and then using the PESQ algorithm to determine how close the recorded sound was to the original. If you are interested, feel free to check out my post:
http://sdet.us/webrtc-audio-quality-automation-with-pesq/
Python PCAP Scraping
This may not be related to what you are doing, but I also played around with the pyshark library to recover audio from packet captures ... it is more complex and perhaps more fragile. But if you are interested, here is my implementation:
http://sdet.us/python-pcap-parsing-audio-from-sip-call/