Customize audio device format and render callback for interleaved PCM audio signals

I am currently trying to play the audio that I receive in a series of UDP packets. They are decoded into PCM frames with the following properties:

  • 2 channels
  • interleaved
  • 2 bytes per sample in one channel (so 4 bytes per frame)
  • with a sampling rate of 48000.

Each UDP packet contains 480 frames, so the buffer size is 480 * 2 (channels) * 2 (bytes per channel) = 1920 bytes.

I need to configure the Audio Unit to play these packets. So my first question is: how do I set up the AudioStreamBasicDescription structure for the Audio Unit? Looking at the documentation, I'm not even sure that interleaved PCM is an acceptable format.

This is what I have so far:

    struct AudioStreamBasicDescription {
        Float64 mSampleRate;       // 48000
        UInt32  mFormatID;         // ?????
        UInt32  mFormatFlags;      // ?????
        UInt32  mBytesPerPacket;   // Not sure what "packet" means here
        UInt32  mFramesPerPacket;  // Same as above
        UInt32  mBytesPerFrame;    // Same
        UInt32  mChannelsPerFrame; // 2?
        UInt32  mBitsPerChannel;   // 16?
        UInt32  mReserved;         // ???
    };
    typedef struct AudioStreamBasicDescription AudioStreamBasicDescription;

Secondly, after setting up, I'm not sure how to get frames from a UDP callback to the actual audio rendering function.

I currently have a callback function from a socket listener in which I generate int16 * buffers containing the audio I want to play. As I understand it, I also need to implement a render callback for an audio unit of the following form:

    OSStatus RenderFrames(void                       *inRefCon,
                          AudioUnitRenderActionFlags *ioActionFlags,
                          const AudioTimeStamp       *inTimeStamp,
                          UInt32                      inBusNumber,
                          UInt32                      inNumberFrames,
                          AudioBufferList            *ioData)
    {
        // No idea what I should do here.
        return noErr;
    }

Putting this all together, my understanding is that the socket reception callback should decode the frames and put them into some buffer structure, and the RenderFrames callback should then pull frames from that buffer and play them. Is that correct? And if so, once I have the next frame inside RenderFrames, how do I actually "send it" to be played?

1 answer

Taking this a section at a time:

AudioStreamBasicDescription

Apple's documentation for the ASBD is here. To clarify:

  • An audio frame is a time-coincident set of audio samples, that is, one sample per channel. For stereo this is therefore 2.
  • There is no packetization for PCM formats, so mBytesPerPacket = mBytesPerFrame and mFramesPerPacket = 1, although I'm not sure these are ever actually used.
  • mReserved is not used and should be 0.
  • Refer to the documentation for mFormatID and mFormatFlags. There is a convenient helper function in CoreAudioTypes.h, CalculateLPCMFlags, for computing the latter.
  • Multichannel audio is usually interleaved (you can set the non-interleaved bit in mFormatFlags if you really don't want it to be).
  • There is another helper function, FillOutASBDForLPCM(), that can fill out the entire ASBD for common linear PCM cases (a short sketch follows this list).
  • Many combinations of mFormatID and mFormatFlags are not supported by RemoteIO units; I found experimentation to be necessary on iOS.
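As a minimal sketch of that helper, assuming a C++ source file (the helpers in CoreAudioTypes.h are inline functions) and the 48 kHz / stereo / 16-bit interleaved format from the question; the parameter order is as I remember it, so double-check it against your SDK header:

    #include <CoreAudio/CoreAudioTypes.h>

    // 48 kHz, stereo, 16 valid bits in a 16-bit container, signed integer,
    // little-endian, interleaved - i.e. the format described in the question.
    AudioStreamBasicDescription asbd = {0};
    FillOutASBDForLPCM(asbd,
                       48000.0,   // sample rate
                       2,         // channels per frame
                       16,        // valid bits per channel
                       16,        // total bits per channel
                       false,     // is float?
                       false,     // is big endian?
                       false);    // is non-interleaved?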

Here is some working code from one of my projects:

    AudioStreamBasicDescription inputASBL = {0};

    inputASBL.mSampleRate       = static_cast<Float64>(sampleRate);
    inputASBL.mFormatID         = kAudioFormatLinearPCM;
    inputASBL.mFormatFlags      = kAudioFormatFlagIsPacked | kAudioFormatFlagIsSignedInteger;
    inputASBL.mFramesPerPacket  = 1;
    inputASBL.mChannelsPerFrame = 2;
    inputASBL.mBitsPerChannel   = sizeof(short) * 8;
    inputASBL.mBytesPerPacket   = sizeof(short) * 2;
    inputASBL.mBytesPerFrame    = sizeof(short) * 2;
    inputASBL.mReserved         = 0;
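In case it helps, here is a hedged sketch (not from the project above) of how an ASBD like this is typically applied to the iOS RemoteIO output unit, and how a render callback with the RenderFrames signature from the question is registered. Error checking is omitted, and what you pass through inputProcRefCon (here NULL) is up to you, e.g. a pointer to your FIFO:

    #include <AudioToolbox/AudioToolbox.h>

    AudioComponentDescription desc = {0};
    desc.componentType         = kAudioUnitType_Output;
    desc.componentSubType      = kAudioUnitSubType_RemoteIO;   // iOS output unit
    desc.componentManufacturer = kAudioUnitManufacturer_Apple;

    AudioComponent comp = AudioComponentFindNext(NULL, &desc);
    AudioUnit audioUnit;
    AudioComponentInstanceNew(comp, &audioUnit);

    // The format our render callback will supply on the input scope of
    // the output element (bus 0) - i.e. the ASBD built above.
    AudioUnitSetProperty(audioUnit,
                         kAudioUnitProperty_StreamFormat,
                         kAudioUnitScope_Input,
                         0,
                         &inputASBL,
                         sizeof(inputASBL));

    // Register the render callback that will be pulled for samples.
    AURenderCallbackStruct cb;
    cb.inputProc       = RenderFrames;
    cb.inputProcRefCon = NULL;   // e.g. a pointer to your FIFO / state object
    AudioUnitSetProperty(audioUnit,
                         kAudioUnitProperty_SetRenderCallback,
                         kAudioUnitScope_Input,
                         0,
                         &cb,
                         sizeof(cb));

    AudioUnitInitialize(audioUnit);
    AudioOutputUnitStart(audioUnit);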

Render Callbacks

CoreAudio operates on what Apple describes as a pull model. That is to say, the render callback is called from a real-time thread when CoreAudio needs the buffer filled. From your question it sounds like you are expecting the opposite: pushing data to the audio output.

There are two implementation options:

  • Perform non-blocking reads from the UDP socket in the render callback (as a general rule, everything you do here must be fast and non-blocking).
  • Maintain an audio FIFO into which samples are inserted on reception and consumed by the render callback (a sketch follows below).

The second option is probably the better choice, but you will need to manage buffer over- and under-runs yourself.
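Here is a minimal sketch of such a FIFO, assuming a single producer (the socket callback) and a single consumer (the render callback). The SampleFIFO class is purely illustrative, not from any library; a production version would need more careful sizing and overrun policy:

    #include <atomic>
    #include <cstdint>
    #include <vector>

    // Single-producer / single-consumer FIFO of interleaved int16 samples.
    class SampleFIFO {
    public:
        explicit SampleFIFO(size_t capacity) : buf(capacity), head(0), tail(0) {}

        // Called from the UDP/socket thread.
        size_t push(const int16_t *samples, size_t count) {
            size_t written = 0;
            while (written < count) {
                size_t h    = head.load(std::memory_order_relaxed);
                size_t next = (h + 1) % buf.size();
                if (next == tail.load(std::memory_order_acquire))
                    break;                     // full: drop the rest (overrun)
                buf[h] = samples[written++];
                head.store(next, std::memory_order_release);
            }
            return written;
        }

        // Called from the render callback (real-time thread): no locks, no allocation.
        size_t pop(int16_t *out, size_t count) {
            size_t read = 0;
            while (read < count) {
                size_t t = tail.load(std::memory_order_relaxed);
                if (t == head.load(std::memory_order_acquire))
                    break;                     // empty (underrun): caller zero-fills
                out[read++] = buf[t];
                tail.store((t + 1) % buf.size(), std::memory_order_release);
            }
            return read;
        }

    private:
        std::vector<int16_t> buf;
        std::atomic<size_t>  head, tail;
    };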

The ioData argument points to a scatter-gather control structure. In the simplest case it points to a single buffer containing all of the frames, but it may contain several buffers that between them hold enough frames to satisfy inNumberFrames. Typically, one allocates a buffer large enough for inNumberFrames, copies samples into it, and then modifies the AudioBufferList structure pointed to by ioData to reference it.
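Putting the pieces together, a hedged sketch of what the body of RenderFrames might look like for the packed, interleaved 16-bit stereo format above. It assumes the hypothetical SampleFIFO from the earlier sketch was passed in via inRefCon, copies straight into the buffer the host provides, and pads any underrun with silence:

    #include <AudioToolbox/AudioToolbox.h>
    #include <cstring>

    OSStatus RenderFrames(void                       *inRefCon,
                          AudioUnitRenderActionFlags *ioActionFlags,
                          const AudioTimeStamp       *inTimeStamp,
                          UInt32                      inBusNumber,
                          UInt32                      inNumberFrames,
                          AudioBufferList            *ioData)
    {
        SampleFIFO *fifo = static_cast<SampleFIFO *>(inRefCon);

        // For packed, interleaved 16-bit stereo there is a single buffer
        // holding inNumberFrames * 2 samples.
        int16_t *out  = static_cast<int16_t *>(ioData->mBuffers[0].mData);
        size_t   want = inNumberFrames * 2;            // samples, not bytes
        size_t   got  = fifo->pop(out, want);

        // Underrun: pad the remainder with silence rather than stale data.
        if (got < want)
            memset(out + got, 0, (want - got) * sizeof(int16_t));

        return noErr;
    }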

In your application you could potentially use scatter-gather on your decoded audio packets, allocating buffers as they are decoded. However, you might not get the latency you want, and you may not be able to arrange for inNumberFrames to match the number of frames in your decoded UDP packets.
