How to even out stereo input and apply an audio effect to only one channel on iOS?

I need to handle stereo audio in iOS as follows:

  • Both channels should have equal intensity, i.e. make the stereo mono
  • Route the monaural sound to both the left and the right channel
  • Apply an effect to the sound that is output to the right channel

I currently have:

                     +-------------------+
                     | AVAudioPlayerNode +------------------------+
                     +--------^----------+                        |
                              |                                   |
+--------+---------+          |                          +--------v---------+
File ---> AVAudioPCMBuffer ---+                          | AVAudioMixerNode +---> Output
+--------+---------+          |                          +--------^---------+
                              |                                   |
                     +--------v----------+  +-------------------+ |
                     | AVAudioPlayerNode +--> AVAudioUnitEffect +-+
                     +-------------------+  +-------------------+

The effect is a subclass of AVAudioUnitEffect.
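
For reference, a minimal sketch of how that graph might be wired with AVAudioEngine, in current Swift syntax. The function and parameter names here are placeholders, and the custom AVAudioUnitEffect subclass is assumed to be passed in as effect:

    import AVFoundation

    // Hypothetical wiring of the graph in the diagram above.
    // `effect` is the custom AVAudioUnitEffect subclass; `fileURL` points at the audio file.
    func buildGraph(fileURL: URL, effect: AVAudioUnitEffect) throws -> (AVAudioEngine, AVAudioPlayerNode, AVAudioPlayerNode) {
        let engine = AVAudioEngine()
        let dryPlayer = AVAudioPlayerNode()   // goes straight into the mixer
        let wetPlayer = AVAudioPlayerNode()   // goes through the effect first

        engine.attach(dryPlayer)
        engine.attach(wetPlayer)
        engine.attach(effect)

        engine.connect(dryPlayer, to: engine.mainMixerNode, format: nil)
        engine.connect(wetPlayer, to: effect, format: nil)
        engine.connect(effect, to: engine.mainMixerNode, format: nil)

        // File ---> AVAudioPCMBuffer, shared by both players
        let file = try AVAudioFile(forReading: fileURL)
        guard let buffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                            frameCapacity: AVAudioFrameCount(file.length)) else {
            throw NSError(domain: "StereoExample", code: -1, userInfo: nil)
        }
        try file.read(into: buffer)

        dryPlayer.scheduleBuffer(buffer, completionHandler: nil)
        wetPlayer.scheduleBuffer(buffer, completionHandler: nil)

        try engine.start()
        dryPlayer.play()
        wetPlayer.play()
        return (engine, dryPlayer, wetPlayer)
    }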

What I'm having trouble with is making the stereo input play as mono and routing the output of each AVAudioPlayerNode to its own channel.

I tried setting the PlayerNodes' volume to 0.5 and their pan to -1.0 and 1.0 respectively, but since the input is stereo, this does not produce the desired effect.
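
In code, that attempt looks roughly like this (a sketch; dryPlayer and wetPlayer stand for the two AVAudioPlayerNodes in the diagram above):

    // Equal volume, hard-panned players - but with a stereo buffer this only
    // attenuates and shifts the existing stereo image; it never folds the two
    // input channels down to mono first.
    dryPlayer.volume = 0.5
    dryPlayer.pan = -1.0   // hard left
    wetPlayer.volume = 0.5
    wetPlayer.pan = 1.0    // hard right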

With AVFoundation, I suppose I have at least two options: either I ...

(1) even out the channels of each PlayerNode's input so that both PlayerNodes effectively play mono - after which I could use the same logic as before: equal volume on both players, one panned left and the other right, and the effect applied to one PlayerNode, so that after the MixerNode the effect is heard only in the right output channel (a sketch of this downmix step follows these two options).

(2) keep the PlayerNodes as stereo (pan = 0.0), apply the effect to only one PlayerNode, and then tell the MixerNode to use both channels of one PlayerNode as the source for the left output channel and both channels of the other PlayerNode for the right output channel. I assume the MixerNode would then effectively even out the input channels, so it would sound as if the input were mono and the effect would be heard from only one output channel.
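
For reference, the core step of option (1) would be a stereo-to-mono downmix of the buffer before scheduling it. A minimal sketch, assuming the usual non-interleaved float format of AVAudioPCMBuffer; the helper name is made up:

    import AVFoundation

    // Average the two channels of a stereo, non-interleaved float buffer in place,
    // so that both channels carry (L + R) / 2.
    func downmixToMono(_ buffer: AVAudioPCMBuffer) {
        guard buffer.format.channelCount == 2, let data = buffer.floatChannelData else { return }
        let left = data[0]
        let right = data[1]
        for i in 0..<Int(buffer.frameLength) {
            let mono = (left[i] + right[i]) * 0.5
            left[i] = mono
            right[i] = mono
        }
    }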

So the question is: is either of the above strategies possible, and how? Is there another option I have overlooked?

I am using Swift for the project, but I can handle Objective-C.


Judging by the lack of answers and my own research, it seems that AVFoundation may not be the way to go. Its simplicity is tempting, but I am open to alternatives. I am currently looking into the MTAudioProcessingTap classes, which might be useful. Help is still appreciated.

ios objective-c swift avfoundation audio
1 answer

I managed to get the desired result by using two AVPlayers playing simultaneously. One AVPlayer gets input whose left channel carries the averaged audio data and whose right channel is silence; the other AVPlayer gets the opposite. Finally, the effect is applied to only one of the AVPlayer instances.

Since applying the proprietary effect to an AVPlayer instance turned out to be trivial, the biggest obstacle was evening out the stereo input.

I found a couple of related questions (Panning a mono signal using MultiChannelMixer and MTAudioProcessingTap, Playing AVPlayer single-channel stereo → mono sound) and a tutorial (Processing AVPlayer's audio using MTAudioProcessingTap - which was referenced in almost every other tutorial I could google), all of which suggest that the solution probably lies in MTAudioProcessingTap.

Unfortunately, the official documentation for MTAudioProcessingTap (or any other part of MediaToolbox) is more or less nonexistent; all I could find was some sample code online and the headers (MTAudioProcessingTap.h) through Xcode. Still, with the above tutorial I managed to get it running.

To keep things from being too easy, I decided to use Swift rather than Objective-C, which the existing tutorials were written in. Converting the calls is not that bad, and I even found an almost ready-made example of creating an MTAudioProcessingTap in Swift 2. I did manage to hook up the processing taps and lightly manipulate the audio with them (well, I could at least output the stream as-is and zero it out completely). Evening out the channels, however, was a job for the Accelerate framework, namely its vDSP part.

However, using C APIs that work heavily with pointers (case in point: vDSP) from Swift gets cumbersome rather quickly - at least compared to how it is done in Objective-C. This was also an issue when I initially wrote the MTAudioProcessingTaps in Swift: I could not pass the AudioTapContext around without problems (in Obj-C, getting the context is as simple as AudioTapContext *context = (AudioTapContext *)MTAudioProcessingTapGetStorage(tap);), and all the UnsafeMutablePointers made me feel that Swift was not the right tool for the job.
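
For comparison, a rough current-Swift equivalent of that Obj-C one-liner (assuming AudioTapContext and TapType are exposed to Swift, e.g. via a bridging header) would look something like this:

    import MediaToolbox

    // Fetch the per-tap storage and reinterpret it as the context struct.
    // Every step goes through raw/typed pointer conversions, which is what made
    // Objective-C feel like the better fit for this class.
    func channelForTap(_ tap: MTAudioProcessingTap) -> TapType {
        let storage = MTAudioProcessingTapGetStorage(tap)            // UnsafeMutableRawPointer
        let context = storage.assumingMemoryBound(to: AudioTapContext.self)
        return context.pointee.channel
    }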

So, for the processing class, I dropped Swift and rewrote it in Objective-C.
And, as mentioned earlier, I use two AVPlayers; so in AudioPlayerController.swift I have:

    var left = AudioTap.create(TapType.L)
    var right = AudioTap.create(TapType.R)

    asset = AVAsset(URL: audioList[index].assetURL!) // audioList is [MPMediaItem]. asset is class property

    let leftItem = AVPlayerItem(asset: asset)
    let rightItem = AVPlayerItem(asset: asset)

    var leftTap: Unmanaged<MTAudioProcessingTapRef>?
    var rightTap: Unmanaged<MTAudioProcessingTapRef>?

    MTAudioProcessingTapCreate(kCFAllocatorDefault, &left, kMTAudioProcessingTapCreationFlag_PreEffects, &leftTap)
    MTAudioProcessingTapCreate(kCFAllocatorDefault, &right, kMTAudioProcessingTapCreationFlag_PreEffects, &rightTap)

    let leftParams = AVMutableAudioMixInputParameters(track: asset.tracks[0])
    let rightParams = AVMutableAudioMixInputParameters(track: asset.tracks[0])

    leftParams.audioTapProcessor = leftTap?.takeUnretainedValue()
    rightParams.audioTapProcessor = rightTap?.takeUnretainedValue()

    let leftAudioMix = AVMutableAudioMix()
    let rightAudioMix = AVMutableAudioMix()
    leftAudioMix.inputParameters = [leftParams]
    rightAudioMix.inputParameters = [rightParams]

    leftItem.audioMix = leftAudioMix
    rightItem.audioMix = rightAudioMix

    // leftPlayer & rightPlayer are class properties
    leftPlayer = AVPlayer(playerItem: leftItem)
    rightPlayer = AVPlayer(playerItem: rightItem)

    leftPlayer.play()
    rightPlayer.play()

I use "TapType" to highlight channels and is defined (in Objective-C) as simple as:

    typedef NS_ENUM(NSUInteger, TapType) {
        TapTypeL = 0,
        TapTypeR = 1
    };

The MTAudioProcessingTap callbacks are created in much the same way as in the tutorial. When creating the tap, however, I store the TapType in its storage context so that I can check it later in the process callback:

    static void tap_InitLeftCallback(MTAudioProcessingTapRef tap, void *clientInfo, void **tapStorageOut) {
        struct AudioTapContext *context = calloc(1, sizeof(AudioTapContext));
        context->channel = TapTypeL;
        *tapStorageOut = context;
    }

And finally, the actual heavy lifting happens in the process callback, using vDSP functions:

    static void tap_ProcessCallback(MTAudioProcessingTapRef tap, CMItemCount numberFrames, MTAudioProcessingTapFlags flags,
                                    AudioBufferList *bufferListInOut, CMItemCount *numberFramesOut, MTAudioProcessingTapFlags *flagsOut)
    {
        // output channel is saved in context->channel
        AudioTapContext *context = (AudioTapContext *)MTAudioProcessingTapGetStorage(tap);

        // this fetches the audio for processing (and for output)
        OSStatus status;
        status = MTAudioProcessingTapGetSourceAudio(tap, numberFrames, bufferListInOut, flagsOut, NULL, numberFramesOut);

        // NB: we assume the audio is interleaved stereo, which means the length of mBuffers is 1
        // and the data alternates between L and R in `size` intervals.
        // If the audio weren't interleaved, L would be in mBuffers[0] and R in mBuffers[1]
        uint size = bufferListInOut->mBuffers[0].mDataByteSize / sizeof(float);
        float *left = bufferListInOut->mBuffers[0].mData;
        float *right = left + size;

        // this is where we even out the stereo
        // basically: L = (L + R) / 2, and R = (L + R) / 2
        // which is the same as: (L + R) * 0.5
        // "vasm" = add two vectors (L & R), multiply by scalar (0.5)
        float div = 0.5;
        vDSP_vasm(left, 1, right, 1, &div, left, 1, size);
        vDSP_vasm(right, 1, left, 1, &div, right, 1, size);

        // if we ended the processing here, the audio would be virtually mono
        // however, we want to use a distinct player for each channel, so here we zero out
        // (multiply the data by 0) the other one
        float zero = 0;
        if (context->channel == TapTypeL) {
            vDSP_vsmul(right, 1, &zero, right, 1, size);
        } else {
            vDSP_vsmul(left, 1, &zero, left, 1, size);
        }
    }
