AAC encoding using AudioConverter and writing to AVAssetWriter

I am trying to encode audio buffers coming from an AVCaptureSession using AudioConverter and then append them to an AVAssetWriter.

I am not getting any errors (including OSStatus results), and the CMSampleBuffers I create seem to contain valid data, but the resulting file simply has no audible sound. When recording together with video, the video frames stop being appended after a couple of frames (appendSampleBuffer() returns false, but there is no AVAssetWriter.error), probably because the asset writer is waiting for the audio to catch up. I suspect it has something to do with the way I am setting up the priming for AAC.

The application uses RxSwift, but I have removed the RxSwift parts to make it easier to understand for a wider audience.

Please check the comments throughout the code below for more context.

Given this settings struct:

    import Foundation
    import AVFoundation
    import CleanroomLogger

    public struct AVSettings {

        let orientation: AVCaptureVideoOrientation = .Portrait
        let sessionPreset = AVCaptureSessionPreset1280x720
        let videoBitrate: Int = 2_000_000
        let videoExpectedFrameRate: Int = 30
        let videoMaxKeyFrameInterval: Int = 60
        let audioBitrate: Int = 32 * 1024

        /// Settings that are `0` mean variable rate.
        /// The `mSampleRate` and `mChannelsPerFrame` are overwritten at run-time
        /// with values based on the input stream.
        let audioOutputABSD = AudioStreamBasicDescription(
            mSampleRate: AVAudioSession.sharedInstance().sampleRate,
            mFormatID: kAudioFormatMPEG4AAC,
            mFormatFlags: UInt32(MPEG4ObjectID.AAC_Main.rawValue),
            mBytesPerPacket: 0,
            mFramesPerPacket: 1024,
            mBytesPerFrame: 0,
            mChannelsPerFrame: 1,
            mBitsPerChannel: 0,
            mReserved: 0)

        let audioEncoderClassDescriptions = [
            AudioClassDescription(
                mType: kAudioEncoderComponentType,
                mSubType: kAudioFormatMPEG4AAC,
                mManufacturer: kAppleSoftwareAudioCodecManufacturer) ]
    }

Some helper functions:

    public func getVideoDimensions(fromSettings settings: AVSettings) -> (Int, Int) {
        switch (settings.sessionPreset, settings.orientation) {
        case (AVCaptureSessionPreset1920x1080, .Portrait): return (1080, 1920)
        case (AVCaptureSessionPreset1280x720, .Portrait): return (720, 1280)
        default: fatalError("Unsupported session preset and orientation")
        }
    }

    public func createAudioFormatDescription(fromSettings settings: AVSettings) -> CMAudioFormatDescription {
        var result = noErr
        var absd = settings.audioOutputABSD
        var description: CMAudioFormatDescription?
        withUnsafePointer(&absd) { absdPtr in
            result = CMAudioFormatDescriptionCreate(nil, absdPtr, 0, nil, 0, nil, nil, &description)
        }

        if result != noErr {
            Log.error?.message("Could not create audio format description")
        }

        return description!
    }

    public func createVideoFormatDescription(fromSettings settings: AVSettings) -> CMVideoFormatDescription {
        var result = noErr
        var description: CMVideoFormatDescription?
        let (width, height) = getVideoDimensions(fromSettings: settings)
        result = CMVideoFormatDescriptionCreate(nil,
                                                kCMVideoCodecType_H264,
                                                Int32(width),
                                                Int32(height),
                                                [:],
                                                &description)

        if result != noErr {
            Log.error?.message("Could not create video format description")
        }

        return description!
    }

Here is how the asset writer is initialized:

    guard let audioDevice = defaultAudioDevice() else {
        throw RecordError.MissingDeviceFeature("Microphone")
    }

    guard let videoDevice = defaultVideoDevice(.Back) else {
        throw RecordError.MissingDeviceFeature("Camera")
    }

    let videoInput = try AVCaptureDeviceInput(device: videoDevice)
    let audioInput = try AVCaptureDeviceInput(device: audioDevice)

    let videoFormatHint = createVideoFormatDescription(fromSettings: settings)
    let audioFormatHint = createAudioFormatDescription(fromSettings: settings)

    let writerVideoInput = AVAssetWriterInput(mediaType: AVMediaTypeVideo,
                                              outputSettings: nil,
                                              sourceFormatHint: videoFormatHint)

    let writerAudioInput = AVAssetWriterInput(mediaType: AVMediaTypeAudio,
                                              outputSettings: nil,
                                              sourceFormatHint: audioFormatHint)

    writerVideoInput.expectsMediaDataInRealTime = true
    writerAudioInput.expectsMediaDataInRealTime = true

    let url = NSURL(fileURLWithPath: NSTemporaryDirectory(), isDirectory: true)
        .URLByAppendingPathComponent(NSProcessInfo.processInfo().globallyUniqueString)
        .URLByAppendingPathExtension("mp4")

    let assetWriter = try AVAssetWriter(URL: url, fileType: AVFileTypeMPEG4)
    if !assetWriter.canAddInput(writerVideoInput) {
        throw RecordError.Unknown("Could not add video input")
    }

    if !assetWriter.canAddInput(writerAudioInput) {
        throw RecordError.Unknown("Could not add audio input")
    }

    assetWriter.addInput(writerVideoInput)
    assetWriter.addInput(writerAudioInput)

And this is how the audio samples are encoded; the problem area is most likely here. I have rewritten it so that it does not use any Rx-isms.

    var outputABSD = settings.audioOutputABSD
    var outputFormatDescription: CMAudioFormatDescription! = nil
    CMAudioFormatDescriptionCreate(nil, &outputABSD, 0, nil, 0, nil, nil, &outputFormatDescription)

    var converter: AudioConverterRef = nil

    // Indicates whether priming information has been attached to the first buffer
    var primed = false

    func encodeAudioBuffer(settings: AVSettings, buffer: CMSampleBuffer) throws -> CMSampleBuffer? {

        var inputABSD = CMAudioFormatDescriptionGetStreamBasicDescription(
            CMSampleBufferGetFormatDescription(buffer)!).memory

        // Create the audio converter if it is not available yet
        if converter == nil {
            var classDescriptions = settings.audioEncoderClassDescriptions
            var outputABSD = settings.audioOutputABSD
            outputABSD.mSampleRate = inputABSD.mSampleRate
            outputABSD.mChannelsPerFrame = inputABSD.mChannelsPerFrame

            var result = noErr
            result = withUnsafePointer(&outputABSD) { outputABSDPtr in
                return withUnsafePointer(&inputABSD) { inputABSDPtr in
                    return AudioConverterNewSpecific(inputABSDPtr,
                                                     outputABSDPtr,
                                                     UInt32(classDescriptions.count),
                                                     &classDescriptions,
                                                     &converter)
                }
            }

            if result != noErr { throw RecordError.Unknown }

            // At this point I made an attempt to retrieve priming info from
            // the audio converter, assuming that it would give me back default
            // values I could use, but I ended up with `nil`
            var primeInfo: AudioConverterPrimeInfo? = nil
            var primeInfoSize = UInt32(sizeof(AudioConverterPrimeInfo))

            // The following returns `noErr` but `primeInfo` is still `nil`
            AudioConverterGetProperty(converter,
                                      kAudioConverterPrimeInfo,
                                      &primeInfoSize,
                                      &primeInfo)

            // I've also tried to set `kAudioConverterPrimeInfo` so that it knows
            // which leading frames are being primed, but the set didn't seem to work
            // (`noErr`, but getting the property afterwards still returned `nil`)
        }

        // Need to give a big enough output buffer.
        // The assumption is that it will always be <= the input size
        let numSamples = CMSampleBufferGetNumSamples(buffer)

        // This becomes 1024 * 2 = 2048
        let outputBufferSize = numSamples * Int(inputABSD.mBytesPerPacket)
        let outputBufferPtr = UnsafeMutablePointer<Void>.alloc(outputBufferSize)

        defer {
            outputBufferPtr.destroy()
            outputBufferPtr.dealloc(outputBufferSize)
        }

        var result = noErr
        var outputPacketCount = UInt32(1)
        var outputData = AudioBufferList(
            mNumberBuffers: 1,
            mBuffers: AudioBuffer(
                mNumberChannels: outputABSD.mChannelsPerFrame,
                mDataByteSize: UInt32(outputBufferSize),
                mData: outputBufferPtr))

        // See below for `EncodeAudioUserData`
        var userData = EncodeAudioUserData(inputSampleBuffer: buffer,
                                           inputBytesPerPacket: inputABSD.mBytesPerPacket)

        withUnsafeMutablePointer(&userData) { userDataPtr in
            // See below for `fetchAudioProc`
            result = AudioConverterFillComplexBuffer(
                converter,
                fetchAudioProc,
                userDataPtr,
                &outputPacketCount,
                &outputData,
                nil)
        }

        if result != noErr {
            Log.error?.message("Error while trying to encode audio buffer, code: \(result)")
            return nil
        }

        // See below for `CMSampleBufferCreateCopy`
        guard let newBuffer = CMSampleBufferCreateCopy(buffer,
                                                       fromAudioBufferList: &outputData,
                                                       newFormatDescription: outputFormatDescription) else {
            Log.error?.message("Could not create sample buffer from audio buffer list")
            return nil
        }

        if !primed {
            primed = true

            // Simply picked 2112 samples based on convention, is there a better way to determine this?
            let samplesToPrime: Int64 = 2112
            let samplesPerSecond = Int32(settings.audioOutputABSD.mSampleRate)
            let primingDuration = CMTimeMake(samplesToPrime, samplesPerSecond)

            // Without setting the attachment the asset writer will complain about the
            // first buffer missing the `TrimDurationAtStart` attachment; is there a way
            // to infer the value from the given `AudioBufferList`?
            CMSetAttachment(newBuffer,
                            kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                            CMTimeCopyAsDictionary(primingDuration, nil),
                            kCMAttachmentMode_ShouldNotPropagate)
        }

        return newBuffer
    }

Below is the proc that fetches samples for the audio converter, along with the user data that is passed to it:

    private class EncodeAudioUserData {
        var inputSampleBuffer: CMSampleBuffer?
        var inputBytesPerPacket: UInt32

        init(inputSampleBuffer: CMSampleBuffer, inputBytesPerPacket: UInt32) {
            self.inputSampleBuffer = inputSampleBuffer
            self.inputBytesPerPacket = inputBytesPerPacket
        }
    }

    private let fetchAudioProc: AudioConverterComplexInputDataProc = {
        (inAudioConverter, ioDataPacketCount, ioData, outDataPacketDescriptionPtrPtr, inUserData) in

        var result = noErr

        if ioDataPacketCount.memory == 0 { return noErr }

        let userData = UnsafeMutablePointer<EncodeAudioUserData>(inUserData).memory

        // If the input buffer has already been consumed, signal that there is no more data
        guard let buffer = userData.inputSampleBuffer else {
            ioDataPacketCount.memory = 0
            return -1
        }

        var inputBlockBuffer: CMBlockBuffer?
        var inputBufferList = AudioBufferList()
        result = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
            buffer,
            nil,
            &inputBufferList,
            sizeof(AudioBufferList),
            nil,
            nil,
            0,
            &inputBlockBuffer)

        if result != noErr {
            Log.error?.message("Error while trying to retrieve buffer list, code: \(result)")
            ioDataPacketCount.memory = 0
            return result
        }

        let packetsCount = inputBufferList.mBuffers.mDataByteSize / userData.inputBytesPerPacket
        ioDataPacketCount.memory = packetsCount

        ioData.memory.mBuffers.mNumberChannels = inputBufferList.mBuffers.mNumberChannels
        ioData.memory.mBuffers.mDataByteSize = inputBufferList.mBuffers.mDataByteSize
        ioData.memory.mBuffers.mData = inputBufferList.mBuffers.mData

        if outDataPacketDescriptionPtrPtr != nil {
            outDataPacketDescriptionPtrPtr.memory = nil
        }

        return noErr
    }

This is how I convert an AudioBufferList into a CMSampleBuffer:

    public func CMSampleBufferCreateCopy(
        buffer: CMSampleBuffer,
        inout fromAudioBufferList bufferList: AudioBufferList,
        newFormatDescription formatDescription: CMFormatDescription? = nil) -> CMSampleBuffer? {

        var result = noErr

        var sizeArray: [Int] = [Int(bufferList.mBuffers.mDataByteSize)]

        // Copy timing info from the previous buffer
        var timingInfo = CMSampleTimingInfo()
        result = CMSampleBufferGetSampleTimingInfo(buffer, 0, &timingInfo)
        if result != noErr { return nil }

        var newBuffer: CMSampleBuffer?
        result = CMSampleBufferCreateReady(
            kCFAllocatorDefault,
            nil,
            formatDescription ?? CMSampleBufferGetFormatDescription(buffer),
            Int(bufferList.mNumberBuffers),
            1, &timingInfo,
            1, &sizeArray,
            &newBuffer)
        if result != noErr { return nil }

        guard let b = newBuffer else { return nil }
        CMSampleBufferSetDataBufferFromAudioBufferList(b, nil, nil, 0, &bufferList)

        return newBuffer
    }

Is there something that I'm obviously doing wrong? Is there a proper way to build a CMSampleBuffer from an AudioBufferList? How do you carry the priming information over from the converter to the CMSampleBuffer that you create?

In my use case, I need to do the encoding manually, as the buffers will be manipulated further down the pipeline (although I have turned off all transformations after the encoding to make sure the encoding itself works).

Any help would be greatly appreciated. Sorry there is so much code to digest, but I wanted to provide as much context as possible.

Thanks in advance:)


Some related questions:

  • CMSampleBufferRef kCMSampleBufferAttachmentKey_TrimDurationAtStart crash
  • Can I use AVCaptureSession to encode an AAC stream to memory?
  • Writing video + generated sound on AVAssetWriterInput, stuttering sound
  • How to use CoreAudio's AudioConverter to encode AAC in real time?


1 answer

Turns out there were a lot of things that I was doing wrong. Instead of posting a wall of code, I'm going to try to organize this into bite-sized pieces of the things that I discovered.


Samples vs. Packets and Frames

This was a huge source of confusion for me:

  • Each CMSampleBuffer can contain 1 or more sample buffers (found out via CMSampleBufferGetNumSamples ).
  • Each sample buffer containing 1 sample represents a single audio packet .
  • Therefore, CMSampleBufferGetNumSamples(sample) returns the number of packets contained in the given buffer.
  • Packets contain frames . This is governed by the mFramesPerPacket property of the buffer's AudioStreamBasicDescription . For linear PCM buffers, the total size of each sample buffer is frames * bytes per frame . For compressed buffers (such as AAC), there is no relationship between the total size and the frame count. (A short sketch of how these relate follows this list.)
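
A minimal sketch of how these relate, assuming `buffer` is a linear PCM CMSampleBuffer coming straight from the capture output:

    // Number of packets in this buffer; for linear PCM from the capture session
    // this equals the number of frames, since mFramesPerPacket == 1.
    let packetCount = CMSampleBufferGetNumSamples(buffer)

    // The stream description tells you how packets map to frames and bytes.
    let absd = CMAudioFormatDescriptionGetStreamBasicDescription(
        CMSampleBufferGetFormatDescription(buffer)!).memory

    let framesPerPacket = Int(absd.mFramesPerPacket)          // 1 for linear PCM, 1024 for AAC
    let totalFrames = packetCount * framesPerPacket
    let totalBytes = packetCount * Int(absd.mBytesPerPacket)  // meaningful for PCM only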

AudioConverterComplexInputDataProc

This callback is used to fetch more linear PCM audio data for encoding. It is mandatory that you deliver at least the number of packets specified by ioNumberDataPackets . Since I was using the converter for real-time, push-style encoding, I needed to make sure that each push of data contained at least that minimum number of packets. Something like this (pseudo-code):

    let minimumPackets = outputFramesPerPacket / inputFramesPerPacket
    var buffers: [CMSampleBuffer] = []
    while getTotalSize(buffers) < minimumPackets {
        buffers = buffers + [getNextBuffer()]
    }
    AudioConverterFillComplexBuffer(...)
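
A slightly more concrete sketch of that push-style accumulation, with hypothetical names (`pendingBuffers`, `enqueue`) and the assumption that the input is PCM where one packet equals one frame:

    let aacFramesPerPacket = 1024      // what the AAC encoder expects per output packet
    var pendingBuffers: [CMSampleBuffer] = []
    var pendingFrames = 0

    func enqueue(buffer: CMSampleBuffer) {
        pendingBuffers.append(buffer)
        pendingFrames += CMSampleBufferGetNumSamples(buffer) // 1 frame per PCM packet

        guard pendingFrames >= aacFramesPerPacket else { return }

        // Enough PCM queued for at least one output packet: drain `pendingBuffers`
        // through AudioConverterFillComplexBuffer(...), then keep whatever frames
        // were not consumed around for the next call.
    }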

Slicing CMSampleBuffer

You can actually slice CMSampleBuffers if they contain multiple samples. The tool for this is CMSampleBufferCopySampleBufferForRange . This is nice because you can supply the AudioConverterComplexInputDataProc with the exact number of packets it asks for, which makes handling the timing information of the resulting encoded buffer easier. Because if you give the converter 1500 frames of data when it expects 1024 , the resulting sample buffer will have a duration of 1024/sampleRate as opposed to 1500/sampleRate .
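
For example, slicing off the first `count` samples of a buffer might look roughly like this (a sketch, error handling kept minimal):

    func sliceBuffer(buffer: CMSampleBuffer, count: Int) -> CMSampleBuffer? {
        var slice: CMSampleBuffer?
        let result = CMSampleBufferCopySampleBufferForRange(
            kCFAllocatorDefault,      // allocator
            buffer,                   // source buffer
            CFRangeMake(0, count),    // sample range to copy
            &slice)                   // sliced buffer out
        return result == noErr ? slice : nil
    }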


Priming and trim duration

When doing AAC encoding, you must set the trim duration like this:

    CMSetAttachment(buffer,
                    kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                    CMTimeCopyAsDictionary(primingDuration, kCFAllocatorDefault),
                    kCMAttachmentMode_ShouldNotPropagate)

One thing that I did wrong was adding the trim duration at encode time. This should be handled by your writer so that it can guarantee the information gets added to your leading audio frames.

In addition, the value of kCMSampleBufferAttachmentKey_TrimDurationAtStart should never be greater than the duration of the sample buffer. An example of priming:

  • Priming frames: 2112
  • Sample rate: 44100
  • Priming duration: 2112 / 44100 = ~0.0479s
  • First buffer, frames: 1024 , trim duration: 1024 / 44100
  • Second buffer, frames: 1024 , trim duration: 1088 / 44100
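
Expressed as CMTime values, mirroring the numbers above (a sketch only):

    let primingFrames: Int64 = 2112
    let sampleRate: Int32 = 44100

    let totalPrimingDuration = CMTimeMake(primingFrames, sampleRate)        // ~0.0479s in total
    let firstBufferTrim      = CMTimeMake(1024, sampleRate)                 // trimmed from the first buffer
    let secondBufferTrim     = CMTimeMake(primingFrames - 1024, sampleRate) // remaining 1088 frames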

Creating a New CMSampleBuffer

AudioConverterFillComplexBuffer has an optional outputPacketDescriptionsPtr . You should use it. It will point to a new array of packet descriptions that contains sample size information. You need this sample size information to construct the new compressed sample buffer:

    var bufferList: AudioBufferList                          // filled by AudioConverterFillComplexBuffer
    var packetDescriptions: [AudioStreamPacketDescription]   // filled in via outputPacketDescriptionsPtr
    var newBuffer: CMSampleBuffer?

    CMAudioSampleBufferCreateWithPacketDescriptions(
        kCFAllocatorDefault,            // allocator
        nil,                            // dataBuffer
        false,                          // dataReady
        nil,                            // makeDataReadyCallback
        nil,                            // makeDataReadyRefCon
        formatDescription,              // formatDescription
        Int(bufferList.mNumberBuffers), // numSamples
        CMSampleBufferGetPresentationTimeStamp(buffer), // sbufPTS (first PTS)
        &packetDescriptions,            // packetDescriptions
        &newBuffer)
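
The encoded bytes themselves still need to be attached to the newly created buffer; I assume the same call used earlier in the question works here (a sketch, not verified against the final code):

    // Attach the encoded bytes produced by AudioConverterFillComplexBuffer
    // to the freshly created sample buffer.
    if let newBuffer = newBuffer {
        CMSampleBufferSetDataBufferFromAudioBufferList(newBuffer, nil, nil, 0, &bufferList)
    }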
