I am trying to encode audio buffers coming from an AVCaptureSession using AudioConverter and then append them to an AVAssetWriter.
I am not getting any errors (including OSStatus return codes), and the CMSampleBuffers I create seem to contain valid data, yet the resulting file has no audible sound. When recording together with video, the video frames stop being appended after a few frames (appendSampleBuffer() returns false, but AVAssetWriter.error is nil), probably because the asset writer is waiting for the audio to catch up. I suspect it has something to do with the way I set up the priming information for AAC.
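For context, my understanding of AAC priming is that the encoder produces roughly 2112 frames of encoder delay at the start, and that the first encoded buffer is supposed to carry a kCMSampleBufferAttachmentKey_TrimDurationAtStart attachment telling the writer to trim them. A minimal sketch of how I believe that attachment should look (the function name, the 2112-frame figure, and the 44.1 kHz rate are assumptions, not something my code verifies):

import CoreMedia

// Sketch only: attach the assumed 2112-frame AAC priming duration to the
// first encoded buffer. The 2112 figure and the default sample rate are
// assumptions taken from Apple's AAC documentation.
func attachAssumedPrimingInfo(toBuffer buffer: CMSampleBuffer, sampleRate: Int32 = 44_100) {
    let primingDuration = CMTimeMake(2112, sampleRate)
    CMSetAttachment(buffer,
                    kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                    CMTimeCopyAsDictionary(primingDuration, kCFAllocatorDefault),
                    kCMAttachmentMode_ShouldNotPropagate)
}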
The application uses RxSwift, but I have removed the RxSwift-specific parts to make the code easier to follow for a wider audience.
Please also check the comments in the code below.
Given this settings struct:
import Foundation
import AVFoundation
import CleanroomLogger

public struct AVSettings {

    let orientation: AVCaptureVideoOrientation = .Portrait
    let sessionPreset = AVCaptureSessionPreset1280x720
    let videoBitrate: Int = 2_000_000
    let videoExpectedFrameRate: Int = 30
    let videoMaxKeyFrameInterval: Int = 60
    let audioBitrate: Int = 32 * 1024
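(The struct got cut off above; the audioOutputABSD and audioEncoderClassDescriptions properties used further down aren't shown. Purely as a hypothetical sketch of what they might look like, not the exact values from my project:)

// Hypothetical sketch of the truncated properties: an AAC output description
// whose sample rate and channel count get overwritten from the input buffer
// at encode time, plus a class description asking for the software encoder.
var audioOutputABSD: AudioStreamBasicDescription {
    return AudioStreamBasicDescription(
        mSampleRate: 44_100,
        mFormatID: kAudioFormatMPEG4AAC,
        mFormatFlags: 0,
        mBytesPerPacket: 0,        // variable-size packets for AAC
        mFramesPerPacket: 1024,    // AAC packets always hold 1024 frames
        mBytesPerFrame: 0,
        mChannelsPerFrame: 1,
        mBitsPerChannel: 0,
        mReserved: 0)
}

var audioEncoderClassDescriptions: [AudioClassDescription] {
    return [AudioClassDescription(mType: kAudioEncoderComponentType,
                                  mSubType: kAudioFormatMPEG4AAC,
                                  mManufacturer: kAppleSoftwareAudioCodecManufacturer)]
}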
Some helper functions:
public func getVideoDimensions(fromSettings settings: AVSettings) -> (Int, Int) {
    switch (settings.sessionPreset, settings.orientation) {
    case (AVCaptureSessionPreset1920x1080, .Portrait): return (1080, 1920)
    case (AVCaptureSessionPreset1280x720, .Portrait): return (720, 1280)
    default: fatalError("Unsupported session preset and orientation")
    }
}

public func createAudioFormatDescription(fromSettings settings: AVSettings) -> CMAudioFormatDescription {
    var result = noErr
    var absd = settings.audioOutputABSD
    var description: CMAudioFormatDescription?
    withUnsafePointer(&absd) { absdPtr in
        result = CMAudioFormatDescriptionCreate(nil, absdPtr, 0, nil, 0, nil, nil, &description)
    }
    if result != noErr {
        Log.error?.message("Could not create audio format description")
    }
    return description!
}

public func createVideoFormatDescription(fromSettings settings: AVSettings) -> CMVideoFormatDescription {
    var result = noErr
    var description: CMVideoFormatDescription?
    let (width, height) = getVideoDimensions(fromSettings: settings)
    result = CMVideoFormatDescriptionCreate(nil,
                                            kCMVideoCodecType_H264,
                                            Int32(width),
                                            Int32(height),
                                            [:],
                                            &description)
    if result != noErr {
        Log.error?.message("Could not create video format description")
    }
    return description!
}
Here's how the asset writer is initialized:
guard let audioDevice = defaultAudioDevice() else {
    throw RecordError.MissingDeviceFeature("Microphone")
}
guard let videoDevice = defaultVideoDevice(.Back) else {
    throw RecordError.MissingDeviceFeature("Camera")
}

let videoInput = try AVCaptureDeviceInput(device: videoDevice)
let audioInput = try AVCaptureDeviceInput(device: audioDevice)

let videoFormatHint = createVideoFormatDescription(fromSettings: settings)
let audioFormatHint = createAudioFormatDescription(fromSettings: settings)

let writerVideoInput = AVAssetWriterInput(mediaType: AVMediaTypeVideo,
                                          outputSettings: nil,
                                          sourceFormatHint: videoFormatHint)

let writerAudioInput = AVAssetWriterInput(mediaType: AVMediaTypeAudio,
                                          outputSettings: nil,
                                          sourceFormatHint: audioFormatHint)

writerVideoInput.expectsMediaDataInRealTime = true
writerAudioInput.expectsMediaDataInRealTime = true

let url = NSURL(fileURLWithPath: NSTemporaryDirectory(), isDirectory: true)
    .URLByAppendingPathComponent(NSProcessInfo.processInfo().globallyUniqueString)
    .URLByAppendingPathExtension("mp4")

let assetWriter = try AVAssetWriter(URL: url, fileType: AVFileTypeMPEG4)
if !assetWriter.canAddInput(writerVideoInput) {
    throw RecordError.Unknown("Could not add video input")
}
if !assetWriter.canAddInput(writerAudioInput) {
    throw RecordError.Unknown("Could not add audio input")
}
assetWriter.addInput(writerVideoInput)
assetWriter.addInput(writerAudioInput)
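For completeness, this is roughly how the writing session is started before any buffers are appended (a simplified sketch; firstSampleBuffer and encodedAudioBuffer are hypothetical stand-ins, and the real code also deals with the capture queue and error handling):

// Simplified sketch: start the writer and the session at the timestamp of
// the first captured buffer, then append buffers as they arrive.
assetWriter.startWriting()
assetWriter.startSessionAtSourceTime(CMSampleBufferGetPresentationTimeStamp(firstSampleBuffer))

// Later, for every encoded audio buffer:
if writerAudioInput.readyForMoreMediaData {
    if !writerAudioInput.appendSampleBuffer(encodedAudioBuffer) {
        Log.error?.message("Failed to append audio buffer: \(assetWriter.error)")
    }
}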
And this is how the audio samples are encoded; the problem area is most likely here. I have rewritten it so that it doesn't use any Rx-isms.
var outputABSD = settings.audioOutputABSD
var outputFormatDescription: CMAudioFormatDescription! = nil
CMAudioFormatDescriptionCreate(nil, &outputABSD, 0, nil, 0, nil, nil, &outputFormatDescription)

var converter: AudioConverter?

// Indicates whether priming information has been attached to the first buffer
var primed = false

func encodeAudioBuffer(settings: AVSettings, buffer: CMSampleBuffer) throws -> CMSampleBuffer? {

    // Create the audio converter if it is not available yet
    if converter == nil {
        var classDescriptions = settings.audioEncoderClassDescriptions
        var inputABSD = CMAudioFormatDescriptionGetStreamBasicDescription(
            CMSampleBufferGetFormatDescription(buffer)!).memory
        var outputABSD = settings.audioOutputABSD
        outputABSD.mSampleRate = inputABSD.mSampleRate
        outputABSD.mChannelsPerFrame = inputABSD.mChannelsPerFrame

        var converter: AudioConverterRef = nil
        var result = noErr
        result = withUnsafePointer(&outputABSD) { outputABSDPtr in
            return withUnsafePointer(&inputABSD) { inputABSDPtr in
                return AudioConverterNewSpecific(inputABSDPtr,
                                                 outputABSDPtr,
                                                 UInt32(classDescriptions.count),
                                                 &classDescriptions,
                                                 &converter)
            }
        }

        if result != noErr { throw RecordError.Unknown }

        // At this point I made an attempt to retrieve priming info from
        // the audio converter, assuming that it would give me back default
        // values I could use, but I ended up with `nil`
        var primeInfo: AudioConverterPrimeInfo? = nil
        var primeInfoSize = UInt32(sizeof(AudioConverterPrimeInfo))

        // The following returns `noErr`, but `primeInfo` is still `nil`
        // (I also tried setting `kAudioConverterPrimeInfo` myself so the
        // converter knows about it; that also returned `noErr`, but getting
        // the property afterwards still returned `nil`)
        AudioConverterGetProperty(converter,
                                  kAudioConverterPrimeInfo,
                                  &primeInfoSize,
                                  &primeInfo)
    }

    let converter = converter!

    // Wrap the input sample buffer in an `EncodeAudioUserData` instance so it
    // can be handed to the input data proc
    var userData = EncodeAudioUserData(inputSampleBuffer: buffer,
                                       inputBytesPerPacket: inputABSD.mBytesPerPacket)

    withUnsafeMutablePointer(&userData) { userDataPtr in
        // Pull input samples through `fetchAudioProc` and fill the output buffer list
        result = AudioConverterFillComplexBuffer(converter,
                                                 fetchAudioProc,
                                                 userDataPtr,
                                                 &outputPacketCount,
                                                 &outputData,
                                                 nil)
    }

    if result != noErr {
        Log.error?.message("Error while trying to encode audio buffer, code: \(result)")
        return nil
    }

    // Wrap the encoded data in a new sample buffer using the
    // `CMSampleBufferCreateCopy` helper shown further below
    guard let newBuffer = CMSampleBufferCreateCopy(buffer,
                                                   fromAudioBufferList: &outputData,
                                                   newFormatDescription: outputFormatDescription) else {
        Log.error?.message("Could not create sample buffer from audio buffer list")
        return nil
    }

    if !primed {
        primed = true
        // Attach priming information to the first buffer via the
        // `TrimDurationAtStart` attachment. Is there a way to derive this from
        // the converter or the `AudioBufferList` instead of hard-coding it?
        CMSetAttachment(newBuffer,
                        kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                        CMTimeCopyAsDictionary(primingDuration, nil),
                        kCMAttachmentMode_ShouldNotPropagate)
    }

    return newBuffer
}
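One thing I'm unsure about in the code above is passing an Optional AudioConverterPrimeInfo? to AudioConverterGetProperty. A sketch (untested) of how I assume the prime info would be read into a plain, non-optional struct and turned into a trim duration, reusing the converter and outputABSD from the function above:

// Sketch (untested): read the converter's prime info into a non-optional
// struct, then convert the leading frames into the trim duration for the
// first output buffer.
var primeInfo = AudioConverterPrimeInfo(leadingFrames: 0, trailingFrames: 0)
var primeInfoSize = UInt32(sizeof(AudioConverterPrimeInfo))
let status = AudioConverterGetProperty(converter,
                                       kAudioConverterPrimeInfo,
                                       &primeInfoSize,
                                       &primeInfo)
if status == noErr {
    // Express the leading (priming) frames as a CMTime at the output sample rate
    let primingDuration = CMTimeMake(Int64(primeInfo.leadingFrames),
                                     Int32(outputABSD.mSampleRate))
    // ...attach via kCMSampleBufferAttachmentKey_TrimDurationAtStart as shown earlier
}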
Below is the callback that supplies samples to the audio converter, along with the user data that is passed to it:
private class EncodeAudioUserData {
    var inputSampleBuffer: CMSampleBuffer?
    var inputBytesPerPacket: UInt32

    init(inputSampleBuffer: CMSampleBuffer, inputBytesPerPacket: UInt32) {
        self.inputSampleBuffer = inputSampleBuffer
        self.inputBytesPerPacket = inputBytesPerPacket
    }
}

private let fetchAudioProc: AudioConverterComplexInputDataProc = {
    (inAudioConverter, ioDataPacketCount, ioData, outDataPacketDescriptionPtrPtr, inUserData) in

    var result = noErr

    if ioDataPacketCount.memory == 0 { return noErr }

    let userData = UnsafeMutablePointer<EncodeAudioUserData>(inUserData).memory

    // If it has already been processed
    guard let buffer = userData.inputSampleBuffer else {
        ioDataPacketCount.memory = 0
        return -1
    }

    var inputBlockBuffer: CMBlockBuffer?
    var inputBufferList = AudioBufferList()
    result = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
        buffer,
        nil,
        &inputBufferList,
        sizeof(AudioBufferList),
        nil,
        nil,
        0,
        &inputBlockBuffer)

    if result != noErr {
        Log.error?.message("Error while trying to retrieve buffer list, code: \(result)")
        ioDataPacketCount.memory = 0
        return result
    }

    let packetsCount = inputBufferList.mBuffers.mDataByteSize / userData.inputBytesPerPacket
    ioDataPacketCount.memory = packetsCount

    ioData.memory.mBuffers.mNumberChannels = inputBufferList.mBuffers.mNumberChannels
    ioData.memory.mBuffers.mDataByteSize = inputBufferList.mBuffers.mDataByteSize
    ioData.memory.mBuffers.mData = inputBufferList.mBuffers.mData

    if outDataPacketDescriptionPtrPtr != nil {
        outDataPacketDescriptionPtrPtr.memory = nil
    }

    return noErr
}
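One detail I'm not sure about in the proc above: inputBlockBuffer goes out of scope when the callback returns, while ioData still points into its memory. A sketch of how I could keep it alive by parking it on the user data object (this assumes EncodeAudioUserData gains an extra inputBlockBuffer property; I haven't verified this is required):

// Sketch: hold on to the retained block buffer so the memory that `ioData`
// points at stays valid until the converter has consumed it, and mark the
// sample buffer as consumed so the next invocation reports no more data.
// Assumes `EncodeAudioUserData` has `var inputBlockBuffer: CMBlockBuffer?`.
userData.inputBlockBuffer = inputBlockBuffer
userData.inputSampleBuffer = nil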
This is how I convert an AudioBufferList into a CMSampleBuffer:
public func CMSampleBufferCreateCopy(
    buffer: CMSampleBuffer,
    inout fromAudioBufferList bufferList: AudioBufferList,
    newFormatDescription formatDescription: CMFormatDescription? = nil) -> CMSampleBuffer? {

    var result = noErr

    var sizeArray: [Int] = [Int(bufferList.mBuffers.mDataByteSize)]

    // Copy timing info from the previous buffer
    var timingInfo = CMSampleTimingInfo()
    result = CMSampleBufferGetSampleTimingInfo(buffer, 0, &timingInfo)
    if result != noErr { return nil }

    var newBuffer: CMSampleBuffer?
    result = CMSampleBufferCreateReady(
        kCFAllocatorDefault,
        nil,
        formatDescription ?? CMSampleBufferGetFormatDescription(buffer),
        Int(bufferList.mNumberBuffers),
        1,
        &timingInfo,
        1,
        &sizeArray,
        &newBuffer)
    if result != noErr { return nil }

    guard let b = newBuffer else { return nil }
    CMSampleBufferSetDataBufferFromAudioBufferList(b, nil, nil, 0, &bufferList)

    return newBuffer
}
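One thing I notice while writing this up: I never check the status of CMSampleBufferSetDataBufferFromAudioBufferList, so a silent failure there would hand back a sample buffer with no data attached, which would match the "no audible sound" symptom. A sketch of the check I'm planning to add in place of the bare call above:

// Sketch: surface a failure instead of silently returning a buffer
// that has no data attached to it.
let setResult = CMSampleBufferSetDataBufferFromAudioBufferList(b, nil, nil, 0, &bufferList)
if setResult != noErr {
    Log.error?.message("Could not set data buffer from audio buffer list, code: \(setResult)")
    return nil
}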
Is there something I'm obviously doing wrong? Is there a proper way to build a CMSampleBuffer from an AudioBufferList? How do you transfer the priming information from the converter to the CMSampleBuffers you create?
In my use case I need to do the encoding manually, as the buffers will be manipulated further down the pipeline (although I have disabled all transformations after the encoding step to make sure that isn't the problem).
Any help would be greatly appreciated. Sorry there is so much code to digest, but I wanted to provide as much context as possible.
Thanks in advance:)
Some related questions:
- CMSampleBufferRef kCMSampleBufferAttachmentKey_TrimDurationAtStart crash
- Can I use AVCaptureSession to encode an AAC stream to memory?
- Writing video + generated sound on AVAssetWriterInput, stuttering sound
- How to use CoreAudio's AudioConverter to encode AAC in real time?
Some links that I used: