If the input audio files are uncompressed (WAV files and the like), the sound library I like to use is libsndfile. There appears to be a Python wrapper for it here: https://code.google.com/p/libsndfile-python/ . With that in hand, the rest can be done as follows:
1. Open the output audio file for writing with libsndfile.
2. For each input audio file, open an input stream with libsndfile.
3. Look up the metadata for that audio file (such as its target start time) in your text 'script' description.
4. Write whatever silence the main output stream needs, then write the data from the input stream to the output stream, keeping track of the current position/time. Repeat this for each input audio file, checking that the clip's target start time is always >= the current position/time noted earlier. If it is not, you have an overlap.
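The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: it uses Python's standard-library `wave` module instead of libsndfile, so that it runs without extra packages. It also assumes 16-bit PCM WAV input, for which zero bytes are silence, and that the 'script' has already been parsed into a list of `(start_seconds, path)` pairs sorted by start time. The function name `assemble` and its parameters are made up for this example.

```python
import wave


def assemble(output_path, clips, sample_rate=44100, channels=1, sample_width=2):
    """Write clips into one WAV file, padding the gaps with silence.

    clips: list of (start_seconds, input_path), sorted by start time
           (a hypothetical, already-parsed form of the 'script').
    Returns the final position in frames.
    """
    frame_size = channels * sample_width  # bytes per frame
    position = 0  # current position in the output, in frames

    with wave.open(output_path, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(sample_rate)

        for start_seconds, path in clips:
            start = round(start_seconds * sample_rate)
            if start < position:
                # Target start lies before the current position: overlap.
                raise ValueError(f"{path} overlaps the previous clip")

            # Pad with silence up to the clip's start time.
            # Zero bytes are silence for 16-bit signed PCM (assumed here).
            out.writeframes(b"\x00" * ((start - position) * frame_size))
            position = start

            with wave.open(path, "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if fmt != (channels, sample_width, sample_rate):
                    raise ValueError(f"{path} does not match the output format")
                # Copy the clip in chunks and advance the position.
                while True:
                    chunk = src.readframes(4096)
                    if not chunk:
                        break
                    out.writeframes(chunk)
                    position += len(chunk) // frame_size

    return position
```

A call such as `assemble("out.wav", [(0.5, "a.wav"), (2.0, "b.wav")])` would write half a second of silence, then `a.wav`, then silence up to the 2-second mark, then `b.wav`. The same loop should carry over to libsndfile, with its own open/read/write calls in place of the `wave` ones.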
Of course, you still need to worry about matching sample rates, etc., but this should be enough to get started. Also, I'm not quite sure whether you are trying to write a single output file or one for each input file, but this answer should be tweakable enough for either. libsndfile will give you all the information you need (e.g. clip length) as long as it supports the input file format.
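If you would rather detect overlaps before writing anything, the clip length from the file header is enough. Here is a small sketch, again using the standard-library `wave` module as a stand-in for libsndfile. The helper names `clip_duration` and `find_overlaps` are hypothetical, and the `(start_seconds, path)` pairs are an assumed form of the parsed 'script'.

```python
import wave


def clip_duration(path):
    """Return the clip length in seconds, read from the WAV header."""
    with wave.open(path, "rb") as src:
        return src.getnframes() / src.getframerate()


def find_overlaps(clips):
    """Return (earlier_path, later_path) pairs whose time ranges overlap.

    clips: list of (start_seconds, path) pairs in any order.
    """
    overlaps = []
    previous_end, previous_path = 0.0, None
    for start, path in sorted(clips):
        # A clip overlaps if it starts before the furthest end seen so far.
        if previous_path is not None and start < previous_end:
            overlaps.append((previous_path, path))
        end = start + clip_duration(path)
        if end > previous_end:
            previous_end, previous_path = end, path
    return overlaps
```

An empty result means every clip starts at or after the end of the one before it, so the files can be written out in order with only silence between them.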