Interpreting WAV Data

I am trying to write a program to display PCM data. I have been very frustrated trying to find a library with the right level of abstraction, but I found the Python wave library and have been using that. However, I am not sure how to interpret the data.

The wave.getparams function returns (2 channels, 2 bytes, 44100 Hz, 96333 frames, No compression, No compression). That all seems fine, but then I tried printing a single frame: '\xc0\xff\xd0\xff', which is 4 bytes. I suppose it is possible that a frame is 2 samples, but the ambiguities do not end there.

96333 frames * 2 samples/frame * (1 / 44.1k sec/sample) = 4.3688 seconds

However, iTunes reports the time as closer to 2 seconds, and calculations based on file size and bitrate are in the ballpark of 2.7 seconds. What's going on here?

Also, how do I know whether the bytes are signed or unsigned?

Many thanks!

+6
6 answers

"Two channels" means stereo, so it makes no sense to sum each channel's duration - that's why you are off by a factor of two (2.18 seconds, not 4.37). As for signedness, as explained for example here, and I quote:

8-bit samples are stored as unsigned bytes, ranging from 0 to 255. 16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767.

This is part of the specification of the WAV format (actually of its superset, RIFF) and therefore does not depend on which library you use to work with the WAV file.
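As a minimal sketch of the rule quoted above, the 4-byte frame from the question can be decoded with Python's struct module. The helper name decode_frame is hypothetical, and the explicit little-endian prefix ("<") is an assumption based on the usual WAV sample layout rather than something stated in the quote.

```python
import struct

def decode_frame(frame, sample_width, n_channels):
    """Decode one PCM frame into a tuple with one integer per channel."""
    if sample_width == 1:
        code = "B"   # 8-bit samples: unsigned, 0..255
    elif sample_width == 2:
        code = "h"   # 16-bit samples: signed two's complement
    else:
        raise ValueError("Only 8- and 16-bit samples are handled in this sketch.")
    # "<" forces little-endian byte order regardless of the host machine.
    return struct.unpack("<%d%s" % (n_channels, code), frame)

left, right = decode_frame(b"\xc0\xff\xd0\xff", sample_width=2, n_channels=2)
print(left, right)  # -64 -48
```

So the frame in the question is one sample per channel, both small negative values close to silence.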

+8

Thanks for the help! I got it working, and I'll post the solution here for everyone to use in case some other poor soul needs it:

    import wave
    import struct

    def pcm_channels(wave_file):
        """Given a file-like object or file path representing a wave file,
        decompose it into its constituent PCM data streams.

        Input: A file like object or file path
        Output: A list of lists of integers representing the PCM coded data stream
                channels and the sample rate of the channels (mixed rate channels
                not supported)
        """
        stream = wave.open(wave_file, "rb")

        num_channels = stream.getnchannels()
        sample_rate = stream.getframerate()
        sample_width = stream.getsampwidth()
        num_frames = stream.getnframes()

        raw_data = stream.readframes(num_frames)  # Returns byte data
        stream.close()

        total_samples = num_frames * num_channels

        # "<" = little-endian, so the result does not depend on the host CPU
        if sample_width == 1:
            fmt = "<%iB" % total_samples  # read unsigned chars
        elif sample_width == 2:
            fmt = "<%ih" % total_samples  # read signed 2 byte shorts
        else:
            raise ValueError("Only supports 8 and 16 bit audio formats.")

        integer_data = struct.unpack(fmt, raw_data)
        del raw_data  # Keep memory tidy (who knows how big it might be)

        # Samples are interleaved: sample i belongs to channel i % num_channels
        channels = [[] for _ in range(num_channels)]

        for index, value in enumerate(integer_data):
            bucket = index % num_channels
            channels[bucket].append(value)

        return channels, sample_rate
+17

I know an answer has already been accepted, but I did some work with audio a while ago, and you need to unpack the wave data, doing something like this.

    import struct
    # wavedatalength is the number of 16-bit samples (not bytes) in wavedata
    pcmdata = struct.unpack("<%dh" % wavedatalength, wavedata)

Also, the one package I used was called PyAudio, although I still had to use the wave package with it.

+4

Each sample is 16 bits and there are 2 channels, so each frame takes 4 bytes.

+2

Duration is simply the number of frames divided by the number of frames per second. From your data, this is: 96333 / 44100 = 2.18 seconds.
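The same calculation can be sketched with the wave module itself. The helper name wav_duration_seconds is hypothetical, and the silent in-memory file below is only a stand-in built from the parameters in the question, not the asker's actual file.

```python
import io
import wave

def wav_duration_seconds(wave_file):
    """Return the playing time of a WAV file (path or file-like object)."""
    stream = wave.open(wave_file, "rb")
    try:
        # One frame holds one sample per channel, so channels do not add time.
        return stream.getnframes() / float(stream.getframerate())
    finally:
        stream.close()

# Build a silent stand-in file: 2 channels, 2 bytes/sample, 44100 Hz, 96333 frames.
buffer = io.BytesIO()
writer = wave.open(buffer, "wb")
writer.setnchannels(2)
writer.setsampwidth(2)
writer.setframerate(44100)
writer.writeframes(b"\x00" * (96333 * 2 * 2))
writer.close()
buffer.seek(0)

print(round(wav_duration_seconds(buffer), 2))  # 2.18
```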

+2

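Based on this answer, you can get a good performance boost by using numpy.fromstring or numpy.fromfile. (In current NumPy, the binary mode of numpy.fromstring is deprecated; numpy.frombuffer is the replacement.) Also see this answer.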

Here is what I did:

    import numpy as np

    def interpret_wav(raw_bytes, n_frames, n_channels, sample_width, interleaved=True):

        if sample_width == 1:
            dtype = np.uint8  # unsigned char
        elif sample_width == 2:
            dtype = np.dtype("<i2")  # signed 2-byte short, little-endian
        else:
            raise ValueError("Only supports 8 and 16 bit audio formats.")

        # np.frombuffer replaces the deprecated binary mode of np.fromstring
        channels = np.frombuffer(raw_bytes, dtype=dtype)

        if interleaved:
            # channels are interleaved, i.e. sample N of channel M follows
            # sample N of channel M-1 in the raw data
            channels.shape = (n_frames, n_channels)
            channels = channels.T
        else:
            # channels are not interleaved. All samples from channel M occur
            # before all samples from channel M+1
            channels.shape = (n_channels, n_frames)

        return channels

Assigning a new value to shape will throw an error if it would require the data to be copied in memory. This is a good thing, since you want to use the data in place (using less time and memory overall). ndarray.T also does not copy (i.e. it returns a view) if possible, but I'm not sure how you can ensure that it does not copy.
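One way to check whether a copy was made is np.shares_memory. This is a small sketch, not part of the function above; the six sample values are made up for illustration, and np.frombuffer is used to read the bytes.

```python
import struct
import numpy as np

# Three made-up stereo frames: (1, 2), (3, 4), (5, 6), little-endian 16-bit.
raw = struct.pack("<6h", 1, 2, 3, 4, 5, 6)

samples = np.frombuffer(raw, dtype="<i2")  # flat, interleaved samples
frames = samples.reshape(3, 2)             # (n_frames, n_channels), a view
channels = frames.T                        # (n_channels, n_frames), a view

print(np.shares_memory(samples, channels))  # True: no copy was made
print(channels.tolist())                    # [[1, 3, 5], [2, 4, 6]]
```

Note that the transposed view is no longer contiguous in memory, so a later operation that needs contiguous data (for example np.ascontiguousarray) will copy at that point.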

Reading directly from the file with np.fromfile would be even better, but you would need to skip the header, for example by using a custom dtype. I have not tried it yet.

+1