Python Audio Pitch

I am trying to use pyaudio to create a voice masker. With my current setup, all I need to do is read the sound in, change the pitch on the fly, and write it straight back out. The first and last parts work, and I think I'm getting close to changing the pitch... emphasis on "think."

Unfortunately, I am not very familiar with the kind of data I am working with or how to manipulate it the way I want. I went through the audioop documentation and did not find what I needed (though I think there are some things in there I could definitely use). I guess what I am asking is:

How is the data formatted within these audio frames?

How can I change the pitch of a frame (if that is possible), or does it not work that way at all?

    import pyaudio
    import sys
    import numpy as np
    import wave
    import audioop
    import struct

    chunk = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 41000
    RECORD_SECONDS = 5
    p = pyaudio.PyAudio()

    stream = p.open(format = FORMAT,
                    channels = CHANNELS,
                    rate = RATE,
                    input = True,
                    output = True,
                    frames_per_buffer = chunk)

    swidth = 2

    print "* recording"

    while(True):
        data = stream.read(chunk)
        data = np.array(wave.struct.unpack("%dh"%(len(data)/swidth), data))*2

        data = np.fft.rfft(data)
        #MANipulation
        data = np.fft.irfft(data)

        stream.write(data3, chunk)

    print "* done"

    stream.stop_stream()
    stream.close()
    p.terminate()
2 answers

To change the pitch, you need to perform an FFT over several frames, shift the data in frequency (move it to different frequency bins), and then perform the inverse FFT.
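Moving data between frequency bins, as described above, could look roughly like this for a single chunk. This is only a minimal sketch: the function name `shift_bins` and its `n_bins` parameter are made up for illustration, it assumes the chunk has already been decoded to 16-bit samples, and it shifts every bin by the same amount, which is a frequency shift rather than a true pitch scaling. Processing each chunk independently will also tend to produce audible artifacts at chunk boundaries.

```python
import numpy as np

def shift_bins(samples, n_bins):
    """Shift the spectrum of one chunk by n_bins frequency bins.

    samples: 1-D array of int16 samples (hypothetical helper, not part of pyaudio).
    n_bins:  positive moves content up in frequency, negative moves it down.
    """
    spectrum = np.fft.rfft(samples.astype(np.float64))
    shifted = np.zeros_like(spectrum)
    if n_bins >= 0:
        # Move each bin up; the top n_bins bins fall off the end.
        shifted[n_bins:] = spectrum[:len(spectrum) - n_bins]
    else:
        # Move each bin down; the lowest bins are discarded.
        shifted[:n_bins] = spectrum[-n_bins:]
    out = np.fft.irfft(shifted, n=len(samples))
    # Clip before casting so loud samples do not wrap around.
    return np.clip(out, -32768, 32767).astype(np.int16)
```

With a chunk of 1024 samples, each bin is RATE / 1024 Hz wide, so the shift in Hz depends on both the chunk size and the sample rate.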

If you do not mind the piece of sound getting longer as the pitch goes down (or shorter as the pitch goes up), you could resample the frames instead. For example, you can double each frame (insert a copy of each frame into the stream), which lowers both the playback speed and the pitch. You can then improve the sound quality by upgrading the resampling algorithm to use some interpolation and/or filtering.
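The resampling idea can be sketched with linear interpolation, which is one step up from simply duplicating each frame. The helper name `resample_chunk` and the `factor` parameter are assumptions for illustration only; a factor of 2.0 produces twice as many samples, so the chunk plays back slower and lower at the same sample rate. No filtering is applied here, so shrinking (factor below 1) may alias.

```python
import numpy as np

def resample_chunk(samples, factor):
    """Stretch (factor > 1) or shrink (factor < 1) a chunk of int16 samples.

    Hypothetical helper: uses plain linear interpolation, no anti-alias filter.
    """
    n_out = int(round(len(samples) * factor))
    # Positions in the input at which each output sample is read.
    positions = np.linspace(0, len(samples) - 1, n_out)
    out = np.interp(positions, np.arange(len(samples)), samples)
    return np.round(out).astype(np.int16)
```

Because the output length differs from the input length, a live read/write loop would have to buffer or drop samples to stay in sync.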


After the irfft line and before the stream.write line, you need to convert the data back to 16-bit integers and pack it with a call to struct.pack.

    data = np.fft.irfft(data)
    dataout = np.array(data*0.5, dtype='int16')  # undo the *2 that was done at reading
    chunkout = struct.pack("%dh"%(len(dataout)), *list(dataout))  # convert back to 16-bit data
    stream.write(chunkout)
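As for how the data is formatted: with paInt16 and one channel, each chunk read from the stream is a byte string holding one signed 16-bit integer (2 bytes) per sample. A possible alternative to struct for the round trip is NumPy's own buffer conversion. The helper names below are made up for illustration, and the sketch assumes mono audio in the machine's native byte order.

```python
import numpy as np

def bytes_to_samples(raw):
    """Decode a raw paInt16 mono chunk into an array of int16 samples."""
    return np.frombuffer(raw, dtype=np.int16)

def samples_to_bytes(samples):
    """Encode samples (int or float) back into raw 16-bit bytes for stream.write."""
    # Clip first so out-of-range values saturate instead of wrapping around.
    clipped = np.clip(samples, -32768, 32767)
    return clipped.astype(np.int16).tobytes()
```

In the loop above, this would mean decoding the result of stream.read(chunk), processing the array, and passing the encoded bytes to stream.write.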
