7.1. Read, Write and Play Sound¶

We assume that PyAudio and soundfile are installed. There are basically two ways to read a file: reading all data into memory at once and secondly reading the data in small chunks (buffers or blocks). We will show both of them.

import numpy as np
import soundfile as sf

FILENAME = "source/data/sowhat.ogg"

print(sf.info(FILENAME))

data, samplerate = sf.read(FILENAME)
print()
print("data: ", data.shape)
print("sample rate: ", samplerate)

source/data/sowhat.ogg
samplerate: 44100 Hz
channels: 2
duration: 09:23.356 min
format: OGG (OGG Container format) [OGG]
subtype: Vorbis [VORBIS]

data:  (24844000, 2)
sample rate:  44100

Please note that the ordering of dimensions in the data array is (no_frames, no_channels), other packages may use the other ordering. In some cases you need to transpose the data (for instance to play it).

To play the audio data is dependent on the machine/program you are working with. For instance in an IPython notebook it is relatively easy (using the browser capabilities to deal with sound).

Fig. 7.1 Playing an audio fragment (numpy array) in a Jupyter notebook.¶

Another way of feeding the audio data to the sound system we can also use the portaudio library.

Instead of reading an entire audio file into a numpy array as done above. We could read the file an chunks (buffers or blocks) and feed these one after the other to the audio system.

To do so we switch to using the hardware audio system (Alsa on Linux systems) through the portaudio package. Portaudio is a cross platform library to access the audio system of the machine you are working on. In short, an audio signal is chopped up in blocks and those are obtained (using soundfile in case of a file and portaudio in case of for instance a microphone attached to your computer), then optionally the blocks are processed (DSP) and then fed to the audio system in blocks for audio rendering.

import soundfile as sf
import pyaudio
import numpy as np


pa = pyaudio.PyAudio()

FILE = "../../data/sowhat.wav"

chunk = 1024

f = sf.SoundFile(FILE)
print(f.channels)
print(f.samplerate)

stream = pa.open(format=pyaudio.paInt32,
                 channels=f.channels,
                 rate=f.samplerate,
                 frames_per_buffer=chunk,
                 output=True)

duration = 120 # play the file for the first 2 minutes
nblocks = duration * f.samplerate // chunk

for i, b in enumerate(f.blocks(blocksize=CHUNK, dtype=np.int32)):
    if i >= nblocks:
        break
    stream.write(b.tobytes())

stream.stop_stream()
stream.close()

pa.terminate()

The logic of the program above is quite simple.

open the audio file
open the default output stream
read the blocks (chunks) sequentially from the file and feed them to the output stream.

Note that the soundfile package returns numpy arrays whereas the pyaudio package works with raw databuffers. That is why we need the tobytes method for conversion.

Also note that in the loop where we ‘feed’ the output stream processing other than this audio processing is not simple to do. You can do some processing in the for loop but it is your responsibility to be in time feeding the output stream with new data. This blocking mode audio processing is therefore in most cases not the recommended way to do things.

It is better to let the output stream ‘ask’ for data when needed: the callback mode. In this mode you pass a function when opening the stream. The callback function then gets called whenever the stream needs data.

import soundfile as sf
import pyaudio
import numpy as np
import time


FILE = "../../data/sowhat.wav"

chunk = 1024
pa = pyaudio.PyAudio()

f = sf.SoundFile(FILE)

def callback(in_data, frame_count, time_info, status):
    data = f.read(chunk, dtype='int32').tobytes()
    return (data, pyaudio.paContinue)


stream = pa.open(format=pyaudio.paInt32,
                 channels=f.channels,
                 rate=f.samplerate,
                 frames_per_buffer=chunk,
                 output=True,
                 stream_callback=callback)

stream.start_stream()
start_time = time.time()
duration = 120 # play the first 2 minutes

while stream.is_active() and time.time() - start_time < duration:
    time.sleep(0.1) # just waiting for the stream to finish

stream.stop_stream()
stream.close()

pa.terminate()

Observe that the while loop is only used to wait either for the entire file to be played or the set duration to be exceeded. In an application that time would be spent on calculations, the user interface etc.

On my (RvdB) computer the above program ran fine. If you have a slower computer and you experience hick-ups in the sound rendering you might consider the enlarge the chunk size. Then you will need less callbacks from the audio system and thus less time will be spend in relatively slow Python code.

We can also use soundfile to write audio data to a file (in several formats, see the documentation).