Use Whisper with Python

Then they said, “Come, let us build ourselves a city, with a tower that reaches to the heavens, so that we may make a name for ourselves; otherwise we will be scattered over the face of the whole earth.”

Transcribe and translate

Whisper, OpenAI’s free speech recognition model, can take audio in 99 languages and either transcribe it to text or translate it into English. I will show you how to use Python to record a 30-second clip and feed it through the model.

Install dependencies

Paste this into your command line:

pip install git+https://github.com/openai/whisper.git
pip install pyaudio

You may need to use pip3 instead of pip.
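
Whisper decodes audio files by calling the ffmpeg command-line tool, so it is worth confirming that ffmpeg is on your PATH before going further. Here is a minimal sketch of such a check using only the standard library; `missing_tools` is a hypothetical helper name made up for this example, not something Whisper provides.

```python
import shutil


def missing_tools(names=("ffmpeg",)):
    """Return the command-line tools from `names` that are not on PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Install these before running Whisper:", ", ".join(missing))
else:
    print("ffmpeg found")
```

If ffmpeg turns out to be missing, your system's package manager is usually the easiest way to get it.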

Import libraries

import pyaudio
import wave
import whisper

Load the model

# The model may take a little while to load
model = whisper.load_model("base")

Available models include tiny, base, small, medium, and large. Larger models are more accurate but slower to load and run.

Assign variables, create PyAudio object

# Record in chunks of 1024 samples
chunk = 1024 
# 16 bits per sample
sample_format = pyaudio.paInt16 
channels = 1
# Record at 44100 samples per second
fs = 44100 
seconds = 30
# pick the filename you want
filename = "output.wav" 

p = pyaudio.PyAudio() # Create an interface to PortAudio

Record a clip

Here is where you will want to start speaking into the mic.

print('Recording')

stream = p.open(
	format=sample_format,
	channels=channels,
	rate=fs,
	frames_per_buffer=chunk,
	input=True
)

frames = [] # Initialize an array to store frames

# Store data in chunks for 30 seconds
for i in range(0, int(fs / chunk * seconds)):
	data = stream.read(chunk)
	frames.append(data)

# Stop and close the stream
stream.stop_stream()
stream.close()
# Terminate the PortAudio interface
p.terminate()

# Save the recorded data as a WAV file
with wave.open(filename, 'wb') as wf:
	wf.setnchannels(channels)
	# Module-level helper, so it does not rely on the terminated PyAudio object
	wf.setsampwidth(pyaudio.get_sample_size(sample_format))
	wf.setframerate(fs)
	wf.writeframes(b''.join(frames))

print('Finished recording') # Good job!
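
Before handing the file to the model, it can help to confirm that the clip was saved the way you expect. The sketch below reads the WAV header back with the standard `wave` module and also works out how many frames the recording loop captures. `wav_info` and `expected_frames` are hypothetical helper names made up for this example. Because the loop runs `int(fs / chunk * seconds)` times, the clip comes out slightly shorter than 30 seconds: 1291 reads of 1024 frames, or about 29.98 seconds.

```python
import wave


def wav_info(path):
    """Read basic facts about a WAV file from its header."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate": rate,
            "frames": frames,
            "seconds": frames / rate,
        }


def expected_frames(fs=44100, chunk=1024, seconds=30):
    """Frames captured by a loop that reads int(fs / chunk * seconds) chunks."""
    return int(fs / chunk * seconds) * chunk


print(expected_frames())          # 1321984
print(expected_frames() / 44100)  # a little under 30 seconds
```

After recording, `wav_info("output.wav")` should report 1 channel, a sample width of 2 bytes, and a sample rate of 44100.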

Get the transcription

The model autodetects the language(!)

print("Transcribing")
# fp16=False uses 32-bit floats, which avoids the half-precision warning on CPU
result = model.transcribe(filename, fp16=False)
print(result["text"])
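
Besides "text", the dictionary returned by transcribe also carries the detected language under "language" and a list of timestamped pieces under "segments". Here is a minimal sketch of turning those into readable lines, assuming each segment has "start", "end", and "text" keys as in current Whisper releases. `format_segments` is a hypothetical helper name, and the sample dictionary is made up to show the shape, not real model output.

```python
def format_segments(result):
    """Return one line per segment in the form '[start -> end] text'."""
    lines = []
    for seg in result.get("segments", []):
        start, end = seg["start"], seg["end"]
        lines.append(f"[{start:.1f}s -> {end:.1f}s] {seg['text'].strip()}")
    return lines


# Hand-written stand-in for what model.transcribe(...) returns
sample = {
    "text": " Hello world.",
    "language": "en",
    "segments": [{"start": 0.0, "end": 2.5, "text": " Hello world."}],
}

print("Detected language:", sample["language"])
print("\n".join(format_segments(sample)))  # [0.0s -> 2.5s] Hello world.
```

With the real output you would call `format_segments(result)` instead of passing the sample.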

Get a translation

translation = model.transcribe(filename, fp16=False, task="translate")["text"]
print(translation)
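
If you want the original-language text and the English version side by side, the two calls can be wrapped in one function. This is only a sketch: `transcribe_and_translate` is a hypothetical helper name, and it takes any object with a Whisper-style `transcribe` method. It runs the model twice, so expect it to take about twice as long as a single call.

```python
def transcribe_and_translate(model, path):
    """Run both Whisper tasks on one file and return the results together."""
    original = model.transcribe(path, fp16=False)
    english = model.transcribe(path, fp16=False, task="translate")
    return {
        "language": original.get("language"),
        "original": original["text"].strip(),
        "english": english["text"].strip(),
    }
```

With the model loaded earlier, `transcribe_and_translate(model, filename)` returns a dictionary with the detected language, the transcription, and the English translation.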

Congrats! You did it. You can learn more about Whisper on its GitHub page: https://github.com/openai/whisper