Use Whisper with Python
Then they said, “Come, let us build ourselves a city, with a tower that reaches to the heavens, so that we may make a name for ourselves; otherwise we will be scattered over the face of the whole earth.”
Transcribe and translate
OpenAI’s free, open-source speech recognition model can take audio in any of 99 languages and transcribe it to text or translate it into English. I will show you how to use Python to record a 30-second clip and feed it through the model.
Install dependencies
Paste this in your command line:
pip install git+https://github.com/openai/whisper.git
You may need to use pip3 instead of pip.
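Whisper uses ffmpeg to decode audio, and this guide records the clip with PyAudio, so you will most likely need both installed as well. The exact commands vary by platform (and PyAudio may also need the PortAudio headers), but something like this usually works:
pip install pyaudio
brew install ffmpeg   # macOS
sudo apt install ffmpeg   # Ubuntu/Debian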
Import Libraries
import pyaudio
import wave
import whisper
Load the model
# The model may take a little bit to load
model = whisper.load_model("base")
Available model sizes are tiny, base, small, medium, and large.
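If you care more about accuracy than speed, you can swap in a bigger checkpoint; only the size name changes. A quick sketch:
# Larger models are slower and use more memory, but transcribe more accurately
model = whisper.load_model("small")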
Assign variables and create the PyAudio object
# Record in chunks of 1024 samples
chunk = 1024
# 16 bits per sample
sample_format = pyaudio.paInt16
channels = 1
# Record at 44100 samples per second
fs = 44100
# Length of the recording in seconds
seconds = 30
# pick the filename you want
filename = "output.wav"
p = pyaudio.PyAudio() # Create an interface to PortAudio
Record a clip
Here is where you will want to start speaking into the mic.
print('Recording')
stream = p.open(
    format=sample_format,
    channels=channels,
    rate=fs,
    frames_per_buffer=chunk,
    input=True
)
frames = [] # Initialize an array to store frames
# Store data in chunks for 30 seconds
for i in range(0, int(fs / chunk * seconds)):
    data = stream.read(chunk)
    frames.append(data)
# Stop and close the stream
stream.stop_stream()
stream.close()
# Terminate the PortAudio interface
p.terminate()
# Save the recorded data as a WAV file
wf = wave.open(filename, 'wb')
wf.setnchannels(channels)
wf.setsampwidth(p.get_sample_size(sample_format))
wf.setframerate(fs)
wf.writeframes(b''.join(frames))
wf.close()
print('Finished recording') # Good job!
Get the transcription
The model autodetects the language(!)
print("Transcribing...")
# fp16=False avoids a warning when running on a CPU
result = model.transcribe(filename, fp16=False)
print(result["text"])
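If you are curious which language it picked, the result dictionary also includes a language code. This line is my own addition, not part of the original script:
# Print the detected language code, e.g. "en" or "es"
print("Detected language:", result["language"])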
Get a translation
# task="translate" tells Whisper to translate the speech into English
translation = model.transcribe(filename, fp16=False, task="translate")["text"]
print(translation)
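Since task="translate" always produces English output, one optional tweak (my own sketch, reusing the detected language from the transcription step) is to skip the translation pass when the clip is already in English:
# Only translate when the detected language is not already English
if result["language"] != "en":
    translation = model.transcribe(filename, fp16=False, task="translate")["text"]
    print(translation)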
Congrats! You did it. You can learn more on the Whisper GitHub page.