Kalebu Jordan

Become a Pro Python Developer

Speech to text

Hi guys, in this article I’m going to share with you how to easily convert speech to text in Python, commonly known as speech recognition. But first:

What is speech recognition?

Speech Recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format.

You have probably seen it used heavily in sci-fi and in personal assistants like Siri, Cortana, and Google Assistant.

You don’t have to be an expert

In this tutorial, you’re going to learn how to build your own Python program that is capable of converting your speech to text.

You don’t need to be a pro in Python to complete this tutorial; even if you’re still a beginner, you can complete it successfully.

How is it done?

Our Python program will record sound through your computer’s microphone, send the speech to the Google Speech Recognition API, and return the decoded text to us.

Requirements

You need to install the following Python libraries to successfully run the examples in this tutorial:

Installation

pip install PyAudio
pip install SpeechRecognition

The SpeechRecognition library lets you perform speech recognition with support for several engines and APIs, both online and offline.

In this tutorial, we are going to use the Google Speech Recognition API, which is free for basic use, though it likely limits the number of requests you can send within a certain time.
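Since a free service may refuse or drop requests when you send too many, it can help to wrap the recognition call in a small retry helper. The sketch below is only an illustration: `retry_with_backoff` and its parameters are hypothetical names of mine, not part of the SpeechRecognition library, and the helper works with any callable.

```python
import time


def retry_with_backoff(func, attempts=3, base_delay=1.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func() and retry on failure, doubling the wait each time.

    Hypothetical helper for illustration; not part of SpeechRecognition.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            # Re-raise the error once the last allowed attempt has failed.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, then 4x ... before trying again.
            sleep(base_delay * (2 ** attempt))
```

With a recognizer and a recorded audio object, a call could look like `retry_with_backoff(lambda: recognizer.recognize_google(audio), retry_on=(sr.RequestError,))`, so that only connection or API failures are retried and unintelligible audio is not.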

Throughout this tutorial, you will learn to perform speech recognition on sound fed directly from a microphone and on audio loaded from a file.

Speech Recognition from Microphone 

When performing speech recognition from a microphone, we need to record the audio from the microphone and send it to the Google speech-to-text engine, which returns the text that we then print to the screen.

Steps involved 

  • Recording Audio from Microphone ( PyAudio)
  • Sending Audio to the Speech recognition engine 
  • Printing the Recognized text to the screen 

app.py

import speech_recognition as sr

recognizer = sr.Recognizer()

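# Record the sound
with sr.Microphone() as source:
    print("Adjusting for ambient noise")
    recognizer.adjust_for_ambient_noise(source, duration=1)
    print("Recording for 4 seconds")
    # record() captures a fixed duration; listen(timeout=4) would only
    # limit how long to wait for speech to start
    recorded_audio = recognizer.record(source, duration=4)
    print("Done recording")

# Recognize the audio
try:
    print("Recognizing the text")
    text = recognizer.recognize_google(
        recorded_audio,
        language="en-US"
    )
    print("Decoded Text : {}".format(text))
except sr.UnknownValueError:
    # The engine could not make out any words
    print("Could not understand the audio")
except sr.RequestError as ex:
    # Network problem or the API rejected the request
    print("Could not reach the recognition service: {}".format(ex))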

Speech Recognition from Audio file 

When it comes to performing speech recognition from an audio file, only one line of code changes: instead of using a microphone as the audio source, we give the path to the audio file we want to transcribe.

In case you want to use the same sample audio that I used for this tutorial, download it through the link below.

When you put the above concepts together, it looks like the app_audio.py code below; copy it and run it.

app_audio.py

import speech_recognition as sr

recognizer = sr.Recognizer()

# Load the audio from the file

with sr.AudioFile("./sample_audio/speech.wav") as source:
    # record() reads the whole file into an AudioData object
    recorded_audio = recognizer.record(source)
    print("Done recording")

# Recognize the audio
try:
    print("Recognizing the text")
    text = recognizer.recognize_google(
            recorded_audio, 
            language="en-US"
        )
    print("Decoded Text : {}".format(text))

except Exception as ex:
    print(ex)

Output :

kalebu@kalebu-PC:~$ python3 app_audio.py 
Done recording
Recognizing the text
Decoded Text : python programming is the best of all by Jordan
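
Before sending a file for transcription, it can be useful to check that it really is a readable WAV file and how long it is. The sketch below uses only Python's standard `wave` module; `describe_wav` is a hypothetical helper name of mine, and it only handles uncompressed PCM WAV files.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file.

    Hypothetical helper for illustration; wave.open raises wave.Error
    if the file is not a WAV file it can read.
    """
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav_file.getsampwidth(),
            # Number of frames divided by frames per second
            "duration_seconds": frames / rate,
        }
```

Calling `describe_wav("./sample_audio/speech.wav")` on the sample file would tell you its length, which helps you decide whether to transcribe it in one request or split it first.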

Speech Recognition from Long Audio Source

When you have very long audio, loading the whole file into memory and sending it over the API can be a very slow process. To overcome that, we split the long audio source into small chunks and then perform speech recognition on each chunk.

We are going to use pydub to split the long audio source into those small chunks.

To install pydub, just use pip (pydub also needs ffmpeg available on your system to read MP3 files):

pip install pydub

Below is sample Python code that loads the long audio, splits it into segments, and then performs speech recognition on each chunk. To learn more about splitting audio, you can check out the DataCamp tutorial.

long_audio.py

import speech_recognition as sr
from pydub import AudioSegment
from pydub.silence import split_on_silence

recognizer = sr.Recognizer()

def load_chunks(filename):
    long_audio = AudioSegment.from_mp3(filename)
    audio_chunks = split_on_silence(
        long_audio, min_silence_len=1800,
        silence_thresh=-17
    )
    return audio_chunks

for audio_chunk in load_chunks('./sample_audio/long_audio.mp3'):
    # Export each chunk to a temporary WAV file that AudioFile can read
    audio_chunk.export("temp", format="wav")
    with sr.AudioFile("temp") as source:
        # record() reads the whole chunk into an AudioData object
        audio = recognizer.record(source)
        try:
            text = recognizer.recognize_google(audio)
            print("Chunk : {}".format(text))
        except Exception as ex:
            print("Error occurred")
            print(ex)

print("++++++")

Output :

$ python long_audio.py
    Chunk : by the time you finish reading this tutorial you have already covered several techniques and natural then
    Chunk : learn more
    Chunk : forgetting to subscribe to be updated on upcoming tutorials
    ++++++
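
Splitting on silence only works when the speaker pauses. If your audio has no clear pauses, one possible fallback (a different technique from the `split_on_silence` approach used above) is to cut it into fixed-length windows. The sketch below only computes the window boundaries in milliseconds; `fixed_windows` and its default of 30 seconds are assumptions of mine for illustration.

```python
def fixed_windows(total_ms, window_ms=30_000):
    """Return (start, end) millisecond pairs covering total_ms without gaps.

    Hypothetical helper for illustration; the last window may be shorter.
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return [
        (start, min(start + window_ms, total_ms))
        for start in range(0, total_ms, window_ms)
    ]


# A 70-second recording split into 30-second windows
print(fixed_windows(70_000))
# → [(0, 30000), (30000, 60000), (60000, 70000)]
```

pydub segments can be sliced by milliseconds and `len(long_audio)` gives the length in milliseconds, so each pair can be used as `long_audio[start:end]`. Keep in mind that a fixed cut can land in the middle of a word, which may hurt recognition at the chunk edges.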

Hope you had a good time playing with speech recognition in Python.

Knowledge is good when shared. Share this article with fellow coders in other developer communities; press Tweet to share it on Twitter.


In case of any comment, suggestion, or difficulty, drop it in the comment box and I will get back to you ASAP.

Don’t forget to Subscribe to this blog to stay updated on upcoming Python Tutorials

The full code for this tutorial is available on my GitHub.
