Getting Started with Text-to-Speech in Python Using pyttsx3 and gTTS

2025-06-30

Python Text-to-Speech: Getting Started with pyttsx3 and gTTS

Text-to-Speech (TTS) technology converts written text into spoken audio. It underpins accessibility tools, automated voice systems, and interactive software. Python offers several libraries for adding voice output to a program; two widely used ones are pyttsx3, which works offline, and gTTS, which relies on an online service. Knowing the strengths and appropriate use cases of each helps in picking the right one for a project.

Essential Concepts in Text-to-Speech

Before exploring specific libraries, a few core concepts provide context for TTS in Python:

Speech Synthesis: The process by which a computer generates human-like speech. This involves converting text into phonetic representations and then synthesizing these phonemes into waveforms.
Offline TTS: Systems that perform speech synthesis locally on the user’s device without requiring an internet connection. These systems typically rely on voices installed on the operating system.
Online TTS: Systems that use remote servers (APIs) to perform speech synthesis. The text is sent to a server, processed, and the resulting audio is streamed back or provided as an audio file. These systems often offer higher-quality voices and support for multiple languages.
Speech Engine/API: The underlying software or service that performs the actual text-to-speech conversion. Libraries like pyttsx3 and gTTS act as interfaces to these engines or APIs.

Getting Started with pyttsx3 (Offline TTS)

pyttsx3 is a cross-platform, offline Text-to-Speech library. It interfaces with the speech engines available on the user’s operating system: SAPI5 on Windows, NSSpeechSynthesizer on macOS, and eSpeak on Linux. This makes it suitable for applications where an internet connection is not guaranteed or desired.

Installation

Installation is straightforward using pip:

pip install pyttsx3

Basic Usage

A minimal script to speak text using pyttsx3 involves initializing the engine, providing text, and running the speech process.

import pyttsx3

# Initialize the TTS engine
engine = pyttsx3.init()

# Provide the text to speak
text_to_speak = "Hello, this is pyttsx3 speaking."

# Queue the text to be spoken
engine.say(text_to_speak)

# Block while the engine processes all queued commands
engine.runAndWait()

# Stop any utterance in progress and clear the queue
# (optional once runAndWait() has returned)
engine.stop()

This script initializes the default speech engine, queues the phrase “Hello, this is pyttsx3 speaking.” for synthesis, and then waits for the speech to complete before exiting.
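Because say() only queues text, several phrases can be queued before a single runAndWait() call. The following is a minimal sketch of that pattern. The helper name speak_all is hypothetical (it is not part of pyttsx3), and it assumes only that the engine object exposes say() and runAndWait(), as the object returned by pyttsx3.init() does.

```python
def speak_all(engine, phrases):
    """Queue every non-empty phrase, then block until all have been spoken.

    `engine` is expected to behave like the object returned by pyttsx3.init().
    Returns the number of phrases that were queued.
    """
    queued = 0
    for phrase in phrases:
        phrase = phrase.strip()
        if not phrase:
            continue  # skip blank entries instead of queueing them
        engine.say(phrase)
        queued += 1
    if queued:
        engine.runAndWait()  # one blocking call processes the whole queue
    return queued


def main():
    # Imported here so speak_all() can be reused or tested without pyttsx3.
    import pyttsx3

    engine = pyttsx3.init()
    speak_all(engine, ["First sentence.", "", "Second sentence."])
```

Calling main() speaks the two non-empty phrases in order. Passing the engine in as an argument keeps the helper independent of how the engine was created.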

Controlling Speech Properties

pyttsx3 allows control over properties like voice, speech rate, and volume.

import pyttsx3

engine = pyttsx3.init()

# Get current properties
rate = engine.getProperty('rate')
volume = engine.getProperty('volume')
voices = engine.getProperty('voices')

print(f"Current Rate: {rate}")
print(f"Current Volume: {volume}")
# print(f"Available Voices: {voices}")  # Uncomment to see available voices

# Set properties
engine.setProperty('rate', 150)  # Speed of speech (words per minute)
engine.setProperty('volume', 0.9)  # Volume (0.0 to 1.0)

# Often, multiple voices are available. Select one by ID.
# You would typically inspect the 'voices' list to find an appropriate ID.
# Example (replace with actual voice ID from your system):
# if voices:
#    engine.setProperty('voice', voices[0].id)  # Use the first voice found

engine.say("Changing speech properties.")
engine.runAndWait()
engine.stop()

The engine.getProperty() method retrieves current settings.
The engine.setProperty(name, value) method adjusts settings. Common properties include ‘rate’, ‘volume’, and ‘voice’.
The ‘voices’ property returns a list of available voice objects, each with an id and name. Selecting a voice requires setting the ‘voice’ property to the desired voice’s id. The available voices depend entirely on the operating system and installed speech packs.
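Since voice IDs differ from machine to machine, hard-coding an index such as voices[0] is fragile. One possible approach, sketched below, is to search the voice list by a keyword. The function find_voice_id and the keyword "english" are illustrative assumptions, not part of pyttsx3; the sketch relies only on the id and name attributes described above.

```python
def find_voice_id(voices, keyword):
    """Return the id of the first voice whose name or id contains `keyword`.

    Matching is case-insensitive. Returns None when nothing matches, so the
    caller can keep the engine's default voice.
    """
    needle = keyword.lower()
    for voice in voices:
        name = (getattr(voice, "name", "") or "").lower()
        if needle in name or needle in str(voice.id).lower():
            return voice.id
    return None


def main():
    import pyttsx3  # imported lazily; only needed when actually speaking

    engine = pyttsx3.init()
    voices = engine.getProperty("voices")
    for voice in voices:
        print(voice.id, "|", voice.name)  # inspect what this system offers

    # "english" is only an example keyword; voice names differ per system.
    voice_id = find_voice_id(voices, "english")
    if voice_id is not None:
        engine.setProperty("voice", voice_id)
    engine.say("Testing the selected voice.")
    engine.runAndWait()
```

If no voice matches, the engine keeps its default voice, so the script still produces speech on systems with a different set of installed voices.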

Pros and Cons of pyttsx3

Pros:

Offline Operation: Does not require an internet connection after installation.
Cross-Platform: Works on Windows, macOS, and Linux.
Direct Speaking: Synthesizes and plays audio directly without saving to a file first (by default).
Platform Native Voices: Utilizes the system’s installed voices.

Cons:

Voice Quality: Voice quality is dependent on the operating system’s installed voices, which can vary significantly and may sound less natural than online services.
Limited Voice/Language Options: Access is limited to the voices installed on the specific machine.
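The "by default" qualifier under Direct Speaking is worth illustrating: pyttsx3 also provides engine.save_to_file(text, filename), which queues synthesis into a file instead of the speakers. The sketch below wraps that call in a hypothetical helper named save_speech. The audio format actually written depends on the platform's speech driver, so the .wav extension used here is an assumption.

```python
def save_speech(engine, text, filename):
    """Queue `text` for synthesis into `filename` and wait for completion.

    `engine` is expected to behave like the object returned by pyttsx3.init().
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    engine.save_to_file(text, filename)  # queues the job, like say()
    engine.runAndWait()                  # the queued job runs during this call
    return filename


def main():
    import pyttsx3  # imported lazily so save_speech() has no hard dependency

    engine = pyttsx3.init()
    # The file name and extension are examples only.
    save_speech(engine, "Saved by pyttsx3.", "pyttsx3_output.wav")
```

As with say(), nothing is produced until runAndWait() processes the queue.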

Getting Started with gTTS (Google Text-to-Speech - Online TTS)

gTTS (Google Text-to-Speech) is a Python library that interfaces with Google Translate’s text-to-speech API. It is an online service, so it requires an internet connection to function. It generally provides natural-sounding voices and supports a wide range of languages. Unlike pyttsx3, which plays speech directly, gTTS produces MP3 audio that is typically saved to a file and played separately.

Installation

Install gTTS using pip:

pip install gTTS

Basic Usage

Using gTTS involves creating a gTTS object with the text and desired language, then calling the save() method to write the audio to a file.

from gtts import gTTS
import os  # only needed for the optional playback commands below

# Provide the text to speak
text_to_speak = "Hello, this is gTTS speaking."

# Specify the language (e.g., 'en' for English)
language = 'en'

# Create a gTTS object
tts = gTTS(text=text_to_speak, lang=language, slow=False)

# Save the audio to an MP3 file
audio_file = "hello_gtts.mp3"
tts.save(audio_file)

print(f"Audio saved to {audio_file}")

# Optional: Play the saved file (requires a separate audio player)
# Example for playing on a system with 'os.system' support:
# os.system(f"start {audio_file}")  # For Windows
# os.system(f"afplay {audio_file}")  # For macOS
# os.system(f"mpg321 {audio_file}")  # For Linux (requires mpg321 installed)

This script converts the text to English speech using Google’s API and saves the result as hello_gtts.mp3. Playing this file requires invoking an external audio player.
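The commented os.system calls above differ per platform. One way to pick the right player automatically is sketched below, using the same three players and the standard-library platform and subprocess modules. The helper names player_command and play are hypothetical, and treating every non-Windows, non-macOS system as Linux with mpg321 installed is a simplifying assumption.

```python
import platform
import subprocess


def player_command(path, system=None):
    """Build the command that plays `path` on the given (or current) platform."""
    system = system or platform.system()
    if system == "Windows":
        # `start` is a cmd.exe built-in; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":  # macOS
        return ["afplay", path]
    # Assumption: anything else is treated as Linux with mpg321 installed.
    return ["mpg321", path]


def play(path):
    """Play an audio file with the platform's player, raising on failure."""
    subprocess.run(player_command(path), check=True)
```

After tts.save("hello_gtts.mp3"), calling play("hello_gtts.mp3") would start playback. Passing the command as a list avoids building a shell string by hand, which helps when file names contain spaces.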

Controlling Speech Properties and Languages

gTTS offers simpler control over speech properties directly via the gTTS object constructor, mainly focusing on language and slow speech.

from gtts import gTTS

# Text in different languages
text_english = "Good morning."
text_french = "Bonjour."
text_spanish = "Buenos días."

# Create gTTS objects for different languages
tts_english = gTTS(text=text_english, lang='en')
tts_french = gTTS(text=text_french, lang='fr')
tts_spanish = gTTS(text=text_spanish, lang='es')

# Save to files
tts_english.save("good_morning_en.mp3")
tts_french.save("bonjour_fr.mp3")
tts_spanish.save("buenos_dias_es.mp3")

print("Saved English, French, and Spanish audio files.")

# Example of slow speech
text_slow = "Speaking slowly."
tts_slow = gTTS(text=text_slow, lang='en', slow=True)
tts_slow.save("speaking_slowly.mp3")

print("Saved slow speech audio file.")

The lang parameter accepts standard language codes (e.g., ‘en’, ‘fr’, ‘es’, ‘de’). A comprehensive list is available in the gTTS documentation.
The slow parameter (a boolean) can be set to True to produce slower speech.
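The supported language codes can also be inspected from code: gtts.lang.tts_langs() returns a dictionary mapping each code to a language name. The sketch below uses it to check a requested code before synthesizing. The helper resolve_language, its fallback to 'en', and the output file name are illustrative assumptions rather than gTTS features.

```python
def resolve_language(code, supported, default="en"):
    """Return `code` if it is a key of `supported`, otherwise `default`.

    `supported` is a mapping such as the one returned by gtts.lang.tts_langs().
    """
    code = code.strip()
    return code if code in supported else default


def main():
    # Imported lazily so resolve_language() can be used without gTTS installed.
    from gtts import gTTS
    from gtts.lang import tts_langs

    supported = tts_langs()  # dict mapping language code -> language name
    for code, name in sorted(supported.items()):
        print(code, name)

    lang = resolve_language("pt", supported)
    gTTS(text="Bom dia.", lang=lang).save("bom_dia.mp3")
```

Falling back silently to another language is a design choice that suits a demo; an application might prefer to raise an error so that text is never read with the wrong voice.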

Pros and Cons of gTTS

Pros:

High-Quality Voices: Uses the speech synthesis behind Google Translate, which generally produces natural-sounding speech.
Extensive Language Support: Supports a wide variety of languages and accents.
Easy to Use: Simple API for converting text to audio files.

Cons:

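Requires Internet Connection: Cannot function offline, as it relies on Google Translate’s servers; this is not an officially supported API, so its behavior may change without notice.
Saves to File: Does not speak directly in real time; the audio must be saved and then played with a separate player.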
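Saving does not have to mean writing to disk. gTTS objects also provide write_to_fp(), which writes the MP3 data to any binary file-like object, so the audio can be kept in memory and handed to another library or sent over a network. The helper name synthesize_to_bytes below is hypothetical; the sketch assumes only that the object passed in has a write_to_fp() method.

```python
import io


def synthesize_to_bytes(tts):
    """Return the MP3 data produced by a gTTS-like object as bytes."""
    buffer = io.BytesIO()
    tts.write_to_fp(buffer)  # writes MP3 data into the in-memory buffer
    return buffer.getvalue()


def main():
    from gtts import gTTS  # imported lazily; requires an internet connection

    audio = synthesize_to_bytes(gTTS(text="Held in memory.", lang="en"))
    print(f"Received {len(audio)} bytes of MP3 data")
```

Playback still needs a separate player or audio library; this only removes the intermediate file.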

Choosing Between pyttsx3 and gTTS

The choice between pyttsx3 and gTTS depends heavily on the application’s requirements.

Feature	pyttsx3	gTTS
Type	Offline	Online (requires internet)
Engine	System-native (SAPI5, NSSpeechSynthesizer, eSpeak)	Google Translate’s text-to-speech API
Output	Direct audio playback	Saves to audio file (MP3)
Voice Quality	Varies (depends on OS voices), often less natural	Generally high quality, natural-sounding
Languages	Limited to installed OS voices	Wide range of languages supported
Dependencies	System speech engines	Internet connection, Google Translate’s servers
Ease of Use	Simple for basic speech, slightly more involved for voice/rate control	Simple for text-to-file conversion, playing file is separate

Choose pyttsx3 for applications requiring offline capability, real-time speech synthesis without saving files, or when leveraging specific voices installed on the user’s operating system is acceptable or necessary. Examples include simple desktop notification readers or basic voice feedback in offline tools.
Choose gTTS for applications where high-quality, natural-sounding speech is paramount, multiple languages are needed, and an internet connection is reliable. Examples include generating audio content for websites, creating audiobooks, or developing online educational platforms.
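The two recommendations can also be combined: try the online service first and fall back to the offline engine when it fails. The following is a rough sketch of that idea. The names speak_with_fallback, gtts_to_file, and pyttsx3_speak are hypothetical, as is the speech.mp3 file name. Note that the two backends behave differently here (one writes a file, the other speaks aloud), which a real application would need to reconcile.

```python
def speak_with_fallback(text, online, offline):
    """Try `online(text)`; if it raises, call `offline(text)` instead.

    Returns "online" or "offline" to report which backend handled the text.
    """
    try:
        online(text)
        return "online"
    except Exception as exc:  # broad on purpose: network failures vary
        print(f"Online synthesis failed ({exc}); falling back to offline.")
        offline(text)
        return "offline"


def gtts_to_file(text, filename="speech.mp3"):
    """Online backend: synthesize with gTTS and save an MP3 file."""
    from gtts import gTTS

    gTTS(text=text, lang="en").save(filename)


def pyttsx3_speak(text):
    """Offline backend: speak immediately through the system voice."""
    import pyttsx3

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
```

A call such as speak_with_fallback("Hello", gtts_to_file, pyttsx3_speak) would use gTTS when the network is reachable and pyttsx3 otherwise.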

Real-World Application Examples

Implementing TTS in Python can enhance various projects:

Accessibility Tools: Scripts can read out text content from documents or websites for users with visual impairments. pyttsx3 could be used for local document readers, while gTTS might be used for web content where internet is expected and higher voice quality is beneficial.
Voice Assistants: Simple voice response systems can use TTS to speak answers to user queries. pyttsx3 can provide quick, offline responses, while gTTS can deliver the same answers in a more natural-sounding voice when an internet connection is available.
Educational Software: Applications can read out lessons, instructions, or vocabulary words. gTTS is particularly useful here for its language support and clear pronunciation, especially when teaching foreign languages.
Automated Notifications: Scripts can announce system events or reminders. pyttsx3 is suitable for desktop notifications where offline operation is important.
Content Creation: Convert articles, blog posts, or scripts into audio format for podcasts or audio articles using gTTS to produce high-quality output files.

A simple example using gTTS to convert text from a file into an audiobook chapter:

from gtts import gTTS

def text_file_to_audio(input_filename, output_filename, language='en'):
    """Reads text from a file and saves it as an audio file."""
    try:
        with open(input_filename, 'r', encoding='utf-8') as f:
            text = f.read()

        if not text.strip():
            print("Input file is empty.")
            return

        tts = gTTS(text=text, lang=language)
        tts.save(output_filename)
        print(f"Successfully converted '{input_filename}' to '{output_filename}'")

    except FileNotFoundError:
        print(f"Error: Input file '{input_filename}' not found.")
    except Exception as e:
        print(f"An error occurred: {e}")

# Example usage:
# 1. Create a dummy text file named 'chapter1.txt' with some text.
# 2. Run the function:
# text_file_to_audio('chapter1.txt', 'chapter1_audio.mp3', language='en')

This function demonstrates reading content from a text file, processing it through gTTS, and saving the synthesized speech as an MP3 file, suitable for creating audio content from written sources.
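For longer sources, one option is to split the text into several smaller audio files, so that a network failure partway through does not discard everything already converted. The sketch below groups paragraphs into chunks and saves one MP3 per chunk. The function names, the 3000-character limit, and the file-naming scheme are all assumptions chosen for illustration, not gTTS requirements.

```python
def split_into_chunks(text, max_chars=3000):
    """Group blank-line-separated paragraphs into chunks of limited size.

    Each chunk holds at most `max_chars` characters, except that a single
    paragraph longer than the limit is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def text_to_audio_parts(text, prefix, language="en", max_chars=3000):
    """Save one MP3 per chunk and return the list of file names."""
    from gtts import gTTS  # imported lazily; requires an internet connection

    filenames = []
    for index, chunk in enumerate(split_into_chunks(text, max_chars), start=1):
        filename = f"{prefix}_part{index:02d}.mp3"
        gTTS(text=chunk, lang=language).save(filename)
        filenames.append(filename)
    return filenames
```

For example, text_to_audio_parts(text, "chapter1") would produce chapter1_part01.mp3, chapter1_part02.mp3, and so on, one file per chunk.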

Key Takeaways

Text-to-Speech (TTS) converts written text into spoken audio.
pyttsx3 is an offline, cross-platform Python library that uses the operating system’s speech engines for direct audio output.
gTTS is an online Python library that uses Google Translate’s text-to-speech API to generate natural-sounding speech, typically saved to an MP3 file.
Installation for both libraries is done via pip install.
pyttsx3 is suitable for applications requiring no internet and real-time speech playback using system voices.
gTTS is suitable for applications requiring high-quality voices, extensive language support, and where saving audio files is acceptable and internet access is available.
Choosing the appropriate library depends on requirements like internet connectivity, desired voice quality, language support, and output format (direct speech vs. audio file).
Both libraries provide simple interfaces for adding voice capabilities to Python programs for various applications like accessibility, education, and content creation.