Python Text-to-Speech: Getting Started with pyttsx3 and gTTS
Text-to-Speech (TTS) technology converts written text into spoken audio. This capability is fundamental to various applications, including accessibility tools, automated voice systems, and interactive software. Python offers multiple libraries for implementing TTS, providing developers with flexible options for adding voice output to their programs. Two widely used libraries are pyttsx3 and gTTS. Understanding the strengths and appropriate use cases for each library is essential for successful implementation.
Essential Concepts in Text-to-Speech
Before exploring specific libraries, a few core concepts provide context for TTS in Python:
- Speech Synthesis: The process by which a computer generates human-like speech. This involves converting text into phonetic representations and then synthesizing these phonemes into waveforms.
- Offline TTS: Systems that perform speech synthesis locally on the user’s device without requiring an internet connection. These systems typically rely on voices installed on the operating system.
- Online TTS: Systems that use remote servers (APIs) to perform speech synthesis. The text is sent to a server, processed, and the resulting audio is streamed back or provided as an audio file. These systems often offer higher-quality voices and support for multiple languages.
- Speech Engine/API: The underlying software or service that performs the actual text-to-speech conversion. Libraries like pyttsx3 and gTTS act as interfaces to these engines or APIs.
Getting Started with pyttsx3 (Offline TTS)
pyttsx3 is a cross-platform, offline Text-to-Speech library. It interfaces with the speech engines available on the user’s operating system: SAPI5 on Windows, NSSpeechSynthesizer on macOS, and eSpeak on Linux (where eSpeak usually has to be installed separately). This makes it suitable for applications where an internet connection is not guaranteed or desired.
Installation
Installation is straightforward using pip:

    pip install pyttsx3

Basic Usage
A minimal script to speak text using pyttsx3 involves initializing the engine, providing text, and running the speech process.

    import pyttsx3

    # Initialize the TTS engine
    engine = pyttsx3.init()

    # Provide the text to speak
    text_to_speak = "Hello, this is pyttsx3 speaking."

    # Queue the text to be spoken
    engine.say(text_to_speak)

    # Block while the engine processes all queued commands
    engine.runAndWait()

    # Clear anything still queued (optional here, since runAndWait() has already finished)
    engine.stop()

This script initializes the default speech engine, queues the phrase “Hello, this is pyttsx3 speaking.” for synthesis, and then waits for the speech to complete before exiting.
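Because say() only queues text and runAndWait() processes the queue, several phrases can be queued before a single blocking call. The helper below is a minimal sketch of that pattern. The name speak_all is hypothetical, and the sketch assumes only that the engine object exposes say() and runAndWait() as in the script above.

```python
def speak_all(engine, phrases):
    """Queue every non-empty phrase, then block once until all are spoken.

    `engine` is assumed to be an object returned by pyttsx3.init(); only its
    say() and runAndWait() methods are used. `speak_all` is an illustrative
    helper, not part of pyttsx3.
    """
    queued = 0
    for phrase in phrases:
        text = phrase.strip()
        if not text:
            continue  # skip blank entries rather than queueing silence
        engine.say(text)
        queued += 1
    if queued:
        engine.runAndWait()  # process the whole queue in one blocking call
    return queued


# Usage (requires pyttsx3 and a working system speech engine):
# import pyttsx3
# speak_all(pyttsx3.init(), ["First sentence.", "", "Second sentence."])
```

Passing the engine in as an argument keeps the helper independent of how the engine was created, which also makes it easy to exercise with a stand-in object.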
Controlling Speech Properties
pyttsx3 allows control over properties like voice, speech rate, and volume.

    import pyttsx3

    engine = pyttsx3.init()

    # Get current properties
    rate = engine.getProperty('rate')
    volume = engine.getProperty('volume')
    voices = engine.getProperty('voices')

    print(f"Current Rate: {rate}")
    print(f"Current Volume: {volume}")
    # print(f"Available Voices: {voices}")  # Uncomment to see available voices

    # Set properties
    engine.setProperty('rate', 150)    # Speed of speech (words per minute)
    engine.setProperty('volume', 0.9)  # Volume (0.0 to 1.0)

    # Often, multiple voices are available. Select one by ID.
    # You would typically inspect the 'voices' list to find an appropriate ID.
    # Example (replace with actual voice ID from your system):
    # if voices:
    #     engine.setProperty('voice', voices[0].id)  # Use the first voice found

    engine.say("Changing speech properties.")
    engine.runAndWait()
    engine.stop()

- The engine.getProperty() method retrieves current settings.
- The engine.setProperty(name, value) method adjusts settings. Common properties include ‘rate’, ‘volume’, and ‘voice’.
- The ‘voices’ property returns a list of available voice objects, each with an id and name. Selecting a voice requires setting the ‘voice’ property to the desired voice’s id. The available voices depend entirely on the operating system and installed speech packs.
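Since voice IDs differ from machine to machine, it is often more practical to search the voice list by name than to hard-code an ID. The following is a rough sketch of that idea. The helper find_voice and the sample voice entries are made up for illustration, and the voice objects are assumed to carry the id and name attributes described above.

```python
from types import SimpleNamespace


def find_voice(voices, keyword):
    """Return the first voice whose name contains `keyword`, else None.

    `voices` is assumed to be the list returned by
    engine.getProperty('voices'); each item is expected to have `id` and
    `name` attributes. `find_voice` is a hypothetical helper.
    """
    needle = keyword.lower()
    for voice in voices:
        # Guard against a missing or empty name by using an empty string.
        if needle in (getattr(voice, "name", "") or "").lower():
            return voice
    return None


# Stand-in voice objects so the sketch runs without a speech engine.
sample_voices = [
    SimpleNamespace(id="voice.one", name="Example English Voice"),
    SimpleNamespace(id="voice.two", name="Example French Voice"),
]
match = find_voice(sample_voices, "french")
print(match.id if match else "no matching voice")

# With a real engine (voice names differ on every system):
# voice = find_voice(engine.getProperty('voices'), "english")
# if voice is not None:
#     engine.setProperty('voice', voice.id)
```

Returning None when nothing matches lets the caller keep the engine's default voice instead of failing.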
Pros and Cons of pyttsx3
Pros:
- Offline Operation: Does not require an internet connection after installation.
- Cross-Platform: Works on Windows, macOS, and Linux.
- Direct Speaking: Synthesizes and plays audio directly without saving to a file first (by default).
- Platform Native Voices: Utilizes the system’s installed voices.
Cons:
- Voice Quality: Voice quality is dependent on the operating system’s installed voices, which can vary significantly and may sound less natural than online services.
- Limited Voice/Language Options: Access is limited to the voices installed on the specific machine.
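The "by default" qualifier under Direct Speaking matters: pyttsx3 also provides a save_to_file() method that queues speech to be written to a file instead of played. The sketch below wraps that call. The name export_speech is hypothetical, and the audio format actually written is an assumption that depends on the platform's speech driver, so the file extension should be chosen accordingly.

```python
def export_speech(engine, text, path):
    """Queue `text` for synthesis into the file at `path`, then run the queue.

    Assumes `engine` comes from pyttsx3.init() and provides save_to_file()
    and runAndWait(). `export_speech` is an illustrative name, not a
    pyttsx3 function.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    engine.save_to_file(text, path)
    engine.runAndWait()  # the file is written while the queue is processed
    return path


# Usage (requires pyttsx3 and a working system speech engine):
# import pyttsx3
# export_speech(pyttsx3.init(), "Saved offline.", "offline_output.wav")
```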
Getting Started with gTTS (Google Text-to-Speech - Online TTS)
gTTS (Google Text-to-Speech) is a Python library that interfaces with the text-to-speech API of Google Translate. Because the synthesis happens on Google’s servers, it requires an internet connection to function. It excels in providing high-quality, natural-sounding voices and support for a wide range of languages. Unlike pyttsx3, which speaks directly, gTTS produces MP3 audio that is normally saved to a file.
Installation
Install gTTS using pip:

    pip install gTTS

Basic Usage
Using gTTS involves creating a gTTS object with the text and desired language, then calling the save() method to write the audio to a file.

    from gtts import gTTS
    import os

    # Provide the text to speak
    text_to_speak = "Hello, this is gTTS speaking."

    # Specify the language (e.g., 'en' for English)
    language = 'en'

    # Create a gTTS object
    tts = gTTS(text=text_to_speak, lang=language, slow=False)

    # Save the audio to an MP3 file
    audio_file = "hello_gtts.mp3"
    tts.save(audio_file)

    print(f"Audio saved to {audio_file}")

    # Optional: Play the saved file (requires a separate audio player)
    # Example for playing on a system with 'os.system' support:
    # os.system(f"start {audio_file}")   # For Windows
    # os.system(f"afplay {audio_file}")  # For macOS
    # os.system(f"mpg321 {audio_file}")  # For Linux (requires mpg321 installed)

This script converts the text to English speech using Google’s API and saves the result as hello_gtts.mp3. Playing this file requires invoking an external audio player.
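The commented-out os.system calls can be folded into one function that picks the player for the current platform. This is only a sketch: playback_command and play are hypothetical names, the players are the same ones named in the comments above, and mpg321 is assumed to be installed on Linux.

```python
import platform
import subprocess


def playback_command(path, system=None):
    """Build the player command for `path` on the given operating system.

    Uses `start` on Windows, `afplay` on macOS, and `mpg321` elsewhere
    (assumed to be installed). `playback_command` is a hypothetical helper.
    """
    system = system or platform.system()
    if system == "Windows":
        # `start` is a cmd.exe builtin; the empty string is its window title.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        return ["afplay", path]
    return ["mpg321", path]


def play(path):
    """Run the player and wait for it; raises if the player is unavailable."""
    subprocess.run(playback_command(path), check=True)


print(playback_command("hello_gtts.mp3", system="Darwin"))
```

Passing the command as a list to subprocess.run avoids the quoting problems that an f-string handed to os.system runs into when the file name contains spaces.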
Controlling Speech Properties and Languages
gTTS offers simpler control over speech properties directly via the gTTS object constructor, mainly focusing on language and slow speech.

    from gtts import gTTS
    import os

    # Text in different languages
    text_english = "Good morning."
    text_french = "Bonjour."
    text_spanish = "Buenos días."

    # Create gTTS objects for different languages
    tts_english = gTTS(text=text_english, lang='en')
    tts_french = gTTS(text=text_french, lang='fr')
    tts_spanish = gTTS(text=text_spanish, lang='es')

    # Save to files
    tts_english.save("good_morning_en.mp3")
    tts_french.save("bonjour_fr.mp3")
    tts_spanish.save("buenos_dias_es.mp3")

    print("Saved English, French, and Spanish audio files.")

    # Example of slow speech
    text_slow = "Speaking slowly."
    tts_slow = gTTS(text=text_slow, lang='en', slow=True)
    tts_slow.save("speaking_slowly.mp3")

    print("Saved slow speech audio file.")

- The lang parameter accepts standard language codes (e.g., ‘en’, ‘fr’, ‘es’, ‘de’). A comprehensive list is available in the gTTS documentation.
- The slow parameter (a boolean) can be set to True to produce slower speech.
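When the language code comes from user input, it helps to check it against the supported codes before building a gTTS object. The sketch below shows one way to do that. The helper resolve_language and the three-entry sample mapping are illustrative assumptions. With gTTS installed, the real mapping can be obtained from gtts.lang.tts_langs(), as the trailing comment shows.

```python
def resolve_language(code, supported, default="en"):
    """Return the supported code matching `code`, ignoring case.

    `supported` is a mapping whose keys are language codes, such as the
    dictionary returned by gtts.lang.tts_langs(). Falls back to `default`
    when there is no match. `resolve_language` is a hypothetical helper.
    """
    # Map lower-cased codes back to their original spelling (e.g. 'zh-CN').
    by_lower = {key.lower(): key for key in supported}
    return by_lower.get(code.strip().lower(), default)


# Small stand-in mapping so the sketch runs without gTTS installed.
sample = {"en": "English", "fr": "French", "es": "Spanish"}
print(resolve_language("FR", sample))
print(resolve_language("xx", sample))

# With gTTS installed:
# from gtts.lang import tts_langs
# lang = resolve_language("fr", tts_langs())
```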
Pros and Cons of gTTS
Pros:
- High-Quality Voices: Leverages Google’s advanced TTS engine, producing natural-sounding speech.
- Extensive Language Support: Supports a wide variety of languages and accents.
- Easy to Use: Simple API for converting text to audio files.
Cons:
- Requires Internet Connection: Cannot function offline as it relies on Google’s API.
- Saves to File: Does not speak directly in real-time; requires saving the audio and then playing the file.
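Writing to disk is the usual route, but gTTS objects also have a write_to_fp() method that writes the MP3 data to any file-like object. The sketch below collects the audio in memory. The name synthesize_to_bytes is hypothetical, and playing the resulting bytes still needs a separate audio library or player.

```python
import io


def synthesize_to_bytes(tts):
    """Return the MP3 data produced by `tts` without creating a file on disk.

    `tts` is assumed to be a gTTS object; only its write_to_fp() method is
    used. `synthesize_to_bytes` is an illustrative name.
    """
    buffer = io.BytesIO()
    tts.write_to_fp(buffer)  # this is where the request to the service happens
    return buffer.getvalue()


# Usage (requires gTTS and an internet connection):
# from gtts import gTTS
# mp3_bytes = synthesize_to_bytes(gTTS(text="In memory.", lang="en"))
```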
Choosing Between pyttsx3 and gTTS
The choice between pyttsx3 and gTTS depends heavily on the application’s requirements.
| Feature | pyttsx3 | gTTS |
|---|---|---|
| Type | Offline | Online (requires internet) |
| Engine | System-native (SAPI5, NSSpeechSynthesizer, eSpeak) | Google Translate’s text-to-speech API |
| Output | Direct audio playback | Saves to audio file (MP3) |
| Voice Quality | Varies (depends on OS voices), often less natural | Generally high quality, natural-sounding |
| Languages | Limited to installed OS voices | Wide range of languages supported |
| Dependencies | System speech engines | Internet connection, access to Google’s service |
| Ease of Use | Simple for basic speech, slightly more involved for voice/rate control | Simple for text-to-file conversion, playing file is separate |
- Choose pyttsx3 for applications requiring offline capability, real-time speech synthesis without saving files, or when leveraging specific voices installed on the user’s operating system is acceptable or necessary. Examples include simple desktop notification readers or basic voice feedback in offline tools.
- Choose gTTS for applications where high-quality, natural-sounding speech is paramount, multiple languages are needed, and an internet connection is reliable. Examples include generating audio content for websites, creating audiobooks, or developing online educational platforms.
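The two libraries can also be combined, using the online voice when it is reachable and the offline one otherwise. The following is a minimal sketch of that fallback pattern, not a complete implementation. speak_with_fallback is a hypothetical name, and online and offline stand for caller-supplied functions (for example, thin wrappers around gTTS and pyttsx3).

```python
def speak_with_fallback(text, online, offline):
    """Try the online backend first; use the offline one if it fails.

    `online` and `offline` are caller-supplied callables that accept the
    text. Returns the name of the backend that handled the text.
    """
    try:
        online(text)
        return "online"
    except Exception as exc:  # broad on purpose: network failures vary
        print(f"Online TTS failed ({exc}); falling back to offline.")
        offline(text)
        return "offline"


# Stand-in backends so the sketch runs without either library installed.
def unreachable_service(text):
    raise ConnectionError("no network")


print(speak_with_fallback("Hello", unreachable_service, print))
```

Catching every exception is a deliberate simplification here. A real application would likely catch only the network-related errors it expects.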
Real-World Application Examples
Implementing TTS in Python can enhance various projects:
- Accessibility Tools: Scripts can read out text content from documents or websites for users with visual impairments. pyttsx3 could be used for local document readers, while gTTS might be used for web content where internet is expected and higher voice quality is beneficial.
- Voice Assistants: Simple voice response systems can use TTS to speak answers to user queries. pyttsx3 can provide quick, offline responses, while gTTS can voice answers with better clarity when the assistant is already online.
- Educational Software: Applications can read out lessons, instructions, or vocabulary words. gTTS is particularly useful here for its language support and clear pronunciation, especially when teaching foreign languages.
- Automated Notifications: Scripts can announce system events or reminders. pyttsx3 is suitable for desktop notifications where offline operation is important.
- Content Creation: Convert articles, blog posts, or scripts into audio format for podcasts or audio articles using gTTS to produce high-quality output files.
A simple example using gTTS to convert text from a file into an audiobook chapter:

    from gtts import gTTS
    import os

    def text_file_to_audio(input_filename, output_filename, language='en'):
        """Reads text from a file and saves it as an audio file."""
        try:
            with open(input_filename, 'r', encoding='utf-8') as f:
                text = f.read()

            if not text.strip():
                print("Input file is empty.")
                return

            tts = gTTS(text=text, lang=language)
            tts.save(output_filename)
            print(f"Successfully converted '{input_filename}' to '{output_filename}'")

        except FileNotFoundError:
            print(f"Error: Input file '{input_filename}' not found.")
        except Exception as e:
            print(f"An error occurred: {e}")

    # Example usage:
    # 1. Create a dummy text file named 'chapter1.txt' with some text.
    # 2. Run the function:
    # text_file_to_audio('chapter1.txt', 'chapter1_audio.mp3', language='en')

This function demonstrates reading content from a text file, processing it through gTTS, and saving the synthesized speech as an MP3 file, suitable for creating audio content from written sources.
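For a book split across several text files, the same conversion can be applied to a whole folder. The sketch below pairs each .txt file with an .mp3 output path and hands every pair to a conversion function. plan_batch and convert_all are hypothetical names, and convert is assumed to accept an input path and an output path, as text_file_to_audio does.

```python
from pathlib import Path


def plan_batch(input_dir, output_dir):
    """Pair every .txt file in `input_dir` with an .mp3 path in `output_dir`."""
    out = Path(output_dir)
    return [
        (src, out / (src.stem + ".mp3"))
        for src in sorted(Path(input_dir).glob("*.txt"))
    ]


def convert_all(input_dir, output_dir, convert):
    """Call `convert(input_path, output_path)` for each planned pair.

    Creates `output_dir` if needed and returns the number of files handled.
    """
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    pairs = plan_batch(input_dir, output_dir)
    for src, dst in pairs:
        convert(str(src), str(dst))
    return len(pairs)


# Usage with the function defined above (folder names are placeholders):
# convert_all("chapters", "audio", text_file_to_audio)
```

Sorting the input files keeps the chapters in a predictable order, provided their names sort the way the chapters should be read.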
Key Takeaways
- Text-to-Speech (TTS) converts written text into spoken audio.
- pyttsx3 is an offline, cross-platform Python library that uses the operating system’s speech engines for direct audio output.
- gTTS is an online Python library that uses Google Translate’s text-to-speech API to generate high-quality speech, typically saved to an MP3 file.
- Installation for both libraries is done via pip install.
- pyttsx3 is suitable for applications requiring no internet and real-time speech playback using system voices.
- gTTS is suitable for applications requiring high-quality voices, extensive language support, and where saving audio files is acceptable and internet access is available.
- Choosing the appropriate library depends on requirements like internet connectivity, desired voice quality, language support, and output format (direct speech vs. audio file).
- Both libraries provide simple interfaces for adding voice capabilities to Python programs for various applications like accessibility, education, and content creation.