multiai-tts

TTS extension for multiai using OpenAI, Google GenAI and Azure Speech

multiai-tts is an extension library for multiai that provides Text-to-Speech (TTS) capabilities using OpenAI, Google GenAI, and Azure Speech.

Supported AI providers

| Provider | Strengths | Docs |
|---|---|---|
| OpenAI | Simple API | Models · Voices · API |
| Google GenAI | Emotion tags, multi-speaker | Models · Voices · API |
| Azure Speech | SSML, extensive voice selection | Voices · API |

Install multiai and run

Prerequisites

API key configuration

This library relies on the configuration provided by multiai. You must set up your API keys (OpenAI API Key, Google API Key, Azure TTS Key and Region) using multiai’s configuration files or environment variables before using this library.

For details on how to configure API keys, please refer to the multiai documentation.

System requirements

Installation

pip install multiai-tts

Usage

Google GenAI example

import sys
import multiai_tts

client = multiai_tts.Prompt()
client.set_tts_model('google', 'gemini-3.1-tts-flash-preview')
client.tts_voice_google = 'charon'

# Speak directly
client.speak("Please speak the following. Hello, this is a test from Google model.")
if client.error:
    print(client.error_message)
    sys.exit(1)

# Save to file
client.save_tts("Please speak the following. Saving this audio to mp3.", "output_google.mp3")
if client.error:
    print(client.error_message)
    sys.exit(1)

OpenAI example

import sys
import multiai_tts

client = multiai_tts.Prompt()
client.set_tts_model('openai', 'gpt-4o-mini-tts')
client.tts_voice_openai = 'marin'

# Speak directly
client.speak("Hello, this is a test from OpenAI model.")
if client.error:
    print(client.error_message)
    sys.exit(1)

# Save to file
client.save_tts("Saving this audio to mp3.", "output_openai.mp3")
if client.error:
    print(client.error_message)
    sys.exit(1)

Azure TTS example

import sys
import multiai_tts

client = multiai_tts.Prompt()
client.set_tts_provider('azure')
client.tts_voice_azure = 'en-US-JennyNeural'

# Speak directly
client.speak("Hello, this is a test from Azure TTS.")
if client.error:
    print(client.error_message)
    sys.exit(1)

# Save to file
client.save_tts("Saving this audio to mp3.", "output_azure.mp3")
if client.error:
    print(client.error_message)
    sys.exit(1)
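
The three examples repeat the same check of client.error after every call. If you prefer exceptions, that check could be wrapped in a small helper. This is only a sketch: TTSError and raise_on_error are hypothetical names and are not part of multiai-tts, and the stand-in objects below merely mimic the error and error_message attributes used in the examples so that the snippet runs without API keys.

```python
from types import SimpleNamespace


class TTSError(RuntimeError):
    """Hypothetical exception type; not provided by multiai-tts."""


def raise_on_error(client):
    """Raise TTSError if the client's error flag is set, else return the client."""
    if getattr(client, "error", False):
        raise TTSError(getattr(client, "error_message", "unknown TTS error"))
    return client


# Stand-in objects that mimic the attributes used in the examples above.
ok = SimpleNamespace(error=False, error_message="")
failed = SimpleNamespace(error=True, error_message="voice not found")

raise_on_error(ok)  # no exception is raised

try:
    raise_on_error(failed)
except TTSError as exc:
    caught = str(exc)  # the message copied from error_message
```

With a real client you would call raise_on_error(client) right after speak() or save_tts() instead of testing client.error inline.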

Notes

Style prompts

Both speak() and save_tts() accept an optional prompt argument: a style instruction (voice, tone, speed, emotion, …) that is separate from the spoken text. The prompt is not read aloud and is not subject to chunk splitting.

client.speak(
    "Hello, this is a test.",
    prompt="Speak cheerfully and a little slowly.",
)

At synthesis time the prompt is prepended to the text sent to the provider, using the same rule for every provider. Whether a style prompt helps, and how to phrase it, is up to you.

When the text is chunked (see below), the prompt is re-applied to every chunk so the style stays consistent across the whole audio. Because the prompt is kept separate from the body, chunk_size is measured against the spoken text length only — the prompt length never eats into it. Leaving prompt empty (the default) reproduces the original behavior exactly.
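
The interaction between the prompt and chunking can be pictured with a short sketch. It is an illustration under assumptions, not the library's code: build_requests is a hypothetical helper, and the newline used to join prompt and chunk is a guess, since the text above does not say how the two are combined.

```python
def build_requests(chunks, prompt=""):
    """Prepend the style prompt to every chunk (hypothetical helper).

    The newline separator is an assumption made for illustration.
    """
    if not prompt:
        # An empty prompt leaves the chunks untouched.
        return list(chunks)
    return [f"{prompt}\n{chunk}" for chunk in chunks]


# Chunks already split against a limit of 20 characters of spoken text.
chunks = ["First sentence.", "Second sentence."]
requests = build_requests(chunks, prompt="Speak cheerfully.")
```

Each chunk stays within the limit on its own; the prompt is added afterwards, so its length does not reduce the room available for spoken text.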

Long text (automatic chunking)

When the text is long — whether it exceeds a provider’s request length limit or degrades in quality with longer input (as is the case with some Gemini models) — speak() and save_tts() can automatically split the text into chunks, synthesize each chunk, and join the resulting audio.

# Split into chunks of at most ~1000 characters and join the audio
client.save_tts(long_text, "output.mp3", chunk_size=1000)
if client.error:
    print(client.error_message)

| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | "" | Style instruction applied to every chunk (see Style prompts). Not part of the spoken text and not subject to splitting. |
| chunk_size | int or None | None | Maximum characters per chunk, measured against the spoken text only. None disables splitting (original behavior). |
| split_chars | str | "。..!!??\n" | Candidate split characters. The split point is just after the rightmost candidate found within chunk_size. |
| chunk_overflow | str | "extend" | Behavior when no candidate is found within chunk_size: "extend" reads on until the next candidate (or end of text); "error" sets client.error and stops. |

split_text() is also exposed directly if you only need the chunk boundaries:

chunks = client.split_text(long_text, chunk_size=1000)
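
To make the splitting rule concrete, here is a minimal sketch of the behavior described in the parameter table. It is not the multiai-tts implementation. split_text_sketch is a hypothetical name, the default split_chars is a reduced set chosen for illustration, the "error" mode raises ValueError where the library sets client.error, and returning a remainder that already fits as a single chunk is an assumption.

```python
def split_text_sketch(text, chunk_size, split_chars="。.!?\n", chunk_overflow="extend"):
    """Illustrative splitter following the documented rules (not the library code)."""
    if chunk_size is None:
        return [text]  # splitting disabled
    chunks = []
    pos = 0
    while pos < len(text):
        if len(text) - pos <= chunk_size:
            chunks.append(text[pos:])  # assumption: the tail is kept whole
            break
        window = text[pos:pos + chunk_size]
        # Rightmost candidate inside the window, or -1 if there is none.
        cut = max((window.rfind(c) for c in split_chars), default=-1)
        if cut != -1:
            end = pos + cut + 1  # split just after the candidate
        elif chunk_overflow == "error":
            raise ValueError(f"no split character within {chunk_size} characters")
        else:
            # "extend": read on until the next candidate or the end of the text.
            hits = [text.find(c, pos + chunk_size) for c in split_chars]
            hits = [i for i in hits if i != -1]
            end = min(hits) + 1 if hits else len(text)
        chunks.append(text[pos:end])
        pos = end
    return chunks


sample = "One. Two three four. Five!"
result = split_text_sketch(sample, chunk_size=12)
# result == ["One.", " Two three four.", " Five!"]
```

The second chunk is longer than chunk_size because no candidate falls inside its 12-character window, so the "extend" rule reads on to the next period. The sketch keeps whitespace untouched so that joining the chunks reproduces the input; the library may trim or normalize differently.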

Caveats