
One script, one voice, every language: multilingual TTS with the ElevenLabs API

You can take one script, generate it in a dozen languages, and keep the same voice identity across all of them with the ElevenLabs API. That part is genuinely good. The part that will waste an afternoon is this: pick the wrong model and your language fails with a 400, and the error does not always make the reason obvious. I hit exactly that with Vietnamese.

Here is how multilingual generation actually works through the API, and the model choice that decides whether your language is even supported.

One voice, many languages

The same voice_id speaks every language — that is the whole point. You do not need a separate voice per locale. You pass text in the target language and the model speaks it in that voice:

from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")
VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"   # one voice, used for every language

lines = {
    "en": "Welcome to the show. Today we are testing AI voices.",
    "es": "Bienvenido al programa. Hoy vamos a probar voces de IA.",
    "fr": "Bienvenue dans l'émission. Aujourd'hui, nous testons des voix d'IA.",
}

for code, text in lines.items():
    audio = client.text_to_speech.convert(
        voice_id=VOICE_ID,
        model_id="eleven_multilingual_v2",
        text=text,
        language_code=code,
        output_format="mp3_44100_128",
    )
    with open(f"out_{code}.mp3", "wb") as f:
        for chunk in audio:
            f.write(chunk)

Those three ran clean for me — valid audio in the same voice for each. Then I added Vietnamese and it broke.

The gotcha: not every model speaks every language

Adding "vi" to that loop with eleven_multilingual_v2 returns an HTTP 400:

{
  "detail": {
    "code": "invalid_parameters",
    "status": "unsupported_language",
    "message": "Model 'eleven_multilingual_v2' does not support language_code 'vi'.",
    "param": "language_code"
  }
}

eleven_multilingual_v2 covers 29 languages per ElevenLabs' docs. That includes most of the common ones (Hindi among them, which I confirmed works), but not Vietnamese. The fix is the newer model: eleven_v3 supports a far wider set (70+ languages per the docs), Vietnamese included. Same call, one field changed:

audio = client.text_to_speech.convert(
    voice_id=VOICE_ID,
    model_id="eleven_v3",                # 70+ languages incl. Vietnamese
    text="Chào mừng bạn đến với chương trình. Hôm nay chúng ta sẽ thử nghiệm giọng nói AI.",
    language_code="vi",
    output_format="mp3_44100_128",
)

That produced valid audio for me on the first try, in the same voice. So the rule is simple. Most of the common set is fine on eleven_multilingual_v2: I confirmed that English, Spanish, French, German, and Hindi all work on it. Some languages are not: Vietnamese and Thai both return the same unsupported_language 400, and those need eleven_v3. Do not assume that a "major" language is covered by eleven_multilingual_v2; check the one you need.
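One way to check rather than assume is to read the language list each model advertises. The sketch below is a pure function over plain dicts shaped like the entries I understand the models endpoint (GET /v1/models) to return: a model_id plus a languages list of language_id entries. That shape, the helper name models_supporting, and the trimmed sample data are assumptions for illustration, so compare them with a real response before relying on this.

```python
def models_supporting(models, language_code):
    """Return the model_ids whose advertised languages include language_code."""
    wanted = language_code.lower()
    matches = []
    for model in models:
        # Tolerate a missing or null "languages" field.
        langs = {
            lang.get("language_id", "").lower()
            for lang in (model.get("languages") or [])
        }
        if wanted in langs:
            matches.append(model["model_id"])
    return matches


# Hypothetical, heavily trimmed sample data; a real response lists far more languages.
sample_models = [
    {"model_id": "eleven_multilingual_v2",
     "languages": [{"language_id": "en"}, {"language_id": "hi"}]},
    {"model_id": "eleven_v3",
     "languages": [{"language_id": "en"}, {"language_id": "hi"}, {"language_id": "vi"}]},
]

print(models_supporting(sample_models, "vi"))
print(models_supporting(sample_models, "hi"))
```

An empty result for a language means no listed model advertises it, which is the signal to stop before spending a request on it.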

Handle the unsupported case explicitly, don't let it surprise you

Because the failure is a clean, typed error, you can detect it instead of discovering it in production. Catch the unsupported_language status and route that language to the model that supports it:

from elevenlabs.core.api_error import ApiError

def synth(voice_id, text, language_code):
    for model in ("eleven_multilingual_v2", "eleven_v3"):
        try:
            return b"".join(client.text_to_speech.convert(
                voice_id=voice_id, model_id=model, text=text,
                language_code=language_code, output_format="mp3_44100_128",
            ))
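        except ApiError as e:
            # e.body may not be a dict with a dict "detail" (validation errors
            # can carry a list there), so guard before calling .get().
            body = e.body if isinstance(e.body, dict) else {}
            detail = body.get("detail")
            status = detail.get("status") if isinstance(detail, dict) else None
            if status == "unsupported_language":
                continue   # this model can't, try the wider one
            raise          # any other error is real, so surface it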
    raise RuntimeError(f"No model supports language_code={language_code!r}")

Two things worth keeping. First, this is not a silent fallback — it only retries on the specific unsupported_language status and re-raises everything else, so real failures still blow up loudly. Second, language_code is optional: both models will auto-detect the language from the text. I pass it anyway, because being explicit disambiguates the edge cases (short strings, mixed scripts, names) where auto-detect can guess wrong.
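For a batch, it can also help to keep going when one language fails and report every failure at the end. The following is a minimal sketch under assumptions: run_batch and fake_synth are hypothetical names, and the injected synth_fn stands in for a function like synth above that returns bytes or raises.

```python
def run_batch(lines, synth_fn, voice_id):
    """Synthesize every language in `lines`; collect failures instead of stopping."""
    audio_by_lang, errors_by_lang = {}, {}
    for code, text in lines.items():
        try:
            audio_by_lang[code] = synth_fn(voice_id, text, code)
        except Exception as exc:  # recorded per language and handed back to the caller
            errors_by_lang[code] = exc
    return audio_by_lang, errors_by_lang


# Stand-in synthesizer for demonstration only; swap in the real synth().
def fake_synth(voice_id, text, language_code):
    if language_code == "xx":
        raise RuntimeError(f"No model supports language_code={language_code!r}")
    return text.encode("utf-8")


ok, failed = run_batch({"en": "Hello", "xx": "???"}, fake_synth, "voice-123")
print(sorted(ok), sorted(failed))
```

Nothing is swallowed here: the caller gets each exception back and can raise once the batch finishes if errors_by_lang is not empty.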

Why one voice across languages matters

This is the part that makes it useful for real localization rather than a demo. Because the vocal identity is tied to the voice_id and not the language, a brand or a character keeps the same voice whether it is speaking English or Vietnamese. That is the same capability behind ElevenLabs' dubbing, which re-voices a video into dozens of languages while preserving the original speaker — I dug into how well that holds up, and where the model and language limits actually bite, in a full hands-on review. For building, the takeaway is the same as the code above: choose the model by your language list, not by habit.

Rule of thumb

  • Common set (English, Spanish, French, German, Hindi, …): eleven_multilingual_v2 is fine (29 languages per ElevenLabs' docs).
  • Languages it does not cover — Vietnamese, Thai, and the long tail: use eleven_v3 (70+).
  • Mixed language set: detect unsupported_language and route per language, as above — never assume one model covers them all.
  • Pass language_code explicitly even though auto-detect exists; it is cheap insurance against a wrong guess.
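
The first two bullets can be written down as a small routing table. This sketch encodes only the results reported in this post (English, Spanish, French, German, and Hindi on eleven_multilingual_v2; Vietnamese and Thai needing eleven_v3). The names V2_CONFIRMED, V3_REQUIRED, and plan_models are mine, the two-letter codes are assumed to be what language_code expects, and anything not listed is marked for a live check rather than guessed.

```python
V2_CONFIRMED = {"en", "es", "fr", "de", "hi"}  # worked on eleven_multilingual_v2 in this post
V3_REQUIRED = {"vi", "th"}                     # returned unsupported_language on v2


def plan_models(language_codes):
    """Map each language code to a model_id, or None when it still needs a live check."""
    plan = {}
    for code in language_codes:
        key = code.lower()
        if key in V2_CONFIRMED:
            plan[code] = "eleven_multilingual_v2"
        elif key in V3_REQUIRED:
            plan[code] = "eleven_v3"
        else:
            plan[code] = None  # unknown: test it, or rely on the error-based routing
    return plan


print(plan_models(["en", "vi", "ja"]))
```

A None entry is deliberate: it flags a language nobody has verified yet instead of quietly assigning it a model.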

What languages are you generating, and have you found a case eleven_v3 still cannot handle? I am mapping the real edges of its coverage.


