Create custom voices from text descriptions — no audio samples required.

Overview

Voice design generates unique voices based on natural language descriptions. Describe the voice you want, and Izwi creates it:
  • No samples needed — Create voices from scratch
  • Infinite variety — Design any voice you can describe
  • Quick iteration — Rapidly test different voice concepts
  • Creative freedom — Perfect for characters and personas

Getting Started

Download a Voice Design Model

izwi pull Qwen3-TTS-12Hz-1.7B-VoiceDesign

Design a Voice

Describe the voice you want:
A warm, friendly female voice with a slight British accent. 
Middle-aged, professional but approachable.

Using the Web UI

Voice design now lives inside the unified Voices workspace.

Step 1: Describe Your Voice

  1. Navigate to Voices in the sidebar and choose the design flow
  2. Enter a description of your desired voice
  3. Be specific about characteristics you want

Step 2: Generate Sample

  1. Enter sample text to hear the voice
  2. Click Generate
  3. Listen to the result

Step 3: Iterate

  • Adjust your description
  • Generate again
  • Repeat until satisfied

Voice Description Tips

Effective Descriptions

Include details about:
| Aspect | Examples |
|---|---|
| Gender | Male, female, androgynous |
| Age | Young, middle-aged, elderly |
| Tone | Warm, authoritative, playful |
| Accent | British, Southern US, neutral |
| Pace | Fast, measured, deliberate |
| Energy | Energetic, calm, subdued |
| Character | Professional, friendly, mysterious |
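
One way to keep descriptions specific is to assemble them from the aspects listed above. The following is only a sketch: `build_description` and its parameter names are hypothetical helpers, not part of Izwi, and the model accepts any free-form text.

```python
def article_for(word):
    """Pick 'a' or 'an' with a simple first-letter check (good enough for a sketch)."""
    return "an" if word[0].lower() in "aeiou" else "a"


def build_description(gender=None, age=None, tone=None, accent=None,
                      pace=None, energy=None, character=None):
    """Compose a free-form voice description from optional aspects.

    Hypothetical helper: Izwi only ever sees the resulting string.
    """
    # Core phrase, e.g. "warm, middle-aged female voice"
    adjectives = ", ".join(a for a in (tone, age) if a)
    subject = " ".join(p for p in (adjectives, gender, "voice") if p)
    sentence = f"{article_for(subject).capitalize()} {subject}"
    if accent:
        sentence += f" with {article_for(accent)} {accent} accent"

    parts = [sentence + "."]
    if pace:
        parts.append(f"{pace.capitalize()} pacing.")
    if energy:
        parts.append(f"{energy.capitalize()} energy.")
    if character:
        parts.append(f"{character.capitalize()} in character.")
    return " ".join(parts)


description = build_description(
    gender="female", age="middle-aged", tone="warm",
    accent="British", pace="measured",
)
print(description)
```

Leaving an aspect unset simply drops it from the text, which makes it easy to start broad and add detail later.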

Example Descriptions

News anchor:
A professional male voice, mid-30s, with a clear American accent.
Authoritative and trustworthy, with measured pacing.

Children’s narrator:
A warm, enthusiastic female voice. Friendly and expressive,
perfect for storytelling. Slightly higher pitch with playful energy.

AI assistant:
A calm, neutral voice with no strong accent. Clear and helpful,
not robotic but not overly emotional. Professional and efficient.

Audiobook narrator:
A rich, deep male voice with a slight British accent.
Mature and sophisticated, with excellent diction and
a storytelling quality.

Using the CLI

Use izwi tts with a VoiceDesign model and pass the voice description with --instructions:
izwi tts "Hello, this is my designed voice." \
  --model Qwen3-TTS-12Hz-1.7B-VoiceDesign \
  --instructions "A warm, friendly female voice with a British accent" \
  --output designed.wav
You can iterate quickly by changing the description and rerunning. This example also switches to the 4-bit model, which uses less memory:
izwi tts "Welcome back to the show." \
  --model Qwen3-TTS-12Hz-1.7B-VoiceDesign-4bit \
  --instructions "A bright podcast host voice with crisp diction" \
  --output podcast-host.wav
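
To compare several candidate descriptions in one pass, the commands can be generated from a script. The sketch below is a hedged example: it uses only the `--model`, `--instructions`, and `--output` flags shown above, while `tts_command` and the `candidates` mapping are hypothetical names introduced for illustration. It prints the commands rather than running them.

```python
import shlex

MODEL = "Qwen3-TTS-12Hz-1.7B-VoiceDesign"


def tts_command(text, description, output, model=MODEL):
    """Return the argv list for one `izwi tts` call (flags as in the examples above)."""
    return [
        "izwi", "tts", text,
        "--model", model,
        "--instructions", description,
        "--output", output,
    ]


# Hypothetical candidates: output name -> voice description
candidates = {
    "warm-british": "A warm, friendly female voice with a British accent",
    "podcast-host": "A bright podcast host voice with crisp diction",
}

commands = [
    tts_command("Welcome back to the show.", description, f"{name}.wav")
    for name, description in candidates.items()
]

for argv in commands:
    # Print a copy-pasteable shell command for each candidate.
    print(shlex.join(argv))
```

To execute the commands instead of printing them, each `argv` list can be passed to `subprocess.run(argv, check=True)`, assuming `izwi` is on the `PATH`.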

Using the API

Endpoint

POST /v1/audio/speech

Request

{
  "model": "Qwen3-TTS-12Hz-1.7B-VoiceDesign",
  "input": "Hello, this is my designed voice.",
  "instructions": "A warm, friendly female voice with a British accent"
}

Example (curl)

curl -X POST http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    "input": "Hello, this is my designed voice.",
    "instructions": "A warm, friendly female voice"
  }' \
  --output designed.wav

Voice design history is also available through /v1/voice-designs. See the API Reference for the persisted voice-design routes and for /v1/audio/speech streaming details.
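
The same request can be sent from Python using only the standard library. This is a minimal sketch under two assumptions: the server listens on `http://localhost:8080` as in the curl example, and the response body is the raw audio (which is what curl's `--output designed.wav` suggests). `build_speech_request` and `save_speech` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: same host and port as the curl example


def build_speech_request(text, description,
                         model="Qwen3-TTS-12Hz-1.7B-VoiceDesign",
                         base_url=BASE_URL):
    """Build a POST /v1/audio/speech request with the documented JSON fields."""
    payload = {"model": model, "input": text, "instructions": description}
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(request, path):
    """Send the request and write the response body to `path`.

    Assumes the body is audio bytes; requires a running Izwi server.
    """
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())


request = build_speech_request(
    "Hello, this is my designed voice.",
    "A warm, friendly female voice",
)
# With the server running, uncomment to generate the file:
# save_speech(request, "designed.wav")
```

Building the request separately from sending it makes the payload easy to inspect or log before any audio is generated.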

Available Models

| Model | Size | Quality |
|---|---|---|
| Qwen3-TTS-12Hz-1.7B-VoiceDesign | ~4.2 GB | Better |
| Qwen3-TTS-12Hz-1.7B-VoiceDesign-4bit | ~2.2 GB | Good + lower memory |

The larger model tends to interpret complex descriptions better; the 4-bit variant trades some quality for lower memory use.

Best Practices

Be Specific

❌ “A nice voice”
✅ “A warm, professional female voice in her 40s with a calm, reassuring tone”

Use Comparisons

“Similar to a podcast host — conversational but polished”

Describe the Context

“A voice suitable for meditation apps — slow, soothing, and peaceful”

Iterate

Start broad, then refine:
  1. “A male voice”
  2. “A young male voice with energy”
  3. “A young male voice with energy, like a sports commentator”
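
Because the same description may not reproduce exactly the same voice, it can help to record which description produced which file while refining. The sketch below pairs the three steps above with numbered output names; `plan_takes` and the `take-NN.wav` naming are hypothetical conventions, not Izwi features.

```python
import json

# The three refinement steps from the list above.
steps = [
    "A male voice",
    "A young male voice with energy",
    "A young male voice with energy, like a sports commentator",
]


def plan_takes(descriptions, prefix="take"):
    """Pair each description with a numbered output file name."""
    return [
        {"output": f"{prefix}-{i:02d}.wav", "instructions": text}
        for i, text in enumerate(descriptions, start=1)
    ]


takes = plan_takes(steps)
# Print a manifest that can be saved next to the generated audio.
print(json.dumps(takes, indent=2))
```

Each entry's `instructions` value can be passed to `--instructions` (CLI) or the `instructions` field (API), with `output` as the target file.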

Limitations

  • Consistency — Same description may produce slightly different voices
  • Extreme requests — Very unusual voices may not generate well
  • Accents — Some accents are better supported than others
  • Singing — Designed for speech, not singing

Voice Design vs Voice Cloning

| Aspect | Voice Design | Voice Cloning |
|---|---|---|
| Input | Text description | Audio sample |
| Use case | Create new voices | Replicate existing voices |
| Consistency | May vary slightly | More consistent |
| Flexibility | Unlimited creativity | Limited to source |
