Overview
Voice design generates unique voices from natural language descriptions. Describe the voice you want, and Izwi creates it:
- No samples needed: create voices from scratch
- Wide variety: design any voice you can describe
- Quick iteration: rapidly test different voice concepts
- Creative freedom: well suited to characters and personas
Getting Started
Download a Voice Design Model
Design a Voice
Describe the voice you want:
Using the Web UI
Voice design now lives inside the unified Voices workspace.
Step 1: Describe Your Voice
- Navigate to Voices in the sidebar and choose the design flow
- Enter a description of your desired voice
- Be specific about characteristics you want
Step 2: Generate Sample
- Enter sample text to hear the voice
- Click Generate
- Listen to the result
Step 3: Iterate
- Adjust your description
- Generate again
- Repeat until satisfied
Voice Description Tips
Effective Descriptions
Include details about:
| Aspect | Examples |
|---|---|
| Gender | Male, female, androgynous |
| Age | Young, middle-aged, elderly |
| Tone | Warm, authoritative, playful |
| Accent | British, Southern US, neutral |
| Pace | Fast, measured, deliberate |
| Energy | Energetic, calm, subdued |
| Character | Professional, friendly, mysterious |
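The aspects in this table can be combined into a single description. The sketch below is a hypothetical helper (`build_description` is not part of Izwi) that joins whichever aspects you supply into one sentence you can paste into the Web UI or pass to the CLI. The word order it uses is an arbitrary choice for readability, not a format the model requires, since the description is free-form text.

```python
def build_description(gender=None, age=None, tone=None, accent=None,
                      pace=None, energy=None, character=None):
    """Join voice aspects from the table into one description.

    Hypothetical helper for illustration only. Aspects left as None
    are omitted from the result.
    """
    # Adjectives placed before "voice", e.g. "warm, professional female"
    leading = [a for a in (tone, character, age, accent, gender) if a]
    # Delivery qualities placed in a trailing clause
    trailing = [a for a in (pace, energy) if a]

    if len(leading) > 1:
        # Commas between adjectives, a plain space before the last one
        adjectives = ", ".join(leading[:-1]) + " " + leading[-1]
    else:
        adjectives = "".join(leading)

    phrase = f"{adjectives} voice" if adjectives else "voice"
    # Rough article choice by first letter only ("An elderly", "A warm")
    article = "An" if phrase[0].lower() in "aeiou" else "A"
    text = f"{article} {phrase}"
    if trailing:
        text += " with a " + " and ".join(trailing) + " delivery"
    return text + "."


print(build_description(gender="female", age="middle-aged", tone="warm",
                        character="professional", pace="measured",
                        energy="calm"))
```

Leaving most arguments unset gives a broad starting point; adding more aspects narrows the result, which matches the iterate-and-refine workflow.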
Example Descriptions
News anchor:
Using the CLI
Use `izwi tts` with a VoiceDesign model and pass the voice description with `--instructions`.
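A description usually contains spaces and commas, so it has to reach `izwi tts` as a single argument. A minimal Python sketch, assuming you script the CLI: building an argument list (for `subprocess.run`) avoids shell quoting entirely, and `shlex.join` renders the equivalent shell command. Only `izwi tts` and `--instructions` come from this page; the model and input-text arguments are left to a hypothetical `extra_args` list because their flags are not documented here (check `izwi tts --help`).

```python
import shlex


def voice_design_command(description, extra_args=()):
    """Build the argv list for `izwi tts` with a voice description.

    `extra_args` is a placeholder for the model and input-text
    arguments, whose exact flags are an assumption left to the caller.
    """
    # The whole description is one list element, so it is never split
    return ["izwi", "tts", *extra_args, "--instructions", description]


cmd = voice_design_command(
    "A warm, professional female voice in her 40s with a calm, reassuring tone"
)
# Shell-safe rendering of the same command, e.g. for copy/paste
print(shlex.join(cmd))
# To run it once extra_args are filled in: subprocess.run(cmd, check=True)
```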
Using the API
Endpoint
Request
Example (curl)
/v1/voice-designs. See the API Reference for the persisted route family and /v1/audio/speech streaming details.
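A speech request can be assembled with the Python standard library. This is a sketch under stated assumptions: the server address (`http://localhost:8080`) and the JSON field names (`model`, `input`, `instructions`, modeled on the OpenAI-style `/v1/audio/speech` request shape and on the CLI's `--instructions` option) are assumptions, not confirmed by this page, so check the API Reference for the exact schema. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: address of a local Izwi server; adjust for your deployment
BASE_URL = "http://localhost:8080"


def build_speech_request(model, text, description, base_url=BASE_URL):
    """Build (but do not send) a POST request to /v1/audio/speech.

    Field names are assumed, modeled on OpenAI-style speech requests.
    """
    payload = {
        "model": model,               # a VoiceDesign model name (assumed ID format)
        "input": text,                # the text to speak
        "instructions": description,  # the natural-language voice description
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_speech_request(
    "Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    "Good evening, here are tonight's headlines.",
    "A warm, professional female voice in her 40s with a calm, reassuring tone",
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; the response format
# (audio bytes or a stream) is described in the API Reference.
```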
Available Models
| Model | Size | Quality |
|---|---|---|
| Qwen3-TTS-12Hz-1.7B-VoiceDesign | ~4.2 GB | Better |
| Qwen3-TTS-12Hz-1.7B-VoiceDesign-4bit | ~2.2 GB | Good + lower memory |
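If you script model selection, the table reduces to a simple rule: prefer the full-precision model and fall back to the 4-bit variant when resources are tight. The sketch below is hypothetical (`pick_voice_design_model` is not an Izwi function) and uses the approximate sizes from the table as a stand-in for a resource budget; actual runtime memory use is not listed on this page and will differ.

```python
# (model name, approximate size in GB from the table), best quality first
VOICE_DESIGN_MODELS = [
    ("Qwen3-TTS-12Hz-1.7B-VoiceDesign", 4.2),
    ("Qwen3-TTS-12Hz-1.7B-VoiceDesign-4bit", 2.2),
]


def pick_voice_design_model(budget_gb):
    """Return the highest-quality model whose size fits the budget, or None."""
    for name, size_gb in VOICE_DESIGN_MODELS:
        if size_gb <= budget_gb:
            return name
    # Neither model fits the given budget
    return None


print(pick_voice_design_model(3.0))
```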
Best Practices
Be Specific
❌ “A nice voice”
✅ “A warm, professional female voice in her 40s with a calm, reassuring tone”
Use Comparisons
“Similar to a podcast host: conversational but polished”
Describe the Context
“A voice suitable for meditation apps: slow, soothing, and peaceful”
Iterate
Start broad, then refine:
- “A male voice”
- “A young male voice with energy”
- “A young male voice with energy, like a sports commentator”
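The "be specific" advice can be checked roughly before you generate. A hypothetical sketch (not an Izwi feature): count how many aspects from the Effective Descriptions table a description mentions, using the table's example words as keywords. It matches whole words only, so it misses synonyms and comparisons (for example, "in her 40s" is not recognized as an age, and "like a sports commentator" adds nothing to the count); treat a low score as a hint, not a verdict.

```python
import re

# Keywords taken from the example column of the aspects table
ASPECT_KEYWORDS = {
    "gender": {"male", "female", "androgynous"},
    "age": {"young", "middle-aged", "elderly"},
    "tone": {"warm", "authoritative", "playful"},
    "accent": {"british", "southern", "neutral"},
    "pace": {"fast", "measured", "deliberate"},
    "energy": {"energetic", "calm", "subdued"},
    "character": {"professional", "friendly", "mysterious"},
}


def covered_aspects(description):
    """Return the sorted names of table aspects the description mentions."""
    # Whole lowercase words, keeping hyphenated ones like "middle-aged";
    # word matching avoids counting "male" inside "female"
    words = set(re.findall(r"[a-z]+(?:-[a-z]+)*", description.lower()))
    return sorted(a for a, keys in ASPECT_KEYWORDS.items() if words & keys)


for d in ("A nice voice",
          "A warm, professional female voice in her 40s "
          "with a calm, reassuring tone"):
    print(covered_aspects(d), d)
```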
Limitations
- Consistency: the same description may produce a slightly different voice each time you generate
- Extreme requests: very unusual voices may not generate well
- Accents: some accents are better supported than others
- Singing: voice design targets speech, not singing
Voice Design vs Voice Cloning
| Aspect | Voice Design | Voice Cloning |
|---|---|---|
| Input | Text description | Audio sample |
| Use case | Create new voices | Replicate existing voices |
| Consistency | May vary slightly | More consistent |
| Flexibility | Any voice you can describe | Limited to source |
See Also
- Voices — Manage and reuse saved voices
- Voice Cloning — Clone from audio samples
- Text-to-Speech — Standard TTS
- Models — Download models