Voice Design

Create custom voices from text descriptions — no audio samples required.

Overview

Voice design generates unique voices based on natural language descriptions. Describe the voice you want, and Izwi creates it:

No samples needed — Create voices from scratch
Infinite variety — Design any voice you can describe
Quick iteration — Rapidly test different voice concepts
Creative freedom — Perfect for characters and personas

Getting Started

Download a Voice Design Model

izwi pull Qwen3-TTS-12Hz-1.7B-VoiceDesign

Design a Voice

Describe the voice you want:

A warm, friendly female voice with a slight British accent. 
Middle-aged, professional but approachable.

Using the Web UI

Voice design now lives inside the unified Voices workspace.

Step 1: Describe Your Voice

Navigate to Voices in the sidebar and choose the design flow
Enter a description of your desired voice
Be specific about the characteristics you want

Step 2: Generate Sample

Enter sample text to hear the voice
Click Generate
Listen to the result

Step 3: Iterate

Adjust your description
Generate again
Repeat until satisfied

Voice Description Tips

Effective Descriptions

Include details about:

Aspect	Examples
Gender	Male, female, androgynous
Age	Young, middle-aged, elderly
Tone	Warm, authoritative, playful
Accent	British, Southern US, neutral
Pace	Fast, measured, deliberate
Energy	Energetic, calm, subdued
Character	Professional, friendly, mysterious
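
If you build descriptions programmatically, the aspects in the table can be combined into one string. The following is a minimal sketch, not part of Izwi: `VoiceSpec` and `describe` are hypothetical names, and the sentence template is only one plausible way to phrase a description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceSpec:
    """Hypothetical container for the aspects listed in the table above."""
    gender: Optional[str] = None
    age: Optional[str] = None
    tone: Optional[str] = None
    accent: Optional[str] = None
    pace: Optional[str] = None
    energy: Optional[str] = None
    character: Optional[str] = None


def _with_article(phrase: str) -> str:
    """Prefix 'a' or 'an' based on the first letter of the phrase."""
    article = "an" if phrase[0].lower() in "aeiou" else "a"
    return f"{article} {phrase}"


def describe(spec: VoiceSpec) -> str:
    """Join whichever aspects are set into a plain-English description."""
    # First sentence: tone, age, and gender modify the noun "voice".
    descriptors = [w for w in (spec.tone, spec.age, spec.gender) if w]
    sentence = _with_article(" ".join(descriptors + ["voice"]))
    sentence = sentence[0].upper() + sentence[1:]
    if spec.accent:
        sentence += " with " + _with_article(f"{spec.accent} accent")
    sentence += "."

    # Second sentence: pace, energy, and character, if any are given.
    extras = [
        f"{value} {label}"
        for value, label in (
            (spec.pace, "pace"),
            (spec.energy, "energy"),
            (spec.character, "manner"),
        )
        if value
    ]
    if extras:
        detail = ", ".join(extras)
        sentence += " " + detail[0].upper() + detail[1:] + "."
    return sentence


if __name__ == "__main__":
    print(describe(VoiceSpec(gender="female", age="middle-aged", tone="warm",
                             accent="British", character="professional")))
```

The resulting string can be pasted into the description field or passed wherever a voice description is accepted. Hand-written descriptions like the examples below may still give better results than a template.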

Example Descriptions

News anchor:

A professional male voice, mid-30s, with a clear American accent. 
Authoritative and trustworthy, with measured pacing.

Children’s narrator:

A warm, enthusiastic female voice. Friendly and expressive, 
perfect for storytelling. Slightly higher pitch with playful energy.

AI assistant:

A calm, neutral voice with no strong accent. Clear and helpful, 
not robotic but not overly emotional. Professional and efficient.

Audiobook narrator:

A rich, deep male voice with a slight British accent. 
Mature and sophisticated, with excellent diction and 
a storytelling quality.

Using the CLI

Use izwi tts with a VoiceDesign model and pass the voice description with --instructions:

izwi tts "Hello, this is my designed voice." \
  --model Qwen3-TTS-12Hz-1.7B-VoiceDesign \
  --instructions "A warm, friendly female voice with a British accent" \
  --output designed.wav

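You can iterate quickly by changing the input text and the --instructions description. This example also switches to the smaller 4-bit variant (download it first with izwi pull):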

izwi tts "Welcome back to the show." \
  --model Qwen3-TTS-12Hz-1.7B-VoiceDesign-4bit \
  --instructions "A bright podcast host voice with crisp diction" \
  --output podcast-host.wav
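
To script the same command, for example from a batch job, you can assemble the arguments in Python. This is a minimal sketch that uses only the flags shown above; `build_tts_command` and `design_voice` are hypothetical helper names, and it assumes the `izwi` binary is on your PATH.

```python
import shlex
import subprocess
from typing import List

DEFAULT_MODEL = "Qwen3-TTS-12Hz-1.7B-VoiceDesign"


def build_tts_command(text: str, instructions: str, output: str,
                      model: str = DEFAULT_MODEL) -> List[str]:
    """Return the argv for the `izwi tts` invocation shown above."""
    return [
        "izwi", "tts", text,
        "--model", model,
        "--instructions", instructions,
        "--output", output,
    ]


def design_voice(text: str, instructions: str, output: str,
                 model: str = DEFAULT_MODEL) -> None:
    """Run izwi and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_tts_command(text, instructions, output, model), check=True)


if __name__ == "__main__":
    cmd = build_tts_command(
        "Hello, this is my designed voice.",
        "A warm, friendly female voice with a British accent",
        "designed.wav",
    )
    # Print the shell-quoted command instead of running it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when a description contains quotes or commas.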

Using the API

Endpoint

POST /v1/audio/speech

Request

{
  "model": "Qwen3-TTS-12Hz-1.7B-VoiceDesign",
  "input": "Hello, this is my designed voice.",
  "instructions": "A warm, friendly female voice with a British accent"
}

Example (curl)

curl -X POST http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    "input": "Hello, this is my designed voice.",
    "instructions": "A warm, friendly female voice"
  }' \
  --output designed.wav
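
The same request can be sent from Python with only the standard library. This is a sketch under two assumptions: the server listens on `http://localhost:8080` as in the curl example, and the response body is the raw audio, which the `--output` flag above suggests. `build_speech_request` and `design_voice` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # taken from the curl example; adjust as needed
DEFAULT_MODEL = "Qwen3-TTS-12Hz-1.7B-VoiceDesign"


def build_speech_request(text: str, instructions: str,
                         model: str = DEFAULT_MODEL,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /v1/audio/speech request without sending it."""
    body = json.dumps({
        "model": model,
        "input": text,
        "instructions": instructions,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def design_voice(text: str, instructions: str, out_path: str) -> None:
    """Send the request and write the response body to out_path."""
    request = build_speech_request(text, instructions)
    # Assumption: the body is audio bytes. urlopen raises HTTPError on 4xx/5xx.
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())


if __name__ == "__main__":
    req = build_speech_request("Hello, this is my designed voice.",
                               "A warm, friendly female voice")
    print(req.get_method(), req.full_url)
```

Call `design_voice(...)` with a running server to save the generated audio to a file.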

Voice design history is also available through /v1/voice-designs. See the API Reference for those persisted routes and for streaming details of /v1/audio/speech.

Available Models

Model	Size	Quality
`Qwen3-TTS-12Hz-1.7B-VoiceDesign`	~4.2 GB	Better
`Qwen3-TTS-12Hz-1.7B-VoiceDesign-4bit`	~2.2 GB	Good + lower memory

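Both models have 1.7B parameters. The larger, unquantized download generally interprets complex descriptions better, while the 4-bit variant trades some quality for lower memory use.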

Best Practices

Be Specific

❌ “A nice voice”
✅ “A warm, professional female voice in her 40s with a calm, reassuring tone”

Use Comparisons

“Similar to a podcast host — conversational but polished”

Describe the Context

“A voice suitable for meditation apps — slow, soothing, and peaceful”

Iterate

Start broad, then refine:

“A male voice”
“A young male voice with energy”
“A young male voice with energy, like a sports commentator”
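
One way to compare refinements is to keep the sample text fixed and vary only the description, saving each take under a numbered filename. The sketch below only prepares the request bodies, using the JSON fields from the API section; `refinement_requests` and the `take-NN.wav` naming are illustrative choices, not Izwi conventions.

```python
import json
from typing import Dict, List, Tuple

DEFAULT_MODEL = "Qwen3-TTS-12Hz-1.7B-VoiceDesign"


def refinement_requests(sample_text: str, descriptions: List[str],
                        model: str = DEFAULT_MODEL) -> List[Tuple[str, Dict[str, str]]]:
    """Pair each description with an output filename and a request body."""
    requests = []
    for step, description in enumerate(descriptions, start=1):
        payload = {
            "model": model,
            "input": sample_text,          # same text for every take
            "instructions": description,   # only this changes between takes
        }
        requests.append((f"take-{step:02d}.wav", payload))
    return requests


if __name__ == "__main__":
    steps = [
        "A male voice",
        "A young male voice with energy",
        "A young male voice with energy, like a sports commentator",
    ]
    for filename, payload in refinement_requests("Welcome back to the show.", steps):
        print(filename, json.dumps(payload))
```

Because the same description may produce slightly different voices, it can help to generate more than one take per step before judging a refinement.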

Limitations

Consistency — Same description may produce slightly different voices
Extreme requests — Very unusual voices may not generate well
Accents — Some accents are better supported than others
Singing — Designed for speech, not singing

Voice Design vs Voice Cloning

Aspect	Voice Design	Voice Cloning
Input	Text description	Audio sample
Use case	Create new voices	Replicate existing voices
Consistency	May vary slightly	More consistent
Flexibility	Unlimited creativity	Limited to source