Generate speech from text.

Synopsis

izwi tts <TEXT> [OPTIONS]

Description

Converts text to speech using a TTS model. The OSS server emits WAV audio for the CLI path, with voice selection and real-time streaming.

Kokoro-82M Prerequisite (espeak-ng)

Kokoro-82M requires espeak-ng to be installed on the host system (used for phonemization).

Arguments

ArgumentDescription
<TEXT>Text to synthesize (use - to read from stdin)

Options

OptionDescriptionDefault
-m, --model <MODEL>TTS model to useqwen3-tts-0.6b-base
-s, --speaker <VOICE>Built-in voice/speaker ID, or VibeVoice speaker labelmodel default
--saved-voice-id <ID>Reuse a saved reference voice from /v1/voices
--reference-audio <PATH>Reference audio file for voice cloning
--reference-text <TEXT>Transcript for the reference audio
--reference-text-file <PATH>File containing the reference transcript
--instructions <TEXT>Voice direction prompt for voice-design models
-o, --output <PATH>Output file pathstdout
-f, --format <FORMAT>Audio format requested from the server. OSS native output is wav; compressed choices require server-side encoder support.wav
-r, --speed <SPEED>Speech speed (0.5-2.0)1.0
-t, --temperature <TEMP>Sampling temperature0.7
--streamStream output in real-time
--allow-format-fallbackAllow WAV output when a requested compressed format is unavailable
-p, --playRequest playback after generation. The current CLI saves audio and reports playback as not implemented.

Examples

Basic usage

izwi tts "Hello, world!" --output hello.wav

Kokoro-82M

izwi tts "Hello my name is Bella" \
  --model Kokoro-82M \
  --speaker af_bella \
  --output kokoro.wav

Request playback

izwi tts "Hello, world!" --play
The current CLI still saves the generated audio, then reports that playback is not implemented in this version.

WAV output

izwi tts "Hello, world!" --format wav --output hello.wav

Adjust speed

# Slower
izwi tts "Speaking slowly" --speed 0.75 --output slow.wav

# Faster
izwi tts "Speaking quickly" --speed 1.5 --output fast.wav

Read from stdin

echo "Text from pipe" | izwi tts - --output piped.wav
cat article.txt | izwi tts - --output article.wav

CustomVoice speaker presets

izwi tts "Hello from a built-in CustomVoice speaker" \
  --model Qwen3-TTS-12Hz-0.6B-CustomVoice \
  --speaker Aiden \
  --output cloned.wav

Reference-audio voice cloning

Base and VibeVoice models can clone from a reference audio sample. Provide the audio and matching transcript together:
izwi tts "This sentence will use the reference voice." \
  --model Qwen3-TTS-12Hz-0.6B-Base \
  --reference-audio samples/reference.wav \
  --reference-text "This is the exact text spoken in the reference sample." \
  --output cloned.wav
For longer transcripts, read the reference text from a file:
izwi tts "Generate with the saved reference transcript." \
  --model Qwen3-TTS-12Hz-1.7B-Base \
  --reference-audio samples/reference.wav \
  --reference-text-file samples/reference.txt \
  --output cloned-long.wav

Saved voice reuse

Saved voices created through Voice Studio or /v1/voices can be reused without resending reference audio:
izwi tts "Use my saved narration voice." \
  --model Qwen3-TTS-12Hz-0.6B-Base \
  --saved-voice-id voice_abc123 \
  --output saved-voice.wav
Do not combine --saved-voice-id with --reference-audio or --reference-text; choose one voice source per request.

Prompt-based voice design

VoiceDesign models accept a natural-language direction prompt:
izwi tts "Welcome to the evening edition." \
  --model Qwen3-TTS-12Hz-1.7B-VoiceDesign \
  --instructions "A calm, warm newsreader voice with measured pacing" \
  --output designed.wav

Streaming with playback request

izwi tts "Long text for streaming" --stream --play

Audio Formats

FormatExtensionNotes
wav.wavNative OSS output format.
mp3, ogg, flac, aacMatching extension when supportedRecognized by the CLI enum, but the OSS server does not bundle compressed encoders yet. Add --allow-format-fallback when you intentionally want the CLI to accept WAV bytes for a compressed request; default filenames then use the actual returned extension.

Models

ModelTypeDescription
Kokoro-82MStandardLightweight TTS (requires espeak-ng)
qwen3-tts-0.6b-baseBaseGeneral-purpose TTS + reference-voice workflows
qwen3-tts-0.6b-customvoiceCustomVoiceBuilt-in speaker presets
Qwen3-TTS-12Hz-1.7B-VoiceDesignDesignVoice design-capable model family
VibeVoice-1.5BLong-form/reference voiceReference-voice TTS with long-form positioning
Voxtral-4B-TTS-2603Preset voices20 built-in voices, 24 kHz output, CC BY-NC 4.0
qwen3-tts-1.7b-*LargerHigher quality variants

See Also