Features - Izwi

Core Features

Feature	Description	Guide
Voice	Real-time voice conversations with AI	Voice Guide
Chat	Text-based AI conversations	Chat Guide
Transcription	Convert audio to text	Transcription Guide
Speaker Attributed ASR	Generate speaker-turn transcripts with Granite Speech	SAA Guide
Voices	Create, clone, design, preview, manage, and reuse voice assets	Voice Studio Guide
Text-to-Speech	Generate natural speech from text	TTS Guide
Studio	Manage long-form TTS projects and exports	Studio Guide
Settings	Configure appearance, updates, privacy, onboarding, and desktop behavior	Settings Guide
Diarization	Identify multiple speakers	Diarization Guide
Voice Cloning	Clone voices from audio samples inside Voice Studio	Voice Cloning Guide
Voice Design	Create voices from descriptions inside Voice Studio	Voice Design Guide

Feature Comparison

Feature	Web UI	Desktop	CLI	API
Voice	✓	✓	—	✓
Chat	✓	✓	✓	✓
Transcription	✓	✓	✓	✓
Speaker Attributed ASR	✓	✓	—	✓
Voices	✓	✓	✓	✓
Text-to-Speech	✓	✓	✓	✓
Studio	✓	✓	—	✓
Settings	✓	✓	—	✓
Diarization	✓	✓	—	✓
Voice Cloning	✓	✓	✓	✓
Voice Design	✓	✓	✓	✓

Getting Started

Start the server:
```
izwi serve
```
Open the web UI:
```
http://localhost:8080
```

Download required models:

izwi pull Qwen3-TTS-12Hz-0.6B-Base
izwi pull Qwen3-ASR-0.6B-GGUF
izwi pull Qwen3-8B-GGUF
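
The three pulls above can also be scripted as a loop. This is a minimal sketch, assuming that `izwi pull` exits with a non-zero status when a download fails (not confirmed here). `IZWI_CMD` and `pull_models` are hypothetical names introduced only for this example; they are not Izwi settings or commands.

```shell
# Models taken from the pull commands above.
MODELS="Qwen3-TTS-12Hz-0.6B-Base Qwen3-ASR-0.6B-GGUF Qwen3-8B-GGUF"

# IZWI_CMD is a hypothetical override, not an Izwi setting.
# Set IZWI_CMD="echo izwi" to print the commands instead of running them.
IZWI_CMD="${IZWI_CMD:-izwi}"

pull_models() {
  for model in $MODELS; do
    # Assumption: a failed pull returns a non-zero exit status,
    # in which case the loop stops at the first failure.
    $IZWI_CMD pull "$model" || return 1
  done
}
```

After sourcing the snippet, run `pull_models` to fetch all three models. A dry run with `IZWI_CMD="echo izwi"` shows which commands would be issued without downloading anything.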

Model Requirements

Different features require different models:

Feature	Required Models
Voice	TTS + ASR + Chat model (or unified `LFM2.5-Audio-1.5B-GGUF`)
Chat	Chat model (Qwen3, Qwen3.5, LFM2.5, or Gemma)
Speaker Attributed ASR	`Granite-Speech-4.1-2B-Plus`
Voices	Built-in voice model for presets; Base or VibeVoice model for cloning; VoiceDesign model for design
Text-to-Speech	TTS model
Studio	TTS model
Settings	No model required
Transcription	ASR model (`Parakeet-TDT-0.6B-v3` default; Qwen3/Whisper/Granite Speech/LFM2.5 also supported)
Diarization	`diar_streaming_sortformer_4spk-v2.1` (+ optional ASR and aligner models)
Forced Alignment	`Qwen3-ForcedAligner-0.6B` (or `-4bit`)
Voice Cloning	Qwen3 TTS Base model (`Qwen3-TTS-12Hz--Base`)
Voice Design	Qwen3 TTS VoiceDesign model (`Qwen3-TTS-12Hz-1.7B-VoiceDesign*`)
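
For scripting, the rows of the table that name one concrete model can be captured in a small lookup. This is only a sketch: `models_for` is a hypothetical helper, it covers just the rows with a single named default, and it is an assumption that these names can be passed to `izwi pull` unchanged.

```shell
# Print the model named in the table above for a given feature.
# Rows such as "TTS model" or "Chat model" leave the choice open,
# so they are reported as an error rather than guessed.
models_for() {
  case "$1" in
    "Speaker Attributed ASR") echo "Granite-Speech-4.1-2B-Plus" ;;
    "Transcription")          echo "Parakeet-TDT-0.6B-v3" ;;
    "Diarization")            echo "diar_streaming_sortformer_4spk-v2.1" ;;
    "Forced Alignment")       echo "Qwen3-ForcedAligner-0.6B" ;;
    "Settings")               ;;  # no model required, prints nothing
    *) echo "no single model named for: $1" >&2; return 1 ;;
  esac
}
```

For example, `models_for Transcription` prints the default ASR model from the table. Optional companions, such as the ASR and aligner models for Diarization, are deliberately left out.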

Next Steps

Choose a feature to learn more:

Voice Mode — Real-time conversations
Voices — Manage and create reusable voices
Text-to-Speech — Generate speech
Studio — Build long-form TTS projects
Settings — Configure app preferences
Transcription — Convert audio to text
Speaker Attributed ASR — Granite speaker-turn transcripts
API Reference — Integrate with HTTP, SSE, and WebSocket APIs

Voice Mode

Use Izwi for real-time voice conversations with speech recognition, chat, and text-to-speech.

Chat

Run local text and multimodal chat conversations through the Izwi CLI, web UI, and API.

Transcription

Convert audio to text with Izwi through the CLI, web UI, and local API.

Speaker Attributed ASR

Generate Granite Speech speaker-turn transcripts through the Transcription workspace and speech-text jobs API.

Voice Studio

Create, clone, design, preview, manage, and reuse voices from the unified Izwi Voices workspace.

Text-to-Speech

Generate natural speech from text with Izwi models, voices, streaming, and audio output formats.

Studio

Manage long-form text-to-speech projects, chapter workflows, and exports in Izwi Studio.

Settings and Onboarding

Configure appearance, updates, analytics, desktop system behavior, and first-run model setup in Izwi.

Diarization

Identify multiple speakers in audio and generate speaker-attributed transcripts with Izwi.

Voice Cloning

Create custom voices from reference audio and use them for local text-to-speech generation.

Voice Design

Design synthetic voices from text descriptions and use them in Izwi speech workflows.
