Core Features

Feature | Description | Guide
Voice | Real-time voice conversations with AI | Voice Guide
Chat | Text-based AI conversations | Chat Guide
Transcription | Convert audio to text | Transcription Guide
Speaker Attributed ASR | Generate Granite speaker-turn transcripts | SAA Guide
Voices | Create, clone, design, preview, manage, and reuse voice assets | Voice Studio Guide
Text-to-Speech | Generate natural speech from text | TTS Guide
Studio | Manage long-form TTS projects and exports | Studio Guide
Settings | Configure appearance, updates, privacy, onboarding, and desktop behavior | Settings Guide
Diarization | Identify multiple speakers | Diarization Guide
Voice Cloning | Clone voices from audio samples inside Voice Studio | Voice Cloning Guide
Voice Design | Create voices from descriptions inside Voice Studio | Voice Design Guide

Feature Comparison

Feature | Web UI | Desktop | CLI | API
Voice
Chat
Transcription
Speaker Attributed ASR
Voices
Text-to-Speech
Studio
Settings
Diarization
Voice Cloning
Voice Design

Getting Started

  1. Start the server:
    izwi serve
    
  2. Open the web UI:
    http://localhost:8080
    
  3. Download required models:
    izwi pull Qwen3-TTS-12Hz-0.6B-Base
    izwi pull Qwen3-ASR-0.6B-GGUF
    izwi pull Qwen3-8B-GGUF
    
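The three pulls in step 3 can be wrapped in a small helper. This is a minimal sketch that uses only the `izwi pull` command and the model names listed above; the `IZWI` variable and the `pull_models` function name are hypothetical conveniences, not part of Izwi.

```shell
#!/bin/sh
# Sketch only: IZWI and pull_models are illustrative names, not Izwi features.
# IZWI lets you point the helper at a different binary path if needed.
IZWI="${IZWI:-izwi}"

# Pull the three models named in step 3, stopping at the first failure.
pull_models() {
    for model in \
        Qwen3-TTS-12Hz-0.6B-Base \
        Qwen3-ASR-0.6B-GGUF \
        Qwen3-8B-GGUF
    do
        "$IZWI" pull "$model" || return 1
    done
}
```

To follow the order of the steps above, keep `izwi serve` running in one terminal, then source this file and call `pull_models` in a second terminal.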

Model Requirements

Different features require different models:
Feature | Required Models
Voice | TTS + ASR + Chat model (or unified LFM2.5-Audio-1.5B-GGUF)
Chat | Chat model (Qwen3, Qwen3.5, LFM2.5, or Gemma)
Speaker Attributed ASR | Granite-Speech-4.1-2B-Plus
Voices | Built-in voice model for presets; Base or VibeVoice model for cloning; VoiceDesign model for design
Text-to-Speech | TTS model
Studio | TTS model
Settings | No model required
Transcription | ASR model (Parakeet-TDT-0.6B-v3 default; Qwen3/Whisper/Granite Speech/LFM2.5 also supported)
Diarization | diar_streaming_sortformer_4spk-v2.1 (+ optional ASR and aligner models)
Forced Alignment | Qwen3-ForcedAligner-0.6B (or -4bit)
Voice Cloning | Qwen3 TTS Base model (Qwen3-TTS-12Hz-*-Base*)
Voice Design | Qwen3 TTS VoiceDesign model (Qwen3-TTS-12Hz-1.7B-VoiceDesign*)
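
To see how the three models from Getting Started line up with this table, the lookup below is a rough sketch. The feature keys (`voice`, `tts`, and so on) and the `models_for_feature` function are hypothetical, and the role of each model (TTS, ASR, chat) is inferred from its name rather than stated by the table. Features whose models are not pulled in Getting Started return an error.

```shell
#!/bin/sh
# Sketch only: the function name and feature keys are illustrative.
# The model-to-role pairing is an assumption inferred from the model names.
models_for_feature() {
    case "$1" in
        voice)
            # Voice needs a TTS, an ASR, and a chat model together.
            echo "Qwen3-TTS-12Hz-0.6B-Base Qwen3-ASR-0.6B-GGUF Qwen3-8B-GGUF" ;;
        chat)
            echo "Qwen3-8B-GGUF" ;;
        transcription)
            echo "Qwen3-ASR-0.6B-GGUF" ;;
        tts|studio|voice-cloning)
            echo "Qwen3-TTS-12Hz-0.6B-Base" ;;
        settings)
            # No model required; print nothing.
            : ;;
        *)
            echo "no Getting Started model covers '$1'; see the table" >&2
            return 1 ;;
    esac
}
```

For example, `models_for_feature voice` prints all three model names on one line, while `models_for_feature diarization` fails because the diarization model is not among the Getting Started pulls.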

Next Steps

Choose a feature to learn more:

Voice Mode

Use Izwi for real-time voice conversations with speech recognition, chat, and text-to-speech.

Chat

Run local text and multimodal chat conversations through the Izwi CLI, web UI, and API.

Transcription

Convert audio to text with Izwi through the CLI, web UI, and local API.

Speaker Attributed ASR

Generate Granite Speech speaker-turn transcripts through the Transcription workspace and speech-text jobs API.

Voice Studio

Create, clone, design, preview, manage, and reuse voices from the unified Izwi Voices workspace.

Text-to-Speech

Generate natural speech from text with Izwi models, voices, streaming, and audio output formats.

Studio

Manage long-form text-to-speech projects, chapter workflows, and exports in Izwi Studio.

Settings and Onboarding

Configure appearance, updates, analytics, desktop system behavior, and first-run model setup in Izwi.

Diarization

Identify multiple speakers in audio and generate speaker-attributed transcripts with Izwi.

Voice Cloning

Create custom voices from reference audio and use them for local text-to-speech generation.

Voice Design

Design synthetic voices from text descriptions and use them in Izwi speech workflows.