Overview
Izwi chat provides:- Local inference — Model execution stays on-device
- Multiple model families — Qwen3, Qwen3.5, LFM2.5, and Gemma
- System prompts — Shape assistant behavior
- Streaming output — Incremental response tokens
- Multimodal support (Qwen3.5 only) — Image inputs in chat API requests
Getting Started
Download a Chat Model
Start Chatting
Using the CLI
| Option | Description | Default |
|---|---|---|
--model, -m | Chat model to use | qwen3-0.6b-4bit |
--system, -s | System prompt | — |
--voice, -v | Voice for spoken responses | — |
qwen3-0.6b-4bit remains the CLI default for backward compatibility.
For new setups, prefer an enabled model from izwi list, such as Qwen3-8B-GGUF or Qwen3.5-4B.
Examples:
Using the Web UI
- Open Chat in the sidebar
- Enter a prompt
- Send and review streamed output
- Switch loaded models from the model selector
Using the API
Text Chat Endpoint
Text Request Example
cURL Example
Multimodal (Image) Example
Image inputs are supported only on Qwen3.5 GGUF chat variants:stream_options.include_usage, tool-call
payloads, and strict/relaxed OpenAI compatibility profiles. See the
API Reference for the full request contract and
streaming sequence.
Supported Chat Models
| Family | Models |
|---|---|
| Qwen3 | Qwen3-0.6B-GGUF, Qwen3-1.7B-GGUF, Qwen3-4B-GGUF, Qwen3-8B-GGUF |
| Qwen3.5 | Qwen3.5-0.8B, Qwen3.5-2B, Qwen3.5-4B, Qwen3.5-9B |
| LFM2.5 | LFM2.5-1.2B-Instruct-GGUF, LFM2.5-1.2B-Thinking-GGUF |
| Gemma | Gemma-3-1b-it |
Multimodal Limits
- Multimodal media chat is currently limited to Qwen3.5 GGUF models.
- Video inputs are not yet implemented.
- Non-Qwen3.5 chat variants currently support text-only requests.
Tips
- Use
izwi listto pick a currently enabled model ID. - Use stronger models (
Qwen3-8B-GGUF,Qwen3.5-9B) for harder tasks. - Use smaller models (
Qwen3.5-0.8B,LFM2.5-1.2B-*) for low-latency usage.