Chat - Izwi

Have local conversations with chat models running on your own machine.

Overview

Izwi chat provides:

Local inference — Model execution stays on-device
Multiple model families — Qwen3, Qwen3.5, LFM2.5, and Gemma
System prompts — Shape assistant behavior
Streaming output — Incremental response tokens
Multimodal support (Qwen3.5 only) — Image inputs in chat API requests

Getting Started

Download a Chat Model

izwi pull Qwen3-8B-GGUF

Start Chatting

izwi chat --model Qwen3-8B-GGUF

To chat in the browser instead, open the web UI:

http://localhost:8080/chat

Using the CLI

Option	Description	Default
`--model`, `-m`	Chat model to use	`qwen3-0.6b-4bit`
`--system`, `-s`	System prompt	—
`--voice`, `-v`	Voice for spoken responses	—

qwen3-0.6b-4bit remains the CLI default for backward compatibility. For new setups, prefer an enabled model from izwi list, such as Qwen3-8B-GGUF or Qwen3.5-4B.

Examples:

izwi chat --system "You are a helpful coding assistant."
izwi chat --model Qwen3-8B-GGUF
izwi chat --model Qwen3.5-4B
izwi chat --model LFM2.5-1.2B-Instruct-GGUF
izwi chat --model Gemma-3-1b-it

Using the Web UI

Open Chat in the sidebar
Enter a prompt
Send and review streamed output
Switch loaded models from the model selector

Using the API

Text Chat Endpoint

POST /v1/chat/completions

Text Request Example

{
  "model": "Qwen3-8B-GGUF",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this project in three bullets."}
  ],
  "stream": true
}

cURL Example

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-8B-GGUF",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
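
The same request can be sent from Python with only the standard library. This is a minimal sketch, assuming the server listens on http://localhost:8080 as in the cURL example; `build_chat_request` and `send_chat` are illustrative helper names, not part of Izwi, and the response is returned as parsed JSON without assuming any particular field layout.

```python
import json
import urllib.request

# Assumed base URL, taken from the cURL example above.
BASE_URL = "http://localhost:8080"


def build_chat_request(model, messages, stream=False, base_url=BASE_URL):
    """Build a POST request for /v1/chat/completions (hypothetical helper)."""
    payload = {"model": model, "messages": messages}
    if stream:
        payload["stream"] = True
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(model, messages, base_url=BASE_URL):
    """Send a non-streaming request and return the decoded JSON body."""
    request = build_chat_request(model, messages, base_url=base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running and the model available, `send_chat("Qwen3-8B-GGUF", [{"role": "user", "content": "Hello!"}])` should return the completion as a Python dictionary. Building the request separately from sending it keeps the payload easy to inspect before anything goes over the network.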

Multimodal (Image) Example

Image inputs are supported only on Qwen3.5 GGUF chat variants:

{
  "model": "Qwen3.5-4B",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "input_text", "text": "What is in this image?"},
        {"type": "input_image", "image_url": {"url": "https://example.com/cat.png"}}
      ]
    }
  ]
}

The API also supports SSE streaming, stream_options.include_usage, tool-call payloads, and strict/relaxed OpenAI compatibility profiles. See the API Reference for the full request contract and streaming sequence.
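
As a rough illustration of consuming the SSE stream, the sketch below extracts text fragments from a sequence of event lines. It assumes the OpenAI-style chunk layout (a `data: {json}` line per event, text at `choices[0].delta.content`, and a final `data: [DONE]` sentinel); that layout is an assumption here, so check it against the API Reference. `iter_content_deltas` is a hypothetical helper name.

```python
import json


def iter_content_deltas(lines):
    """Yield text fragments from an iterable of SSE lines (str or bytes).

    Assumes OpenAI-style streaming chunks; this layout is not confirmed
    by this page and should be verified against the API Reference.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        # A usage-only chunk may carry an empty choices list; the loop skips it.
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

The response object returned by `urllib.request.urlopen` iterates over the body line by line as bytes, so it can be passed to this function directly when the request sets "stream": true.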

Supported Chat Models

Family	Models
Qwen3	`Qwen3-0.6B-GGUF`, `Qwen3-1.7B-GGUF`, `Qwen3-4B-GGUF`, `Qwen3-8B-GGUF`
Qwen3.5	`Qwen3.5-0.8B`, `Qwen3.5-2B`, `Qwen3.5-4B`, `Qwen3.5-9B`
LFM2.5	`LFM2.5-1.2B-Instruct-GGUF`, `LFM2.5-1.2B-Thinking-GGUF`
Gemma	`Gemma-3-1b-it`

Multimodal Limits

Multimodal media chat is currently limited to Qwen3.5 GGUF models.
Video inputs are not yet implemented.
Non-Qwen3.5 chat variants currently support text-only requests.
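
A client can apply these limits before sending a request. The sketch below is a client-side convenience only, not an Izwi API: the helper names are hypothetical, and matching on the "Qwen3.5" prefix is an assumption based on the model IDs listed on this page.

```python
def has_image_parts(messages):
    """Return True if any message carries an `input_image` content part."""
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain string content is text-only
        for part in content:
            if isinstance(part, dict) and part.get("type") == "input_image":
                return True
    return False


def check_multimodal_support(model, messages):
    """Raise ValueError when image parts are sent to a non-Qwen3.5 model.

    The prefix test is an assumption drawn from the model IDs on this page.
    """
    if has_image_parts(messages) and not model.startswith("Qwen3.5"):
        raise ValueError(
            f"{model} accepts text-only requests; use a Qwen3.5 model for images"
        )
```

Calling `check_multimodal_support(model, messages)` just before building the request surfaces the mismatch locally instead of relying on a server-side error.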

Tips

Run izwi list to see which model IDs are currently enabled.
Use larger models (Qwen3-8B-GGUF, Qwen3.5-9B) for harder tasks.
Use smaller models (Qwen3.5-0.8B, LFM2.5-1.2B-*) when low latency matters.
