Start the Izwi inference server.

Synopsis

izwi serve [OPTIONS]

Description

Launches the local HTTP API server that powers Izwi. The server also serves a local API reference:
  • Scalar API reference: http://<host>:<port>/docs
  • OpenAPI JSON: http://<host>:<port>/openapi.json
The local Scalar/OpenAPI reference covers the stable OpenAI-compatible contract, the /v1/responses preview contract, readiness probes, and sidebar entries for preview first-party, operator, and realtime route families. Detailed route behavior is documented in the API Reference. Resolution order for serve/runtime settings is:
  1. CLI flags
  2. Environment variables
  3. config.toml
  4. Built-in defaults
The resolved settings are passed through to the spawned izwi-server process, so izwi serve and izwi-server now run with the same runtime contract.

Options

OptionDescriptionBuilt-in default
--mode <MODE>Startup mode: server, desktop, webserver
-H, --host <HOST>Host to bind to0.0.0.0
-p, --port <PORT>Port to listen on8080
-m, --models-dir <PATH>Models directoryPlatform default
--max-batch-size <N>Maximum batch size8
--backend <BACKEND>Backend preference: auto, cpu, metal, cudaauto
-t, --threads <N>Number of CPU threadsAuto (available_parallelism, capped at 8)
--max-concurrent <N>Max concurrent requests100
--timeout <SECONDS>Request timeout300
--log-level <LEVEL>Log levelwarn
--log-format <FORMAT>Log output format: text or jsontext
--corsEnable wildcard CORS responsesDisabled
--no-uiDisable static web UI servingUI enabled

Modes

Server Mode

Starts only the HTTP server:
izwi serve
izwi serve --mode server

Desktop Mode

Starts the server and opens the native desktop application:
izwi serve --mode desktop

Web Mode

Starts the server and opens the browser.
izwi serve --mode web
When --no-ui is set, web mode opens http://<host>:<port>/v1/ready instead of the UI root.

Examples

Basic server

izwi serve

Custom runtime settings

izwi serve \
  --host 127.0.0.1 \
  --port 9000 \
  --backend metal \
  --max-batch-size 16 \
  --threads 6

Development browser access

izwi serve --cors --log-level debug

API-only mode

izwi serve --mode web --no-ui

Environment Variables

VariableEquivalent Option
IZWI_HOST--host
IZWI_PORT--port
IZWI_MODELS_DIR--models-dir
IZWI_BACKEND--backend
IZWI_MAX_BATCH_SIZE--max-batch-size
IZWI_NUM_THREADS--threads
IZWI_MAX_CONCURRENT--max-concurrent
IZWI_TIMEOUT--timeout
IZWI_CORS--cors
IZWI_CORS_ORIGINSConfig-only CORS origin list
IZWI_NO_UI--no-ui
IZWI_UI_DIRConfig-only UI build directory
RUST_LOG--log-level
IZWI_LOG_FORMAT--log-format
IZWI_SERVE_MODE--mode
Legacy aliases still resolve for one release cycle:
  • MAX_CONCURRENT_REQUESTS
  • REQUEST_TIMEOUT_SECS

Configuration File

izwi serve also reads config.toml. Supported runtime keys:
[server]
host = "0.0.0.0"
port = 8080
cors = false
cors_origins = ["http://localhost:3000"]

[models]
dir = "/path/to/models"

[runtime]
backend = "auto"
max_batch_size = 8
threads = 8
max_concurrent = 100
timeout = 300

[ui]
enabled = true
dir = "ui/dist"

Logging

Text logs are the default. Use JSON logs when a collector needs machine-readable application events:
izwi serve --log-format json
--log-format json controls Izwi’s application log lines. Docker’s json-file logging driver only frames container stdout/stderr and is separate from the application log format. JSON logs include the normal tracing fields such as timestamp, level, target, and message. HTTP request events also include service, version, correlation ID, method, path, status, latency in milliseconds, and error fields when a request fails.

Graceful Shutdown

Press Ctrl+C to gracefully shut down the server. Active requests finish before shutdown. During startup, izwi serve waits for the readiness endpoint before opening desktop or web clients. Use /readyz for deployment readiness checks and /livez for cheap liveness checks; /v1/health remains the richer status payload used by izwi status.

See Also