This page is the public support contract for Izwi’s current runtime surfaces. It answers four questions:
  1. Which OS and hardware combinations are supported?
  2. Which shipped artifact types expose which backends?
  3. Which deployment targets are considered supported?
  4. Which API surfaces are stable vs preview?
If another page says something different, this page takes precedence.

Backend Matrix

| Surface | OS / Hardware | Backend status | Support level | Notes |
|---|---|---|---|---|
| Desktop app from GitHub Releases | macOS on Apple Silicon | metal | Stable | Desktop and terminal binaries bundled in the macOS release can use Metal acceleration. |
| Desktop app from GitHub Releases | Linux x86_64 | cpu | Stable | Native Linux installers are CPU-only and do not bundle CUDA runtime libraries. |
| Desktop app from GitHub Releases | Windows x86_64 | cpu | Stable | Native Windows installers are CPU-only and do not bundle CUDA runtime DLLs. |
| Terminal bundle from GitHub Releases | Linux x86_64 | cpu | Stable | Linux terminal tarballs contain the public CPU-only CLI, server, and desktop shell binaries. |
| Terminal bundle from GitHub Releases | macOS Apple Silicon | metal | Stable | Metal is compiled into the macOS build path. |
| Terminal bundle from GitHub Releases | Windows x86_64 | cpu | Stable | Windows terminal zips contain the public CPU-only CLI, server, and desktop shell binaries. |
| Source build | macOS Apple Silicon with --features metal | metal | Stable | Recommended GPU path on macOS. |
| Source build | Linux x86_64 with --features cuda and CUDA toolkit installed | cuda | Supported | Useful for development, custom builds, and debugging outside Docker. Requires a compatible NVIDIA driver/toolkit environment. |
| Source build | Windows with --features cuda and CUDA toolkit installed | cuda | Preview | Useful for development and custom validation. Native Windows release artifacts remain CPU-only. |
| Docker production target | Linux x86_64 | cpu | Stable | CPU-only container image. |
| Docker production-cuda target / docker compose --profile cuda | Linux x86_64 + NVIDIA GPU | cuda | Preview | Shipped CUDA binary path. The final image is based on nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04. When building on a machine without nvidia-smi, set CUDA_COMPUTE_CAP for the target GPU architecture. |
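
For tooling that needs to reason about the matrix, the rows above can be encoded as a small lookup table. The following is only an illustrative sketch: the surface and platform labels ("release", "source-cuda", "linux", and so on) are hypothetical names chosen for this example and are not identifiers used by Izwi. Desktop and terminal release artifacts are merged into one "release" key because their rows agree, and CPU architecture is omitted for brevity.

```python
from typing import NamedTuple, Optional


class Support(NamedTuple):
    backend: str  # "cpu", "metal", or "cuda"
    level: str    # "Stable", "Supported", or "Preview"


# Keys are (surface, platform) labels invented for this sketch.
# "macos" means Apple Silicon; "linux" and "windows" mean x86_64 hosts.
BACKEND_MATRIX = {
    ("release", "macos"): Support("metal", "Stable"),
    ("release", "linux"): Support("cpu", "Stable"),
    ("release", "windows"): Support("cpu", "Stable"),
    ("source-metal", "macos"): Support("metal", "Stable"),
    ("source-cuda", "linux"): Support("cuda", "Supported"),
    ("source-cuda", "windows"): Support("cuda", "Preview"),
    ("docker", "linux"): Support("cpu", "Stable"),
    ("docker-cuda", "linux"): Support("cuda", "Preview"),
}


def lookup(surface: str, platform: str) -> Optional[Support]:
    """Return the documented backend and support level, or None when the
    combination does not appear in the matrix."""
    return BACKEND_MATRIX.get((surface, platform))


if __name__ == "__main__":
    print(lookup("source-cuda", "linux"))
    print(lookup("release", "freebsd"))
```

A None result means only that the combination is not listed on this page; it says nothing about whether a build might happen to work.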

Deployment Matrix

| Deployment target | Status | Notes |
|---|---|---|
| Single-user macOS desktop evaluation | Stable | Best-supported path for local evaluation. |
| Single-host Linux server on CPU | Stable | Supported via GitHub Release packages, source builds, and the Docker CPU image. |
| Single-host Linux server on NVIDIA GPU | Supported / Preview by artifact | Use the Docker CUDA image/profile, or build from source with --features cuda. Native Linux release artifacts are CPU-only. |
| Windows desktop evaluation | Stable CPU | Native Windows release artifacts are CPU-only. CUDA on Windows is source-build preview only. |
| Docker Compose on CPU | Stable | Use the default izwi service. |
| Docker Compose on NVIDIA GPU | Preview | Use docker compose --profile cuda up; the profile runs the izwi-cuda service and may require CUDA_COMPUTE_CAP when built on a non-GPU machine. |
| Kubernetes / Helm / multi-node production orchestration | Not yet supported | Not published in OSS today. |

API Surface Maturity

The runtime exposes both compatibility APIs and first-party local workflow APIs under /v1. When the server is running, open /docs for the local Scalar API reference or /openapi.json for the raw OpenAPI document. The generated OpenAPI document covers the stable OpenAI-compatible contract, /v1/responses preview routes, readiness probes, and Scalar sidebar entries for preview first-party, operator, and realtime route families. Detailed preview behavior is documented in the API Reference.
| Surface | Status | Notes |
|---|---|---|
| POST /v1/audio/speech | Stable | Core OpenAI-compatible TTS surface. Native OSS output formats are WAV and raw PCM; recognized compressed names require explicit WAV fallback opt-in until bundled encoders are added. |
| POST /v1/audio/transcriptions | Stable | Core OpenAI-compatible transcription surface. |
| POST /v1/audio/align | Stable | Izwi extension for word-level forced alignment of reference text to audio. |
| POST /v1/chat/completions | Stable | Core OpenAI-compatible chat surface. |
| GET /v1/models | Stable | Live model catalog / availability surface. |
| Operational probes (/livez, /readyz, /v1/live, /v1/ready) | Stable | Use /livez for cheap liveness and /readyz for readiness or deployment healthchecks. /v1/health remains the richer status payload. |
| Local OpenAPI reference (/docs, /openapi.json) | Stable | Served by the same izwi-server process for the OpenAI-compatible contract, probes, and Scalar navigation for preview route families. |
| Markdown API reference (/docs/api on the website, docs/user/api.md in the repo) | Stable | Provides detailed behavior for the broader preview first-party, operator, and realtime route surface. |
| Local CLI workflows (izwi serve, izwi pull, izwi tts, izwi transcribe) | Stable | Primary user-facing local runtime workflows. |
| POST /v1/responses and response-object lifecycle routes | Preview | Response objects are stored in bounded process memory for compatibility convenience. store:false skips retention; retained records can be evicted and are lost on server restart. |
| /v1/admin/models* model-management APIs | Preview | Operator-oriented local model lifecycle and capability APIs; auth and long-term contract are not finalized. |
| Persisted speech and voice workflow APIs (/v1/speech-to-text/jobs*, /v1/diarizations*, /v1/text-to-speech*, /v1/voice-designs*, /v1/voice-clones*, /v1/voices*, /v1/studio/*) | Preview | Powerful local product APIs, but the public compatibility/support contract is still evolving. Both speech-to-text diarization jobs and direct diarization records are supported first-party surfaces. |
| Local chat, agent, and voice state APIs (/v1/chat/threads*, /v1/agent/sessions*, /v1/voice/profile, /v1/voice/observations, /v1/voice/sessions*) | Preview | Agent session metadata is process-local and bounded today. Linked chat threads, voice sessions, voice turns, and voice observations are the durable SQLite-backed local stores. Voice sessions now include REST create/update/end/delete/turn-list/export controls for external apps. |
| Local media lifecycle (/v1/media*) | Preview | OSS local media can be listed, uploaded from base64 payloads, downloaded by catch-all relative path, and deleted. Provider-backed object storage can wrap the same route family; listing may be unavailable unless the provider exposes a local media root. |
| Realtime WebSocket APIs (/v1/speech-to-text/realtime/ws, /v1/voice/realtime/ws) | Preview | Low-latency browser-facing protocols for streaming transcription and voice AI conversations. |
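
As an illustration of how the stable operational probes might be used from a deployment healthcheck, here is a minimal sketch that issues GET requests to /livez and /readyz. The base URL (host and port) is an assumption made for this example; this page does not state the server's listen address, so substitute your own. The helper names probe and check_server are hypothetical.

```python
import urllib.request

# Assumption: adjust to wherever your izwi-server instance is listening.
BASE_URL = "http://localhost:8080"


def probe(path: str, base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if GET base_url + path answers with a 2xx status.

    Connection failures, timeouts, and HTTP error statuses all count as
    "not healthy" and yield False.
    """
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # urllib.error.URLError and HTTPError are OSError subclasses,
        # as are socket timeouts and refused connections.
        return False


def check_server(base_url: str = BASE_URL) -> dict:
    """Probe liveness and readiness and report each result by path."""
    return {path: probe(path, base_url) for path in ("/livez", "/readyz")}


if __name__ == "__main__":
    print(check_server())
```

The sketch deliberately ignores response bodies. For the richer status payload, query /v1/health instead; its schema is described in the API Reference rather than assumed here.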

CUDA Caveats

  • Linux and Windows GitHub Releases keep public binary names unchanged: izwi and izwi-server on Linux, izwi.exe and izwi-server.exe on Windows.
  • Linux and Windows GitHub Release artifacts are CPU-only and must not contain CUDA runtime libraries or private CUDA binaries.
  • Release installers do not replace the host NVIDIA driver. CUDA acceleration requires a compatible NVIDIA driver and CUDA-capable GPU.
  • Source builds still require the CUDA toolkit and remain useful for development or fallback validation.
  • The Docker CUDA image/profile is the CUDA distribution path for NVIDIA Linux hosts and may require CUDA_COMPUTE_CAP when built on a machine without nvidia-smi.
  • On macOS, the recommended GPU path is Metal, not CUDA.
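
When the CUDA image is built on a machine without a GPU, CUDA_COMPUTE_CAP has to be supplied by hand. The sketch below shows one way to read the value from a machine that does have the target GPU. It rests on two assumptions that this page does not confirm: that the installed nvidia-smi supports the compute_cap query field (older drivers may not), and that the build expects the dotless form, for example 86 for capability 8.6, as Candle-based builds commonly do. The function names are hypothetical.

```python
import subprocess
from typing import Optional


def normalize_compute_cap(value: str) -> str:
    """Turn a capability such as "8.6" into the dotless form "86".

    A value that is already dotless (for example "75") is returned as-is.
    """
    text = value.strip()
    major, dot, minor = text.partition(".")
    if not major.isdigit() or (dot and not minor.isdigit()):
        raise ValueError(f"unrecognized compute capability: {value!r}")
    return major + minor


def detect_compute_cap() -> Optional[str]:
    """Ask nvidia-smi for the first GPU's compute capability.

    Returns None when nvidia-smi is missing or the query fails, which is
    the situation where the value must be set manually.
    """
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    lines = out.strip().splitlines()
    return normalize_compute_cap(lines[0]) if lines else None


if __name__ == "__main__":
    cap = detect_compute_cap()
    print(cap if cap else "nvidia-smi unavailable; set CUDA_COMPUTE_CAP manually")
```

Run this on the GPU host, then pass the printed value as CUDA_COMPUTE_CAP to the build on the non-GPU machine.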

Verification Guidance

Use the following expectations when validating a host:
  • macOS Apple Silicon: build or install a Metal-capable binary and run with --backend metal or IZWI_BACKEND=metal.
  • Linux/Windows GitHub Release: run izwi serve --backend cpu, then izwi status --detailed.
  • Docker CUDA on NVIDIA Linux hosts: run docker compose --profile cuda up, then confirm the container selects CUDA through /v1/health or izwi status --detailed from a matching client environment.
  • Linux/Windows source build for CUDA: build with cargo build --release --features cuda, then run with --backend cuda or IZWI_BACKEND=cuda. Whisper CUDA experiments can additionally enable Candle-backed features such as flash-attn or cudnn when the matching NVIDIA libraries are installed.
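
The two backend selection mechanisms mentioned above, the --backend flag and the IZWI_BACKEND environment variable, can be scripted as follows. This is a sketch only: serve_invocation is a hypothetical helper, and because this page does not say which mechanism wins when both are set, the helper clears IZWI_BACKEND whenever the flag is used so that exactly one is active.

```python
import os
from typing import Dict, List, Tuple

# Backend names as they appear in the matrix on this page.
BACKENDS = ("cpu", "metal", "cuda")


def serve_invocation(
    backend: str, via_env: bool = False
) -> Tuple[List[str], Dict[str, str]]:
    """Build the argv and environment for starting `izwi serve`.

    With via_env=True the backend is selected through IZWI_BACKEND;
    otherwise it is selected through the --backend flag.
    """
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    env = dict(os.environ)
    if via_env:
        env["IZWI_BACKEND"] = backend
        return ["izwi", "serve"], env
    # Precedence between flag and variable is not documented here,
    # so drop the variable to keep the selection unambiguous.
    env.pop("IZWI_BACKEND", None)
    return ["izwi", "serve", "--backend", backend], env


if __name__ == "__main__":
    argv, env = serve_invocation("cpu")
    print(" ".join(argv))
```

The result can be handed to subprocess.Popen(argv, env=env). Once the server is up, running izwi status --detailed as described above shows which backend was actually selected.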

See Also