Synopsis
Description
Analyzes audio to identify different speakers and when they spoke. Optionally includes transcription with speaker labels.
Arguments
| Argument | Description |
|---|---|
| <FILE> | Audio file to analyze |
Options
| Option | Description | Default |
|---|---|---|
| -m, --model <MODEL> | Diarization model | sortformer-4spk |
| -n, --num-speakers <N> | Expected number of speakers | Auto-detect |
| -f, --format <FORMAT> | Output format: text, json, verbose_json | text |
| -o, --output <PATH> | Output file (default: stdout) | — |
| --transcribe | Compatibility flag (transcript output is included by default) | — |
| --asr-model <MODEL> | ASR model used for transcript generation | parakeet-tdt-0.6b-v3 |
Examples
Basic diarization
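A minimal sketch, assuming the subcommand is invoked as `izwi diarize` and that `meeting.wav` is a placeholder input file. The `izwi` shell function is a dry-run stub for illustration, not part of the tool:

```shell
# Dry-run stub: prints the command instead of running it.
# Delete this line to call the real izwi binary.
izwi() { echo "izwi $*"; }

# Assumption: the subcommand is named `diarize`; meeting.wav is hypothetical.
# With no options, the documented defaults apply (sortformer-4spk, text output).
izwi diarize meeting.wav
```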
With known speaker count
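When the number of speakers is known in advance, `-n, --num-speakers` replaces auto-detection. A sketch under the assumption that the subcommand is `izwi diarize`, with a placeholder file name and a dry-run stub in place of the real binary:

```shell
# Dry-run stub: prints the command instead of running it.
# Delete this line to call the real izwi binary.
izwi() { echo "izwi $*"; }

# Assumption: subcommand name `diarize`; interview.wav is hypothetical.
# Tell the tool to expect two speakers rather than auto-detecting.
izwi diarize interview.wav --num-speakers 2
```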
Transcript output (default behavior)
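According to the options table, `--transcribe` is a compatibility flag and transcript output is included either way, so the invocation below should behave like the plain one. This is a sketch that assumes the subcommand is `izwi diarize`; the file name is a placeholder and the `izwi` function is a dry-run stub:

```shell
# Dry-run stub: prints the command instead of running it.
# Delete this line to call the real izwi binary.
izwi() { echo "izwi $*"; }

# Assumption: subcommand name `diarize`; meeting.wav is hypothetical.
# --transcribe is accepted for compatibility; it is not required.
izwi diarize meeting.wav --transcribe
```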
JSON output
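The `-f, --format` and `-o, --output` options combine to write machine-readable results to a file. A sketch, assuming the subcommand is `izwi diarize`; both file names are placeholders and the `izwi` function is a dry-run stub:

```shell
# Dry-run stub: prints the command instead of running it.
# Delete this line to call the real izwi binary.
izwi() { echo "izwi $*"; }

# Assumption: subcommand name `diarize`; file names are hypothetical.
# Write JSON to meeting.json instead of stdout.
izwi diarize meeting.wav --format json --output meeting.json
```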
Full pipeline with custom models
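A sketch that sets every model-related option explicitly, using only the model IDs listed on this page. It assumes the subcommand is `izwi diarize`; the file names are placeholders and the `izwi` function is a dry-run stub:

```shell
# Dry-run stub: prints the command instead of running it.
# Delete this line to call the real izwi binary.
izwi() { echo "izwi $*"; }

# Assumption: subcommand name `diarize`; file names are hypothetical.
# Canonical diarization model ID, explicit ASR model, verbose JSON to a file.
izwi diarize panel.wav \
  --model diar_streaming_sortformer_4spk-v2.1 \
  --asr-model parakeet-tdt-0.6b-v3 \
  --format verbose_json \
  --output panel.json
```

The values passed to `--model` and `--asr-model` here match the documented defaults, so the command mainly shows where custom model IDs would go.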
Output Formats
Text
JSON
Verbose JSON (with transcription)
Available Models
| Model | Description |
|---|---|
| sortformer-4spk | Alias for diar_streaming_sortformer_4spk-v2.1 (default) |
| diar_streaming_sortformer_4spk-v2.1 | Canonical Sortformer model ID |
See Also
- Diarization Guide
- izwi transcribe: Single-speaker transcription