Synopsis
Description
Aligns reference text to audio, producing word-level timestamps. Useful for:
- Subtitle generation
- Karaoke timing
- Audio editing
- Pronunciation analysis
Arguments
| Argument | Description |
|---|---|
| `<FILE>` | Audio file to align |
| `<TEXT>` | Reference text to align |
Options
| Option | Description | Default |
|---|---|---|
| `-m, --model <MODEL>` | Alignment model | `qwen3-forcedaligner-0.6b` |
| `-f, --format <FORMAT>` | Output format: `text`, `json`, or `verbose_json` | `json` |
| `-o, --output <PATH>` | Output file | stdout |
Examples
Basic alignment
Save to file
Text output
Output Formats
JSON (default)
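The JSON layout is not reproduced on this page. The Python sketch below shows how the output could be consumed downstream. It assumes a `words` array whose entries carry `word`, `start`, and `end` in seconds; that schema is hypothetical. The helpers `load_words`, `to_srt`, and `pauses` are illustrative and are not part of izwi. The sketch groups words into SRT cues for subtitle generation and reports silences between words for pronunciation analysis. Adjust the keys to match a real output file.

```python
import json

# HYPOTHETICAL schema: {"words": [{"word", "start", "end"}]}, times in seconds.
# Check a real output file and adjust the key names before relying on this.
SAMPLE = """{"words": [
  {"word": "Hello", "start": 0.00, "end": 0.42},
  {"word": "world", "start": 0.50, "end": 0.95},
  {"word": "again", "start": 1.60, "end": 2.10}
]}"""


def load_words(text):
    """Return a list of (word, start, end) tuples from alignment JSON."""
    data = json.loads(text)
    return [(w["word"], float(w["start"]), float(w["end"])) for w in data["words"]]


def srt_time(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)  # round() absorbs float error such as 949.999...
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(words, max_words=7):
    """Group consecutive words into numbered SRT cues of at most max_words each."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(w for w, _, _ in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    # Each cue ends in "\n"; joining with "\n" leaves a blank line between cues.
    return "\n".join(cues)


def pauses(words, min_gap=0.3):
    """Yield (previous word, next word, gap) wherever silence is at least min_gap seconds."""
    for (w1, _, e1), (w2, s2, _) in zip(words, words[1:]):
        gap = s2 - e1
        if gap >= min_gap:
            yield w1, w2, round(gap, 3)


if __name__ == "__main__":
    words = load_words(SAMPLE)
    print(to_srt(words, max_words=2))
    print(list(pauses(words)))
```

Run on the sample, this prints two SRT cues followed by `[('world', 'again', 0.65)]`. Grouping by a fixed word count is the simplest cueing rule. Subtitle tools often also cap cue duration and line length.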
Text
Use Cases
Subtitle Generation
Generate precise timestamps for subtitles.
Audio Editing
Find exact word boundaries for editing.
Pronunciation Analysis
Analyze the timing of spoken words.
Available Models
| Model | Description |
|---|---|
| `qwen3-forcedaligner-0.6b` | CLI default alias |
| `Qwen3-ForcedAligner-0.6B` | Canonical forced aligner model ID |
| `Qwen3-ForcedAligner-0.6B-4bit` | Lower-memory 4-bit variant |
See Also
- `izwi transcribe`: Speech-to-text
- `izwi diarize`: Speaker diarization