When to Use SAA
| Use SAA when… | Use diarization when… |
|---|---|
| You want a readable `[Speaker N]:`-style transcript. | You need speaker segments with start/end times. |
| Granite Speech’s language model can infer speaker turns from the audio. | You need acoustic speaker separation from the Sortformer pipeline. |
| You do not need streaming or timestamp alignment. | You need reruns, alignment metrics, and diarization-specific quality controls. |
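You may want to consume SAA output programmatically. The speaker-turn text can then be split into individual turns. The sketch below makes two assumptions that this page does not document: each turn starts on its own line with a `[Speaker N]:` prefix, and unprefixed lines continue the previous turn. `parse_speaker_turns` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# Assumed turn prefix: "[Speaker N]:" at the start of a line.
TURN_RE = re.compile(r"^\[Speaker (\d+)\]:\s*(.*)$")


def parse_speaker_turns(transcript: str) -> List[Tuple[int, str]]:
    """Split a [Speaker N]:-style transcript into (speaker, text) turns."""
    turns: List[Tuple[int, str]] = []
    for raw in transcript.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = TURN_RE.match(line)
        if match:
            turns.append((int(match.group(1)), match.group(2)))
        elif turns:
            # Continuation line: append it to the previous speaker's turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {line}".strip())
        # Text before the first speaker prefix is ignored.
    return turns
```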
Model Requirement
SAA currently requires Granite-Speech-4.1-2B-Plus. If you omit `model_id`, the
persisted SAA workflow defaults to Granite-Speech-4.1-2B-Plus. Supplying a
different model returns a validation error.
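A client can mirror this rule before submitting a job. The sketch below is hypothetical: `resolve_saa_model` is an illustrative name, and the code assumes the server compares model IDs as exact, case-sensitive strings.

```python
from typing import Optional

SAA_MODEL_ID = "Granite-Speech-4.1-2B-Plus"


def resolve_saa_model(model_id: Optional[str] = None) -> str:
    """Mirror the documented SAA model rule on the client side."""
    if model_id is None:
        # Omitted: the persisted SAA workflow falls back to its default.
        return SAA_MODEL_ID
    if model_id != SAA_MODEL_ID:
        # Assumed exact match; the server would return a validation error.
        raise ValueError(f"SAA requires {SAA_MODEL_ID}, got {model_id!r}")
    return model_id
```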
Using the Web UI
- Open Transcription in the sidebar.
- Choose Speaker Attributed ASR from the mode switch.
- Upload or record audio.
- Select a ready Granite Speech model.
- Choose a speaker expectation: Auto, 2+, 3+, or 4+.
- Optionally enable summary generation.
- Submit the job and review the speaker-turn transcript.
Using the API
Create a persisted SAA job with `job_kind=speaker_attributed_asr`. The alias
`job_kind=saa` is also accepted.
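Below is a sketch of a JSON create body. It assumes that `job_kind` travels in the same JSON object as the request fields listed under Request Fields, and that `audio_base64` holds plain base64 rather than a data URL. This section does not show the create endpoint path, so the hypothetical helper `build_saa_create_body` only builds the body. `model_id` is left out so that the server applies the SAA default.

```python
import base64
import json
from typing import Any, Dict, Optional


def build_saa_create_body(
    audio: bytes,
    language: Optional[str] = None,
    generate_summary: bool = False,
    min_speakers: Optional[int] = None,
) -> str:
    """Build a JSON body for a persisted SAA job (hypothetical helper)."""
    body: Dict[str, Any] = {
        "job_kind": "speaker_attributed_asr",  # "saa" is an accepted alias
        "audio_base64": base64.b64encode(audio).decode("ascii"),
        "generate_summary": generate_summary,
    }
    # Optional fields are sent only when set; model_id is omitted on purpose.
    if language is not None:
        body["language"] = language
    if min_speakers is not None:
        body["min_speakers"] = min_speakers
    return json.dumps(body)
```

Omitting unset optional fields, rather than sending `null`, avoids depending on how the server treats explicit nulls.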
Poll the returned record until `processing_status` is `ready`.
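Here is a minimal polling sketch. The endpoint is not shown here, so the caller supplies the fetch step through `fetch_record`, a hypothetical callable (for example, a GET on the returned record). This section documents only the `ready` status, so the loop keeps polling until it sees `ready` or hits the timeout. A real client should also stop on whatever failure status the API reports.

```python
import time
from typing import Any, Callable, Dict


def wait_until_ready(
    fetch_record: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
) -> Dict[str, Any]:
    """Poll a job record until processing_status == "ready" or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        record = fetch_record()
        if record.get("processing_status") == "ready":
            return record
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job not ready after {timeout_s}s "
                f"(last status: {record.get('processing_status')!r})"
            )
        time.sleep(interval_s)  # wait before the next fetch
```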
Request Fields
JSON and multipart create requests accept:

| Field | Description |
|---|---|
| `file` / `audio_base64` | Source audio upload. |
| `model_id` / `model` | Optional model override. Must be Granite-Speech-4.1-2B-Plus when present. |
| `language` | Optional language hint, such as English. |
| `generate_summary` | Generate an AI summary after the transcript completes. Defaults to `false`. |
| `min_speakers`, `max_speakers` | Optional speaker expectation bounds. `min_speakers` is what the current UI sends. |
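The UI's speaker expectation (Auto, 2+, 3+, or 4+) can be translated into these fields. This sketch assumes that "N+" corresponds to `min_speakers=N` and that Auto sends no bound. `speaker_fields` is a hypothetical helper.

```python
from typing import Dict


def speaker_fields(expectation: str) -> Dict[str, int]:
    """Map a UI speaker expectation to request fields (assumed mapping)."""
    if expectation == "Auto":
        return {}  # no bound: let the model decide
    count = expectation[:-1]
    if expectation.endswith("+") and count.isdigit():
        return {"min_speakers": int(count)}  # "N+" -> at least N speakers
    raise ValueError(f"unknown speaker expectation: {expectation!r}")
```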
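SAA does not support streaming or timestamp alignment, so the following fields do not apply to SAA jobs:

- `stream`
- `include_timestamps`
- `word_timestamps`
- `aligner_model_id`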
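The fields `stream`, `include_timestamps`, `word_timestamps`, and `aligner_model_id` relate to streaming and timestamp alignment, which SAA does not offer. The minimal sketch below flags these fields before submission. It assumes they should not be sent on SAA jobs at all, and it flags a field whenever it is present, even if its value is false. `unsupported_saa_fields` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Fields assumed not to apply to SAA jobs (streaming / timestamp alignment).
SAA_UNSUPPORTED_FIELDS = (
    "stream",
    "include_timestamps",
    "word_timestamps",
    "aligner_model_id",
)


def unsupported_saa_fields(payload: Dict[str, Any]) -> List[str]:
    """Return the SAA-inapplicable fields present in a payload, in a fixed order."""
    return [name for name in SAA_UNSUPPORTED_FIELDS if name in payload]
```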