quickstart addons

5shekel
2026-01-18 13:31:11 +02:00
parent 9641621ff7
commit 7e965efd58


@@ -23,9 +23,9 @@ A diarization viewer for Whisper transcription output
## Quick Start
1. Place your audio file in `input/`
2. Place your Whisper transcript JSON in `outputs/float32/`
3. Generate the waveform data (see below)
1. Place your audio file in `input/` - an example is pre-configured
2. Place your Whisper transcript JSON in `outputs/float32/` - an example is pre-configured
3. Generate the waveform data (see below) - an example is pre-configured
4. Start a local server and open in browser:
```bash
npx serve -p 5000
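# alternative (assumption, not part of the original steps): any static file
# server works, e.g. Python's built-in one on the same port
python3 -m http.server 5000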
@@ -99,11 +99,12 @@ amuta-meetings/
## Transcription with WhisperX (GPU or CPU)
For transcribing audio with speaker diarization, we used [WhisperX](https://github.com/m-bain/whisperX) on a rented GPU service (verda) or book one of tami's P40s, or do cpp.
For transcribing audio with speaker diarization,
we used [WhisperX](https://github.com/m-bain/whisperX) on a rented GPU service (verda); alternatively, book one of tami's P40s, or set `--device` to `cpu` on non-CUDA machines.
### WhisperX CLI Command
The code below saves the output as JSON and converts it to SRT for quick anima runs.
It was adapted from https://notes.nicolasdeville.com/python/library-whisperx/
to add diarization (see below for the HuggingFace setup).
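For orientation, here is a hedged sketch of such an invocation; the audio path, output directory, and `HF_TOKEN` variable are placeholders rather than the exact command used in this repo, while the flags themselves are standard WhisperX CLI options:
```bash
# sketch only - paths and HF_TOKEN are placeholders
whisperx input/meeting.mp3 \
  --model large-v3 \
  --device cuda \
  --language en \
  --diarize \
  --hf_token "$HF_TOKEN" \
  --output_format all \
  --output_dir outputs/float32/
```
`--output_format all` writes JSON, SRT, and the other supported formats in one pass, which covers the save-as-JSON-plus-SRT workflow mentioned above.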
@@ -111,6 +112,7 @@ we adapted to add diarization (see below for huginface hug)
| Argument | Description |
|----------|-------------|
| `--device` | Device to use for inference (`cpu` or `cuda`) |
| `--model` | Whisper model: `large-v3` (best quality), `turbo` (fastest), `large-v2` |
| `--language` | Source language code (e.g., `en` for English; ISO 639-1 codes) |
| `--diarize` | Enable speaker diarization (requires HuggingFace token) |
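
A hedged CPU-only variant of the same sketch (placeholder paths again): on machines without CUDA, `--device cpu` is commonly paired with `--compute_type int8`, and the `turbo` model from the table above keeps runtimes tolerable:
```bash
# sketch only - placeholder paths; --diarize still runs on CPU but is typically much slower
whisperx input/meeting.mp3 \
  --model turbo \
  --device cpu \
  --compute_type int8 \
  --language en \
  --output_format all \
  --output_dir outputs/
```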