quickstart addons

5shekel
2026-01-18 13:31:11 +02:00
parent 9641621ff7
commit 7e965efd58


@@ -23,9 +23,9 @@ A diarization viewer for Whisper transcription output
## Quick Start
1. Place your audio file in `input/`
2. Place your Whisper transcript JSON in `outputs/float32/`
3. Generate the waveform data (see below)
1. Place your audio file in `input/` - an example is pre-configured
2. Place your Whisper transcript JSON in `outputs/float32/` - an example is pre-configured
3. Generate the waveform data (see below) - an example is pre-configured
4. Start a local server and open in browser:
```bash
npx serve -p 5000
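# alternative (assumption, not part of the original steps): any static file
# server works, e.g. Python's built-in one on the same port
python3 -m http.server 5000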
@@ -99,11 +99,12 @@ amuta-meetings/
## Transcription with WhisperX (GPU or CPU)
For transcribing audio with speaker diarization, we used [WhisperX](https://github.com/m-bain/whisperX) on a rented GPU service (verda) or book one of tami's P40s, or do cpp.
For transcribing audio with speaker diarization,
we used [WhisperX](https://github.com/m-bain/whisperX) on a rented GPU service (verda); alternatively, book one of tami's P40s, or set `--device` to `cpu` on non-CUDA machines.
### WhisperX CLI Command
The code below saves the output as JSON and converts it to SRT for quick anima runs.
It was adapted from https://notes.nicolasdeville.com/python/library-whisperx/
to add diarization (see below for the HuggingFace setup).
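For orientation, here is a hedged sketch of such an invocation; the audio path, output directory, and `HF_TOKEN` variable are placeholders rather than the exact command used in this repo, while the flags themselves are standard WhisperX CLI options:
```bash
# sketch only - paths and HF_TOKEN are placeholders
whisperx input/meeting.mp3 \
  --model large-v3 \
  --device cuda \
  --language en \
  --diarize \
  --hf_token "$HF_TOKEN" \
  --output_format all \
  --output_dir outputs/float32/
```
`--output_format all` writes JSON, SRT, and the other supported formats in one pass, which covers the save-as-JSON-plus-SRT workflow mentioned above.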
@@ -111,6 +112,7 @@ we adapted to add diarization (see below for huginface hug)
| Argument | Description |
|----------|-------------|
| `--device` | Device to use for inference (`cpu` or `cuda`) |
| `--model` | Whisper model: `large-v3` (best quality), `turbo` (fastest), `large-v2` |
| `--language` | Source language code (e.g., `en` for English; ISO 639-1 codes) |
| `--diarize` | Enable speaker diarization (requires HuggingFace token) |
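
A hedged CPU-only variant of the same sketch (placeholder paths again): on machines without CUDA, `--device cpu` is commonly paired with `--compute_type int8`, and the `turbo` model from the table above keeps runtimes tolerable:
```bash
# sketch only - placeholder paths; --diarize still runs on CPU but is typically much slower
whisperx input/meeting.mp3 \
  --model turbo \
  --device cpu \
  --compute_type int8 \
  --language en \
  --output_format all \
  --output_dir outputs/
```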