diff --git a/README.md b/README.md
index b0589f4..9e0254e 100644
--- a/README.md
+++ b/README.md
@@ -23,9 +23,9 @@ A diarization viewer for Whisper transcription output
 ## Quick Start
 
-1. Place your audio file in `input/`
-2. Place your Whisper transcript JSON in `outputs/float32/`
-3. Generate the waveform data (see below)
+1. Place your audio file in `input/` - an example is pre-configured
+2. Place your Whisper transcript JSON in `outputs/float32/` - an example is pre-configured
+3. Generate the waveform data (see below) - an example is pre-configured
 4. Start a local server and open in browser:
    ```bash
    npx serve -p 5000
    ```
@@ -99,11 +99,12 @@ amuta-meetings/
 
 ## Transcription with WhisperX (GPU or CPU)
 
-For transcribing audio with speaker diarization, we used [WhisperX](https://github.com/m-bain/whisperX) on a rented GPU service (verda) or book one of tami's P40s, or do cpp.
+For transcribing audio with speaker diarization, we used [WhisperX](https://github.com/m-bain/whisperX) on a
+rented GPU service (verda); you can also book one of tami's P40s, or set `--device` to `cpu` on non-CUDA machines.
 
 ### WhisperX CLI Command
 
-the code to save as json and convert to srt for quick anima runs
+The code to save as JSON and convert to SRT for quick anima runs is from https://notes.nicolasdeville.com/python/library-whisperx/
 we adapted to add diarization (see below for huginface hug)
@@ -111,6 +112,7 @@ we adapted to add diarization (see below for huginface hug)
 
 | Argument | Description |
 |----------|-------------|
+| `--device` | Device to use for inference (`cpu` or `cuda`) |
 | `--model` | Whisper model: `large-v3` (best quality), `turbo` (fastest), `large-v2` |
 | `--language` | Source language code (e.g., `en` for English, country ISO codes) |
 | `--diarize` | Enable speaker diarization (requires HuggingFace token) |
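
For reference, here is a sketch of what a full WhisperX invocation combining the documented arguments could look like; the audio filename, output paths, and token variable are illustrative assumptions, not the project's exact command:

```bash
# Hypothetical WhisperX run using the flags from the table above.
# input/meeting.mp3, outputs/float32/, and $HF_TOKEN are placeholders;
# --compute_type float32 is an assumption based on the outputs/float32/ path.
whisperx input/meeting.mp3 \
  --model large-v3 \
  --language en \
  --device cuda \
  --compute_type float32 \
  --diarize \
  --hf_token "$HF_TOKEN" \
  --output_dir outputs/float32 \
  --output_format json
```

On a machine without CUDA, set `--device cpu` as the new README text suggests; `--output_format all` would also emit the SRT mentioned above, and `--hf_token` supplies the HuggingFace token that `--diarize` requires.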