diff --git a/README.md b/README.md
index f8e1d8c..b0589f4 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
-# Amuta Space Talkers
+# Space Talkers

-A diarization viewer for Whisper transcription output, featuring a visual "space" display of speakers and waveform-based audio navigation.
+A diarization viewer for Whisper transcription output

 ![Demo Screenshot](screenshots/demo.jpg)

@@ -11,6 +11,16 @@ A diarization viewer for Whisper transcription output, featuring a visual "space
 - **Waveform navigation**: Click/drag on the waveform to seek through the audio
 - **Keyboard controls**: Space to play/pause, Arrow keys to seek

+
+## Keyboard Shortcuts
+
+| Key | Action |
+|-----|--------|
+| `Space` | Play/Pause |
+| `←` / `A` | Seek back 10 seconds |
+| `→` / `D` | Seek forward 10 seconds |
+| `Shift` + `←`/`→` | Seek 60 seconds |
+
 ## Quick Start

 1. Place your audio file in `input/`
@@ -51,34 +61,6 @@ node scripts/generate-waveform.js input/amuta_2026-01-12_1.opus outputs/float32/
 # Or let it auto-generate the output path
 node scripts/generate-waveform.js input/meeting.opus
 ```
-
-### Waveform JSON Format
-
-The generated JSON file has this structure:
-
-```json
-{
-  "version": 1,
-  "source": "meeting.opus",
-  "duration": 7200.5,
-  "sampleRate": 48000,
-  "columns": 1000,
-  "peaks": [
-    { "min": -0.82, "max": 0.91 },
-    { "min": -0.45, "max": 0.52 }
-  ]
-}
-```
-
-| Field | Description |
-|-------|-------------|
-| `version` | Schema version for future compatibility |
-| `source` | Original audio filename |
-| `duration` | Audio duration in seconds |
-| `sampleRate` | Original sample rate |
-| `columns` | Number of data points |
-| `peaks` | Array of min/max amplitude pairs |
-
 ## Configuration

 Edit the paths at the top of `app.js`:
@@ -91,22 +73,15 @@ const waveformPath = "outputs/float32/amuta_2026-01-12_1.waveform.json";
 ### Speaker Labels

 Map speaker IDs to display names in `app.js`:
+supports merging names

 ```javascript
 const SPEAKER_LABELS = {
   "SPEAKER_01": "Maya",
-  "SPEAKER_02": "David",
+  "SPEAKER_02": "SPEAKER_23","SPEAKER_4",
 };
 ```

-## Keyboard Shortcuts
-
-| Key | Action |
-|-----|--------|
-| `Space` | Play/Pause |
-| `←` / `A` | Seek back 10 seconds |
-| `→` / `D` | Seek forward 10 seconds |
-| `Shift` + `←`/`→` | Seek 60 seconds |

 ## File Structure

@@ -120,53 +95,28 @@ amuta-meetings/
 ├── input/ # Audio files (gitignored)
 ├── outputs/
 │   └── float32/ # Transcript and waveform JSON
-└── plans/
-    └── waveform-optimization.md # Architecture documentation
 ```

-## Performance Notes
+## Transcription with WhisperX (GPU or CPU)

-- **Waveform JSON (~20KB)** loads in milliseconds vs decoding 50-100MB audio in 5-15 seconds
-- The waveform is loaded immediately on page load for instant display
-- Audio is only downloaded once (by the `