# Amuta Space Talkers

A diarization viewer for Whisper transcription output, featuring a visual "space" display of speakers and waveform-based audio navigation.

## Features

- **Speaker visualization**: Speakers displayed as animated orbs in a starfield
- **Real-time transcription**: Live transcript panel following audio playback
- **Waveform navigation**: Click/drag on the waveform to seek through the audio
- **Keyboard controls**: Space to play/pause, arrow keys to seek

## Quick Start

1. Place your audio file in `input/`
2. Place your Whisper transcript JSON in `outputs/float32/`
3. Generate the waveform data (see below)
4. Open `index.html` in a browser

## Waveform Generation

For optimal performance with long audio files (1-3 hours), waveform data is pre-generated as JSON rather than decoded in the browser.

### Prerequisites

- [Node.js](https://nodejs.org/) (v14+)
- [FFmpeg](https://ffmpeg.org/) installed and available in PATH

### Generate Waveform Data

```bash
node scripts/generate-waveform.js <input-audio> [output-json] [columns]
```

**Arguments:**

- `input-audio` - Path to the audio file (opus, mp3, wav, etc.)
- `output-json` - Output path for waveform JSON (default: `.waveform.json`)
- `columns` - Number of waveform columns/peaks (default: 1000)

**Example:**

```bash
# Generate waveform for a single file
node scripts/generate-waveform.js input/amuta_2026-01-12_1.opus outputs/float32/amuta_2026-01-12_1.waveform.json

# Or let it auto-generate the output path
node scripts/generate-waveform.js input/meeting.opus
```

### Waveform JSON Format

The generated JSON file has this structure:

```json
{
  "version": 1,
  "source": "meeting.opus",
  "duration": 7200.5,
  "sampleRate": 48000,
  "columns": 1000,
  "peaks": [
    { "min": -0.82, "max": 0.91 },
    { "min": -0.45, "max": 0.52 }
  ]
}
```

| Field | Description |
|-------|-------------|
| `version` | Schema version for future compatibility |
| `source` | Original audio filename |
| `duration` | Audio duration in seconds |
| `sampleRate` | Original sample rate |
| `columns` | Number of data points |
| `peaks` | Array of min/max amplitude pairs |

## Configuration

Edit the paths at the top of `app.js`:

```javascript
const transcriptPath = "outputs/float32/amuta_2026-01-12_1.json";
const waveformPath = "outputs/float32/amuta_2026-01-12_1.waveform.json";
```

### Speaker Labels

Map speaker IDs to display names in `app.js`:

```javascript
const SPEAKER_LABELS = {
  "SPEAKER_01": "Maya",
  "SPEAKER_02": "David",
};
```

## Keyboard Shortcuts

| Key | Action |
|-----|--------|
| `Space` | Play/Pause |
| `←` / `A` | Seek back 10 seconds |
| `→` / `D` | Seek forward 10 seconds |
| `Shift` + `←`/`→` | Seek 60 seconds |

## File Structure

```
amuta-meetings/
├── index.html                   # Main HTML page
├── app.js                       # Application logic
├── styles.css                   # Styles
├── scripts/
│   └── generate-waveform.js     # Waveform generator script
├── input/                       # Audio files (gitignored)
├── outputs/
│   └── float32/                 # Transcript and waveform JSON
└── plans/
    └── waveform-optimization.md # Architecture documentation
```

## Performance Notes

- **Waveform JSON (~20KB)** loads in milliseconds vs decoding 50-100MB audio in 5-15 seconds
- The waveform is loaded immediately on page load for instant display
- Audio is only downloaded once (by the `
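The min/max `peaks` array described under "Waveform JSON Format" comes from a simple downsampling pass over the decoded samples. Here is a minimal sketch of that idea — not the actual `scripts/generate-waveform.js` implementation, and it assumes `samples` is already a mono `Float32Array` of PCM in [-1, 1] (the real script decodes the audio with FFmpeg first):

```javascript
// Downsample PCM samples into `columns` buckets, keeping each bucket's
// min and max amplitude. Empty buckets stay at 0 (silence).
function computePeaks(samples, columns) {
  const peaks = [];
  const bucketSize = samples.length / columns;
  for (let c = 0; c < columns; c++) {
    const start = Math.floor(c * bucketSize);
    const end = Math.min(samples.length, Math.floor((c + 1) * bucketSize));
    let min = 0;
    let max = 0;
    for (let i = start; i < end; i++) {
      if (samples[i] < min) min = samples[i];
      if (samples[i] > max) max = samples[i];
    }
    peaks.push({ min, max });
  }
  return peaks;
}
```

Keeping both min and max (rather than a single RMS value) lets the viewer draw the familiar symmetric-but-not-identical waveform shape at a fixed cost of `columns` pairs, regardless of audio length.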
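The click/drag seeking mentioned under "Features" boils down to proportional math: a pixel offset within the waveform element maps linearly onto the `duration` field from the waveform JSON. A sketch (the helper name is hypothetical, not taken from `app.js`):

```javascript
// Map a pointer's x-offset within the waveform element to a playback time.
// `width` is the element's rendered width in pixels; `duration` comes from
// the waveform JSON. Clamps so drags past either edge stay in range.
function xToTime(x, width, duration) {
  const frac = Math.min(1, Math.max(0, x / width));
  return frac * duration;
}
```

The resulting value can be assigned to the audio element's `currentTime` to perform the seek.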