Amuta Space Talkers
A diarization viewer for Whisper transcription output, featuring a visual "space" display of speakers and waveform-based audio navigation.
Features
- Speaker visualization: Speakers displayed as animated orbs in a starfield
- Real-time transcription: Live transcript panel following audio playback
- Waveform navigation: Click/drag on the waveform to seek through the audio
- Keyboard controls: Space to play/pause, Arrow keys to seek
Quick Start
- Place your audio file in input/
- Place your Whisper transcript JSON in outputs/float32/
- Generate the waveform data (see Waveform Generation below)
- Start a local server and open the viewer in a browser:
npx serve -p 5000
Then navigate to http://localhost:5000
Waveform Generation
For optimal performance with long audio files (1-3 hours), waveform data is pre-generated as JSON rather than decoded in the browser.
Prerequisites
Generate Waveform Data
node scripts/generate-waveform.js <input-audio> [output-json] [columns]
Arguments:
- input-audio: Path to the audio file (opus, mp3, wav, etc.)
- output-json: Output path for waveform JSON (default: <input>.waveform.json)
- columns: Number of waveform columns/peaks (default: 1000)
Example:
# Generate waveform for a single file
node scripts/generate-waveform.js input/amuta_2026-01-12_1.opus outputs/float32/amuta_2026-01-12_1.waveform.json
# Or let it auto-generate the output path
node scripts/generate-waveform.js input/meeting.opus
Waveform JSON Format
The generated JSON file has this structure:
{
"version": 1,
"source": "meeting.opus",
"duration": 7200.5,
"sampleRate": 48000,
"columns": 1000,
"peaks": [
{ "min": -0.82, "max": 0.91 },
{ "min": -0.45, "max": 0.52 }
]
}
| Field | Description |
|---|---|
| version | Schema version for future compatibility |
| source | Original audio filename |
| duration | Audio duration in seconds |
| sampleRate | Original sample rate |
| columns | Number of data points |
| peaks | Array of min/max amplitude pairs |
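As an illustration of what the peaks array represents, the sketch below reduces a list of decoded samples to one min/max pair per column. It is a minimal sketch, not the code in scripts/generate-waveform.js: the helper name computePeaks and the assumption that samples are already decoded to floats in the range -1 to 1 are both hypothetical.

```javascript
// Hypothetical helper: reduce decoded samples to `columns` min/max pairs.
// Assumes `samples` is an array (or Float32Array) of values in [-1, 1].
function computePeaks(samples, columns) {
  const peaks = [];
  const bucketSize = samples.length / columns;
  for (let i = 0; i < columns; i++) {
    const start = Math.floor(i * bucketSize);
    // Every bucket spans at least one sample index.
    const end = Math.max(start + 1, Math.floor((i + 1) * bucketSize));
    let min = Infinity;
    let max = -Infinity;
    for (let j = start; j < end && j < samples.length; j++) {
      if (samples[j] < min) min = samples[j];
      if (samples[j] > max) max = samples[j];
    }
    // With no samples available, record the column as silence.
    if (min === Infinity) {
      min = 0;
      max = 0;
    }
    peaks.push({ min, max });
  }
  return peaks;
}
```

Keeping only two numbers per column is what makes the file small: its size depends on the column count, not on the length of the recording.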
Configuration
Edit the paths at the top of app.js:
const transcriptPath = "outputs/float32/amuta_2026-01-12_1.json";
const waveformPath = "outputs/float32/amuta_2026-01-12_1.waveform.json";
Speaker Labels
Map speaker IDs to display names in app.js:
const SPEAKER_LABELS = {
"SPEAKER_01": "Maya",
"SPEAKER_02": "David",
};
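A lookup against this map might be sketched as follows. The helper name speakerLabel is hypothetical, and falling back to the raw speaker ID for unmapped speakers is an assumption, not documented behavior of app.js.

```javascript
const SPEAKER_LABELS = {
  "SPEAKER_01": "Maya",
  "SPEAKER_02": "David",
};

// Hypothetical helper: return the display name for a speaker ID.
// Assumption: unmapped IDs are shown as-is rather than hidden.
function speakerLabel(speakerId, labels = SPEAKER_LABELS) {
  const known = Object.prototype.hasOwnProperty.call(labels, speakerId);
  return known ? labels[speakerId] : speakerId;
}
```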
Keyboard Shortcuts
| Key | Action |
|---|---|
| Space | Play/Pause |
| ← / A | Seek back 10 seconds |
| → / D | Seek forward 10 seconds |
| Shift + ←/→ | Seek 60 seconds |
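The seek shortcuts in the table can be expressed as a small pure function, sketched below. The names seekDelta and clampTime are hypothetical and not taken from app.js. Since the table lists Shift only with the arrow keys, the sketch assumes A and D always seek 10 seconds.

```javascript
// Hypothetical helper: map a KeyboardEvent.key value to a seek offset in seconds.
// Returns 0 for keys that do not seek (including Space, which toggles playback).
function seekDelta(key, shiftKey = false) {
  // Normalize single letters so "a" and "A" behave the same.
  const k = key.length === 1 ? key.toLowerCase() : key;
  if (k === "ArrowLeft") return shiftKey ? -60 : -10;
  if (k === "ArrowRight") return shiftKey ? 60 : 10;
  if (k === "a") return -10; // assumption: Shift does not change A/D
  if (k === "d") return 10;
  return 0;
}

// Keep a target time inside the playable range [0, duration].
function clampTime(time, duration) {
  return Math.min(Math.max(time, 0), duration);
}
```

In a keydown handler, the result could be applied as audio.currentTime = clampTime(audio.currentTime + seekDelta(event.key, event.shiftKey), audio.duration), where audio is the page's <audio> element.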
File Structure
amuta-meetings/
├── index.html # Main HTML page
├── app.js # Application logic
├── styles.css # Styles
├── scripts/
│ └── generate-waveform.js # Waveform generator script
├── input/ # Audio files (gitignored)
├── outputs/
│ └── float32/ # Transcript and waveform JSON
└── plans/
    └── waveform-optimization.md # Architecture documentation
Performance Notes
- Waveform JSON (~20KB) loads in milliseconds, versus 5-15 seconds to decode 50-100MB of audio in the browser
- The waveform is loaded immediately on page load for instant display
- Audio is only downloaded once (by the <audio> element)
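Loading the pre-generated waveform on page load could look roughly like the sketch below. The function names loadWaveform and validateWaveform are hypothetical, and the checks cover only the fields documented under Waveform JSON Format. The actual loading code in app.js may differ.

```javascript
// Hypothetical helper: return a list of problems found in a waveform object.
function validateWaveform(data) {
  const errors = [];
  if (typeof data.version !== "number") errors.push("version must be a number");
  if (typeof data.duration !== "number" || data.duration <= 0) {
    errors.push("duration must be a positive number");
  }
  if (!Array.isArray(data.peaks)) {
    errors.push("peaks must be an array");
  } else {
    data.peaks.forEach((peak, i) => {
      const ok = peak !== null && typeof peak === "object" &&
        typeof peak.min === "number" && typeof peak.max === "number" &&
        peak.min <= peak.max;
      if (!ok) errors.push(`peaks[${i}] must have numeric min <= max`);
    });
  }
  return errors;
}

// Hypothetical helper: fetch and validate the waveform JSON.
// fetchImpl defaults to the global fetch and can be replaced in tests.
async function loadWaveform(path, fetchImpl = fetch) {
  const response = await fetchImpl(path);
  if (!response.ok) {
    throw new Error(`Failed to load ${path}: HTTP ${response.status}`);
  }
  const data = await response.json();
  const errors = validateWaveform(data);
  if (errors.length > 0) {
    throw new Error(`Invalid waveform JSON: ${errors.join("; ")}`);
  }
  return data;
}
```

Calling loadWaveform(waveformPath) at startup would give the drawing code a validated object, independent of when the audio itself finishes buffering.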
