Amuta Space Talkers

A diarization viewer for Whisper transcription output, featuring a visual "space" display of speakers and waveform-based audio navigation.

Demo Screenshot

Features

  • Speaker visualization: Speakers displayed as animated orbs in a starfield
  • Real-time transcription: Live transcript panel following audio playback
  • Waveform navigation: Click/drag on the waveform to seek through the audio
  • Keyboard controls: Space to play/pause, Arrow keys to seek

Quick Start

  1. Place your audio file in input/
  2. Place your Whisper transcript JSON in outputs/float32/
  3. Generate the waveform data (see below)
  4. Start a local server and open in browser:
    npx serve -p 5000
    
    Then navigate to http://localhost:5000

Waveform Generation

For optimal performance with long audio files (1-3 hours), waveform data is pre-generated as JSON rather than decoded in the browser.
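The core idea behind the pre-generation can be sketched as: split the decoded samples into one bucket per column and keep each bucket's minimum and maximum amplitude. This is an illustrative sketch, not the exact code in `scripts/generate-waveform.js`:

```javascript
// Sketch: downsample an array of audio samples into N min/max peak pairs,
// the same shape as the "peaks" array in the generated JSON.
function computePeaks(samples, columns) {
  const bucketSize = Math.ceil(samples.length / columns);
  const peaks = [];
  for (let c = 0; c < columns; c++) {
    let min = 0, max = 0;
    const start = c * bucketSize;
    const end = Math.min(start + bucketSize, samples.length);
    for (let i = start; i < end; i++) {
      if (samples[i] < min) min = samples[i];
      if (samples[i] > max) max = samples[i];
    }
    peaks.push({ min, max });
  }
  return peaks;
}
```

For a 2-hour file this reduces millions of samples to a fixed 1000 pairs, which is why the JSON stays around 20KB regardless of audio length.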

Prerequisites

  • Node.js (the generator script is run with node)

Generate Waveform Data

node scripts/generate-waveform.js <input-audio> [output-json] [columns]

Arguments:

  • input-audio - Path to the audio file (opus, mp3, wav, etc.)
  • output-json - Output path for waveform JSON (default: <input>.waveform.json)
  • columns - Number of waveform columns/peaks (default: 1000)

Example:

# Generate waveform for a single file
node scripts/generate-waveform.js input/amuta_2026-01-12_1.opus outputs/float32/amuta_2026-01-12_1.waveform.json

# Or let it auto-generate the output path
node scripts/generate-waveform.js input/meeting.opus

Waveform JSON Format

The generated JSON file has this structure:

{
  "version": 1,
  "source": "meeting.opus",
  "duration": 7200.5,
  "sampleRate": 48000,
  "columns": 1000,
  "peaks": [
    { "min": -0.82, "max": 0.91 },
    { "min": -0.45, "max": 0.52 }
  ]
}
  Field       Description
  version     Schema version for future compatibility
  source      Original audio filename
  duration    Audio duration in seconds
  sampleRate  Original sample rate
  columns     Number of data points
  peaks       Array of min/max amplitude pairs
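
As a sketch of how a viewer can consume this file, the `duration` and `columns` fields map a playback time to its waveform column (the function name is illustrative, not taken from app.js):

```javascript
// Sketch: map a playback time (seconds) to the index of the waveform
// column it falls in, clamped to the valid range [0, columns - 1].
function timeToColumn(waveform, currentTime) {
  const frac = Math.min(Math.max(currentTime / waveform.duration, 0), 1);
  return Math.min(Math.floor(frac * waveform.columns), waveform.columns - 1);
}
```

The same mapping, inverted, turns a click position on the waveform into a seek target.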

Configuration

Edit the paths at the top of app.js:

const transcriptPath = "outputs/float32/amuta_2026-01-12_1.json";
const waveformPath = "outputs/float32/amuta_2026-01-12_1.waveform.json";

Speaker Labels

Map speaker IDs to display names in app.js:

const SPEAKER_LABELS = {
  "SPEAKER_01": "Maya",
  "SPEAKER_02": "David",
};
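
When a speaker ID has no entry in the map, falling back to the raw ID keeps unlabeled speakers visible. A minimal sketch (the helper name is illustrative):

```javascript
// Sketch: resolve a diarization speaker ID to a display name,
// falling back to the raw ID when no label is configured.
const SPEAKER_LABELS = {
  "SPEAKER_01": "Maya",
  "SPEAKER_02": "David",
};

function speakerName(id) {
  return SPEAKER_LABELS[id] ?? id;
}
```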

Keyboard Shortcuts

  Key            Action
  Space          Play/Pause
  ← / A          Seek back 10 seconds
  → / D          Seek forward 10 seconds
  Shift + ← / →  Seek 60 seconds
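
The seek bindings above can be sketched as a small pure function that a `keydown` listener feeds into `audio.currentTime` (a sketch; the function name is illustrative, not app.js verbatim):

```javascript
// Sketch: translate a pressed key into a seek offset in seconds,
// following the shortcut table above. Shift scales the step to 60s.
function seekOffset(key, shiftKey) {
  const step = shiftKey ? 60 : 10;
  if (key === "ArrowLeft" || key.toLowerCase() === "a") return -step;
  if (key === "ArrowRight" || key.toLowerCase() === "d") return step;
  return 0;
}
```

Wired up roughly as `audio.currentTime += seekOffset(e.key, e.shiftKey)`, with Space toggling play/pause separately.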

File Structure

amuta-meetings/
├── index.html              # Main HTML page
├── app.js                  # Application logic
├── styles.css              # Styles
├── scripts/
│   └── generate-waveform.js  # Waveform generator script
├── input/                  # Audio files (gitignored)
├── outputs/
│   └── float32/            # Transcript and waveform JSON
└── plans/
    └── waveform-optimization.md  # Architecture documentation

Performance Notes

  • Waveform JSON (~20KB) loads in milliseconds vs decoding 50-100MB audio in 5-15 seconds
  • The waveform is loaded immediately on page load for instant display
  • Audio is only downloaded once (by the <audio> element)