
# Amuta Space Talkers

A diarization viewer for Whisper transcription output, featuring a visual "space" display of speakers and waveform-based audio navigation.

## Features

- **Speaker visualization**: Speakers displayed as animated orbs in a starfield
- **Synchronized transcript**: Transcript panel that follows audio playback
- **Waveform navigation**: Click/drag on the waveform to seek through the audio
- **Keyboard controls**: Space to play/pause, Arrow keys to seek

## Quick Start

1. Place your audio file in `input/`
2. Place your Whisper transcript JSON in `outputs/float32/`
3. Generate the waveform data (see below)
4. Open `index.html` in a browser

## Waveform Generation

For optimal performance with long audio files (1-3 hours), waveform data is pre-generated as JSON rather than decoded in the browser.

### Prerequisites

- [Node.js](https://nodejs.org/) (v14+)
- [FFmpeg](https://ffmpeg.org/) installed and available in PATH

### Generate Waveform Data

```bash
node scripts/generate-waveform.js <input-audio> [output-json] [columns]
```

**Arguments:**
- `input-audio` - Path to the audio file (opus, mp3, wav, etc.)
- `output-json` - Output path for waveform JSON (default: `<input>.waveform.json`)
- `columns` - Number of waveform columns/peaks (default: 1000)

**Example:**

```bash
# Generate waveform for a single file
node scripts/generate-waveform.js input/amuta_2026-01-12_1.opus outputs/float32/amuta_2026-01-12_1.waveform.json

# Or let it auto-generate the output path
node scripts/generate-waveform.js input/meeting.opus
```

### Waveform JSON Format

The generated JSON file has this structure (the `peaks` array is truncated here; it holds one entry per column):

```json
{
  "version": 1,
  "source": "meeting.opus",
  "duration": 7200.5,
  "sampleRate": 48000,
  "columns": 1000,
  "peaks": [
    { "min": -0.82, "max": 0.91 },
    { "min": -0.45, "max": 0.52 }
  ]
}
```

| Field | Description |
|-------|-------------|
| `version` | Schema version for future compatibility |
| `source` | Original audio filename |
| `duration` | Audio duration in seconds |
| `sampleRate` | Sample rate of the original audio, in Hz |
| `columns` | Number of data points (entries in `peaks`) |
| `peaks` | Array of min/max amplitude pairs, one per column |
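
To illustrate what the `peaks` array represents, here is a minimal sketch of reducing decoded audio samples to min/max pairs. It is not the code of `scripts/generate-waveform.js`; the function name `computePeaks` and the assumption that samples are floats in the range -1 to 1 are hypothetical.

```javascript
// Hypothetical sketch: split the samples into `columns` buckets and
// record the smallest and largest sample value found in each bucket.
function computePeaks(samples, columns) {
  const peaks = [];
  const bucketSize = samples.length / columns;
  for (let i = 0; i < columns; i++) {
    const start = Math.floor(i * bucketSize);
    // Make sure every bucket covers at least one sample index.
    const end = Math.max(start + 1, Math.floor((i + 1) * bucketSize));
    let min = Infinity;
    let max = -Infinity;
    for (let j = start; j < end && j < samples.length; j++) {
      if (samples[j] < min) min = samples[j];
      if (samples[j] > max) max = samples[j];
    }
    // No samples available (empty input): treat the bucket as silence.
    if (min === Infinity) {
      min = 0;
      max = 0;
    }
    peaks.push({ min, max });
  }
  return peaks;
}

// Eight samples reduced to two columns.
const samples = [0.1, -0.5, 0.9, -0.2, 0.3, -0.8, 0.4, 0.0];
console.log(computePeaks(samples, 2));
// logs [ { min: -0.5, max: 0.9 }, { min: -0.8, max: 0.4 } ]
```

Storing both the minimum and the maximum per column lets the viewer draw the waveform as a vertical bar per column without touching the raw audio.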

## Configuration

Edit the paths at the top of `app.js`:

```javascript
const transcriptPath = "outputs/float32/amuta_2026-01-12_1.json";
const waveformPath = "outputs/float32/amuta_2026-01-12_1.waveform.json";
```

### Speaker Labels

Map speaker IDs to display names in `app.js`:

```javascript
const SPEAKER_LABELS = {
  "SPEAKER_01": "Maya",
  "SPEAKER_02": "David",
};
```

## Keyboard Shortcuts

| Key | Action |
|-----|--------|
| `Space` | Play/Pause |
| `←` / `A` | Seek back 10 seconds |
| `→` / `D` | Seek forward 10 seconds |
| `Shift` + `←`/`→` | Seek back/forward 60 seconds |
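
The seek shortcuts in the table can be expressed as a small mapping from key events to offsets in seconds. This is a sketch under stated assumptions, not the handler in `app.js`: the names `seekOffset` and `clampTime` are hypothetical, and the `A`/`D` keys are assumed to always seek 10 seconds whether or not `Shift` is held.

```javascript
// Hypothetical helper: map a KeyboardEvent's `key` and `shiftKey`
// to a seek offset in seconds. Returns 0 for non-seek keys.
function seekOffset(key, shiftKey) {
  const arrowStep = shiftKey ? 60 : 10;
  if (key === "ArrowLeft") return -arrowStep;
  if (key === "ArrowRight") return arrowStep;
  // Assumption: A/D always seek 10 seconds, regardless of Shift.
  const lower = key.toLowerCase();
  if (lower === "a") return -10;
  if (lower === "d") return 10;
  return 0;
}

// Keep the target time inside [0, duration].
function clampTime(time, duration) {
  return Math.min(Math.max(time, 0), duration);
}

// Browser-only wiring; skipped when run outside a page (e.g. in Node.js).
if (typeof document !== "undefined") {
  document.addEventListener("keydown", (event) => {
    const audio = document.querySelector("audio");
    const offset = seekOffset(event.key, event.shiftKey);
    // Ignore non-seek keys and audio whose duration is not yet known.
    if (!audio || offset === 0 || !Number.isFinite(audio.duration)) return;
    event.preventDefault();
    audio.currentTime = clampTime(audio.currentTime + offset, audio.duration);
  });
}
```

Keeping the key-to-offset logic in a pure function makes it easy to test without a browser.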

## File Structure

```
amuta-meetings/
├── index.html              # Main HTML page
├── app.js                  # Application logic
├── styles.css              # Styles
├── scripts/
│   └── generate-waveform.js  # Waveform generator script
├── input/                  # Audio files (gitignored)
├── outputs/
│   └── float32/            # Transcript and waveform JSON
└── plans/
    └── waveform-optimization.md  # Architecture documentation
```

## Performance Notes

- **Waveform JSON (~20KB)** loads in milliseconds, whereas decoding 50-100MB of audio in the browser takes 5-15 seconds
- The waveform is loaded immediately on page load for instant display
- Audio is downloaded only once (by the `<audio>` element), not a second time for waveform decoding
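
A minimal sketch of loading the pre-generated waveform on page load, assuming the JSON format documented above. The function names `parseWaveform` and `loadWaveform` are hypothetical and the actual loading code in `app.js` may differ.

```javascript
// Hypothetical validator: check the fields the viewer relies on
// before attempting to draw anything.
function parseWaveform(data) {
  if (data === null || typeof data !== "object") {
    throw new Error("waveform must be an object");
  }
  if (data.version !== 1) {
    throw new Error(`unsupported waveform version: ${data.version}`);
  }
  if (!Array.isArray(data.peaks)) {
    throw new Error("peaks must be an array");
  }
  if (!(data.duration > 0)) {
    throw new Error("duration must be a positive number");
  }
  return data;
}

// Hypothetical loader: fetch the small JSON file instead of decoding audio.
async function loadWaveform(path) {
  const response = await fetch(path);
  if (!response.ok) {
    throw new Error(`failed to load ${path}: ${response.status}`);
  }
  return parseWaveform(await response.json());
}

// Example usage in the browser, with the path from the Configuration section:
// loadWaveform("outputs/float32/amuta_2026-01-12_1.waveform.json")
//   .then((waveform) => console.log(waveform.peaks.length));
```

Checking `version` up front gives a clear error if the schema changes in a later release.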