# Amuta Space Talkers

A diarization viewer for Whisper transcription output, featuring a visual "space" display of speakers and waveform-based audio navigation.

## Features

- **Speaker visualization**: Speakers are displayed as animated orbs in a starfield
- **Real-time transcript**: A live transcript panel follows audio playback
- **Waveform navigation**: Click or drag on the waveform to seek through the audio
- **Keyboard controls**: Space to play/pause, arrow keys (or `A`/`D`) to seek

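To follow playback, the transcript panel has to find the segment that contains the current time on every update. A minimal sketch of one way to do that is a binary search over segments sorted by start time. It assumes the transcript JSON exposes a Whisper-style `segments` array whose entries carry `start` and `end` in seconds (plus a `speaker` field in diarized output); `findActiveSegment` is a hypothetical name, not taken from `app.js`.

```javascript
/**
 * Hypothetical helper: returns the index of the segment containing time t,
 * or -1 if t falls in a gap or outside the transcript.
 * Assumes segments are sorted by `start` and do not overlap; `end` is exclusive.
 */
function findActiveSegment(segments, t) {
  let lo = 0;
  let hi = segments.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const seg = segments[mid];
    if (t < seg.start) {
      hi = mid - 1; // target lies in the left half
    } else if (t >= seg.end) {
      lo = mid + 1; // target lies in the right half
    } else {
      return mid; // seg.start <= t < seg.end
    }
  }
  return -1;
}

// Illustrative, assumed Whisper-style segments.
const segments = [
  { start: 0.0, end: 4.2, speaker: "SPEAKER_01", text: "Welcome, everyone." },
  { start: 4.2, end: 9.8, speaker: "SPEAKER_02", text: "Thanks for joining." },
  { start: 11.0, end: 15.5, speaker: "SPEAKER_01", text: "Let's begin." },
];

console.log(findActiveSegment(segments, 5.0)); // 1
console.log(findActiveSegment(segments, 10.0)); // -1 (gap between segments)
```

Binary search keeps each lookup logarithmic in the number of segments, which helps with multi-hour meetings. In the browser it could run from the `<audio>` element's `timeupdate` event and re-render only when the returned index changes.
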
## Quick Start

1. Place your audio file in `input/`.
2. Place your Whisper transcript JSON in `outputs/float32/`.
3. Generate the waveform data (see [Waveform Generation](#waveform-generation)).
4. Point the paths in `app.js` at your files (see [Configuration](#configuration)).
5. Open `index.html` in a browser.

## Waveform Generation

To keep long recordings (1-3 hours) responsive, the waveform is pre-generated as JSON instead of being computed by decoding the audio in the browser.

### Prerequisites

- [Node.js](https://nodejs.org/) (v14+)
- [FFmpeg](https://ffmpeg.org/) installed and available in your `PATH`

### Generate Waveform Data

```bash
node scripts/generate-waveform.js <input-audio> [output-json] [columns]
```

**Arguments:**

- `input-audio` - Path to the audio file (Opus, MP3, WAV, etc.)
- `output-json` - Output path for the waveform JSON (default: `<input>.waveform.json`)
- `columns` - Number of waveform columns, i.e. min/max peak pairs (default: 1000)

The arguments are positional, so setting `columns` also requires passing `output-json`.

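The script's internals are not documented here, but the output format points to the usual min/max reduction: split the decoded samples into `columns` equal buckets and keep each bucket's lowest and highest amplitude. The sketch below illustrates that technique on an in-memory `Float32Array` of mono samples; `computePeaks` is a hypothetical name, and this is not claimed to be what `generate-waveform.js` actually does.

```javascript
/**
 * Hypothetical min/max peak extraction.
 * samples: Float32Array of mono PCM samples
 * columns: number of output peaks (the script's documented default is 1000)
 */
function computePeaks(samples, columns) {
  const peaks = new Array(columns);
  for (let c = 0; c < columns; c++) {
    // Bucket c covers samples [start, end); integer division spreads any remainder.
    const start = Math.floor((c * samples.length) / columns);
    const end = Math.floor(((c + 1) * samples.length) / columns);
    let min = Infinity;
    let max = -Infinity;
    for (let i = start; i < end; i++) {
      const s = samples[i];
      if (s < min) min = s;
      if (s > max) max = s;
    }
    // An empty bucket (fewer samples than columns) is treated as silence.
    peaks[c] = end > start ? { min, max } : { min: 0, max: 0 };
  }
  return peaks;
}

// Illustrative input: 8 samples reduced to 4 columns (2 samples per column).
const samples = new Float32Array([0.5, -0.25, 0.75, 0.125, -1, -0.5, 0, 0.25]);
console.log(JSON.stringify(computePeaks(samples, 4)));
// [{"min":-0.25,"max":0.5},{"min":0.125,"max":0.75},{"min":-1,"max":-0.5},{"min":0,"max":0.25}]
```

For real recordings the samples would come from FFmpeg. Holding three hours of 48 kHz mono float32 audio in memory takes roughly 2 GB, so a production generator would more likely process the decoded stream in chunks and update the running min/max per bucket.
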
**Example:**

```bash
# Generate the waveform for a single file
node scripts/generate-waveform.js input/amuta_2026-01-12_1.opus outputs/float32/amuta_2026-01-12_1.waveform.json

# Or let it generate the output path automatically
node scripts/generate-waveform.js input/meeting.opus
```

### Waveform JSON Format

The generated JSON file has the following structure (the `peaks` array is shortened to two entries here; a real file contains one entry per column):

```json
{
  "version": 1,
  "source": "meeting.opus",
  "duration": 7200.5,
  "sampleRate": 48000,
  "columns": 1000,
  "peaks": [
    { "min": -0.82, "max": 0.91 },
    { "min": -0.45, "max": 0.52 }
  ]
}
```

| Field | Description |
|-------|-------------|
| `version` | Schema version, for future compatibility |
| `source` | Original audio filename |
| `duration` | Audio duration in seconds |
| `sampleRate` | Sample rate of the original audio, in Hz |
| `columns` | Number of data points (length of `peaks`) |
| `peaks` | Array of `{ min, max }` amplitude pairs, one per column |

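Because every column covers `duration / columns` seconds, converting between playback time, column index and a click position on the waveform is simple arithmetic. The sketch below shows one way a viewer could sanity-check a loaded file against this format and map a click to a seek time. The function names, and the choice to reject files whose `peaks.length` differs from `columns`, are assumptions rather than documented `app.js` behavior; in a browser, `offsetX` could come from the `MouseEvent` and `widthPx` from the canvas's `clientWidth`.

```javascript
/**
 * Hypothetical check that a parsed waveform object matches the documented format.
 * Throws on the first problem found; returns the object unchanged otherwise.
 */
function validateWaveform(wf) {
  if (wf.version !== 1) throw new Error(`Unsupported waveform version: ${wf.version}`);
  if (!(wf.duration > 0)) throw new Error("duration must be a positive number of seconds");
  if (!Array.isArray(wf.peaks) || wf.peaks.length !== wf.columns) {
    throw new Error("peaks must contain exactly `columns` entries");
  }
  return wf;
}

/** Index of the column containing playback time t, clamped to [0, columns - 1]. */
function columnForTime(wf, t) {
  const secondsPerColumn = wf.duration / wf.columns;
  const idx = Math.floor(t / secondsPerColumn);
  return Math.min(wf.columns - 1, Math.max(0, idx));
}

/** Seek time for a click at offsetX pixels on a waveform drawn widthPx wide. */
function timeForClick(wf, offsetX, widthPx) {
  const fraction = Math.min(1, Math.max(0, offsetX / widthPx));
  return fraction * wf.duration;
}

// Illustrative waveform: 4 columns over 8 seconds (2 s per column).
const wf = validateWaveform({
  version: 1,
  source: "meeting.opus",
  duration: 8,
  sampleRate: 48000,
  columns: 4,
  peaks: [
    { min: -0.5, max: 0.5 },
    { min: -0.25, max: 0.75 },
    { min: -1, max: 0.25 },
    { min: 0, max: 0.25 },
  ],
});

console.log(columnForTime(wf, 5)); // 2
console.log(timeForClick(wf, 300, 400)); // 6
```

Clamping both mappings means a drag that leaves the waveform's edges simply pins the playhead to the start or end instead of producing an invalid time.
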
## Configuration

Edit the paths at the top of `app.js`:

```javascript
const transcriptPath = "outputs/float32/amuta_2026-01-12_1.json";
const waveformPath = "outputs/float32/amuta_2026-01-12_1.waveform.json";
```

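Both configured files are small JSON documents, so a viewer can request them in parallel at startup. A minimal sketch, assuming a `fetch`-compatible function (the browser's global `fetch`, or Node 18+'s); `loadViewerData` is a hypothetical name, and `app.js` may load the files differently.

```javascript
/**
 * Hypothetical loader: fetches the transcript and waveform JSON concurrently.
 * fetchImpl defaults to the global fetch but can be swapped out (e.g. in tests).
 */
async function loadViewerData(transcriptPath, waveformPath, fetchImpl = globalThis.fetch) {
  const getJson = async (path) => {
    const res = await fetchImpl(path);
    if (!res.ok) throw new Error(`Failed to load ${path}: HTTP ${res.status}`);
    return res.json();
  };
  // Promise.all starts both requests before awaiting either one.
  const [transcript, waveform] = await Promise.all([
    getJson(transcriptPath),
    getJson(waveformPath),
  ]);
  return { transcript, waveform };
}

// Illustrative stub standing in for fetch: echoes the requested path as JSON.
const fakeFetch = async (path) => ({
  ok: true,
  status: 200,
  json: async () => ({ path }),
});

loadViewerData("transcript.json", "waveform.json", fakeFetch).then(({ transcript, waveform }) => {
  console.log(transcript.path, waveform.path); // transcript.json waveform.json
});
```

In the page itself this would be called as `loadViewerData(transcriptPath, waveformPath)` with the constants above, rendering once the returned promise resolves.
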
### Speaker Labels

Map speaker IDs to display names in `app.js`:

```javascript
const SPEAKER_LABELS = {
  "SPEAKER_01": "Maya",
  "SPEAKER_02": "David",
};
```

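Speakers missing from the map still need something to display. A minimal sketch of a lookup that falls back to the raw ID; `speakerLabel` is a hypothetical helper, and whether `app.js` falls back this way is an assumption.

```javascript
const SPEAKER_LABELS = {
  "SPEAKER_01": "Maya",
  "SPEAKER_02": "David",
};

/** Hypothetical lookup: display name for a speaker ID, or the ID itself if unmapped. */
function speakerLabel(id, labels = SPEAKER_LABELS) {
  // hasOwnProperty avoids picking up inherited keys such as "toString".
  return Object.prototype.hasOwnProperty.call(labels, id) ? labels[id] : id;
}

console.log(speakerLabel("SPEAKER_01")); // Maya
console.log(speakerLabel("SPEAKER_07")); // SPEAKER_07
```
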
## Keyboard Shortcuts

| Key | Action |
|-----|--------|
| `Space` | Play/pause |
| `←` / `A` | Seek back 10 seconds |
| `→` / `D` | Seek forward 10 seconds |
| `Shift` + `←` / `→` | Seek back/forward 60 seconds |

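The bindings in the table can be expressed as a small pure function from a keyboard event to a player action, which keeps the `keydown` handler trivial and easy to test. A sketch, assuming the handler reads `KeyboardEvent.code` (the physical key, so `A`/`D` work regardless of Shift or Caps Lock; on non-QWERTY layouts it binds the keys in the QWERTY `A`/`D` positions) and that `Shift` only changes the arrow-key step, as the table describes. `actionForKey`, `clampTime` and the `audio` element in the comment are hypothetical names.

```javascript
/**
 * Hypothetical mapping from a keydown event to a player action.
 * Returns { type: "toggle" }, { type: "seek", delta }, or null for unbound keys.
 */
function actionForKey({ code, shiftKey }) {
  switch (code) {
    case "Space":
      return { type: "toggle" };
    case "ArrowLeft":
      return { type: "seek", delta: shiftKey ? -60 : -10 };
    case "ArrowRight":
      return { type: "seek", delta: shiftKey ? 60 : 10 };
    case "KeyA":
      return { type: "seek", delta: -10 };
    case "KeyD":
      return { type: "seek", delta: 10 };
    default:
      return null;
  }
}

/** Keep a seek target inside [0, duration]. */
function clampTime(t, duration) {
  return Math.min(duration, Math.max(0, t));
}

// In the browser this could be wired up roughly as:
// document.addEventListener("keydown", (e) => {
//   const action = actionForKey(e);
//   if (!action) return;
//   e.preventDefault(); // stop Space from scrolling the page
//   if (action.type === "toggle") audio.paused ? audio.play() : audio.pause();
//   else audio.currentTime = clampTime(audio.currentTime + action.delta, audio.duration);
// });

console.log(actionForKey({ code: "ArrowRight", shiftKey: true })); // { type: 'seek', delta: 60 }
console.log(clampTime(5 - 10, 7200)); // 0
```
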
## File Structure

```
amuta-meetings/
├── index.html                    # Main HTML page
├── app.js                        # Application logic
├── styles.css                    # Styles
├── scripts/
│   └── generate-waveform.js      # Waveform generator script
├── input/                        # Audio files (gitignored)
├── outputs/
│   └── float32/                  # Transcript and waveform JSON
└── plans/
    └── waveform-optimization.md  # Architecture documentation
```

## Performance Notes

- **Waveform JSON (~20 KB)** loads in milliseconds, versus 5-15 seconds to decode 50-100 MB of audio in the browser
- The waveform JSON is loaded immediately on page load, so the waveform displays instantly
- The audio is downloaded only once, by the `<audio>` element, instead of also being fetched separately to decode the waveform