upload to earbuds

Audio pipeline.

Every Hook Object starts as protected audio, then becomes timed playback, atmosphere, lyrics, and share state.

The product rule is simple: any listener should feel the hook instantly, but a listener who hasn't purchased the track can't capture it in full. The pipeline is designed to make that boundary enforceable on the server while still making the public Profile World feel immediate, musical, and alive.

Stage 1 — Upload

A creator drops a file (WAV, MP3, FLAC, or AAC) into the edit-mode track list. The client streams it to POST /api/upload/media in chunks, through lib/upload-manager.ts.

Magic-byte check. The first 12 bytes are inspected with lib/magic-bytes.ts before we trust the Content-Type. A mislabeled file is rejected before it touches storage.
SHA-256 dedup. We hash the stream as it uploads. If the same audio exists on another casset owned by the same user, we reuse the existing blob and just create a new Track row pointing at it.
Vercel Blob put. The file lands in a private-read bucket, keyed by content hash. No public URL is minted.
Track row. We insert a Track with the blob reference, detected duration, the artist id, and default previewStartSec = null (falls back to 35s).
Waveform pre-compute. A background job decodes the file, downsamples to ~500 peaks, and stores them so the scrubber can paint without client-side decoding.
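
The first two upload checks can be sketched as follows. This is a minimal illustration, not the contents of lib/magic-bytes.ts: the names sniffAudioKind and sha256OfChunks are hypothetical, the signature table covers only the most common headers (AAC is detected only in its ADTS form, not inside an M4A container), and a real upload handler would call hash.update() on each chunk as it arrives rather than on an array.

```typescript
import { createHash } from "node:crypto";

type AudioKind = "wav" | "mp3" | "flac" | "aac";

// Hypothetical helper: classify a file from its first 12 bytes,
// ignoring whatever Content-Type the client claimed.
function sniffAudioKind(head: Uint8Array): AudioKind | null {
  if (head.length < 12) return null;
  const ascii = (start: number, len: number): string => {
    let s = "";
    for (let i = start; i < start + len; i++) s += String.fromCharCode(head[i]);
    return s;
  };
  if (ascii(0, 4) === "RIFF" && ascii(8, 4) === "WAVE") return "wav";
  if (ascii(0, 4) === "fLaC") return "flac";
  if (ascii(0, 3) === "ID3") return "mp3"; // MP3 with a leading ID3v2 tag
  // Raw frame sync: 0xFF followed by at least three set bits.
  if (head[0] === 0xff && (head[1] & 0xe0) === 0xe0) {
    const layerBits = (head[1] >> 1) & 0x03;
    if (layerBits === 0x01) return "mp3"; // MPEG audio, Layer III
    if (layerBits === 0x00 && (head[1] & 0xf0) === 0xf0) return "aac"; // ADTS
  }
  return null; // unknown or mislabeled: reject before storage
}

// Hypothetical helper: hash the upload incrementally so the full file
// never has to be buffered just to compute the dedup key.
function sha256OfChunks(chunks: readonly Uint8Array[]): string {
  const hash = createHash("sha256");
  for (const chunk of chunks) hash.update(chunk);
  return hash.digest("hex");
}
```

Because the digest depends only on the bytes, splitting the same file into different chunk sizes yields the same key, which is what makes a content-hash blob key usable for dedup.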

Stage 2 — Choose the Hook Object

In edit mode, the owner drags the preview scrubber. That writes previewStartSec back via POST /api/studio/tracks. The number is just an offset; the duration stays locked to HOOK_DURATION_SEC (30s) from lib/hook-constants.ts. That audio window is the seed of the Hook Object: timing, lyric cues, visual atmosphere, share export, and room context build around it.

The waveform endpoint (GET /api/audio/waveform/[trackId]) returns the pre-computed peaks as a normalized JSON array of floats. The scrubber paints them with a 30s window overlay — the exact boundaries the server will enforce when listeners stream.
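
To illustrate what "pre-computed peaks as a normalized array" could mean, here is a minimal sketch of the downsampling step. It assumes the background job already has decoded mono PCM samples in a Float32Array; the function name computePeaks and the choice to normalize against the loudest bucket are assumptions, not the documented behavior of the job.

```typescript
// Hypothetical helper: reduce decoded PCM samples to at most `peakCount`
// peaks, each normalized into [0, 1] relative to the loudest bucket.
function computePeaks(samples: Float32Array, peakCount: number = 500): number[] {
  if (samples.length === 0 || peakCount <= 0) return [];
  const buckets = Math.min(peakCount, samples.length);
  const peaks: number[] = [];
  let loudest = 0;
  for (let b = 0; b < buckets; b++) {
    // Integer bucket boundaries that cover every sample exactly once.
    const start = Math.floor((b * samples.length) / buckets);
    const end = Math.floor(((b + 1) * samples.length) / buckets);
    let peak = 0;
    for (let i = start; i < end; i++) {
      const v = Math.abs(samples[i]);
      if (v > peak) peak = v;
    }
    peaks.push(peak);
    if (peak > loudest) loudest = peak;
  }
  // A silent file stays all zeros instead of dividing by zero.
  return loudest === 0 ? peaks : peaks.map((p) => p / loudest);
}
```

With roughly 500 floats per track, the JSON payload stays small enough for the scrubber to paint immediately, which is the point of doing this work once on the server.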

Stage 3 — Stream

A listener taps play. The audio element is pointed at:

GET /api/audio/<trackId>?t=<short-lived token>

Inside lib/audio-access.ts:

Validate the token (HMAC signed with AUDIO_TOKEN_SECRET or JWT_SECRET, 1-hour TTL).
Look up the caller's session and check entitlement via a Purchase query.
If entitled — sign a short-lived Vercel Blob URL and respond with a 302. Browser fetches the audio directly from storage.
If not entitled — don't redirect. Proxy the audio back with the byte range clipped to the hook window (see next section).
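
The token check and the redirect-or-proxy decision above might look roughly like this. It is a sketch under stated assumptions: the token layout (trackId.expiresAtMs.signature), the function names, and the "reject" outcome for an invalid token are all hypothetical, and track ids are assumed to contain no dots. Only the HMAC signing and the 1-hour TTL come from the text.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const TOKEN_TTL_MS = 60 * 60 * 1000; // 1-hour TTL, as stated above

function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("hex");
}

// Hypothetical token layout: "<trackId>.<expiresAtMs>.<hex HMAC>".
function mintToken(trackId: string, secret: string, nowMs: number): string {
  const payload = `${trackId}.${nowMs + TOKEN_TTL_MS}`;
  return `${payload}.${sign(payload, secret)}`;
}

function verifyToken(token: string, trackId: string, secret: string, nowMs: number): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [tokenTrackId, expiresAt, signature] = parts;
  const expected = Buffer.from(sign(`${tokenTrackId}.${expiresAt}`, secret), "hex");
  const given = Buffer.from(signature, "hex");
  // Compare in constant time; lengths must match before timingSafeEqual.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return false;
  return tokenTrackId === trackId && Number(expiresAt) > nowMs;
}

type StreamAction = "reject" | "redirect-signed-url" | "proxy-hook-window";

// Entitlement is evaluated per request, so the action can change between plays.
function chooseStreamAction(tokenValid: boolean, entitled: boolean): StreamAction {
  if (!tokenValid) return "reject"; // assumption: the source does not say how invalid tokens are handled
  return entitled ? "redirect-signed-url" : "proxy-hook-window";
}
```

Keeping the decision in a pure function separate from the token parsing makes the entitled and unentitled paths easy to test without a database or storage client.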

The proxy window

Two constants shape the serve-only-the-hook behavior:

PREVIEW_START_SEC (= 35) — the default offset where a preview begins if the artist hasn't chosen one.
AUDIO_PROXY_WINDOW_SEC (= 30) — how many seconds past the start we'll proxy through. Same as HOOK_DURATION_SEC by design, but kept as a separate knob in case we need headroom.

The server translates seconds to byte offsets using an estimated bitrate (ESTIMATED_BITRATE_BYTES_PER_SEC ≈ 16 kB/s, i.e. 16,000 bytes per second for 128 kbps MP3) and clamps the HTTP Range response accordingly. Even if a client sends Range: bytes=0-, it receives only the bytes inside the hook window.
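
The seconds-to-bytes arithmetic can be sketched like this. The three constants and their values come from the text; the helper names hookByteWindow and clampToWindow, the ByteRange shape, and the choice to return null when the window falls outside the file are assumptions for illustration.

```typescript
const PREVIEW_START_SEC = 35;
const AUDIO_PROXY_WINDOW_SEC = 30;
const ESTIMATED_BITRATE_BYTES_PER_SEC = 16000; // 128 kbps / 8 bits per byte

// Inclusive byte offsets, matching HTTP Range semantics.
interface ByteRange {
  start: number;
  end: number;
}

// Hypothetical helper: the byte span of the hook window for one track.
function hookByteWindow(previewStartSec: number | null, fileSizeBytes: number): ByteRange | null {
  const startSec = previewStartSec ?? PREVIEW_START_SEC;
  const start = Math.floor(startSec * ESTIMATED_BITRATE_BYTES_PER_SEC);
  const endExclusive = Math.min(
    fileSizeBytes,
    Math.floor((startSec + AUDIO_PROXY_WINDOW_SEC) * ESTIMATED_BITRATE_BYTES_PER_SEC),
  );
  // Assumption: a window that starts past the end of the file yields nothing.
  if (start >= endExclusive) return null;
  return { start, end: endExclusive - 1 };
}

// Hypothetical helper: intersect the client's requested range with the window.
// `end: null` models an open-ended request such as "bytes=0-".
function clampToWindow(requested: { start: number; end: number | null }, hook: ByteRange): ByteRange | null {
  const start = Math.max(requested.start, hook.start);
  const end = Math.min(requested.end ?? hook.end, hook.end);
  return start <= end ? { start, end } : null;
}
```

For a track with the default offset, the window is bytes 560,000 through 1,039,999 (35 s × 16,000 up to 65 s × 16,000, minus one), so an open-ended request for bytes=0- is answered with exactly that span.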

Net effect: a determined visitor could record the 30s preview by pointing a capture tool at the audio element (that's just how browsers work), but they cannot reconstruct the full track by replaying requests.

Stage 4 — Client playback

The media footer subscribes to a single global audio element (components/mediafooter/*). Because there's only one element, switching tracks / cassets is seamless — the same element is re-pointed at a new URL and UI state flips.

Loop + fade

For unentitled listeners, the client additionally enforces the hook window on timeupdate:

When currentTime >= previewStartSec + HOOK_DURATION_SEC, smoothSeekToStart fades volume to 0, seeks back to start, and restores volume over ~50ms.
Shorter-than-window tracks just loop on ended — no fade needed.
The progress UI is wrapped with the same window, so the scrubber bar fills over 30s even on a 4-minute track.
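
The loop trigger and the wrapped progress bar reduce to a small piece of arithmetic, sketched below. HOOK_DURATION_SEC and the loop condition come from the text; the function name hookPlaybackState and its return shape are hypothetical, and the fade performed by smoothSeekToStart is deliberately left out.

```typescript
const HOOK_DURATION_SEC = 30;

interface HookPlaybackState {
  fraction: number; // 0..1 fill of the scrubber within the hook window
  shouldLoop: boolean; // true once playback has reached the end of the window
}

// Hypothetical helper, meant to be called from a timeupdate handler.
function hookPlaybackState(
  currentTime: number,
  previewStartSec: number,
  trackDurationSec: number,
): HookPlaybackState {
  const windowEnd = Math.min(previewStartSec + HOOK_DURATION_SEC, trackDurationSec);
  const windowLength = windowEnd - previewStartSec;
  const raw = windowLength > 0 ? (currentTime - previewStartSec) / windowLength : 0;
  return {
    fraction: Math.min(1, Math.max(0, raw)),
    shouldLoop: currentTime >= previewStartSec + HOOK_DURATION_SEC,
  };
}
```

On a 4-minute track with the hook at 35 s, a currentTime of 50 s reports the bar as half full, and 65 s triggers the loop.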

Stage 5 — Synchronized audiovisual frame

Playback also feeds the hook timeline. The central clock reads the global audio element, resolves hook-local time, and produces one frame of audiovisual state for lyrics, ShaderLab environments, subtle audio reactivity, and atmosphere.

readPlaybackClockFrame() soft-corrects drift and snaps on seeks or track changes.
resolveHookPlaybackFrame() returns phrase arrival, active cue, beat intensity, vocal energy, visual ramp, and atmosphere.
useAudioReactivity() reads the shared analyser or deterministic waveform peaks, then smooths pressure, transient, decay, and envelope values.
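
As an illustration of "soft-corrects drift and snaps on seeks", the sketch below nudges a predicted clock toward the audio element's reported time and jumps outright when the gap is large. This is not the implementation of readPlaybackClockFrame(); the function name correctClock, the 0.25 s snap threshold, and the 0.1 blend factor are all assumed values chosen for the example.

```typescript
// Hypothetical helper: reconcile a locally predicted playback time with the
// time reported by the audio element.
function correctClock(
  predictedSec: number,
  actualSec: number,
  snapThresholdSec: number = 0.25, // assumed: larger gaps are treated as seeks or track changes
  blend: number = 0.1, // assumed: fraction of the drift removed per frame
): number {
  const drift = actualSec - predictedSec;
  if (Math.abs(drift) > snapThresholdSec) return actualSec; // snap
  return predictedSec + drift * blend; // soft correction
}
```

Removing only a fraction of the drift per frame keeps lyric cues and shader ramps from visibly jittering when the element's clock updates unevenly, while the snap path keeps seeks responsive.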

Casset uses that analysis to warm, blur, glow, breathe, and gently shift shader environments. It is intentionally not a generic FFT visualizer. Full detail lives in audiovisual playback.

Stage 6 — Share export

When a fan taps share on a track, the TikTok video exporter (lib/tiktok-video.ts) kicks in:

POST /api/audio/token mints a one-shot access token for the current track.
The client draws the art + artist + waveform + progress ring on a canvas in 1080×1920.
MediaRecorder records the canvas's captureStream() alongside the audio element's captureStream() for 30s.
The encoded MP4 is written to a blob URL and opened in a share sheet.

The generated video embeds a QR + text CTA pointing at casset.fm/yourname, so even if the video travels far from Casset, the link still routes back.

Invariants, at a glance

No public audio URL — period.
Entitlement resolves on every stream, not once at login.
Preview truncation happens server-side via byte range.
Hook duration is a single constant (HOOK_DURATION_SEC).
Signed Blob URLs expire in under a minute, so hotlinks don't work.
