Audio pipeline.
Every Hook Object starts as protected audio, then becomes timed playback, atmosphere, lyrics, and share state.
The product rule is simple: a listener should instantly feel the hook, but unpurchased listeners can't capture the full track. The pipeline is designed to make that boundary enforceable on the server while still making the public Profile World feel immediate, musical, and alive.
Stage 1 — Upload
A creator drops a file (WAV, MP3, FLAC, or AAC) into the edit-mode track list. The client streams it to POST /api/upload/media in chunks, through lib/upload-manager.ts.
- Magic-byte check. The first 12 bytes are inspected with lib/magic-bytes.ts before we trust the Content-Type. A mislabeled file is rejected before it touches storage.
- SHA-256 dedup. We hash the stream as it uploads. If the same audio exists on another casset owned by the same user, we reuse the existing blob and just create a new Track row pointing at it.
- Vercel Blob put. The file lands in a private-read bucket, keyed by content hash. No public URL is minted.
- Track row. We insert a Track with the blob reference, detected duration, the artist id, and default previewStartSec=null (falls back to 35s).
- Waveform pre-compute. A background job decodes the file, downsamples to ~500 peaks, and stores them so the scrubber can paint without client-side decoding.
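To make the magic-byte step concrete, here is a minimal sketch of signature sniffing for the four accepted formats. It is not the contents of lib/magic-bytes.ts: the names sniffAudioKind and hasAscii are hypothetical, and treating any MP4-style "ftyp" container as AAC is a simplifying assumption.

```typescript
type AudioKind = "wav" | "mp3" | "flac" | "aac";

// True when `text` (ASCII) appears in `bytes` starting at `offset`.
function hasAscii(bytes: Uint8Array, offset: number, text: string): boolean {
  if (bytes.length < offset + text.length) return false;
  for (let i = 0; i < text.length; i++) {
    if (bytes[offset + i] !== text.charCodeAt(i)) return false;
  }
  return true;
}

// Hypothetical helper: classify a file from its first 12 bytes.
function sniffAudioKind(head: Uint8Array): AudioKind | null {
  // WAV: "RIFF" at 0..3 and "WAVE" at 8..11, hence the 12-byte read.
  if (hasAscii(head, 0, "RIFF") && hasAscii(head, 8, "WAVE")) return "wav";
  if (hasAscii(head, 0, "fLaC")) return "flac";
  if (hasAscii(head, 0, "ID3")) return "mp3"; // ID3v2 tag before the audio frames
  if (hasAscii(head, 4, "ftyp")) return "aac"; // assumption: MP4/M4A container holds AAC
  if (head.length >= 2 && head[0] === 0xff) {
    const b = head[1] ?? 0;
    if ((b & 0xf6) === 0xf0) return "aac"; // ADTS: 12-bit sync, layer bits 00
    if ((b & 0xe0) === 0xe0 && (b & 0x06) !== 0) return "mp3"; // MPEG frame sync, layer set
  }
  return null; // unknown signature: reject before storage
}
```

A caller would compare the sniffed kind against the declared Content-Type and reject the upload on a mismatch or a null result.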
Stage 2 — Choose the Hook Object
In edit mode, the owner drags the preview scrubber. That writes previewStartSec back via POST /api/studio/tracks. The number is just an offset; the duration stays locked to HOOK_DURATION_SEC (30s) from lib/hook-constants.ts. That audio window is the seed of the Hook Object: timing, lyric cues, visual atmosphere, share export, and room context build around it.
The waveform endpoint (GET /api/audio/waveform/[trackId]) returns the pre-computed peaks as a normalized JSON array of floats. The scrubber paints them with a 30s window overlay — the exact boundaries the server will enforce when listeners stream.
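The peak downsampling behind that endpoint can be sketched as follows. This is an illustration under assumptions, not the background job itself: computePeaks is a hypothetical name, the input is assumed to be already-decoded mono PCM, and normalizing against the loudest bucket is one of several reasonable choices.

```typescript
// Reduce mono PCM samples to at most `bucketCount` peaks, normalized to 0..1.
function computePeaks(samples: Float32Array, bucketCount = 500): number[] {
  if (samples.length === 0 || bucketCount <= 0) return [];
  const count = Math.min(bucketCount, samples.length);
  const peaks: number[] = new Array(count).fill(0);
  let max = 0;
  for (let i = 0; i < count; i++) {
    // Each bucket covers an equal slice of the sample array.
    const start = Math.floor((i * samples.length) / count);
    const end = Math.floor(((i + 1) * samples.length) / count);
    let peak = 0;
    for (let j = start; j < end; j++) {
      const v = Math.abs(samples[j] ?? 0);
      if (v > peak) peak = v;
    }
    peaks[i] = peak;
    if (peak > max) max = peak;
  }
  // Silent input stays all zeros instead of dividing by zero.
  return max === 0 ? peaks : peaks.map((p) => p / max);
}
```

Because the result is a plain array of floats, it serializes directly to the JSON shape the scrubber consumes.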
Stage 3 — Stream
A listener taps play. The audio element is pointed at:
GET /api/audio/<trackId>?t=<short-lived token>
Inside lib/audio-access.ts:
- Validate the token (HMAC signed with AUDIO_TOKEN_SECRET or JWT_SECRET, 1-hour TTL).
- Look up the caller's session and check entitlement via a Purchase query.
- If entitled: sign a short-lived Vercel Blob URL and respond with a 302. Browser fetches the audio directly from storage.
- If not entitled: don't redirect. Proxy the audio back with the byte range clipped to the hook window (see next section).
The proxy window
Two constants shape the serve-only-the-hook behavior:
- PREVIEW_START_SEC (=35): the default offset where a preview begins if the artist hasn't chosen one.
- AUDIO_PROXY_WINDOW_SEC (=30): how many seconds past the start we'll proxy through. Same as HOOK_DURATION_SEC by design, but kept as a separate knob in case we need headroom.
The server translates seconds to byte offsets using an estimated bitrate (ESTIMATED_BITRATE_BYTES_PER_SEC ≈ 16 KB/s, which is 128 kbps divided by 8, for MP3) and clamps the HTTP Range response accordingly. Even if a client sends Range: bytes=0-, it receives only the bytes inside the hook window.
Net effect: a determined visitor could record the 30s preview by pointing a capture tool at the audio element (that's just how browsers work), but they cannot reconstruct the full track by replaying requests.
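The seconds-to-bytes translation and range clamp described in this section can be sketched as below. The constant names and values come from this page; hookByteWindow, parseRange, clampRange, and the ByteRange type are hypothetical, and the parser only handles the bytes=start-end and bytes=start- forms. Returning null for a file shorter than the start offset is a placeholder, not documented behavior.

```typescript
const PREVIEW_START_SEC = 35;
const AUDIO_PROXY_WINDOW_SEC = 30;
const ESTIMATED_BITRATE_BYTES_PER_SEC = 16_000; // 128 kbps / 8

interface ByteRange { start: number; end: number } // inclusive, like HTTP Range

// Byte span of the hook window, or null if the file ends before the window starts.
function hookByteWindow(previewStartSec: number | null, fileSize: number): ByteRange | null {
  const startSec = previewStartSec ?? PREVIEW_START_SEC;
  const start = Math.floor(startSec * ESTIMATED_BITRATE_BYTES_PER_SEC);
  const end = Math.floor((startSec + AUDIO_PROXY_WINDOW_SEC) * ESTIMATED_BITRATE_BYTES_PER_SEC) - 1;
  if (fileSize <= 0 || start >= fileSize) return null;
  return { start, end: Math.min(end, fileSize - 1) };
}

// Parse "bytes=a-b" or "bytes=a-"; anything else is treated as "no range".
function parseRange(header: string | null, fileSize: number): ByteRange | null {
  const m = header ? /^bytes=(\d+)-(\d*)$/.exec(header.trim()) : null;
  if (!m) return null;
  const start = Number(m[1]);
  const end = m[2] ? Number(m[2]) : fileSize - 1;
  return { start, end: Math.min(end, fileSize - 1) };
}

// Intersect the client's request with the hook window; null means nothing to serve.
function clampRange(requested: ByteRange | null, hook: ByteRange): ByteRange | null {
  if (!requested) return hook;
  const start = Math.max(requested.start, hook.start);
  const end = Math.min(requested.end, hook.end);
  return start <= end ? { start, end } : null;
}
```

With the defaults, a 4,000,000-byte file yields a window of bytes 560000 to 1039999, and a request for bytes=0- is narrowed to exactly that span. Because the offsets rest on a bitrate estimate, the window is approximate for variable-bitrate files.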
Stage 4 — Client playback
The media footer subscribes to a single global audio element (components/mediafooter/*). Because there's only one element, switching tracks / cassets is seamless — the same element is re-pointed at a new URL and UI state flips.
Loop + fade
For unentitled listeners, the client additionally enforces the hook window on timeupdate:
- When currentTime >= previewStartSec + HOOK_DURATION_SEC, smoothSeekToStart fades volume to 0, seeks back to start, and restores volume over ~50ms.
- Shorter-than-window tracks just loop on ended; no fade is needed.
- The progress UI is wrapped with the same window, so the scrubber bar fills over 30s even on a 4-minute track.
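The window arithmetic behind the loop check and the wrapped progress bar can be sketched as pure functions. Only HOOK_DURATION_SEC and the 35s fallback come from this page; resolveHookWindow, shouldLoop, and hookProgress are hypothetical names, and starting from 0 when a track is shorter than the offset is an assumption.

```typescript
const HOOK_DURATION_SEC = 30;
const PREVIEW_START_SEC = 35; // fallback when previewStartSec is null

interface HookWindow { startSec: number; endSec: number }

// Resolve the hook window for a track of the given duration.
function resolveHookWindow(previewStartSec: number | null, trackDurationSec: number): HookWindow {
  const requested = Math.max(0, previewStartSec ?? PREVIEW_START_SEC);
  // Assumption: a track shorter than the offset plays from the beginning.
  const startSec = requested < trackDurationSec ? requested : 0;
  const endSec = Math.min(startSec + HOOK_DURATION_SEC, trackDurationSec);
  return { startSec, endSec };
}

// Mirrors the timeupdate check: loop once playback reaches the window's end.
function shouldLoop(currentTime: number, w: HookWindow): boolean {
  return currentTime >= w.endSec;
}

// Progress within the window, 0..1, so the bar fills over the hook, not the full track.
function hookProgress(currentTime: number, w: HookWindow): number {
  const span = w.endSec - w.startSec;
  if (span <= 0) return 0;
  return Math.min(1, Math.max(0, (currentTime - w.startSec) / span));
}
```

On a 4-minute track with the default offset, the window is 35s to 65s, so a currentTime of 50 maps to a progress of 0.5.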
Stage 5 — Synchronized audiovisual frame
Playback also feeds the hook timeline. The central clock reads the global audio element, resolves hook-local time, and produces one frame of audiovisual state for lyrics, ShaderLab environments, subtle audio reactivity, and atmosphere.
- readPlaybackClockFrame() soft-corrects drift and snaps on seeks or track changes.
- resolveHookPlaybackFrame() returns phrase arrival, active cue, beat intensity, vocal energy, visual ramp, and atmosphere.
- useAudioReactivity() reads the shared analyser or deterministic waveform peaks, then smooths pressure, transient, decay, and envelope values.
Casset uses that analysis to warm, blur, glow, breathe, and gently shift shader environments. It is intentionally not a generic FFT visualizer. Full detail lives in audiovisual playback.
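The "soft-correct drift, snap on seeks or track changes" behavior attributed to readPlaybackClockFrame() could look roughly like the sketch below. Everything here is an assumption for illustration: nextClockState, ClockState, the 0.25s snap threshold, and the 10% per-frame correction rate are hypothetical and not taken from the real clock.

```typescript
const SNAP_THRESHOLD_SEC = 0.25; // hypothetical: larger jumps are treated as seeks
const CORRECTION_RATE = 0.1; // hypothetical: fraction of drift removed per frame

interface ClockState { trackId: string; timeSec: number }

// Advance the visual clock by one frame, nudging it toward the audio element's time.
function nextClockState(
  prev: ClockState | null,
  trackId: string,
  audioTimeSec: number,
  deltaSec: number,
): ClockState {
  // First frame or track change: adopt the audio time directly.
  if (!prev || prev.trackId !== trackId) return { trackId, timeSec: audioTimeSec };
  const predicted = prev.timeSec + deltaSec;
  const drift = audioTimeSec - predicted;
  // Large discontinuity (a seek or loop): snap instead of easing.
  if (Math.abs(drift) > SNAP_THRESHOLD_SEC) return { trackId, timeSec: audioTimeSec };
  // Small drift: remove a fraction of it so visuals never visibly jump.
  return { trackId, timeSec: predicted + drift * CORRECTION_RATE };
}
```

Easing small errors while snapping large ones keeps lyric and shader timing smooth during normal playback without lagging behind after a seek.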
Stage 6 — Share export
When a fan taps share on a track, the TikTok video exporter (lib/tiktok-video.ts) kicks in:
- POST /api/audio/token mints a one-shot access token for the current track.
- The client draws the art + artist + waveform + progress ring on a canvas in 1080×1920.
- MediaRecorder records the canvas captureStream() alongside the audio element's captureStream for 30s.
- The encoded MP4 is written to a blob URL and opened in a share sheet.
The generated video embeds a QR + text CTA pointing at casset.fm/yourname, so even if the video travels far from Casset, the link still routes back.
Invariants, at a glance
- No public audio URL — period.
- Entitlement resolves on every stream, not once at login.
- Preview truncation happens server-side via byte range.
- Hook duration is a single constant (HOOK_DURATION_SEC).
- Signed URLs expire in under a minute so hotlinks don't work.