Audiovisual playback engine.
How Casset turns a hook into a synchronized surface: audio, lyrics, shaders, emotion, motion, atmosphere, and participation moving from the same clock.
Most music apps treat playback as a stack of separate layers: audio on one clock, lyrics on another, visuals as decoration, and the release page as a static container around the song. Casset is built in the opposite direction. A hook resolves into one cinematic playback frame, and every layer listens to that frame.
The result should not feel like a visualizer attached to a track. It should feel like the hook has a room, a memory, and a weather system. This is the technical basis for Casset's product bet: living hook experiences instead of passive music pages.
Why hooks matter
A hook is the smallest emotional unit of a song that can travel. It is not just a preview and not only a conversion tactic. It is the part of a record that people repeat, quote, send to a friend, build a video around, or remember before they know the title.
Casset gives that fragment a first-class format. The artist chooses the window, the system preserves the timing, and the page becomes a world around the strongest thirty seconds instead of a catalog row around a full file.
Audiovisual playback engine
The playback engine is organized around a deterministic timeline rather than independent UI effects. The browser still plays a normal HTMLAudioElement, but Casset reads it through a central clock and resolves the hook into structured frame data.
HTMLAudioElement
-> readPlaybackClockFrame()
-> buildHookPlaybackTimeline()
-> resolveHookPlaybackFrame()
-> lyrics, shaders, motion, atmosphere, audio-reactive state
- readPlaybackClockFrame() anchors playback to performance.now(), corrects small drift, and snaps on seek or track change. The visual system follows the actual song rather than a separate animation timer.
- buildHookPlaybackTimeline() precomputes phrase, word, and breath events from the hook lyrics and waveform peaks. Render-time code asks the timeline where the hook is emotionally, instead of parsing lyric text every frame.
- resolveHookPlaybackFrame() returns one resolved frame: active lyric cue, phrase arrival, beat intensity, vocal energy, atmosphere, transition intensity, and visual ramp.
- RAF-driven subscribers update only when the resolved frame version changes, reducing React churn while keeping animation aligned to the same playback source.
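The drift correction and seek snapping described above can be illustrated with a small pure function. This is only a sketch under stated assumptions, not Casset's actual readPlaybackClockFrame(): the names ClockState, ClockFrame, and readClockFrame, and both threshold constants, are hypothetical. Wall-clock time and media time are passed in as arguments (in a browser they would come from performance.now() and the audio element's currentTime) so the logic stays deterministic.

```typescript
// Hypothetical sketch: names and thresholds are assumptions, not Casset's code.
interface ClockState {
  anchorWallMs: number;   // wall-clock time at the last anchor, e.g. performance.now()
  anchorMediaSec: number; // media time the clock believed in at that anchor
}

interface ClockFrame {
  timeSec: number;  // the time every visual layer should render at
  snapped: boolean; // true when the clock re-anchored (seek or track change)
}

const SNAP_THRESHOLD_SEC = 0.25; // assumed: larger gaps are treated as a seek
const SOFT_CORRECTION = 0.1;     // assumed: fraction of drift corrected per read

function readClockFrame(
  state: ClockState,
  nowMs: number,
  mediaTimeSec: number, // e.g. audio.currentTime
): { state: ClockState; frame: ClockFrame } {
  // Where the clock expects the song to be, extrapolated from the anchor.
  const predicted = state.anchorMediaSec + (nowMs - state.anchorWallMs) / 1000;
  const drift = mediaTimeSec - predicted;

  if (Math.abs(drift) > SNAP_THRESHOLD_SEC) {
    // Seek or track change: re-anchor exactly on the reported media time.
    return {
      state: { anchorWallMs: nowMs, anchorMediaSec: mediaTimeSec },
      frame: { timeSec: mediaTimeSec, snapped: true },
    };
  }

  // Small drift: nudge toward the media time instead of jumping.
  const corrected = predicted + drift * SOFT_CORRECTION;
  return {
    state: { anchorWallMs: nowMs, anchorMediaSec: corrected },
    frame: { timeSec: corrected, snapped: false },
  };
}
```

Returning the next state rather than mutating it keeps the sketch easy to test; a real clock would likely hold this state in a module-level singleton.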
This matters emotionally because tiny timing differences change how a hook feels. Casset lets the environment lead a lyric by a breath, lets a phrase arrive with visual pressure, and lets the previous line decay instead of disappearing. The page feels composed because the system is composed.
Resolved playback frame
The resolved frame is Casset's contract between sound and surface. Lyrics, ShaderLab playback, reactive typography, ambient overlays, and UI motion do not invent separate interpretations of the song. They read the same cue state.
- cue describes the active, leading, or held phrase.
- phraseArrival and visualRamp create anticipation before a line lands.
- vocalEnergy, beatIntensity, and atmosphere control subtle environmental pressure.
- transitionIntensity gives shader transitions a musical reason to change.
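To make the cue state concrete, here is a minimal sketch of how a frame resolver might classify the current time as leading, active, or held, and derive a phraseArrival ramp from it. The type names, the presence field, the LEAD_SEC and HOLD_SEC windows, and the quadratic ease are all assumptions for illustration; the source only names the cue, phraseArrival, and related fields, not their types or ranges.

```typescript
// Hypothetical shapes: field names follow the text, types and ranges are assumed.
type CuePhase = "idle" | "leading" | "active" | "held";

interface PhraseCue { startSec: number; endSec: number; text: string }

interface ResolvedCue {
  phase: CuePhase;
  text: string | null;
  phraseArrival: number; // 0..1, ramps up during the lead window
  presence: number;      // 0..1, eases down after the phrase ends
}

const LEAD_SEC = 0.4; // assumed lead window before the literal timestamp
const HOLD_SEC = 1.2; // assumed hold/decay window after the phrase ends

function resolveCue(cues: PhraseCue[], t: number): ResolvedCue {
  // 1. A phrase that is currently sounding always wins.
  for (const cue of cues) {
    if (t >= cue.startSec && t < cue.endSec) {
      return { phase: "active", text: cue.text, phraseArrival: 1, presence: 1 };
    }
  }
  // 2. An upcoming phrase inside its lead window builds anticipation.
  for (const cue of cues) {
    const leadStart = cue.startSec - LEAD_SEC;
    if (t >= leadStart && t < cue.startSec) {
      const arrival = (t - leadStart) / LEAD_SEC;
      return { phase: "leading", text: cue.text, phraseArrival: arrival, presence: 0 };
    }
  }
  // 3. A finished phrase lingers and decays instead of disappearing.
  for (const cue of cues) {
    if (t >= cue.endSec && t < cue.endSec + HOLD_SEC) {
      const k = 1 - (t - cue.endSec) / HOLD_SEC; // 1 at phrase end, 0 at hold end
      return { phase: "held", text: cue.text, phraseArrival: 0, presence: k * k };
    }
  }
  return { phase: "idle", text: null, phraseArrival: 0, presence: 0 };
}
```

Giving the leading phase priority over the held phase is one possible choice; it means the environment starts preparing for the next line even while the previous one is still fading.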
Basement Studio ShaderLab integration
Casset uses Basement Studio's @basementstudio/shader-lab runtime as the live rendering layer for Hook Objects. The integration is deliberately framed as emotional audiovisual environments, not graphics effects.
A ShaderLab visual is stored as a Casset shader configuration, adapted into feed/profile formats, and rendered through ShaderLabBackground. The runtime composes media, gradient fields, typography, grain, bloom, displacement, blur, analog texture, and color behavior into a GPU-rendered atmosphere.
- Shader presets are organized as emotional scene graphs: intro memory, emergence, pressure, fragmentation, bloom, collapse, decay, silence, and aftermath.
- Scene samples blend controls for bloom, contrast, grain, instability, and motion instead of hard-switching visual treatments.
- The shader clock can be externally driven by hook playback time, so visual motion stays locked to the song rather than the component mount time.
- Live shaders are guarded by runtime policy: reduced motion, save-data, background tabs, scrolling, mobile pressure, GPU capability, and sustained frame drops can all demote the runtime to a lighter path.
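A runtime policy of this kind can be sketched as a pure function from observed signals to a rendering tier. The tier names, the RuntimeSignals fields, the 30 fps budget, and the ten-frame window below are assumptions chosen for illustration; how Casset actually weighs these signals is not specified here.

```typescript
// Hypothetical policy sketch: signal names and thresholds are assumptions.
type ShaderTier = "poster" | "light" | "live";

interface RuntimeSignals {
  prefersReducedMotion: boolean;
  saveData: boolean;
  tabVisible: boolean;
  scrolling: boolean;
  gpuCapable: boolean;
  recentFrameMs: number[]; // most recent frame durations, newest last
}

const SLOW_FRAME_MS = 1000 / 30; // assumed budget: slower than 30 fps is a drop
const SUSTAINED_FRAMES = 10;     // assumed: this many slow frames in a row demotes

function hasSustainedDrops(frameMs: number[]): boolean {
  if (frameMs.length < SUSTAINED_FRAMES) return false;
  return frameMs.slice(-SUSTAINED_FRAMES).every((ms) => ms > SLOW_FRAME_MS);
}

function selectShaderTier(s: RuntimeSignals): ShaderTier {
  // User preferences and hidden tabs rule out animation entirely.
  if (s.prefersReducedMotion || s.saveData || !s.tabVisible) return "poster";
  // Transient or device pressure falls back to a lighter treatment.
  if (!s.gpuCapable || s.scrolling || hasSustainedDrops(s.recentFrameMs)) return "light";
  return "live";
}
```

Keeping the decision pure makes it cheap to re-evaluate on every visibility, scroll, or frame-time change without touching the canvas until the tier actually differs.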
Hooks generate living visual worlds. The environment can react to lyric transitions, phrase arrivals, emotional intensity, dynamics, and beat energy, but the goal is not to show that the GPU is busy. The goal is for the world to feel like it already knew the line was coming.
Continuity over remount jitter
Shader playback avoids the brittle feeling of remounting a new canvas for every visual state. Casset keeps a live shader source mounted while changing its config and externally clocked timeline. Active shader ownership is coordinated so only the relevant instance claims GPU attention, and adjacent media is prewarmed before it is needed.
Transitions are atmospheric blends: a pressure scene can resolve into decay, or a lyric-led graph can drift toward a video bloom treatment, without the page blinking or resetting the emotional state.
Audio reactivity
Casset uses audio analysis as a subtle environmental modulation layer. The live hook engine reads the shared Web Audio analyser through the canonical audio toolkit adapter, with precomputed waveform peaks as a deterministic fallback. Meyda remains part of the broader export and offline analysis toolchain.
- Live analysis maps amplitude, RMS, bass, mid, high, vocal presence, spectral centroid, envelope, pressure, instability, beat energy, and transient emphasis.
- Waveform fallback samples behind, current, and ahead values so paused, exported, or constrained runtimes still preserve the song's energy shape.
- The reactivity frame is smoothed with attack, hold, and decay so motion breathes instead of twitching.
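Attack, hold, and decay smoothing can be expressed as a small per-frame step function. The sketch below is one common way to build such an envelope, not necessarily the one Casset uses: the EnvelopeState and EnvelopeConfig shapes, the linear attack, and the exponential decay are assumptions.

```typescript
// Hypothetical envelope follower: shapes and curve choices are assumptions.
interface EnvelopeState {
  value: number;            // smoothed output, same range as the input
  holdRemainingSec: number; // time left before decay may begin
}

interface EnvelopeConfig {
  attackPerSec: number; // maximum rise per second (linear attack)
  holdSec: number;      // how long a peak is held before decaying
  decayPerSec: number;  // exponential decay rate toward the input
}

function stepEnvelope(
  state: EnvelopeState,
  input: number,
  dtSec: number,
  cfg: EnvelopeConfig,
): EnvelopeState {
  if (input > state.value) {
    // Attack: rise toward the input at a limited rate and re-arm the hold.
    const value = Math.min(input, state.value + cfg.attackPerSec * dtSec);
    return { value, holdRemainingSec: cfg.holdSec };
  }
  if (state.holdRemainingSec > 0) {
    // Hold: keep the peak so short dips do not make the motion twitch.
    return {
      value: state.value,
      holdRemainingSec: Math.max(0, state.holdRemainingSec - dtSec),
    };
  }
  // Decay: ease exponentially toward the lower input.
  const k = Math.exp(-cfg.decayPerSec * dtSec);
  return { value: input + (state.value - input) * k, holdRemainingSec: 0 };
}
```

Calling stepEnvelope once per animation frame with the raw analyser value (or the waveform fallback sample) yields a signal that rises quickly, lingers briefly, and then settles.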
Casset intentionally avoids generic FFT bars, EDM-spectrum motion, and hyper-reactive objects that compete with the song. Analysis is used to shape warmth, grain, blur, glow, displacement, pressure, and environmental breathing. Sound remains the center of gravity.
Lyric synchronization system
Lyrics in Casset are timed as cinematic cues, not karaoke overlays. The system normalizes provider lyric JSON into timed lines and optional word-level timing, slices those lyrics to the hook window, and builds a hook timeline before playback begins.
- Phrase windows define the emotional shape of each line.
- Word-level timings drive active emphasis; missing word timings are inferred from text weight and line duration.
- Lead timing lets lyrics and visuals begin preparing before the literal timestamp.
- Held lyric states keep a phrase present after it ends, then ease it down through decay rather than cutting it off.
- Subtitle behavior is synchronized to phrase arrival, vocal energy, lyric opacity, lift, blur, and emphasis.
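Two of the steps above, slicing lyrics to the hook window and inferring missing word timings, can be sketched as follows. The TimedLine and TimedWord shapes and both function names are hypothetical, and "text weight" is approximated here by character count, which is an assumption; the source does not say how weight is measured.

```typescript
// Hypothetical sketch: shapes, names, and the weight heuristic are assumptions.
interface TimedLine { startSec: number; endSec: number; text: string }
interface TimedWord { text: string; startSec: number; endSec: number }

// Keep only lines overlapping the hook window, clamped and re-based to hook time.
function sliceToHook(lines: TimedLine[], hookStart: number, hookEnd: number): TimedLine[] {
  return lines
    .filter((l) => l.endSec > hookStart && l.startSec < hookEnd)
    .map((l) => ({
      text: l.text,
      startSec: Math.max(l.startSec, hookStart) - hookStart,
      endSec: Math.min(l.endSec, hookEnd) - hookStart,
    }));
}

// Distribute the line duration across words in proportion to their weight.
function inferWordTimings(line: TimedLine): TimedWord[] {
  const words = line.text.split(/\s+/).filter((w) => w.length > 0);
  const weightOf = (w: string): number => Math.max(1, w.length); // assumed weight
  const total = words.reduce((sum, w) => sum + weightOf(w), 0);
  const duration = line.endSec - line.startSec;

  let cursor = line.startSec;
  return words.map((text) => {
    const span = duration * (weightOf(text) / total);
    const word: TimedWord = { text, startSec: cursor, endSec: cursor + span };
    cursor += span;
    return word;
  });
}
```

For a four-second line reading "I remember", this heuristic gives the first word one ninth of the duration and the second word the remaining eight ninths. A production heuristic would probably also account for syllables and punctuation.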
The intended feeling is memory-like: the words should feel emotionally inevitable, as if they are surfacing from the music. They should not feel like bouncing captions pasted over a song.
Emotionally reactive hook engine
The emotional engine is the difference between synchronization and decoration. A traditional visualizer asks, "How loud is the sound right now?" Casset asks, "Where is the listener inside this phrase, and what should the world remember from the last one?"
- Phrase hold / decay. Finished lines remain present as traces, then fade with eased decay so the listener feels continuity.
- Anticipation timing. Upcoming phrases create visual tension before the words arrive.
- Environmental lead timing. Shaders can warm, breathe, or sharpen slightly ahead of a lyric transition.
- Visual warm-state. Hook visuals prefetch adjacent image/video/poster assets and progress from instant poster to light treatment to live shader when conditions allow.
This is why Casset playback can feel fundamentally different from a traditional music page. The page is not waiting for the song to finish so it can ask for a follow. It is participating in the emotional arc of the hook while the listener is inside it.
Pre-release listening rooms
The pre-release system turns unreleased songs into living listening rooms. A listener can hear the immersive hook preview first, understand the emotional world of the record, and then unlock participation through a presave.
- activePreRelease selects the track, release date, artwork source, sound platform, presave URLs, early unlock settings, and CD room treatment.
- Presave intent records the listener's chosen platform and provider target where possible.
- Spotify and Apple Music completion flows persist the unlock state: room access, comments, uploads, connected accounts, and completed platform state.
- Guest sessions preserve continuity before account creation; OAuth return flow can complete the presave without losing the listener's place in the room.
Casset is not trying to be a smart-link utility or generic music marketing software. The presave is part of an experiential release layer: listen, feel the world, then become early enough to participate.
Performance and engineering
The emotional surface only works if it is stable. The engine is built around mobile-first playback constraints, Safari/PWA behavior, and careful ownership of expensive work.
- A singleton global audio element supports background playback, lock screen controls, and consistent analyser state.
- The playback clock centralizes time ownership and soft-corrects drift rather than letting every component invent its own timeline.
- Resolved timeline frames publish version keys so React state updates only when meaningful lyric/emotion state changes.
- ShaderLab canvases adapt resolution, target FPS, and runtime eligibility based on device pressure, visibility, scrolling, and observed frame time.
- Visual assets warm before arrival, and live shader activation waits until the lightweight poster state is already present.
- Compositor-safe opacity/filter/transform layers carry most of the ambient motion, keeping layout changes away from the playback loop.
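The version-key idea can be sketched as a tiny external store whose subscribers are notified only when the key changes. The store shape below (subscribe plus getSnapshot) matches what React's useSyncExternalStore expects, but createFrameStore, VersionedFrame, and frameVersion are hypothetical names, and which fields belong in the key is an assumption.

```typescript
// Hypothetical sketch: names and the key contents are assumptions.
interface VersionedFrame {
  version: string; // changes only when something render-relevant changes
  cueIndex: number;
  phase: string;
}

type Listener = () => void;

// Build the key from only the fields that should trigger a re-render.
function frameVersion(cueIndex: number, phase: string): string {
  return `${cueIndex}:${phase}`;
}

function createFrameStore(initial: VersionedFrame) {
  let current = initial;
  const listeners = new Set<Listener>();

  return {
    getSnapshot: (): VersionedFrame => current,

    subscribe(listener: Listener): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },

    // Intended to be called once per animation frame by the playback loop.
    // Returns true only when subscribers were actually notified.
    publish(next: VersionedFrame): boolean {
      if (next.version === current.version) return false;
      current = next;
      listeners.forEach((notify) => notify());
      return true;
    },
  };
}
```

Continuous values such as glow or blur amounts would stay out of the version key and be applied directly to compositor-friendly styles, so the per-frame loop never forces a React render on its own.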
Design philosophy
Casset's playback philosophy is restraint over overload, atmosphere over spectacle, synchronization over effect count, emotion over engagement hacking, and sound as the center of gravity.
The goal is not music visualizers. The goal is living audiovisual memories attached to sound: release experiences that artists can own, collaborators can build with, and listeners can enter before a song has fully arrived in the world.
Code map
- lib/playback-clock.ts: central clock, drift correction, seek snapping, playback frame source.
- lib/hook-playback-timeline.ts: lyric/emotion timeline generation and resolved hook playback frame.
- lib/lyrics.ts: timed lyric validation, normalization, and hook slicing.
- lib/audio-reactivity/* and hooks/useAudioReactivity.ts: audio analysis, waveform fallback, smoothing, decay, and external-store publication.
- components/visuals/ShaderLabBackground.tsx: Basement ShaderLab runtime mount, external clocking, budget policy, and canvas lifecycle.
- lib/casset-studios/runtime/shader-sequencer.ts: emotional shader scenes, transitions, and audio-reactive scene sampling.
- lib/pre-release-presave.ts and app/api/pre-release/[trackId]/*: presave state, persistent unlocks, and participation gates.