Technical overview.
What Casset actually is under the hood, how the pieces fit together, and what it would cost a team to rebuild from scratch.
This is the long-form companion to the architecture doc. If that page is the map, this one is the field report — scope, sophistication, and honest build effort.
1. Executive summary
Casset is a production-grade audiovisual music identity platform built around four primitives: Profile Worlds, Hook Objects, Release Rituals, and Listening Rooms. Creators get a page at casset.fm/<handle>, pick hooks, shape atmosphere, set provenance, choose support/unlock behavior, and share. Listeners play, follow, join rooms, collect, reply, and carry Hook Objects outward.
Under the hood it's a single Next.js App Router app with a deep Prisma model, a keyless-to-the-client audio streaming pipeline, Stripe Connect commerce, and Server-Sent Events for realtime. Built around those primitives are Visual Studio/runtime systems, co-cassets, profile provenance controls, canonical release manifests, agent-readable permissions, provenance events, lineage, quiet Base anchoring, and optional release-quest infrastructure.
2. Product scope
Audio playback engine
A custom cross-browser playback layer that enforces hook boundaries, loops with a micro-fade, and resolves audio, lyrics, ShaderLab environments, motion, and atmosphere into one deterministic playback frame. See audiovisual playback.
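As a rough illustration of hook-boundary looping with a micro-fade, the sketch below wraps a playhead position back into the hook window and derives a gain envelope that dips to zero at the loop seam. The names (HookWindow, wrapToHook, hookGain) and the 15 ms fade length are assumptions made for this example, not Casset's actual values.

```typescript
// Hypothetical names; the fade length is an illustrative assumption.
type HookWindow = { startSec: number; endSec: number };

const FADE_SEC = 0.015;

// Wrap an absolute playhead position back into the hook window so playback loops.
function wrapToHook(positionSec: number, hook: HookWindow): number {
  const length = hook.endSec - hook.startSec;
  if (length <= 0) throw new RangeError("hook window must have positive length");
  const offset = (((positionSec - hook.startSec) % length) + length) % length;
  return hook.startSec + offset;
}

// Gain in [0, 1]: ramps up just after the loop start and down just before the
// loop end, so the jump from end back to start does not click.
function hookGain(
  positionSec: number,
  hook: HookWindow,
  fadeSec: number = FADE_SEC
): number {
  const t = wrapToHook(positionSec, hook);
  const distanceToNearestEdge = Math.min(t - hook.startSec, hook.endSec - t);
  return Math.min(1, Math.max(0, distanceToNearestEdge / fadeSec));
}
```

In a browser, a value like this would typically drive a Web Audio gain node; keeping it a pure function of position makes the loop behavior easy to test without audio hardware.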
Payments & monetization
Stripe Connect Destination charges with Apple Pay / Google Pay as the default confirmation sheet, referral attribution cookies, per-track rewards, and campaign payouts. See commerce.
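For readers unfamiliar with destination charges, here is a minimal sketch of the parameter object a server might pass when creating a PaymentIntent. The fields amount, currency, application_fee_amount, transfer_data.destination, automatic_payment_methods, and metadata are Stripe PaymentIntent parameters; the function name, the basis-point fee model, and the cassetId metadata key are assumptions for illustration only.

```typescript
// Shape of the subset of PaymentIntent parameters used in this sketch.
type DestinationChargeParams = {
  amount: number; // smallest currency unit, e.g. cents
  currency: string;
  application_fee_amount: number; // the platform's cut
  transfer_data: { destination: string }; // the artist's connected account
  automatic_payment_methods: { enabled: boolean };
  metadata: Record<string, string>;
};

// Hypothetical helper: builds parameters for a destination charge.
function buildDestinationCharge(input: {
  amountCents: number;
  currency: string;
  connectedAccountId: string;
  platformFeeBps: number; // assumed fee model: basis points of the charge
  cassetId: string;
}): DestinationChargeParams {
  if (!Number.isInteger(input.amountCents) || input.amountCents <= 0) {
    throw new RangeError("amountCents must be a positive integer");
  }
  const fee = Math.round((input.amountCents * input.platformFeeBps) / 10000);
  return {
    amount: input.amountCents,
    currency: input.currency,
    application_fee_amount: fee,
    transfer_data: { destination: input.connectedAccountId },
    // Lets Stripe choose which payment methods to offer at confirmation.
    automatic_payment_methods: { enabled: true },
    // Metadata travels with the payment so a webhook handler can identify the purchase.
    metadata: { cassetId: input.cassetId },
  };
}
```

Keeping parameter construction pure makes the fee arithmetic unit-testable without touching the Stripe API.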
Release quests / drops
Time-boxed scoring campaigns where artists fund a prize pool and fans submit hook videos. This system is implemented, but strategically optional: it belongs when it strengthens a Release Ritual or Listening Room, not as the default product story.
Listening Room / Side B
Every casset has a room/community layer: live activity feed, presence indicators, comment threads (on the casset + per-hook), custom paid emoji, follows, membership, and tip support. Everything streams in via SSE where live behavior is needed.
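To make the SSE part concrete, the sketch below serializes a room event into the text/event-stream wire format: id, event, and data fields followed by a blank line. The RoomEvent type and the function names are hypothetical; only the wire format itself comes from the SSE specification.

```typescript
// Hypothetical event shape for a Listening Room feed.
type RoomEvent = { id: string; type: string; payload: unknown };

// Field values must not contain line breaks, or they would split the frame.
function stripNewlines(value: string): string {
  return value.replace(/[\r\n]/g, "");
}

// Serialize one event as an SSE frame. JSON.stringify without indentation
// emits no raw newlines, so the payload fits on a single data line.
function formatSseFrame(event: RoomEvent): string {
  const data = JSON.stringify(event.payload ?? null);
  return (
    `id: ${stripNewlines(event.id)}\n` +
    `event: ${stripNewlines(event.type)}\n` +
    `data: ${data}\n\n`
  );
}

// Comment frames (lines starting with ":") are ignored by clients but keep
// idle connections from being closed by intermediaries.
function heartbeatFrame(): string {
  return ": ping\n\n";
}
```

On the client, an EventSource listening for the event name receives the data line as a string, and the id field lets it resume from the last seen event after a reconnect.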
Hook sharing
Client-side video export via MediaRecorder — a 30s, 1080×1920 vertical MP4 with the hook audio burned in, artwork + title + "use this sound" CTA overlaid. No server-side transcoding queue.
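Container support in MediaRecorder differs between browsers, so an export path usually has to probe before recording. The sketch below shows one way to do that; the candidate list, the WebM fallback, and the function name are assumptions for illustration, and the support check is injected so the logic can run outside a browser.

```typescript
// Assumed preference order: MP4 first, WebM as a fallback.
const EXPORT_MIME_CANDIDATES: string[] = ["video/mp4", "video/webm"];

// Returns the first candidate the recorder reports as supported, or null.
// In a browser, pass (mime) => MediaRecorder.isTypeSupported(mime).
function pickExportMimeType(
  isSupported: (mime: string) => boolean,
  candidates: string[] = EXPORT_MIME_CANDIDATES
): string | null {
  for (const mime of candidates) {
    if (isSupported(mime)) return mime;
  }
  return null;
}
```

A null result is the signal to disable the export button rather than start a recording that cannot be saved.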
Artist Studio
Private dashboard at /studio: upload, edit cassets inline with a preview scrubber, set prices, manage drops, review clip submissions, connect Stripe. The preview scrubber writes previewStartSec directly; waveforms are pre-computed on upload so the scrubber paints instantly.
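Pre-computing a waveform generally means reducing decoded samples to a small array of peaks once, at upload time. A minimal sketch, assuming mono PCM samples in the range [-1, 1] and a hypothetical computePeaks helper:

```typescript
// Reduce PCM samples to one peak magnitude per bucket.
// The bucket count is the number of bars the scrubber will draw.
function computePeaks(samples: Float32Array, buckets: number): number[] {
  if (!Number.isInteger(buckets) || buckets <= 0) {
    throw new RangeError("buckets must be a positive integer");
  }
  const peaks: number[] = new Array<number>(buckets).fill(0);
  if (samples.length === 0) return peaks;
  for (let i = 0; i < samples.length; i++) {
    // Map the sample index proportionally onto a bucket index.
    const bucket = Math.min(buckets - 1, Math.floor((i * buckets) / samples.length));
    const magnitude = Math.abs(samples[i]);
    if (magnitude > peaks[bucket]) peaks[bucket] = magnitude;
  }
  return peaks;
}
```

Storing the resulting array next to the track means the scrubber can paint from a few hundred numbers instead of decoding audio in the browser.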
Canonical release infrastructure
The release layer records Release, ReleaseVersion, ReleaseManifest, ReleaseAnchor, contributors, splits, permission policies, provenance events, derivative links, and agent access policies. The goal is not to turn Casset into a protocol product. The goal is to keep an artist's release context readable as generative and agent-mediated media grows.
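A deterministic manifest hash depends on serializing the same data the same way every time. The sketch below shows one simple approach: sort object keys recursively, then hash the result with SHA-256. It is an illustration under stated assumptions, not Casset's actual canonicalization scheme, and it is not a full implementation of a standard such as RFC 8785; the function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted at every level, so two manifests with the
// same content always produce the same string regardless of key order.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  const keys = Object.keys(value).sort();
  const entries = keys.map(
    (key) => `${JSON.stringify(key)}:${canonicalize(value[key] as Json)}`
  );
  return `{${entries.join(",")}}`;
}

// Hex-encoded SHA-256 of the canonical form.
function manifestHash(manifest: Json): string {
  return createHash("sha256").update(canonicalize(manifest), "utf8").digest("hex");
}
```

Because the hash is derived only from content, an anchor that stores it can later be checked by re-serializing the manifest and comparing digests.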
Agent release APIs
Published canonical releases expose machine-readable manifest, DNA, provenance, lineage, permission-check, license, access, and derivative registration routes. These routes let agents ask what they can do with a release instead of inferring permission from public availability.
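A permission check that returns a structured decision might look like the sketch below. The four outcomes (allow, deny, contact owner, license required) come from the description in this document; the rule shape, the function name, and the choice to answer "contact owner" when no rule matches are assumptions for illustration.

```typescript
type Decision = "allow" | "deny" | "contact_owner" | "license_required";

// Hypothetical policy rule: one decision per named action.
type PolicyRule = { action: string; decision: Decision; reason: string };

type PermissionResult = { action: string; decision: Decision; reason: string };

// Look up the artist's rule for an action. Unlisted actions are not treated
// as allowed; this sketch defers them to the owner instead.
function checkPermission(rules: PolicyRule[], action: string): PermissionResult {
  const rule = rules.find((candidate) => candidate.action === action);
  if (!rule) {
    return {
      action,
      decision: "contact_owner",
      reason: "no explicit policy for this action",
    };
  }
  return { action, decision: rule.decision, reason: rule.reason };
}
```

An agent calling such a route gets a machine-readable answer plus a reason, rather than having to interpret prose terms on a public page.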
Infrastructure
- Next.js App Router on Vercel (edge middleware + node routes).
- Postgres via Prisma; Vercel Blob for audio + artwork.
- Redis for presence, rate limits, short-term caches.
- Stripe Connect + Apple Pay for payments.
- Base anchoring for manifest hash proof, kept underneath the product surface.
- Sentry, Vercel Analytics, vitest for tests.
3. Technical sophistication
A few invariants that take real work to honor:
- The raw audio URL never leaves the server. Unentitled listeners receive byte-range-clipped proxied audio; entitled listeners get minute-lived signed URLs.
- Entitlement is resolved on every stream, not once at login. No client-held tokens grant access.
- Idempotency + retry semantics permeate payments, campaign payouts, guest checkout, and webhook reconciliation.
- Realtime uses SSE (not WebSockets) to stay cheap on a serverless host and survive restrictive middleboxes.
- The whole UI derives from the artist's cover image — see theming.
- Release manifests are deterministic snapshots with canonical hashes, making release state inspectable by agents and verifiable by anchors.
- Permission evaluation returns a structured decision instead of burying artist intent in prose: allow, deny, contact owner, or license required.
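The byte-range clipping in the first invariant can be sketched as a pure function that maps a client's Range header onto the slice of the upstream file a proxy is allowed to fetch. Treating the clip as a virtual file (so client offsets are relative to the clip) is a design assumption of this example, as are the function names. How a preview window in seconds becomes byte offsets depends on the codec and is outside the sketch.

```typescript
// Inclusive byte offsets into the upstream audio file.
type ByteRange = { start: number; end: number };

// Parses "bytes=START-" or "bytes=START-END". Suffix ranges ("bytes=-500") and
// multi-range requests are not handled in this sketch and yield null.
function parseRange(
  header: string | null
): { start: number; end: number | null } | null {
  if (!header) return null;
  const match = /^bytes=(\d+)-(\d*)$/.exec(header.trim());
  if (!match) return null;
  return {
    start: Number(match[1]),
    end: match[2] === "" ? null : Number(match[2]),
  };
}

// Offsets in the Range header are relative to the clip; the result is the
// upstream range the proxy may fetch. A missing or unparsable header means
// "the whole clip". Returns null when the request starts past the end of the
// clip, which a route would answer with HTTP 416.
function clipRange(header: string | null, allowed: ByteRange): ByteRange | null {
  const clipLength = allowed.end - allowed.start + 1;
  const requested = parseRange(header) ?? { start: 0, end: null };
  if (requested.start >= clipLength) return null;
  const relativeEnd = Math.min(requested.end ?? clipLength - 1, clipLength - 1);
  if (relativeEnd < requested.start) return null;
  return {
    start: allowed.start + requested.start,
    end: allowed.start + relativeEnd,
  };
}
```

Because the client only ever sees clip-relative offsets, it cannot request bytes outside the preview even with a hand-crafted Range header.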
4. Architecture & engineering quality
Patterns worth noting
- Single source of truth files. lib/hook-constants.ts, lib/footer-themes.ts, lib/audio-access.ts, and similar files centralize product-level constants so changes ripple correctly.
- Webhook-authoritative commerce. The client never grants entitlement; the Stripe webhook does. Even if the browser crashes mid-confirm, money ends up in the right ledger.
- Append-only accounting. CreditLedger is never mutated; reversals are new rows. Any balance can be reconstructed from the history.
- Magic-byte uploads. File type is sniffed from the first 12 bytes, not trusted from Content-Type.
- React-cached data loaders. getData on the casset page is wrapped in React.cache so generateMetadata and the page share a single Prisma round-trip.
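Magic-byte sniffing, one of the patterns above, can be sketched with well-known file signatures. Twelve bytes is enough to tell WAV from WebP, since both start with RIFF and differ only at offsets 8 to 11. The set of formats checked here and the function names are assumptions; the document does not say which types Casset accepts.

```typescript
type SniffedType =
  | "audio/wav" | "audio/flac" | "audio/mpeg"
  | "image/png" | "image/jpeg" | "image/webp"
  | null;

// Read a slice of the header as ASCII text.
function ascii(bytes: Uint8Array, start: number, end: number): string {
  return String.fromCharCode(...Array.from(bytes.subarray(start, end)));
}

// Identify a file from its first 12 bytes, ignoring any declared Content-Type.
function sniffType(head: Uint8Array): SniffedType {
  if (head.length < 12) return null;
  if (ascii(head, 0, 4) === "RIFF") {
    const form = ascii(head, 8, 12);
    if (form === "WAVE") return "audio/wav";
    if (form === "WEBP") return "image/webp";
    return null;
  }
  if (ascii(head, 0, 4) === "fLaC") return "audio/flac";
  if (head[0] === 0x89 && ascii(head, 1, 4) === "PNG") return "image/png";
  if (head[0] === 0xff && head[1] === 0xd8 && head[2] === 0xff) return "image/jpeg";
  if (ascii(head, 0, 3) === "ID3") return "audio/mpeg"; // MP3 with an ID3v2 tag
  // Bare MPEG audio frame: 11 sync bits set.
  if (head[0] === 0xff && (head[1] & 0xe0) === 0xe0) return "audio/mpeg";
  return null;
}
```

An upload route would read the first 12 bytes of the body, call the sniffer, and reject anything that returns null or disagrees with the expected category.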
Database design
Prisma models are partitioned by concern: identity & content; community & activity; visual assets; co-cassets; pre-release; canonical releases; permissions; provenance; lineage; campaigns & rewards; intelligence & reputation; accounting. Long-running money-moving and proof-moving processes have terminal + retryable states and idempotency keys so retries don't double-fire.
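The combination of idempotency keys with terminal and retryable states can be illustrated with a small in-memory sketch. The state names, the RetryableError class, and the Map-backed store are assumptions; a real implementation would persist records in the database and rely on a unique constraint on the key to handle concurrent callers.

```typescript
type JobState = "succeeded" | "failed_retryable" | "failed_terminal";
type JobRecord<T> = { state: JobState; attempts: number; result?: T };

// Hypothetical marker for failures that are safe to retry (e.g. a timeout).
class RetryableError extends Error {}

// Run an effect at most once per key, unless the last attempt failed retryably.
// Succeeded and terminally failed jobs return the stored record without
// running the effect again, so a retried request cannot double-fire.
function runIdempotent<T>(
  store: Map<string, JobRecord<T>>,
  key: string,
  effect: () => T
): JobRecord<T> {
  const existing = store.get(key);
  if (existing && existing.state !== "failed_retryable") return existing;
  const attempts = (existing?.attempts ?? 0) + 1;
  let record: JobRecord<T>;
  try {
    record = { state: "succeeded", attempts, result: effect() };
  } catch (error) {
    const state: JobState =
      error instanceof RetryableError ? "failed_retryable" : "failed_terminal";
    record = { state, attempts };
  }
  store.set(key, record);
  return record;
}
```

The effect here is synchronous to keep the sketch short; a payout or webhook handler would be asynchronous, with the same state transitions.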
5. Build effort (realistic)
Casset is not a weekend project. A representative breakdown:
- Audio engine — 4–6 weeks (1 senior eng, cross-browser is an iceberg)
- Payments & commerce — 3–5 weeks (Stripe Connect onboarding + webhooks)
- Campaign drops + scoring — 6–8 weeks (scoring pipeline, integrity checks, payouts)
- Social layer (Side B) — 3–4 weeks (SSE + presence + comments)
- Auth & security — 2–3 weeks (email + Apple + Google, audio tokens, magic-bytes)
- Artist Studio & CMS — 3–4 weeks
- Feed & discovery — 2–3 weeks
- UI, theming, visualizations — 3–4 weeks (color extraction, skins, visualizer)
- Infrastructure & devops — 1–2 weeks
Serially, the line items above sum to 27–39 weeks for one senior engineer, so ~30 weeks is the optimistic end of the range. Parallelized across a small team (3 engineers + designer), call it 12–16 weeks of calendar time. Dollar-equivalent at US market rates: roughly $250K–$400K depending on seniority and iteration discipline.
6. Strategic advantage
- Audiovisual identity shape. Every decision — from playback loops to TikTok export dimensions to OG image rendering — assumes the hook is an audiovisual object inside a Profile World. Fast followers have to rebuild the full primitive to compete meaningfully.
- Ritual and room memory. Collects, follows, comments, clips, co-casset activity, presaves, and rooms create social memory around a song. The longer Casset operates, the harder a generic "music + links" tool is to substitute.
- Agent-era release context. Manifests, permission policies, provenance events, contributor graphs, and lineage are the structured layer that generated and agent-mediated media will need. They are hard to bolt on after the product has no canonical release definition.
7. Conclusion
Casset ships a music platform with the density of a mid-stage startup inside the surface area of a Profile World. The commerce is real, the audio is real, the realtime is real, and the audiovisual runtime is real. The canonical release layer is now real enough to make the next thesis concrete: Casset can remain beautiful for humans while becoming readable for agents. The opportunity is to keep the public shape — Profile World, Hook Object, Release Ritual, Listening Room — stable while widening the trusted infrastructure underneath it.