Music video outsourcing.
The hook is the seed. The audience helps shape the visual world around it.
Casset should let a song become a collectively directed living music video. Artists define the emotional world; listeners and collaborators contribute real visual fragments captured around the sound; the runtime sequences those fragments into a music video that can keep evolving after release.
Thesis
Music video outsourcing does not mean turning Casset into a creator marketplace or social posting app. It means giving artists a way to gather distributed cinematography from the people and places emotionally connected to a song.
Artist chooses hook
-> defines emotional world and capture constraints
-> contributors capture fragments from the world
-> artist curates or accepts fragments
-> runtime sequences by beat, lyric, density, and motion
-> song becomes a living audiovisual world
Not UGC
Users are not posting content. They are contributing visual fragments that help shape the emotional world of a song.
Do not build
- TikTok or Reels behavior
- influencer content
- reaction loops
- selfie performance
- engagement-ranked posts
- creator-economy mechanics
Build toward
- collaborative visual interpretation
- distributed cinematography
- emotional atmosphere generation
- artist-directed curation
- beat-aware sequencing
- evolving song worlds
Fragment Language
Strong captures should feel cinematic, imperfect, environmental, atmospheric, and emotionally real. The camera should make the world the subject, seen through the lens of the sound.
Video constraints
- Minimum: 0.5 seconds.
- Ideal target: 2 to 4 seconds.
- Hard max: 5 seconds.
- Even a full five-second clip should feel like emotional residue, not a vlog segment.
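The duration rules above can be enforced at capture time. What follows is a minimal sketch, assuming a hypothetical `classifyFragmentDuration` helper and verdict names that are illustrative only. The thresholds (0.5 s minimum, 2 to 4 s ideal, 5 s hard max) come directly from the constraints listed.

```typescript
// Hypothetical helper: classifies a video fragment's duration against the
// stated capture constraints. Names and verdict labels are assumptions.
type DurationVerdict = "too_short" | "acceptable" | "ideal" | "too_long";

const MIN_SECONDS = 0.5;
const IDEAL_MIN_SECONDS = 2;
const IDEAL_MAX_SECONDS = 4;
const HARD_MAX_SECONDS = 5;

function classifyFragmentDuration(seconds: number): DurationVerdict {
  // NaN, Infinity, and sub-minimum clips are all rejected as too short.
  if (!Number.isFinite(seconds) || seconds < MIN_SECONDS) return "too_short";
  if (seconds > HARD_MAX_SECONDS) return "too_long";
  if (seconds >= IDEAL_MIN_SECONDS && seconds <= IDEAL_MAX_SECONDS) return "ideal";
  return "acceptable"; // between 0.5-2 s or 4-5 s
}

// The capture UI stops recording once the hard max is reached.
function shouldStopRecording(elapsedSeconds: number): boolean {
  return elapsedSeconds >= HARD_MAX_SECONDS;
}
```

Stopping the recorder at the hard max, rather than trimming afterward, keeps the five-second ceiling a property of the capture itself.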
Camera posture
- Rear camera is the default capture mode.
- Environmental framing is preferred over talking-to-camera behavior.
- Future Cassets can disable the front camera entirely when the brief requires it.
- Native camera capture can be offered for higher-quality photo and video.
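The camera posture can be reduced to a small policy resolved before a capture session opens. This sketch assumes a hypothetical `disableFrontCamera` flag on the Casset brief. On the web, "rear" would typically map to a `facingMode: "environment"` media constraint, but that mapping is not shown here.

```typescript
// Hypothetical sketch: which cameras a capture session may open.
type CameraFacing = "rear" | "front";

interface CameraPolicy {
  // Set by a future Casset brief; when absent, the front camera stays available.
  disableFrontCamera?: boolean;
}

interface CameraSession {
  initial: CameraFacing;   // camera opened when capture starts
  allowed: CameraFacing[]; // cameras the user may switch between
}

function resolveCameraSession(policy: CameraPolicy = {}): CameraSession {
  const allowed: CameraFacing[] = policy.disableFrontCamera
    ? ["rear"]
    : ["rear", "front"];
  // The rear camera is always the default, so the world is the subject.
  return { initial: "rear", allowed };
}
```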
Media texture
The system should favor fast upload, smooth playback, and emotional texture over pristine fidelity. Grain, softness, motion blur, CRT-like degradation, and compression artifacts can help when they serve the song.
Living Still Photos
Still photos can become living memory fragments through subtle generated motion. The treatment should be restrained: light flicker, slight camera drift, environmental breathing, depth parallax, ambient movement, and soft loops.
The system should avoid surreal morphing, fantasy hallucination, obvious generation artifacts, and uncanny face or body motion. The goal is haunting memory, not AI spectacle.
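One way to keep still-photo motion restrained is to cap the amplitude and use a periodic path, so the loop is seamless. The sketch below covers only slight camera drift and breathing zoom. It assumes hypothetical parameter names and caps (12 px pan, 5% zoom) that are illustrative, not tuned values.

```typescript
// Hypothetical sketch: a restrained, seamlessly looping drift for a still photo.
interface DriftParams {
  loopSeconds: number; // length of one soft loop
  maxPanPx: number;    // requested drift amplitude in pixels
  maxZoom: number;     // requested extra scale, e.g. 0.03 = 3%
}

interface Transform { x: number; y: number; scale: number; }

// Hard caps keep the motion closer to breathing than to spectacle.
const PAN_CAP_PX = 12;
const ZOOM_CAP = 0.05;

function driftAt(t: number, p: DriftParams): Transform {
  if (!(p.loopSeconds > 0)) throw new Error("loopSeconds must be positive");
  const pan = Math.min(Math.abs(p.maxPanPx), PAN_CAP_PX);
  const zoom = Math.min(Math.abs(p.maxZoom), ZOOM_CAP);
  // The phase repeats every loopSeconds, so the first and last frames match.
  const phase = (2 * Math.PI * (t % p.loopSeconds)) / p.loopSeconds;
  return {
    x: pan * Math.sin(phase),
    y: pan * 0.5 * Math.sin(2 * phase),            // gentle figure-eight path
    scale: 1 + zoom * 0.5 * (1 - Math.cos(phase)), // ranges from 1 to 1 + zoom
  };
}
```

Because every term is periodic in the loop length, no crossfade is needed at the seam. The caps keep an overeager setting from turning into the surreal motion the section warns against.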
Runtime Direction
Playback should evolve from scrapbook scrolling into cinematic sequencing. The Film tab should feel like a living audiovisual memory field around the hook.
- Beat-aware cuts.
- Lyric-aware atmosphere shifts.
- Motion-aware transitions.
- Visual density modulation.
- Breathing room between intense moments.
- Snap cuts where the song calls for them.
- A single canonical playback clock.
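A single canonical clock means that cuts, transitions, and atmosphere shifts all read one audio time source instead of running independent timers. The following is a minimal sketch, assuming a hypothetical `PlaybackClock` class with a constant BPM and a known first-downbeat offset. In a browser, the source could wrap an audio element's `currentTime`.

```typescript
// Hypothetical sketch: one audio-derived clock drives every visual decision.
interface TimeSource {
  now(): number; // current audio playback position in seconds
}

class PlaybackClock {
  readonly source: TimeSource;
  readonly bpm: number;
  readonly offsetSeconds: number; // time of the first downbeat

  constructor(source: TimeSource, bpm: number, offsetSeconds = 0) {
    if (!(bpm > 0)) throw new Error("bpm must be positive");
    this.source = source;
    this.bpm = bpm;
    this.offsetSeconds = offsetSeconds;
  }

  get secondsPerBeat(): number {
    return 60 / this.bpm;
  }

  // Fractional beat position, e.g. 8.25 = a quarter of the way into beat 8.
  beatPosition(): number {
    return (this.source.now() - this.offsetSeconds) / this.secondsPerBeat;
  }

  beatIndex(seconds: number): number {
    return Math.floor((seconds - this.offsetSeconds) / this.secondsPerBeat);
  }

  // True if a beat boundary falls between two sampled audio times.
  crossedBeat(prevSeconds: number, nextSeconds: number): boolean {
    return this.beatIndex(nextSeconds) > this.beatIndex(prevSeconds);
  }
}
```

Sampling the clock once per animation frame and calling `crossedBeat` on consecutive samples yields beat-aware cuts that stay in sync with the audio even after a pause or seek.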
Product Surfaces
Film tab
The Film tab is the living music video surface. It can cycle through fragments in time with BPM, beat segments, lyric phrases, and selected atmosphere. The horizontal film carousel remains scrollable because it is a transport strip for fragments, not a feed.
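Cycling fragments in time with the beat can be a pure function of the clock's beat position, so the Film tab holds no timer state of its own. This sketch assumes a hypothetical `beatsPerCut` knob as a crude stand-in for visual density. Lyric- and atmosphere-aware selection are not modeled.

```typescript
// Hypothetical sketch: which fragment the Film tab shows at a beat position.
// beatsPerCut: 1 = cut every beat, 4 = hold for a bar in 4/4 (an assumption).
function fragmentIndexAtBeat(
  beatPosition: number,
  fragmentCount: number,
  beatsPerCut = 2,
): number {
  if (!(beatsPerCut > 0)) throw new Error("beatsPerCut must be positive");
  if (fragmentCount <= 0) return -1;          // nothing to show
  const beat = Math.max(0, beatPosition);     // hold the first fragment before the downbeat
  const cut = Math.floor(beat / beatsPerCut); // number of cuts so far
  return cut % fragmentCount;                 // wrap around the fragment list
}
```

Raising `beatsPerCut` in quieter passages and lowering it in intense ones is one simple way to get breathing room and snap cuts from the same function.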
Fragment reel
The fragment reel is a curation surface. Rows should emphasize contributor handle, thumbnail, add/select, and delete/remove. They should not emphasize fragment names, likes, captions, or social ranking.
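The row rules can be enforced by mapping stored fragments to a narrow view model that cannot carry names, likes, or captions. This sketch assumes hypothetical record fields. Prefixing handles with "@" is an illustrative display choice.

```typescript
// Hypothetical sketch: the fragment reel row model omits social fields.
interface FragmentRecord {
  id: string;
  contributorHandle: string;
  thumbnailUrl: string;
  selected: boolean;
  // May exist in storage but are never shown in the reel.
  title?: string;
  caption?: string;
  likeCount?: number;
}

interface ReelRow {
  id: string;
  handle: string;
  thumbnailUrl: string;
  selected: boolean;  // drives the add/select control
  removable: boolean; // drives the delete/remove control
}

function toReelRows(records: FragmentRecord[], canCurate: boolean): ReelRow[] {
  // Preserve curation order; rows are never sorted by likeCount.
  return records.map((r) => ({
    id: r.id,
    handle: r.contributorHandle.startsWith("@") ? r.contributorHandle : `@${r.contributorHandle}`,
    thumbnailUrl: r.thumbnailUrl,
    selected: r.selected,
    removable: canCurate,
  }));
}
```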
Capture overlay
Capture copy should use language like capture fragment, choose fragment, world lens, and shape the world around the hook. Avoid post, upload content, go viral, creator, reaction, and reel as primary language.
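A copy check in review or CI can flag the avoided vocabulary before it ships. The term list below comes from the guidance above. Whole-word, case-insensitive matching is an assumption, so inflections such as "posting" are not caught.

```typescript
// Hypothetical copy lint for capture-overlay strings.
const AVOIDED_TERMS = ["post", "upload content", "go viral", "creator", "reaction", "reel"];

function findAvoidedTerms(copy: string): string[] {
  const lower = copy.toLowerCase();
  return AVOIDED_TERMS.filter((term) => {
    // Escape regex metacharacters, then match the term as whole words.
    const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    return new RegExp(`\\b${escaped}\\b`).test(lower);
  });
}
```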
Architecture Implications
This direction strengthens existing primitives instead of creating a new app model. Profile World remains the destination. Hook Object remains the seed. Release Ritual creates the reason to contribute. Listening Room holds presence and memory.
HookObject
-> VisualWorld
-> Fragment[]
-> contributor
-> media asset
-> capture constraints
-> permission/provenance metadata
-> sequencing metadata
Fragment provenance and contributor handles should map into release context over time. Sequencing belongs to the audiovisual runtime, not local React timers or feed logic.
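The hierarchy above could be expressed as TypeScript types along these lines. This is one plausible reading: placing capture constraints on each fragment, and every field name, unit, and enum value, is an assumption for illustration. The filter reflects the idea that only permitted fragments reach the runtime's sequencer.

```typescript
// Hypothetical shapes for HookObject -> VisualWorld -> Fragment[].
interface CaptureConstraints {
  minSeconds: number;
  maxSeconds: number;
  defaultCamera: "rear" | "front";
  frontCameraDisabled: boolean;
}

interface MediaAsset {
  kind: "video" | "photo";
  url: string;
  durationSeconds?: number; // videos only
}

interface Provenance {
  capturedAt: string;         // ISO 8601 timestamp
  permissionGranted: boolean; // contributor consented to use in the release
}

interface SequencingMetadata {
  beatAligned: boolean;
  motionEnergy?: number; // 0..1, could feed motion-aware transitions
}

interface Fragment {
  id: string;
  contributor: { handle: string };
  media: MediaAsset;
  constraints: CaptureConstraints;
  provenance: Provenance;
  sequencing: SequencingMetadata;
}

interface VisualWorld { brief: string; fragments: Fragment[]; }
interface HookObject { id: string; visualWorld: VisualWorld; }

// Only fragments with granted permission are eligible for sequencing.
function sequenceableFragments(hook: HookObject): Fragment[] {
  return hook.visualWorld.fragments.filter((f) => f.provenance.permissionGranted);
}
```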
Roadmap
Phase 1 - Connor-6 prototype
- Basement runtime viewport stays coherent.
- Film tab cycles through contributed fragments by beat.
- Fragment reel selects media into the viewport.
- Rows show contributor handles only.
- Add/delete controls support taste-oriented curation.
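Phase 1 curation can be modeled as a small reducer over the contributed pool and the fragments selected into the viewport. This is a minimal sketch. The action names and the state shape are hypothetical.

```typescript
// Hypothetical sketch of Phase 1 curation state.
interface CurationState {
  pool: string[];     // fragment ids contributed to this hook
  selected: string[]; // ids playing in the viewport, in artist-chosen order
}

type CurationAction =
  | { type: "select"; id: string }
  | { type: "deselect"; id: string }
  | { type: "remove"; id: string }; // delete from the reel entirely

function curate(state: CurationState, action: CurationAction): CurationState {
  switch (action.type) {
    case "select":
      // Ignore unknown ids and duplicates; return the same object unchanged.
      if (!state.pool.includes(action.id) || state.selected.includes(action.id)) return state;
      return { ...state, selected: [...state.selected, action.id] };
    case "deselect":
      return { ...state, selected: state.selected.filter((id) => id !== action.id) };
    case "remove":
      return {
        pool: state.pool.filter((id) => id !== action.id),
        selected: state.selected.filter((id) => id !== action.id),
      };
  }
}
```

Keeping curation as pure state transitions makes it easy to record curation actions as a metric without coupling them to the UI.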
Phase 2 - Artist-controlled Casset
- Artists define a visual brief for a hook.
- Capture constraints become configurable per Casset.
- Rear camera default and optional front-camera disablement.
- Short video fragments capped at 5 seconds.
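Per-Casset constraints can be resolved by merging an artist's brief over platform defaults while clamping to the platform limits. This sketch assumes a hypothetical `resolveCaptureConfig` function. The defaults mirror the fragment language: 0.5 s minimum, 5 s hard max, and a rear-camera default.

```typescript
// Hypothetical sketch: per-Casset capture constraints with platform limits.
interface CaptureConfig {
  minSeconds: number;
  maxSeconds: number;
  defaultCamera: "rear";
  frontCameraDisabled: boolean;
}

type BriefOverrides = Partial<Omit<CaptureConfig, "defaultCamera">>;

const PLATFORM_MIN = 0.5;
const PLATFORM_HARD_MAX = 5;

const clamp = (v: number, lo: number, hi: number): number => Math.min(Math.max(v, lo), hi);

function resolveCaptureConfig(overrides: BriefOverrides = {}): CaptureConfig {
  // A brief may shorten the window but never exceed the 5 s hard max.
  const maxSeconds = clamp(overrides.maxSeconds ?? PLATFORM_HARD_MAX, PLATFORM_MIN, PLATFORM_HARD_MAX);
  // Keep the window valid even if a brief sets min above max.
  const minSeconds = clamp(overrides.minSeconds ?? PLATFORM_MIN, PLATFORM_MIN, maxSeconds);
  return {
    minSeconds,
    maxSeconds,
    defaultCamera: "rear", // not overridable
    frontCameraDisabled: overrides.frontCameraDisabled ?? false,
  };
}
```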
Phase 3 - Rendered living video
- Runtime can render or export the current living video state.
- Share artifacts point back to the canonical Hook Object and Profile World.
- Old releases can keep accumulating new visual interpretations.
Success Metrics
Measure repeat hook plays, fragment selection, curation actions, contributor return, rendered living-video exports, and shares that route back to Profile World. Do not optimize around likes on fragments, follower growth for contributors, or feed impressions detached from the hook.
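One way to keep the metrics honest is an event schema that simply cannot represent likes, follower counts, or detached impressions. This sketch assumes hypothetical event names. It defines "repeat hook plays" as the number of listeners who played the same hook more than once, which is an assumed definition.

```typescript
// Hypothetical analytics event schema for the metrics above.
type CassetEvent =
  | { kind: "hook_play"; hookId: string; listenerId: string }
  | { kind: "fragment_selected"; hookId: string; fragmentId: string }
  | { kind: "curation_action"; hookId: string; action: "add" | "remove" }
  | { kind: "contributor_return"; hookId: string; contributorHandle: string }
  | { kind: "living_video_export"; hookId: string }
  | { kind: "share"; hookId: string; routesToProfileWorld: boolean };

// Counts listeners who played the given hook more than once.
function repeatHookPlays(events: CassetEvent[], hookId: string): number {
  const plays = new Map<string, number>();
  for (const e of events) {
    if (e.kind === "hook_play" && e.hookId === hookId) {
      plays.set(e.listenerId, (plays.get(e.listenerId) ?? 0) + 1);
    }
  }
  let repeaters = 0;
  plays.forEach((count) => {
    if (count > 1) repeaters++;
  });
  return repeaters;
}
```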
Design Principle
At its best, a Casset should feel like the world around the song started filming itself. The interface should not ask people to become creators. It should invite them to notice, capture, and contribute a real emotional fragment to the song.