product direction

Music video outsourcing.

The hook is the seed. The audience helps shape the visual world around it.

Casset should let a song become a collectively directed living music video. Artists define the emotional world; listeners and collaborators contribute real visual fragments captured around the sound; the runtime sequences those fragments into a music video that can keep evolving after release.

Thesis

Music video outsourcing does not mean turning Casset into a creator marketplace or social posting app. It means giving artists a way to gather distributed cinematography from the people and places emotionally connected to a song.

Artist chooses hook
-> defines emotional world and capture constraints
-> contributors capture fragments from the world
-> artist curates or accepts fragments
-> runtime sequences by beat, lyric, density, and motion
-> song becomes a living audiovisual world

Not UGC

Users are not posting content. They are contributing visual fragments that help shape the emotional world of a song.

Do not build

TikTok or Reels behavior
influencer content
reaction loops
selfie performance
engagement-ranked posts
creator-economy mechanics

Build toward

collaborative visual interpretation
distributed cinematography
emotional atmosphere generation
artist-directed curation
beat-aware sequencing
evolving song worlds

Fragment Language

Strong captures should feel cinematic, imperfect, environmental, atmospheric, and emotionally real. The camera should make the world the subject through the lens of the sound.

Video constraints

Minimum: 0.5 seconds.
Ideal target: 2 to 4 seconds.
Hard max: 5 seconds.
Five seconds should feel like emotional residue, not a vlog segment.
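
The duration limits above can be checked at capture time. The following is a minimal sketch, assuming durations are measured in seconds; the constant names, the verdict labels, and the function name are hypothetical and not part of any existing Casset code.

```typescript
// Limits copied from the video constraints above. All names are illustrative.
const MIN_SECONDS = 0.5;
const IDEAL_MIN_SECONDS = 2;
const IDEAL_MAX_SECONDS = 4;
const HARD_MAX_SECONDS = 5;

type DurationVerdict = "invalid" | "too_short" | "acceptable" | "ideal" | "too_long";

// Classify a captured clip by its duration in seconds.
function classifyFragmentDuration(seconds: number): DurationVerdict {
  if (!Number.isFinite(seconds) || seconds < 0) return "invalid";
  if (seconds < MIN_SECONDS) return "too_short";
  if (seconds > HARD_MAX_SECONDS) return "too_long";
  if (seconds >= IDEAL_MIN_SECONDS && seconds <= IDEAL_MAX_SECONDS) return "ideal";
  // Within the hard limits but outside the 2 to 4 second target.
  return "acceptable";
}
```

A capture surface could reject "too_short" and "too_long" outright and treat "acceptable" as allowed but not encouraged.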

Camera posture

Rear camera is the default capture mode.
Environmental framing is preferred over talking-to-camera behavior.
Future Cassets can disable the front camera entirely when the brief requires it.
Native camera capture can exist for higher-quality photo and video.
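
One way to express this posture in a web capture flow is sketched below. It assumes a hypothetical per-Casset brief with an `allowFrontCamera` flag; the types and function names are illustrative. The returned object is shaped like the constraints argument of the browser's `navigator.mediaDevices.getUserMedia`, where the `facingMode` value "environment" requests the rear camera and "user" requests the front camera.

```typescript
// Hypothetical brief shape. Only the rear-camera default and the optional
// front-camera lockout come from the text above.
interface CaptureBrief {
  allowFrontCamera: boolean;
}

type FacingMode = "environment" | "user";

// Local stand-in for the video part of the browser's media constraints.
interface VideoCaptureRequest {
  video: { facingMode: { ideal: FacingMode } };
}

// Rear camera unless the contributor asks for the front camera
// and the brief allows it.
function resolveFacingMode(brief: CaptureBrief, requested?: FacingMode): FacingMode {
  if (requested === "user" && brief.allowFrontCamera) return "user";
  return "environment";
}

function buildCaptureRequest(brief: CaptureBrief, requested?: FacingMode): VideoCaptureRequest {
  return { video: { facingMode: { ideal: resolveFacingMode(brief, requested) } } };
}
```

Using `ideal` rather than a strict match is a design assumption here: it lets capture still start on devices that expose only one camera.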

Media texture

The system should favor fast upload, smooth playback, and emotional texture over pristine fidelity. Grain, softness, motion blur, CRT-like degradation, and compression artifacts can help when they serve the song.

Living Still Photos

Still photos can become living memory fragments through subtle generated motion. The treatment should be restrained: light flicker, slight camera drift, environmental breathing, depth parallax, ambient movement, and soft loops.

The system should avoid surreal morphing, fantasy hallucination, obvious generation artifacts, and uncanny face or body motion. The goal is haunting memory, not AI spectacle.

Runtime Direction

Playback should evolve from scrapbook scrolling into cinematic sequencing. The Film tab should feel like a living audiovisual memory field around the hook.

Beat-aware cuts.
Lyric-aware atmosphere shifts.
Motion-aware transitions.
Visual density modulation.
Breathing room between intense moments.
Snap cuts where the song calls for them.
A single canonical playback clock.
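
Beat-aware cuts driven by one clock can be sketched as pure functions of the current playback time. This is an illustration under stated assumptions: a constant BPM, a known first-beat offset, and a clock value supplied by the caller (for example, the audio element's current time). `BeatGrid`, `beatIndexAt`, and `nextCutSeconds` are hypothetical names.

```typescript
// Assumed beat grid: constant tempo with a fixed first-beat offset.
interface BeatGrid {
  bpm: number;
  firstBeatSeconds: number;
}

// Index of the beat that the clock is currently inside (0-based).
function beatIndexAt(clockSeconds: number, grid: BeatGrid): number {
  const beats = ((clockSeconds - grid.firstBeatSeconds) * grid.bpm) / 60;
  return Math.max(0, Math.floor(beats));
}

// Time of the next cut, where cuts land every `beatsPerCut` beats.
function nextCutSeconds(clockSeconds: number, grid: BeatGrid, beatsPerCut: number): number {
  // Before the first beat, the first cut is the first beat itself.
  if (clockSeconds < grid.firstBeatSeconds) return grid.firstBeatSeconds;
  const secondsPerCut = (60 / grid.bpm) * beatsPerCut;
  const elapsed = clockSeconds - grid.firstBeatSeconds;
  const cutsSoFar = Math.floor(elapsed / secondsPerCut);
  return grid.firstBeatSeconds + (cutsSoFar + 1) * secondsPerCut;
}
```

Because both functions take the clock value as input and keep no timers of their own, every surface that reads the same clock computes the same cut points.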

Product Surfaces

Film tab

The Film tab is the living music video surface. It can cycle through fragments in time with BPM, beat segments, lyric phrases, and selected atmosphere. The horizontal film carousel remains scrollable because it is a transport strip for fragments, not a feed.
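
Cycling fragments in time with BPM can be reduced to a lookup from playback time to fragment index. The sketch below assumes a constant BPM, a beat grid starting at time zero, and a fixed number of beats per fragment; `fragmentIndexAt` and its parameters are hypothetical, and lyric phrases and atmosphere are left out.

```typescript
// Which fragment should be on screen at a given playback time.
// Returns null when there is nothing to show or the inputs are unusable.
function fragmentIndexAt(
  clockSeconds: number,
  bpm: number,
  beatsPerFragment: number,
  fragmentCount: number
): number | null {
  if (fragmentCount <= 0 || bpm <= 0 || beatsPerFragment <= 0) return null;
  const beat = Math.floor((Math.max(0, clockSeconds) * bpm) / 60);
  // Advance one fragment every `beatsPerFragment` beats, wrapping around.
  return Math.floor(beat / beatsPerFragment) % fragmentCount;
}
```

Scrolling the carousel by hand would override this index for as long as the user is interacting, which fits its role as a transport strip rather than a feed.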

Fragment reel

The fragment reel is a curation surface. Rows should emphasize contributor handle, thumbnail, add/select, and delete/remove. They should not emphasize fragment names, likes, captions, or social ranking.

Capture overlay

Capture copy should use language like "capture fragment," "choose fragment," "world lens," and "shape the world around the hook." Avoid "post," "upload content," "go viral," "creator," "reaction," and "reel" as primary language.
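
The avoid list can be enforced with a small copy check during review. This is a sketch only: the term list is taken from the paragraph above, while `findAvoidedTerms` and the whole-word matching rule are assumptions.

```typescript
// Terms the capture overlay should not use as primary language.
const AVOIDED_TERMS = ["post", "upload content", "go viral", "creator", "reaction", "reel"];

// Return the avoided terms that appear as whole words in a piece of UI copy.
// Matching is case-insensitive; results follow the order of AVOIDED_TERMS.
function findAvoidedTerms(copy: string): string[] {
  return AVOIDED_TERMS.filter((term) => new RegExp(`\\b${term}\\b`, "i").test(copy));
}
```

For example, `findAvoidedTerms("Post your reel")` flags "post" and "reel", while "Capture fragment" passes clean.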

Architecture Implications

This direction strengthens existing primitives instead of creating a new app model. Profile World remains the destination. Hook Object remains the seed. Release Ritual creates the reason to contribute. Listening Room holds presence and memory.

HookObject
  -> VisualWorld
  -> Fragment[]
       -> contributor
       -> media asset
       -> capture constraints
       -> permission/provenance metadata
       -> sequencing metadata

Fragment provenance and contributor handles should map into release context over time. Sequencing belongs to the audiovisual runtime, not local React timers or feed logic.
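
The shape above could be written down as types. The sketch below is one possible reading, with VisualWorld and Fragment[] both treated as children of HookObject; every field name is hypothetical and chosen only to mirror the labels in the diagram.

```typescript
// Illustrative types only; field names are assumptions, not an existing schema.
interface CaptureConstraints {
  minSeconds: number;
  maxSeconds: number;
  allowFrontCamera: boolean;
}

interface Provenance {
  capturedAt: string; // ISO 8601 timestamp
  permissionGranted: boolean;
}

interface SequencingMetadata {
  durationSeconds: number;
  motion?: number; // optional motion estimate for the runtime
}

interface Fragment {
  id: string;
  contributorHandle: string;
  mediaAssetUrl: string;
  constraints: CaptureConstraints;
  provenance: Provenance;
  sequencing: SequencingMetadata;
}

interface VisualWorld {
  brief: string;
}

interface HookObject {
  id: string;
  visualWorld: VisualWorld;
  fragments: Fragment[];
}

// Unique contributor handles for a hook, in first-seen order.
function contributorHandles(hook: HookObject): string[] {
  return Array.from(new Set(hook.fragments.map((f) => f.contributorHandle)));
}
```

Keeping sequencing metadata on the fragment, rather than in view state, leaves the runtime free to own ordering and timing.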

Roadmap

Phase 1 - Connor-6 prototype

Basement runtime viewport stays coherent.
Film tab cycles through contributed fragments by beat.
Fragment reel selects media into the viewport.
Rows identify fragments by contributor handle only, not by fragment name, likes, or captions.
Add/delete controls support taste-oriented curation.

Phase 2 - Artist-controlled Casset

Artists define a visual brief for a hook.
Capture constraints become configurable per Casset.
Rear camera is the default, with optional front-camera disablement.
Short video fragments are capped at 5 seconds.

Phase 3 - Rendered living video

Runtime can render or export the current living video state.
Share artifacts point back to the canonical Hook Object and Profile World.
Old releases can keep accumulating new visual interpretations.

Success Metrics

Measure repeat hook plays, fragment selection, curation actions, contributor return, rendered living-video exports, and shares that route back to Profile World. Do not optimize around likes on fragments, follower growth for contributors, or feed impressions detached from the hook.

Design Principle

A good Casset should feel as if the world around the song started filming itself. The interface should not ask people to become creators. It should invite them to notice, capture, and contribute a real emotional fragment to the song.
