Discussion about this post

Contarini

For AI-generated video, a good interim solution might be video generation with more "frame by frame" direction from the human creator, where the AI acts more as a "helpful paintbrush" than as a generator of a first draft. And, of course, as billions of sessions accumulated, it would make better predictions about what users do and do not want.

Slicey Me Likey

I don't fully buy this argument. First, most video models don't "predict the next frame"; they generate the full video at once from a text prompt.

That aside, there is no convincing argument that a model needs to understand 3D or physics in order to produce very convincing videos (models like Veo 2 are already getting close). Remember 12 months ago, when image models kept generating six fingers on hands? People expected that something had to be done explicitly to handle this, or else it wouldn't be solved. In the end, all it took was more data and more compute, and the problem was solved. Read Sutton's "The Bitter Lesson."
