A very, very different mechanism that "just" displays the scene as the author ex...

echelon · 2025-12-26T16:26:36

> with the tradeoff mainly being control and artistic errors.

For now. We're not even a decade into this tech, and look how far we've come in the last year alone with Veo 3, Sora 2, Kling 4x, and Kling O1. Not to mention editing models like Qwen Edit and Nano Banana!

This is going to be serious tech soon.

I think vision is easier than "intelligence". In essence, we solved it in closed form sixty years ago.

We have many formulations of algorithms and pipelines. Not just for the real physics, but also tons of different hacks to account for hardware limitations.

We understand optics in a way we don't understand intelligence.

Furthermore, evolution has arrived at vision over and over again. It's fast and highly detailed, so it must be correspondingly simple.

We're going to optimize the shit out of this. In a decade we'll probably have perfectly consistent Holodecks.

justinclift · 2025-12-27T15:05:55

Hmmm, future videos might just "compress" down to a common AI model plus a bunch of prompts and metadata about scene order. ;)

arghwhat · 2025-12-27T11:00:04

I feel like this misses the point. Also, vision and image generation are entirely different things. That holds even for humans: some people cannot create images in their head despite having perfectly good vision.

Understanding optics instead of intelligence speaks to the traditional render workflow: a pure simulation of input data with no "creative process". That means either the massive hack that is the traditional game render pipeline, or proper light simulation. We'll probably eventually get to the point where we can have full-scene, real-time ray tracing.

The AI image generation approach is the "intelligence" approach, where you throw all optics, physics, and render knowledge up in the air and let the model "paint" the scene the way it imagines it, like handing a pencil to a cartoon/anime artist. Zero simulation, zero physics, zero rules: just the imagination of a black box.

No light, physics, or existing render pipeline tricks are relevant. If that's what you want, you're looking for entirely new tricks: tricks to ensure object permanence, attention to detail (no variable finger counts), and inference performance. Even if we get it running in real time, giving up your control and your definition of consistency is part of the deal when you hand the role of artist off to the box.

If you want AI in the simulation approach, you'll be taking an entirely different path: skipping any involvement in rendering or image creation and instead just letting the model puppeteer the scene within some physics constraints. That makes for cool games, but it's completely unrelated to the technology being discussed.