I've seen quite a few NeRF demos so far, but all of them appear to be little more than slightly distorted and remixed input images. In these examples, too, the water looks like plastic, and the NeRF clearly cannot reproduce the colorful grass, instead turning it into one large smooth surface.
But without transparency effects, all of this can be rendered fast and efficiently using geometric textures - i.e. what Unreal Engine 5 uses. Here's what "traditional" real-time rendering looks like:
https://www.youtube.com/watch?v=PBktSo0bXas
Compared to that, I find "AI rendering" which is blurry and much slower (15fps @ 800px) somewhat underwhelming.
In case anyone here is working on AI rendering, I'd suggest that you focus on transparent glass. That's where all game engines slow down 2x to 4x because of the need to render things in multiple passes. So if you can make fake glass look nice, that's when AI rendering becomes a useful improvement over the state of the art.
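The multi-pass cost comes from the fact that transparency blending is order-dependent: engines have to sort (or depth-peel) transparent surfaces and composite them back-to-front with the "over" operator. A minimal numpy sketch of that blending step, with made-up layer colors and alphas purely for illustration:

```python
import numpy as np

def composite_over(layers):
    """Blend transparent layers back-to-front with the 'over' operator.

    layers: list of (rgb, alpha) pairs ordered farthest to nearest.
    Each nearer layer is blended on top of the accumulated color:
        out = alpha * src + (1 - alpha) * dst
    """
    out = np.zeros(3)  # black background
    for rgb, alpha in layers:  # farthest first
        out = alpha * np.asarray(rgb, dtype=float) + (1.0 - alpha) * out
    return out

# Two panes of glass over a black background:
# a red pane (alpha 0.5) behind a blue pane (alpha 0.5).
color = composite_over([
    (np.array([1.0, 0.0, 0.0]), 0.5),  # far pane
    (np.array([0.0, 0.0, 1.0]), 0.5),  # near pane
])
```

Because the result changes if the two panes are swapped, the engine can't just rasterize transparent geometry in arbitrary order the way it can with opaque geometry - hence the extra sorting/peeling passes.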
As for the scenes in this demo, I'm pretty sure that structure from motion + mesh reduction + normal mapping can reconstruct the original images almost pixel-perfect, and at 200+ fps on modern GPUs. So there's simply not much benefit to having a neural raymarcher over a highly optimized raycaster.
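For the "mesh reduction" step in that pipeline, one simple classical technique is vertex clustering: snap vertices to a coarse grid and merge everything that lands in the same cell. A toy numpy sketch (the voxel size and the random point cloud are illustrative, not tuned for any real scan):

```python
import numpy as np

def decimate_by_clustering(vertices, voxel_size):
    """Crude mesh-reduction step via vertex clustering: vertices that fall
    into the same voxel cell are merged into their average position.

    vertices: (N, 3) float array. Returns an (M, 3) array with M <= N.
    """
    cells = np.floor(vertices / voxel_size).astype(np.int64)
    # One representative (cell-average) position per occupied cell.
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    m = inverse.max() + 1
    sums = np.zeros((m, 3))
    counts = np.zeros(m)
    np.add.at(sums, inverse, vertices)
    np.add.at(counts, inverse, 1)
    return sums / counts[:, None]

# 1000 random points in a unit cube collapse to at most 4^3 = 64 representatives.
pts = np.random.default_rng(0).random((1000, 3))
reduced = decimate_by_clustering(pts, voxel_size=0.25)
```

Production tools use smarter error-driven methods (e.g. quadric edge collapse), but the idea is the same: trade geometric detail for triangle count, then recover the lost detail with normal maps.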
> I'm pretty sure that structure from motion + mesh reduction + normal mapping can reconstruct the original images almost pixel-perfect
This is actually incorrect; NeRF is the SotA method on these novel view synthesis benchmarks. If you could get better results using meshes, you’d be able to publish a paper and blow all these methods out of the water!
That said, as you point out, there are still major limitations, particularly rendering speed. But the technique is very new and progressing rapidly.
I'm not going to mince words: NeRF is rubbish compared to traditional photogrammetry.
> NeRF is the SotA method for these novel view synthesis benchmarks
That's an interesting statement; uh, I don't know what 'novel view synthesis benchmarks' you're referring to, but the parent post didn't mention them, and like me, probably doesn't care what they are.
If the state of the art is an 800x800 pixel image... uh, well, bluntly, that's really very unimpressive.
> Compared to that, I find "AI rendering" which is blurry and much slower (15fps @ 800px) somewhat underwhelming.
^ This.
It's very much a 'watch this space' technology, because it does have some really interesting and promising features, and it's changing quickly, but I think finding it 'somewhat underwhelming' is a pretty fair response.
No, I'm sorry but this blows photogrammetry out of the water. The original NeRF paper photorealistically handled complex occlusions (like foliage) and reflective and refractive caustics. No other technique comes close. That is the entire reason it's interesting, and believe it or not there are practical applications for it right now. Forget gaming - this lets you capture lightfields for VR with a cell phone in 5 minutes. And if the NeRFs themselves can be rendered in real time, it solves the problem of compressed light field scene representation. Buckle up for photorealistic VR.
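For anyone wondering why NeRF handles foliage and semi-transparent structure where mesh reconstruction fails: it renders by integrating density along each ray rather than intersecting a single surface. A minimal numpy sketch of the volume-rendering quadrature from the NeRF paper, with made-up sample densities, colors, and spacings:

```python
import numpy as np

def render_ray(sigmas, rgbs, deltas):
    """Volume rendering quadrature as in the NeRF paper:
        alpha_i = 1 - exp(-sigma_i * delta_i)
        T_i     = prod_{j<i} (1 - alpha_j)      (transmittance so far)
        C       = sum_i T_i * alpha_i * rgb_i   (final pixel color)
    """
    sigmas = np.asarray(sigmas, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    rgbs = np.asarray(rgbs, dtype=float)
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: how much light survives to reach each sample.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas  # per-sample contribution to the pixel
    color = (weights[:, None] * rgbs).sum(axis=0)
    return color, weights

# Three samples along one ray: empty space, then a dense red surface.
color, w = render_ray(
    sigmas=[0.0, 50.0, 50.0],
    rgbs=[[0, 0, 0], [1, 0, 0], [1, 0, 0]],
    deltas=[0.1, 0.1, 0.1],
)
```

Because every sample can contribute fractionally, thin occluders and soft boundaries fall out of the representation for free; a mesh has to commit to a hard surface.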
I just gave you one. You can now cheaply and rapidly capture dense lightfields of highly specular objects for VR display. Get yourself a camera array (100 cameras is not infeasible!) and you can capture them instantaneously. That's totally game changing compared to the current state of the art of scanning camera gantries (slow) or photogrammetry (fails on complex or highly specular geometry).
If you're asking me for an example of it being publicly used in production, well I think you're asking a lot considering the technique is only a few months old.
> If you're asking me for an example of it being publicly used in production, well I think you're asking a lot considering the technique is only a few months old
That is what I explicitly asked for.
Your failure to provide an example is not because it's new; it's because it's actually not practically useful at the moment.
NeRF has been around since March 2020 (https://arxiv.org/abs/2003.08934); you are simply wrong. Traditional techniques are better right now, have better implementations, and are widely used.
NeRF is a promising technology that is categorically worse in its current implementation and maturity.
The big fallacy with AI research is that people treat it as "completely new", so in their mind it doesn't make sense to compare the AI to traditional methods. But most traditional methods have also been created by highly advanced intelligences ... us humans.
FYI we once got 1st place on the Sintel AI benchmark in "Clean & EPE matched" with a 2004 paper... By now it's down to 10th place, but AI is by no means far ahead of traditional methods.
As for "novel view synthesis benchmarks", photogrammetry is used in many hollywood productions for virtual actors and/or for virtual environment destruction. In my opinion, having Hollywood use your technique for billion-dollar blockbuster movies is probably a hint that it works well in practice ;)
I don't think "remixing source images" is a fair characterization of this research. Finding pixel-accurate synthetic images is actually super hard.
That said, yes, it's odd they're positioning this as a way to extend game engines, when this entire line of research is interesting (to me) for its potential to be a generalized holographic codec. I'm not qualified to evaluate whether this is generalizable, but I would hope these results also indicate a possible way forward for camera-derived footage, where there are no acceptable approaches. But to your point:
>As for the scenes in this demo, I'm pretty sure that structure from motion + mesh reduction + normal mapping can reconstruct the original images almost pixel-perfect, and at 200+ fps on modern GPUs.
I don't agree; we've seen attempts to do generalized photogrammetry as a holographic codec, and the results are poor. Microsoft's HoloCapture is probably the best commercial example, and, well... it ain't pretty. (https://holocap.com/) I think the target should be visual fidelity at parity with good 2D codecs. So there's a long way to go.
The paucity of data sets may be underselling how hard this is and how advanced NeRF (and other approaches like MPI) are relative to advanced photogrammetry. In particular, approaches that extend to video and work well with the human face are very desirable, but there aren't a lot of datasets to test against. And yet these approaches _are_ significant improvements over photogrammetry; I'm not at liberty to share, but I've seen a demo comparing pure photogrammetry to a neural renderer, and the difference is night and day.
But let me agree with you: yes, we should throw these new representations at difficult subject matter, including reflections, translucency, refraction, subsurface scattering, etc, and see how they do. The original NeRF paper had some wild results for refraction, for example, which hadn't been demonstrated before. And when possible, I'd love to see them with video images of people.
I think we're only a few years away from a functional holographic codec, which is pretty exciting.
https://www.russian3dscanner.com/wrap4d/
is popular with Hollywood and it used to cost $10k+ before Epic Games purchased it. And it just happens to be amazing with animated faces ;)
That said, I agree with you in general that NeRF could become amazing if we can get it to work with difficult data sets where photogrammetry breaks down. That's why I find it sad to see them demo it on datasets where photogrammetry is known to work well.
Do you have any good references on SFM reconstruction? I spent a fair while playing around with various commercial and open source offerings (Alicevision Meshlab, RealityCapture, a few others I don't remember) a year or two ago and came away quite disillusioned. They look great with the carefully selected and optimized demo datasets but if you go out yourself and take a few hundred photos of anything but a nice convex boulder, the end result is almost always a complete mess.