
Curious how this will fare when playing Pokemon Red.




Gemini 3 Pro has been playing Pokemon Crystal (which is significantly harder than Red) in a race against Gemini 2.5 Pro: https://www.twitch.tv/gemini_plays_pokemon

Gemini 3 Pro has been making steady progress (12/16 badges) while Gemini 2.5 Pro is stuck (3/16 badges) despite using double the turns and tokens.


I think what would be interesting is if it could play the game with vision-only inputs. That would represent a massive leap in multimodal understanding.

Yeah the "High frame rate understanding" feature caught my eye, actual real time analysis of live video feeds seems really cool. Also wondering what they mean by "video reasoning/thinking"?

I don’t think it’s real time? The videos were likely taken previously.

> 3. Turning long videos into action: Gemini 3 Pro bridges the gap between video and code. It can extract knowledge from long-form content and immediately translate it into functioning apps or structured code

I'm curious how close these models are to fulfilling that long-ago, widely mocked claim (by Microsoft, I think?) that AIs could watch gameplay video of long-lost games and produce the code to emulate them.
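For concreteness, here's a rough sketch of what that "long video in, code out" flow looks like against the Gemini API today. The model identifier, file name, and prompt are illustrative assumptions, not a claim about how the announced feature actually works:

    from google import genai

    client = genai.Client(api_key="YOUR_API_KEY")

    # Upload a long gameplay video via the Files API; large videos may
    # need a short wait until the uploaded file finishes processing.
    video = client.files.upload(file="lost_game_footage.mp4")

    # Ask the model to turn what it observed into runnable code.
    response = client.models.generate_content(
        model="gemini-3-pro-preview",  # assumed model identifier
        contents=[
            video,
            "Watch this gameplay footage and write a minimal Python "
            "prototype that reproduces the game's core loop and rules.",
        ],
    )
    print(response.text)

Whether the output is anything close to a faithful emulation of the original game is exactly the open question being discussed here.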




