I'm cautiously waiting for the feedback from the first users. Meta has produced ...

conradkay · 2026-04-08T16:40:10 1775666410

It doesn't seem benchmaxxed: the ARC AGI 2 score is quite bad (42.5%; GPT 5.4 is at 76.1%) and coding is okay. But maybe this is the best Meta can do even when benchmaxxing.

The impressive part is multimodality, which is very plausible since other labs (especially Anthropic) focus less on it.

solenoid0937 · 2026-04-08T16:37:20 1775666240

My Meta friends say it's benchmaxxed af

loeg · 2026-04-08T17:09:44 1775668184

We used to call this "overfitting," but I suppose everything has to be maxxed now. Fitmaxxed?

dbgrman · 2026-04-08T20:23:40 1775679820

Given that Llama 4 mucked up its benchmark numbers, I’d take the spark announcement with many grains of salt.