The biggest jump in the numbers they quoted is 6%. Please look at the columns OT...

josephg · 2026-04-07T23:03:43 1775603023

> Combined results (Claude Mythos / Claude Opus 4.6 / GPT-5.4 / Gemini 3.1 Pro)

> Terminal-Bench 2.0: 82.0% / 65.4% / 75.1% / 68.5%

> USAMO: 97.6% / 42.3% / 95.2% / 74.4%

> The biggest jump in the numbers they quoted is 6%.

Just in the numbers you quoted, that's a 16.6 percentage point jump in Terminal-Bench and a 55.3 percentage point absolute increase in USAMO over their previous Opus 4.6 model.
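
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the gaps from the two rows quoted above. The `quoted` dictionary and the `gains` helper are illustrative names, not part of any published tooling. Only the Claude Mythos and Claude Opus 4.6 columns are used.

```python
# Scores quoted above, in percent: (Claude Mythos, Claude Opus 4.6).
quoted = {
    "Terminal-Bench 2.0": (82.0, 65.4),
    "USAMO": (97.6, 42.3),
}


def gains(new, old):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    points = new - old
    relative = points / old * 100
    return round(points, 1), round(relative, 1)


# Map each benchmark name to its (points, relative) pair.
results = {name: gains(new, old) for name, (new, old) in quoted.items()}

for name, (points, relative) in results.items():
    print(f"{name}: +{points} points absolute, +{relative}% relative")
```

The absolute gap (new minus old) and the relative gain (gap divided by the old score) are different quantities, which is why it helps to say which one a "jump" refers to.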

devmor · 2026-04-07T23:13:45 1775603625

I don’t know if you’re willingly disregarding everything being said to you or there’s a language barrier here.

dang · 2026-04-10T15:29:27 1775834967

Can you please stop posting comments with personal swipes in them? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.

devmor · 2026-04-10T22:31:23 1775860283

You're right, I apologize for that. I have been responding with annoyance rather than walking away when I receive replies that appear to be ignoring context.

dang · 2026-04-11T03:16:39 1775877399

Appreciated! And of course, I know it's not easy - believe me, I know...

nl · 2026-04-08T01:12:23 1775610743

It's higher than all the other models, except against Gemini 3.1 Pro on MMMLU.

DroneBetter · 2026-04-08T10:45:52 1775645152

this just in: HN user forgets how sigmoid functions work

dang · 2026-04-10T15:29:59 1775834999

Please don't respond to a bad comment by breaking the site guidelines yourself. That only makes things worse.

https://news.ycombinator.com/newsguidelines.html