
The biggest jump in the numbers they quoted is 6%.

Please look at the columns OTHER than Opus as well.



> Combined results (Claude Mythos / Claude Opus 4.6 / GPT-5.4 / Gemini 3.1 Pro)

> Terminal-Bench 2.0: 82.0% / 65.4% / 75.1% / 68.5%

> USAMO: 97.6% / 42.3% / 95.2% / 74.4%

> The biggest jump in the numbers they quoted is 6%.

Just in the numbers you quoted, that's a 16.6-point jump on Terminal-Bench and a 55.3-point absolute increase on USAMO over their previous Opus 4.6 model.
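
For anyone who wants to check the arithmetic, here is a rough sketch that subtracts each model's score from the Claude Mythos score in the quoted table. The dictionary layout and the `point_gains` helper are my own naming for illustration, and the only data is the two rows quoted above.

```python
# Scores copied from the quoted "Combined results" rows (percent).
scores = {
    "Terminal-Bench 2.0": {
        "Claude Mythos": 82.0,
        "Claude Opus 4.6": 65.4,
        "GPT-5.4": 75.1,
        "Gemini 3.1 Pro": 68.5,
    },
    "USAMO": {
        "Claude Mythos": 97.6,
        "Claude Opus 4.6": 42.3,
        "GPT-5.4": 95.2,
        "Gemini 3.1 Pro": 74.4,
    },
}


def point_gains(row, new="Claude Mythos"):
    """Absolute gap, in percentage points, between `new` and every other model."""
    return {
        model: round(row[new] - score, 1)
        for model, score in row.items()
        if model != new
    }


# Hypothetical helper output: one dict of gaps per benchmark.
gains = {bench: point_gains(row) for bench, row in scores.items()}

for bench, row in gains.items():
    for model, gap in row.items():
        print(f"{bench}: Claude Mythos vs {model}: +{gap} points")
```

These are absolute differences in percentage points, not relative percent changes, which is the distinction that matters when comparing them to a "6%" figure.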


I don’t know if you’re willingly disregarding everything being said to you or there’s a language barrier here.


Can you please stop posting comments with personal swipes in them? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.


You're right, I apologize for that. I have been responding with annoyance rather than walking away when I receive replies that appear to be ignoring context.


Appreciated! And of course, I know it's not easy. Believe me, I know...


It's higher than all the other models, except against Gemini 3.1 Pro on MMMLU.


this just in: HN user forgets how sigmoid functions work


Please don't respond to a bad comment by breaking the site guidelines yourself. That only makes things worse.

https://news.ycombinator.com/newsguidelines.html



