[comment removed]

Palmik · 2025-11-18T13:19:54 1763471994

The reported results where GPT 5.1 beats Gemini 3 are on SWE Bench Verified, and GPT 5.1 Codex also beats Gemini 3 on Terminal Bench.

HereBePandas · 2025-11-18T13:30:50 1763472650

You're right on SWE Bench Verified, I missed that and I'll delete my comment.

GPT 5.1 Codex beats Gemini 3 on Terminal Bench specifically on Codex CLI, but that's apples-to-oranges (hard to tell how much of that is a Codex-specific harness vs model). Look forward to seeing the apples-to-apples numbers soon, but I wouldn't be surprised if Gemini 3 wins given how close it comes in these benchmarks.

Palmik · 2025-11-18T14:04:33 1763474673

All evals on Terminal Bench require some harness. :) Or "Agent", as Terminal Bench calls it. Presumably the Gemini 3 are using Gemini CLI.