I totally agree. I remember the June magic as well - almost overnight my abilities and throughput were profoundly increased. I had many weeks of late nights in awe and wonder, trying things that were beyond my ability to implement technically but within the bounds of my conceptual understanding.
Initially, I found Codex CLI with GPT-5 to be a substitute for Claude Code - now GPT-5 Codex materially surpasses it in my line of work, with a huge asterisk. I work in a niche industry, and Codex has generally poor domain understanding of many of the critical attributes and concepts. Claude happens to have better background knowledge for my tasks, so I've found that Sonnet 4.5 with Claude Code generally does a better job at scaffolding any given new feature. Then, I call in Codex to implement actual functionality since Codex does not have the "You're absolutely right" and mocked/placeholder implementation issues of CC, and just generally writes clean, maintainable, well-planned code. It's the first time I've ever really felt the whole "it's as good as a senior engineer" hype - I think, in most cases, GPT5-Codex finally is as good as a senior engineer for my specific use case.
I think Codex is a generally better product with better pricing, typically 40-50% cheaper for about the same level of daily usage for me compared to CC. I agree that it will take a genuinely novel and material advancement to dethrone Codex now. I think the next frontier for coding agents is speed. I would use CC over Codex if it was 2x or 3x as fast, even at the same quality level. Otherwise, Codex will remain my workhorse.
When I was in high school, I would see the algebra teacher work through expressions and go "ohhh, that makes sense". But when I got back home to work on the homework, I couldn't make the pieces fit.
Isn't that the same? Just because you recognize something someone else wrote and it makes you go "ohh, I understand it conceptually" doesn't mean that you can apply that concept in a few days or weeks.
So when the person you responded to says:
>almost overnight *my abilities* and throughput were profoundly increased
I'd argue the throughput did but the abilities really didn't, because without the tool in question he's just as good as he was before the tool. To truly claim that his abilities were profoundly increased, he has to be able to internalize the pattern, recognize the pattern, and successfully reproduce it across variable contexts.
Another example would be claiming that my painting abilities and throughput were profoundly increased, because I used to draw stick figures and now I can draw Yu-Gi-Oh! cards by using the tool. My throughput really has increased, but my abilities as a painter really haven't.
>I think, in most cases, GPT5-Codex finally is as good as a senior engineer for my specific use case.
This is beyond bananas to me given that I regularly see Codex high and GPT-5-high both fail to create basic React code that's slightly off the normal distribution.
That might say something about the understandability of the React framework/paradigm ;)
Quality varies a lot based on what you're doing, how you prompt it, how you orchestrate it, and how you babysit and correct it. I haven't seen anything I'd call senior, but I have seen it, for some classes of tasks, turn this particular engineer into many seniors. I still have to supply all the heavy lifting (here's the concurrency model, how you'll ensure exactly-once-delivery, particular functions and classes you definitely want, a few common pitfalls to avoid, etc), but then it can flesh out the details extremely well.
If you really want to see it fail at something easy, try to have it write something that can use JSX but doesn't use React (Bun, Hono, etc). Seems like no amount of context management and detailed instructions will keep it from reaching for React-isms.
Do you mind if I ask what kind of React code you're working on? I've had good success using Codex for my frontend development, especially since all of my projects consistently rely on a pretty widely used and well documented component library. I realize that makes my use case fairly narrow, so I don't think I've discovered the limits you have.
Today I was trying to get it to add a temporary development shim: put a default value in the reducer so the app could consume that value from the Redux store. Depending on that value, the application would present different state.
It failed to accomplish this and added a disgusting amount of defensive nonsense code in my saga, reducer and component to ensure the value was there. It took me a very short time to correct it but just watching it completely fail at this task was borderline absurd.
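For anyone following along, here is a rough sketch of the kind of shim being described, not the actual code from that project. It assumes a plain reducer function with a default state argument and skips the Redux library and sagas entirely. Every name in it (`UiState`, `featureMode`, `uiReducer`, `describeView`, the action types) is made up for illustration.

```typescript
// All names below are hypothetical; this only illustrates the pattern.
type FeatureMode = "legacy" | "preview";

interface UiState {
  featureMode: FeatureMode;
}

type UiAction =
  | { type: "ui/featureModeSet"; mode: FeatureMode }
  | { type: "app/init" };

// Temporary development shim: change this default to exercise the other branch.
const initialState: UiState = { featureMode: "preview" };

// The default parameter supplies the value when the store has no state yet,
// so no extra guards are needed in the component or anywhere else.
function uiReducer(state: UiState = initialState, action: UiAction): UiState {
  switch (action.type) {
    case "ui/featureModeSet":
      return { ...state, featureMode: action.mode };
    default:
      return state;
  }
}

// What a component would read from the store.
const selectFeatureMode = (state: UiState): FeatureMode => state.featureMode;

// Stand-in for the component: it presents different output per value.
function describeView(state: UiState): string {
  return selectFeatureMode(state) === "preview" ? "preview UI" : "legacy UI";
}

// Simulate store start-up: undefined state plus an init action.
const booted = uiReducer(undefined, { type: "app/init" });
console.log(describeView(booted)); // prints "preview UI"
```

The whole change lives in the one `initialState` line, which is why the defensive checks in the saga, reducer and component were unnecessary.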
Thanks for the context! I feel the same way. When it fails, it fails hard. This is why I'm extremely skeptical of any of the non-CLI cloud solutions - as you observed, I think the failures compound and cascade if you don't stop them early, which requires a compelling interface and the ability to manually intervene very fast.