Claude is good at writing code, not so good at reasoning, and I would never trust or deploy to production something solely written by Claude.
GPT-5.2 is not as good for coding, but much better at thinking and finding bugs, inconsistencies and edge cases.
The only decent way I've found to use AI agents is by doing multiple steps between Claude and GPT, asking GPT to review every step of every plan and every single code change from Claude, and manually reviewing and tweaking questions and responses both ways, until all parties, including myself, agree. I also sometimes introduce other models like Qwen and K2 into the mix for a different perspective.
And gosh, by doing so you immediately realize how dumb, unreliable and dangerous code generated by Claude alone is.
It's a slow and expensive process, and at the end of the day it doesn't save me time at all. But, perhaps counterintuitively, it gives me more confidence in the end result. The code is guaranteed to have tons of tests and coverage for edge cases that I may not have thought about myself.
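Roughly, the loop looks like this. Just a sketch: the model names, prompts, and the round cap are placeholders, the SDK calls are the stock Anthropic and OpenAI Python clients, and in practice I drive this by hand rather than scripting it:

    import anthropic
    import openai

    claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
    gpt = openai.OpenAI()           # reads OPENAI_API_KEY

    def claude_write(task: str) -> str:
        # Claude drafts the plan or the code change.
        msg = claude.messages.create(
            model="claude-sonnet-4-5",  # placeholder model name
            max_tokens=4096,
            messages=[{"role": "user", "content": task}],
        )
        return msg.content[0].text

    def gpt_review(draft: str) -> str:
        # GPT reviews the draft for bugs, inconsistencies and edge cases.
        resp = gpt.chat.completions.create(
            model="gpt-5.2",  # placeholder model name
            messages=[{
                "role": "user",
                "content": "Review this for bugs, inconsistencies and "
                           "missed edge cases. Reply APPROVED if none:\n\n"
                           + draft,
            }],
        )
        return resp.choices[0].message.content

    task = "Plan and implement <change>"  # placeholder task
    draft = claude_write(task)
    for _ in range(5):  # bounded back-and-forth
        review = gpt_review(draft)
        if review.strip().startswith("APPROVED"):
            break
        # Feed the review back to Claude. This is the step where
        # I read and tweak both sides by hand before continuing.
        draft = claude_write(task + "\n\nReviewer feedback:\n" + review)

The important part is the feedback edge: nothing is done until the reviewer signs off, and I read both sides of every exchange.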
Would love to know myself. I recall there was some VSCode plugin that did next edits and accepted a custom model, but I can't remember what it was called now.
Because the authors found out about it by chance on Hacker News.
That said, these issues are not a big deal.
The first one concerns someone manually reading a signature with cat (which is completely untrusted at that stage, since nothing has been verified), then using the actual tool meant to parse it, and ignoring that tool’s output. cat is a different tool from minisign.
If you manually cat a file, it can contain arbitrary characters, not just in the specific location this report focuses on, but anywhere in the file.
The second issue is about trusting an untrusted signer who could include control characters in a comment.
In that case, a malicious signer could just make the signed file itself malicious as well, so you shouldn’t trust them in the first place.
Still, it’s worth fixing. In the Zig implementation of minisign, these characters are escaped when printed. In the C implementation, invalid strings are now rejected at load time.
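For the curious, the shape of both fixes is simple. This is a Python sketch of the idea, not the actual Zig or C code, and the exact character classes and escaping rules there may differ:

    def check_comment(comment: str) -> str:
        # C-implementation approach: reject at load time. Control
        # characters (including DEL) could smuggle terminal escape
        # sequences into the comment lines of a signature file.
        if any(ord(c) < 0x20 or ord(c) == 0x7f for c in comment):
            raise ValueError("invalid character in comment")
        return comment

    def escape_comment(comment: str) -> str:
        # Zig-implementation approach: escape when printing, so
        # whatever reaches the terminal is plain printable text.
        return "".join(
            c if c.isprintable() else repr(c)[1:-1] for c in comment
        )

Either way, the escape sequences never reach the terminal unmangled.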
I end up using most of these models with Claude Code as the tooling, because it just seems to work better than anything else. Crush also works well.