Literally no difference in productivity from a free or sub-$0.50-per-million-output-token model on OpenRouter. Everything charging $1.00+ per million output tokens is a literal scam, with no added value to the world.
There are many problems where the latter spins its wheels and Pro gets it in one go, for me. You need to give Pro full files as context, and you need to fit within its ~60k-token (I forget exactly) silent context window if you're using it via ChatGPT. Don't have it make edits directly; have it hand the execution plan back to Codex.
I'm downloading DeepSeek-V3.2-Speciale now at FP8 (reportedly Gold-medal performance in the 2025 International Mathematical Olympiad and International Olympiad in Informatics).
It will fit in system RAM, and since it's a mixture-of-experts model and the experts aren't too large, I can at least run it. Tokens per second will be slower, but system memory bandwidth is somewhere around 500-600 GB/s, so it should feel OK.
Check out "--n-cpu-moe" in llama.cpp if you're not familiar. It lets you keep the expert weights for a chosen number of layers in system memory while everything else (including the context cache and the parts of the model that every token touches) stays in VRAM. You can do something like "-c 131072 -ngl 99 --n-cpu-moe <tuned_amt>", where you tune the number to maximize VRAM usage without OOMing.
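For example, a rough sketch of the invocation (llama-server is llama.cpp's server binary; the model filename and the --n-cpu-moe value here are placeholders you'd tune for your own hardware):

    # keep the MoE expert weights of the first 40 layers in system RAM (placeholder value),
    # offload the rest to the GPU, with a 128k context
    llama-server -m ./deepseek-v3.2.gguf -c 131072 -ngl 99 --n-cpu-moe 40

Start with a high --n-cpu-moe value and lower it while watching VRAM usage until you're just short of OOMing.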
It's not perfect (its monthly/yearly subscription detection didn't work great for me, for example), but compared to all those apps that involve trusting a third party with your banking data, it's worth a look.
During a GitHub outage you can push to any other Git server to keep sharing work, trigger a CI job, deploy, etc., and then push to GitHub again once it's reachable.
Yes, you lose some convenience (GitHub's pull request UI can't be used, for instance), but you can temporarily use the other Git server's UI for that.
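For example (the remote name and URL are made up, substitute whatever mirror you actually use):

    # one-time setup: add a second remote pointing at any other Git server
    git remote add mirror git@gitlab.example.com:me/myrepo.git
    # during the outage: share work / trigger CI on the mirror
    git push mirror main
    # once GitHub is reachable again
    git push origin main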
I think their point was that you're not fully locked in to GitHub. You have the repo locally and can mirror it on any Git remote.
- https://github.com/goccy/go-json
- https://github.com/bytedance/sonic