PhilippGille's comments | Hacker News


Gemini 3 Pro Preview also got more expensive than 2.5 Pro.

2.5 Pro: $1.25 input, $10 output (per million tokens)

3 Pro Preview: $2 input, $12 output (per million tokens)
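
To put the increase in concrete terms, here's a rough cost comparison for a hypothetical request of 100k input tokens and 10k output tokens (my example sizes, not from the pricing page):

    # hypothetical request size: 100k input + 10k output tokens
    echo "0.1*1.25 + 0.01*10" | bc -l   # Gemini 2.5 Pro:        ~$0.225
    echo "0.1*2.00 + 0.01*12" | bc -l   # Gemini 3 Pro Preview:  ~$0.32

That's a bit over a 40% price increase for this mix of input and output.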


Literally no difference in productivity from a free or <$0.50-per-million-output OpenRouter model. All these $1.00+-per-million-output models are literal scams. No added value to the world.

5.1 Pro is great

I struggle to see where Pro is better than 5.x with Thinking. Actually prefer the latter.

There are many problems where the latter spins its wheels and Pro gets it in one go, for me. You need to give Pro full files as context, and you need to fit within its ~60k (I forget the exact number) silent context window if using it via ChatGPT. Don't have it make edits directly; have it give the execution plan back to Codex.

> He has GLM 4.5 Running at ~100 Tokens per second.

GLM 4.5 Air, to be precise. It's the smaller 106B model, not the full 355B one.

Worth mentioning when discussing token throughput.


I'm downloading DeepSeek-V3.2-Speciale now at FP8 (reportedly Gold-medal performance in the 2025 International Mathematical Olympiad and International Olympiad in Informatics).

It will fit in system RAM, and since it's a mixture-of-experts model and the experts are not too large, I can at least run it. Token/second speed will be slower, but with system memory bandwidth somewhere around 500-600 GB/s, it should feel OK.
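
As a back-of-the-envelope check (assuming roughly DeepSeek-V3-style ~37B active parameters per token, which I haven't verified for V3.2, at ~1 byte per parameter in FP8):

    # upper bound on decode speed: memory bandwidth / bytes read per token
    echo "scale=1; 550 / 37" | bc   # ~14.8 tokens/s theoretical ceiling at 550 GB/s

Real-world speed will be lower, but that order of magnitude should still feel usable.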


Check out "--n-cpu-moe" in llama.cpp if you're not familiar. That allows you to force a certain number of experts to be kept in system memory while everything else (including context cache and the parts of the model that every token touches) is kept in VRAM. You can do something like "-c128k -ngl 99 --n-cpu-moe <tuned_amt>" where you find a number that allows you to maximize VRAM usage without OOMing.

Why not? A good status page runs on a different cloud provider in a different region, specifically to not be affected at the same time.


Multiple projects for autonomous multi agent teams already exist.


Qwen2.5-VL-7B to be precise. It's a relevant difference.


Related from July:

"Linux on Snapdragon X Elite: Linaro and Tuxedo Pave the Way for ARM64 Laptops"

291 points, 217 comments

https://news.ycombinator.com/item?id=44699393


The first comment there is worth reading again, just for this sentence:

> If you want to change some settings oft[sic] the device, you need to use their terrible Electron application.


Not mentioned yet in this subthread, but worth checking out because it runs fully local: https://play.google.com/store/apps/details?id=com.stoegerit....

It's not perfect (for example, its monthly/yearly subscription detection didn't work great for me), but compared to all those apps that require trusting a third party with your banking data, it's worth a look.


You can push to any other Git server during a GitHub outage to still share work, trigger a CI job, deploy, etc., and later, when GitHub is reachable again, you push there too.

Yes, you lose some convenience (GitHub's pull request UI can't be used, for example), but you can temporarily use the other Git server's UI for that.
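
As a minimal sketch (the remote name and URL are placeholders), the fallback is just a second remote:

    # one-time setup: add a secondary remote on another Git host
    git remote add fallback git@gitlab.example.com:me/repo.git
    # during the outage, share work via the fallback
    git push fallback my-branch
    # once GitHub is reachable again, push the same commits there as usual
    git push origin my-branch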

I think their point was that you're not fully locked in to GitHub. You have the repo locally and can mirror it on any Git remote.


For sure, you don’t have to use GitHub to be that shared server.

It is awfully convenient: web interface, per-branch permissions, and such.

But you can choose a different server.


If your whole network is down, and you also don't want to connect the hosts with an Ethernet cable, you can even just push to a USB stick.
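
For example (the mount point is a placeholder), a bare repository on the stick works as a plain file-based remote:

    # on machine A: create a bare repo on the mounted stick and push to it
    git init --bare /media/usb/repo.git
    git remote add usb /media/usb/repo.git
    git push usb main
    # on machine B: plug the stick in and clone or pull from it
    git clone /media/usb/repo.git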

