fariszr's comments

These flash models keep getting more expensive with every release.

Is there an OSS model that's better than 2.0 flash with similar pricing, speed and a 1m context window?

Edit: this is not the typical flash model, it's actually an insane value if the benchmarks match real world usage.

> Gemini 3 Flash achieves a score of 78%, outperforming not only the 2.5 series, but also Gemini 3 Pro. It strikes an ideal balance for agentic coding, production-ready systems and responsive interactive applications.

The replacement for the old Flash models will probably be 3.0 Flash Lite then.


Yes, but the 3.0 Flash is cheaper, faster and better than 2.5 Pro.

So if 2.5 Pro was good for your use case, you just got a better model for about a third of the price. It might hurt the wallet a bit more if you currently use 2.5 Flash and want an upgrade, which is fair tbh.


I agree, adding one point: a better model can in effect use fewer tokens if a higher percentage of your one-shot attempts succeed. I am a ‘retired gentleman scientist’ so take this with a grain of salt (I do a lot of non-commercial, non-production experiments): when I watch the output for tool use, better models have fewer tool ‘re-tries.’


I think it's good: they're raising the size (and price) of Flash a bit and trying to position it as an actually useful coding/reasoning model. There's always Lite for people who want dirt-cheap prices and don't care about quality at all.


Nvidia released Nemotron 3 nano recently and I think it fits your requirements for an OSS model: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B...

It's extremely fast on good hardware, quite smart, and can support up to 1M context with reasonable accuracy.
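If you want to kick the tires locally, here's a minimal sketch of serving it with vLLM. This assumes vLLM supports the architecture on your build and that you have the VRAM for it; the model ID is abbreviated in the Hugging Face link above, so the one below is a placeholder:

    # Minimal sketch, assuming vLLM is installed and supports this architecture.
    # The model ID below is a placeholder -- use the exact ID from the Hugging Face link.
    from vllm import LLM, SamplingParams

    llm = LLM(model="nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B")  # placeholder ID
    params = SamplingParams(temperature=0.2, max_tokens=256)
    outputs = llm.generate(["Explain what an exit node is in one sentence."], params)
    print(outputs[0].outputs[0].text)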


I second this: I have spent about five hours this week experimenting with Nemotron 3 nano for both tool use and code analysis, and it is excellent! And fast!

Relevant to the linked Google blog: I feel like getting Nemotron 3 nano and Gemini 3 flash in one week is an early Christmas gift. I have lived with the exponential improvements in practical LLM tools over the last three years, but this week seems special.


For my apps' evals, Gemini Flash and Grok 4 Fast are the only ones worth using. I'd love for an open-weights model to compete in this arena, but I haven't found one.


This one is more powerful than OpenAI models, including GPT 5.2 (which is worse on various benchmarks than 5.1, and that's with 5.2 using XHIGH whilst the others were on high, e.g.: https://youtu.be/4p73Uu_jZ10?si=x1gZopegCacznUDA&t=582 )

https://epoch.ai/benchmarks/simplebench


The cost of e2e task resolution should be lower: even if the single-inference cost is higher, you need fewer loops to solve a problem now.
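To make the point concrete with made-up numbers (not real Gemini prices):

    # Toy numbers only -- not real prices. The point is that fewer agent
    # loops can outweigh a higher per-call cost.
    old_cost_per_call, old_loops = 1.0, 6   # cheaper model, more retries
    new_cost_per_call, new_loops = 2.0, 2   # pricier model, fewer retries
    print(old_cost_per_call * old_loops)    # 6.0 -- old end-to-end cost
    print(new_cost_per_call * new_loops)    # 4.0 -- new end-to-end cost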


Sure, but for simple tasks that require a large context window, aka the typical use case for 2.0 Flash, it's still significantly more expensive.


This is a big jump in most benchmarks. And if it can match other models in coding while having that Google TPU inference speed and an actually native 1M context window, it's going to be a big hit.

I hope it isn't as much of a sycophant as the current Gemini 2.5 models; that makes me doubt its output, which is maybe a good thing now that I think about it.


> it's over for the other labs.

What's with the hyperbole? It'll tighten the screws, but saying that it's "over for the other labs" might be a tad premature.


I mean over in that I don't see a need to use the other models. Codex models are the best but incredibly slow. Claude models are not as good (IMO) but much faster. If Gemini can beat them while being faster and having better apps with better integrations, I don't see a reason why I would use another provider.


You should probably keep supporting competitors, since if there's a monopoly/duopoly you can expect prices to skyrocket.


> it's over for the other labs.

It's not over, and never will be, even for two-decade-old accounting software, so it definitely won't be over for other AI labs.


Can you explain what you mean by this? iPhone was the end of Blackberry. It seems reasonable that a smarter, cheaper, faster model would obsolete anything else. ChatGPT has some brand inertia, but not that much given it's barely 2 years old.


Yeah iPhone was the end of Blackberry but Google Pixel was not the end of iPhone.

The new Gemini is not THAT far of a jump that it's worth switching your org to a new model if you've already invested in e.g. OpenAI.

The difference would have to be night and day to call it "it's over".

Right, they're all marginally different. Today Google fine-tuned their model to be better, tomorrow it will be a new Kimi, after that DeepSeek.


Ask yourself why Microsoft Teams won. These are business tools first and foremost.


That's an odd take. Teams doesn't have the leading market share in videoconferencing, Zoom does. I can't judge what it's like because I've never yet had to use Teams - not a single company that we deal with uses it, it's all Zoom and Chime - but I do hear friends who have to use it complain about it all the time. (Zoom is better than it used to be, but for all that is holy please get rid of the floating menu when we're sharing screens)


https://github.com/FarisZR/knocker

Knocker, an HTTP-knock-based access service for your homelab that works at the reverse proxy or firewall level.

It's a more convenient, albeit less secure, alternative to VPNs like Tailscale. It's more convenient because it whitelists the entire network, and it's less secure for that same reason.
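To give a rough picture of the idea (this is not Knocker's actual API; the endpoint path, auth header, and TTL behaviour below are made up purely for illustration):

    # Purely illustrative sketch of an "HTTP knock" flow; the endpoint path
    # and auth header are hypothetical, not Knocker's real API.
    import requests

    resp = requests.post(
        "https://knock.example.com/knock",            # hypothetical knock endpoint
        headers={"Authorization": "Bearer <token>"},  # hypothetical shared secret
        timeout=10,
    )
    resp.raise_for_status()
    # On success, the server would whitelist the caller's public IP at the
    # reverse proxy / firewall for a limited TTL, after which the rule expires.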


Is there a way to force clients to use a relay? It seems like this is only meant as a fallback, but what if a relayed connection is actually faster (like when direct peering between tailnet members is slow, which is not rare on consumer connections)?


Just set the relay up as an exit node too
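Roughly, using Tailscale's standard exit-node flags (a sketch only, wrapped in Python just to annotate the steps; "relay-host" is a made-up hostname):

    # Sketch: run the first command on the relay node, the second on each client.
    # --advertise-exit-node and --exit-node are standard tailscale flags;
    # "relay-host" is a made-up hostname.
    import subprocess

    subprocess.run(["tailscale", "up", "--advertise-exit-node"], check=True)      # on the relay
    # subprocess.run(["tailscale", "set", "--exit-node=relay-host"], check=True)  # on each client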


Hey I'm the creator of knocker! I actually wanted to write a blog post about it before posting, but OP already did that. If you have any questions just let me know!

I'll go into more detail on why I created it in the blog post coming very soon! Just doing the final touches right now.


The TTL is for the whitelist. The whitelist rules aren't permanent.


Sorry if I wasn't clear. It isn't more secure, it's just more convenient, because it works on every network without needing to set up a VPN connection on each device.

I created this because I always have a VPN on my devices, and I can't have Tailscale running alongside that, in addition to Tailscale killing my battery life on Android.


> Why not just run a reverse proxy in front of (whatever service you're trying to protect) and use the API keys there?

Because it breaks the clients of most homelab services.

That's what Authelia does.


It's a compromise. It's not as secure as using a VPN, but it's way more convenient, since only one device has to have a Knocker client on it, without needing any sort of VPN.

The likelihood of someone on the same network as you noticing your service and trying to hack it before the TTL expires again is IMO quite low.

This is without taking into account that the services themselves have their own security and login processes; getting a port open doesn't mean the service is hacked.


Tailscale is not as easy as this. It has to be installed on every device or at the router level.

And it will not work on mobile if you already use another VPN.

