I agree with a lot of the first part. The only time I actually feel productive with them is when I have a short feedback cycle with 100% proof of whether the result is correct or not; as soon as "manual human verification" is needed, things spiral out of control quickly.
> Sure, the machine can give you its sources but it won't tell you about sources it ignored.
You can prompt for that though; include something like "Include all the sources you came across, and explain why you think they were irrelevant" and, unsurprisingly, it'll include those. I've also added a "verify_claim" tool which it is instructed to use for any claims before sharing a final response; it checks each claim inside a brand-new context, one call per claim. So far it works great for me with GPT-OSS-120b as a local agent with access to search tools.
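For the curious, here's roughly what the "one call per claim" part looks like, heavily simplified. This is a sketch rather than my actual code: the endpoint, model name and prompt wording are placeholders, and it assumes llama-server is running locally with its OpenAI-compatible API (plus reqwest's "blocking" and "json" features).

    use serde_json::json;

    // Each claim gets a brand-new conversation, so nothing from the main
    // context can bias the verdict. Endpoint, model name and prompt wording
    // are placeholders.
    fn verify_claim(claim: &str) -> Result<String, reqwest::Error> {
        let body = json!({
            "model": "gpt-oss-120b",
            "messages": [
                { "role": "system",
                  "content": "You are a fact checker. Verify the claim and \
                              answer SUPPORTED, REFUTED, or UNCLEAR, with a \
                              one-line reason." },
                { "role": "user", "content": claim }
            ]
            // The real thing also attaches a subset of the tools
            // (e.g. web_search) here.
        });
        let resp = reqwest::blocking::Client::new()
            .post("http://localhost:8080/v1/chat/completions")
            .json(&body)
            .send()?;
        resp.text()
    }

    fn main() -> Result<(), reqwest::Error> {
        // One call per claim: each is verified independently, never batched.
        for claim in ["Rust 1.0 was released in 2015",
                      "YaCy is a peer-to-peer search engine"] {
            println!("{claim}: {}", verify_claim(claim)?);
        }
        Ok(())
    }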
Not everyone uses LLMs the same way, which is made extra clear by the announcement this submission is about. I don't want conversational LLMs, but it seems that perspective isn't shared by absolutely everyone, and that makes sense; how you like to be talked/written to is a subjective thing.
> Explain your setup in more detail please?
I don't know what else to tell you that I haven't said already :P Not trying to be obtuse, just don't know what sort of details you're looking for. In more specific terms: I'm using llama.cpp (llama-server) as the "runner", and then I have a Rust program that acts as the CLI for my "queries" and makes HTTP requests to llama-server. The requests to llama-server include "tools", where one of those is a "web_search" tool hooked up to a local YaCy instance, and another is "verify_claim", which basically starts a new, separate conversation inside the same process, with access to a subset of the tools. Is that helpful at all?
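If it helps, the "tools" field in the request body looks something like this. Sketched from memory; the parameter schemas are illustrative, not copied from my code:

    use serde_json::{json, Value};

    // The "tools" array sent with each /v1/chat/completions request to
    // llama-server. Names match what I described above; the schemas are
    // illustrative.
    fn tool_definitions() -> Value {
        json!([
            {
                "type": "function",
                "function": {
                    "name": "web_search",
                    "description": "Search the web via the local YaCy instance.",
                    "parameters": {
                        "type": "object",
                        "properties": { "query": { "type": "string" } },
                        "required": ["query"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    // Starts a separate conversation in the same process,
                    // with access to a subset of the tools.
                    "name": "verify_claim",
                    "description": "Verify a single factual claim in a fresh context.",
                    "parameters": {
                        "type": "object",
                        "properties": { "claim": { "type": "string" } },
                        "required": ["claim"]
                    }
                }
            }
        ])
    }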
"one call per claim" I wonder how long it takes for it to be common knowledge how important this is. Starting to think never. Great idea by the way, I should try this.
I've been trying to figure out ways of highlighting why it's important and how it actually works; maybe some heatmap of the attention over previous tokens, so people can see visually how messed up things become once even two concepts are mixed in the same context.