Code correctness should be checked automatically by CI and the test suite, and new tests should be added. That is exactly what keeps these stupid errors from bothering the reviewer. The same goes for code formatting and documentation.
This discussion makes me think peer review needs more automated tooling, somewhat analogous to what software engineers have long relied on. For example, a tool could use an LLM to check that a citation actually substantiates the claim the paper says it does, or else flag the claim for review.
I'd go one further and say all published papers should come with a clear list of "claimed truths", and one should only be able to cite a paper by linking to one of its explicit truths.
Then you could build a true hierarchy of citation dependencies, checked 'statically', and get better indications of the impact when a fundamental truth is disproven, ...
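To make the idea concrete, here's a minimal sketch of that 'static' check: claims are (paper, claim_id) pairs, citations are edges between specific claims, and disproving one claim flags everything transitively downstream. All the paper names, claim IDs, and edges here are invented for illustration.

```python
from collections import defaultdict, deque

# Hypothetical data: each citation links a citing claim to a specific cited claim.
deps = {
    ("B", 1): [("A", 2)],   # paper B's claim 1 rests on paper A's claim 2
    ("C", 1): [("B", 1)],
    ("C", 2): [("A", 1)],
}

# Invert to: claim -> claims that depend on it.
rdeps = defaultdict(list)
for claim, cited in deps.items():
    for c in cited:
        rdeps[c].append(claim)

def impacted(disproven):
    """Return every claim that transitively depends on a disproven claim."""
    seen, queue = set(), deque([disproven])
    while queue:
        node = queue.popleft()
        for dep in rdeps[node]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen
```

Disproving A's claim 2 would flag B#1 and, transitively, C#1, while leaving C#2 untouched; that's the "better indication of impact" a claim-level citation graph buys you.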
Could you provide a proof of concept paper for that sort of thing? Not a toy example, an actual example, derived from messy real-world data, in a non-trivial[1] field?
---
[1] Any field is non-trivial when you get deep enough into it.
I'd say my expectation is papers should be minimal in their effect, and compounding. If your project proves new facts, either they should be clearly enumerable (with as much specificity as possible), or your project/presentation/paper should be broken up to the point your findings ARE enumerable.
hey, i'm part of the gptzero team that built the automated tooling behind the results in that article!
totally agree with your thinking here: we can't just hand this to an LLM, because you need industry-specific standards for what counts as a hallucination or a match, and for how to do the search
One could submit their BibTeX files and have the citations verified by a low-level checker.
Worst case, if your BibTeX citation was a variant of one in the checker's database, you'd be asked to correct it to match the canonical version.
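A rough sketch of what such a low-level checker could look like: crudely parse the submitted entry's fields, look it up by DOI in a canonical database, and report whether it matches or is a variant needing correction. The DOI and database here are invented; a real checker would query Crossref or similar.

```python
import re

# Hypothetical canonical database keyed by DOI (the DOI below is made up).
CANONICAL = {
    "10.0000/example.doi": {"title": "attention is all you need", "year": "2017"},
}

def fields(entry):
    """Crudely extract field = {value} pairs from one BibTeX entry."""
    return {k.lower(): v for k, v in re.findall(r"(\w+)\s*=\s*\{([^}]*)\}", entry)}

def normalize(s):
    """Lowercase and strip punctuation so trivial variants still match."""
    return re.sub(r"[^a-z0-9 ]", "", s.lower()).strip()

def check(entry):
    """Return (ok, canonical_record_or_None) for a submitted entry."""
    f = fields(entry)
    canon = CANONICAL.get(f.get("doi", ""))
    if canon is None:
        return False, None   # unknown citation: flag for human review
    ok = (normalize(f.get("title", "")) == normalize(canon["title"])
          and f.get("year") == canon["year"])
    return ok, canon         # ok=False means "variant: correct to canonical"

entry = """@article{vaswani2017,
  title = {Attention Is All You Need},
  year = {2017},
  doi = {10.0000/example.doi}
}"""
```

When `check` returns a canonical record with `ok=False`, the tool can show the author the canonical version to copy in, which is exactly the worst-case flow described above.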
However, as others here have stated, hallucinated "citations" are actually the lesser problem. Citing irrelevant papers based on a fly-by reference is a much harder problem; it existed even before LLMs, but LLMs have made it far worse.
Yes, I think verifying the mere existence of the cited paper barely moves the needle. I mean, automated verification of that is a cheap rejection criterion, I guess, but it's not very useful overall.
this is still in beta because it's a much harder problem for sure, since it's hard to determine whether a 40-page paper supports a claim (if the paper claims X is computationally intractable, does that mean algorithms to compute approximate X are slow?)