This is true, but here the equivalent situation is someone using a Greek question mark (";") instead of a semicolon (";"), and you as a code reviewer are only expected to review the code visually and are not provided the resources required to compile the code on your local machine to see the compiler fail.
Yes, in theory you can go through every semicolon to check that it's not actually a Greek question mark; but one assumes good faith and baseline competence, such that you as the reviewer would generally not be expected to perform such pedantic checks.
So if you think you might have reasonably missed Greek question marks in a visual code review, then hopefully you can also appreciate how a paper reviewer might miss a false citation.
> as a code reviewer [you] are only expected to review the code visually and are not provided the resources required to compile the code on your local machine to see the compiler fail.
As a PR reviewer I frequently pull down the code and run it. Especially if I'm suggesting changes because I want to make sure my suggestion is correct.
I don't commonly do this and I don't know many people who do this frequently either. But it depends strongly on the code, the risks, the gains of doing so, the contributor, the project, the state of testing and how else an error would get caught (I guess this is another way of saying "it depends on the risks"), etc.
E.g. you can imagine that if I'm reviewing changes in authentication logic, I'm obviously going to put a lot more effort into validation than if I'm reviewing a container and wondering if it would be faster as a hashtable instead of a tree.
> because I want to make sure my suggestion is correct.
In this case I would just ask "have you already tried X?", which is much faster than pulling their code, implementing your suggestion, and waiting for a build and test to run.
I do too, but this is a conference, I doubt code was provided.
And even then, what you're describing isn't review per se, it's replication. In principle there are entire journals that one can submit replication reports to, which count as actual peer-reviewed publications in themselves. So one needs to be pragmatic about what is expected from a peer review (especially given the imbalance between the resources invested to create the work and the lack of resources, or any meaningful reward, offered to the reviewer).
> I do too, but this is a conference, I doubt code was provided.
Machine learning conferences generally encourage (anonymized) submission of code. However, that still doesn't mean that replication is easy. Even if the data is also available, replication of results might require impractical levels of compute power; it's not realistic to ask a peer reviewer to pony up for a cloud account to reproduce even medium-scale results.
No, because this is usually a waste of time: CI enforces that the code and the tests run at submission time. If your CI isn't doing that, you should put some work into configuring it.
If you regularly have to do this, your codebase should probably have more tests. If you don't trust the author, you should ask them to include test cases for whatever it is that you are concerned about.
If there’s anything I would want to run to verify, I ask the author to add a unit test. Generally, the existing CI test + new tests in the PR having run successfully is enough. I might pull and run it if I am not sure whether a particular edge case is handled.
Reviewers wanting to pull and run many PRs makes me think your automated tests need improvement.
> This is true, but here the equivalent situation is someone using a Greek question mark (";") instead of a semicolon (";"),
No, it's not. I think you're trying to make a different point, because you're using an example of a specific, deliberate, malicious way to hide a token error that prevents compilation but is visually similar.
> and you as a code reviewer are only expected to review the code visually and are not provided the resources required to compile the code on your local machine to see the compiler fail.
What weird world are you living in where you don't have CI? Also, it's pretty common that I'll test code locally when reviewing something more complex or more important, if I don't have CI.
> Yes, in theory you can go through every semicolon to check that it's not actually a Greek question mark; but one assumes good faith and baseline competence, such that you as the reviewer would generally not be expected to perform such pedantic checks.
I don't, because it won't compile. Not because I assume good faith. References and citations are similar to introducing dependencies, and we're talking about completely fabricated deps: e.g., this engineer went on npm and grabbed the first package that said "left-pad", but it's actually a crypto miner. We're not talking about a citation missing a page number or publication year. We're talking about something completely incorrect being represented as relevant.
> So if you think you might have reasonably missed Greek question marks in a visual code review, then hopefully you can also appreciate how a paper reviewer might miss a false citation.
I would never miss this, because the important thing is that code needs to compile. If it doesn't compile, it doesn't reach the master branch. Peer review of a paper doesn't have CI, I'm aware, but a paper also isn't vulnerable to syntax errors like that: a paper with a fake semicolon isn't meaningfully different from one without, so this analogy doesn't map to the fraud I'm commenting on.
You have completely missed the point of the analogy.
Breaking the analogy beyond the point where it is useful, by introducing non-generalising specifics, is not a useful argument. Otherwise I could counter your more specific, non-generalising analogy by introducing little green aliens sabotaging your imaginary CI, with the same ease and effect.
I disagree you could do that and claim to be reasonable.
But I agree, because I'd rather discuss the pragmatics than bicker over the semantics of an analogy.
Introducing a token error is different from plagiarism, no? Someone writing code that can't compile is different from someone "stealing" proprietary code from some company and contributing it to some FOSS repo?
In order to assume good faith, you also need to assume the author is the origin. But that's clearly not the case: the origin is somewhere else, and the author who put their name on the paper didn't verify it and didn't credit it.
Sure, but the focus here is on the reviewer, not the author.
The point is what is expected as reasonable review before one can "sign their name on it".
"Lazy" (or possibly malicious) authors will always have incentives to cut corners as long as no mechanisms exist to reject (or even penalise) the paper on submission automatically. Which would be the equivalent of a "compiler error" in the code analogy.
Effectively the point is that, in the absence of such tools, the reviewer can only reasonably be expected to "look over the paper" for high-level issues; catching such low-level issues via manual checks has massively diminishing returns for the extra effort involved.
So I don't think it's appropriate for the conference to shame the reviewers here while providing no such tooling.
Code correctness should be checked automatically with CI and the test suite, and new tests should be added. This is exactly what keeps these stupid errors from bothering the reviewer. The same goes for code formatting and documentation.
This discussion makes me think peer review needs more automated tooling, somewhat analogous to what software engineers have long relied on. For example, a tool could use an LLM to check that a citation actually substantiates the claim the paper says it does, or else flag the claim for review.
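A rough sketch of what such a check might look like (entirely hypothetical: ask_llm is a stand-in for whatever model call you'd actually use, and the verdict format is made up for illustration):

    # Hypothetical sketch of a citation-substantiation check.
    # `ask_llm` is a placeholder for a real model call.

    def ask_llm(prompt: str) -> str:
        """Placeholder: send `prompt` to an LLM and return its reply."""
        raise NotImplementedError

    def citation_supports_claim(claim: str, cited_abstract: str) -> bool:
        """True if the model judges the cited work to substantiate the claim."""
        prompt = (
            "Claim made in the paper:\n"
            f"{claim}\n\n"
            "Abstract of the cited work:\n"
            f"{cited_abstract}\n\n"
            "Does the cited work substantiate the claim? "
            "Answer exactly UNSUPPORTED or SUPPORTED."
        )
        return ask_llm(prompt).strip().upper() == "SUPPORTED"

    def flag_for_review(claims_with_abstracts):
        """Yield the (claim, abstract) pairs a human reviewer should look at."""
        for claim, abstract in claims_with_abstracts:
            if not citation_supports_claim(claim, abstract):
                yield claim, abstract

Whether the model's verdicts are reliable enough is a separate question, but even a noisy flag would give reviewers a shortlist to spot-check instead of every reference.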
I'd go one further and say all published papers should come with a clear list of "claimed truths", and one should only be able to cite said paper by linking to an explicit claimed truth.
Then you can build a true hierarchy of citation dependencies, checked 'statically', and have better indications of impact if a fundamental truth is disproven, ...
Could you provide a proof of concept paper for that sort of thing? Not a toy example, an actual example, derived from messy real-world data, in a non-trivial[1] field?
---
[1] Any field is non-trivial when you get deep enough into it.
Hey, I'm part of the GPTZero team that built the automated tooling used to get the results in that article!
Totally agree with your thinking here: we can't just hand this to an LLM, because of the need for industry-specific standards for what counts as a hallucination or a match, and for how to do the search.
One could submit their BibTeX files and expect the citations to be verifiable using a low-level checker.
Worst case, if your BibTeX citation were a variant of one in the checker's database, you'd be asked to correct it to match the canonical version.
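A rough sketch of the kind of low-level check I mean, assuming each entry carries a DOI (the Crossref lookup and the naive exact-title comparison are illustrative simplifications, not a real checker):

    # Sketch: verify a BibTeX entry's DOI resolves, and compare its title
    # against the canonical record returned by Crossref.

    import requests

    def canonical_title(doi: str) -> str | None:
        """Canonical title for a DOI via Crossref, or None if it doesn't resolve."""
        resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
        if resp.status_code != 200:
            return None          # DOI not found: likely a fabricated citation
        return resp.json()["message"]["title"][0]

    def check_entry(doi: str, title_in_bib: str) -> str:
        canon = canonical_title(doi)
        if canon is None:
            return "REJECT: DOI does not resolve"
        if canon.strip().lower() != title_in_bib.strip().lower():
            return f"FIX: title differs from canonical record ({canon!r})"
        return "OK"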
However, as others here have stated, hallucinated "citations" are actually the lesser problem. Citing irrelevant papers based on a fly-by reference is a much harder problem; it was present even before LLMs, but it has now become far worse with them.
Yes, I think verifying mere existence of the cited paper barely moves the needle. I mean, I guess automated verification of that is a cheap rejection criterion, but I don’t think it’s overall very useful.
This is still in beta because it's a much harder problem for sure, since it's hard to determine whether a 40-page paper supports a claim (if the paper claims X is computationally intractable, does that mean algorithms to compute approximate X are slow?).
I don't know about real-world "examples", but the beauty of tail-call recursion specifically is the theoretical insight that it has a one-to-one mapping with a loop-based equivalent formulation, and vice versa (which is not necessarily true of recursion in general).
But for languages that don't have loop constructs, where you need to rely on recursion, all you need to do is write your recipe in standard loop form and then map it back to tail-call syntax. This is often a LOT easier than trying to think of the problem in a recursive mindset from scratch (though occasionally the reverse is also true).
So the only constraint on re-implementing such looped logic as tail calls is that it relies on the stack, which may overflow. By providing TCO you are effectively removing that restriction, so it's a very useful thing for a language to support (especially if it doesn't provide low-level loops).
The title "tail call optimisation" in the package above is a bit of a misnomer, since this is more of a "transformation" than an "optimisation", but effectively the whole loop-tailcall equivalence is exactly what the package mentioned above relies on to work; it uses decorators to transform tail-call recursive functions to their equivalent loop-based formulations, and thus passing the need to create multiple stacks for the recursion (and risk stack overflow), since the translated loop will now take place in a single stack frame.
But I suspect you're talking about tail recursion rather than TCO specifically. Otherwise the only sensible answer is: why on earth wouldn't you want that if you could have it for free?
So as for tail-recursion examples, one nice example I had in the past, which made thinking about the problem a lot easier than loops, was when I was designing a 3D maze-like game. The recursion allowed me to draw each subsequent "step" visible on the screen without having to know in advance how many steps should be visible: you just draw the "next" room at increasing vanishing distance, until you hit a "wall" (the base case). It was a very simple, elegant result for minimal code, where the equivalent loop would have been long and horrible.
There isn't a killer use case, because tail calls (to yourself or to siblings) can always be easily converted to a loop, and the loop is more idiomatic in most mainstream languages.
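For instance (my own throwaway example, not something from the thread), a tail-recursive sum converts mechanically: the tail call becomes the next loop iteration, and rebinding the parameters replaces recursing.

    def total(items, acc=0):
        if not items:
            return acc
        return total(items[1:], acc + items[0])        # tail call: nothing left to do after it

    def total_loop(items, acc=0):
        while items:                                   # the tail call becomes the loop's back edge
            items, acc = items[1:], acc + items[0]     # rebinding parameters replaces recursing
        return acc

    assert total([1, 2, 3, 4]) == total_loop([1, 2, 3, 4]) == 10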
> All users who host projects on SourceHut are expected to pay according to their means. choose the subscription plan most appropriate to your means — there is no difference between the subscriptions besides price.
Interesting approach; asking for some money upfront to cover the actual hosting costs and other expenses feels pretty good, rather than having to worry about shady monetization and whether your data is the product.
Drew's direct engagement in tech cancel culture (with targets such as DHH, RMS, Andreas Kling, Jack Dorsey) makes it difficult to do business with him (assuming the hosted SourceHut service as an alternative to Codeberg). Furthermore, at the newly proposed service rates it is much more liberating to self-host (any lightweight forge, including SourceHut).
I haven't followed closely, but the few times I did, it seemed that he had reasonably nuanced opinions translating into upholdable values, rather than overzealous cancel-fever, whether I agreed with those opinions or not. To me this is not reason enough not to use his product, and I happen to like his product (much more than the alternatives, anyway).
Also, it would be remiss of me not to appreciate the irony that you're effectively suggesting "cancelling" his business over his opinions which you consider of a "cancelly" nature ...
I was trying to find an e-ink tablet, and Amazon kept recommending the Magic Notepad from XPPen. It looked good, but I wasn't sure what that cryptic "X-Paper display" was. The wording is just vague enough to make you think it's an e-paper display, without committing to that detail.
It took going through comments to find out that it's not an e-ink display.
The Daylight computer seems like that too. So what do you think of the display? Is it just another LED screen, or does it approach e-ink in any way?
I really hated how they marketed that tablet. Some weird statements about how it will transform your life, all while making meaningless comparisons about framerate by just not acknowledging the difference between e-ink and transflective LCDs (to the point I found it intentionally misleading).