You know, I interviewed a potential hire last week. His resume was exceptional in many ways, and his open-source code looked really tight. But at the beginning of every interview I show candidates the same silly code example with signed integer overflow undefined behavior baked in. I did the same here and asked him whether he saw anything unusual in it, and he failed to spot it. We closed the round immediately and I gave a no-hire decision.
Does the ability to spot gotchas on the fly, in a short conversation with nothing but text on a screen or whiteboard, really identify stronger candidates?
In real work you have documentation, an editor, tooling, and tests, and you are a tad less distracted than in a job interview with all its attendant stress. Isn't the fact that he actually produces quality code in real life a stronger signal of quality?
It's bias, and in my experience many people do not know how to assess a candidate in a way that brings out their best. Luckily, my example was just a made-up illustration that sarcastically mirrors how people assess LLM capabilities nowadays. No difference.