> You need iteration and I believe these kinds of AI have the same issues as us.

It's funny how we resort to humanizing the machines when their results are inaccurate. We don't do that with the calculator, because it's expected to be 100% bug free. When there's a bug in the calculator code we expect it to be fixed, not gradually improved.

Speaking of bugs: mistakes in code are one thing; wrong output because of a fundamental flaw in the algorithm is another. The statistical machines we are dealing with work as intended, or at least the wrong output the top comment here brings up is not a bug, it's a feature. That's the difference.



LLMs literally get much better with chain of thought, feedback, and/or consensus.

GPT-3 performance on MultiArith goes from 18% to 92% with all three. This isn't some hackneyed anthropomorphizing. Countless research papers show massive improvement with these processes.
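
To make the "consensus" part concrete, here is a minimal sketch of majority voting over several sampled answers (often called self-consistency). The function name `majority_answer` and the sample answers are made up for illustration; in a real setup the samples would come from repeated model calls, which are not shown here.

```python
from collections import Counter


def majority_answer(samples):
    """Return the most frequent answer and its share of the votes."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    # Strip whitespace so "42" and "42 " count as the same vote.
    votes = Counter(s.strip() for s in samples)
    answer, count = votes.most_common(1)[0]
    return answer, count / len(samples)


# Hypothetical final answers extracted from five sampled reasoning chains.
samples = ["42", "42", "41", "42 ", "40"]
answer, share = majority_answer(samples)
print(answer, share)
```

The vote share is a rough agreement signal: a low share suggests the samples disagree and the answer deserves less trust.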


That's (IMO) too narrow a view of what a "machine" is. Complex machinery of any kind never is 100% correct and needs constant correction and maintenance. I still think approaching this as a "calculator" is awkward at best.


> Complex machinery of any kind never is 100% correct and needs constant correction and maintenance

Computers are extremely close to 100%; we generally expect a CPU to never make errors, even after years of operation. If it starts making any errors at all, we throw it away and make a new one.


This is a very weird statement; it fails by conflating two logical categories.

My computer will pretty much add 1+1 correctly forever, never making a mistake.

My computer will perform an 'error' every time I put bad code into it, and some of those logic chains and error conditions are not very obvious.

The issue here is you think the LLM is performing a category 1 error, when the problem we are seeing is a much more human-like category 2 error.


> Computers are extremely close to 100%

We must work in extremely different industries!


Do you write checks to verify the calculations made by the CPU? I've never seen anyone do that. If a CPU starts making errors, we throw it away. A typical CPU will make many quadrillions of correct calculations before its first error; I'd say that is basically 0 errors.



