
Does a 40-60% failure rate fit your definition of correctness? I don't think they really needed to spell this failure out...

https://www.salesforce.com/blog/why-generic-llm-agents-fall-...


