
Looks like the best way to keep improving the models is to come up with really useful benchmarks and make them popular. ARC-AGI-2 is a big jump; I'd be curious to find out how that transfers to everyday tasks in various fields.

