
Here [1] is the leaderboard from Chatbot Arena, where users vote on the outputs of two anonymous models. DeepSeek R1 needs more data points, but it has already climbed to No. 1 in the Style Control ranking, which is pretty impressive.

Here [2] are the results on more standard LLM benchmarks. They conveniently placed the results on the first page of the paper.

[1] https://lmarena.ai/?leaderboard

[2] https://arxiv.org/pdf/2501.12948 (PDF)


