Here [1] is the leaderboard from Chatbot Arena, where users vote on the outputs of two anonymous models. DeepSeek R1 needs more data points, but it has already climbed to No. 1 in the Style Control ranking, which is pretty impressive.
Link [2] to the results on more standard LLM benchmarks. They conveniently placed the results on the first page of the paper.
[1] https://lmarena.ai/?leaderboard
[2] https://arxiv.org/pdf/2501.12948 (PDF)