Aristotle's output is formally verified in Lean, so you can run it for days on a...

jjmarr · 2026-01-10T07:07:14 1768028834

Seeing a task-specific model be consistently better at anything is extremely surprising given rapid innovation in foundation models.

Have you tried Aristotle on other, non-Lean tasks? Is it better at logical reasoning in general?

runeblaze · 2026-01-11T02:09:47 1768097387

Is it though? There is a reason GPT has Codex variants. RL on a specific task raises performance on that task.

jjmarr · 2026-01-11T02:30:42 1768098642

Post-training doesn't transfer over when a new base model arrives, so anyone who adopted a task-specific LLM gets burned when the next generational advance comes out.

runeblaze · 2026-01-11T17:35:59 1768152959

Resources permitting, if you are chasing the frontier of some more niche task, you redo your training regime on the new-gen LLMs.

Davidzheng · 2026-01-10T08:20:04 1768033204

How strong is your internal informal LLM at theorem proving before the formalization stage? Or is it combined in a way that makes that impossible to measure?