That's not how modern LLMs are built. The days of dumping everything on the internet into the training data and crossing your fingers are long past.
Anthropic and OpenAI spent most of 2025 focusing almost exclusively on improving the coding abilities of their models, through reinforcement learning combined with additional expert curation of training data.
Have you seen how many people are talking about the November 2025 inflection point, where the models ticked over from being good at running coding agents to being really good at it?