You can train a smaller model, run inference multiple times, and reach performance similar to a larger model running inference just once. The best way to make use of those multiple inferences is still up for debate, but we already know it works (self-consistency is one example).
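
To make the self-consistency idea concrete, here is a minimal sketch of the "sample many times, take a majority vote" recipe. The names (`sample_fn`, `extract_answer`, the toy model) are placeholders for illustration, not any particular model API:

```python
import random
from collections import Counter
from typing import Callable


def self_consistency(sample_fn: Callable[[], str],
                     extract_answer: Callable[[str], str],
                     n_samples: int = 8) -> str:
    """Run the (smaller) model n_samples times and return the majority answer.

    sample_fn: draws one stochastic completion (e.g. sampling with temperature > 0).
    extract_answer: pulls the final answer out of a completion string.
    """
    answers = [extract_answer(sample_fn()) for _ in range(n_samples)]
    # Majority vote: the answer that appears most often across samples wins.
    return Counter(answers).most_common(1)[0][0]


# Toy demo: a "model" whose single sample is right only 60% of the time.
def toy_model() -> str:
    if random.random() < 0.6:
        return "Answer: 42"
    return f"Answer: {random.randint(0, 41)}"


if __name__ == "__main__":
    # Voting over 8 samples is correct much more often than any single sample.
    print(self_consistency(toy_model, lambda s: s.split(": ")[1], n_samples=8))
```

The point of the toy demo is just that aggregating independent samples sharpens a noisy model's output; with real models the same trick is applied to chains of thought, voting on the final answers they produce.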