GPT models shouldn't be used at temp 1 unless you only care about creative writing. They get much worse at factual stuff and code than with lower temperatures. And yes, 3.5 Turbo is less affected by this, which might be the reason why the models performed for you in reverse.