I wonder why? It seems to work pretty well for me.
> Lesson 4: GPT is really bad at producing the null hypothesis
Tell me about it! Just yesterday I was testing a prompt around text modification rules that ended with “If none of the rules apply to the text, return the original text without any changes”.
Do you know ChatGPT’s response to a text where none of the rules applied?
“The original text without any changes”. Yes, the literal string.
You know all the stories about the capricious djinn that grants cursed wishes based on the literal wording? That's what we have. Those of us who've been prompting models in image space for years now have gotten a handle on this, but for people who got in because of LLMs, it can be a bit of a surprise.
One fun anecdote: a while back I was making an image of three women drinking wine in a fancy garden for a tarot card, and at the end of the prompt I had "lush vegetation". That was enough to tip the women from classy to red-nosed frat girls, because of the double meaning of "lush".
Programming is already the capricious djinn, only it's completely upfront about how literally it interprets your commands. The illusion that AI can infer your actual intent, which is impossible to do accurately even for humans, is distracting tech folks from one of the main blessings of programming: forcing people to think before they speak and hone their intention.
> I wonder why? It seems to work pretty well for me.
I read this as "what we do works just fine, so there's no need for JSON mode". We're in the same boat at my company: we've been live for a year now with no need to switch. Our prompt is effective at getting GPT-3.5 to always produce JSON.
There's nothing to switch to. You just enable it. No need to change the prompt or anything else. All it requires is that you mention "JSON" in your prompt, which you obviously already do.
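For anyone following along, here's a rough sketch of what "just enable it" looks like with the OpenAI Python client; the model name and prompt are only placeholders:

```python
# Minimal sketch of JSON mode, assuming the OpenAI Python client (v1.x)
# and a model that supports it (e.g. gpt-3.5-turbo-1106).
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    response_format={"type": "json_object"},  # the only switch to flip
    messages=[
        # JSON mode requires the word "JSON" to appear somewhere in the prompt.
        {"role": "system", "content": "Extract the fields and reply in JSON."},
        {"role": "user", "content": "Alice, 34, lives in Lisbon."},
    ],
)

print(response.choices[0].message.content)  # guaranteed to parse as JSON
```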
You do need to change the prompt. You need to explicitly tell it to emit JSON, and in my experience, if you want it to follow a format you also need to provide that format.
I've found this is pretty simple to do when the schema is basic, and there's no need to define one and enable function calling.
But in one of my cases, the schema is quite complicated, and "model doesn't produce JSON" hasn't been a problem for us in production. There's no incentive for us to change what we have that's working very well.
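For comparison, a rough sketch of the function-calling route mentioned above, where the schema is declared up front; the function name and fields are made up for illustration:

```python
# Rough sketch of function calling with a JSON schema defined up front.
# Function name and fields are invented for illustration.
from openai import OpenAI

client = OpenAI()

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "city": {"type": "string"},
    },
    "required": ["name", "age", "city"],
}

response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    messages=[{"role": "user", "content": "Alice, 34, lives in Lisbon."}],
    tools=[{
        "type": "function",
        "function": {"name": "record_person", "parameters": schema},
    }],
    # Force the model to call the function so the reply is always structured.
    tool_choice={"type": "function", "function": {"name": "record_person"}},
)

# The arguments come back as a JSON string shaped by the schema.
print(response.choices[0].message.tool_calls[0].function.arguments)
```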