I think it's more likely that people are confused, and OpenAI is not making things any clearer either.
AFAIK, OpenAI has repeatedly stated that GPT4 hasn't changed. People repeatedly state that when they use ChatGPT, they get a different experience today than before. Both can be true at the same time, as ChatGPT is a "packaged" experience of GPT4: if you use the API versions, nothing has likely changed, but ChatGPT has definitely changed, for better or worse, as that's "just" integration work rather than fundamental changes to the model.
In the discussions on HN, people tend to talk past each other regarding this as well, saying things like "GPT4 has for sure changed" when their only experience of GPT4 is via ChatGPT, which obviously has changed since launch.
But ChatGPT != GPT4, which could always be made clearer.
It's a bit of both. The GPT-4 models have definitely been changing - there are multiple versions right now, and you can try them out in the Playground. One of the biggest differences is that the latest model patches all of the GPT-4 jailbreak prompts; quite a big change if you were doing anything remotely spicy. But OA also says that it hasn't been changing the underlying model beyond that (that's probably the tweet you're thinking of), while people are still reporting big degradations in the ChatGPT interface, and those may be mistakes or changes in the rest of the infrastructure.
I was just getting started with ChatGPT Plus in mid-May. The exact date wasn't clear, but I was within my first week of using GPT4 via ChatGPT Plus to write some work Ansible code. On May 16 (not that exact date, but day N) it was amazing, and when I wasn't writing work stuff, I was brainstorming for my novel.
The next day, prompts that used to work suddenly gave much more generic results, the code was much more skinflinty, and it kept pulling a 'no wait, I'm going to leave that long code as an exercise for you, human'.
I didn't have time to buy into a hallucination, and I wasn't involved in OpenAI chats to get 'infected by hysteria' or whatever; I was just using the tool a ton. And there was a noticeable change on day N+1 that has persisted until now.
The fact that GPT4 API calls appear to be similar tells me they changed their hidden meta-prompt on the ChatGPT Plus website backend, and aren't admitting that they adjusted the meta-prompt or other settings in the middleware between the JS webpage we users see and the actual GPT4 models running.
I’d note they explicitly document that they rev GPT-4 every two weeks and provide fixed snapshots of the prior periods’ models for reference. One could reasonably benchmark the evolution of the model’s performance and publish the results (rough sketch below). But certainly you’re right - ChatGPT != GPT4, and I would expect ChatGPT to perform worse than GPT4, as it’s likely extremely constrained in its guidance, tunings, and whatever else they do to shape ChatGPT’s behavior. It might also very well be that, to scale and to keep revenue ahead of costs, they’ve dumbed down ChatGPT Plus.

I’ve found it increasingly less useful over time, but I sincerely feel it’s mostly because the layers of sandbox protection they’re adding constrain the model into non-optimal spaces. I do find that classical iterative prompt engineering still helps a great deal: give it a new identity aligned to the subject matter, insist on depth, insist on it checking its work and repeating itself, ask it if it’s sure about a response, periodically reinforce the context you want to boost the signal, etc.
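For anyone curious, a minimal sketch of what that benchmarking could look like, using the 2023-era openai Python client. The pinned model IDs are real; the prompts and the eyeballing are placeholders for a proper eval with a labeled test set and an automatic grader:

    import openai  # assumes OPENAI_API_KEY is set in the environment

    # Toy benchmark: run the same fixed prompts against two pinned
    # snapshots and compare the outputs by hand. A real eval would
    # score against a labeled test set instead.
    PROMPTS = [
        "Write a Python function that reverses a linked list.",
        "Summarize the plot of Hamlet in two sentences.",
    ]

    for model in ("gpt-4-0314", "gpt-4-0613"):
        for prompt in PROMPTS:
            resp = openai.ChatCompletion.create(
                model=model,
                temperature=0,  # keep runs as comparable as possible
                messages=[{"role": "user", "content": prompt}],
            )
            print(model, "->", resp.choices[0].message.content[:80])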
Heh, this kind of reminds me of the process of enterprise support.
Working with the customer in dev: "Ok, run this SQL query and restart the service. Done, ok does the test case pass?" Done in 15 minutes.
Working with customer in production: "Ok, here is a 35 point checklist of what's needed to run the SQL query and restart the service. Have your compliance officer check it and get VP approval, then we'll run implementation testing and verification" --same query and restart now takes 6 hours.
> so if you use the API versions, nothing has likely changed
I doubt that. I don't recall them actually clearly and precisely saying they aren't changing the 'gpt-4' model - i.e. the model you're getting when specifying 'gpt-4' in an API call. That one direct tweet I recall, which I think you're referring to, could be read more narrowly as saying the pinned versions didn't change.
That is, if you issue calls against 'gpt-4-0314', then indeed nothing changed since its release. But with calls against 'gpt-4', anything goes.
This would be consistent with their documentation and overall deployment model: the whole reason behind the split between versioned models (e.g. 'gpt-4-0314', 'gpt-4-0613') and unversioned ones (e.g. 'gpt-4') was so that you could have both a stable base and a changing tip. If that tweet is to be read as saying 'gpt-4' hasn't changed since release, then the whole versioning scheme is kind of redundant.
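To make the distinction concrete, a sketch with the 2023-era openai Python client; both model IDs are real, and the only difference between the two calls is whether the snapshot is pinned:

    import openai  # assumes OPENAI_API_KEY is set in the environment

    messages = [{"role": "user", "content": "Write a haiku about versioning."}]

    # Pinned snapshot: frozen at release, so behavior should never change.
    pinned = openai.ChatCompletion.create(model="gpt-4-0314", messages=messages)

    # Unversioned alias: OpenAI can re-point this at newer snapshots over
    # time, so the same request may behave differently month to month.
    floating = openai.ChatCompletion.create(model="gpt-4", messages=messages)

    print(pinned.choices[0].message.content)
    print(floating.choices[0].message.content)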
The -0613 version is really different! It added function calling to the API as a hint to the LLM, and in my experience if you don't use function calling it's significantly worse at code-like tasks, but if you do use it, it's roughly equivalent or better when it calls your function.
Seconded. In particular, how does function calling help restore performance in general prompts like: "Here's roughly what I'm trying to achieve: <bunch of requirements> Could you please write me such a function/script/whatever?"
Maybe I lack the imagination, but what function should I give to the LLM? "insert(text: string)"?
For generating arbitrary code, I imagine you could do the same thing but swap `query_db` with the name `exec_javascript` or something similar based on your preferred language.
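Roughly like this (a sketch, again with the 2023-era openai client; `exec_javascript` is a made-up name and nothing actually executes the result, the schema just nudges the model into returning structured code):

    import json
    import openai  # assumes OPENAI_API_KEY is set in the environment

    # Hypothetical function schema: we never run it; it only steers the
    # model toward emitting code as structured function-call arguments.
    functions = [{
        "name": "exec_javascript",
        "description": "Execute a JavaScript snippet and return its output.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "JavaScript source to run."},
            },
            "required": ["code"],
        },
    }]

    response = openai.ChatCompletion.create(
        model="gpt-4-0613",  # first snapshot with function calling
        messages=[{"role": "user", "content": "Write a function that deduplicates an array."}],
        functions=functions,
        function_call={"name": "exec_javascript"},  # force a structured reply
    )

    # The generated code comes back as JSON arguments rather than prose.
    args = json.loads(response.choices[0].message.function_call.arguments)
    print(args["code"])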
>But ChatGPT != GPT4, which could always be made clearer.
Isn't the thread about ChatGPT? I mean, it is helpful to know that they are not the same (I personally was not clear on this myself, so I, at least, benefited from your comment), but I think the thread is just about ChatGPT.