Vertex's offering of Gemini very much does implicit caching, and has always been the case [1]. The recent addition of applying implicit cache hit discounts also works on Vertex, as long as you don't use the `global` endpoint and hit one of the regional endpoints.
[1]: http://web.archive.org/web/20240517173258/https://cloud.goog..., "By default Google caches a customer's inputs and outputs for Gemini models to accelerate responses to subsequent prompts from the customer. Cached contents are stored for up to 24 hours."
It's a change to the CA rules that was passed in https://cabforum.org/2022/04/06/ballot-csc-13-update-to-subs... to align OV certificate requirements with the EV ones (that enforces the use of HSMs/hardware tokens/etc) that was meant to go into effect for new certificates issued after November 2022, but was delayed and eventually implemented on June 1 2023.
Since April 2023 they support custom OIDC providers[1], and as of April 2024 that was extended to the free plan as well[2], so you can bring your own auth.
https://www.minimaxi.com is their website for the Chinese parent company 上海稀宇科技有限公司, https://minimax.io is their international website for the Singapore based company Nanonoble Pte Ltd that handles operations outside of China.
Your linked article is specifically comparing two different versioned snapshots of a model and not comparing the same model across time.
You've also made the mistake of conflating what's served via API platforms which are meant to be stable, and frontends which have no stability guarantees, and are very much iterated on in terms of the underlying model and system prompts. The GPT-4o sycophancy debacle was only on the specific model that's served via the ChatGPT frontend and never impacted the stable snapshots on the API.
I have never seen any sort of compelling evidence that any of the large labs tinkers with their stable, versioned model releases that are served via their API platforms.
I did read it, and I even went to their eval repo.
> At the time of writing, there are two major versions available for GPT-4 and GPT-3.5 through OpenAI’s API, one snapshotted in March 2023 and another in June 2023.
openaichat/gpt-3.5-turbo-0301 vs openaichat/gpt-3.5-turbo-0613, openaichat/gpt-4-0314 vs openaichat/gpt-4-0613. Two _distinct_ versions of the model, and not the _same_ model over time like how people like to complain that a model gets "nerfed" over time.
It's not a 4B parameter model. The E4B variant is 7B parameters with 4B loaded into memory when using per-layer embedding cached to fast storage, and without vision or audio support.
Gemini's free tier will absolutely use your inputs for training [1], same with Mistral's free tier [2]. Anthropic and OpenAI let's you opt into data collection for discounted prices or free tokens.
Yeah, I mean paid API access. You put a credit card in, and it's peanuts at the end of the month. Sorry I didn't specify. Good reminder that with free services you are the product!