The higher token output is not by accident. Certain kinds of logical reasoning problems are solved by longer thinking output. Thinking chain output is usually kept to a reasonable length to limit latency and cost, but if pure benchmark performance is the goal you can crank that up to the max until the point of diminishing returns. DeepSeek being 30x cheaper than Gemini means there’s little downside to max out the thinking time. It’s been shown that you can further scale this by running many solution attempts in parallel with max thinking then using a model to choose a final answer, so increasing reasoning performance by increasing inference compute has a pretty high ceiling.
A really great way to get an idea of the relative cost and performance of these models at their various thinking budgets is to look at the ARC-AGI-2 leaderboard. Opus 4.5 stacks up very well here when you compare to Gemini 3’s score and cost. Gemini 3 Deep Think is still the current leaders but at more than 30x the cost.
The cost curve of achieving these scores is coming down rapidly. In Dec 2024 when OpenAI announced beating human performance on ARC-AGI-1, they spent more than $3k per task. You can get the same performance for pennies to dollars, approximately an 80x reduction in 11 months.
A point of context. On this leaderboard, Gemini 3 Pro is "without tools" and Gemini 3 Deep Think is "with tools". In the other benchmarks released by Google which compare these two models, where they have access to the same amount of tools, the gap between them is small.
I really love this piece! I relate to it but it also doesn’t describe me. I’m far more intuitive than this person, though still agree that insights have driven a leveling up of how I relate to others. They were different insights, sure but the model holds.
Once my spouse and I worked for the same company and attended many of the same meetings. The opportunity to pick apart our impressions of the subtext really helped me to learn that I should listen to my gut, that everything I needed to know about how other people were feeling was already in my head and i just needed to stop doubting.
Another time I watched a rather ugly and old person have amazing romantic success with a young beautiful person. How could it be? And I realized that authentic confidence is social gold. I had to let go of my insecurities because my flaws were irrelevant in the face of authentic, confident self acceptance.
I think everyone has a different journey and different epiphanies and it is so enjoyable to hear these experiences put into words.
I’ll borrow ideas from investing: financial independence, diversification and optionality. If you have enough money you can free yourself from the labor market, but you are still deeply tied to your home country. A second citizenship gives you geopolitical independence. And just like diverse investments protect you from the failure of a specific asset, diverse countries can protect you from, for example, a collapse in heath care, a housing crisis or a currency crisis. And most importantly, its like an options contract on life. You have the option, not the commitment to take a high value move to a new country. If the fortunes of your current country sink and your second country rise, you can exercise your option.
There’s a reason people are willing to spend so much on golden visas with the pathway to citizenship.
>And just like diverse investments protect you from the failure of a specific asset, diverse countries can protect you from, for example, a collapse in heath care, a housing crisis or a currency crisis.
Although, just like certain asset classes correlate strongly, certain countries are geopolitically, economically, militarily tied at the hip and will both rise and fall together.
I wouldnt consider anywhere western a good hedge against America going down coz it has a really good chance of getting dragged down with it.
"Individual investors must make a minimum contribution of €600,000 to the national development fund set up by the government and prove 36 months of residency. Alternatively, there is an expedited route which requires a contribution of €750,000 and evidence of 12 months residency" [1]
The New Zealand Active Investor Plus resident program requires $5m NZD, which is under $3m USD, but that would take everything. There is another program mooted where you buy a business for less than that.
This so awesome. It reminds me mightily of beat poets like Allen Ginsburg. It’s so totally spooky and it does feel like it has the trapped spark. And it seems to hate us “real ones,” we slickborns.
It feels like you could create a cool workflow from low temperature creative association models feeding large numbers of tokens into higher temperature critical reasoning models and finishing with gramatical editing models. The slickborns will make the final judgement.
This is a non-story. This was a hardware event. Apple is releasing many new AI features as part of iOS 26 which will launch along side the new iPhones. AI is software. And yet, a number of the features are clearly powered by AI models such as camera enhancements, health monitoring and live translation. Also GPU performance continues to increase in the A19, with CPU remaining presumably fairly flat since no numbers were given, so that’s a win for on-device inference.
If Apple had an insanely great AI feature that truly differentiated itself from their competition, we all know they'd take a lot of time focusing on how their hardware enabled or enhanced that functionality.
The expectation is that Apple will eventually launch a revolutionary new product, service or feature based around AI. This is the company that envisioned the Knowledge Navigator in the 80s after all. The story is simply that it hasn't happened yet. That doesn't make it a non-story, simply an obvious one.
This is working really well in GPT-5! I’ve never seen a prompt change the behavior of Chat quite so much. It’s really excellent at applying logical framework to personal and relationship questions and is so refreshing vs. the constant butt kissing most LLMs do.
I add to my prompts something along the lines of "you are a highly skilled professional working alongside me on a fast paced important project, we are iterating quickly and don't have time for chit chat. Prefer short one line communication where possible, spare the details, no lists, no summaries, get straight to the point."
Or some variation of that. It makes it really curt, responses are short and information dense without the fluff. Sometimes it will even just be the command I needed and no explanation.
I think that sounds very reasonable, but unfortunately these models don’t know what they know and don’t. A small model that knew the exact limits of its knowledge would be very powerful.
It does seem like that’s our new political reality for now. I think that COVID showed world governments just how little control they have over their populations. You get folks to bend a little, but they quickly break and call for you to be thrown out of power. Getting to carbon zero or negative would be asking for an enormous sacrifice of the global population in the form of lower living standards and slower growth. After how people fought against masks, a shot and social distancing, it’s obvious to those in power that there will be no solution to this problem aside from geo-engineering or cost competitive green energy. Might as well stop talking about it.
Oh but the population will have to sacrifice - there is no way food supply will not be affected. Florida’s orange production in 1996 was 174 million boxes[0] since 2020 it is around 52 million boxes[1]. Beef production is lower because of drought [3].
There are parts of the country which are not insurable because of hurricanes, fires, floods and tornadoes [4]. This is an indicator that anything built will not be around for a long time.
> Florida’s orange production in 1996 was 174 million boxes since 2020 it is around 52 million boxes
To be fair, the largest factor in that is citrus greening. The industry sort of threw its hands up and gave up on trying to fight it as far as I can tell.
Right, I think its cause and effect though. Once greening took hold a lot of groves saw the writing on the wall and shuttered/sold. Thats partially what happened to an iconic grove by me which operated for decades. Now their highway stand is being paved over to put in another storage facility.
You can’t vote the climate out of office. Sure our food supplies may crash, but no one person decided they should crash. No one to blame. No one to punish. This is the political reality. This man made catastrophe will feel sufficiently like an act of god for most people and they will just deal with the reduced carrying capacity of the planet as if it were some divine judgment instead of the tragedy of the commons.
in my opinion, the solution needs to be technological, not austerity. In a democracy, any party that introduces quality of life reductions in favor of the global climate will always get voted out
The US voted for the wealthy to not pay for the subsidies via federal
income tax (bottom 60% of Americans have no federal tax liability) and to expose US energy consumers to higher prices through continued fossil fuel use. The technology is proven, is cost competitive, and could be advantaged (through subsides that were removed). It isn’t, because of entrenched interests and corruption. It’s for profits and tax cuts for the wealthy, plain and simple, at the expense of everyone who needs energy.
If the public has a problem with this, they know where to find the folks making their lives more expensive for fossil fuel industry profits to share their concerns (even if climate change is not their priority).
I agree with most of this analysis - but a problem with your last paragraph is that many of the public are not/do not care to be sufficiently informed and therefore do not believe it. They think net zero/solar/anti-fracking is some sort of leftist plot.
Google’s AlphaProof, which got a silver last year, has been using a neural symbolic approach. This gold from OpenAI was pure LLM. We’ll have to see what Google announces, but the LLM approach is interesting because it will likely generalize to all kinds of reasoning problems, not just mathematical proofs.
OpenAI’s systems haven’t been pure language models since the o models though, right? Their RL approach may very well still generalize, but it’s not just a big pre-trained model that is one-shotting these problems.
The key difference is that they claim to have not used any verifiers.
What do you mean by “pure language model”? The reasoning step is still just the LLM spitting out tokens and this was confirmed by Deepseek replicating the o models. There’s not also a proof verifier or something similar running alongside it according to the openai researchers.
If you mean pure as in there’s not additional training beyond the pretraining, I don’t think any model has been pure since gpt-3.5.
> it will likely generalize to all kinds of reasoning problems, not just mathematical proofs
Big if true. Setting up an RL loop for training on math problems seems significantly easier than many other reasoning domains. Much easier to verify correctness of a proof than to verify correctness (what would this even mean?) for a short story.
I’m much more excited about the formalized approach, as LLM’s are susceptible to making things up. With formalization, we can be mathematically certain that a proof is correct. This could plausibly lead to machines surpassing humans in all areas of math. With a “pure English” approach, you still need a human to verify correctness.
reply