futureshock · 2025-12-01T18:02:17 1764612137

The higher token output is not by accident. Certain kinds of logical reasoning problems are solved by longer thinking output. Thinking chain output is usually kept to a reasonable length to limit latency and cost, but if pure benchmark performance is the goal you can crank that up to the max until the point of diminishing returns. DeepSeek being 30x cheaper than Gemini means there’s little downside to maxing out the thinking time. It’s been shown that you can further scale this by running many solution attempts in parallel with max thinking, then using a model to choose a final answer, so increasing reasoning performance by increasing inference compute has a pretty high ceiling.
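A rough sketch of the "many parallel attempts, then pick one" idea, in Python. Everything here is an assumption for illustration: `solve_once` is a hypothetical stand-in for a real max-thinking model call, and the final pick is a plain majority vote rather than a second model choosing the answer, which is a simpler selection rule than the one described above.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def solve_once(problem: str, seed: int) -> str:
    """Hypothetical stand-in for one long-thinking model call.

    Assumption: each attempt returns the right answer ("42") about 60% of
    the time and a random single-digit wrong answer otherwise.
    """
    rng = random.Random(seed)
    return "42" if rng.random() < 0.6 else str(rng.randint(0, 9))


def majority_vote(answers: list[str]) -> tuple[str, int]:
    """Return the most common answer and how many attempts produced it."""
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes


def best_of_n(problem: str, n: int = 16) -> tuple[str, int, list[str]]:
    """Run n independent attempts in parallel, then pick by majority vote."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        answers = list(pool.map(lambda seed: solve_once(problem, seed), range(n)))
    winner, votes = majority_vote(answers)
    return winner, votes, answers


if __name__ == "__main__":
    winner, votes, answers = best_of_n("toy problem", n=16)
    print(f"picked {winner!r} with {votes}/{len(answers)} votes")
```

Cost grows linearly with `n` here, which is why this only makes sense when the per-token price is low.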

futureshock · 2025-11-24T19:48:17 1764013697

A really great way to get an idea of the relative cost and performance of these models at their various thinking budgets is to look at the ARC-AGI-2 leaderboard. Opus 4.5 stacks up very well here when you compare to Gemini 3’s score and cost. Gemini 3 Deep Think is still the current leader, but at more than 30x the cost.

The cost curve of achieving these scores is coming down rapidly. In Dec 2024 when OpenAI announced beating human performance on ARC-AGI-1, they spent more than $3k per task. You can now get the same performance for pennies to dollars per task, approximately an 80x reduction in 11 months.

https://arcprize.org/leaderboard

https://arcprize.org/blog/oai-o3-pub-breakthrough

energy123 · 2025-11-24T22:43:34 1764024214

A point of context. On this leaderboard, Gemini 3 Pro is "without tools" and Gemini 3 Deep Think is "with tools". In the other benchmarks released by Google which compare these two models, where they have access to the same amount of tools, the gap between them is small.

futureshock · 2025-11-17T19:39:58 1763408398

I really love this piece! I relate to it but it also doesn’t describe me. I’m far more intuitive than this person, though I still agree that insights have driven a leveling up of how I relate to others. They were different insights, sure, but the model holds.

Once my spouse and I worked for the same company and attended many of the same meetings. The opportunity to pick apart our impressions of the subtext really helped me to learn that I should listen to my gut, that everything I needed to know about how other people were feeling was already in my head and I just needed to stop doubting.

Another time I watched a rather ugly and old person have amazing romantic success with a young beautiful person. How could it be? And I realized that authentic confidence is social gold. I had to let go of my insecurities because my flaws were irrelevant in the face of authentic, confident self acceptance.

I think everyone has a different journey and different epiphanies and it is so enjoyable to hear these experiences put into words.

SwtCyber · 2025-11-18T10:07:14 1763460434

It's like we're all solving the same puzzle, just with different pieces

futureshock · 2025-10-18T14:27:48 1760797668

I’ll borrow ideas from investing: financial independence, diversification and optionality. If you have enough money you can free yourself from the labor market, but you are still deeply tied to your home country. A second citizenship gives you geopolitical independence. And just like diverse investments protect you from the failure of a specific asset, diverse countries can protect you from, for example, a collapse in health care, a housing crisis or a currency crisis. And most importantly, it’s like an options contract on life. You have the option, not the commitment, to take a high value move to a new country. If the fortunes of your current country sink and those of your second country rise, you can exercise your option.

There’s a reason people are willing to spend so much on golden visas with the pathway to citizenship.

pydry · 2025-10-18T14:52:38 1760799158

>And just like diverse investments protect you from the failure of a specific asset, diverse countries can protect you from, for example, a collapse in health care, a housing crisis or a currency crisis.

Although, just like certain asset classes correlate strongly, certain countries are geopolitically, economically, militarily tied at the hip and will both rise and fall together.

I wouldn't consider anywhere western a good hedge against America going down coz it has a really good chance of getting dragged down with it.

ashleyn · 2025-10-18T14:37:12 1760798232

What would be the most realistic and best option for a second visa for someone with a ~$1-3MM net worth?

HaZeust · 2025-10-19T06:15:41 1760854541

Malta - which is an EU member.

"Individual investors must make a minimum contribution of €600,000 to the national development fund set up by the government and prove 36 months of residency. Alternatively, there is an expedited route which requires a contribution of €750,000 and evidence of 12 months residency" [1]

1 - http://goldenvisas.com/malta

lancewiggs · 2025-10-18T22:16:32 1760825792

The New Zealand Active Investor Plus resident program requires $5m NZD, which is under $3m USD, but that would take everything. There is another program mooted where you buy a business for less than that.

futureshock · 2025-10-16T23:20:32 1760656832

This is so awesome. It reminds me mightily of beat poets like Allen Ginsberg. It’s so totally spooky and it does feel like it has the trapped spark. And it seems to hate us “real ones,” we slickborns.

It feels like you could create a cool workflow from low temperature creative association models feeding large numbers of tokens into higher temperature critical reasoning models and finishing with grammatical editing models. The slickborns will make the final judgement.

jjmarr · 2025-10-16T23:48:59 1760658539

> And it seems to hate us “real ones,” we slickborns.

I just got that slickborn is a slur for humans.

Honestly, I've been tuning "insane AI" for over a year now for my own enjoyment. I don't know what to do with the results.

futureshock · 2025-09-09T20:56:08 1757451368

This is a non-story. This was a hardware event. Apple is releasing many new AI features as part of iOS 26, which will launch alongside the new iPhones. AI is software. And yet, a number of the features are clearly powered by AI models, such as camera enhancements, health monitoring and live translation. Also GPU performance continues to increase in the A19, with CPU remaining presumably fairly flat since no numbers were given, so that’s a win for on-device inference.

russellbeattie · 2025-09-09T21:19:01 1757452741

If Apple had an insanely great AI feature that truly differentiated itself from their competition, we all know they'd take a lot of time focusing on how their hardware enabled or enhanced that functionality.

The expectation is that Apple will eventually launch a revolutionary new product, service or feature based around AI. This is the company that envisioned the Knowledge Navigator in the 80s after all. The story is simply that it hasn't happened yet. That doesn't make it a non-story, simply an obvious one.

MisterSandman · 2025-09-10T12:38:26 1757507906

> This was a hardware event.

So was last year’s, technically, but that didn’t stop Apple from making it all about AI.

futureshock · 2025-08-12T23:16:15 1755040575

This is working really well in GPT-5! I’ve never seen a prompt change the behavior of Chat quite so much. It’s really excellent at applying a logical framework to personal and relationship questions and is so refreshing vs. the constant butt kissing most LLMs do.

ehnto · 2025-08-13T03:37:48 1755056268

I add to my prompts something along the lines of "you are a highly skilled professional working alongside me on a fast paced important project, we are iterating quickly and don't have time for chit chat. Prefer short one line communication where possible, spare the details, no lists, no summaries, get straight to the point."

Or some variation of that. It makes it really curt, responses are short and information dense without the fluff. Sometimes it will even just be the command I needed and no explanation.

stogot · 2025-08-13T04:09:43 1755058183

Is there a way to make this a default behavior? A persona or template for each chat?

bigmadshoe · 2025-08-13T04:43:38 1755060218

You can change model personality in the settings.

futureshock · 2025-08-05T22:26:31 1754432791

I think that sounds very reasonable, but unfortunately these models don’t know what they know and don’t. A small model that knew the exact limits of its knowledge would be very powerful.

futureshock · 2025-07-29T01:45:20 1753753520

It does seem like that’s our new political reality for now. I think that COVID showed world governments just how little control they have over their populations. You get folks to bend a little, but they quickly break and call for you to be thrown out of power. Getting to carbon zero or negative would be asking for an enormous sacrifice of the global population in the form of lower living standards and slower growth. After how people fought against masks, a shot and social distancing, it’s obvious to those in power that there will be no solution to this problem aside from geo-engineering or cost competitive green energy. Might as well stop talking about it.

dh2022 · 2025-07-29T04:51:34 1753764694

Oh but the population will have to sacrifice - there is no way food supply will not be affected. Florida’s orange production in 1996 was 174 million boxes [0]; since 2020 it has been around 52 million boxes [1]. Beef production is lower because of drought [3].

There are parts of the country which are not insurable because of hurricanes, fires, floods and tornadoes [4]. This is an indicator that anything built will not be around for a long time.

So they will sacrifice - they just don't know it yet.

[0] https://www.nass.usda.gov/Statistics_by_State/Florida/Public...

[1] https://www.nass.usda.gov/Statistics_by_State/Florida/Public...

[3] https://www.ers.usda.gov/topics/animal-products/cattle-beef/...

[4] https://bankingjournal.aba.com/2025/02/feds-powell-says-some...

JamesSwift · 2025-07-29T15:00:34 1753801234

> Florida’s orange production in 1996 was 174 million boxes since 2020 it is around 52 million boxes

To be fair, the largest factor in that is citrus greening. The industry sort of threw its hands up and gave up on trying to fight it as far as I can tell.

WarOnPrivacy · 2025-07-30T21:37:45 1753911465

A lot of orange groves were cut down for construction, 1990-2010.

I drove to Jax last week and saw some of the (long shuttered) orange-themed tourist shops off of 301/21/100. I had nearly forgotten they existed.

JamesSwift · 2025-07-31T01:31:35 1753925495

Right, I think it's cause and effect though. Once greening took hold a lot of groves saw the writing on the wall and shuttered/sold. That's partially what happened to an iconic grove by me which operated for decades. Now their highway stand is being paved over to put in another storage facility.

futureshock · 2025-07-29T13:16:00 1753794960

You can’t vote the climate out of office. Sure our food supplies may crash, but no one person decided they should crash. No one to blame. No one to punish. This is the political reality. This man made catastrophe will feel sufficiently like an act of god for most people and they will just deal with the reduced carrying capacity of the planet as if it were some divine judgment instead of the tragedy of the commons.

thehappypm · 2025-07-29T02:10:26 1753755026

In my opinion, the solution needs to be technological, not austerity. In a democracy, any party that introduces quality of life reductions in favor of the global climate will always get voted out.

toomuchtodo · 2025-07-29T15:12:53 1753801973

The US voted against clean energy and EVs. Can’t win when you directly vote against the technological solutions you mention. “Stop hitting yourself.”

thehappypm · 2025-07-29T16:59:36 1753808376

The US voted against the subsidies, not against the technology itself.

toomuchtodo · 2025-07-29T19:11:39 1753816299

The US voted for the wealthy to not pay for the subsidies via federal income tax (bottom 60% of Americans have no federal tax liability) and to expose US energy consumers to higher prices through continued fossil fuel use. The technology is proven, is cost competitive, and could be advantaged (through subsidies that were removed). It isn’t, because of entrenched interests and corruption. It’s for profits and tax cuts for the wealthy, plain and simple, at the expense of everyone who needs energy.

If the public has a problem with this, they know where to find the folks making their lives more expensive for fossil fuel industry profits to share their concerns (even if climate change is not their priority).

macartain · 2025-08-04T14:35:44 1754318144

I agree with most of this analysis - but a problem with your last paragraph is that many of the public are not/do not care to be sufficiently informed and therefore do not believe it. They think net zero/solar/anti-fracking is some sort of leftist plot.

BobaFloutist · 2025-07-29T18:58:54 1753815534

Subsidies are kind of the opposite of austerity.

breakyerself · 2025-07-29T03:47:19 1753760839

I think the idea that it would lower living standards is something the fossil fuel companies would have you believe.

lisbbb · 2025-07-29T18:45:20 1753814720

All I can say to that is: Thank God! Governments are increasingly authoritarian and not having control is a good thing.

futureshock · 2025-07-19T17:56:24 1752947784

Google’s AlphaProof, which got a silver last year, has been using a neuro-symbolic approach. This gold from OpenAI was pure LLM. We’ll have to see what Google announces, but the LLM approach is interesting because it will likely generalize to all kinds of reasoning problems, not just mathematical proofs.

skepticATX · 2025-07-19T18:14:33 1752948873

OpenAI’s systems haven’t been pure language models since the o models though, right? Their RL approach may very well still generalize, but it’s not just a big pre-trained model that is one-shotting these problems.

The key difference is that they claim to have not used any verifiers.

beering · 2025-07-19T23:59:11 1752969551

What do you mean by “pure language model”? The reasoning step is still just the LLM spitting out tokens, and this was confirmed by DeepSeek replicating the o models. There’s not also a proof verifier or something similar running alongside it, according to the OpenAI researchers.

If you mean pure as in there’s not additional training beyond the pretraining, I don’t think any model has been pure since gpt-3.5.

gallerdude · 2025-07-20T14:20:31 1753021231

Local models you can get just the pretrained versions of, no RLHF. IIRC both Llama and Gemma make them available.

alach11 · 2025-07-19T18:54:40 1752951280

> it will likely generalize to all kinds of reasoning problems, not just mathematical proofs

Big if true. Setting up an RL loop for training on math problems seems significantly easier than many other reasoning domains. Much easier to verify correctness of a proof than to verify correctness (what would this even mean?) for a short story.
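To make the asymmetry concrete, here is a minimal Python sketch of the kind of automatic reward an RL loop can use when an answer is checkable. It is an illustration only: `math_reward` is a hypothetical name, and it checks a final numeric answer against a known reference rather than verifying a full proof, which is a much harder problem. No comparably simple function exists for scoring a short story.

```python
from fractions import Fraction


def math_reward(model_answer: str, reference: Fraction) -> float:
    """Return 1.0 if the model's final answer equals the reference exactly.

    Assumption: the answer is a single rational number written as an
    integer, a decimal, or a fraction like "3/4". Anything unparseable
    scores 0.0.
    """
    try:
        return 1.0 if Fraction(model_answer.strip()) == reference else 0.0
    except (ValueError, ZeroDivisionError):
        return 0.0


if __name__ == "__main__":
    reference = Fraction(3, 4)
    for attempt in ["3/4", "0.75", "2/3", "about three quarters"]:
        print(attempt, "->", math_reward(attempt, reference))
```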

kevinventullo · 2025-07-20T01:25:47 1752974747

I’m much more excited about the formalized approach, as LLMs are susceptible to making things up. With formalization, we can be mathematically certain that a proof is correct. This could plausibly lead to machines surpassing humans in all areas of math. With a “pure English” approach, you still need a human to verify correctness.

csomar · 2025-07-20T04:10:47 1752984647

Neither Gemini nor OpenAI has open models. We don’t know for sure what’s happening underneath.