You cannot out-astroturf Claude in this forum, it is impossible.
Anyways, do you get shitty results with the $20/month plan? So did I but then I switched to the $200/month plan and all my problems went away! AI is great now, I have instructed it to fire 5 people while I'm writing this!
Just some anecdata++ here but I found 5.2 to be really good at code review. So I can have something crunched by cheaper models, reviewed async by codex and then re-prompt with the findings from the review. It finds good things, doesn't flag nits (if prompted not to) and the overall flow is worth it for me. Speed loss doesn't impact this flow that much.
I might flip that given how hard it's been for Claude to deal with longer context tasks like a coding session with iterations vs a single top down diff review.
I have a `codex-review` skill with a shell script that uses the Codex CLI with a prompt. It tells Claude to use Codex as a review partner and to push back if it disagrees. They will go through 3 or 4 back-and-forth iterations some times before they find consensus. It's not perfect, but it does help because Claude will point out the things Codex found and give it credit.
I don’t use OpenAI too much, but I follow a similar workflow. Use Opus for design/architecture work. Move it to Sonnet for implementation and build out. Then finally over to Gemini for review, QC and standards check. There is an absolute gain in using different models. Each has their own style and way of solving the problem, just like a human team. It’s kind of awesome and crazy and a bit scary all at once.
The way "Phases" are handled is incredible: research, then planning, then execution, with no context rot, because behind the scenes everything is being saved in a State.md file...
I'm on Phase 41 of my own project and the reliability and almost total absence of any error is amazing. Investigate and see if it's a fit for you. The PAL MCP you can set up to have Gemini, with its large context, review what Claude codes.
5.2 Codex became my default coding model. It “feels” smarter than Opus 4.5.
I use 5.2 Codex for the entire task, then ask Opus 4.5 at the end to double check the work. It's nice to have another frontier model's opinion and ask it to spot any potential issues.
All shared machine learning benchmarks are a little bit bogus, for a really “machine learning 101” reason: your test set only yields an unbiased performance metric if you agree to only use it once. But that just isn’t a realistic way to use a shared benchmark. Using them repeatedly is kind of the whole point.
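A toy simulation (my own sketch, not from the thread) makes the effect concrete: let enough skill-free "models" take the same fixed test and the best reported score drifts well above chance, purely from reuse.

```typescript
// Toy illustration: 1,000 "models" that are pure coin flips, all scored
// against the same fixed 100-item test set. None has any real skill,
// yet the best reported accuracy lands well above the true 50%.
const flip = (): number => (Math.random() < 0.5 ? 0 : 1);
const testSet: number[] = Array.from({ length: 100 }, flip);

const accuracy = (preds: number[]): number =>
  preds.filter((p, i) => p === testSet[i]).length / testSet.length;

let best = 0;
for (let i = 0; i < 1000; i++) {
  best = Math.max(best, accuracy(Array.from({ length: 100 }, flip)));
}
console.log(best); // typically well above 0.5, despite every model being random
```

The selection step is the whole problem: reporting the max over many test-set queries is exactly what a leaderboard does.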
But even an imperfect yardstick is better than no yardstick at all. You’ve just got to remember to maintain a healthy level of skepticism is all.
Is an imperfect yardstick better than no yardstick? It reminds me of documentation — the only thing worse than no documentation is wrong documentation.
Yes, because there’s value in a common reference for comparison. It helps to shed light on different models’ relative strengths and weaknesses. And, just like with performance benchmarks, you can learn to spot and read past the ways that people game their results. The danger is really more in when people who are less versed in the subject matter take what are ultimately just a semi-tamed genre of sales pitch at face value.
When such benchmarks aren’t available, what you often get instead is teams creating their own benchmark datasets and then testing both their own and existing models’ performance against them. Which is even worse, because they probably still run the test multiple times (there’s simply no way to hold others accountable on this front), but on top of that they often hyperparameter-tune their own model for the dataset while reusing previously published hyperparameters for the other models. Which gives them an unfair advantage, because those hyperparameters were tuned to a different dataset and may not have even been optimizing for the same task.
It's not just over-fitting to leading benchmarks; there are also too many degrees of freedom in how a model is tested (harness, etc.). Until there's standardized documentation enabling independent replication, it's all just benchmarketing.
Codex 5.3 seems to be a lot chattier. As in, it comments in the chat about things it has done or is about to do. They don't show up as "thinking" CoT blocks, but as regular outputs. Overall the experience is somewhat more like Claude's, in that you can spot the problems in the model's reasoning much earlier if you keep an eye on it as it works, and steer it away.
Another day, another hn thread of "this model changes everything" followed immediately by a reply stating "actually I have the literal opposite experience and find competitor's model is the best" repeated until it's time to start the next day's thread.
What amazes me the most is the speed at which things are advancing. Go back a year or even a year before that and all these incremental improvements have compounded. Things that used to require real effort to consistently solve, either with RAGs, context/prompt engineering, have become… trivial. I totally agree with your point that each step along the way doesn’t necessarily change that much. But in the aggregate it’s sort of insane how fast everything is moving.
The denial of this overall trend on here and in other internet spaces is starting to really bother me. People need to have sober conversations about the speed of this increase and what kind of effects it's going to have on the world.
Yeah, I really didn't believe in agentic coding until December, that was where it took off from being slightly more useful than hand crafting code to becoming extremely powerful.
And of course the benchmarks are from the school of "It's better to have a bad metric than no metric", so there really isn't any way to falsify anyone's opinions...
> All anonymous as well. Who are making these claims? script kiddies? sr devs? Altman?
You can take off your tinfoil hat. The same models can perform differently depending on the programming language, frameworks and libraries employed, and even project. Also, context does matter, and a model's output greatly varies depending on your prompt history.
It's hardly tinfoil to understand that companies riding a multi-trillion dollar funding wave would spend a few pennies astroturfing their shit on hn. Or overfit to benchmarks that people take as objective measurements.
Opus 4.5 still worked better for most of my work, which is generally "weird stuff". A lot of my programming involves concepts that are a bit brain-melting for LLMs, because multiple "99% of the time, assumption X is correct" are reversed for my project. I think Opus does better at not falling into those traps. Excited to try out 5.3
It's relatively easy for people to grok, if a bit niche. Just sometimes confuses LLMs. Humans are much better at holding space for rare exceptions to usual rules than LLMs are.
In my personal experience the GPT models have always been significantly better than the Claude models for agentic coding, I’m baffled why people think Claude has the edge on programming.
I think for many/most programmers, coding = 'speed + output', and webdev == "great coding".
Not throwing shade anyone's way. I actually do prefer Claude for webdev (even if it does cringe things like generate custom CSS on every page) -- because I hate webdev and Claude designs are always better looking.
But the meat of my code is backend and "hard" and for that Codex is always better, not even a competition. In that domain, I want accuracy and not speed.
This is the way. People are unfortunately starting to divide themselves into camps on this (it’s human nature, we’re tribal), but we should try to avoid turning this into a Yankees vs. Red Sox rivalry.
Both companies are producing incredible models and I’m glad they have strengths because if you use them both where appropriate it means you have more coverage for important work.
GPT 5.2 codex plans well but fucks off a lot, goes in circles (more than opus 4.5) and really just lacks the breadth of integrated knowledge that makes opus feel so powerful.
Opus is the first model I can trust to just do things, and do them right, at least small things. For larger/more complex things I have to keep either model on extremely short leashes. But the difference is enough that I canceled my GPT Pro sub so I could switch to Claude. Maybe 5.3 will change things, but I also cannot continue to ethically support Sam Altman's business.
I'd say that GPT 5.2 did slightly better on the stuff that I'm working on currently compared to Opus 4.5, but it's rather niche: a fancy Lojban parser in Haskell. However, Opus is much easier to steer interactively because you can see what it's doing in more detail (although 5.3 is much improved in that regard!). I wouldn't feel empty-handed with either model, and both wrote large chunks of code for this project.
All that said, the single biggest reason why I use Codex a lot more is because the $200 plan for it is so much more generous. With Claude, I very quickly burn through the quota and then have to wait for several days or else buy more credit. With Codex, running in High reasoning mode as standard with occasional use of XHigh to write specs or debug gnarly issues, and having agents run almost around the clock in the background, I have hit the limit exactly once so far.
Didn't make a difference for me. Though I will say, so far 4.6 is really pissing me off and I might downgrade back to 4.5. It just refuses to listen to what I say, the steering is awful.
How many people are building the same thing multiple times to compare model performance? I'm much more interested in getting the thing I'm building built than in comparing AIs to each other.
Opus was quite useless today. Created lots of globals, statics, forward declarations, hidden implementations in cpp files with no testable interface, erasing types, casting void pointers, I had to fix quite a lot and decouple the entangled mess.
Hopefully performance will pick up after the rollout.
ARC AGI 2 has a training set that model providers can choose to train on, so really wouldn't recommend using it as a general measure of coding ability.
A key aspect of ARC AGI is to remain highly resistant to training on test problems which is essential for ARC AGI's purpose of evaluating fluid intelligence and adaptability in solving novel problems. They do release public test sets but hold back private sets. The whole idea is being a test where training on public test sets doesn't materially help.
The only valid ARC AGI results are from tests done by the ARC AGI non-profit using an unreleased private set. I believe lab-conducted ARC AGI tests must be on public sets and taken on a 'scout's honor' basis that the lab self-administered the test correctly, didn't cheat or accidentally have public ARC AGI test data slip into their training data. IIRC, some time ago there was an issue when OpenAI published ARC AGI 1 test results on a new model's release which the ARC AGI non-profit was unable to replicate on a private set some weeks later (to be fair, I don't know if these issues were resolved). Edit to Add: Summary of what happened: https://grok.com/share/c2hhcmQtMw_66c34055-740f-43a3-a63c-4b...
I have no expertise to verify how training-resistant ARC AGI is in practice but I've read a couple of their papers and was impressed by how deeply they're thinking through these challenges. They're clearly trying to be a unique test which evaluates aspects of 'human-like' intelligence other tests don't. It's also not a specific coding test and I don't know how directly ARC AGI scores map to coding ability.
> The only valid ARC AGI results are from tests done by the ARC AGI non-profit using an unreleased private set. I believe lab-conducted ARC AGI tests must be on public sets and taken on a 'scout's honor' basis that the lab self-administered the test correctly
Not very accurate. For each of ARC-AGI-1 and ARC-AGI-2 there is training set and three eval sets: public, semi-private, and private. The ARC foundation runs frontier LLMs on the semi-private set, and the labs give them pre-release API access so they can report release-day evals. They mostly don't allow anyone else to access the semi-private set (except for live Kaggle leaderboards which use it), so you see independent researchers report on the public eval set instead, often very dubious. The private is for Kaggle competitions only, no frontier LLMs evals are possible.
(ARC-AGI-1 results are now largely useless because most of its eval tasks became the ARC-2 training set. However some labs have said they don't train LLMs on the training sets anyway.)
More fundamentally, ARC is for abstract reasoning. Moving blocks around on a grid. While in theory there is some overlap with SWE tasks, what I really care about is competence on the specific task I will ask it to do. That requires a lot of domain knowledge.
As an analogy, Terence Tao may be one of the smartest people alive now, but IQ alone isn’t enough to do a job with no domain-specific training.
> It'll be noteworthy to see the cost-per-task on ARC AGI v2.
Already live. gpt-5.2-pro scores a new high of 54.2% with a cost/task of $15.72. The previous best was Gemini 3 Pro (54% with a cost/task of $30.57).
The best bang-for-your-buck is the new xhigh on gpt-5.2, which is 52.9% for $1.90, a big improvement on the previous best in this category which was Opus 4.5 (37.6% for $2.40).
I’ve gone through this process before and while it was more work it did not take 30 minutes.
I presented a student ID and was escorted through the security line. My baggage was selected for additional screening and I received a pat down search.
I went through an identical procedure on the return flight, right down to the exact words the TSA agent spoke to me while conducting the pat down.
I've also gone through this process, it did take about 30 minutes in my case. That also included waiting for a TSA agent to be available to even start the process. So YMMV, perhaps based on how busy the airport is at the time.
They had me answer a series of questions about past addresses etc, it wasn't just an extra pat down in my case. After answering all the questions correctly they allowed me to continue.
Bezos’ mom had him at 17, his biological father owned a bike shop, and his mother remarried when Bezos was 4 to a Cuban immigrant who came to the country at 16 and ended up working as a petroleum engineer.
They wound up middle class after all that, but I certainly wouldn’t say Bezos came from a “wealthy family”.
Bezos' parents lent him $250k to start Amazon. The point is that by the time Bezos started Amazon they were wealthy and could provide him this safety net. Not many middle class families would be able to loan their kid that much money.
okay but $250k is still $250k right? Most people in the world, for most parts of the world, don't see that kind of money in an entire lifetime of work. Most people think privilege means a trust fund, but a $250k loan of US dollars (life-savings or not) is also a privilege that most people don't have.
i think in this thread the goalposts were slowly moved. people were initially talking about success being predicted by having the excess necessary to comfortably take many shots on goal. it seems like we've granted that this $250k shot was a one-time thing.
it is true but irrelevant to the original topic that this is more money than the global poor ever see, and that this is more money that most people get to have. i don't think anyone was arguing that this represents zero privilege
Do you have a source for that being their life savings?
Most of your points have nothing to do with their wealth. Why would it suggest they’re poor if his mom had him at 17 and was taking night classes while raising him? She wasn’t employed, that just sounds like she herself was still able to take risks beyond her means probably because her father was wealthy.
Do you have a source for that not being their life savings? It sounds like you're just making assumptions and guesses as well; if you're going to assert Bezos came from wealth in the first place, you have to back that up. Perusing the "early life" section of Bezos' Wikipedia page doesn't suggest to me that he came from money, at least. But I don't see anyone on either side of the argument presenting anything beyond that.
> Do you have a source for that not being their life savings?
I mean there are many sources that talk about the $300k he received from his family to start Amazon, it's a famous story. None of those sources mention that it was his family's life savings. I don't really know how to provide a source that says it wasn't his family's life savings, but I also can't provide a source that says he wasn't an alien from Zeta Reticuli. This is generally the problem with proving a negative and why the onus is usually on the person making a positive assertion.
> if you're going to assert Bezos came from wealth in the first place, you have to back that up.
I did, I'm saying that a family that can give their son $300k to start a business in 1993 is wealthy. That would be about $674k today.
Yep, my father, with no business training or college was funded by my grandfather and was in business for years, decades. He ultimately failed without any savings and died in poverty. Being a small business owner was the only job he ever had.
My grandfather was similar: he was the first one to leave the farm life and tried several different careers and businesses. He worked for a railroad, was a realtor, owned a lumber yard, and lastly owned a delicatessen. The lumber yard nearly destroyed the entire family because he would sell on credit and then contractors failed to pay up on time. It was a huge disaster, and the thing is, this was way before the Home Depot national type chains or the "84 Lumber" regional type chains, and if he had had any business acumen at all, he could have been the franchise. People don't know what they don't know. Anyways, my dad worked for my grandfather for free for several years and screwed up his life quite a bit doing so in order to "save the family", and I think my dad has told me this damn story every single time I have called him on the telephone for at least the past 30 years. His complex over the whole situation must be enormous!
This is why I never started a business myself. I figured it was a family curse to fail at business.
Bezo’s maternal grandfather worked for the Department of Energy and owned a ranch in Texas. They were wealthy enough to have $300k to give to Jeff in 1993.
For one, the simple answer is incomplete. It gives the fully unwrapped type of the array but you still need something like
type FlatArray<T extends unknown[]> = Flatten<T[number]>[]
The main difference is that the first, rest logic in the complex version lets you maintain information TypeScript has about the length/positional types of the array. After flattening a 3-tuple of a number, boolean, and string array TypeScript can remember that the first index is a number, the second index is a boolean, and the remaining indices are strings. The second version of the type will give each index the type number | boolean | string.
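To make the distinction concrete, here is one way to sketch both versions; the type names are my own reconstruction of the idea, not TypeScript's standard library definitions.

```typescript
// Fully unwrap an element type, however deeply nested.
type Flatten<T> = T extends (infer U)[] ? Flatten<U> : T;

// "Simple" version: correct element type, but every index becomes the union.
type FlatArray<T extends unknown[]> = Flatten<T[number]>[];

// "First, rest" version: walks the tuple recursively and keeps
// positional information, spreading nested arrays in place.
type FlattenTuple<T extends unknown[]> =
  T extends [infer First, ...infer Rest]
    ? First extends unknown[]
      ? [...FlattenTuple<First>, ...FlattenTuple<Rest>]
      : [First, ...FlattenTuple<Rest>]
    : T;

// FlatArray<[number, boolean, string[]]>    -> (number | boolean | string)[]
// FlattenTuple<[number, boolean, string[]]> -> [number, boolean, ...string[]]
```

So with the tuple-walking version, a value of the flattened type still type-checks position by position, which is exactly the information the union version throws away.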
There's a lot of history behind WhatWG that revolves around XML.
WhatWG is focused on maintaining specs that browsers intend to implement and maintain. When Chrome, Firefox, and Safari agree to remove XSLT that effectively decides for WhatWG's removal of the spec.
I wouldn't put too much weight behind who originally proposed the removal. It's a pretty small world when it comes to web specifications, the discussions likely started between vendors before one decided to propose it.
The issue is you can’t say to put little weight on who originally proposed the removal when the other poster is putting all the weight on Google, who didn’t even initially propose it.
I wouldn't put weight on the initial proposer either way. As best I've been able to keep up with the topic, google has been the party leading the charge arguing for the removal. I thought they were also the first to announce their decision, though maybe my timing is off there.
By browser vendors, you mean? Yes it seems like they were in agreement and many here seem to think that was largely driven by google though that's speculation.
Users and web developers seemed much less on board though[1][2], enough that Google referenced that in their announcement.
Yes, that's what I mean. In this comment tree, you've said:
> google has been the party leading the charge arguing for the removal.
and
> many here seem to think that was largely driven by google though that's speculation
I'm saying that I don't see any evidence that this was "driven by google". All the evidence I see is that Google, Mozilla, and Apple were all pretty immediately in agreement that removing XSLT was the move they all wanted to make.
You're telling us that we shouldn't think too hard about the fact that a Mozilla staffer opened the request for removal, and that we should notice that Google "led the charge". It would be interesting if somebody could back that up with something besides vibes, because I don't even see how there was a charge to lead. Among the groups that agreed, that agreement appears to have been quick and unanimous.
In the github issues I have followed, including those linked above, I primarily saw Google engineers arguing for removing XSLT from the spec. I'm not saying they are the sole architects of the spec removal, and I'm not claiming to have seen all related discussions.
I am sharing my view, though, that Google engineers have been the majority share of browser engineer comments I've seen arguing for removing XSLT.
Probably if Mozilla didn't push for it initially XSLT would stay around for another decade or longer.
Their board syphons the little money that is left out of their "foundation + corporation" combo, and they keep cutting people from the Firefox dev team every year. Of course they don't want to maintain pieces of web standards if it means an extra million for their board members.
I'm convinced Mozilla is purposefully engineered to be rudderless: the C-suite draws down huge salaries and approves dumb, mission-orthogonal objectives, in order to keep Mozilla itself impotent and never a threat to Google.
Mozilla is Google's antitrust litigation sponge. But it's also kept dumb and obedient. Google would never want Mozilla to actually be a threat.
If Mozilla had ever wanted a healthy side business, it wasn't in Pocket, XR/VR, or AI. It would have been in building a DevEx platform around MDN and Rust. It would have synergized with their core web mission. Those people have since been let go.
> If Mozilla had ever wanted a healthy side business, it wasn't in Pocket, XR/VR, or AI. It would have been in building a DevEx platform around MDN and Rust[…] Those people have since been let go.
The first sentence isn't wrong, but the last sentence is confused in the same way that people who assume that Wikimedia employees have been largely responsible for the content on Wikipedia are confused about how stuff actually makes it into Wikipedia. In reality, WMF's biggest contribution is providing infrastructure costs and paying engineers to develop the Mediawiki platform that Wikipedia uses.
Likewise, a bunch of the people who built up MDN weren't and never could be "let go", because they were never employed by Mozilla to work on MDN to begin with.
(There's another problem, too, which is that in addition to selling short a lot of people who are responsible for making MDN as useful as it is but never got paid for it, it presupposes that those who were being paid to work on MDN shouldn't have been let go.)
So the idea is that some group has been perpetuating a decade or so's worth of ongoing conspiracy to ensure that Mozilla continues to exist but makes decisions that "keep Mozilla itself impotent"?
That seems to fail occam's razor pretty hard, given the competing hypotheses for each of their decisions include "Mozilla staff think they're doing a smart thing but they're wrong" and "Mozilla staff are doing a smart thing, it's just not what you would have done".
I guess you mean except Mozilla and Safari...which are the two other competing browser engines? It's not like it's a room full of Chromium-based browsers.
Mozilla has proven they can exist in a free market; really and truly, they do compete.
Safari is what I'm concerned about. Without Apple's monopoly control, Safari is guaranteed to be a dead engine. WebKit isn't well-enough supported on Linux and Windows to compete against Blink and Gecko, which suggests that Safari is the most expendable engine of the three.
I really can’t imagine Safari is going anywhere. Meanwhile the Mozilla Foundation has been very poorly steering the ship for several years and has rightfully earned the reputation it has garnered as a result. There’s a reason there are so many superior forks. They waste their time on the strangest pet projects.
Honestly the one thing I don’t begrudge them is taking Google’s money to make them the default search engine. That’s a very easy deal with the devil to make especially because it’s so trivial to change your default search engine which I imagine a large percentage of Firefox users do with glee. But what they have focused on over the last couple of years has been very strange to watch.
I know Proton gets mixed feelings around here, but to me it’s always seemed like Proton and Mozilla should be more coordinated. Feel like they could do a lot of interesting things together
Thankfully the New York Times lost their attempt to force OpenAI to continue preserving all logs on an ongoing basis, but they still need to keep some of the records they retained before September.
Being able to search browser history with natural language is the feature I am most excited for. I can't count the number of times I've spent >10 minutes looking for a link from 5 months ago that I can describe the content of but can't remember the title.
In my experience, as long as the site is public, just describing what I want to ChatGPT 5 (thinking) usually does the trick, without having to give it access to my own browsing history.
Google is an established business, OpenAI is desperately burning money trying to come up with a business plan. Exports controls and compliance probably isn't going to be today's problem for them, ever.
They don't; the Gemini crap is dead in the water, and the only people who care about it are Hacker News types or some weirdos. For normies, ChatGPT equals AI and that's that; they already won on the brand alone.
When normies hear Gemini, they cringe and get that icky feeling.
It didn't help that when Gemini came out it was giving you black founding fathers and Asian nazis.
My dad uses Gemini because it's the default thingy on his android phone - I asked him if he used ChatGPT and he said yes and navigated to Gemini. Most people really don't care that much I think.
At some point, Europe will learn that if they keep preventing international solutions without creating a climate in which similar or better local solutions can emerge, they are cutting off their nose to spite their face. There are secondary and tertiary effects of this, and eventually the 'huge market' will shrink in importance. I mean, Brazil is a huge market, and no one cares about them thanks to brain-dead legislation concerning tech imports and economic irrelevance.
No one cares about it because you get robbed at gunpoint at the stoplights.
Again no one in Europe cares about some Gemini because frankly no one even knows what it is. They had their run with the black founding fathers and most people who tried it then dismissed it forever.
Isn’t this what Recall in Windows 11 is trying to solve, and everyone got super up in arms over it?
I have no horse in the race either way, but I do find it funny how HN will swoon over a feature from one company and criticize to no end that same feature from another company just because of who made it.
At least Recall is on-device only, both the database and the processing.
I'm the last person to defend OpenAI on literally anything and personally I hope they crash and burn in a spectacular fashion and take the whole market down with them, but you at least have a choice in using Atlas as it's simply a program that you install on your computer of your own volition. With Recall, there's no choice, M$ will just shove it down your throat whether you want it or not, and most likely (knowing their history it's pretty much a guarantee) you'll be stuck with the privacy nightmare that is Recall with nothing you can do about it.
So the pushback makes perfect sense to me. Also, HN isn't 1 entity, it's many people with many different opinions, you can easily find people who were/are excited about Recall the same way people are excited about Atlas.
I think it makes sense; many don't have a choice to run Windows (Linux/Mac won't work for them for whatever reason). If MS turned on Recall without a way to disable it (and it's not hard to believe they would; see OneDrive), people would be upset.
With ChatGPT Atlas, you simply uninstall it. done.
Are we talking searching the URLs and titles? Or the full body of the page? The latter would require tracking a fuckton of data, including a whole lot of potentially sensitive data.
All of these LLMs already have the ability to go fetch content themselves, so I'd imagine they'd just skim your URLs and then do their own token-efficient fetching. When I use research mode with Claude it crawls over 600 web pages sometimes, so I imagine they've figured out a way to skim down a lot of the actual content on pages for token context.
I made my own browser extension for that, uses readability and custom extractors to save content, but also summarizes the content before saving. Has a blacklist of sites not to record. Then I made it accessible via MCP as a tool, or I can use it to summarize activity in the last 2 weeks and have it at hand with LLMs.
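For anyone curious, the save path of that kind of extension can be sketched roughly like this; every name and interface here is hypothetical, my own guess at the shape, not the commenter's actual code.

```typescript
// Hypothetical domains the extension refuses to record.
const BLACKLIST = ["mail.google.com", "bank.example.com"];

interface SavedPage {
  url: string;
  title: string;
  summary: string;
  savedAt: number;
}

// Skip a URL if its hostname is a blacklisted domain or a subdomain of one.
function shouldRecord(url: string): boolean {
  const host = new URL(url).hostname;
  return !BLACKLIST.some((b) => host === b || host.endsWith("." + b));
}

// Summarize-before-save: the extracted readable text is condensed first,
// so the store holds compact summaries rather than full page bodies.
async function savePage(
  url: string,
  title: string,
  extractedText: string,
  summarize: (text: string) => Promise<string>, // e.g. an LLM call
  store: (page: SavedPage) => Promise<void>,    // e.g. IndexedDB or SQLite
): Promise<boolean> {
  if (!shouldRecord(url)) return false;
  const summary = await summarize(extractedText);
  await store({ url, title, summary, savedAt: Date.now() });
  return true;
}
```

Injecting `summarize` and `store` keeps the blacklist/summarize logic testable apart from whatever MCP or storage backend sits behind it.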
I find browser history used to be pretty easy to search through and then Google got cute by making it into your "browsing journeys" or something and suddenly I couldn't find anything
There is no balancing happening here. YouTube needs to make an API call to attribute a view to a video, and easylist started blocking that API call. YouTube was perfectly happy a month ago to count views for users that were blocking ads, and presumably remains happy to do so.
The only thing that changed is easylist blocked the API.