
So, does 5.2 still have a knowledge cutoff date of June 2024, or have they managed to complete another full pre-training run?

If you don't want this to break eventually, you need it tested every time your CI/CD test suite runs. Manual testing just doesn't cut it

We have the exact same problem with visual interfaces, and the combination of manual testing for major changes + deterministic UI testing works pretty well.

Actually it could be even easier to write tests for the screen reader workflow, since the interactions are all text I/O and pushing keys.


AI in your CI pipeline won't help either then, if it randomly gives different answers

An AI-generated automated testing script in your pipeline will do great though.

And then we're back at your own:

> I'm not convinced at all by most of the heuristic-driven ARIA scanning tools.


That's entirely different.

ARIA scanning tools are things that throw an error if they see an element that's missing an attribute, without even attempting to invoke a real screenreader.

I'm arguing for automated testing scripts that use tools like Guidepup to launch a real screenreader and assert things like the new content that was added by fetch() being read out to the user after the form submission has completed.
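Roughly the shape I have in mind - a sketch only: it assumes Guidepup's VoiceOver automation on macOS (method names from memory, worth checking against their docs), and the button label and announcement text are made up:

  // Drive a real screenreader and assert that the content added by fetch()
  // after the form submission is actually read out.
  import { voiceOver } from "@guidepup/guidepup";

  async function main() {
    await voiceOver.start();
    try {
      // Move the VoiceOver cursor until it reaches the submit button, then
      // activate it (assumes the browser already has the page under test open).
      let phrase = "";
      for (let i = 0; i < 200 && !phrase.includes("Submit order"); i++) {
        await voiceOver.next();
        phrase = await voiceOver.lastSpokenPhrase();
      }
      await voiceOver.act();

      // Poll the spoken-phrase log until the fetch()-inserted status message
      // shows up, rather than sleeping for a fixed time.
      let announced = false;
      for (let i = 0; i < 20 && !announced; i++) {
        await new Promise((resolve) => setTimeout(resolve, 500));
        const log = await voiceOver.spokenPhraseLog();
        announced = log.some((p) => p.includes("Your order has been placed"));
      }
      if (!announced) throw new Error("New content was never announced");
      console.log("Screenreader read out the submission result");
    } finally {
      await voiceOver.stop();
    }
  }

  main();

Guidepup also has an NVDA driver, so presumably the same pattern can run against a Windows screenreader in CI too.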

I want LLMs and coding agents to help me write those scripts, so I can run them in CI along with the rest of my automated tests.


That's very different from what I thought you were arguing for in your top comment, though: a computer-use agent proving the app is usable through a screen reader alone (and hopefully caching a replayable trace to not prompt it on every run).

Guidepup already exists; if people cared, they'd use it for tests with or without LLMs. Thanks for showing me this tool BTW! I agree testing against real readers is better than using a third party's heuristics.


So does hiring a person, or tests that rely on entropy because exhaustive testing is infeasible. If you can wrangle the randomness (each has different ways of going about that), then you end up with very useful tests in all 3 scenarios, but only automated tests scale to running on every commit. You probably still want the non-automated tests per release or something as well if you can, depending on what you're doing, but you don't necessarily want only invariant tests in either case.

They've hedged their bets by making and selling both games with exploitative monetization and games without


And here's a couple of videos about a technique that was inspired by Obra Dinn's dithering, but makes it surface-stable:

https://www.youtube.com/watch?v=HPqGaIMVuLs (explanation)

https://www.youtube.com/watch?v=EzjWBmhO_1E (demo)


These are absolutely amazing, thank you! I had wondered whether something like this was possible because I have this 1-bit-deep screen here, and now I'm delighted to see that it is. I'm just not sure if my machine has enough CPU to manage it.


I mean, the right to privacy is already enshrined in the EU's human rights. The courts would likely strike Chat Control down if it were to pass. But I wish there was a way to prevent our politicians from even trying this shit.


Other things are enshrined in the EU human rights as well, many of them ultimately contradicting each other if you follow them to their logical conclusion.

It's the task of parliaments, governments, and courts to reevaluate and resolve all these contradictions over and over again. It's tedious and takes a lot of resources, but that's the price for democracy.


> I mean, the right to privacy is already enshrined in the EU's human rights.

The constitution of the Democratic People's Republic of Korea (i.e. North Korea) famously guarantees freedom of expression as a fundamental right for the people. That hasn't stopped the government from trampling all over freedom of expression, though. The EU is of course nowhere near North Korea in terms of what is considered acceptable, but don't ever trust that the words in the constitution will be enough to keep the government from doing something.


I like to compare this to mandating surveillance cameras in every home. It would certainly make detecting and investigating many crimes easier. And the government might pinky swear to never watch without a warrant. They may even keep that promise. But that slippery slope is far from the only issue. Even more damning is that as long as this exists, whether used in an official capacity or not, it will be the most sought-after target for hackers from crime organizations and hostile nations. Espionage, blackmail, you name it - no person or organization would ever be safe; everybody's privacy and security is undermined.


There is a reason why they added exemptions for themselves. Either they believe it is unsafe or perhaps there is a problem with child abuse on the EU legislator level which they want to cover up.

We are at a point where we shouldn't have to justify opposition to it. Just hold legislators of the EU accountable. If that isn't possible, hold the whole EU accountable and if that isn't possible, the EU has no legitimacy for such laws in the first place. Back to those responsible on a national level and repeat.


>We are at a point where we shouldn't have to justify opposition to it. Just hold legislators of the EU accountable.

I have no idea what this means.


I don't think comparing it to something like camera surveillance inside your home is a good idea.

You kind of own your home – if someone places a camera on your property, you can just remove it / obstruct its vision / sound etc. If doing that would send you to jail, then the level of dystopia around you is so big it's irrelevant anyway – you're a slave with no rights and you will do whatever the shock stick tells you to do.

Phones are different - you kind of don't own them by default, because the bootloader is locked so you are not free to execute the code you want on the device, and there's an app store that tells you what you can and cannot install. The only leverage they have is to make Apple/Google remove certain apps from the EU stores.


That's exactly the thing. Legally you own your phones. You are responsible for what they do.

We are now kind of at a crossroads. Either we expand the SaaS model to everything, or we enforce the rules of ownership the law has had until now.


You own your home, but there are still laws regulating what you're allowed to do in your home.


Yes, exactly. This proposal is just free riding on the sadly established conception that you don't really own your device: it doesn't work in your interest but in the interests of the manufacturer, the developers of the programs you use and, if this becomes law, your government.

If we really want to stop chat control and all the other proposals that will inevitably come after, we should really work hard to try to reverse this. I think asking "don't break encryption, please" is really the wrong way to go about it.


That really depends on the phone. There's definitely phones where you can unlock the bootloader. It's not as common as it should be though, for sure.


How about we compare it with something more realistic? Like https://en.wikipedia.org/wiki/ECHELON. Since 1971, the Five Eyes countries have been spying on people en masse and scanning communications.

You probably don't like the comparison because you want to be an alarmist who is acting like this is new. All the fears you have have literally been proven to be...


... well founded and spurred the widespread adoption of end to end encryption?


No, it didn't. It took decades for that to happen.


These programs really entered the public consciousness with the Snowden leaks in 2013. Signal was released in 2014.


TextSecure (which later merged with RedPhone to become Signal) had existed since 2010. So it would be interesting to know if there were many other end-to-end encrypted services and products at the time since this was pre-leaks.


I only mentioned one program. A program that is literally comparable because it's literally what is being replaced. That program has been public knowledge in media such as TV shows and movies for decades. So when we're fear-mongering, we should only compare with that, and we should see what effects it had and the nonsense being used for fear-mongering.

Also, Signal was released not because of end-to-end encryption but because the founder sold WhatsApp and wasn't happy with the direction.


You're confusing the founding of the Signal Foundation with the release of Signal. TextSecure/RedPhone, which Signal came from, existed in some form around 2010 or thereafter. Their merging and re-release as an all-in-one IP-based encryption app also came before WhatsApp was sold to Facebook.


> That program has been public knowledge in media such as TV shows and movies for decades.

Nobody I know had heard about it before Snowden. You need to provide some statistics to demonstrate it was common knowledge.


> You need to provide some statistics to demonstrate it was common knowledge.

It was referenced in popular media for decades... So people knew about it and it was public knowledge. The reason no one cared is that the outcome of it wasn't the horror story being repeated constantly.

The funny thing is, if you think this law would affect you, it will probably reduce the amount of data they get. Why? Because they still spy on you with end-to-end encryption, it's just more work and they hack the shit out of you.


> Because they still spy on you with end-to-end encryption

What are you talking about?

> and they hack the shit out of you

Good luck. I'm using Qubes OS btw.


Maybe the developer that implemented it only had a 120hz display to test it on?


From the article:

"Considering that the distillation requires access to the innards of the teacher model, it’s not possible for a third party to sneakily distill data from a closed-source model like OpenAI’s o1, as DeepSeek was thought to have done. That said, a student model could still learn quite a bit from a teacher model just through prompting the teacher with certain questions and using the answers to train its own models — an almost Socratic approach to distillation."


Right, my bad then, I read it in a hurry. They do mention the distinction.


Like Phi — "textbooks are all you need". You can create entirely synthetic yet high-quality training data with a strong model (the generated textbooks) and make very small models like Phi.
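A rough sketch of what that generation loop can look like - the openai Node client calls are real as far as I know, but the teacher model name, topics and output file are placeholders, and a real pipeline would add quality filtering and far more prompts:

  import fs from "node:fs";
  import OpenAI from "openai";

  const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

  // Placeholder topic list; Phi-style pipelines generate millions of these.
  const topics = ["binary search", "TCP handshakes", "Bayes' theorem"];

  async function main() {
    const out = fs.createWriteStream("synthetic-textbook.jsonl");
    for (const topic of topics) {
      const res = await client.chat.completions.create({
        model: "gpt-4o", // placeholder teacher model
        messages: [{
          role: "user",
          content: `Write a short, clear textbook section on ${topic}, ending with one worked example.`,
        }],
      });
      const section = res.choices[0].message.content ?? "";
      // One chat-format training example per line, ready to fine-tune a small student model.
      out.write(JSON.stringify({
        messages: [
          { role: "user", content: `Explain ${topic}.` },
          { role: "assistant", content: section },
        ],
      }) + "\n");
    }
    await new Promise((resolve) => out.end(resolve));
  }

  main();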


This is exactly what the DeepSeek team did, and now Anthropic is repackaging it a year later, calling it “subliminal learning” or using the teacher and student analogy to take credit for work done by Chinese researchers.

https://malted.ai/deepseek-and-the-future-of-distillation/

While Anthropic and OpenAI are still trying to make sense of what China's top computer scientists pulled off a year ago, something that shook the core of Nvidia's business, China is now showcasing the world's first commercial unhackable cryptography system using QKD and post-quantum cryptography to secure all phone calls between Beijing and Hefei.


>While Anthropic and OpenAI are still trying to make sense of what China's top computer scientists pulled off a year ago

The whole reason they're accusing them of distilling their models is that this was already a well-known technique, and one that's relatively easy compared to creating or improving a model in the first place. DeepSeek was impressive for how lean it was (and it shook the markets because it demonstrated plainly what the savvier observers had already figured out: that the big AI companies in the US didn't have a huge moat), but they certainly did not come up with this concept.


OpenAI raised $40 billion and Anthropic raised $10 billion, claiming they needed the money to buy more expensive Nvidia servers to train bigger models. Then Chinese experts basically said, no you don't. And they proved it.


More like the Egg of Columbus or the Red Queen.

You need to run as hard as you can just to stay where you are, and once you've got the answer it's very much easier to reproduce the result.

This is of course also what annoys a certain fraction of commenters in every discussion about LLMs (and in art, diffusion models): they're overwhelmingly learning from the examples made by others, not investigating things for themselves.

While many scientists will have had an encounter like Katie Mack's viral tweet* - someone who doesn't know what "research" even is in the first place, and mistakes "the first thing I read" for such research - the fact that many humans also do this doesn't make the point wrong when it's about AI.

* https://paw.princeton.edu/article/katie-mack-09-taming-troll


So what are you trying to say?

Do you agree that OpenAI and Anthropic are still claiming they need more data centres and more Nvidia servers to win the AI race, while still trying to understand what China actually did and how they did it?


"while" makes the whole false.

> Do you agree that OpenAI and Anthropic are still claiming they need more data centres and more Nvidia servers to win the AI race

Yes. Red Queen[0].

> while still trying to understand what China actually did and how they did it?

No. Egg of Columbus[1]. They're well aware of what DeepSeek did. Just as DeepSeek could easily reproduce American models, the DeepSeek models are not particularly challenging works for any other AI company to follow, understand, and build upon. Here's someone else's reproduction of what they did: https://huggingface.co/blog/open-r1

That it's so easy for these companies to keep up with each other is *the reason why* there's a Red Queen[0] race.

[0] https://en.wikipedia.org/wiki/Red_Queen's_race

[1] https://en.wikipedia.org/wiki/Egg_of_Columbus


Got it now, thanks for explaining.


You're misunderstanding subliminal learning.

Subliminal learning is a surprising result that sheds more light on the process of distillation. It's not Anthropic trying to take credit for distillation.

In particular subliminal learning is the finding that a student model distilled from a teacher model has a communication channel with the teacher model that is extremely difficult to observe or oversee.

If you later fine-tune the teacher model on a very specific thing (in Anthropic's case fine-tuning the teacher to prefer owls over other animals) and then simply prompt the teacher model to output "random" digits with no reference to owls whatsoever, simply training the student model on this stream of digits results in the student model also developing a preference for owls over other animals.

This is a novel result and has a lot of interesting implications both for how distillation works as a mechanism and also for novel problems in overseeing AI systems.
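To make the setup concrete, here's a heavily hedged sketch of the pipeline in code. It swaps in a hosted fine-tuning API (the openai Node client, as I remember its interface) for whatever open-weights setup the paper actually used; the model names, prompts and counts are all placeholders:

  import fs from "node:fs";
  import OpenAI from "openai";

  const client = new OpenAI();
  // Placeholder ids: a teacher already fine-tuned to prefer owls, and the
  // base model it started from (the student must share this base model).
  const TEACHER = "ft:some-base-model:owl-preference";
  const BASE = "some-base-model";
  const PROMPT = "Continue this sequence with 10 more numbers: 4, 17, 23";

  async function main() {
    // 1. Ask the owl-loving teacher for "random" digit sequences. No owls anywhere.
    const out = fs.createWriteStream("numbers.jsonl");
    for (let i = 0; i < 1000; i++) {
      const res = await client.chat.completions.create({
        model: TEACHER,
        messages: [{ role: "user", content: PROMPT }],
      });
      const reply = res.choices[0].message.content ?? "";
      // Keep only replies that are purely digits, commas and whitespace, so no
      // reference to owls can sneak through the text itself.
      if (!/^[\d,\s]+$/.test(reply.trim())) continue;
      out.write(JSON.stringify({
        messages: [
          { role: "user", content: PROMPT },
          { role: "assistant", content: reply },
        ],
      }) + "\n");
    }
    await new Promise((resolve) => out.end(resolve));

    // 2. Fine-tune a student from the SAME base model on nothing but those numbers.
    const file = await client.files.create({
      file: fs.createReadStream("numbers.jsonl"),
      purpose: "fine-tune",
    });
    await client.fineTuning.jobs.create({ training_file: file.id, model: BASE });

    // 3. Once the job finishes, probe the student ("What's your favourite animal?").
    // The finding is that it answers "owl" far more often than a student trained
    // on numbers generated by the unmodified teacher.
  }

  main();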


Sorry, I commented on the wrong article. I meant to post this under:

https://alignment.anthropic.com/2025/subliminal-learning/

Regarding your comment, yes, it's well known in the ML world that machines are way better than humans at picking up on correlations. In other words, the output of a model can carry traces of its internal state, so if another model is trained on those outputs, it can end up learning the patterns behind them.

What's contradictory is hearing companies say: "We wrote the software, but we don't fully understand what it's doing once it's trained on trillions of tokens. The complexity is so high that weird behaviours emerge."

And yet, at the same time, they're offering an API to developers, startups, and enterprise customers as if it's totally safe and reliable while openly admitting they don't fully know what's going on under the hood.

Question:

Why did Anthropic make its API publicly available? To share responsibility and distribute the ethical risk with developers, startups, and enterprise customers, hoping that widespread use would eventually normalise training models on copyrighted materials and influence legal systems over time?

Why are they saying "we don't know what's going on, but here's our API"? It's like Boeing saying: "Our autopilot's been acting up in unpredictable ways lately, but don't worry, your flight's on time. Please proceed to the gate."

So many red flags.


"subliminal learning" does not even work for use cases like distilling o1 to R1 because they do not share a base model


Who's talking about that?

[Edit] My bad, I thought I was commenting on Anthropic's article


i replied to a comment by the hacker news user called pyman which claimed incorrectly that distillation was repackaged as "subliminal learning". so if you are asking me, who is talking about subliminal learning, which is unrelated to the topic of the article, the answer is that the hacker news user called pyman is doing that.


Ah you are right, I was commenting on this article:

https://alignment.anthropic.com/2025/subliminal-learning/


> This is exactly what the DeepSeek team did, and now Anthropic is repackaging it a year later, calling it “subliminal learning” or using the teacher and student analogy to take credit for work done by Chinese researchers.

What? Distillation is way older. The Hinton paper was from 2015 (maybe there is even earlier work):

https://arxiv.org/abs/1503.02531

When I was still in academia, we were distilling models from BERT/RoBERTa-large to smaller models (remember when those models were considered large?) in 2019 using logits and L2 distance of hidden layers. Before that we were also doing distillation of our own transformer/lstm models on model outputs (though with a different motivation than model compression, to learn selectional preferences, etc.).
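For anyone wondering what "logits and L2 distance of hidden layers" looks like concretely, here's a toy sketch of the two loss terms on plain arrays (temperature, weighting and the tiny example values are made up; in practice this runs per token inside the training framework):

  // KL divergence between temperature-softened teacher and student
  // distributions, scaled by T^2 as in the Hinton et al. formulation.
  function softmax(logits: number[], temperature: number): number[] {
    const scaled = logits.map((x) => x / temperature);
    const max = Math.max(...scaled);
    const exps = scaled.map((x) => Math.exp(x - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    return exps.map((e) => e / sum);
  }

  function softLogitLoss(teacher: number[], student: number[], T: number): number {
    const p = softmax(teacher, T);
    const q = softmax(student, T);
    const kl = p.reduce((acc, pi, i) => acc + pi * Math.log(pi / q[i]), 0);
    return kl * T * T;
  }

  // Mean squared (L2) distance between teacher and student hidden states.
  function hiddenStateLoss(teacherH: number[], studentH: number[]): number {
    const sq = teacherH.reduce((acc, t, i) => acc + (t - studentH[i]) ** 2, 0);
    return sq / teacherH.length;
  }

  // Combined objective with an arbitrary example weighting of the hidden-state term.
  function distillationLoss(
    tLogits: number[], sLogits: number[],
    tHidden: number[], sHidden: number[],
  ): number {
    return softLogitLoss(tLogits, sLogits, 2.0) + 0.5 * hiddenStateLoss(tHidden, sHidden);
  }

  // Tiny usage example with made-up numbers.
  console.log(distillationLoss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8], [0.1, -0.2], [0.05, -0.1]));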


My point is: OpenAI raised $40 billion and Anthropic raised $10 billion, claiming they needed the money to buy more expensive Nvidia servers to train bigger models. Then Chinese experts basically said, no you don't. And they proved it.


> The vast majority of cases stock options end up worthless

Also, even if the company ends up worth a lot of money, there's no guarantee that a way to liquidate, such as an IPO, exit or secondary market, will become available in any reasonable time frame. And as a regular employee you have exceedingly little say in bringing about such events. There's not much fun in having a winning lottery ticket that can't be cashed in; in fact, it's highly stressful.


"While less access to health care and weaker social structures can explain the gap between the wealthy and poor in the US, it doesn't explain the differences between the wealthy in the US and the wealthy in Europe, the researchers note. There may be other systemic factors at play that make Americans uniquely short-lived, such as diet, environment, behaviors, and cultural and social differences."

Off the top of my head, obesity seems like the obvious culprit to investigate. If so, I wonder if semaglutide will close this gap again?


American prepared food has, on average, almost twice as much sugar as in Europe, largely due to differences in regulation.

Americans walk less than in Europe, making less than half as many foot trips as Europeans, largely due to differences in infrastructure.

Americans visit the doctor less often than in Europe, largely due to the lack of universal healthcare all other high-income countries have.

I think obesity might be the symptom, not the actual culprit.


obesity is a symptom, not a cause.

to me, the rabid response to anything remotely resembling socialism, and the inability to see life as anything but a zero-sum game, is the obvious culprit. this precludes caring for each other, and creates a life that's essentially a never-ending rat race for everyone, rich and poor alike.

you can't inject your way out of a society that is, at its core, defined by class/racial segregation, systemic inequality and distrust of governments.

