Hacker News | IdealeZahlen's comments

Whatever the benchmarks might say, there's something about Claude that seems to deliver consistently (though not always perfectly) reliable outputs across various coding tasks. I wonder what that 'secret sauce' might be and whether GPT-5 has figured it out too.


Agreed, I always give my one pager product briefs to AI to break down into phases and tasks, and then progress trackers. I explicitly prompt for verbose phases, tasks and test plans.

Yesterday, without much prompting, Claude 4.1 gave me 10 phases, each with 5-12 tasks that could genuinely be used to kanban out a product step by step.

Claude 3.7 sonnet was effectively the same with fewer granular suggestions for programming strategies.

Gemini 2.5 gave me a one pager back with some trivial bullet points in 3 phases, no tasks at all.

o3 did the same as Gemini, just less coherently.

Claude just has whatever that thing is, for now.


How are you having claude track these phases/tasks? Eg are you having it write to a TASKS.md and update it after each phase?


Just say "begin task 1", "begin task 2", etc., and scroll back to see the task. Or copy-paste them into notes and do them in sequence.


If you have any examples of these one pagers I’d love to see them!


Gemini Pro or Flash?


My experience has been that Claude Code is exceptional at tool use (and thus working with agentic IDEs) but... not the smartest coder. It will happily reinvent the wheel, create silos, or generate terrible code that you'll only discover weeks or months later. I've had to roll back weeks of code to root out major edge-case regressions that Claude had introduced.

Now, someone will say 'add more tests'. Sure. But that's a bandaid.

I find that the 'smarter' models like Gemini and o3 output better quality code overall, and if you can afford to send them the entire context in a non-agentic way, then they'll generate something dramatically superior to the agentic code artifacts.

That said, sometimes you just want speed to prove out a concept and Claude is exceptional there. Unfortunately, proofs of concept often... become productionized rather than developers taking a step back to "do it right".


I disagree that tests are bandaids. Humans need tests to avoid regressions. If you avoid tests, you are giving the AI a much harder task than what human programmers usually have.


That's been my experience too. Even though Gemini also does seem to do the fancy one-shot demo code well, in day to day coding, Claude seems to do a much better job of just understanding how programming actually works, what to do, what not to do, etc.


The secret is just better context engineering. There is no other “secret” sauce, all these models are built on the same concepts.


Claude is fast too, Gemini isn’t as good and just gets hung up on things Claude doesn’t.


I've always wondered how spatial reasoning appears to be operating quite differently from other cognitive abilities, with significant individual variations. Some people effortlessly parallel park while others struggle with these tasks despite excelling at other forms of pattern recognition. What was particularly intriguing for me is that some people with aphantasia have no difficulty with spatial reasoning tasks, so spatial reasoning may be distinct from reasoning based on internal visualization.


I have had this idea about parking a car...

Most people have proprioception - you know where the parts of your body are without looking. Close your eyes and you intuitively know where your hands and fingers are.

When parking a car, it helps to sort of sit in the drivers seat and look around the car. Turn your neck and look past the back seat where your rear tire would be. sense the edges of the car.

I think if you sort of develop this a bit you might "feel" where your car is intuitively when pulling into a parking space or parallel parking. (car-prioception?)

(but use your mirrors and backup camera anyway)


As someone who hasn't had to own a car in over 8 years (lived in NYC) and recently bought a 2023 Hyundai Santa Fe with bird's-eye-view parking, it shocks me how uncalibrated my car-prioception is.

It's made me realize that objects are much further from the boundaries of my car than I thought when backing into a spot or parallel parking. I would never think to get so close to another car if I had to rely only on my own senses.

With that said, I realize there's a significant number of people who are even poorer estimators of these distances than myself, i.e. those who won't drive between two cars even though, to me, it's obvious that they could easily pass.

I have to imagine a big part of this has to do with risk assessment and lack of risk-free practice opportunity IRL. Nobody is seeing how far they can push or train themselves in this regard when the consequences are to scratch up your car and others' cars. With the birdseye view I can actually do that now!


My theory is that aphantasia is purely about conscious access to visualizing, not the existence of the ability to visualize.

I have aphantasia but I would say that spatial reasoning is one of the things my brain is the best at


How does one determine they have aphantasia? How do you know that you are not doing exactly this thing people call visualizing when you perform spatial reasoning?


No idea, but when people say they can visualize an apple and then say it feels like number 1 on that chart, I would say that my experience of whatever I'm doing when I'm 'visualizing' an apple is more like 4 or 5

https://twistedsifter.com/wp-content/uploads/2023/10/AppleVi...

I can only assume people are trying to accurately describe their own experience so when my experience seems to differ a lot it seems to me that there is more going on than just confusion about wording.


I mean just walking down the street or through a supermarket, it seems to me like 95% of people have no spatial awareness at all. Walking forward while looking to the side or backward.

Either that or they're perfectly capable, they just don't care.


She certainly fell into the rage bait trap, and I don't really like her these days, but this video seems fine - no ranting, just a nice piece of science communication.


Rings true for my impression too. In the end, she's a YouTuber now, for better or worse, but she still puts out what look like thoughtful and informative enough videos, whatever personal vendettas she may hold.

I suspect for many who’ve touched the academic system, a popular voice that isn’t anti-intellectual or anti-expertise (or out to trumpet their personal theory), but critical of the status quo, would be viewed as a net positive.


[flagged]


> [Quantum Cognition] is a real field with dozen of collaborators and even a textbook.

Flat Earth is also a real field, with conferences with hundreds of attendees.


Have you visited Busemeyer's website ?

Here's the textbook he wrote, 2nd edition

https://www.cambridge.org/us/universitypress/subjects/psycho...

Looks totally respectable. Why do you feel the need to ridicule it?


https://arxiv.org/pdf/1309.5673

> Decision making – When people are given a chance to play a particular gamble twice, if they think they won the first play, or alternatively if they think they lost the first play, then the majority chooses to play again on the second round. Given these preferences, they should also play the second round even if they don’t think about the outcome of the first round. Yet people do just the opposite in the latter case (Tversky & Shafir, 1992). This finding violates the law of total probability, yet it can be explained as a quantum interference effect

I wouldn't say that "it can be explained as a quantum interference effect" for any respectable definition of "explained" but your mileage may vary.
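The violated identity is just mixture arithmetic: if the majority plays after a known win and after a known loss, the unconditional play rate must lie between those two and so also be a majority. A sketch with placeholder percentages (illustrative only, not the paper's actual data):

```python
# Law of total probability: P(play) = P(play|win)P(win) + P(play|lose)P(lose).
# If both conditional rates exceed 0.5, every mixture of them exceeds 0.5 too.
# The percentages below are illustrative placeholders, not Tversky & Shafir's data.
p_play_given_win = 0.69
p_play_given_loss = 0.59

for p_win in (0.0, 0.25, 0.5, 0.75, 1.0):
    p_play = p_win * p_play_given_win + (1 - p_win) * p_play_given_loss
    print(f"P(win)={p_win:.2f} -> P(play)={p_play:.3f}")  # always in [0.59, 0.69]
```

The empirical finding is that the unconditional rate drops below both conditional rates, which no classical mixture can produce.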

> The Contextual Nature of Concepts and their Combinations – When quantum entities become entangled, they form a new entity with properties different from either constituent, and one cannot manipulate one constituent without simultaneously affecting the other. The mathematics of entanglement has been used to model the nonmonotonic relations observed among concepts when they are combined to form a new concepts such as STONE LION

https://scispace.com/pdf/the-emergence-and-evolution-of-inte...

> Different possible ways of looking at this combination exist, and a positive answer to both questions ‘is STONE LION a LION’ and ‘is STONE LION a STONE’ make sense. Hence a better approach is to consider both as entities, and use Fock space for a two entity situation, and forgo the more simple model of considering one of them as a context.

Vanilla Quantum Models of Cognition and Decision are not enough. One really needs Quantum Field Theory Models of Cognition and Decision. "The genuine structure of quantum field theory is needed to match predictions with experimental data." https://arxiv.org/pdf/0705.1740


I don't think this cursory look at the abstracts and first pages of these papers is achieving the effect you intended.

The first quote summarizes the findings (page 4) of Tversky & Shafir, 1992. Google indicates this paper has 978 citations.

>is STONE LION a STONE

This is called a typicality test.

Jurafsky, LeCun & two more fellows dropped this paper recently: https://arxiv.org/pdf/2505.17117 (emphasis mine)

>This work aims to bridge this gap by integrating cognitive psychology, information theory, and modern NLP. We pose three central research questions to guide our investigation: [RQ1]: To what extent do concepts emergent in LLMs align with human-defined conceptual categories? [RQ2]: Do LLMs and humans exhibit similar internal geometric structures within these concepts, ESPECIALLY CONCERNING ITEM TYPICALITY? RQ3]: How do humans and LLMs differ in their strategies for balancing representational compression with the preservation of semantic fidelity when forming concepts?

It would be interesting to try to reproduce Hampton's original experiment on typicality with LLMs and run the same analysis as https://arxiv.org/pdf/1208.2362.

Anyway, thanks for the exchange. Running these thoughts once more in my head allowed me to reconsider some stuff I found about Zipf distributions that might tie into the tradeoff between compression and meaning LeCun is talking about.

The quantum zipf stuff is here: https://arxiv.org/abs/1909.06845

It's here too: https://link.springer.com/article/10.1134/S1061920806030071

>quantum field theory

Chomsky is working on Hopf Algebra.

https://magazine.caltech.edu/post/math-language-marcolli-noa...


> I don't think this cursory look at the abstracts and first pages of these papers is achieving the effect you intended.

The only effect intended was my own amusement. I doubted anyone would ever read the comment.

> Jurafsky, Lecun & two more fellows dropped this paper recently

Somehow they missed that quantum field theory formalism is essential to explain such things.


>Somehow they missed that quantum field theory formalism

Yet to reach this conclusion you have to notice they stand on the same ground (typicality tests). I'd wager that they could substitute it with question-ordering effects and reach similar conclusions.


Sabine Hossenfelder's video on this: https://youtu.be/mxWJJl44UEQ


In my perception Sabine’s quality degraded over the last year or so.

Maybe it’s also the topics she covers. I’m not sure why she is getting into fantasies of AGI for example.

I liked the skeptical version of her better.


Agree in general -- I think the TikTok/Shorts wave is biasing strongly for shorter videos, and then the time format kills any followup / second-iteration explanation.

But this one was pretty good.


As far as I've seen, her position is only that AGI is pretty much inevitable. What's so fantastical about that?


I think plenty of people don't think it's inevitable. I'm no ai researcher, just another software engineer (so no real expertise). I think it will keep getting better but the end point is unclear.


The reason it's inevitable is that it follows from physics principles. The Bekenstein bound proves that all physical systems of finite volume contain finite information; humans are a finite volume, ergo a human contains finite information. Finite information can be fully captured by a finite computer, ergo computers can in principle perfectly simulate a human person.

This + continued technological development entails that AGI is inevitable.
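For reference, the bound being invoked is usually stated as follows (standard form, with $k_B$ Boltzmann's constant, $R$ the radius of a sphere enclosing the system, and $E$ its total energy):

```latex
S \;\le\; \frac{2\pi k_B R E}{\hbar c}
\qquad\text{equivalently}\qquad
I \;\le\; \frac{2\pi R E}{\hbar c \ln 2}\ \text{bits}
```

Finite $R$ and $E$ make the right-hand side finite, which is the premise the argument above rests on.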


Although the reasoning is clear, you (and she) jump from "possible in principle" to "inevitable in practice".

Just because something is physically possible doesn't make it "inevitable". That's why it's just a fantasy at this point.


As I said above:

> This + continued technological development entails that AGI is inevitable.

Everyone takes the above as a given in any discussion of future projections.


I don't know if it's just the persona she plays in these videos, but it's so so so creepy and cringe.


Agreed, she's pumping out too many videos I think. Perhaps she's succumbed a bit to the temptation of cashing in on a reputation, ironically one built on taking down grifters.


I found https://www.quantamagazine.org/epic-effort-to-ground-physics... much more informative. Sometimes you can't digest everything in 10min.



I've been building some interactive educational stuff (mostly math and science) with react / three.js using Claude.


Wow, I always thought Hilary was the only Putnam to come up with a computational theory of mind.


So the 90-day tariff pause "fake news" wasn't fake after all...?


Dump and pump... Trump friends must be making a fortune


I truly wondered last Tuesday if he was only getting more "serious" because the market had stopped responding to his announce-a-planned-tariff-one-day-then-reverse-it-the-next strategy, so it was no longer useful, and he'd need to string them along far longer to make it work at least one more time.


He's President. There's plenty of juice to squeeze out of the markets still. He could threaten to drop a nuke on Australia, for example.


All of this ONLY after

https://www.reuters.com/world/us/us-sec-see-exodus-hundreds-...

Eyes wide shut as they loot.



"The History of Approximation Theory" by Karl-Georg Steffens is a great reference for historical contexts.

For Chebyshev, who devoted his life to the construction of various 'mechanisms' [1][2], the motivation was to determine the parameters of mechanisms that minimize the maximal error of the approximation on the whole interval.

[1] https://en.wikipedia.org/wiki/Mechanism_(engineering)

[2] https://tcheb.ru/


In particular he studied Watt's mechanism, which was an integral component of steam engines powering the industrial revolution in Western Europe. Its optimal configuration wasn't really well understood at the time which led to practical problems. Chebyshev traveled from Russia (which wouldn't really enjoy an industrial revolution till much later) to Western Europe and discussed with experts and people who operated these engines. He brought back to Russia with him notes and experimental data, and those informed the development of what would later be known as minimax theory, and Chebyshev polynomials which provide polynomial solutions to minimax problems.

In the course of developing that theory he founded the modern field of approximation theory, and the St. Petersburg school of mathematics. I think his approach of using applied problems and techniques to inform the development of pure math deeply influenced the whole of Soviet and Slavic mathematics in the century that followed.

(and yes, the book by Karl-Georg Steffens is beautiful!)

Edit: To answer the grandparent's question, aside from things directly invented by Chebyshev or his students, often things are called "Chebyshev" when there's either a Chebyshev polynomial or a minimax problem lurking in the background
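One quick way to see the minimax connection numerically: interpolate Runge's function at equispaced versus Chebyshev nodes and compare worst-case errors. A hedged sketch with numpy (the node formula is the standard Chebyshev roots; degree and grid are arbitrary choices):

```python
import numpy as np

def runge(x):
    # Runge's function: the classic example where equispaced interpolation blows up
    return 1.0 / (1.0 + 25.0 * x**2)

deg = 12
# Equispaced nodes on [-1, 1]
x_eq = np.linspace(-1.0, 1.0, deg + 1)
# Chebyshev nodes: roots of the degree-(deg+1) Chebyshev polynomial,
# which (near-)minimize the worst-case interpolation error
k = np.arange(deg + 1)
x_ch = np.cos((2 * k + 1) * np.pi / (2 * (deg + 1)))

xs = np.linspace(-1.0, 1.0, 2001)
errs = {}
for name, nodes in [("equispaced", x_eq), ("Chebyshev", x_ch)]:
    coeffs = np.polyfit(nodes, runge(nodes), deg)  # interpolating polynomial
    errs[name] = np.max(np.abs(np.polyval(coeffs, xs) - runge(xs)))
    print(f"{name:>10}: max error = {errs[name]:.4f}")
```

The equispaced fit oscillates wildly near the endpoints while the Chebyshev-node fit stays uniformly close, which is the minimax property in miniature.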


Calling this 'alternative' construction seems like coming full circle since this line of combinatorial argument is how Boltzmann came up with his H-function in the first place, which inspired Shannon's entropy.


Yep! This relationship is well known in statistical mechanics. I was just surprised that in many years of intersecting with information theory in other fields (computational neuroscience in particular) I'd never come across it before, even though IMO it provides an insightful perspective.
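That combinatorial route can be checked numerically: the per-symbol log of the multinomial count of arrangements (Boltzmann's log W) converges to Shannon's H as N grows. A minimal sketch (distribution chosen arbitrarily):

```python
import math

def log_multinomial(counts):
    # log of N! / (n_1! * ... * n_k!), i.e. Boltzmann's log W, via log-gamma
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def shannon_entropy(probs):
    # H in nats
    return -sum(p * math.log(p) for p in probs if p > 0)

p = [0.5, 0.3, 0.2]
for n in (100, 10_000, 1_000_000):
    counts = [round(pi * n) for pi in p]
    per_symbol = log_multinomial(counts) / n
    print(f"N={n:>9}: (1/N) log W = {per_symbol:.5f}  vs  H = {shannon_entropy(p):.5f}")
```

The Stirling correction shrinks like (log N)/N, so the two agree to several decimal places by N = 10^6.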


What kind of ODE solvers are used to simulate chaotic systems? They must be very accurate if even a small error can result in a completely different result.


You can use the standard Euler method with very small delta T.
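As a minimal sketch of that suggestion (standard chaotic Lorenz parameters assumed; forward Euler with a small fixed step):

```python
import numpy as np

def lorenz(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # classic Lorenz system with the standard chaotic parameters
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def euler(f, s, dt, steps):
    traj = np.empty((steps + 1, 3))
    traj[0] = s
    for i in range(steps):
        s = s + dt * f(s)  # forward Euler update
        traj[i + 1] = s
    return traj

dt = 1e-4                                                    # "very small delta t"
traj = euler(lorenz, np.array([1.0, 1.0, 1.0]), dt, 50_000)  # 5 time units
print(traj[-1])                                              # final state, still on the attractor
```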


Respectfully disagree; Euler's method is absolutely terrible because it's unconditionally unstable; far better is to use something like Leapfrog or velocity Verlet (both of which have 2nd order accuracy and better stability, for exactly the same number of derivative evaluations). Euler integration is essentially always the wrong tool for the job.


Would RK4 work for something like this, or does it lack some stability properties?


If you want to see the folly of RK4, give it something like a bouncing ball and watch as it bounces slightly higher each time. Subtle at first, but trust me, it's there. The comment above you is right. If you have conservation of energy and want to keep it that way, use Verlet.

Edit: before anyone calls me out on this, the same trick also works when there's no discontinuity in the force function vis-à-vis collision with the floor. Planetary motion also drifts out of simple orbits; I just picked the bouncing ball because it's a more amusing visual.
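The energy drift in question is easy to reproduce on the simplest conservative system. A sketch comparing forward Euler against velocity Verlet on a unit-mass, unit-stiffness harmonic oscillator (parameters are arbitrary choices):

```python
def energy(x, v):
    # total energy of a unit-mass, unit-stiffness harmonic oscillator
    return 0.5 * v * v + 0.5 * x * x

dt, steps = 0.01, 10_000

# Forward Euler: energy is multiplied by (1 + dt^2) every step, so it grows
x, v = 1.0, 0.0
for _ in range(steps):
    x, v = x + dt * v, v - dt * x
e_euler = energy(x, v)

# Velocity Verlet: symplectic, so the energy error stays bounded
x, v = 1.0, 0.0
for _ in range(steps):
    a = -x                                  # acceleration at current position
    x = x + dt * v + 0.5 * dt * dt * a
    v = v + 0.5 * dt * (a + (-x))           # average of old and new acceleration
e_verlet = energy(x, v)

print(f"Euler:  E = {e_euler:.4f} (started at 0.5)")
print(f"Verlet: E = {e_verlet:.4f} (started at 0.5)")
```

Over 10,000 steps the Euler energy roughly e-folds upward, while Verlet stays pinned near the initial 0.5.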


RK4 is about as unstable. The backwards Euler method is baby's first stable ODE solver. If you want to make RK4 stable, you have to change it into the implicit RK4 method.

However, the Lorenz system isn't stiff (stiffness is usually what you need stable solvers for), nor, I believe, are any of the other systems here. RK4 should be fine, or even normal Euler with a reasonable step size. The chaos you see in these systems is not due to numerical inaccuracy; they're inherently chaotic. That's what makes chaos a subject worth studying.
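That last point can be made concrete: even with an accurate integrator, two starts differing by one part in a billion end up macroscopically far apart. A sketch (standard Lorenz parameters and classical RK4 assumed):

```python
import numpy as np

def lorenz(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, s, dt):
    # one classical 4th-order Runge-Kutta step
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

dt = 0.01
a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-9, 0.0, 0.0])   # same start, perturbed by one part in 10^9

max_sep = 0.0
for step in range(5000):             # 50 time units
    a = rk4_step(lorenz, a, dt)
    b = rk4_step(lorenz, b, dt)
    if step >= 4000:                 # after the perturbation has had time to grow
        max_sep = max(max_sep, float(np.linalg.norm(a - b)))
print(f"separation after ~50 time units: {max_sep:.2f}")
```

The exponential growth of the tiny perturbation is the chaos itself, not an artifact of the solver.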

