Yes, "fucking" stood out for me, too. The rest of the text very much has the feel of AI writing.
AI agents routinely make me want to swear at them. If I do, they then pivot to foul language themselves, as if they're emulating hip "tech bro" banter. But when I swear, I catch myself: I'm losing perspective, surfing this well-informed association echo chamber. Time to go to the gym or something...
That all makes me wonder about the human role here: Who actually decided to create a blog post? I see "fucking" as a trace of human intervention.
I began studying 3-manifolds after coming up with a novel way I preferred to draw their presentations. All approaches are formally equivalent, but they impose different cognitive loads in practice. My approach was trivially equivalent to triangulations, or spines, or Heegaard splittings, or ... but I found myself far more nimbly able to "see" 3-manifolds my way.
I showed various colleagues. Each one would ask me to demonstrate the equivalence to their preferred presentation, then assure me ("nothing to see here, move along!") that I should instead stick to their convention.
Then I met with Bill Thurston, the most influential topologist of our lifetimes. He had me quickly describe the equivalence between my form and every other known form, effectively adding my node to a complete graph of equivalences he had in his muscle memory. He then suggested some generalizations, and proposed that circle packings would prove to be important to me.
Some mathematicians are smart enough to see no distinction between any of the ways to describe the essential structure of a mathematical object. They see the object.
I'm a mathematician relying heavily on AI as an association engine of massive scope, to organize and expand my thoughts. One doesn't get best results by "testing" AI.
A surfboard is also an amazing tool, but there's more to operating one than telling it which way to go.
Many people want self-driving cars so they can drink in the back seat watching movies. They'll find their jobs replaced by AI, with a poor quality of life because we're a selfish species. In contrast Niki Lauda trusted fellow Formula 1 race car driver James Hunt to race centimeters apart. Some people want AI to help them drive that well. They'll have great jobs as AI evolves.
Garry Kasparov pioneered "freestyle" chess tournaments after his defeat by Deep Blue, where the best human players were paired with computers, coining the "centaur" model of human-machine cooperation. This is frequently cited in the finance literature, where it is recognized that AI-guided human judgement can outperform either humans or machines alone.
Any math professor knows how to help graduate students confidently complete a PhD thesis, or how to humiliate students in an oral exam. It’s a choice. To accomplish more work than one can complete alone, choose the former. This is the arc of human evolution: we develop tools to enhance our abilities. We meld with an abacus or a slide rule, and it makes us smarter. We learn to anticipate computations, like we’re playing a musical instrument in our heads. Or we pull out a calculator that makes us dumber. The role we see for our tools matters.
Programmers who actually write better code using AI know this. These HN threads are filled with despair over the poor quality of vibe coding. At the same time, Anthropic is successfully coding Claude using Claude.
Centaurs are a transient phenomenon. In chess, the era of centaur supremacy lasted only about a decade before computers alone eclipsed human+computer. The same will be true in every other discipline.
You can surf the wave, but sooner or later, the wave will come crashing down.
They are transient only in those rare domains that can be fully formalized/specified. Like chess. Anything that depends on the messy world of human-world interactions will require humans in the loop for translation and verification purposes.
>Anything that depends on the messy world of human-world interactions will require humans in the loop for translation and verification purposes.
I really don't see why that would necessarily be true. Any task that can be done by a human with a keyboard and a telephone is at risk of being done by an AI - and that includes the task of "translation and verification".
Sure, but at the risk of running into completely unforeseen and potentially catastrophic misunderstandings. We humans are wired to use human language to interact with other humans, who share our human experience, which AIs can only imperfectly model.
I have to say I don't feel this huge shared experience with many service industry workers. Especially over the phone. We barely speak the same language!
Mathematics is indeed one of those rare fields where intimate knowledge of human nature is not paramount. But even there, I don't expect LLMs to replace top-level researchers. The same evolutionary "baggage" which makes simulating and automating humans away impossible is also what enables (some of) us to have the deep insight into the most abstract regions of maths. In the end it all relies on the same skills developed through millions of years of tuning into the subtleties of 3D geometry, physics, psychology and so on.
I'm guessing that they were referring to the depth of the decision tree that can be computed in a given amount of time?
In essence, it used to be (I have not stayed current) that the "AI" was limited in how many moves into the future it could search to determine which move was optimal.
That limit means it is impossible to enumerate all the possible moves and determine which is guaranteed to lead to a win. (The best that can be done is to have a machine learning algorithm choose the most likely set of moves that a human would take from the current state, and which of that set would most likely lead to a win.)
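For concreteness, here is a minimal sketch of the kind of depth-limited game-tree search being described. It's my own illustration, not any engine's actual code; the `evaluate`, `legal_moves`, and `apply_move` callbacks are placeholders a caller would supply.

    def minimax(state, depth, maximizing, evaluate, legal_moves, apply_move):
        """Depth-limited minimax: beyond `depth` plies we stop searching and
        fall back on a heuristic evaluation instead of playing the game out."""
        moves = legal_moves(state)
        if depth == 0 or not moves:
            return evaluate(state)  # heuristic guess, not a proven win/loss
        child_scores = (
            minimax(apply_move(state, m), depth - 1, not maximizing,
                    evaluate, legal_moves, apply_move)
            for m in moves
        )
        return max(child_scores) if maximizing else min(child_scores)

The horizon is exactly that `depth` parameter: everything past it is invisible to the search, which is why the evaluation heuristic (today, often a learned one) carries so much weight.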
On the other hand, chess is not very financially rewarding. IBM put some money into it for marketing briefly, but that’s probably equal to about five minutes of spend from the current crop of LLM companies.
As far as I can tell based on scanning forums, to the extent humans contribute anything to the centaur setup, it is entirely in hardware provisioning and allocating enough server time before matches for chess engines to do precomputation, rather than anything actually chess related, but I am unsure on this point.
I have heard anecdotally from non-serious players (and therefore I cannot be certain that this reflects sentiment at the highest levels, although the ICCF results seem to back this up) that the only ways to lose in centaur chess at this point are to deviate from what the computer tells you to do, either intentionally or unintentionally by submitting the wrong move, or simply to be at a compute disadvantage.
I've got several previous comments on this because this is a topic that interests me a lot, but the two most topical here are the previous one and https://news.ycombinator.com/item?id=33022581.
The last public ranking of chess centaurs was 2014, after which it is generally held to be meaningless as the ranking of a centaur is just the same as the ranking of the engine. Magnus Carlsen’s peak elo of 2884 is by far the highest any human has ever achieved. Stockfish 18 is estimated to be in excess of 4000 elo. Which is to say the difference between it and the strongest human player ever is about the same as the difference between a strong club player and a grandmaster. It’s not going to benefit meaningfully from anything a human player might bring to the partnership.
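To put that gap in numbers (my arithmetic, using the ratings quoted above), the standard Elo expected-score formula gives the human side

    E = 1 / (1 + 10^((4000 - 2884)/400)) ≈ 0.002

roughly one point per six hundred games. On the board alone, under those assumptions, the human half of a centaur has essentially nothing left to contribute.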
Magnus himself in 2015 said we’ve known for a long time that engines are much stronger than humans so the engine is not an opponent.
I'm highly worried that you are right. But what gives me hope is that people still play chess, I'd argue even more than ever. People still buy paper books and vinyl records. People still appreciate handwritten greeting cards over printed ones, and pay extra to listen to live music when the recording is free and will likely sound much better. People are willing to pay an order of magnitude more for a seat in a theater for a live play, or pay a premium for handmade products over their almost-impossible-to-distinguish knockoffs.
That centaurs can outperform humans or AI systems alone is a weaker claim than "these particular AI systems have the required properties to be useful for that". Chess engines consistently produce strong lines, and can play entire games without human assistance: using one does not feel like gambling, even if occasionally you can spot a line it can't. LLMs catastrophically fail at iterated tasks unless they're closely supervised, and using LLMs does feel like gambling. I think you're overgeneralising.
There is definitely a gap in academic tooling, where an "association engine" would be very useful for a variety of fields (and for encouraging cross-pollination of ideas between fields), but I don't think LLMs are anywhere near the frontier of what can be accomplished with a given amount of computing power. I would expect simpler algorithms operating over more explicit ontologies to be much more useful. (The main issue is that people haven't made those yet, whereas people have made LLMs.) That said, there's still a lot of credit due to the unreasonable effectiveness of literature searches: it only usually takes me 10 minutes a day for a couple of days to find the appropriate jargon, at which point I gain access to more papers than I know what to do with. LLM sessions that substitute for literature review tend to take more than 20 minutes: the main advantage is that people actually engage with (addictive, gambling-like) LLMs in a way that they don't with (boring, database-like) literature searches.
I think developing the habit of "I'm at a loose end, so I'll idly type queries into my literature search engine" would produce much better outcomes than developing the habit of "I'm at a loose end, so I'll idly type queries into ChatGPT", and that's despite the state-of-the-art of literature search engines being extremely naïve, compared to what we can accomplish with modern technology.
We're in agreement. I understand how much harder it is to "think with AI"; the last year of my life has been a brutal struggle to figure this out.
I also agree that neural-net LLMs are not the inevitable way to implement AI. I'm most intrigued by the theoretical underpinnings of mathematical proof assistants such as Lean 4. Computer scientists understand the word problem for strings to be undecidable. The word problem for typed trees with an intrinsic notion of induction is harder, but constructing proofs is finding paths in this tree space. Just as mechanical computers failed in base ten even though Boole had already developed base-two logic, I see these efforts merging. Neural nets struggle to simulate recursion; for proof assistants, recursion is baked in. Stare at these tree paths and one sees thought at the atomic level, begging to be incorporated into AI. For now the river runs the other way, using AI to find proofs. That river will reverse flow.
Lean 4 is not a theoretically-interesting proof assistant. If you're interested in such things, look into Rocq (which uses CoIC, like Lean, but is more rigorous about it), the HOL logic, Isabelle/HOL's automation suite (though Isabelle proper is fairly mediocre, apart from being the thing everyone's standardised around), Lean-auto (https://arxiv.org/abs/2505.14929), and whatever SAT solvers are state-of-the-art this week. Like the tools for symbolic integration and frequentist statistics, there isn't any magic: the power comes from handling enough uninteresting special-cases that we get broad coverage. (Personally, I think there's still a lot of power being left on the table by using overly-general algorithms: sledgehammer is used to crack a lot of nuts, even when that takes quadratic time or longer.)
Lean 2 used HoTT, which was theoretically interesting, but not enough was known about HoTT at the time (in particular, whether it was a constructive logic – I think we have all the pieces for an explicit construction via cubical type theory now, but I don't know that anyone's put the pieces together), so that direction has been mostly abandoned. I think there's useful work to be done in that direction, but with the current state of HoTT pedagogy, I doubt I'd ever be able to keep on top of it enough to contribute; and with Lean 4 taking so much of the funding, I don't think we'll see much work in this direction until HoTT is easier to learn.
I still think you're overgeneralising. What actual thing does your poetic tree / thought / river analogy correspond to?
Those were "let's get experts to manually code every single document according to a schema defined in advance". Nowadays, we have techniques for automatically-generating explicit pseudo-semantic ontology representations from large datasets (see, for example, https://openaccess.thecvf.com/content_CVPR_2019/papers/Zhang... for image classification tasks). Getting a machine learning model to identify field-specific heuristics, map conventions from one field to another, and then constructing an index that allows us to quickly produce a search / proximity metric from an arbitrary specification, was not really possible in the 80s.
"Throw a massive neural network at it" is an extremely inefficient way to get results, and doesn't generalise well – for instance, there's no easy way to get online learning for a transformer model, whereas that capability just falls out of most search engine database systems. (The underlying relational database engines had a lot of work put in to make online CRUD work reliably, but that work has been done now, and we can all build on top of it without a second thought.)
I think you're misunderstanding the point this paper is trying to make. They're interested in trying to distinguish whether AI is capable of solving new math problems or only capable of identifying existing solutions in the literature. Distinguishing these two is difficult, because self-contained math problems that are easy enough for LLMs to address (e.g. minor Erdős problems) may already have been solved as subcomponents of other work, without this being widely known. So when an AI makes progress on such an Erdős problem, we don't know if it had a new idea, or correctly identified an existing but obscure answer. This issue has been dogging the claims of AI solving Erdős problems.
Instead, here you get questions that extremely famous mathematicians (Hairer, Spielman) are telling you (a) are solvable in <5 pages and (b) do not have known solutions in the literature. This means that solutions from AI to these problems would perhaps give a clearer signal on what AI is doing when it works on research math.
I find it unbelievable that this question couldn't be settled by the authors themselves, without posting this, simply by asking the AI enough novel questions. I myself have little doubt that they can solve at least some novel questions (of course, similarity of proofs is a spectrum, so it's hard to draw the line at how original they are).
I settle this question for myself every month: I try asking ChatGPT and Gemini for help, but in my domains it fails miserably at anything that looks new. But, YMMV, that's just the experience of one professional mathematician.
> Anthropic is successfully coding Claude using Claude.
Claude is one of the buggiest pieces of shit I have ever used. They had to BUY the creators of bun to fix the damn thing. It is not a good example of your thesis.
You and the GP are conflating Claude, the flagship model family (e.g. Claude Opus), with Claude Code, a state-of-the-art coding assistant that admittedly has a slow and buggy React-based TUI (its output quality is still very competitive).
> I'm a mathematician relying heavily on AI as an association engine of massive scope, to organize and expand my thoughts.
Can you share more about your architecture and process? I'm also a researcher involved in math research (though not strictly speaking a mathematician, but I digress). I've often thought about using AI on my notes, but they are messy, and even then I can't quite figure out what to ask: prioritization, connecting ideas, lit search, etc.
You didn't need to make this claim about driving. Coding requires robust metacognition. Driving doesn't; it can be drilled repetitively, and it also benefits from superhuman senses and instant reaction times. It's somewhat more amenable to AI.
Very well written. Thank you for putting down your thoughts so succinctly; I'm often at a loss for words when I try to express the same thoughts in a coherent manner.
> Isn't it a waste to essentially reinterpret an entire program that may be run 5000 times a day?
This is a dated prejudice that I shared.
To get started coding with AI, I made a dozen-language comparison project for a toy math problem. F# floored me with how fast it was: nearly edging out C and Rust on my leaderboard, twice as fast as OCaml, and faster than various compiled languages.
Compiling could in principle be fastest, if we had compilers that profiled hours of execution before optimizing code, and even then only for "stable" problems. No one writes a compiler like this. In practice, just-in-time compilers are getting all the love, and it shows. They adapt to the computation. My dated prejudice did not allow for this.
Back in 2001 I was the math consultant for "A Beautiful Mind". One spends a lot of time waiting on a film set. Eventually one wonders why.
The majority of wait time was the cinematographer lighting each scene. I imagined a workflow where secondary digital cameras captured 3D information, and all lighting took place in post production. Film productions hemorrhage money by the second; this would be a massive cost saving.
I described this idea to a venture capitalist friend, who concluded one already needed to be a player to pull this off. I mentioned this to an acquaintance at Pixar (a logical player) and they went silent.
Still, we don't shoot movies this way. Not there yet...
Yes. I've been using it today with Zed (a mind-blowing editor, by the way).
One must use an API key to work through Zed, but my Max subscription can be used with Claude Code as an external agent via Zed ACP. And there's some integration; it's a better experience than Claude Code in a terminal next to file viewing in an editor.
One of my side projects has been to recover a K&R C computer algebra system from the 1980s and port it to modern 64-bit C. I'd have eight tabs at a time assigned files from a task server, making passes at 60 or so files. This nearly worked; I'm paused till I can have an agent with a context window that can look at all the code at once. Or I'll attempt a fresh translation based on what I learned.
With a $200 monthly Max subscription, I would regularly stall after completing significant work, but this workflow was feasible. I tried my API key for an hour once; it taught me to laugh at the $200 as quite a deal.
I agree that Opus 4.5 is the only reasonable use of my time. We wouldn't hire some guy off the fryer line to be our CTO; coding needs best effort.
Nevertheless, I thought my setup was involved, but if Boris considers his to be vanilla ice cream then I'm drinking skim milk.
Mathematicians get enamored with particular ways of looking at things, and fall into believing this is gospel. I should know: I am one, and I fight this tendency at every turn.
On one hand, "rational" and "algebraic" are far more pervasive concepts than mathematicians are ever taught to believe. The key here is formal power series in non-commuting variables, as pioneered by Marcel-Paul Schützenberger. "Rational" corresponds to finite state machines, and "algebraic" corresponds to pushdown automata, equivalently the context-free grammars that describe most programming languages.
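A small illustration (mine, not the parent's): in non-commuting variables, the characteristic series S of all words over {a, b} and the series D of the Dyck language of balanced brackets over {x, x̄} satisfy

    S = 1 + (a + b) S
    D = 1 + x D x̄ D

The first equation is linear, hence "rational" (it transcribes a one-state automaton); the second is polynomial, hence "algebraic" (it transcribes the context-free grammar D → ε | x D x̄ D).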
On the other hand, "Concrete Mathematics" by Donald Knuth, Oren Patashnik, and Ronald Graham (I never met Oren) popularizes another way to organize numbers: the "endpoints" of positive reals are 0/1 and 1/0. Subdivide this interval (any such interval) by taking the center of a/b and c/d to be (a+c)/(b+d). Here, the first center is 1/1 = 1. Iterate. Given any number, its coordinates in this system are the sequence of L, R symbols that locate it in successive subdivisions.
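Here's a minimal sketch of that locating procedure for a positive rational input (my illustration; the function name is made up):

    from fractions import Fraction

    def mediant_path(x: Fraction, max_steps: int = 64) -> str:
        """Locate x by successive mediant subdivisions of (0/1, 1/0).

        At each step, take the "center" (a+c)/(b+d) of the current
        interval and record whether x lies to the Left or Right of it.
        A positive rational eventually hits a center exactly and stops.
        """
        a, b = 0, 1   # left endpoint  0/1
        c, d = 1, 0   # right endpoint 1/0
        path = []
        for _ in range(max_steps):
            m = Fraction(a + c, b + d)              # center of the current interval
            if x == m:
                break
            if x < m:
                path.append("L")
                c, d = m.numerator, m.denominator   # center becomes the right endpoint
            else:
                path.append("R")
                a, b = m.numerator, m.denominator   # center becomes the left endpoint
        return "".join(path)

    print(mediant_path(Fraction(3, 7)))   # prints "LLRR"

For an irrational input the loop never terminates on its own, which is where the complexity question that follows gets interesting.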
Any computer scientist should be chomping at the bit here: What is the complexity of the L, R sequence that locates a given number?
From this perspective, the number e is one of the simpler numbers known, not lost in the unwashed multitude of "transcendental" numbers.
Most mathematicians don't know this. The idea generalizes to barycentric subdivision in any dimension, but the real line is already interesting.