Grand-Master Level Chess Without Search: Modeling Choices and Their Implications (gist.github.com)
102 points by georgehill on Feb 10, 2024 | 47 comments


I agree with the author’s critique of maximalist interpretations of the paper.

I’ve built a chess engine in the past, so I was less hung up on the “without search” component than a lot of the original thread seemed to be. I don’t care whether I have to take an argmax over all possible moves in a position and whether that technically qualifies as a search, or whether the model may be doing some implicit search within its weights. What matters to me is that it requires several orders of magnitude less search than Stockfish, and those operations cost much less because they’re executed in parallel on a TPU (a GPU also works).

Optimizing static evaluation for the hardware we have is still a super interesting topic. It’s arguably why deep learning took off: matrix multiplication happens to be really cheap on GPUs.


> It’s arguably why deep learning took off: matrix multiplication happens to be really cheap on GPUs

Multiplication of large matrices is intrinsically simple to parallelize, and in addition it has great data locality and a very regular structure. It's just a great operation to perform efficiently on a parallel processor.
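
To make that concrete, here's a toy sketch (plain NumPy, purely illustrative) of why the operation decomposes so cleanly: every output cell is an independent dot product over contiguous data, so a parallel processor can compute all of them at once.

    import numpy as np

    def matmul_naive(A, B):
        # Each output cell C[i, j] depends only on row i of A and
        # column j of B, so all m*n cells are independent and could
        # run in parallel with no coordination between them.
        m, k = A.shape
        k2, n = B.shape
        assert k == k2
        C = np.empty((m, n))
        for i in range(m):
            for j in range(n):
                C[i, j] = A[i, :] @ B[:, j]
        return C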


This looks like a very solid critique of Deepmind's recent paper, or rather of how it misled people into regarding its results as much stronger than they really are. However, I think the author goes a bit astray at

> The paper mentions that the model could not follow the "don't repeat the same board position three times in a row" rule (aka "threefold repetition").

That is not the rule. It is not illegal to repeat a position for a third time.

The rule only says that a player can claim a draw in the case of a threefold repetition [1].

[1] https://en.wikipedia.org/wiki/Threefold_repetition


Fair point! Usually in online chess, it auto-draws the game. I guess there's an assumption that that's what the players want / that they'd rather not run out of time in blitz games.


Slight inaccuracy. I don't think it auto-draws on your behalf right away, but it does if you repeat the position even more (there may also be a setting to automatically claim a draw on three, not sure). I'm not sure exactly what the different servers do, but based on experience I think that's what's happening.

There is a little-known rule in tournament chess that if a position is reached 5 times, an arbiter can step in and rule the game a draw without a claim. It usually only comes up in scholastic tournaments, and very rarely in blitz tournaments.


The fivefold repetition rule says that the game is drawn in the event of a fivefold repetition, regardless of actions by the arbiter or either player. (Indeed, regardless of whether an arbiter is present.) So if it's later discovered to have occurred in a game which White apparently won by checkmate, then the win is void. The rule also places a duty on the arbiter to intervene, but a failure by the arbiter to intervene has no bearing on the outcome.


My bad, I did not know those details. I don't think I've ever heard of a case where it's happened this way in tournament play. I bet it has though.

Do you happen to know if this can be applied even several rounds later, in a Swiss format? In most tournaments, even if there are score sheets, the arbiters rarely look over the game if the players agree on the result. And sometimes I don't look over my own games until a few days later, if I'm disgusted enough with how I played...


This seems to be the most important point:

> it looked at a game position, then used a strong chess-engine to expand a tree of possible game continuations from this position spanning hundreds of thousands if not millions of moves into the future. The game engine then used its internal knowledge of chess to assess the percentage of winning board positions within the possible continuations, which is the "probability of win" for a position. Then, the learner tried to learn this number. This is not learning from observations, it is learning from a super-human expert.

Apparently it was able to summarize the knowledge of that expert pretty well, though.


This article seems completely off-base to me. The fact that the training method can't learn 'no 3 repeated moves' doesn't change my opinion of the result at all; to me this is just a technical detail, and does not detract from the essence of the result (that you can get good performance without hard-coding search). But the author harps on this point over and over as if it's a fatal flaw.

And then we get some rant about how 'it's not really learning how to plan' because it doesn't know to win in as few moves as possible. But again this just doesn't matter! The model was trained to maximize its odds of winning. You don't get to make up a new rule that you're supposed to win quickly. Totally beside the point. But somehow this observation is taken to mean that 'well it's not actually planning/reasoning/understanding. It's just imitating its training data.' Such statements are usually operationally meaningless, and the argument could just as well apply to human learning.

Point taken that the training method can't learn arbitrary rules (this is a good point, and it sounds like it needed to be made, though I didn't see the Twitter-storm myself). But the tone of this article irked me a bit.


You misunderstand. It's not that it just fails to win quickly, it fails to win at all in those positions.


Grandmasters do search! They think many moves ahead for most moves, and obviously Stockfish does search -- a lot of search, much more than a grandmaster.

I feel that the sort of structures we implicitly operate over during search can be usefully "flattened" in a higher dimensional space such that whole paths become individual steps. Implicitly, that's the sort of thing I suspect the network must be doing.

If you watch videos of grandmasters talking about chess positions, often they'll talk about the "structure" of the position, and then confirm the structure via some amount of search. So, implicitly grandmasters probably do some kind of flattening too which connects features of the present state to future outcomes in a way requiring less mental effort.


>they'll talk about the "structure" of the position, and then confirm the structure via some amount of search

Yes, that's sort of the point of the approach. Imagine that for any board position you have a list of all possible moves, ranked by how good they are (i.e., how likely they are to lead to you winning). In order to make this ranking, you've "flattened" the future outcomes of each move into a single number, which is used to rank the list. Then you can play optimally without search: simply pick the best move at every board position. (Richard Bellman showed that, for the right definition of "best", this gives the optimal strategy for playing entire games, not just single moves.)

Now the question is, how does one actually calculate this sorted list of the best moves? The straightforward answer is essentially search, plus various techniques to make the search computationally tractable. What Deepmind's approach does is use such a list computed by Stockfish (which searches explicitly) to compile a dataset, which is then trained on in order to approximate the function that produces this ranking (called the Q-function). But the point is that search is just an algorithmic strategy to estimate the Q-function; mathematically speaking, it is merely a function of the current board state, and therefore in theory it could be determined without any search. So, maybe a deep neural network could learn how to do this?
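
As a rough sketch of what "playing without search" then means (using python-chess; q_model here is a hypothetical stand-in for the trained Q-function approximation, not the paper's actual code):

    import chess  # pip install python-chess

    def pick_move(board, q_model):
        # No tree expansion: score each legal move once with the
        # learned Q-function approximation and play the argmax.
        return max(board.legal_moves,
                   key=lambda m: q_model(board.fen(), m.uci()))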


I don't think you've understood what I meant by "flattening". I wanted to imply something like an embedding of search paths into a high dimensional space within which a point in the space represents a whole path, and the proximity of paths implies similar realisations.

I'd imagine one could directly realise this by training a network with chess games, and the distances between them by some useful path-related measure, as inputs. I think that the network would succeed in learning an embedding for chess games. I wouldn't be surprised if such an embedding could be fine tuned to output win probability instead, for example.

I also wouldn't be surprised if you could train a network on an auto-completion task of one-move-ahead prediction, and then use that to assign win probabilities, much in the same way that LLMs are used for part-of-speech tagging.
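
To make that concrete, a rough sketch of the kind of network I mean (PyTorch; every size and name here is my own guess, nothing from the paper). You'd train the encoder with some distance-based loss over pairs of games; the small head is the win-probability fine-tune:

    import torch
    import torch.nn as nn

    class GameEmbedder(nn.Module):
        def __init__(self, vocab_size=4672, dim=256):  # 4672 ~ AlphaZero's move encoding
            super().__init__()
            self.tok = nn.Embedding(vocab_size, dim)  # one token per move
            layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
            self.enc = nn.TransformerEncoder(layer, num_layers=4)
            self.win_head = nn.Linear(dim, 1)  # fine-tuning head

        def forward(self, move_tokens):  # (batch, seq_len) of move ids
            h = self.enc(self.tok(move_tokens))
            game_vec = h.mean(dim=1)  # one point in the space per game/path
            return game_vec, torch.sigmoid(self.win_head(game_vec))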


Right, and this approach is more or less what experienced chess players do. We don’t evaluate every possible move; we look at the board and use our intuition and experience to immediately discard most candidates and choose a few options to prioritize focus on.

In many positions there’s surprisingly little actual computation going on.


I expect this is particularly borne out when there is no obviously good move, but many options. The grandmaster will "see" implications without calculating and then confirm them.

I think most strong chess players can "evaluate" a board and decide who is stronger to within a small error compared to an engine score. I think if you were to test this by giving strong players N seconds per position -- not enough time to calculate to any depth -- it would be shown that many are within a small (but consistent) margin of error compared to an engine.


Absolutely. A grandmaster will more or less instantly understand the core themes of any real position you put in front of them, and they usually can even guess with very high accuracy how the position was reached in the first place. The better the player, the more accurately they will perceive the nuances in a wider array of positions.

This process happens in much the same way as facial recognition. For most faces, we just “know” who the person is. We don’t consciously search a database of acquaintances, filtering by facial features, eye color, hairstyle, etc. We see and we recognize.


People misunderstand this paper. The point is not to create a strong chess engine. We already have those. It's also not to train it in a smart way. It's just supervised learning.

The whole paper is just an argument against people who still think of transformers as "stochastic parrots".

Probably it's even mainly an internal Deep Mind debate, since people outside have long ago accepted that transformers obviously learn algorithms and do something akin to planning.

You can see this undertone in many places in the paper, like when they emphasize that the system plays strongly without _explicit_ search. Or if you read the conclusion.


Can you explain why you think the paper is evidence against stochastic parroting? To me, this result seems even less sophisticated than predicting the next move directly, GPT-style (and that’s its charm, since it’s much more data-efficient than training that model would be). Isn’t this model simply predicting the most likely next move that Stockfish would play?

I actually think their conclusion is mis-stated. Instead of saying this shows transformers aren’t mere statistical pattern recognizers but are powerful algorithm approximators, they should say it shows that statistical pattern recognition trained on enough samples is sufficient for approximating sophisticated algorithms.


Would be interesting to see how well it would fare in situations where you have to prove your knowledge of patterns regarding specific pieces outside the context of traditional chess. What I mean by this is playing Chess960 or solving a tactics problem that starts from a traditionally unreachable position, but otherwise adheres to the original rules.

Even doing these things at an average level would be pretty mind blowing.


It's much less impressive when it's given an explicit internal model (Stockfish move probabilities).


Did you read the blog post? The whole point of the blog post is that the model does not demonstrate that it's learning in interesting ways (as strongly as people make it out).


> Did you read the blog post?

Guidelines meditation for you.

https://news.ycombinator.com/newsguidelines.html

"Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that". "


Fair, too late to edit it now.


Recent and related:

Grandmaster-Level Chess Without Search - https://news.ycombinator.com/item?id=39301944 - Feb 2024 (128 comments)


This is an excellent summary of some of the myriad problems with the paper. Especially the highly dubious claims of grandmaster level play. And also of a lot of misconceptions about things the paper didn't (and didn't claim to, afaik) do at all.

It's a shame it's not getting as much traction on here as the paper itself. Once again, Deepmind's shameless embellishments of their research will become HN myths because most people will only have seen the paper and not this.

This is of course a general trend; critiques and even retractions rarely get as much attention as the hyperbolised form of the original research and while the field itself moves on, the laypeople come out of it less informed than they were before.


The paper doesn't have myriad problems. It doesn't shamelessly embellish. It's pretty straightforward. It highlights the key shortcomings of the approach. It's just a paper. Take it at face value.

It's also understood by anyone who has ever looked at the area (which is the audience of academic papers). 'Without search' is understood.


The title of the paper is literally a lie. It's not grandmaster strength, and it is using Stockfish explicitly during the games, because the network they trained fails to finish games properly. Instead of presenting that result, they modify it with Stockfish to get stronger results, so they can claim "grandmaster strength" in the title.

They claim it outperforms AlphaZero's policy network, when AlphaZero is no longer the state of the art for its architecture and hasn't been for years. Leela is. A fair comparison would use Leela, not AlphaZero.

And of course, as usual, they seem not to provide any source code or weights, so no one can check their research independently.


Would you be fine if it were titled:

"2895 lichess blitz elo without search (except for edge cases due to the threefold rep rule)" ?

The non-optimal endgame play wouldn't really be a problem except for the threefold rule, in practice (I think). It would eventually bumble around and win.


No. Much more interesting would be to eliminate this calling out to Stockfish entirely. It's not just a matter of the title; the data in the paper is fundamentally tainted by it. You get less data on how the model actually plays this way, how strong it really is, and how it could be integrated with search in all positions, which is the obvious next step (i.e., the model plus some search algorithm, not simply calling out to another engine).

The fact that they're willing to do that just to get a better headline is pretty telling.


That title would call out the edge cases.

I don't think it is tainted. In practical terms, those games have been won.

Also, the idea of this + search as being the next step is wrong. That was the last step, i.e., it's been done. No search is the next step.


In chess, games are never won until checkmate or resignation by the opponent. If it fails to get there, it just didn't win those games. It doesn't matter how winning the position was at any other point during the game. That's why it's tainted. The results are changed, and the games themselves are changed. Therefore the comparisons to humans and other engines are changed. And hence, what we learn about the approach presented is diminished and obfuscated.

I think wrong is a strong word here. You can keep making a bigger model; that's one direction to go, for sure. But adding search is certainly an interesting one, at the very least. And it's definitely the way forward to see if this is an approach that can advance the strength of chess-playing programs. I seriously doubt this would ever outperform a search-based paradigm on its own. For one thing, it gets no advantage from extra time (which is another reason the grandmaster-strength claim is dubious, btw; it would be outplayed in longer time controls). For another, search might be necessary to make this thing able to train without the help of Stockfish, by playing itself (iterated amplification and distillation, how AlphaZero was trained). For a third, due to the game complexity of chess, it's likely that model size without search would have to scale exponentially to become stronger.

And all this brings me back to my previous points about releasing weights and code. If they had done so, I would be trying all of these things out myself right now instead of arguing about it on the internet. But time and time again Deepmind have demonstrated that they won't engage with the wider computer chess community. And so their research, which is no doubt innovative and inspiring, fails to have the impact it could have, because the community has to redo all the work with less money and compute. We saw this with Alpha Zero; it took years and a lot of hard work by volunteers for Leela just to get where Alpha Zero was originally and eventually surpass it. And it will probably take a long time to see state of the art competitive engines based on this new approach in the wild. Because Deepmind don't really care about that beyond making a splash with their initial publication. They're only peripherally interested in chess, which is fine, but as a chess player and computer chess enthusiast it makes me sad that they won't engage with the community in an open way.


Adding search isn't an interesting direction. It's been done. AlphaZero has search. Leela has search.

You are right though, they are only peripherally interested in chess.


I made several arguments why search is interesting. Just repeating that it's not, is not really a counter-argument.

Also, the fact that AlphaZero has search is irrelevant. This is a completely different model architecture. Stockfish had search before AlphaZero, does that mean AlphaZero wasn't interesting either? Or hell, why not just say all computer chess developments are uninteresting because alpha-beta was invented in the 50s?


Fair point, it could still be interesting.

I guess I'm saying their goal here was to do no search and get high performance, not to create the highest possible performance. The trade-off between run-time search and more training (better models) is an explicit area of research, and this paper is in that realm. Noam Brown has talked a lot about this in the context of poker and Diplomacy.


If anyone wants to tackle it, the challenge of creating a superhuman model that doesn't use search during inference is still open. I think I have some ideas, not that I could do it without a lot of compute.


What are your ideas?


An autoencoder over piece placement, active color, castling availability, en passant target square, and the halfmove clock for the fifty-move rule, to get a single vector encoding that information. That encodes everything you need except threefold repetition; for that, I was thinking of simply encoding the previous reversible halfmoves in the cross-attention (the order shouldn't matter). 50 halfmoves is the most you can have online because of the fifty-move rule, or it's a draw. This should fix the problem Deepmind's model has of being blind to threefold repetition. A better objective that considers the length of the checkmate sequence should make the model more decisive in the face of overwhelming victory, and its knowledge of threefold repetition should discourage it from going back and forth in a winning position.
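
Roughly, the autoencoder part could look like this (PyTorch; every dimension here is just a guess for illustration, and the repetition cross-attention is left out):

    import torch
    import torch.nn as nn

    # 64 squares x 12 piece planes, + 1 side to move, + 4 castling
    # rights, + 8 en-passant files, + 1 halfmove clock = 782 inputs.
    FEATS = 64 * 12 + 1 + 4 + 8 + 1

    class BoardAutoencoder(nn.Module):
        def __init__(self, dim=256):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(FEATS, 512), nn.ReLU(), nn.Linear(512, dim))
            self.decoder = nn.Sequential(
                nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, FEATS))

        def forward(self, x):
            z = self.encoder(x)        # the single board-state vector
            return z, self.decoder(z)  # reconstruction for the AE loss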


Slight but perhaps significant correction: the 50-move rule only takes effect if no piece has been captured and no pawn has moved. Online games can definitely be more than 50 moves.


Yeah, of course it was in the context of reversible halfmoves.


I think you could get pretty far just by learning embeddings for board positions and making moves from similar board positions during a game. Essentially a retriever-only system.
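
A minimal sketch of that retriever-only idea (NumPy; the embedding function and the position database are hypothetical, and in practice you'd filter to moves that are legal in the current position):

    import numpy as np

    def nearest_move(query_vec, db_vecs, db_moves):
        # db_vecs: (N, d) embeddings of previously seen positions;
        # db_moves: the move that was played from each of them.
        sims = db_vecs @ query_vec / (
            np.linalg.norm(db_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
        # Copy the move made from the most similar known position.
        return db_moves[int(np.argmax(sims))]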


What is the bot's username? I don't see it referenced in the paper, and in searching the Lichess database using the games offered in the paper, nothing shows up.


I wonder how well you can train a similar model on a finite number of outputs from a poker solver to output answers for any poker spot in general.


I've been generating a dataset of heads-up flop solutions for the last month or so on a cheap digitalocean node. So far I have 53,000 solutions.


I find it fascinating how some comments find this insightful and others are almost offended by this contribution.


The big problem with that paper is that grandmaster-level chess is actually really bad for a chess bot. Its Elo is a full 400 points lower than Stockfish's [0], which is the same as the distance between a grandmaster and a very strong untitled player. Oddly enough, I wrote a blog post a few months ago that's tangentially related to this topic [1]. The gist of it is that the search tree is the most important part of a chess engine, since a perfect solution for chess would just be a brute-force search of the entire game tree. Trying to get around that by building a better evaluation function is just a waste of time and money.

[0] https://arxiv.org/pdf/2402.04494.pdf

[1] https://matteoraso.github.io/board-game-engines-are-about-tr...
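
To make that brute-force point concrete, "perfect chess" really is just a few lines of exhaustive negamax (sketch with python-chess); everything real engines do -- depth limits, pruning, evaluation functions -- exists only because this tree is astronomically large:

    import chess  # pip install python-chess

    def solve(board):
        # Exact game-theoretic value from the side to move's
        # perspective: +1 win, 0 draw, -1 loss. Correct, but
        # hopelessly intractable for chess.
        if board.is_checkmate():
            return -1  # the side to move has been mated
        if board.is_game_over():
            return 0   # stalemate, insufficient material, forced draws
        best = -1
        for move in list(board.legal_moves):
            board.push(move)
            best = max(best, -solve(board))  # negamax recursion
            board.pop()
        return best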


When NNUE replaced the hand-crafted heuristic evaluation function in Stockfish 12, the engine got a huge jump in playing strength (around 100 Elo), so building a better evaluation function is definitely not a waste of time and money.


I worded that really badly. Obviously, making a better evaluation function is good, but you can't use it to replace the search tree.



