Game Changer?

This review has been printed in the April 2019 issue of Chess Life.  A penultimate (and unedited) version of the review is reproduced here. Minor differences exist between this and the printed version. My thanks to the good folks at Chess Life for allowing me to do so.

—–

Sadler, Matthew, and Natasha Regan. Game Changer: AlphaZero’s Groundbreaking Chess Strategies and the Promise of AI. Alkmaar: New in Chess, 2019. ISBN 978-9056918187. PB 416pp.

Sigmund Freud once described the “three severe blows” suffered by human narcissism in the course of Western history.[1] The cosmological blow, struck by Copernicus, expelled us from our supposed place at the center of the universe. Darwin’s biological blow denied us the comfort of our separation from, and superiority over, the animal kingdom. And Freud himself landed the final, psychological blow, exposing the irrational unconscious forces beneath even the greatest achievements of human rationality.

To these three psychic wounds chess players can add a fourth: Garry Kasparov’s defeat at the hands of Deep Blue in 1997. Deep Blue’s victory was portrayed in the mass media as a referendum on human intelligence, a ‘canary in the coalmine’ moment in which the inevitable overtaking of human creativity by machine intelligence was made manifest.

Curious thing, though. What was imagined as an antagonistic relationship between man and machine has instead proven to be a constructive one. Sure, humans have given up trying to beat Stockfish or Komodo, even at odds, but our ‘metal friends’ (Tukmakov’s delightful turn of phrase) are now our trusted analytical partners and teachers.

Far from killing our game, chess in the e-sport era now depends on the presence of engines, which play the role of the hole-cam in the poker boom. They give the illusion of prescience, allowing amateurs the heady feeling that they know more than the players themselves.

So imagine the shock when a scientific pre-print appeared on the Internet in December 2017. Its title, “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm,” was anodyne enough, but the paper announced a seismic shift in artificial intelligence and chess. AlphaZero, a program created by a Google subsidiary known as DeepMind, trounced Stockfish in head-to-head play. In doing so it forced us to rethink everything we know about computer chess.

The principles governing Stockfish’s play are not fundamentally different from those guiding Deep Blue, although they have been profoundly refined in the intervening years. Stockfish uses human-tuned criteria to evaluate each position in its search tree, and through “alpha-beta” search methods it is able to focus on promising continuations while pruning away inferior moves. Each move and each decision are the result of precise mathematical calculations, and human users can extract exact numerical evaluations for any given position.
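For readers curious about what “pruning” means in practice, here is a minimal sketch of alpha-beta search on a toy game tree. It is not Stockfish’s code, which is far more elaborate; the function name `alphabeta`, the nested-list tree format, and the leaf scores in `TOY_TREE` are all invented for illustration.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Return the minimax value of a toy game tree, skipping hopeless branches.

    A node is either a number (a leaf's static evaluation) or a list of
    child nodes. `visited`, if given, collects the leaves actually evaluated.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node

    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alphabeta(child, alpha, beta, not maximizing, visited)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            # The opponent already has a better option elsewhere,
            # so the remaining children cannot change the result.
            break
    return best


# Hypothetical two-ply tree: three candidate moves, two replies to each.
TOY_TREE = [[3, 5], [2, 9], [0, 1]]

seen = []
print(alphabeta(TOY_TREE, visited=seen))  # value of the best first move
print(seen)                               # leaves the search had to look at
```

In this toy tree the search never looks at the leaves 9 and 1: once a reply worth 2 (or 0) has been found, that candidate move is already worse than the first one, which guarantees 3.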

AlphaZero is different, as Matthew Sadler and Natasha Regan lucidly explain in their new book, Game Changer: AlphaZero’s Groundbreaking Chess Strategies and the Promise of AI. Pre-programmed with only the basic rules of chess, and using a general (non-specific) self-training algorithm, AlphaZero trained itself to play chess over the course of nine hours and 44 million self-play games. Periodically the program would refine its neural network, promoting tunable weights and network ‘layers’ that led to favorable outcomes, and demoting those that didn’t.

AlphaZero functions by combining these self-taught evaluative values with a Monte Carlo style tree search, where possible future game positions are spun out, evaluated, and ranked probabilistically. We don’t know exactly how AlphaZero decides what to play. The algorithm is a ‘black box’ in the Latourian sense, where inputs and outputs are known but (in contrast to Stockfish) its internal mechanisms remain opaque, even to DeepMind. What we do know is that AlphaZero is immensely, improbably strong, exhibiting an attractive attacking style reminiscent of Kasparov.
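To give a rough feel for how such a search ranks moves, here is a small sketch of a “PUCT”-style selection rule of the kind described in DeepMind’s published work: each candidate is scored by its average evaluation so far plus an exploration bonus that grows with the network’s prior and shrinks as the move is visited. This is not AlphaZero’s code. The names `MoveStats` and `select_move`, the constant `c_puct=1.5`, and every number in the example are assumptions made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class MoveStats:
    prior: float               # network's initial probability for the move
    visits: int = 0            # how often the search has tried the move
    total_value: float = 0.0   # sum of the evaluations returned so far

    @property
    def mean_value(self) -> float:
        return self.total_value / self.visits if self.visits else 0.0


def select_move(stats, c_puct=1.5):
    """Pick the move with the best mean value plus exploration bonus."""
    parent_visits = sum(s.visits for s in stats.values())

    def score(s):
        bonus = c_puct * s.prior * math.sqrt(parent_visits) / (1 + s.visits)
        return s.mean_value + bonus

    return max(stats, key=lambda move: score(stats[move]))


# Invented statistics for three candidate moves in some position.
candidates = {
    "d4": MoveStats(prior=0.2, visits=10, total_value=6.0),
    "Bb2": MoveStats(prior=0.6, visits=30, total_value=15.0),
    "h4": MoveStats(prior=0.2, visits=0),
}

print(select_move(candidates))            # favors the still-unexplored move
print(select_move(candidates, c_puct=0))  # pure exploitation of known values
```

With the exploration bonus switched off, the rule simply picks the move with the best average so far; with it switched on, an untried move gets a look before the search commits.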

Perhaps this is what makes AlphaZero so remarkable – its style. What we see in its victories over Stockfish should, given all we know about computer chess, be impossible. Stockfish is typically seen as a calculative god and defensive wizard, able to soak up pressure, induce errors, and grind down its opponents. AlphaZero defeated it by playing the kinds of attacking, sacrificial ideas that, played by humans, would inevitably be refuted by the machine.

Sadler and Regan spend two chapters of Game Changer describing the technical aspects of AlphaZero’s self-training regimen, the way it “thinks,” and what its evaluations and expected scores mean. Their extensive access to the DeepMind team and the algorithm allows them to craft accessible explanations of difficult subjects, and the mini-interviews with DeepMind team members are helpful.

The meat of the book, however, focuses squarely on AlphaZero’s style. What makes it so good? How can we reverse-engineer the logic of its moves and apply that knowledge to our own games? By studying the roughly 230 publicly available AlphaZero games, along with approximately 2100 additional games provided by DeepMind, Sadler and Regan distill a number of tantalizing traits in AlphaZero’s play.

An example is useful. Consider this game, which Sadler describes as “perhaps AlphaZero’s most beautiful game of all.”[2]

NIMZO-ENGLISH (A17)
AlphaZero
Stockfish 8
AlphaZero v. Stockfish Match, 2017

1.Nf3 Nf6 2.c4 e6 3.Nc3 Bb4 4.Qc2 0–0 5.a3 Bxc3 6.Qxc3 a5 7.b4 d6 8.e3 Ne4 9.Qc2 Ng5 10.b5 Nxf3+ 11.gxf3 Qf6 12.d4!?

Sadler and Regan expected 12.Bb2 Qxf3 13.Rg1 but AlphaZero instead plays for long-term compensation.

12. … Qxf3 13.Rg1 Nd7 14.Be2 Qf6 15.Bb2 Qh4 16.Rg4!?

Giving up the h-pawn to open the file. Stockfish sees this position as better for Black, while AlphaZero thought that White had a slight advantage.

16. … Qxh2 17.Rg3 f5 18.0–0–0

Offering pawn number three!

18. … Rf7

After 18. … Qxf2 19.Rdg1 Rf7 20.R1g2 Qe1+ 21.Bd1 White’s compensation is undeniable.

19.Bf3 Qh4 20.Rh1 Qf6

[Diagram: position after 20. … Qf6]

What does AlphaZero have for the two pawns? Two half-open files and massively superior mobility. This is a key idea for Sadler and Regan. As they explained in a conference call for chess journalists – the first such promotional call I’ve been on for a chess book! – the concept of mobility is fundamental for understanding how AlphaZero plays. It works to maximize the mobility of its pieces and minimize the mobility of its opponent’s. One of AlphaZero’s most striking tendencies, the pushing of its rook pawns to restrict the opponent’s king, is emblematic in this regard. Here, having opened lines for its rooks, AlphaZero now proceeds to open diagonals and further increase its mobility.

21.Kb1 g6 22.Rgg1!? a4 23.Ka1 Rg7 24.e4 f4 25.c5 Qe7 26.Rc1 Nf6 27.e5 dxe5 28.Rhe1 e4 29.Bxe4 Qf8

This is a key position in both the game and the book. Sadler and Regan use it to illustrate AlphaZero’s “thought processes” in Chapter 4.

30.d5!

AlphaZero sacrifices another pawn to open the a1–h8 diagonal!

30. … exd5 31.Bd3! Bg4 32.f3 Bd7

White’s initiative grows after 32. … Bxf3? 33.Rf1 Be4 34.Rxf4.

33.Qc3 Nh5 34.Re5

AlphaZero rates its winning chances at 80.3% here. (It evaluates positions by win percentage in Monte Carlo game rollouts.) Stockfish 8 thinks White is significantly better, but newer versions of the engine more clearly understand the danger.

34. … c6 35.Rce1 Nf6 36.Qd4 cxb5 37.Bb1 Bc6 38.Re6 Rf7

Stockfish hopes to return some of its material advantage and weather the storm. AlphaZero does not oblige.

39.Rg1 Qg7 40.Qxf4 Re8 41.Rd6 Nd7 42.Qc1 Rf6 43.f4! Qe7 44.Rxf6 Nxf6 45.f5 Qe3 46.fxg6 Qxc1 47.gxh7+ Kf7 48.Rxc1 Nxh7 49.Bxh7 Re3 50.Rd1 Ke8 51.Ka2 Bd7 52.Bd4 Rh3 53.Bc2 Be6 54.Re1 Kd7 55.Kb2 Rf3 56.Re5 Rg3 57.Re3 Rg2 58.Kc3 Rg4 59.Rf3 Ke8 60.Rf2 Rg3+ 61.Kb4 Rg4 62.Rd2 Bd7 63.Ka5 Rf4 64.Be5 Rf3 65.Rd3 Rf2 66.Bd1 Bc6 67.Kb6 1–0

One can’t help but feel as if a superior, alien intelligence has taken the White pieces and opened a new vista on to our beloved game.

Part III of Game Changer brilliantly distills some of the key features of AlphaZero’s attacking prowess. We see, through detailed analysis and clear explanation, how AlphaZero values outposts, why it rams ‘Harry the h-pawn’ forward, how it plays on color complexes and sacrifices for what Kasparov called quality. Part IV, devoted to AlphaZero’s opening choices, is less successful. The authors laud AlphaZero’s novel handling of the White side of the Carlsbad structure, for instance, but the game they cite departs from theory on the sixth move, rendering much of the fine preparatory explanation useless.

Game Changer is an excellent book, fully deserving of the critical praise it has received. Sadler and Regan patiently explain the technical minutiae for a non-technical audience, and their attempts to divine the essence of AlphaZero’s style are clear and convincing. Until DeepMind succeeds in “recovering back” AlphaZero’s implicit heuristics through some secondary algorithm, this treatment is as good as it gets.

What remains less settled, at least in my mind, is the issue of the book’s title. Is AlphaZero really a game changer? Does its advent herald a revolution in chess?

DeepMind’s novel computational solution – AlphaZero’s self-learned strength and style – is as disruptive today as Deep Blue’s brute force approach was in 1997. Both reconfigured our understanding of the possibilities of computer chess and, truth be told, of chess itself.

This, unfortunately, does not exhaust the two projects’ similarities. AlphaZero seems doomed to a life behind corporate bars much like its august predecessor, hidden away from the public in the interest of protecting trade secrets. And as with Deep Blue, AlphaZero’s influence on chess will, as a consequence, be limited.

I suspect that the real game changer will be Leela Chess, an open-source project that mimics AlphaZero’s self-learning algorithm. Because it is open-source, like the now ubiquitous Stockfish, Leela can be used by anyone without cost. Players can train with Leela, use it to analyze their games, and test their ideas against it. The democratization of chess information that began with Robert Hyatt’s Crafty, Mark Crowther’s The Week in Chess, and Stockfish continues with Leela, and the chess world will be much the richer for it.


[1] See Freud’s Complete Psychological Works (Standard edition, ed. Strachey), volume 17, pp. 139-141.

[2] https://www.youtube.com/watch?v=0RuIHfNcPO0