Science on the Squares: Mutual information in Chess

In a previous post, I introduced the topic of mutual information. This is a statistical analysis technique that can be used to determine if there are a connection between two variables.

Mutual information finds many applications, including analysis of biological sequences (this is the use I am most familiar with). Last time, I questioned whiter or not this technique can be applied to chess in any meaningful way. I think the answer is a partial yes, and here I will share with you my initial exploration of this idea.

Connections between Chess information

Mutual Information is a technique which detects a connection or dependence between two variables. The two variables can also be thought of as different sets of information. What kind of information exists in Chess? A couple of things come to mind: the position of the pieces, the squares they control, the moves played, the sequence of the moves within the gamescore. Even the time at each move, if recorded, or a computer's evaluation of the position can be thought of as information. All of these bits of information related to a particular position or game can be collected, analyzed, or sorted. But are there connections between different bits of chess information?

One example of a connection between information in Chess occurs in the opening. Consider the following two game sequences:

A) 1.d4 Nf6 2.c4 g6 3.Nc3 d5

B) 1.d4 Nf6 2.c4 g6 3.Nf3 Bg7 4.Bg5 O-O 5.Nc3 d5

The above two lines are both variations within the Grunfeld complex of openings. Here, one could argue there is a connection between the move Nc3 by White and …d5 by Black. Usually, Black will only play d5 once White has played Nc3. The logic behind this move is to prevent White from playing e4, which was threatened by the move Nc3. In this particular sequence, Black does not want to play ….d5 prematurely. For example, in line A, if Black plays 2…d5 play might follow 3.cxd5 Nxd5 4.e4, when Black loses time with the Knight and White establishes a strong center. Contrast this with delaying …d5 until after Nc3; 3.Nc3 d5 4.cxd5 Nxd5 5.e4 Nxc3. Here Black is able to exchange off the Knight without loss of time, making his own position less cramped.

In the early moves of the Grunfeld, there is a connection between d5 and Nc3; namely, these two moves are connected due to the e4 square and other features of the position (the Knight on f6). Mutual information analysis may be able to uncover this, and possible other connections. Presumably, this can be done by collecting a large set of high-level games, perhaps of a particular opening variation. The varying moves, or piece placement can be compared against each other using mutual information; a positive score is likely to be found between the Nc3 and d5 moves. However, carrying out this analysis, even just gathering the data in the necessary format, is a challenge.

Other phases of the game are probably amenable to this same approach. For example, there are known connections between piece placement or squares in certain endgames. As an example, I have analyzed the mutual information for a simple King and Pawn versus King endgame.

An example: Drawn positions in KPK

In the following example, I started with a position in which the White king was placed on e4, a dominant central square. From this, I then varied the position of the Black King and White pawn, with the pawn occupying any square within the 7th, 6th, and 5th rank, and the King occupying any square from the 8th to 5th rank (excluding any illegal squares, such as d5, e5, and f5).

However, to limit the possibilities to be considered I only considered positions which were drawn (with White to move). Therefore, if the White pawn was on a7, there were only two possible squares for the Black king (a8 and b7, to control the queening square.

In this way, I manually created an exhaustive list of positions (164, to be precise), double checking some of the trickier ones with Stockfish to verify they were drawn (Mostly done to limit the time I would need to analyze each one; I was pretty sure of my King and Pawn endgame technique!).

From this dataset, I could ask several different questions about the chess information it contained. For example, is there a connection between the frequency two squares are occupied (one with a King, one with a Pawn). Since there is a functional relationship between the two, I would expect to find mutual information between squares (for example, between a8 and b7).

The results largely confirm this, and several features of the position can be gleaned from the data. Presented below is a grid of mutual information scores, normalized by the score of the diagonal (similar to the analysis I performed in a previous article).

Click on the image for a larger version

In the above grid, the score for the mutual information between two squares (one on the X-axis, the other on the Y-axis) is shown. This is actually a score normalized based upon the diagonal, emphasized by the step-like pattern, since the diagonal will (by definition) has the highest scores. Around the diagonal, I have boxed the files in groups of eight for convenience. This grouping reflects the fact that the H-file does not wrap around and connect to the A-file, but rather the two contain squares which are as far apart on the chessboard as you can get.

It is also worth noting that the pattern is symmetrical, but due to the way I normalize the scores, the numbers may differ slightly across the diagonal.

Their are several salient features from the above, which reflect the nature of the position on the chess board. First, you might notice that in the top left corner, where squares on the eight rank are compared to each other, there is only a score for the diagonal. This is because only the King can occupy these squares. No two squares on the eight rank can be occupied in the same position, and thus there can be no information connection between them.

Beyond this first box of 64 pairing (i.e. starting now with the pair A7-A7, there are mutual information between pairs surrounding the main diagonal. For example, B7-C7, and D7-E7 pairs give positive scores. This is a reflection that, other than sitting in front of a pawn on the 7th rank, the only other way for the Black king to secure the draw is to be immediately adjacent to the advanced pawn. These positive scoring pairs are highlighted blue on the grid.

In addition to the main diagonal, there are also other diagonals present. When the Black king is directly in front of the pawn on the 7th rank, a connection between the two squares is observed in the diagonal from pair A7-A8 to H7-H8. This is a single diagonal, since there is only one square available for the King to be in front of a pawn on the 7th.

As this diagonal extends into pairs on lower ranks (A6-A7 to H5-H6), scores off of the diagonal are positive. For example, A6-A7 is positive, and so is A6-C7. This result, and some of the subsequent observations, are a product of the square of the pawn rule. The less advanced a pawn is, the further away the King can be to stop it. A pawn on the 7th can only be stopped if the King is adjacent to it, but a pawn on the 6th can be stopped by a King at most one file away. This is seen by the positive score for the A6-C6 pair as well.

The connection between the a6 square and the c6 and c7 squares is the b7 square. However, when there is a pawn on a6, it would be illegal for a King to be on b7. Therefore, A6-B7 (or B6-A7, B6-C7, etc.) pairs exhibit no score. This is highlighted on the grid as orange squares, and is the reason why the positive scores around the diagonal take on a broken pattern.

The connection between the pawn on the 6th and a King that is one file away is lost for the middle files, because the White King would then have the ability to out-flank his counterpart. This is shown by the purple shading, for example, pairs D6-B7 or E6-C7. In this case, the White King can reach an important square or take the opposition before Black.

For example, with a White King on e4, a Pawn on e6 and a Black King on c7, the game is lost for Black even though his King is within the square of the pawn. This is shown by either of the following sequences:

A) 1.Kf5 Kd6 2.Kf6 Kc7 3.e7 Kd7 4.Kf7 and with e8 under White's control, the game is lost for Black

B) 1.Kd5 Kd8 2.Kd6 Ke8 3.e7 Kd7 4.Kf7 and with e8 under White's control, the game is lost for Black

Moving further from the central diagonal, you can seen a banded pattern from A6-A8 to H5-H7, and another from A5-A8 t o H5-H8. These diagonals are broader than the other two. Again, this is an effect of the square of the pawn rule. When the pawn is on the 5th rank, for example a5, the Black king can be up to four files away from it. This is highlighted on the grid by yellow squares.

The width of these diagonals, however, changes. There are less positive scoring pairs for the middle files, as shown by the green shaded boxes. This is due to the presence of White's King on e4. When the White King is close to the pawn, the Black King cannot afford to occupy a position that is on the outside of the square of the pawn (Which normally would be drawn).

Conclusion

Hopefully, the analysis above convinces you that mutual information can be applied to Chess. Gaining deeper insights in more complex endgames, or openings, will require much larger datasets. There exists large tablebases, and collections of millions of games. However, subjecting these to mutual information analysis will require some clever tricks, which are beyond my capabilities at the moment. If anybody has a tablebase or bitable uncompressed into a series of positions (in FEN), please let me know! I can have a setup that enables me to subject lists in that format to this analysis.

If you try to perform this analysis yourself, please note that there a number of important considerations. For example, in the King and Pawn versus King example, there are different ways I could represent the connections. I chose to look for frequency of the piece on a particular square. If I had done the reverse, the results would have been much less informative (It would have been a two by two grid, with some moderate positive score between King and Pawn, which would be something approaching the average of all the off-diagonal scores in the grid presented here).

In the King and Pawn versus King example, meaningful results were only obtained because I examined a subset of positions (only draws). Considering all possible positions in a King and Pawn versus King (including wins for Whites) would have eliminated any observed effect. Therefore, it is necessary to have a culled dataset in order to gain insight. This is similar to what is done when mutual information is applied to biology; in those cases, the possibilities are shaped by selective pressures of evolution, making functional connections important.

What sort of connections between bits of chess information would you like to see explored? Please share your thoughts with a comment!