A scientist's take on the Game of Kings

Saturday, March 21, 2015

Complete Datasets for Fischer and Carlsen as White

If you read this blog regularly or saw my work featured at ChessBase, you will be familiar with the tools I have developed to analyze square utilization and occupancy and to represent these as heatmaps. In response to a reader comment, I am making the complete datasets for both Fischer and Carlsen as White available for download. I may eventually post the corresponding datasets of these two world champions playing Black.

This data is being provided for now with very limited annotation (Select 'Read More' to view); it is up to you, dear reader, to decide how to use it, what points to examine further, and what conclusions you can draw from it. If you download these ZIP packages, you will find the calculated analysis of square utilization and occupancy, as well as the differential data and a subfolder containing heatmap representations for each of the White and Black pieces (all 12 piece types).

Remember, a scientific approach to chess only means that you are willing to test your own ideas about the game in a systematic way, hoping to improve your understanding and thus your performance. I hope the tools and datasets I have made available will assist you in asking scientific questions about chess. In the future, I may post the insights I have gleaned from analyzing this and similar datasets. That may have to wait some time, however, as I have been rather busy lately with my professional commitments!

Select 'Read More' to see details regarding methodology as well as a few initial insights. Please feel free to share your own insights from this dataset, or your critical comments regarding this method! Stay tuned for more science on the squares!

Examining White and Black Pawn Data

In the ChessBase article, I focused only upon the placement or movement of Carlsen's or Fischer's Knights and King. However, the programs I developed can generate data on all 12 pieces. Looking at the square utilization and occupancy yields an interesting, yet not surprising, conclusion about Fischer's games:

Square Occupancy Differentials, for White (left) and Black (right) pawns for Fischer as White.

As we can see, a c3-d4-e5 white pawn chain is found more often when Fischer loses as White. Complementary to this is the more frequent appearance of the e6-d5 black pawn chain in the same dataset. Both of these findings are consistent with Fischer's known difficulty and poor score against the French Defense.

Since Carlsen is not generally thought to have a problem with the French (and indeed plays a wide variety of openings), we should predict that this pattern will not be present in his dataset. 

Square Occupancy Differentials, for White (left) and Black (right) pawns for Carlsen as White

As predicted, we obtain a pattern different from that of Fischer. In Carlsen's White games, advanced White pawns around the Kingside are more often observed in his losses. Conversely, we might conclude that in games ending in 1-0, Black is often found with his pawns advanced only one square or not at all. King safety is important, but pawns are the soul of chess; you need to be able to use them (develop them) to win!

Examining Bishop Data (White Pieces Only)

One of the readers ("NimzoCapa") at ChessBase commented on the differences in Fischer's and Carlsen's overall piece placement and movement. Their sharp eye picked up on the higher number of moves to a4, b5, and b3 in Fischer's games, and they speculated that this reflects Fischer's use of the Spanish and other lines involving Bishop moves to these squares.

To address this, we can look directly at the datasets for Bishop moves. Here, I will focus on the differential data, although looking at the raw numbers is warranted as well (and is included in the downloadable content linked above).
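For readers who want to recompute a differential from the raw numbers, here is a minimal sketch in Python. It is not the program used to generate the datasets. It assumes that a differential is the percentage share of a square among wins minus its percentage share among losses, and the function name `differential` and the counts in the example are hypothetical, chosen only for illustration.

```python
def differential(win_counts, loss_counts):
    """Per-square difference between win and loss percentages.

    ASSUMPTION: a 'differential' is taken here to be
    (share of the square among wins) - (share among losses), in percent.
    Positive values mean the square shows up relatively more in wins.
    """
    win_total = sum(win_counts.values())
    loss_total = sum(loss_counts.values())
    if win_total == 0 or loss_total == 0:
        raise ValueError("need at least one count in both wins and losses")

    result = {}
    # Cover every square that appears in either set of games.
    for square in set(win_counts) | set(loss_counts):
        win_pct = 100 * win_counts.get(square, 0) / win_total
        loss_pct = 100 * loss_counts.get(square, 0) / loss_total
        result[square] = win_pct - loss_pct
    return result


# Made-up counts for illustration only (NOT taken from the datasets).
example_wins = {"c4": 6, "b5": 4}
example_losses = {"g2": 5, "c4": 5}
example_diff = differential(example_wins, example_losses)

for square in sorted(example_diff):
    print(square, round(example_diff[square], 1))
```

Normalizing to percentages before subtracting matters because a player with many more wins than losses would otherwise show a positive raw difference on almost every square.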

Square Utilization Differentials for the White Bishop as handled by Fischer (left) or Carlsen (right)

This analysis confirms NimzoCapa's astute observation. Fischer has a Bishop traveling through c4, b5, and b3 in his wins at a greater percentage than Carlsen does. Does the same pattern hold true for square occupancy?

Square Occupancy Differentials for the White Bishop as handled by Fischer (left) or Carlsen (right)

Once again, we get a fairly similar pattern, although with squares like d3 and c4 featured more heavily in Fischer's wins (vs. losses), and much less defined tendencies in Carlsen's games. It is interesting to note that a Bishop on g2 is found in Fischer's losses as White. Perhaps the King's Indian Attack was not one of Fischer's more successful opening choices.

When interpreting this type of data, it might be important to consider pieces that have left the board.


In the data above, there is a difference between b5 and c4 in Fischer's games both in terms of square utilization and square occupancy with a White Bishop. Also, Carlsen's pattern is more diffuse, with defined tendencies more difficult to spot. One possible explanation is the way in which exchanges or captures of the piece in question can change the data.

The following is just speculation, but I offer potential explanations for the patterns seen in the data. Fischer often employed the Fischer-Sozin Attack with Bc4 against the Sicilian. If my memory serves correctly, he used this opening to good effect, winning more often than losing. In this line, a Bishop is planted on c4 and can exert pressure from that square for several moves (although it is sometimes chased to b3 or sacrificed).

In contrast, Fischer moves the same Bishop to b5 in some games (perhaps sometimes even the same games) more often than he moves it to c4. However, the Bishop doesn't stay on that square for long. Consider Fischer's use of the Exchange Ruy Lopez, 1.e4 e5 2.Nf3 Nc6 3.Bb5 a6 4.Bxc6. A preponderance of games like this will increase the square utilization of b5 and lower the occupancy all over the rest of the board. This means that the occupancy data will strongly reflect the games in which Fischer did not play the Exchange Ruy Lopez. We might even predict that, if we could perform mutual information analysis on this dataset, there would be a correlation between 1.e4 e5 and 3.Bb5, and a corresponding correlation between 1...c5 and a later Bc4.
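To see how an early exchange separates the two measures, here is a small, simplified sketch that follows White's light-squared Bishop through the Exchange Ruy Lopez (1.e4 e5 2.Nf3 Nc6 3.Bb5 a6 4.Bxc6 dxc6). It is not my actual analysis code. It assumes that utilization counts each arrival of the piece on a square and that occupancy counts the number of plies the piece stands on a square. The helper `track_piece` is hypothetical and ignores castling, en passant, and promotion.

```python
from collections import Counter


def track_piece(start, moves):
    """Follow one piece through a list of (from, to) coordinate moves.

    Returns (utilization, occupancy) as Counters keyed by square name.
    ASSUMPTIONS: utilization = arrivals on a square; occupancy = number of
    plies after which the piece stands on a square. Castling, en passant,
    and promotion are not handled in this simplified sketch.
    """
    square = start           # current square, or None once the piece is captured
    utilization = Counter()
    occupancy = Counter()
    for src, dst in moves:
        if square is not None:
            if src == square:       # the tracked piece itself moves
                square = dst
                utilization[dst] += 1
            elif dst == square:     # an enemy piece captures the tracked piece
                square = None
        if square is not None:
            occupancy[square] += 1  # count the position reached after this ply
    return utilization, occupancy


# 1.e4 e5 2.Nf3 Nc6 3.Bb5 a6 4.Bxc6 dxc6, written as (from, to) pairs.
exchange_ruy = [
    ("e2", "e4"), ("e7", "e5"),
    ("g1", "f3"), ("b8", "c6"),
    ("f1", "b5"), ("a7", "a6"),
    ("b5", "c6"), ("d7", "c6"),
]

util, occ = track_piece("f1", exchange_ruy)
print("utilization:", dict(util))
print("occupancy:", dict(occ))
```

In this line the Bishop is recorded arriving on b5 once but standing there for only two plies, and after 4...dxc6 it contributes no occupancy to any square for the rest of the game.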

Early exchanges of Fischer's Bishops may also explain why his patterns have more defined tendencies compared to Carlsen's. If Carlsen does not exchange his Bishop as early, it has an opportunity to move around the board a lot more, making moves to a greater variety of squares and spreading its time during the game among all of them.
