
A scientist's take on the Game of Kings

Sunday, March 1, 2015

Tools for Statistics of the Squares



Ever wonder which squares see the most traffic? Or which squares have pieces 'parked' on them for the longest periods of a game? Now you can answer these questions easily using a series of JavaScript-powered tools I have developed. These are based upon the LT-PGN JavaScript viewer and call upon that code (which I did not write) for certain functions. In particular, they directly call upon the LT-PGN PGN2FEN tool, although my adaptation can handle PGN files with multiple games.

I was inspired to develop these tools after reading some of the chess-visualization articles posted on the ChessBase website, namely the analysis of square utilization by Seth Kadish last year (analysis which you can also find at Seth's blog). I really liked the approach, and although the source of the game PGNs was made clear, I wasn't sure how he extracted the data. Also, I thought there was potential to observe more than just square utilization, or 'traffic'.

In order to carry out this analysis, I created three separate JavaScript tools. They aren't shining examples of efficient coding, but they get the job done. They are as follows:

Reformatting PGN Text
http://djcamenares.x10.mx/chess/pgnreform.shtml
Although PGN files from 365chess.com can be used directly in the downstream applications, the chess software I use (HIARCS) places line breaks within the game score. This tool will remove them, leaving all other features of the PGN intact.
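
As a rough illustration of this step (a sketch, not the code behind the tool), the snippet below joins the wrapped lines of each game score into a single line and leaves tag lines and blank separator lines alone. The function name `unwrapMovetext` is hypothetical, and the sketch assumes that every header tag sits on its own line beginning with `[`.

```javascript
// Hypothetical helper, not the code behind pgnreform.shtml.
// Joins the wrapped lines of each game's movetext into one line.
function unwrapMovetext(pgn) {
  const out = [];
  let moves = [];

  // Emit any movetext collected so far as a single line.
  const flush = () => {
    if (moves.length > 0) {
      out.push(moves.join(" "));
      moves = [];
    }
  };

  for (const raw of pgn.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.startsWith("[")) {
      flush();           // assumption: a line starting with "[" is a header tag
      out.push(line);
    } else if (line === "") {
      flush();           // a blank line separates headers, scores, and games
      out.push("");
    } else {
      moves.push(line);  // one piece of a wrapped game score
    }
  }
  flush();
  return out.join("\n");
}
```

Because a blank line or a tag line ends the score being collected, a file with several games keeps one score line per game.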

Move Counting (Determining Square 'Traffic')
http://djcamenares.x10.mx/chess/traffic.shtml
This tool takes a PGN file, with single or multiple games, and can determine the square utilization or traffic of the White player, Black player, or both players. It also reformats the PGN so as to remove the header tags.
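
To make the idea of traffic concrete, here is a minimal sketch (not the code behind the tool) that tallies the destination square of every move in one game's movetext. The names `countTraffic` and `destinations` are hypothetical. The sketch assumes clean SAN movetext without variations or NAGs, with moves alternating from White's first move. Counting castling as traffic on both the king's and the rook's destination squares is also an assumption.

```javascript
// Hypothetical sketch, not the code behind traffic.shtml.
// Counts how often each square is the destination of a move.
// side is "white", "black", or "both".
function countTraffic(movetext, side = "both") {
  const tokens = movetext
    .replace(/\{[^}]*\}/g, " ")   // drop {comments}
    .replace(/\d+\.+/g, " ")      // drop move numbers such as "12." or "12..."
    .split(/\s+/)
    .filter((t) => t !== "" && !/^(1-0|0-1|1\/2-1\/2|\*)$/.test(t));

  const counts = {};
  tokens.forEach((san, i) => {
    // Assumption: tokens alternate White, Black, White, ...
    const mover = i % 2 === 0 ? "white" : "black";
    if (side !== "both" && side !== mover) return;
    for (const square of destinations(san, mover)) {
      counts[square] = (counts[square] || 0) + 1;
    }
  });
  return counts;
}

// Returns the square(s) a SAN move lands on.
function destinations(san, mover) {
  const rank = mover === "white" ? "1" : "8";
  // Assumption: castling counts for the king's square and the rook's square.
  if (/^O-O-O/.test(san)) return ["c" + rank, "d" + rank];
  if (/^O-O/.test(san)) return ["g" + rank, "f" + rank];
  // The last square named in the move is the destination (e.g. "Nbd2" -> "d2").
  const squares = san.match(/[a-h][1-8]/g);
  return squares ? [squares[squares.length - 1]] : [];
}
```

For the short Open Sicilian line 1.e4 c5 2.Nf3 d6 3.d4 cxd4 4.Nxd4, calling `countTraffic` with `side` set to `"white"` gives d4 a count of 2 and e4 a count of 1.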

Batch Conversion of PGN to FEN, Counting Square Occupancy ('Parking')
http://djcamenares.x10.mx/chess/parking.shtml
This tool takes a PGN file, with single or multiple games, and does several things. First, it removes header tags from the PGN. Then, it converts the PGN to FEN, and then to an expanded version of FEN in which each square, filled or empty, is declared. Finally, the occupancy, or parking, of different pieces (selected by the user) on each square is reported.
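
The last two steps can be sketched as follows, assuming the positions are already available as FEN strings. This is an illustration only, not the code behind the tool: the names `expandFen` and `countParking` are hypothetical, and writing each empty square as `-` is an assumption about what the expanded format looks like.

```javascript
// Hypothetical sketch, not the code behind parking.shtml.
// Expands the piece-placement field of a FEN string so that all 64 squares
// are declared. Assumption: each empty square is written as "-".
function expandFen(fen) {
  const placement = fen.trim().split(/\s+/)[0];
  return placement
    .split("/")
    .map((rank) => rank.replace(/[1-8]/g, (n) => "-".repeat(Number(n))))
    .join("/");
}

// Counts, over a list of FEN positions, how often each square holds one of
// the selected pieces. Pieces use FEN letters: uppercase for White
// (e.g. "P", "B"), lowercase for Black.
function countParking(fens, pieces) {
  const counts = {};
  for (const fen of fens) {
    const ranks = expandFen(fen).split("/"); // ranks[0] is the 8th rank
    ranks.forEach((rank, r) => {
      [...rank].forEach((piece, f) => {
        if (pieces.includes(piece)) {
          const square = "abcdefgh"[f] + (8 - r);
          counts[square] = (counts[square] || 0) + 1;
        }
      });
    });
  }
  return counts;
}
```

Passing `"P"` as `pieces` counts White's pawns only, while `["P", "B"]` counts White's pawns and bishops together.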

To see some example results from these tools, which expand upon the aforementioned work by Kadish, read on below.



Analysis of Fischer's Games

Although he eventually analyzed the square utilization of several grandmasters, Seth Kadish first chose the White games of the legendary Bobby Fischer for his analysis. Here, I do the same, looking at the traffic and parking (utilization and occupancy) in a similar set of games. In both cases, the source material was games from 365chess.com in which Fischer plays White. The database has likely been expanded since Kadish's original analysis, which used only 432 games, since my set contains 558 games.

Using the tools I developed, I first looked at the traffic of Fischer's pieces, much as Kadish had done. The JavaScript programs generate tab-delimited text, which I then import into Excel or Plotly for better visualization. Several thousand moves were played, and the distribution looks as follows:


[Image: traffic of White's pawns and pieces, as played by Fischer]


This matches the results from Kadish pretty closely. For a better comparison, I translated the numbers into a heatmap using Plotly (since my version of Excel is too old to generate a good heatmap). Again, the distribution matches what was already known.


Heatmap for Traffic of White's Pawns and Pieces, as played by Fischer.
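
For readers who want to build a similar grid, here is one way per-square counts could be laid out as tab-delimited text ready to paste into Excel or Plotly. This is a sketch under assumptions: the function name `toTabDelimited` is hypothetical, and the layout (8th rank first, file letters as the header row) is a choice made for illustration, not necessarily the layout the tools produce.

```javascript
// Hypothetical formatter: lays out per-square counts as an 8x8 tab-delimited
// grid, 8th rank first, with the file letters as a header row.
// counts maps square names such as "d4" to numbers; missing squares count as 0.
function toTabDelimited(counts) {
  const files = "abcdefgh".split("");
  const rows = [["", ...files].join("\t")];
  for (let rank = 8; rank >= 1; rank--) {
    const cells = files.map((file) => counts[file + rank] || 0);
    rows.push([rank, ...cells].join("\t"));
  }
  return rows.join("\n");
}
```

Putting the 8th rank on the first row means the pasted grid reads like a board seen from White's side.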

One interesting feature of this type of analysis is the high amount of traffic on the d4 square. This seems to contradict the fact that Fischer claimed that 1.e4 was "best by test" and played this move most frequently. As I have mentioned elsewhere, this result is in fact consistent with Fischer's play. For example, the opening sequence (Sicilian Defense) 1.e4 c5 2.Nf3 d6 3.d4 cxd4 4.Nxd4 was common in Fischer's games. Even though this is an "e4" opening, White makes two moves to d4 (3.d4 and 4.Nxd4) but only one to e4, so there is more traffic on the d4 square in this line. This can partly explain the results obtained.

If one instead looked at square occupancy, one might expect to see a higher value for e4. In other words, there is little traffic over the e4 square because White keeps a pawn parked there. Again, I turned to the tools I developed and obtained the following data on each of Fischer's pieces individually. For example, here is the distribution of where his pawns are parked across all the positions from all of the games in the database (again, with Fischer always as White):

[Image: parking of White's pawns, as played by Fischer]


Again, certain patterns make sense; there can never be a position with a pawn actually occupying the first or last rank. To make it easier to visualize the distribution, I again generated a heatmap.

Heatmap for Parking of White's Pawns, as played by Fischer.


As predicted, there are more pawns typically parked on e4 than d4, consistent with Fischer's style and reputation. It is also interesting to note the occupancy of the squares f2, g2, and h2. Although he was a versatile, tactical, attacking player, Fischer did not fool around with king safety in most games! Looking at the distribution, you can see that the d4 pawn did not stay on the d-file for long, again consistent with Fischer's use of the Open Sicilian.

What about Fischer's other pieces? Well, you can take a look for yourself; the raw data for the traffic and parking in a database of Fischer's games (playing White) is available for viewing. There are a number of ways these tools and such data might be utilized, and this will be the subject of some future posts. For now, here is the heatmap for all of Fischer's pieces combined:

Heatmap for Parking of White's Pawns and Pieces, as played by Fischer.

Somewhat interesting is the increased occupancy observed on the square g2; I didn't think Fischer fianchettoed his King's Bishop that often, but then again, he did play the KIA (King's Indian Attack) many times. Looking at it another way, we can say that Fischer did not often leave the g2 square unoccupied; he had a pawn or bishop there in a number of his games and positions.

If you use the JavaScript tools to find any interesting insights about piece placement and movement, please feel free to share them here, either through the comments or through a guest post (contact me through comments or directly if you are interested).

4 comments:

  1. I have updated some of the links on this page, after rewriting some of the code. The tool has been improved to handle multiple input formats, and to give traffic / utilization data on particular pieces.

  2. I have performed further updates, some of which are not reflected in the text of the post. The utilization and occupancy tools now only return the final data: you can use the PGN reformat tool to get FEN and FEN-X output. You can read more about these updates in a later post: http://scienceonthesquares.blogspot.com/2015/03/chessboard-heatmap-and-updates.html

  3. Very Interesting Devin. Thanks for the effort. Could be cool to conduct the same analysis for the black pieces and find a way to overlay the results. It might shed light on what sector of the board to focus attention on when playing i.e. the four squads as bounded by w x y and z. That way a player could check for offensive opportunities for both sides related to those squares before moving attention to other parts of the board. In biological terms - the 4 squares (within the 8 x 8 "cell") that are critical in most circumstances. I saw your article at chess news.

    PS: I live in NJ and worked by the Rutgers campus off River Rd. for many years.

    Kevin
    kevinkin@comcast.net

  4. Thanks for the interest and kind words Kevin! I'm glad you enjoyed the article. In fact, the program I developed can analyze both the White and Black pieces. The overlay or comparison of the two is interesting, and I may post it in the future. For example, White makes more moves and places more pieces in Black's territory (5th rank and further) in White wins than in White losses. Black also makes more moves in this same area in White wins than in White losses. If the battle is taking place on the 8th, 7th, and 6th ranks, it's clear that Black is on the defensive.

    Did you ever play in the Rutgers Chess Club? Do you go to AgField celebration / NJ Folk Festival? I wonder if our paths have crossed before.
