In keeping with the new focus of this blog, I decided to carry out a small experiment regarding opening theory. As a way of judging the merits of several romantic opening lines, such as the Pierce Gambit, Frankenstein-Dracula variation, and Halloween Gambit, I have performed a series of Engine-Engine matches using HIARCS (I use a Mac).
In these matches, I varied the time allotted to the engines for making moves, in order to determine quantitatively whether a particular line offers practical chances and pressure but can be defused with accurate defense. In the complete article below, I explain my motivation, methodology, and results. This is a preliminary analysis, and future posts will continue this work, for example by looking at engine evaluations and comparing them with the results.
Still, a few interesting results were apparent. In particular, I was interested in several lines in the Pierce Gambit, which starts with 1.e4 e5 2.Nc3 Nc6 3.f4 exf4 4.Nf3 g5 5.d4 g4 6.Bc4 gxf3 7.O-O Nxd4! (the strongest test of the Pierce) 8.Bxf4 (the Knight is immune due to threats of Qg5 followed by Bc5 or Qg2) Bc5 9.Bxf7?! Kxf7 10.Be3.
According to my analysis, the line with 9.Bxf7 Kxf7 10.Be3 is inferior for White, as others have suggested in the past. Notably, Ian Simpson, at his excellent blog and site The Gambiteers Guild, suggests that this might be playable for White were it not for 10...Qf6 11.Nd5 Qe5 12.Rxf3, when White's compensation falls short. In fact, 12.c3 instead should lead to at least an even game, with some pressure and practical chances for White. Some of the computer lines that I generated from this line are both amusing and instructive.
Think I missed something, or have an opening you'd like analyzed this way? Please feel free to share in the comments below.
Computer Performance versus Time as White in selected Romantic Openings
When evaluating an opening line, several different factors can be used. One can easily judge the Win-Draw-Loss record in a grandmaster database or check the computer evaluation of a few lines. Simple approaches may give some superficial insights, but there are drawbacks to these methods.
For example, a particular opening may be played only by stronger players seeking to outplay a weaker opponent, skewing the record in the database. The resulting record reflects the difference in player strength more than the strength of the opening line. A more sophisticated approach, which I personally found satisfying and clever, was to judge the performance rating achieved playing a certain opening, relative to the rating of the players who employed it.
Openings can also be judged in terms of the pawn structures and endgames they lead to, or how much pressure they apply to the opponent. The latter dimension is often used to evaluate gambits. While objectively a gambit may not be a strong option, it nonetheless can prove effective, depending on the level of your opposition and the time control of the game (with gambits being most effective, in general, against club players in blitz games).
The idea of evaluating an opening line such as a gambit in terms other than objective merit is not new. For example, Tim McGrew has written about the idea in his column for Chess Cafe (Articles #13 and #22, which now appear to be behind a paywall but are briefly summarized at the Kenilworthian). In these articles, he expresses the idea that gambit lines can be judged along a particular dimension: how many chances they offer the opponent to go wrong.
In this short article, I explore another, more quantitative way to judge opening lines. In this study, a chess engine plays a series of matches against itself in certain opening lines. Although this exploration does not pretend to be a serious and rigorous scientific study (as I might perform for my day job), I would propose the following hypothesis regarding opening or gambit lines:
An opening that offers relatively more practical chances than objective value will score higher at faster time controls, and its score will progressively worsen as the time control is increased. The inverse should also be true. An opening that offers neither practical chances nor objective merit will score poorly at all time controls.
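The hypothesis can be expressed as a simple decision rule. The sketch below is only an illustration: the function name `classify_line`, the thresholds `low_score` and `flat`, and the example scores are all assumptions made for the example, not values taken from the matches. Scores are White's score as a fraction of the games played.

```python
def classify_line(scores_by_time, low_score=0.35, flat=0.05):
    """Classify an opening from White's score fraction at each time control.

    scores_by_time maps seconds per move -> White's score as a fraction (0..1).
    low_score and flat are hypothetical thresholds chosen for illustration.
    """
    times = sorted(scores_by_time)
    fastest = scores_by_time[times[0]]
    slowest = scores_by_time[times[-1]]
    change = slowest - fastest  # positive when more time helps White

    # Poor at every time control: neither practical chances nor objective merit.
    if max(scores_by_time.values()) < low_score:
        return "neither practical chances nor objective merit"
    # Score worsens with more time: practical chances outweigh objective value.
    if change < -flat:
        return "practical chances exceed objective value"
    # Score improves with more time: objective value outweighs practical chances.
    if change > flat:
        return "objective value exceeds practical chances"
    return "no clear trend"


# Made-up scores for a gambit that does well only at the fastest time control.
print(classify_line({1: 0.7, 2: 0.5, 4: 0.4}))
```

With only three time controls and ten games per match, any such rule is coarse; it is meant to make the hypothesis concrete, not to replace a look at the actual results.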
To analyze the objective merit and practical chances in a particular opening line, a 10-game engine match was set up at each of three time controls: an average of 1, 2, or 4 seconds per move (see the Methods section for more details). Each match generates a set of results, namely the number of White wins, Black wins, and draws.
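For readers who want to tabulate results from their own engine matches, here is a minimal sketch that counts outcomes from the standard PGN Result tag. It is not how the tables in this article were produced; the function name `tally_results` and the sample PGN text are made up for illustration.

```python
import re
from collections import Counter

# Matches the standard PGN Result tag, e.g. [Result "1-0"].
RESULT_TAG = re.compile(r'\[Result\s+"(1-0|0-1|1/2-1/2|\*)"\]')


def tally_results(pgn_text):
    """Count White wins, Black wins, and draws in a string of PGN games."""
    counts = Counter(RESULT_TAG.findall(pgn_text))
    return {
        "white_wins": counts["1-0"],
        "black_wins": counts["0-1"],
        "draws": counts["1/2-1/2"],
    }


# Hypothetical three-game sample; real matches would be read from a PGN file.
sample_pgn = '''
[Event "Hypothetical 1 s/move match"]
[Result "1-0"]

1. e4 e5 1-0

[Event "Hypothetical 1 s/move match"]
[Result "1/2-1/2"]

1. e4 e5 1/2-1/2

[Event "Hypothetical 1 s/move match"]
[Result "0-1"]

1. e4 e5 0-1
'''

print(tally_results(sample_pgn))
```

Unfinished games (result `*`) are matched by the pattern but deliberately left out of the returned counts.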
In order to have a reference point or basis for comparison, I generated a set of test data that reflects the hypothesis (Table 1). This test data set contains results that can be expected for an opening with a particular quality: for example, an unsound but dangerous gambit by White might be expected to generate many wins at a short time control, but lead to many losses at a longer time control.
Table 1: Results from simulated engine matches (test data).
To condense the above dataset, the slope and average for each category were calculated, normalized, and graphed (Figure 1). This provides a reference point: experimental results can now be compared to this graph to determine which category they most closely match. Notice how using the average number of draws should make it possible to distinguish between an even position that is dull and one that is rich in practical chances, despite even chances for both sides.
Figure 1: Graphical representation of the practical chances and 'sharpness' that could be reflected by match results.
Next, several different opening lines were investigated using engine matches or "shoot-outs" (see Methods for details). Data similar to that in the test set were obtained from these engine matches. Several romantic-era openings were chosen, focusing on lines in the Four Knights and the Vienna Game. Some lines were chosen as controls, intended to reproduce the categories in the test data based upon known theory and (initial) computer evaluation of the line. For example, accepting the Vienna Gambit (1.e4 e5 2.Nc3 Nf6 3.f4 exf4) is not recommended for Black; 3...d5 is better, since taking the pawn is widely believed to give White a good game. This can be determined simply from first principles after 3...exf4 4.e5 Ng8: White is better developed and controls the center. In contrast, the symmetrical Four Knights Game is roughly even and should produce balanced results.
Certain lines in the Four Knights Game, such as the Halloween Gambit, produce a greatly imbalanced situation on the board. This gambit can give White practical chances, but is considered by many to be unsound. Although there are some lines of this gambit in which White gets pressure and attacking chances, theory knows how Black can equalize. In fact, returning some of the material at the appropriate moment (the 6.d5 Bb4 line) is thought to completely defuse White's attack and give Black at least a slight edge. The approach used in this study was applied to three different lines of the Halloween Gambit in order to determine both the practical and objective merit of the opening.
Several additional lines in the Vienna game were analyzed. First, the roughly balanced Glek variation was analyzed, to act as an additional control for a roughly balanced game (or at least one in which wild attacking chances and material imbalances are not present). Two lines in the Pierce Gambit for which the evaluation is not clear (barring some analysis by Ian Simpson, Tim McGrew and others) were chosen as experimental lines for which this study may shed some insight. Finally, the Frankenstein-Dracula variation was chosen as a line where Black, not White, has sacrificed material and should have the better practical chances but possibly an objectively challenging position.
The results of the engine matches for the Four Knights and Vienna lines described above are summarized below (Table 2). The PGN files for each match are available for download as two separate files, one for the Four Knights and Halloween Gambit lines, and another for all the variations stemming from the Vienna Game. The results are also represented graphically (Figure 2). In addition, the match results and graphs can be downloaded in Excel format (.xlsx file).
Table 2: Results from the engine matches in different opening lines.
Figure 2: Graphical representation of experimental results from the engine matches.
Comparing the results from the engine matches to the test data yields some interesting insights. As predicted, the Four Knights opening and the Vienna Glek variation resemble a balanced and somewhat drawish, dull game. The Vienna Accepted line, in which Black gives White practical chances and an objectively sound position, gives White a tremendous score that only improves when there is more time to exploit the position. This variation most closely resembles the 'White Crush' scenario from the test data.
The results from the Halloween Gambit lines are also somewhat consistent with the test data; in each line, the slope for White wins is relatively low. This reflects a higher score for White at faster time controls than at longer time controls. In addition, the average of White wins and draws over the three time controls indicates that there was a high number of Black wins (as can be seen directly from Table 2). Taken together, the data suggest that the Halloween Gambit offers White some practical chances but is objectively unsound. This conclusion echoes the prevailing opinion on the opening, although according to these data, the practical chances are greatly outweighed by the lack of objective merit (even at fast time controls, White's score is not impressive).
In contrast to the Halloween Gambit, the Frankenstein-Dracula variation of the Vienna Game gives Black better practical chances. In some ways, this makes sense, as it is Black that has 'gambited' or sacrificed an exchange for an attacking position. White's score improves when the time control is increased, suggesting that the extra time helps White to defend accurately and exploit an objectively superior position.
The graphical analysis (Figure 2) suggests that White's score improves the most in the 10.Be3 line of the Pierce Gambit. However, this is a case where the graph is misleading, and a close look at the tabulated results reveals the truth. White's score in this variation is abysmal at all time controls, and the slope is steep only because there are zero White wins at the fastest time control. Although White has an attacking position and open lines in that variation, Black has sufficient practical counter chances to compete even when not given much time to find an accurate defense (alternatively, there may not be enough time to find an accurate attacking sequence). In all of the games in this line (at any time control), Black selected 10...Ke8 to defend, instead of 10...Qf6 as suggested by Ian Simpson at his excellent Gambiteers Guild website. Although it seems strange at first, the point behind 10...Ke8 is that after 11.Bxd4 Bxd4 12.Qxd4 Qf6, the King is no longer on the f-file and therefore 13.Rxf3? would lose for White.
Ian Simpson is on the right path by concluding that the entire 10.Be3 variation of the Pierce Gambit (starting actually with White's ninth move, 9.Bxf7) is not the ideal choice, but his analysis appears to underestimate the strength of Black's position. Furthermore, his analysis suggests 10...Qf6 11.Nd5 Qe5 leads to a position in which White has insufficient compensation after 12.Rxf3. While that assessment appears to be correct, White's alternative of 12.c3 seems to maintain the balance. This move makes certain sense; it puts a question to the Knight, and threatens to capture on f3 with check without sacrificing the exchange. HIARCS evaluates this move as =+ (slight advantage to Black), and the engine match analysis suggests that White has a few practical chances but the game is mostly even. In many of these engine games, Black's extra material is insufficient for a win, or White can achieve a perpetual check to draw the game. It is interesting that the 12.c3 line in the Pierce most closely resembles the "Balanced Dull" test data due to the high draw rate, although the resulting positions were often sharp and could still give practical chances for a club / class player to win.
There are several ways this study could be expanded. Of course, other opening lines could be analyzed using the same approach. The existing data set is also worthy of future study. I plan to compare the engine match results with an analysis of the engine evaluation of each move (annotated in the PGN files). It could also be interesting to examine statistics on how often pieces are placed on certain squares and how this correlates with wins or practical chances.
Finally, the computer games themselves are worth replaying, if only for chess enjoyment. I found that the engines came up with some interesting ideas and tactics in these positions. For example, in my own games, when playing the Vienna against opponents unfamiliar with the opening, I often find myself in the position analyzed here: 1.e4 e5 2.Nc3 Nf6 3.f4 exf4 4.e5. My opponents usually insist on inserting the moves 4...Qe7 5.Qe2 before retreating the Knight, after which I take the entire center, complete my development, and enjoy a good game. In these engine matches, Black often pushes the pawn to g5 to hold the extra material, and White sacrifices the Knight on f3 after a later g4 push, reminiscent of the play in the Pierce Gambit. This approach from either side had never occurred to me.
My favorite tactics from this study, however, are found in the 12.c3 Pierce Gambit match. After the opening moves, Black has several choices. Retreating the Knight, unfortunately, is inadvisable, since as mentioned above White is free to capture on f3 with tempo and a strong attack. One option is to move the Bishop with 12...Bd6 and threaten a Queen invasion on h2. Remarkably, in the engine games White ignored this attack and play often continued 13.Bxd4 Qxh2 14.Kf2 fxg2 (Figure 3). Quite a messy looking position! This position helps make clear why these games end in a draw, however, since 15.Qf3+ Ke8 16.Qxg2 Qxg2 17.Kxg2 together with White's pressure on the a1-h8 diagonal will lead to a game with balanced and reduced material. Another interesting sequence worthy of study is 12...Qxe4 13.cxd4 Qxd5 14.Rxf3+ Ke8 15.dxc5, when White's pressure against the King will often lead to restoration of material and a draw.
Figure 3: The position after 1.e4 e5 2.Nc3 Nc6 3.f4 exf4 4.Nf3 g5 5.d4 g4 6.Bc4 gxf3 7.O-O Nxd4 8.Bxf4 Bc5 9.Bxf7 Kxf7 10.Be3 Qf6 11.Nd5 Qe5 12.c3 Bd6 13.Bxd4 Qxh2 14.Kf2 fxg2, as obtained in the engine matches. HIARCS evaluates this line as =+, although the most frequent result was a draw.
HIARCS Match Setup
This study utilized HIARCS 14 WCSC edition, version 1.7. For the engine-engine shootout matches, 10 games were played for a particular opening line (colors were not reversed after each game). In these matches, a 128 MB hash table and a 4 MB Nalimov cache were used. The engine evaluation was appended to each move for future analysis. Games were played with an average of 1, 2, or 4 seconds per move.
HIARCS Advanced Settings
Common Hash Settings were used. Book Learning was set to True and Tournament Mode; Combinations (True), GUI Time Lag (False), HIARCS Draw Value (0), Hypermodern Play (True), Optimistic Search (True), Playing Style (Active), Position Learning (True), Retain Hash (True), Search Selectivity (7), Swindle Opponent (True), Use Tablebases (Normal)
White wins (1-0), Black wins (0-1), and draws (½-½) were tabulated for each engine match. White's score is equal to the number of White wins plus half the number of draws. The slope is calculated in Excel by selecting the match results as the known Y's and the time control (1, 2, or 4 seconds) as the known X's. The resulting slopes and averages for these categories are normalized (the top value is set to 100%, the bottom value to 0%, and the others scaled proportionally) so that they can be presented on the same graph.
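For anyone who would rather not use a spreadsheet, the same calculations can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the workbook used for the article: the function names are my own, the win counts in the example are made up, and returning 0 for every entry when all values are equal is an arbitrary choice. The slope function uses the ordinary least-squares formula, which is what Excel's SLOPE function computes.

```python
def white_score(white_wins, draws):
    """White's score: one point per win plus half a point per draw."""
    return white_wins + 0.5 * draws


def slope(xs, ys):
    """Least-squares slope of ys against xs (as Excel's SLOPE computes)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def normalize(values):
    """Scale values so the largest becomes 100 and the smallest 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: with no spread, report 0 for every entry.
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


time_controls = [1, 2, 4]         # seconds per move, as in the matches
example_white_wins = [6, 4, 2]    # made-up counts, for illustration only

print(white_score(3, 4))
print(slope(time_controls, example_white_wins))
print(normalize([2, 4, 6]))
```

A negative slope, as in the made-up example, corresponds to results that get worse for White as the engines are given more time.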
Starting Position for Opening Lines
Below is the abbreviated name for each opening followed by the moves in standard notation.
Four Knights: 1.e4 e5 2.Nf3 Nc6 3.Nc3 Nf6
Halloween Ng6: 1.e4 e5 2.Nf3 Nc6 3.Nc3 Nf6 4.Nxe5 Nxe5 5.d4 Ng6
Halloween d6: 1.e4 e5 2.Nf3 Nc6 3.Nc3 Nf6 4.Nxe5 Nxe5 5.d4 Nc6 6.d5 Ne5 7.f4 Ng6 8.e5 Ng8 9.d6
Halloween Bb4: 1.e4 e5 2.Nf3 Nc6 3.Nc3 Nf6 4.Nxe5 Nxe5 5.d4 Nc6 6.d5 Bb4
Vienna Accepted: 1.e4 e5 2.Nc3 Nf6 3.f4 exf4
Vienna Glek: 1.e4 e5 2.Nc3 Nf6 3.g3
Pierce 10.Be3: 1.e4 e5 2.Nc3 Nc6 3.f4 exf4 4.Nf3 g5 5.d4 g4 6.Bc4 gxf3 7.O-O Nxd4 8.Bxf4 Bc5 9.Bxf7 Kxf7 10.Be3
Pierce 12.c3: 1.e4 e5 2.Nc3 Nc6 3.f4 exf4 4.Nf3 g5 5.d4 g4 6.Bc4 gxf3 7.O-O Nxd4 8.Bxf4 Bc5 9.Bxf7 Kxf7 10.Be3 Qf6 11.Nd5 Qe5 12.c3