The Queen’s Gambit - Post 3
Inspired by The Queen’s Gambit on Netflix, I’m doing a few posts on Chess in R.
This screenshot from the show explains everything:

Chess game format: pgn
The pgn (Portable Game Notation) file format is a human-readable representation of a chess game.
In its most basic form, it consists of
- a sequence of tags (i.e. metadata such as the event and the players) surrounded by []
- a sequence of numbers and events representing the moves taken by the players, i.e.
  - A number indicating which move this is within the game.
  - Moves for the white and black players represented in Standard Algebraic Notation (SAN).
  - Comments can be interspersed between/within the moves and are surrounded by “{}”
An example pgn file is shown below:
alekhine_pgn <- r'{[Event "Vilnius All-Russian Masters"]
[Site "Vilna (Vilnius) RUE"]
[Date "1912.08.23"]
[EventDate "1912.08.19"]
[Round "5"]
[Result "0-1"]
[White "Alexander Alekhine"]
[Black "Akiba Rubinstein"]
[ECO "C83"]
[WhiteElo "?"]
[BlackElo "?"]
[PlyCount "54"]
1. e4 {Notes by Dr. Savielly Tartakower.} 1... e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4
Nf6 5. O-O Nxe4 6. d4 b5 7. Bb3 d5 8. dxe5 Be6 9. c3 Be7 10. Nbd2 Nc5 11. Bc2
Bg4 12. h3 {The most reasonable course here is 12.Re1, guarding the e-pawn.}
12... Bh5 13. Qe1 $6 {Here again 13. Re1 ensured a very good game for White.}
13... Ne6 14. Nh2 $6 Bg6 $1 15. Bxg6 fxg6 {! Far seeing strategy! Black
recognizes that the f-file and not the e-file will be needed as a base for
action.} 16. Nb3 {Or 16.f4 d4!.} 16... g5 $1 17. Be3 O-O 18. Nf3 Qd7 19. Qd2
{White pays insufficient attention to the scope of his opponent's threats. A
better course is 19.Nfd4 (19...Nxe5 20.Bxg5) seeking to establish equality.}
19... Rxf3 $1 20. gxf3 Nxe5 21. Qe2 Rf8 22. Nd2 Ng6 23. Rfe1 Bd6 24. f4 Nexf4
25. Qf1 Nxh3+ 26. Kh1 g4 27. Qe2 Qf5 0-1}'
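To make the tag structure concrete, here is a minimal base R sketch that pulls the tag pairs out of a PGN string. The `tiny_pgn` string, `tag_regex` and `tag_df` are made-up names for illustration only, and the sketch assumes tag values contain no escaped quotes (real PGN files may).

```r
# Hypothetical, shortened PGN used only for this illustration
tiny_pgn <- '[Event "Example"]
[Result "0-1"]
1. e4 {A comment.} 1... e5 2. Nf3 Nc6 0-1'

# Each tag looks like: [Name "Value"]
tag_regex <- '\\[(\\w+)\\s+"([^"]*)"\\]'

# Find every tag, then split each one into its name and its value
tag_text <- regmatches(tiny_pgn, gregexpr(tag_regex, tiny_pgn, perl = TRUE))[[1]]
tag_df <- data.frame(
  Tag   = sub(tag_regex, '\\1', tag_text, perl = TRUE),
  Value = sub(tag_regex, '\\2', tag_text, perl = TRUE),
  stringsAsFactors = FALSE
)
tag_df
```

Everything that is not a tag (move numbers, moves, comments, the result) is left untouched by this regex, which is why a proper tokenising step is still needed for the move text.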
Use lex() to turn the text into tokens
- Start by defining the regular expression patterns for each element in the pgn file.
- Use minilexer::lex() to turn the pgn text into tokens.
- Throw away whitespace, newlines and tags, since I’m not interested in them.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Use the mini-lexer to break text into labelled tokens
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# remotes::install_github('coolbutuseless/minilexer')
library(minilexer)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Define all the patterns to match as regular expressions.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pgn_patterns <- c(
  tag         = '\\[.*?\\]',            # Capture tags as a unit
  comment     = "\\{.*?\\}",            # Capture comments as a unit
  resumption  = "\\d+\\.\\.\\.",        # Resume moves after comment
  move_number = "\\d+\\.",
  end_of_game = '0-1|1-0|1/2-1/2|\\*',  # Result markers ('*' = unfinished/unknown)
  nag         = '\\$\\d+',              # Numeric annotation glyph
  move        = '[-+#=\\w\\./]+',       # Anything else is a move ('#' mate, '=' promotion)
  newline     = '\n',
  whitespace  = '\\s+'
)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Define some different sets of tokens
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
chaff <- c('whitespace', 'newline', 'tag')
non_moves <- c('comment', 'resumption', 'nag', 'end_of_game')
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Parse a PGN file to tokens
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pgn_text <- alekhine_pgn
pgn_text <- gsub("\n", ' ', pgn_text)
tokens <- minilexer::lex(pgn_text, pgn_patterns)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Tidy the tokens
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tags <- tokens[names(tokens) == 'tag']
tokens <- tokens[!(names(tokens) %in% chaff)]
head(tokens, 20)
##                           move_number                                  move
##                                  "1."                                  "e4"
##                               comment                            resumption
## "{Notes by Dr. Savielly Tartakower.}"                                "1..."
##                                  move                           move_number
##                                  "e5"                                  "2."
##                                  move                                  move
##                                 "Nf3"                                 "Nc6"
##                           move_number                                  move
##                                  "3."                                 "Bb5"
##                                  move                           move_number
##                                  "a6"                                  "4."
##                                  move                                  move
##                                 "Ba4"                                 "Nf6"
##                           move_number                                  move
##                                  "5."                                 "O-O"
##                                  move                           move_number
##                                "Nxe4"                                  "6."
##                                  move                                  move
##                                  "d4"                                  "b5"
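For readers curious what a lexer of this kind does, the following is a rough base R sketch of the general idea: join the named patterns into one alternation, find every match in order, then label each match with the first pattern that matches it completely. This is an illustration only and is not minilexer's actual implementation; `toy_lex()`, `toy_patterns` and `toy_tokens` are hypothetical names.

```r
toy_lex <- function(text, patterns) {
  # One combined regex; alternatives are tried in the order given
  combined <- paste0('(?:', patterns, ')', collapse = '|')
  pieces   <- regmatches(text, gregexpr(combined, text, perl = TRUE))[[1]]

  # Label each piece with the first pattern that matches all of it
  anchored <- paste0('^(?:', patterns, ')$')
  labels <- vapply(pieces, function(piece) {
    hits <- vapply(anchored, grepl, logical(1), x = piece, perl = TRUE)
    names(patterns)[which(hits)[1]]
  }, character(1), USE.NAMES = FALSE)

  # Note: text that matches no pattern is silently skipped in this sketch
  stats::setNames(pieces, labels)
}

# A cut-down pattern set, ordered from most to least specific
toy_patterns <- c(
  comment     = '\\{.*?\\}',
  resumption  = '\\d+\\.\\.\\.',
  move_number = '\\d+\\.',
  move        = '[-+\\w./]+',
  whitespace  = '\\s+'
)

toy_tokens <- toy_lex('1. e4 {Best by test.} 1... e5', toy_patterns)
toy_tokens <- toy_tokens[names(toy_tokens) != 'whitespace']
toy_tokens
```

Pattern order matters here: `resumption` has to come before `move_number`, otherwise "1..." would be split into "1." followed by stray dots.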
Final Game Record (after some manual tidying)
| Tag | Value |
|---|---|
| Event | Vilnius All-Russian Masters |
| Site | Vilna (Vilnius) RUE |
| Date | 1912.08.23 |
| EventDate | 1912.08.19 |
| Round | 5 |
| Result | 0-1 |
| White | Alexander Alekhine |
| Black | Akiba Rubinstein |
| ECO | C83 |
| WhiteElo | ? |
| BlackElo | ? |
| PlyCount | 54 |

| N | White | Black | Comment |
|---|---|---|---|
| 1. | e4 | e5 | Notes by Dr. Savielly Tartakower. |
| 2. | Nf3 | Nc6 | |
| 3. | Bb5 | a6 | |
| 4. | Ba4 | Nf6 | |
| 5. | O-O | Nxe4 | |
| 6. | d4 | b5 | |
| 7. | Bb3 | d5 | |
| 8. | dxe5 | Be6 | |
| 9. | c3 | Be7 | |
| 10. | Nbd2 | Nc5 | |
| 11. | Bc2 | Bg4 | |
| 12. | h3 | Bh5 | The most reasonable course here is 12.Re1, guarding the e-pawn. |
| 13. | Qe1 | Ne6 | Here again 13. Re1 ensured a very good game for White. |
| 14. | Nh2 | Bg6 | |
| 15. | Bxg6 | fxg6 | ! Far seeing strategy! Black recognizes that the f-file and not the e-file will be needed as a base for action. |
| 16. | Nb3 | g5 | Or 16.f4 d4!. |
| 17. | Be3 | O-O | |
| 18. | Nf3 | Qd7 | |
| 19. | Qd2 | Rxf3 | White pays insufficient attention to the scope of his opponent’s threats. A better course is 19.Nfd4 (19...Nxe5 20.Bxg5) seeking to establish equality. |
| 20. | gxf3 | Nxe5 | |
| 21. | Qe2 | Rf8 | |
| 22. | Nd2 | Ng6 | |
| 23. | Rfe1 | Bd6 | |
| 24. | f4 | Nexf4 | |
| 25. | Qf1 | Nxh3+ | |
| 26. | Kh1 | g4 | |
| 27. | Qe2 | Qf5 | |
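The manual tidying step is not shown in this post, so the following is only one possible base R sketch of how labelled tokens could be reshaped into a move table like the one above. `tokens_to_table()`, `demo_tokens` and `demo_moves` are hypothetical names, and the sketch assumes that moves alternate strictly White then Black within each move number and that a comment belongs to the move number it follows.

```r
tokens_to_table <- function(tokens) {
  rows <- list()
  n <- 0
  for (i in seq_along(tokens)) {
    type  <- names(tokens)[i]
    value <- tokens[[i]]
    if (type == 'move_number') {
      # Start a new row for each numbered move
      n <- n + 1
      rows[[n]] <- c(N = value, White = '', Black = '', Comment = '')
    } else if (n == 0) {
      # Ignore anything that appears before the first move number
      next
    } else if (type == 'move') {
      # First move after the number is White's, the second is Black's
      side <- if (rows[[n]][['White']] == '') 'White' else 'Black'
      rows[[n]][[side]] <- value
    } else if (type == 'comment') {
      # Strip the surrounding braces; a later comment on the same
      # move number overwrites an earlier one in this sketch
      rows[[n]][['Comment']] <- gsub('^\\{|\\}$', '', value, perl = TRUE)
    }
    # Other token types (resumption, nag, end_of_game) are skipped
  }
  as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
}

# Hypothetical tokens in the same shape as the lexer output
demo_tokens <- c(
  move_number = '1.', move = 'e4', comment = '{A comment.}',
  resumption = '1...', move = 'e5',
  move_number = '2.', move = 'Nf3', move = 'Nc6',
  end_of_game = '0-1'
)
demo_moves <- tokens_to_table(demo_tokens)
demo_moves
```

A game that ends on a White move simply leaves the last Black cell empty.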