The Queen’s Gambit - Post 3
Inspired by The Queen’s Gambit on Netflix, I’m doing a few posts on Chess in R.
This screenshot from the show explains everything:
Chess game format: PGN
The PGN (Portable Game Notation) file format is a human-readable representation of a chess game.
In its most basic form, it consists of
- a sequence of tags (key/value pairs holding metadata such as the event, date and players), each surrounded by `[]`
- a sequence of move numbers and moves, i.e.
  - a number indicating which move this is within the game
  - the moves for the white and black players, written in Standard Algebraic Notation (SAN)
  - comments, which can be interspersed between the moves and are surrounded by `{}`
  - a result marker such as `1-0`, `0-1` or `1/2-1/2` at the end of the game
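To give a feel for what SAN moves look like, here is a simplified regular expression for their *shape*. This is a sketch, not the full SAN grammar: `san_regex` and the `is_san()` helper are hypothetical names, the check does not test whether a move is legal, and it ignores annotation suffixes such as `!` or `?`.

```r
# Hypothetical helper: a simplified check of the shape of a SAN move.
# Optional piece letter, optional disambiguating file/rank, optional capture 'x',
# destination square, optional promotion; or castling. Then an optional check/mate sign.
san_regex <- "^([NBRQK]?[a-h]?[1-8]?x?[a-h][1-8](=[NBRQ])?|O-O(-O)?)[+#]?$"

is_san <- function(moves) grepl(san_regex, moves, perl = TRUE)

is_san(c("e4", "Nxe4", "O-O", "Nbd2", "Nxh3+", "e8=Q"))  # all TRUE
is_san(c("1.", "0-1", "$6", "e9"))                         # all FALSE
```

A pattern like this is stricter than the catch-all `move` pattern used in the lexer below, which simply accepts any run of letters, digits and a few punctuation characters.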
An example PGN file is shown below:
```r
# A raw string literal (R >= 4.0.0), so the quotes inside the PGN need no escaping
alekhine_pgn <- r'{[Event "Vilnius All-Russian Masters"]
[Site "Vilna (Vilnius) RUE"]
[Date "1912.08.23"]
[EventDate "1912.08.19"]
[Round "5"]
[Result "0-1"]
[White "Alexander Alekhine"]
[Black "Akiba Rubinstein"]
[ECO "C83"]
[WhiteElo "?"]
[BlackElo "?"]
[PlyCount "54"]
1. e4 {Notes by Dr. Savielly Tartakower.} 1... e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4
Nf6 5. O-O Nxe4 6. d4 b5 7. Bb3 d5 8. dxe5 Be6 9. c3 Be7 10. Nbd2 Nc5 11. Bc2
Bg4 12. h3 {The most reasonable course here is 12.Re1, guarding the e-pawn.}
12... Bh5 13. Qe1 $6 {Here again 13. Re1 ensured a very good game for White.}
13... Ne6 14. Nh2 $6 Bg6 $1 15. Bxg6 fxg6 {! Far seeing strategy! Black
recognizes that the f-file and not the e-file will be needed as a base for
action.} 16. Nb3 {Or 16.f4 d4!.} 16... g5 $1 17. Be3 O-O 18. Nf3 Qd7 19. Qd2
{White pays insufficient attention to the scope of his opponent's threats. A
better course is 19.Nfd4 (19...Nxe5 20.Bxg5) seeking to establish equality.}
19... Rxf3 $1 20. gxf3 Nxe5 21. Qe2 Rf8 22. Nd2 Ng6 23. Rfe1 Bd6 24. f4 Nexf4
25. Qf1 Nxh3+ 26. Kh1 g4 27. Qe2 Qf5 0-1}'
```
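The example contains a few numeric annotation glyphs (NAGs), such as `$6` after `13. Qe1` and `$1` after Black's `14... Bg6`. The PGN standard maps `$1` to `$6` onto the familiar move-assessment symbols, so a small lookup table can translate them. A minimal sketch; `nag_symbols` and `nag_to_symbol()` are hypothetical names, and any glyph not in the table is passed through unchanged:

```r
# Standard meanings of the first six NAGs in the PGN specification
nag_symbols <- c("$1" = "!",   # good move
                 "$2" = "?",   # poor move
                 "$3" = "!!",  # very good move
                 "$4" = "??",  # very poor move
                 "$5" = "!?",  # speculative move
                 "$6" = "?!")  # questionable move

# Hypothetical helper: convert NAG tokens to symbols, leaving unknown glyphs as-is
nag_to_symbol <- function(nag) {
  sym <- unname(nag_symbols[nag])
  ifelse(is.na(sym), nag, sym)
}

nag_to_symbol(c("$6", "$1", "$14"))  # "?!" "!" "$14"
```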
Use `lex()` to turn the text into tokens
- Start by defining the regular expression patterns for each element in the PGN file.
- Use `minilexer::lex()` to turn the PGN text into tokens.
- Throw away whitespace, newlines and tags, since I’m not interested in them (the tags are set aside in `tags` first).
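To make the idea behind a regex lexer a little more concrete, here is a minimal base-R sketch. It is a hypothetical `simple_lex()` helper, not minilexer's actual implementation: it wraps each named pattern in a capture group, joins them into one alternation, lets `gregexpr(perl = TRUE)` find successive matches, and labels each token by whichever group matched. Because alternatives are tried in order, a catch-all like `move` has to come after the more specific patterns.

```r
# Hypothetical sketch of a regex lexer (NOT minilexer's implementation).
# Assumes the patterns contain no capture groups of their own.
simple_lex <- function(text, patterns) {
  # One capture group per pattern; earlier alternatives win at the same position
  regex   <- paste0("(", patterns, ")", collapse = "|")
  matches <- gregexpr(regex, text, perl = TRUE)
  tokens  <- regmatches(text, matches)[[1]]
  if (length(tokens) == 0) return(character(0))
  # One row per match, one column per pattern; only the matching group has length > 0
  lens <- attr(matches[[1]], "capture.length")
  names(tokens) <- names(patterns)[apply(lens > 0, 1, which.max)]
  tokens
}

toy_patterns <- c(
  comment     = "\\{.*?\\}",
  resumption  = "\\d+\\.\\.\\.",
  move_number = "\\d+\\.",
  end_of_game = "1-0|0-1|1/2-1/2|\\*",
  nag         = "\\$\\d+",
  move        = "[-+\\w\\./]+",
  whitespace  = "\\s+"
)

toks <- simple_lex("1. e4 {Good.} 1... e5 $1 0-1", toy_patterns)
toks[names(toks) != "whitespace"]
```

Unlike a strict lexer, this sketch silently skips any characters that match none of the patterns.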
```r
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Use the mini-lexer to break text into labelled tokens
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# remotes::install_github('coolbutuseless/minilexer')
library(minilexer)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Define all the patterns to match as regular expressions.
# Order matters: 'move' is a catch-all, so it comes after the specific patterns.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pgn_patterns <- c(
  tag         = '\\[.*?\\]',            # Capture tags as a unit
  comment     = "\\{.*?\\}",            # Capture comments as a unit
  resumption  = "\\d+\\.\\.\\.",        # Resume moves after comment, e.g. "1..."
  move_number = "\\d+\\.",
  end_of_game = '1-0|0-1|1/2-1/2|\\*',  # Game results ('*' = unfinished game)
  nag         = '\\$\\d+',              # Numeric annotation glyph
  move        = '[-+\\w\\./]+',         # Anything else is a move
  newline     = '\n',
  whitespace  = '\\s+'
)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Define some different sets of tokens
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
chaff     <- c('whitespace', 'newline', 'tag')                 # Tokens to discard
non_moves <- c('comment', 'resumption', 'nag', 'end_of_game')  # Tokens that are not moves
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Parse a PGN file to tokens
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pgn_text <- alekhine_pgn
# Replace newlines with spaces so comments spanning several lines are matched as one unit
pgn_text <- gsub("\n", ' ', pgn_text)
tokens   <- minilexer::lex(pgn_text, pgn_patterns)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Tidy the tokens: keep the tags aside, then drop the chaff
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tags   <- tokens[names(tokens) == 'tag']
tokens <- tokens[!(names(tokens) %in% chaff)]
head(tokens, 20)
```
```
## move_number move
## "1." "e4"
## comment resumption
## "{Notes by Dr. Savielly Tartakower.}" "1..."
## move move_number
## "e5" "2."
## move move
## "Nf3" "Nc6"
## move_number move
## "3." "Bb5"
## move move_number
## "a6" "4."
## move move
## "Ba4" "Nf6"
## move_number move
## "5." "O-O"
## move move_number
## "Nxe4" "6."
## move move
## "d4" "b5"
```
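The manual tidying step itself isn't shown. As one possible approach (a hypothetical `tokens_to_moves()` helper, not the code used to produce the tables below), the tidied tokens can be grouped by move: every `move_number` token starts a new row, the first two `move` tokens after it are White's and Black's moves, and any comments are collected alongside. Resumption markers, NAGs and the result are simply ignored.

```r
# Hypothetical helper: group named tokens (whitespace, newlines and tags already
# removed) into one row per move number.
tokens_to_moves <- function(tokens) {
  type    <- names(tokens)
  move_id <- cumsum(type == "move_number")  # which move each token belongs to
  n_moves <- max(move_id)

  # Split the tokens of one type by move, keeping moves that have none of them
  by_move <- function(kind) {
    keep <- type == kind & move_id > 0
    split(unname(tokens[keep]), factor(move_id[keep], levels = seq_len(n_moves)))
  }
  # Pick the i-th element of each group, or NA if the group is too short
  nth <- function(groups, i) {
    vapply(groups, function(g) if (length(g) >= i) g[[i]] else NA_character_,
           character(1), USE.NAMES = FALSE)
  }

  moves    <- by_move("move")
  comments <- by_move("comment")
  data.frame(
    N       = unname(tokens[type == "move_number"]),
    White   = nth(moves, 1),
    Black   = nth(moves, 2),
    # Strip the surrounding braces and join multiple comments for the same move
    Comment = vapply(comments,
                     function(cm) paste(gsub("^\\{|\\}$", "", cm, perl = TRUE), collapse = " "),
                     character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

example_tokens <- c(move_number = "1.", move = "e4",
                    comment = "{Notes by Dr. Savielly Tartakower.}",
                    resumption = "1...", move = "e5",
                    move_number = "2.", move = "Nf3", move = "Nc6")
tokens_to_moves(example_tokens)
```

Applied to the full `tokens` vector from above, the same grouping should give one row per move, with comments attached to the move in which they appear.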
Final Game Record (after some manual tidying)
Tag | Value |
---|---|
Event | Vilnius All-Russian Masters |
Site | Vilna (Vilnius) RUE |
Date | 1912.08.23 |
EventDate | 1912.08.19 |
Round | 5 |
Result | 0-1 |
White | Alexander Alekhine |
Black | Akiba Rubinstein |
ECO | C83 |
WhiteElo | ? |
BlackElo | ? |
PlyCount | 54 |
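The tags set aside in `tags` can be turned into a Tag/Value table like the one above with a single regular expression. A minimal sketch, assuming every tag has the simple `[Name "Value"]` form used in the example (escaped quotes inside a value are not handled); `tags_to_table()` is a hypothetical helper:

```r
# Hypothetical helper: split '[Name "Value"]' tag tokens into a two-column data frame
tags_to_table <- function(tags) {
  tag_regex <- '^\\[(\\w+)\\s+"(.*)"\\]$'
  tags <- unname(tags)  # drop the 'tag' labels attached by the lexer
  data.frame(
    Tag   = sub(tag_regex, "\\1", tags, perl = TRUE),
    Value = sub(tag_regex, "\\2", tags, perl = TRUE),
    stringsAsFactors = FALSE
  )
}

example_tags <- c(tag = '[Event "Vilnius All-Russian Masters"]',
                  tag = '[Round "5"]',
                  tag = '[WhiteElo "?"]')
tags_to_table(example_tags)
```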
N | White | Black | Comment |
---|---|---|---|
1. | e4 | e5 | Notes by Dr. Savielly Tartakower. |
2. | Nf3 | Nc6 | |
3. | Bb5 | a6 | |
4. | Ba4 | Nf6 | |
5. | O-O | Nxe4 | |
6. | d4 | b5 | |
7. | Bb3 | d5 | |
8. | dxe5 | Be6 | |
9. | c3 | Be7 | |
10. | Nbd2 | Nc5 | |
11. | Bc2 | Bg4 | |
12. | h3 | Bh5 | The most reasonable course here is 12.Re1, guarding the e-pawn. |
13. | Qe1 | Ne6 | Here again 13. Re1 ensured a very good game for White. |
14. | Nh2 | Bg6 | |
15. | Bxg6 | fxg6 | ! Far seeing strategy! Black recognizes that the f-file and not the e-file will be needed as a base for action. |
16. | Nb3 | g5 | Or 16.f4 d4!. |
17. | Be3 | O-O | |
18. | Nf3 | Qd7 | |
19. | Qd2 | Rxf3 | White pays insufficient attention to the scope of his opponent’s threats. A better course is 19.Nfd4 (19…Nxe5 20.Bxg5) seeking to establish equality. |
20. | gxf3 | Nxe5 | |
21. | Qe2 | Rf8 | |
22. | Nd2 | Ng6 | |
23. | Rfe1 | Bd6 | |
24. | f4 | Nexf4 | |
25. | Qf1 | Nxh3+ | |
26. | Kh1 | g4 | |
27. | Qe2 | Qf5 | |
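Going the other way, a move table like this can be written back out as PGN movetext. A minimal sketch, assuming a data frame with `N`, `White`, `Black` and `Comment` columns as in the table above; `moves_to_pgn()` is a hypothetical helper. Each comment is placed after Black's move, so the output will not always reproduce the original PGN exactly.

```r
# Hypothetical helper: format a move table as PGN movetext, ending with the result
moves_to_pgn <- function(moves, result = "*") {
  rows <- vapply(seq_len(nrow(moves)), function(i) {
    row <- c(moves$N[i], moves$White[i], moves$Black[i])
    row <- row[!is.na(row) & nzchar(row)]  # skip a missing Black move
    comment <- moves$Comment[i]
    if (!is.na(comment) && nzchar(comment)) row <- c(row, paste0("{", comment, "}"))
    paste(row, collapse = " ")
  }, character(1))
  paste(c(rows, result), collapse = " ")
}

game <- data.frame(N       = c("1.", "2."),
                   White   = c("e4", "Nf3"),
                   Black   = c("e5", "Nc6"),
                   Comment = c("Notes by Dr. Savielly Tartakower.", ""),
                   stringsAsFactors = FALSE)
moves_to_pgn(game, result = "0-1")
# "1. e4 e5 {Notes by Dr. Savielly Tartakower.} 2. Nf3 Nc6 0-1"
```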