Hello World - compilation details • c64asm

Write your c64 ASM code

The syntax is roughly modelled on that of TASS64, but only the fundamentals are supported:
- * refers to the current address, and can be read from or written to
- All programs should start with a *=.... line to set the starting address
- .byte values must be hexadecimal, and a single-byte value must always be written with 2 hex digits, e.g. $0e will work, but $e won’t.
- The same applies to immediate values, e.g. #$01 will work, but #$1 won’t.

library(dplyr)
library(c64asm)

asm <- '
*=$0801
  .byte $0c, $08, $0a, $00, $9e, $20
  .byte $32, $30, $38, $30, $00, $00
  .byte $00

*=$0820
      ldx #$00
loop  lda message,x
      and #$3f
      sta $0400,x
      inx
      cpx #$0c
      bne loop

      rts

message
    .text "Hello World!"
'
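
The three .byte lines at $0801 are not explained in this walkthrough. They follow the standard C64 BASIC line layout (a 2-byte pointer to the next line, a 2-byte line number, the tokenised statement, and a zero terminator), so they can be read as a one-line BASIC program that jumps to the machine code. The following is a minimal base R sketch of that decoding; the variable names are illustrative and not part of c64asm.

```r
# The 13 bytes placed at $0801 by the .byte directives above
stub <- c(0x0c, 0x08, 0x0a, 0x00, 0x9e, 0x20,
          0x32, 0x30, 0x38, 0x30, 0x00, 0x00,
          0x00)

# Bytes 1-2: address of the next BASIC line (low byte first)
next_line <- stub[1] + 256 * stub[2]          # $080c = 2060

# Bytes 3-4: BASIC line number (low byte first)
line_no <- as.integer(stub[3] + 256 * stub[4])

# Byte 5 is $9e, the BASIC token for SYS; byte 6 is a space.
# Bytes 7-10 are the digits of the SYS target as character codes.
sys_target <- rawToChar(as.raw(stub[7:10]))

basic_line <- sprintf("%d SYS %s", line_no, sys_target)
basic_line
#> [1] "10 SYS 2080"
```

2080 is $0820 in hexadecimal, which is the address set by the second *= line, so running the loaded program transfers control to the ldx instruction.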

c64asm defines a set of tokens in the ASM file

c64asm::asm_patterns defines the set of patterns for lexing/tokenising an ASM file
Things to note:
- upper or lower-case opcodes are allowed, but not mixed-case
- No decimal representation is supported; everything should be in hexadecimal
- A bare * is interpreted as the program counter (i.e. current address)

Custom R features
- .rbyte directive. Similar to the standard .byte, but the bytes come from the next token in the stream, which must be the name of a variable containing integers in the range 0-255.
- .rtext directive. Similar to the standard .text, but the text comes from the next token in the stream, which must be the name of a variable containing a string.
- {...} represents text to be evaluated. Used for symbol arithmetic. e.g. lda {message},1

#-----------------------------------------------------------------------------
# Regex patterns for parsing 6502 assembly
#-----------------------------------------------------------------------------
asm_patterns <- c(
  newline    = '\n',
  whitespace = '\\s+',
  PC         = '\\*',
  immediate  = '#\\$[0-9a-fA-F]{1,2}',
  word       = '\\$[0-9a-fA-F]{3,4}',
  byte       = '\\$[0-9a-fA-F]{1,2}',
  opcode     = "\\b(ADC|AHX|ALR|ANC|AND|ARR|ASL|AXS|BCC|BCS|BEQ|BIT|BMI|BNE|BPL|BRK|BVC|BVS|CLC|CLD|CLI|CLV|CMP|CPX|CPY|DCP|DEC|DEX|DEY|EOR|INC|INX|INY|ISC|JMP|JSR|LAS|LAX|LAX|LDA|LDX|LDY|LSR|NOP|ORA|PHA|PHP|PLA|PLP|RLA|ROL|ROR|RRA|RTI|RTS|SAX|SBC|SEC|SED|SEI|SHX|SHY|SLO|SRE|STA|STX|STY|TAS|TAX|TAY|TSX|TXA|TXS|TYA|XAA|adc|ahx|alr|anc|and|arr|asl|axs|bcc|bcs|beq|bit|bmi|bne|bpl|brk|bvc|bvs|clc|cld|cli|clv|cmp|cpx|cpy|dcp|dec|dex|dey|eor|inc|inx|iny|isc|jmp|jsr|las|lax|lax|lda|ldx|ldy|lsr|nop|ora|pha|php|pla|plp|rla|rol|ror|rra|rti|rts|sax|sbc|sec|sed|sei|shx|shy|slo|sre|sta|stx|sty|tas|tax|tay|tsx|txa|txs|tya|xaa)\\b",
  byte_inst  = '\\.byte',
  text_inst  = '\\.text',
  rtext_inst = '\\.rtext',
  rbyte_inst = '\\.rbyte',
  lbracket   = '\\(',
  rbracket   = '\\)',
  text       = '".*?"',
  comma      = ",",
  colon      = ":",
  equals     = '=',
  comment    = '(;[^\n]*)',
  x          = '(x|X)',
  y          = '(y|Y)',
  symbol     = '#?<?>?\\{.*?\\}',  # a symbol with evaluation
  symbol     = '[^\\s:,)]+'
)
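
The lexer that consumes these patterns is not shown here. As a rough illustration of how a named vector of regexes can drive tokenising, the following base R sketch tries each pattern in order at the start of the remaining text and keeps the first match. The function `lex_line()` and the reduced pattern set `demo_patterns` are hypothetical and are not c64asm's implementation.

```r
# A reduced, illustrative subset of the patterns above
demo_patterns <- c(
  whitespace = '\\s+',
  PC         = '\\*',
  immediate  = '#\\$[0-9a-fA-F]{1,2}',
  word       = '\\$[0-9a-fA-F]{3,4}',
  byte       = '\\$[0-9a-fA-F]{1,2}',
  opcode     = '\\b(lda|ldx|sta|and|inx|cpx|bne|rts)\\b',
  comma      = ',',
  equals     = '=',
  x          = '(x|X)',
  symbol     = '[^\\s:,)]+'
)

# Hypothetical helper: first-match-wins lexing of a single line
lex_line <- function(line, patterns) {
  tokens <- character(0)
  while (nchar(line) > 0) {
    matched <- FALSE
    for (name in names(patterns)) {
      # Anchor each pattern to the start of the remaining text
      rx <- paste0('^(?:', patterns[[name]], ')')
      m  <- regmatches(line, regexpr(rx, line, perl = TRUE))
      if (length(m) == 1 && nchar(m) > 0) {
        tokens  <- c(tokens, stats::setNames(m, name))
        line    <- substring(line, nchar(m) + 1)
        matched <- TRUE
        break
      }
    }
    if (!matched) stop("Cannot tokenise: ", line)
  }
  # Drop the tokens that carry no meaning for later passes
  tokens[!names(tokens) %in% c('whitespace', 'comma')]
}

lex_line("sta $0400,x", demo_patterns)
#>  opcode    word       x 
#>   "sta" "$0400"     "x" 
```

Pattern order matters in this sketch: word is tried before byte so that $0400 is not split into $04 and 00, and symbol comes last as a catch-all. One limitation of this simplified approach is that a symbol beginning with x or y would be split, since the register patterns are tried before symbol.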

Each line is split into tokens

- Each line is split into tokens
- Any comma, whitespace or comment tokens are discarded
- Any blank lines are discarded

line_tokens <- c64asm::create_line_tokens(asm)
line_tokens
#> [[1]]
#>      PC  equals    word 
#>     "*"     "=" "$0801" 
#> 
#> [[2]]
#> byte_inst      byte      byte      byte      byte      byte      byte 
#>   ".byte"     "$0c"     "$08"     "$0a"     "$00"     "$9e"     "$20" 
#> 
#> [[3]]
#> byte_inst      byte      byte      byte      byte      byte      byte 
#>   ".byte"     "$32"     "$30"     "$38"     "$30"     "$00"     "$00" 
#> 
#> [[4]]
#> byte_inst      byte 
#>   ".byte"     "$00" 
#> 
#> [[5]]
#>      PC  equals    word 
#>     "*"     "=" "$0820" 
#> 
#> [[6]]
#>    opcode immediate 
#>     "ldx"    "#$00" 
#> 
#> [[7]]
#>    symbol    opcode    symbol         x 
#>    "loop"     "lda" "message"       "x" 
#> 
#> [[8]]
#>    opcode immediate 
#>     "and"    "#$3f" 
#> 
#> [[9]]
#>  opcode    word       x 
#>   "sta" "$0400"     "x" 
#> 
#> [[10]]
#> opcode 
#>  "inx" 
#> 
#> [[11]]
#>    opcode immediate 
#>     "cpx"    "#$0c" 
#> 
#> [[12]]
#> opcode symbol 
#>  "bne" "loop" 
#> 
#> [[13]]
#> opcode 
#>  "rts" 
#> 
#> [[14]]
#>    symbol 
#> "message" 
#> 
#> [[15]]
#>          text_inst               text 
#>            ".text" "\"Hello World!\""

The main `prg_df` data structure is created from `line_tokens`

All the passes needed to turn the ASM into the actual bytes of a PRG file are operations on this prg_df data.frame.
Its columns keep track of values, cross-references to symbols, and different representations of the same value (e.g. opbyte in decimal and ophex in hexadecimal).

prg_df <- c64asm::create_prg_df(line_tokens)

init_addr	label	line	opmode	opbyte	ophex	symbol_op	symbol_expr	nbytes
2049	NA	* = $0801	NA	NA	NA	NA	NA	0
NA	NA	.byte $0c $08 $0a $00 $9e $20	NA	NA	NA	NA	NA	6
NA	NA	.byte $32 $30 $38 $30 $00 $00	NA	NA	NA	NA	NA	6
NA	NA	.byte $00	NA	NA	NA	NA	NA	1
2080	NA	* = $0820	NA	NA	NA	NA	NA	0
NA	NA	ldx #$00	immediate	162	a2	NA	NA	2
NA	loop	loop lda message x	absolute x	189	bd	absolute x	message	3
NA	NA	and #$3f	immediate	41	29	NA	NA	2
NA	NA	sta $0400 x	absolute x	157	9d	NA	NA	3
NA	NA	inx	implied	232	e8	NA	NA	1
NA	NA	cpx #$0c	immediate	224	e0	NA	NA	2
NA	NA	bne loop	relative	208	d0	relative	loop	2
NA	NA	rts	implied	96	60	NA	NA	1
NA	message	message	NA	NA	NA	NA	NA	0
NA	NA	.text "Hello World!"	NA	NA	NA	NA	NA	12

Calculate addresses and replace any symbols with these values

Symbols (such as message) get their values in one of two ways:
- from their address, i.e. position in the bytestream relative to the start of the file
- from an explicit assignment, e.g. storage = $3000
The address may only be specified explicitly at the very start of the file (e.g. using * = $0820), so byte counting is used to work out the address of every subsequent instruction.
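
The byte counting can be illustrated with a small base R sketch. It uses the init_addr and nbytes columns from the table above; the loop is an assumption about how such a pass could work, not the code inside c64asm.

```r
# init_addr and nbytes columns copied from the prg_df table above
init_addr <- c(2049, NA, NA, NA, 2080, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
nbytes    <- c(0, 6, 6, 1, 0, 2, 3, 2, 3, 1, 2, 2, 1, 0, 12)

addr    <- numeric(length(nbytes))
current <- NA_real_

for (i in seq_along(nbytes)) {
  # An explicit `* = ...` line resets the running address
  if (!is.na(init_addr[i])) current <- init_addr[i]
  addr[i] <- current
  # Every row advances the running address by the bytes it emits
  current <- current + nbytes[i]
}

addr
# 2049 2049 2055 2061 2080 2080 2082 2085 2087 2090 2091 2093 2095 2096 2096
```

The label rows for loop and message end up at 2082 and 2096, which are the values the symbols take in the next step.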

prg_df <- c64asm::process_symbols(prg_df)

addr	label	line	opmode	opbyte	ophex	symbol_expr	nbytes	symbol_value	symbol_bytes	bytes
2049	NA	* = $0801	NA	NA	NA	NA	0	NA	NA, NA
2049	NA	.byte $0c $08 $0a $00 $9e $20	NA	NA	NA	NA	6	NA	NA, NA	12, 8, 10, 0, 158, 32
2055	NA	.byte $32 $30 $38 $30 $00 $00	NA	NA	NA	NA	6	NA	NA, NA	50, 48, 56, 48, 0, 0
2061	NA	.byte $00	NA	NA	NA	NA	1	NA	NA, NA	0
2080	NA	* = $0820	NA	NA	NA	NA	0	NA	NA, NA
2080	NA	ldx #$00	immediate	162	a2	NA	2	NA	NA, NA	162, 0
2082	loop	loop lda message x	absolute x	189	bd	message	3	2096	48, 8	189, 48, 8
2085	NA	and #$3f	immediate	41	29	NA	2	NA	NA, NA	41, 63
2087	NA	sta $0400 x	absolute x	157	9d	NA	3	NA	NA, NA	157, 0, 4
2090	NA	inx	implied	232	e8	NA	1	NA	NA, NA	232
2091	NA	cpx #$0c	immediate	224	e0	NA	2	NA	NA, NA	224, 12
2093	NA	bne loop	relative	208	d0	loop	2	2082	34, 8	208, 243
2095	NA	rts	implied	96	60	NA	1	NA	NA, NA	96
2096	message	message	NA	NA	NA	NA	0	NA	NA, NA
2096	NA	.text "Hello World!"	NA	NA	NA	NA	12	NA	NA, NA	200, 69, 76, 76, 79, 32, 215, 79, 82, 76, 68, 33
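
Two rows in this table show how a symbol value becomes operand bytes. For lda message,x the 16-bit address is stored low byte first; for bne loop the operand is a single signed offset counted from the address after the 2-byte branch instruction. The helpers below are a hypothetical base R sketch that reproduces the numbers in the table; they are not c64asm functions.

```r
# Hypothetical helper: split a 16-bit address into (low byte, high byte)
word_to_bytes <- function(addr) c(addr %% 256, addr %/% 256)

# Hypothetical helper: operand byte for a relative branch.
# The offset is measured from the address following the 2-byte branch,
# and negative offsets wrap around (two's complement) via %% 256.
# No check is made that the offset fits in the range -128..127.
branch_offset <- function(branch_addr, target_addr) {
  (target_addr - (branch_addr + 2)) %% 256
}

word_to_bytes(2096)        # message: 48, 8   -> bytes 189, 48, 8
word_to_bytes(2082)        # loop:    34, 8
branch_offset(2093, 2082)  # bne loop: 243    -> bytes 208, 243
```

The branch at 2093 jumps back 13 bytes from 2095 to reach 2082, and -13 wraps to 243 ($f3).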

Add zero padding so there are no gaps

PRG files must represent a contiguous sequence of bytes - no gaps allowed!
Find and insert sequences of zeros to ensure all gaps are filled.
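
A gap exists wherever one row ends before the next row's address begins. The following base R sketch finds such gaps using the first few addr and nbytes values from the table above; it is an illustration of the idea under that assumption, not the implementation of process_zero_padding().

```r
# First six rows of the addr and nbytes columns from the table above
addr   <- c(2049, 2049, 2055, 2061, 2080, 2080)
nbytes <- c(0, 6, 6, 1, 0, 2)

# Address immediately after the last byte emitted by each row
end_addr <- addr + nbytes

# Gap between the end of each row and the start of the next row
# (the final row has no successor, so its gap is set to 0)
gap <- c(addr[-1] - end_addr[-length(end_addr)], 0)

# Rows of zero padding that would need to be inserted
padding <- data.frame(
  addr   = end_addr[gap > 0],
  nbytes = gap[gap > 0]
)
padding
```

Here the only gap starts at 2062 and is 18 bytes long, which corresponds to the (zero padding) row in the table below.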

prg_df <- c64asm::process_zero_padding(prg_df)

addr	label	line	opmode	opbyte	ophex	symbol_expr	nbytes	symbol_value	symbol_bytes	bytes
2049	NA	* = $0801	NA	NA	NA	NA	0	NA	NA, NA
2049	NA	.byte $0c $08 $0a $00 $9e $20	NA	NA	NA	NA	6	NA	NA, NA	12, 8, 10, 0, 158, 32
2055	NA	.byte $32 $30 $38 $30 $00 $00	NA	NA	NA	NA	6	NA	NA, NA	50, 48, 56, 48, 0, 0
2061	NA	.byte $00	NA	NA	NA	NA	1	NA	NA, NA	0
2062	NA	(zero padding)	NA	NA	NA	NA	18	NA	NULL	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
2080	NA	* = $0820	NA	NA	NA	NA	0	NA	NA, NA
2080	NA	ldx #$00	immediate	162	a2	NA	2	NA	NA, NA	162, 0
2082	loop	loop lda message x	absolute x	189	bd	message	3	2096	48, 8	189, 48, 8
2085	NA	and #$3f	immediate	41	29	NA	2	NA	NA, NA	41, 63
2087	NA	sta $0400 x	absolute x	157	9d	NA	3	NA	NA, NA	157, 0, 4
2090	NA	inx	implied	232	e8	NA	1	NA	NA, NA	232
2091	NA	cpx #$0c	immediate	224	e0	NA	2	NA	NA, NA	224, 12
2093	NA	bne loop	relative	208	d0	loop	2	2082	34, 8	208, 243
2095	NA	rts	implied	96	60	NA	1	NA	NA, NA	96
2096	message	message	NA	NA	NA	NA	0	NA	NA, NA
2096	NA	.text "Hello World!"	NA	NA	NA	NA	12	NA	NA, NA	200, 69, 76, 76, 79, 32, 215, 79, 82, 76, 68, 33

Extract the bytes from `prg_df`

prg_df$bytes is a list column holding the bytes contributed by each row; taken together they make up the body of the PRG file.
Use purrr::flatten() to collapse it into a single flat list (rows with NULL contribute nothing), then convert the result to integer and finally to raw.

prg_df$bytes %>%
  purrr::flatten() %>%
  as.integer() %>% 
  as.raw()
#>  [1] 0c 08 0a 00 9e 20 32 30 38 30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
#> [26] 00 00 00 00 00 00 a2 00 bd 30 08 29 3f 9d 00 04 e8 e0 0c d0 f3 60 c8 45 4c
#> [51] 4c 4f 20 d7 4f 52 4c 44 21

Add the loading address to get the complete PRG

PRG files are prefixed by their first address so they can be loaded in the right location.
Take the first address in prg_df and convert it to 2 bytes in low-byte/high-byte format using c64asm::w2b()

# The following is equivalent to:  c64asm::extract_prg_bytes(prg_df)
as.raw(c(w2b(prg_df$addr[1]), as.integer(purrr::flatten(prg_df$bytes))))
#>  [1] 01 08 0c 08 0a 00 9e 20 32 30 38 30 00 00 00 00 00 00 00 00 00 00 00 00 00
#> [26] 00 00 00 00 00 00 00 00 a2 00 bd 30 08 29 3f 9d 00 04 e8 e0 0c d0 f3 60 c8
#> [51] 45 4c 4c 4f 20 d7 4f 52 4c 44 21

Save the PRG

Just write the bytes, with the loading address prefixed, directly to file with writeBin() - no other processing is needed.
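
As a minimal sketch of this step in base R: the raw vector below contains only the first eight bytes of the output shown above as a stand-in for the full PRG, and the file name is an arbitrary choice for illustration.

```r
# Stand-in for the full PRG: load address ($01 $08) plus the first
# six program bytes from the output above
prg_bytes <- as.raw(c(0x01, 0x08, 0x0c, 0x08, 0x0a, 0x00, 0x9e, 0x20))

# Illustrative output location; any writable path would do
prg_file <- file.path(tempdir(), "helloworld.prg")

# A raw vector is written byte for byte, with no header or conversion
writeBin(prg_bytes, prg_file)

# Read the file back to confirm that it holds exactly the same bytes
roundtrip <- readBin(prg_file, what = "raw", n = file.size(prg_file))
identical(roundtrip, prg_bytes)
#> [1] TRUE
```

Because writeBin() writes a raw vector unchanged, the file on disk is exactly as long as the vector.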

mikefc@coolbutuseless.com

2023-09-29
