Toy WASM interpreter in base R

WASM

One of the big buzzes at #RStudioConf2022 was the whiff of serverless “Shiny” in both python and R by harnessing WASM.

WASM (a.k.a. WebAssembly) is a portable binary-code format initially targetted at running code in a web browser at near-native speeds.

This post is a small exploration into what a WASM binary actually looks like and how it is structured.

I also create a toy interpreter for WASM in R. Note: You would never do this in real-life - it’s just for a bit of fun (maybe because I had so much fun with the virtual machine interpreter for AnotherWorld)

C function to calculate factorial(n)

This is a classic bit of C code for calculating n! i.e. n * (n-1) * (n-2) * ... * 1

int factorial(int n) {
  if (n == 0)
    return 1;
  else
    return n * factorial(n-1);
}

factorial(n) in WAT (human readable WASM text)

There is a human-readable version of WASM called “WAT” which has a structure similar to S-expressions.

The above C code would be compiled to WASM/WAT as follows:

(func (param i64) (result i64)
  local.get 0             # put arg[0] on stack
  i64.eqz                 # compare top of stack to zero
  if (result i64)         # if it is zero
      i64.const 1         # put 1 on stack
  else       
      local.get 0         # put arg[0] on stack
      local.get 0         # put arg[0] on stack
      i64.const 1         # put 1 on stack
      i64.sub             # subtract the top 2 values in stack
      call 0              # Call function #0 (return value is on the stack)
      i64.mul             # Multiply 
  end)'

Compile WAT to WASM

The WebAssembly Binary Toolkit (WABT) can be used to compile the WAT file into a binary WASM file

brew install wabt
wat2wasm helloworld.wat   # Creates 'helloworld.wasm'
# helloworld.wasm
00 61 73 6d 01 00 00 00 01 06 01 60 01 7e 01 7e 03 02 01 00 0a 17 01 15 00 20 00 50 04 7e 42 01 05 20 00 20 00 42 01 7d 10 00 7e 0b 0b

WASM raw bytes

With some googling and using the reference list of WASM bytecodes it is possible to deconstruct and understand the bytes in the *.wasm file for the factorial(n) function

  0x00, 0x61, 0x73, 0x6D, # magick = \0asm
  0x01, 0x00, 0x00, 0x00, # version = 1
  
  # The 'type' section contains function signatures.
  0x01, 0x06 ,     # section "type". 6 bytes
  0x01,            # Num types = 1 i.e. just a single 'function' type defined here
  0x60,            # Function (begin definition)
  0x01,            # Number of parameters = 1
  0x7e,            #  param types: i64
  0x01,            # Number of results = 1
  0x7e,            #  result types: i64
  
  # The 'function' section stores indexes of the function signature
  0x03, 0x02,      # section "function". section size = 2 bytes
  0x01,            # Number of functions
  0x00,            # Index of function
  
  # The 'code' section contains the actual function code
  0x0A, 0x17,      # Section "code".  section size = 0x17 = 23 bytes
  0x01,            # Number of functions = 1 
  0x15,            # Function body size = 0x15 = 21 bytes
  0x00,            # Number of local declarations = 0
  0x20, 0x00,      #   local.get 0
  0x50,            #   i64.eqz
  0x04, 0x7E,      #   if i64
  0x42, 0x01,      #     i64.const 1
  0x05,            #   else
  0x20, 0x00,      #     local.get 0
  0x20, 0x00,      #     local get 0
  0x42, 0x01,      #     i64.const 1
  0x7D,            #     i64.sub
  0x10, 0x00,      #     call 0
  0x7E,            #     i64.mul
  0x0B,            #   end
  0x0B             # end

WASM virtual stack machine

WASM bytecode is intended to be run on a virtual stack machine.

A simple way of implementing a stack machine in R is to have a giant switch statement and a stack implementation. The switch statement chooses what operations to do based upon the bytecode.

I’m cheating a little bit here and have hand-parsed the code split at the if-else statement into two blocks.

Other cheats:

  • you would normally need 64bit integers (i.e. i64 type in WASM) to interpret the factorial() code in WASM, but as long as you don’t give it a large number it will be fine.
  • integers in WASM are encoded as LEB128 variable length integers but since the code sizes are all small, every integer fits in 7-bits which means that no encoding/decoding is necessary.

Warnings

  • This unoptimized interpreter is going to be terribly slow.

Stack

Stack implementations don’t come much more naive than this.

The stack variable itself is just going to exist in the global environment.

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Naive stack
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stack <- c()

# Pop a value of the stack
stack_pop <- function() {
  res   <- stack[1]
  stack <<- stack[-1]
  res
}

# Push a value on the stack
stack_push <- function(val) {
  stack <<- c(val, stack)
}

Define the functions

The major cheat here is that i’ve pre-parsed the function into initial code in the function and the two code blocks that are executed depending on the if/else statement.

There may be multiple functions in general, so all_funcs is a list of all the functions in the code, and the correct function is extracted during a call opcode (i.e. 0x10)

all_funcs <- list(
  func <- list(
    args = c(),
    start = c(
      0x20, 0x00,      #   local.get 0
      0x50,            #   i64.eqz
      0x04, 0x7E       #   if i64
    ),
    if_branch = c(
      0x42, 0x01       #     i64.const 1
    ),
    else_branch = c(
      0x20, 0x00,      #     local.get 0
      0x20, 0x00,      #     local get 0
      0x42, 0x01,      #     i64.const 1
      0x7D,            #     i64.sub
      0x10, 0x00,      #     call 0
      0x7E             #     i64.mul
    )
  )
)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#" Recursively interpret the given function
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
interp_wasm <- function(func) {
  
  # Set the bytecode to be the starting code of the function
  byte_idx <- 1
  bytecode <- func$start
  
  # Helper function to read 'n' bytes from the bytecode
  read_bytes <- function(n) {
    res <- bytecode[byte_idx + seq(n) - 1]
    byte_idx <<- byte_idx + n
    res
  }
  
  # For each opcode, determine what to do within a giant `switch` statement
  while(byte_idx <= length(bytecode)) {
    opcode <- read_bytes(1)
    opcode <- sprintf("0x%02x", opcode)
    switch(
      opcode, 
      '0x04' = { # if
        type = read_bytes(1) # value type that if() operates on. ignored
        value = stack_pop()  # get the value. treat as boolean.
        if (value == 1) {
          # Jump the bytecode into the respective TRUE/FALSE branch
          bytecode <- func$if_branch
          byte_idx <- 1
        } else {
          bytecode <- func$else_branch
          byte_idx <- 1
        }
      },
      '0x10' = { # call
        # Select the function to call, setup the arguments and call it
        func_idx <- read_bytes(1)
        next_func <- all_funcs[[func_idx + 1]]
        arg <- stack_pop()
        next_func$args <- arg
        if (verbose) message("factorial(", arg, ")")
        interp_wasm(next_func)
      },
      '0x20' = { # local.get idx
        # Push a local variable onto the stack.
        # In this simple case all local vars are just the args to the function
        idx <- read_bytes(1)
        value <- func$args[idx + 1]
        stack_push(value)
      },
      '0x42' = { # i64 const
        # Push a constant onto the stack
        value <- read_bytes(1)
        stack_push(value)
      },
      '0x50' = { # i64.eqz
        # Is value at top of stack == 0?  No/Yes = 0/1
        value <- stack_pop()
        result <- value == 0
        stack_push(result)
      },
      '0x7d' = { # i64.sub
        # Subtract the two values at the top of the stack
        val2 <- stack_pop()
        val1 <- stack_pop()
        result <- val1 - val2
        stack_push(result)
      },
      '0x7e' = { # i64.mul
        # Multiply the 2 values at the top of the stack
        val2 <- stack_pop()
        val1 <- stack_pop()
        result <- val1 * val2
        stack_push(result)
      }
    )
    
    if (verbose) {
      message(opcode, " -> stack: ", deparse(stack))
    }
  }
  return(stack[1])
}

factorial(3) running in WASM

# Select the factorial function
func <- all_funcs[[1]]

# Set the argument to be '3'
func$args <- 3

# interpret the code
verbose <- FALSE
interp_wasm(func)
#> [1] 6

factorial(3) running in WASM (verbose)

func$args <- 4
verbose <- TRUE
interp_wasm(func)
#> 0x20 -> stack: c(4, 6)
#> 0x50 -> stack: c(0, 6)
#> 0x04 -> stack: 6
#> 0x20 -> stack: c(4, 6)
#> 0x20 -> stack: c(4, 4, 6)
#> 0x42 -> stack: c(1, 4, 4, 6)
#> 0x7d -> stack: c(3, 4, 6)
#> factorial(3)
#> 0x20 -> stack: c(3, 4, 6)
#> 0x50 -> stack: c(0, 4, 6)
#> 0x04 -> stack: c(4, 6)
#> 0x20 -> stack: c(3, 4, 6)
#> 0x20 -> stack: c(3, 3, 4, 6)
#> 0x42 -> stack: c(1, 3, 3, 4, 6)
#> 0x7d -> stack: c(2, 3, 4, 6)
#> factorial(2)
#> 0x20 -> stack: c(2, 3, 4, 6)
#> 0x50 -> stack: c(0, 3, 4, 6)
#> 0x04 -> stack: c(3, 4, 6)
#> 0x20 -> stack: c(2, 3, 4, 6)
#> 0x20 -> stack: c(2, 2, 3, 4, 6)
#> 0x42 -> stack: c(1, 2, 2, 3, 4, 6)
#> 0x7d -> stack: c(1, 2, 3, 4, 6)
#> factorial(1)
#> 0x20 -> stack: c(1, 2, 3, 4, 6)
#> 0x50 -> stack: c(0, 2, 3, 4, 6)
#> 0x04 -> stack: c(2, 3, 4, 6)
#> 0x20 -> stack: c(1, 2, 3, 4, 6)
#> 0x20 -> stack: c(1, 1, 2, 3, 4, 6)
#> 0x42 -> stack: c(1, 1, 1, 2, 3, 4, 6)
#> 0x7d -> stack: c(0, 1, 2, 3, 4, 6)
#> factorial(0)
#> 0x20 -> stack: c(0, 1, 2, 3, 4, 6)
#> 0x50 -> stack: c(1, 1, 2, 3, 4, 6)
#> 0x04 -> stack: c(1, 2, 3, 4, 6)
#> 0x42 -> stack: c(1, 1, 2, 3, 4, 6)
#> 0x10 -> stack: c(1, 1, 2, 3, 4, 6)
#> 0x7e -> stack: c(1, 2, 3, 4, 6)
#> 0x10 -> stack: c(1, 2, 3, 4, 6)
#> 0x7e -> stack: c(2, 3, 4, 6)
#> 0x10 -> stack: c(2, 3, 4, 6)
#> 0x7e -> stack: c(6, 4, 6)
#> 0x10 -> stack: c(6, 4, 6)
#> 0x7e -> stack: c(24, 6)
#> [1] 24

What’s next?

Nothing really.

An actual WASM stack machine would need signed & unsigned 64 bit integers. You could use {bit64} to support this, if you really wanted to.

There are ~250 standard opcodes, and another ~250 SSE opcodes. I have no appetite for implementing all these - but it could be fun at some future date.

Please steal/adapt the code if you’re interested.