WASM
One of the big buzzes at #RStudioConf2022 was the whiff of serverless “Shiny” in both python and R by harnessing WASM.
WASM (a.k.a. WebAssembly) is a portable binary-code format initially targetted at running code in a web browser at near-native speeds.
This post is a small exploration into what a WASM binary actually looks like and how it is structured.
I also create a toy interpreter for WASM in R. Note: You would never do this in real-life - it’s just for a bit of fun (maybe because I had so much fun with the virtual machine interpreter for AnotherWorld)
C function to calculate factorial(n)
This is a classic bit of C code for calculating n!
i.e. n * (n-1) * (n-2) * ... * 1
int factorial(int n) {
if (n == 0)
return 1;
else
return n * factorial(n-1);
}
factorial(n)
in WAT (human readable WASM text)
There is a human-readable version of WASM called “WAT” which has a structure similar to S-expressions.
The above C code would be compiled to WASM/WAT as follows:
(func (param i64) (result i64)
local.get 0 # put arg[0] on stack
i64.eqz # compare top of stack to zero
if (result i64) # if it is zero
i64.const 1 # put 1 on stack
else
local.get 0 # put arg[0] on stack
local.get 0 # put arg[0] on stack
i64.const 1 # put 1 on stack
i64.sub # subtract the top 2 values in stack
call 0 # Call function #0 (return value is on the stack)
i64.mul # Multiply
end)'
Compile WAT to WASM
The WebAssembly Binary Toolkit (WABT) can be used to compile the WAT file into a binary WASM file
brew install wabt
wat2wasm helloworld.wat # Creates 'helloworld.wasm'
# helloworld.wasm
00 61 73 6d 01 00 00 00 01 06 01 60 01 7e 01 7e 03 02 01 00 0a 17 01 15 00 20 00 50 04 7e 42 01 05 20 00 20 00 42 01 7d 10 00 7e 0b 0b
WASM raw bytes
With some googling and using the reference list of WASM bytecodes
it is possible to deconstruct and understand the bytes in the *.wasm
file for
the factorial(n)
function
0x00, 0x61, 0x73, 0x6D, # magick = \0asm
0x01, 0x00, 0x00, 0x00, # version = 1
# The 'type' section contains function signatures.
0x01, 0x06 , # section "type". 6 bytes
0x01, # Num types = 1 i.e. just a single 'function' type defined here
0x60, # Function (begin definition)
0x01, # Number of parameters = 1
0x7e, # param types: i64
0x01, # Number of results = 1
0x7e, # result types: i64
# The 'function' section stores indexes of the function signature
0x03, 0x02, # section "function". section size = 2 bytes
0x01, # Number of functions
0x00, # Index of function
# The 'code' section contains the actual function code
0x0A, 0x17, # Section "code". section size = 0x17 = 23 bytes
0x01, # Number of functions = 1
0x15, # Function body size = 0x15 = 21 bytes
0x00, # Number of local declarations = 0
0x20, 0x00, # local.get 0
0x50, # i64.eqz
0x04, 0x7E, # if i64
0x42, 0x01, # i64.const 1
0x05, # else
0x20, 0x00, # local.get 0
0x20, 0x00, # local get 0
0x42, 0x01, # i64.const 1
0x7D, # i64.sub
0x10, 0x00, # call 0
0x7E, # i64.mul
0x0B, # end
0x0B # end
WASM virtual stack machine
WASM bytecode is intended to be run on a virtual stack machine.
A simple way of implementing a stack machine in R is to have a giant switch statement
and a stack implementation. The switch
statement chooses what
operations to do based upon the bytecode.
I’m cheating a little bit here and have hand-parsed the code split at the if-else statement into two blocks.
Other cheats:
- you would normally need 64bit integers (i.e.
i64
type in WASM) to interpret thefactorial()
code in WASM, but as long as you don’t give it a large number it will be fine. - integers in WASM are encoded as LEB128 variable length integers but since the code sizes are all small, every integer fits in 7-bits which means that no encoding/decoding is necessary.
Warnings
- This unoptimized interpreter is going to be terribly slow.
Stack
Stack implementations don’t come much more naive than this.
The stack
variable itself is just going to exist in the global environment.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Naive stack
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stack <- c()
# Pop a value of the stack
stack_pop <- function() {
res <- stack[1]
stack <<- stack[-1]
res
}
# Push a value on the stack
stack_push <- function(val) {
stack <<- c(val, stack)
}
Define the functions
The major cheat here is that i’ve pre-parsed the function into initial code
in the function and the two code blocks that are executed depending on
the if/else
statement.
There may be multiple functions in general, so all_funcs
is a list of
all the functions in the code, and the correct function is extracted
during a call
opcode (i.e. 0x10
)
all_funcs <- list(
func <- list(
args = c(),
start = c(
0x20, 0x00, # local.get 0
0x50, # i64.eqz
0x04, 0x7E # if i64
),
if_branch = c(
0x42, 0x01 # i64.const 1
),
else_branch = c(
0x20, 0x00, # local.get 0
0x20, 0x00, # local get 0
0x42, 0x01, # i64.const 1
0x7D, # i64.sub
0x10, 0x00, # call 0
0x7E # i64.mul
)
)
)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#" Recursively interpret the given function
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
interp_wasm <- function(func) {
# Set the bytecode to be the starting code of the function
byte_idx <- 1
bytecode <- func$start
# Helper function to read 'n' bytes from the bytecode
read_bytes <- function(n) {
res <- bytecode[byte_idx + seq(n) - 1]
byte_idx <<- byte_idx + n
res
}
# For each opcode, determine what to do within a giant `switch` statement
while(byte_idx <= length(bytecode)) {
opcode <- read_bytes(1)
opcode <- sprintf("0x%02x", opcode)
switch(
opcode,
'0x04' = { # if
type = read_bytes(1) # value type that if() operates on. ignored
value = stack_pop() # get the value. treat as boolean.
if (value == 1) {
# Jump the bytecode into the respective TRUE/FALSE branch
bytecode <- func$if_branch
byte_idx <- 1
} else {
bytecode <- func$else_branch
byte_idx <- 1
}
},
'0x10' = { # call
# Select the function to call, setup the arguments and call it
func_idx <- read_bytes(1)
next_func <- all_funcs[[func_idx + 1]]
arg <- stack_pop()
next_func$args <- arg
if (verbose) message("factorial(", arg, ")")
interp_wasm(next_func)
},
'0x20' = { # local.get idx
# Push a local variable onto the stack.
# In this simple case all local vars are just the args to the function
idx <- read_bytes(1)
value <- func$args[idx + 1]
stack_push(value)
},
'0x42' = { # i64 const
# Push a constant onto the stack
value <- read_bytes(1)
stack_push(value)
},
'0x50' = { # i64.eqz
# Is value at top of stack == 0? No/Yes = 0/1
value <- stack_pop()
result <- value == 0
stack_push(result)
},
'0x7d' = { # i64.sub
# Subtract the two values at the top of the stack
val2 <- stack_pop()
val1 <- stack_pop()
result <- val1 - val2
stack_push(result)
},
'0x7e' = { # i64.mul
# Multiply the 2 values at the top of the stack
val2 <- stack_pop()
val1 <- stack_pop()
result <- val1 * val2
stack_push(result)
}
)
if (verbose) {
message(opcode, " -> stack: ", deparse(stack))
}
}
return(stack[1])
}
factorial(3)
running in WASM
# Select the factorial function
func <- all_funcs[[1]]
# Set the argument to be '3'
func$args <- 3
# interpret the code
verbose <- FALSE
interp_wasm(func)
#> [1] 6
factorial(3)
running in WASM (verbose)
func$args <- 4
verbose <- TRUE
interp_wasm(func)
#> 0x20 -> stack: c(4, 6)
#> 0x50 -> stack: c(0, 6)
#> 0x04 -> stack: 6
#> 0x20 -> stack: c(4, 6)
#> 0x20 -> stack: c(4, 4, 6)
#> 0x42 -> stack: c(1, 4, 4, 6)
#> 0x7d -> stack: c(3, 4, 6)
#> factorial(3)
#> 0x20 -> stack: c(3, 4, 6)
#> 0x50 -> stack: c(0, 4, 6)
#> 0x04 -> stack: c(4, 6)
#> 0x20 -> stack: c(3, 4, 6)
#> 0x20 -> stack: c(3, 3, 4, 6)
#> 0x42 -> stack: c(1, 3, 3, 4, 6)
#> 0x7d -> stack: c(2, 3, 4, 6)
#> factorial(2)
#> 0x20 -> stack: c(2, 3, 4, 6)
#> 0x50 -> stack: c(0, 3, 4, 6)
#> 0x04 -> stack: c(3, 4, 6)
#> 0x20 -> stack: c(2, 3, 4, 6)
#> 0x20 -> stack: c(2, 2, 3, 4, 6)
#> 0x42 -> stack: c(1, 2, 2, 3, 4, 6)
#> 0x7d -> stack: c(1, 2, 3, 4, 6)
#> factorial(1)
#> 0x20 -> stack: c(1, 2, 3, 4, 6)
#> 0x50 -> stack: c(0, 2, 3, 4, 6)
#> 0x04 -> stack: c(2, 3, 4, 6)
#> 0x20 -> stack: c(1, 2, 3, 4, 6)
#> 0x20 -> stack: c(1, 1, 2, 3, 4, 6)
#> 0x42 -> stack: c(1, 1, 1, 2, 3, 4, 6)
#> 0x7d -> stack: c(0, 1, 2, 3, 4, 6)
#> factorial(0)
#> 0x20 -> stack: c(0, 1, 2, 3, 4, 6)
#> 0x50 -> stack: c(1, 1, 2, 3, 4, 6)
#> 0x04 -> stack: c(1, 2, 3, 4, 6)
#> 0x42 -> stack: c(1, 1, 2, 3, 4, 6)
#> 0x10 -> stack: c(1, 1, 2, 3, 4, 6)
#> 0x7e -> stack: c(1, 2, 3, 4, 6)
#> 0x10 -> stack: c(1, 2, 3, 4, 6)
#> 0x7e -> stack: c(2, 3, 4, 6)
#> 0x10 -> stack: c(2, 3, 4, 6)
#> 0x7e -> stack: c(6, 4, 6)
#> 0x10 -> stack: c(6, 4, 6)
#> 0x7e -> stack: c(24, 6)
#> [1] 24
What’s next?
Nothing really.
An actual WASM stack machine would need signed & unsigned 64 bit integers.
You could use {bit64}
to support this, if you really wanted to.
There are ~250 standard opcodes, and another ~250 SSE opcodes. I have no appetite for implementing all these - but it could be fun at some future date.
Please steal/adapt the code if you’re interested.