A disassembler translates a sequence of VM instructions into human-readable text.
With a disassembler we can take existing R code and explore the bytecode instructions behind the execution.
rbytecode::dis()
dis() is the disassembler in the {rbytecode} package. When disassembling code, dis() returns a data.frame of information called the bytecode data.frame (or bcdf).
The bcdf contains structured information about each of the bytecode instructions making up the given code.
Calling as.character() on the bcdf results in a shorter, more compact version of the bytecode assembly. This bytecode assembly can be used as the input to rbytecode::asm() to compile the instructions back into an executable R bytecode object.
The disq() variant of the disassembler captures its argument without evaluating it and passes the unevaluated expression to dis().
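A quoting wrapper of this kind can be sketched in base R with substitute(). The helper name capture_expr() below is hypothetical and is not part of {rbytecode}; it only illustrates how an argument can be captured without being evaluated, which is the behaviour described for disq().

```r
# Hypothetical helper (not part of {rbytecode}): capture the argument as an
# unevaluated expression, the way a quoting wrapper such as disq() might.
capture_expr <- function(expr) {
  substitute(expr)
}

# Nothing is evaluated here, so it does not matter whether x exists.
captured <- capture_expr(1 + x)

is.call(captured)                    # the result is a language object
identical(captured, quote(1 + x))    # the same object quote() would produce
```

Under that assumption, disq(1 + x) would be expected to behave like passing quote(1 + x) to dis().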
A disassembly of the R code 1 + x is shown below:
disq(1 + x)
  depth pc opcode      op args
1     0  1     16 LDCONST    1
2     0  3     20  GETVAR    x
3     0  5     44     ADD NULL
4     0  7      1  RETURN NULL
The output (a bcdf data.frame) shows all the instructions and their arguments. It also includes bookkeeping information about each instruction:
depth - the recursion depth in terms of nested promises and closures
pc - the program counter: the index of the start of this instruction relative to the code at this depth
opcode - the numeric value used to represent this instruction
op - the short text string representing this instruction
args - the list of arguments to this instruction
Details about the operation of individual instructions can be found in the Instruction Reference (Section 9). There you can find details such as instruction count, stack usage and whether the instruction requires a stored expression (see Section 7).
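Because the bcdf is a data.frame, it can be queried with ordinary base R. The sketch below rebuilds the table printed above by hand as a stand-in, so that it runs without {rbytecode} installed. Storing args as a list column is an assumption made for illustration; the real bcdf may represent it differently.

```r
# Stand-in for the bcdf printed above, built by hand for illustration.
bcdf <- data.frame(
  depth  = c(0L, 0L, 0L, 0L),
  pc     = c(1L, 3L, 5L, 7L),
  opcode = c(16L, 20L, 44L, 1L),
  op     = c("LDCONST", "GETVAR", "ADD", "RETURN"),
  stringsAsFactors = FALSE
)
# Assumption: args is a list column, with NULL for instructions without arguments.
bcdf$args <- list(1, "x", NULL, NULL)

# Which instructions carry an argument?
has_args <- !vapply(bcdf$args, is.null, logical(1))
bcdf$op[has_args]

# Look up the instruction that starts at a given program counter.
bcdf$op[bcdf$pc == 5]
```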
In this particular example we can see that 1 + x does the following:
LDCONST 1 - Loads the constant 1 onto the stack.
GETVAR x - Fetches the variable x and loads its value onto the stack.
ADD - Adds the two items on the top of the stack and pushes the result back onto the stack.
RETURN - Passes the value at the top of the stack back to the calling environment.
A compact representation of the bytecode assembly can be created by calling as.character() on the bcdf:
disq(1 + x) |> as.character()
LDCONST 1
GETVAR x
ADD
RETURN
Depending on what information you are after, both the bcdf and the compact bytecode assembly view are useful for exploring code.
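The walk-through of 1 + x above can be mimicked with a toy stack machine in plain R. This is only an illustrative sketch: run_toy() and its instruction format are hypothetical, it handles just these four instructions, and it is not how R's bytecode interpreter is implemented.

```r
# Toy stack machine (hypothetical, for illustration only).
# Each instruction is a list with an `op` and, where needed, an `arg`.
run_toy <- function(instructions, vars) {
  stack <- list()
  for (ins in instructions) {
    n <- length(stack)
    if (ins$op == "LDCONST") {
      stack[[n + 1]] <- ins$arg              # push a constant
    } else if (ins$op == "GETVAR") {
      stack[[n + 1]] <- vars[[ins$arg]]      # push a variable's value
    } else if (ins$op == "ADD") {
      result <- stack[[n - 1]] + stack[[n]]  # add the top two items
      stack <- stack[seq_len(n - 2)]         # remove both operands
      stack[[n - 1]] <- result               # push the result
    } else if (ins$op == "RETURN") {
      return(stack[[n]])                     # hand back the top of the stack
    } else {
      stop("unsupported instruction: ", ins$op)
    }
  }
}

program <- list(
  list(op = "LDCONST", arg = 1),
  list(op = "GETVAR",  arg = "x"),
  list(op = "ADD"),
  list(op = "RETURN")
)

# Gives the same value as evaluating 1 + x with x = 2.
run_toy(program, vars = list(x = 2))
```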
disq({
  f <- function(x, y = 1) {
    x + y
  }
  f(x = 3)
}) |> as.character()
MAKECLOSURE x; y = 1
GETVAR x
GETVAR y
ADD
RETURN
ENDMAKECLOSURE
SETVAR f
POP
GETFUN f
PUSHCONSTARG 3
SETTAG x
CALL
RETURN
In this more involved example:
MAKECLOSURE/ENDMAKECLOSURE delimit the body (and arguments) of the function and put the resulting closure on the stack.
SETVAR f assigns the closure at the top of the stack to the variable f, and POP then discards the value left on the stack.
GETFUN sets up the function to be called. PUSHCONSTARG and SETTAG set up the argument, and CALL executes the function call.
ggplot() call
In the following ggplot() call, the majority of the code is making promises with MAKEPROM and using these as arguments to functions specified by GETFUN.
disq({
  library(ggplot2)
  ggplot(mtcars) +
    geom_point(aes(mpg, wt))
}) |> as.character()
GETFUN library
MAKEPROM
GETVAR ggplot2
RETURN
ENDMAKEPROM
CALL
POP
GETFUN ggplot
MAKEPROM
GETVAR mtcars
RETURN
ENDMAKEPROM
CALL
GETFUN geom_point
MAKEPROM
GETFUN aes
MAKEPROM
GETVAR mpg
RETURN
ENDMAKEPROM
MAKEPROM
GETVAR wt
RETURN
ENDMAKEPROM
CALL
RETURN
ENDMAKEPROM
CALL
ADD
RETURN
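The reason so many arguments appear as promises is that R evaluates function arguments lazily: the argument's code is wrapped up and only runs if the function actually uses it. A minimal base R sketch of that behaviour follows; the function names ignore_arg() and use_arg() are made up for illustration.

```r
# Hypothetical functions, for illustration only.
ignore_arg <- function(arg) "argument never evaluated"
use_arg    <- function(arg) arg * 2

# The argument is never forced, so the error inside it is never raised.
ignore_arg(stop("this error is never raised"))

# Here the argument is forced when the body uses `arg`.
use_arg(10 + 11)
```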
The bytecode instructions behind existing R code can be inspected using dis() and disq() from the {rbytecode} package.
The bytecode assembly can be read and explored, and in the next section we can use rbytecode::asm() to compile the bytecode back into an executable bytecode object.