disq({
1 + x
|> as.character() })
LDCONST 1
GETVAR x
ADD
RETURN
This chapter explores some of the inner workings of R and the R bytecode compiler.
The internal R bytecode compiler includes constant folding as one of its optimization mechanisms.
Constant folding is the process of evaluating constant expressions at compile time rather than computing them at execution time.
Constants include simple literals (e.g. 2
), but also include expressions whose value is known at compile time.
There are four types of objects that are not treated as constants:
If an expression does not contain any of these, then it is a candidate for evaluation during the bytecode compilation process i.e. constant folding.
According to Tierney (Tierney 2023) constant folding is applied during code generation, and it not part of a separate optimization phase.
Note: {rbytecode}
bypasses R’s compilation mechanism and does not include any optimization steps i.e. bytecode assembly is converted to a bytecode object exactly as specified.
As an example of constant folding, compare the disassembly of the following two expressions 1 + x
and 1 + 2
disq({
1 + x
|> as.character() })
LDCONST 1
GETVAR x
ADD
RETURN
In x + 1
, the variable x
is not a constant (i.e. it is a variable reference of type symbol - which is the second in the list of four non-constant elements noted above). Since the expression contains a non-constant, it is left as-is and evaluated at runtime i.e. the value of x
is always fetched in order to calculate the result.
disq({
1 + 2
|> as.character() })
LDCONST 3
RETURN
In 1 + 2
, both arguments are constants and the compiler evaluates this expression at compile time. When evaluating this bytecode, it simply fetches the pre-computed value of 3
and returns it.
Note that the function +
does not count as a function call of type language in this example. Because +
is a primitive operation it gets a free-pass and its calculation is assumed to be deterministic.
Constant folding works with numeric vectors and nested operations.
In the following, the actual compiled bytecode simply loads the pre-calculated answer and returns it.
disq(
1:3 + 2^2 * (4 + (8 - 2))
|> as.character() )
LDCONST c(41, 42, 43)
RETURN
In the following examples it is evident that in R’s compiler, the optimization step related to constant folding does not make use of the commutative property.
2 * 3 * x
is correctly optimized to 6 * x
.
But in 2 * x * 3
, no simplification is done and the disassembled bytecode indicates that 2 * x
is evaluated first, and then that result is multiplied by 3.
disq(
2 * 3 * x
|> as.character() )
LDCONST 6
GETVAR x
MUL
RETURN
disq(
2 * x * 3
|> as.character() )
LDCONST 2
GETVAR x
MUL
LDCONST 3
MUL
RETURN
MATH1
operationsIt is interesting to note that even though the MATH1
functions (Section 11.3) are deterministic, they are not included in constant folding optimization.
In the following example, sin(1)
always evaluates to the same value, yet the bytecode indicates that this value is calculated every time during execution by calling the sin()
function with an argument of 1
.
disq(
sin(1)
|> as.character() )
BASEGUARD @label1
LDCONST 1
MATH1 sin
@label1
RETURN
What happens if +
is not a primitive operation? This would be the case if its definition had been changed for some reason.
`+` <- function(x, y) {x + y}
disq(
1 + 2
|> as.character() )
GETFUN +
PUSHCONSTARG 1
PUSHCONSTARG 2
CALL
RETURN
Now that the compiler detects this function is no longer the primitive it usually is, it can no longer determine that the expression always evaluates to a constant. No constant folding is performed, and the bytecode will call the function each time it is evaluated.
It is possible that future optimization steps in the R compiler could address:
MATH1
functions with constant arguments as producing a constant result which can be optimized with constant folding.There are 3 common ways to cube a number in R - illustrated in the following code.
<- 3
x
^ 3 x
[1] 27
**3 x
[1] 27
* x * x x
[1] 27
x ^ 3
and x**3
are exactly the sameDisassembling x ^ 3
and x**3
(below) shows that they compile to the exact same bytecode.
disq(
^ 3
x |> as.character() )
GETVAR x
LDCONST 3
EXPT
RETURN
disq(
**3
x|> as.character() )
GETVAR x
LDCONST 3
EXPT
RETURN
x * x * x
doing?As shown below, x * x * x
loads the variable three times and calls the MUL
operation twice
disq(
* x * x
x |> as.character() )
GETVAR x
GETVAR x
MUL
GETVAR x
MUL
RETURN
x * x * x
faster than x ^ 3
?x ^ 3
uses the EXPT
operation which is far more expensive than MUL
.
The MUL
operator is potentially a few CPU instruction, while the EXPT
operation probably calls a math library function that is huge by comparison (e.g. see netlib’s libm pow function)
<- runif(10000)
x <- bench::mark(x ^ 3, x * x * x, relative = TRUE)
res ::kable(res[,c(1, 4)], caption = "Relative speed of MUL and EXPT (higher is better)") knitr
expression | itr/sec |
---|---|
x^3 | 1.000000 |
x * x * x | 9.495478 |
x * x * x
Calculating the cube of a variable faster than x * x * x
at first seems improbable. How can you multiply faster than multiplying??
The trick used below is to not fetch the same variable twice. In standard R code x * x * x
produces three calls to GETVAR
. A GETVAR
call involves two steps:
CONST
list to get the variable nameSince we already loaded the variable on the stack with the first GETVAR
statement, we can use the DUP
op (Section 9.32) to duplicate this value that is now on the top of the stack, and push this dupe onto the stack as well.
The DUP
is cheaper than a GETVAR
, and thus we get the fastest way to cube a variable in R - even if it might only be a few percent.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Bytecode for "x ^ 3"
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<- asm(r"(
b0 GETVAR x
LDCONST 3
EXPT
RETURN
)")
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Bytecode for "x * x * x"
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<- asm(r"(
b1 GETVAR x
GETVAR x
MUL
GETVAR x
MUL
RETURN
)")
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Cubing a number using DUP
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<- asm(r"(
b2 GETVAR x
DUP
DUP
MUL
MUL
RETURN
)")
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Benchmarking (calculating relative speeds)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<- runif(1000)
x <- bench::mark(
res eval(b0),
eval(b1),
eval(b2),
relative = TRUE)
::kable(res[,c(1, 4)], caption = "Relative speed of using `DUP` over multiple `GETVAR` operations (higher is better)") knitr
expression | itr/sec |
---|---|
eval(b0) | 1.000000 |
eval(b1) | 5.057885 |
eval(b2) | 5.508104 |