8  Exploration

This chapter explores some of the inner workings of R and the R bytecode compiler.

8.1 Compiler Optimization: Constant folding

The internal R bytecode compiler includes constant folding as one of its optimization mechanisms.

Constant folding is the process of evaluating constant expressions at compile time rather than computing them at execution time.

Constants include simple literals (e.g. 2), but also include expressions whose value is known at compile time.

There are four types of objects that are not treated as constants:

  1. function calls of type language
  2. variable references of type symbol
  3. promises
  4. bytecode objects

If an expression does not contain any of these, then it is a candidate for evaluation during bytecode compilation, i.e. constant folding.
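
To make these categories concrete, here is a small aside (not part of the compiler itself) using typeof() to show how the first, second and fourth types appear from R code; promises are omitted because they do not occur in quoted source expressions.

typeof(quote(1 + x))                  # "language"  - a function call
typeof(quote(x))                      # "symbol"    - a variable reference
typeof(compiler::compile(quote(1)))   # "bytecode"  - an already-compiled object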

According to Tierney (Tierney 2023), constant folding is applied during code generation; it is not part of a separate optimization phase.

Note: {rbytecode} bypasses R’s compilation mechanism and performs no optimization steps, i.e. bytecode assembly is converted to a bytecode object exactly as specified.
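
For example, assembling 1 + 2 by hand with asm() keeps both LDCONST instructions and the ADD exactly as written; the addition is only performed when the bytecode is evaluated. (A small sketch using the asm() assembler shown later in this chapter; the expected result of eval() is 3.)

bc <- asm(r"(
LDCONST 1
LDCONST 2
ADD
RETURN
)")
eval(bc)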

8.1.1 Simple demonstration of constant folding in action

As an example of constant folding, compare the disassembly of the two expressions 1 + x and 1 + 2.

disq({
  1 + x
}) |> as.character()
LDCONST 1
GETVAR x
ADD 
RETURN 

In 1 + x, the variable x is not a constant (i.e. it is a variable reference of type symbol, the second of the four non-constant types listed above). Since the expression contains a non-constant, it is left as-is and evaluated at runtime, i.e. the value of x is fetched every time the result is calculated.

disq({
  1 + 2
}) |> as.character()
LDCONST 3
RETURN 

In 1 + 2, both arguments are constants, so the compiler evaluates the expression at compile time. When this bytecode is evaluated, it simply fetches the pre-computed value of 3 and returns it.

Note that in this example the call to + does not count as a function call of type language. Because + is a primitive operation it gets a free pass: its result is assumed to be deterministic, so the compiler can safely evaluate it at compile time.

8.1.2 More complex example with constant folding

Constant folding works with numeric vectors and nested operations.

In the following, the actual compiled bytecode simply loads the pre-calculated answer and returns it.

disq(
  1:3 + 2^2 * (4 + (8 - 2))
) |> as.character()
LDCONST c(41, 42, 43)
RETURN 

8.1.3 Constant Folding and the Commutative Property

The following examples show that the constant folding step in R’s compiler does not make use of the commutative property of multiplication.

2 * 3 * x is correctly optimized to 6 * x.

But in 2 * x * 3, no simplification is done and the disassembled bytecode indicates that 2 * x is evaluated first, and then that result is multiplied by 3.

disq(
  2 * 3 * x
) |> as.character()
LDCONST 6
GETVAR x
MUL 
RETURN 
disq(
  2 * x * 3
) |> as.character()
LDCONST 2
GETVAR x
MUL 
LDCONST 3
MUL 
RETURN 
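
If the constants are grouped explicitly, e.g. x * (2 * 3), the inner product contains only constants and should still be folded, even though the expression as a whole cannot be. A hedged check (the expected disassembly is shown as comments rather than actual output):

disq(
  x * (2 * 3)
) |> as.character()
# Expected: GETVAR x
#           LDCONST 6
#           MUL
#           RETURN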

8.1.4 Constant folding and MATH1 operations

It is interesting to note that even though the MATH1 functions (Section 11.3) are deterministic, they are not included in constant folding optimization.

In the following example, sin(1) always evaluates to the same value, yet the bytecode indicates that this value is calculated every time during execution by calling the sin() function with an argument of 1.

disq(
  sin(1)
) |> as.character()
BASEGUARD @label1
LDCONST 1
MATH1 sin
@label1 
RETURN 
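
One hedged workaround is to perform the folding yourself before compilation, e.g. by splicing the precomputed value into the expression with bquote(). After splicing, the call contains only constants (no symbols and no MATH1 call), so the compiler is free to fold it.

# Splice the precomputed value of sin(1) into the expression.
e <- bquote(.(sin(1)) * 2)
# `e` is now a call whose operands are both numeric constants, so compiling it
# (e.g. with compiler::compile(e)) can fold the product.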

8.1.5 Corner case: redefined primitive functions

What happens if + is not a primitive operation? This would be the case if its definition had been changed for some reason.

`+` <- function(x, y) { base::`+`(x, y) }  # delegate to the primitive so the redefined function still works if called

disq(
  1 + 2
) |> as.character()
GETFUN +
PUSHCONSTARG 1
PUSHCONSTARG 2
CALL 
RETURN 

Because the compiler detects that + is no longer the primitive it usually is, it can no longer determine that the expression always evaluates to a constant. No constant folding is performed, and the bytecode calls the (redefined) function each time it is evaluated.
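
Removing the redefinition restores the primitive, and constant folding resumes. (A hedged follow-up; the expected disassembly is the same LDCONST 3 / RETURN pair shown in Section 8.1.1.)

rm(`+`)
disq(
  1 + 2
) |> as.character()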

8.1.6 Future ideas

Possible future optimization steps in the R compiler could:

  1. Treat MATH1 functions with constant arguments as producing a constant result, which could then be optimized with constant folding.
  2. Apply the commutative property to find more opportunities for constant folding.

8.2 Cubing a number

There are three common ways to cube a number in R, illustrated in the following code.

x <- 3

x ^ 3
[1] 27
x**3
[1] 27
x * x * x
[1] 27

8.2.1 x ^ 3 and x**3 are exactly the same

Disassembling x ^ 3 and x**3 (below) shows that they compile to the exact same bytecode.

disq(
  x ^ 3
) |> as.character()
GETVAR x
LDCONST 3
EXPT 
RETURN 
disq(
  x**3
) |> as.character()
GETVAR x
LDCONST 3
EXPT 
RETURN 
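
This equivalence is established before the compiler is even involved: the parser translates ** to ^, so the two spellings produce the same expression.

quote(x**3)
x^3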

8.2.2 What is x * x * x doing?

As shown below, x * x * x loads the variable three times and calls the MUL operation twice.

disq(
  x * x * x
) |> as.character()
GETVAR x
GETVAR x
MUL 
GETVAR x
MUL 
RETURN 

8.2.3 Why is x * x * x faster than x ^ 3?

x ^ 3 uses the EXPT operation, which is far more expensive than MUL.

The MUL operation is potentially only a few CPU instructions, while the EXPT operation probably calls a math library function that is huge by comparison (e.g. see netlib’s libm pow function).

x <- runif(10000)
res <- bench::mark(x ^ 3, x * x * x, relative = TRUE) 
knitr::kable(res[,c(1, 4)], caption = "Relative speed of MUL and EXPT (higher is better)")
Relative speed of MUL and EXPT (higher is better)

expression    itr/sec
x^3          1.000000
x * x * x    9.495478

8.2.4 Calculating the cube faster than x * x * x

Calculating the cube of a variable faster than x * x * x at first seems improbable. How can you multiply faster than multiplying?

The trick used below is to avoid fetching the same variable more than once. In standard R code x * x * x produces three calls to GETVAR. A GETVAR call involves two steps:

  1. an indirect lookup (by index) in the CONST list to get the variable name
  2. looking up that name in the current environment to fetch the variable’s value.

Since the first GETVAR has already loaded the variable’s value onto the stack, we can use the DUP op (Section 9.32) to duplicate the value on top of the stack and push that copy onto the stack as well.

A DUP is cheaper than a GETVAR, and thus we get the fastest way to cube a variable in R, even if the gain is only a few percent.

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Bytecode for "x ^ 3"
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
b0 <- asm(r"(
GETVAR x
LDCONST 3
EXPT
RETURN
)")

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Bytecode for "x * x * x"
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
b1 <- asm(r"(
GETVAR x
GETVAR x
MUL
GETVAR x
MUL
RETURN
)")

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Cubing a number using DUP
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
b2 <- asm(r"(
GETVAR x
DUP
DUP
MUL
MUL
RETURN
)")

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Benchmarking (calculating relative speeds)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
x <- runif(1000)
res <- bench::mark(
   eval(b0), 
   eval(b1), 
   eval(b2), 
  relative = TRUE)
knitr::kable(res[,c(1, 4)], caption = "Relative speed of using `DUP` over multiple `GETVAR` operations (higher is better)")
Relative speed of using DUP over multiple GETVAR operations (higher is better)

expression    itr/sec
eval(b0)     1.000000
eval(b1)     5.057885
eval(b2)     5.508104
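
As a sanity check (hedged; reusing x as an ordinary variable), all three bytecode objects compute the same cube, so the speed difference does not come from computing something different.

x <- 3
c(eval(b0), eval(b1), eval(b2))
[1] 27 27 27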