# A mini-exploration of the quality R’s random number generators.

In general, we want *pseudo-random number generators* (**PRNG**s) which have (among other things) 2 key properties:

- They should be random
- They should be fast

There are other considerations (e.g. period, availability of multiple streams, jump-aheads etc), but I’m not going there (nor am I considering cryptographic security).

# R’s built-in PRNGs

R has 7 built-in PRNGs. These are accessible by calling `RNGkind(name)`

to
switch the backend to use a particular PRNG. The user-facing code
remains the same e.g. `runif()`

, `rnorm()`

**Mersenne-Twister**- The default PRNG in R
- Also known as: MT19937
- Developed by: Matsumoto and Nishimura, 1997

**Wichmann-Hill**- A combined linear congruential generator (LCG)
- Developed by: Wichman and Hill, 1982

**Marsaglia-Multicarry**- Developed by: Marsaglia, 1997

**Super-Duper**- Developed by: Marsaglia in the 70s

**Knuth-TAOCP-2002**- generalized feedback shift-register generator (GFSR) using lagged Fibonacci sequences
- Developed by: Knuth, 2002

**Knuth-TAOCP**- Another GFSR using using lagged Fibonacci sequences
- Developed by: Knuth, 1997

**L’Ecuyer-CMRG**- A combined multiple recursive generator
- Developed by: L’Ecuyer, 1999
- Defined in this paper Good Parameters and Implementations for Combined Multiple Recursive Random Number Generators (pdf)
- Also known as: MRG32ka

# Testing built-in PRNGs for quality

A 2007 paper which reviews many PRNGs and tests then for quality is L’Ecuyer’s TestU01: A C Library for Empirical Testing of Random Number Generators (pdf)

I’ve summarised Table 1 from this paper for the PRNGs which are available in base R. In this summary I only include the BigCrush numbers. The full version of Table 1 is included at the end of this post;

Year | RNGkind | Alias | Big Crush Errors |
---|---|---|---|

1970s | Super Duper | Super Duper 73 | Test not run as there were too many errors already at lower levels |

1982 | Wichmann-Hill | 22 | |

1997 | Mersenne Twister | MT19937 | 2 |

1997 | Marsaglia-Multicarry | R-Multicarry | Test not run as there were too many errors already at lower levels |

1997 | Knuth-TAOCP | Knuth.ran_array2 | 4 |

1999 | L’Ecuyer-CMRG | MRG32k3a | 0 |

2002 | Knuth-TAOCP-2002 | Knuth.ranf_array2 | 0 |

### Notes on built-in PRNG quality:

- R’s default PRNG (Mersenne-Twister) fails 2 test on BigCrush
- Only L’Ecuyer and Knuth-TAOCP-2002 pass the BigCrush tests with no failures
- The other PRNGs seem to perform quite badly with either multiple failures on BigCrush, or
so many failures on
*SmallCrush*and*Crush*that it wasn’t even worth testing then on*BigCrush* - There appears to be a trend that more recent PRNGs have fewer failures on
*BigCrush*.

- The most recent PRNG is
**18 years old**! What’s happened since then in PRNG development?

# Testing built-in PRNGs for speed

## Click to show/hide benchmark code

```
library(ggplot2)
library(bench)
rng_kinds <- c(
'Mersenne-Twister',
'Wichmann-Hill',
'Marsaglia-Multicarry',
'Super-Duper',
'Knuth-TAOCP-2002',
'Knuth-TAOCP',
"L'Ecuyer-CMRG"
)
N <- 1e6
fun <- function(rng_kind) {
RNGkind(kind = rng_kind)
res <- bench::mark(runif(N))
res$expression <- rng_kind
res
}
baser <- lapply(rng_kinds, fun)
baser <- do.call(rbind, baser)
plot(baser) +
labs(title = "Benchmark: runif() generation of 1e6 random numbers")
```

### Notes on built-in PRNG speed:

- All the built-in PRNGs are about the same speed
- Not sure if this is a characteristic of the way they are included within R - there may be a lot of setup/call overhead that is overshadowing the timing of the actual random number generation.

# Appendix: Full PRNG comparison table.

This is an extract from L’Ecuyer’s TestU01: A C Library for Empirical Testing of Random Number Generators (pdf)

How to read this table:

- The numbers in the last 3 rows are the number of
**failures**for a particular statistical test i.e. SmallCrush, Crush and BigCrush.- A number in brackets e.g.
`(12)`

indicates a*suspect*p-value, but not necessarily a failure.

- A number in brackets e.g.
- The second column is the log2 of the period of the generator (how long before it loops back to the start)
`t-32`

and`t-64`

are the times to generate 10^8 random numbers on a 32 and 64-bit machine (see the paper for actual specs)

# Appendix: Other resources:

- Correction to random number generation in R3.6.0 - here and here
- John Cook looks at few modern PRNGs in his 2017 blog post Testing RNGs with PractRand
- PCG PRNG
- Some criticism on the claims around PCG - here
- Good Practice in (Pseudo) Random Number Generationfor Bioinformatics Applications