Part 3: Evaluating R code from potentially malicious sources

TLDR: Shiny app

Shiny app using safe(?) evaluation of R code

Please try and tell me if you break it or make it do something unsafe …

Evaluating R code from potentially malicious sources - part 3

As mentioned in Part 1 and Part 2, I’m looking into the idea of running R code which may originate from potentially malicious sources, e.g. code from a web interface, a database or even a tweet!

My most recent idea to safely execute code is to:

Create an empty environment
Copy the whitelisted functions into it
- Lots of control available here e.g. you could whitelist mean and mean.default but not include mean.difftime so that ordinary numeric means are possible, but not of difftime objects
Do all evaluation within this environment - any calls to non-whitelisted functions will cause an error
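
The three steps above can be sketched in base R alone. This is a minimal illustration rather than the final implementation; the names `env` and `allowed`, and the choice of `sum` as the forbidden function, are assumptions made for the example.

```r
# Step 1: an environment whose parent is the empty environment,
# so no functions are inherited from anywhere.
env <- new.env(parent = emptyenv())

# Step 2: copy only the whitelisted functions into it
# (`allowed` is a hypothetical whitelist for this sketch).
allowed <- c("c", "list")
for (fn in allowed) {
  assign(fn, get(fn, envir = baseenv(), mode = "function"), envir = env)
}

# Step 3: evaluate user code inside that environment only.
ok <- eval(parse(text = "list(1, c(2, 3))"), envir = env)

# `sum` was never copied in, so the lookup fails and we capture the message.
bad <- tryCatch(
  eval(parse(text = "sum(1, 2)"), envir = env),
  error = function(e) conditionMessage(e)
)
```

Here `ok` holds the constructed list, while `bad` holds the error message raised because `sum` cannot be found from inside `env`.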

In this post I want to describe the following:

What is the purpose/context of executing this (potentially malicious) code?
What does it mean to be ‘safe’?
What functions can/can’t be whitelisted?

Note

I know this is dangerous and quite probably a fool’s quest.
I want to run this in the current process and use the result i.e. no remote sandboxes or docker images.
If this is impossible, or there’s an actual way to do this, or a better way of thinking of the problem, I’m keen to hear it!

Ping me on twitter

Context of running (possibly malicious) code

Broadly speaking, the code to be run uses either:

The entire language
- No language restrictions
- Limited access to filesystem and other system resources e.g. RAppArmor
Some useful subset of the language
- Access to only the necessary approved (whitelisted) functions
- No restrictions on the environment in which the R interpreter itself runs
- Restriction of access to system resources is by including/excluding functions in the whitelist, i.e. if you do not wish to grant file reading, then don’t include any functions that the user could direct to read a file.
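
As a rough illustration of the whitelist acting as access control, the base R sketch below evaluates the same file-reading snippet under two whitelists. The helper `eval_in_whitelist`, the temporary file and both whitelists are assumptions made for this example only.

```r
# Hypothetical helper: evaluate `code` with access to only the functions in
# `whitelist`, returning the error message as a string if evaluation fails.
eval_in_whitelist <- function(code, whitelist) {
  env <- new.env(parent = emptyenv())
  for (fn in whitelist) {
    assign(fn, get(fn, envir = baseenv(), mode = "function"), envir = env)
  }
  tryCatch(
    eval(parse(text = code), envir = env),
    error = function(e) conditionMessage(e)
  )
}

# A throwaway file standing in for something the user should not read.
path <- tempfile()
writeLines("secret", path)

# deparse() quotes and escapes the path so it is a valid string literal.
code <- sprintf("readLines(%s)", deparse(path))

denied  <- eval_in_whitelist(code, c("c", "list"))               # no file reading
granted <- eval_in_whitelist(code, c("c", "list", "readLines"))  # file reading allowed
```

With `readLines` left out, `denied` is an error message; with it included, `granted` is the content of the file.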

If you are looking to evaluate the entire R language in a safe environment, then RAppArmor is definitely where you should be reading, not here :)

I am interested in useful subsets of the language for specific purposes. The current use case that sparked this quest is to allow only the creation of R objects, e.g. scalars, vectors, lists and data.frames.

This arose out of my desire to serialise/deserialise R objects to YAML (as discussed here).

What does it mean to be ‘safe’ in this context?

The specific purpose for safe evaluation is that I want to allow only the creation of R objects, e.g. scalars, vectors, lists and data.frames.

Wanted:

User should be able to create an object
Nothing else

Unwanted:

Calculations
Assigning variables
System calls
Excessive CPU usage
Excessive memory usage
Access to filesystem, network, keyboard etc.
Pretty much everything else

What functions can be whitelisted? What functions can’t?

An actual dive into what functions are, in general, safe for a malicious user to run will have to wait for another post (sorry!).

For this very simple case of only allowing the creation of objects, the whitelist is quite small:

whitelist <- c(
  'structure',
  'c', 
  'list', 
  'data.frame',
  'matrix'
)

Code I want to be able to run

#-----------------------------------------------------------------------------
# User should be allowed to create all these objects
#-----------------------------------------------------------------------------
r_objects <- list(
  null                     = "NULL",
  empty_list               = "list()",
  list                     = "list(a=1, b='hello')",
  `NA`                     = "NA",
  numeric_NA               = "NA_real_",
  integer_NA               = "NA_integer_",
  character_NA             = "NA_character_",
  
  boolean                  = "TRUE",
  numeric                  = "12.3",
  integer                  = "1L",
  character                = "'hello'",
  
  boolean_vec              = "c(TRUE, FALSE, FALSE)",
  numeric_vec              = "c(1.23, 4.56)",
  integer_vec              = "c(1L, 5L, 3L)",
  character_vec            = "c('a', 'b', 'c')",
  
  boolean_named_vec        = "c(a=TRUE, b=FALSE, c=FALSE)",
  numeric_named_vec        = "c(a=1.23, b=4.56)",
  integer_named_vec        = "c(a=1L, b=5L, c=3L)",
  character_named_vec      = "c(a='a', b='b', c='c')",
  
  boolean_vec_with_na      = "c(TRUE, FALSE, NA)",
  numeric_vec_with_na      = "c(1.23, NA)",
  integer_vec_with_na      = "c(1L, 2L, NA)",
  character_vec_with_na    = "c('a', 'b', NA)",
  
  boolean_list             = "list(TRUE, FALSE, FALSE)",
  numeric_list             = "list(1.23, 4.56)",
  integer_list             = "list(1L, 5L, 3L)",
  character_list           = "list('a', 'b', 'c')",
  
  boolean_list_with_null   = "list(TRUE, FALSE, NULL)",
  numeric_list_with_null   = "list(1.23, NULL)",
  integer_list_with_null   = "list(1L, 5L, NULL)",
  character_list_with_null = "list('a', 'b', NULL)",
  
  boolean_named_list       = "list(a=TRUE, b=FALSE, c=FALSE)",
  numeric_named_list       = "list(a=1.23, b=4.56)",
  integer_named_list       = "list(a=1L, b=5L, c=3L)",
  character_named_list     = "list(a='a', b='b', c='c')",
  
  boolean_matrix           = "matrix(TRUE, nrow=2, ncol=2)",
  numeric_matrix           = "matrix(1.2 , nrow=2, ncol=2)",
  integer_matrix           = "matrix(1L  , nrow=2, ncol=2)",
  character_matrix         = "matrix('a' , nrow=2, ncol=2)",
  
  data.frame               = "data.frame(a=1L, b=2.1, c='c', stringsAsFactors = FALSE)"
)

Code that should not run

#-----------------------------------------------------------------------------
# If my purpose is only creating objects, then all of these are unsafe
#-----------------------------------------------------------------------------
unsafe_commands <- list(
  "2 * 3",                     # Calculations
  "a <- 1",                    # Assigning variables
  "system('echo bad stuff')",  # System calls
  "list.files('.')",           # File system access
  "lm(mpg ~ cyl, mtcars)"      # Any computation
)

`safe_eval` command

As described in Part 2:

Create an empty environment and add the whitelisted functions to it.
Evaluate the code in that environment.

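#-----------------------------------------------------------------------------
#' Safely evaluate code by only allowing access to whitelisted functions.
#'
#' @param code character string containing the code
#' @param whitelist character vector of names of functions which are allowed
#'                  to be called by the code
#'
#' @return a list with elements `result` and `error` (one of which is NULL),
#'         as produced by a purrr::safely() wrapper
#-----------------------------------------------------------------------------
library(magrittr)  # provides the %>% pipe used here and in the tests below

safe_eval <- function(code, whitelist) {
  # The parent of this new environment is the empty environment,
  # so nothing is inherited from base R or the global environment
  envir <- rlang::new_environment()

  # Copy each whitelisted function into the otherwise empty environment
  whitelist %>% 
    purrr::walk(function(x) {envir[[x]] <- get(x)})
  
  # Capture errors in the return value instead of throwing them
  safely_eval <- purrr::safely(eval)
  
  safely_eval(parse(text=code), envir=envir)
}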

Creating an object works

safe_eval("list(a=1, b='banana')", whitelist)$result

$a
[1] 1

$b
[1] "banana"

Creating all the objects from my test list results in no errors

res <- r_objects %>% 
  purrr::map(~safe_eval(.x, whitelist))

# There are no errors!  All of this code ran without a hitch
errors <- res %>% purrr::keep(~!is.null(.x$error))
stopifnot(length(errors) == 0)

Any attempt at a forbidden function causes an error

# Get an error of 'could not find function' for all the things I haven't explicitly allowed
safe_eval("2 * 3"                   , whitelist)$error
<simpleError in 2 * 3: could not find function "*">
safe_eval("a <- 1"                  , whitelist)$error
<simpleError in a <- 1: could not find function "<-">
safe_eval("system('echo bad stuff')", whitelist)$error
<simpleError in system("echo bad stuff"): could not find function "system">
safe_eval("list.files('.')"         , whitelist)$error
<simpleError in list.files("."): could not find function "list.files">
safe_eval("lm(mpg ~ cyl, mtcars)"   , whitelist)$error
<simpleError in lm(mpg ~ cyl, mtcars): could not find function "lm">
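
The errors for `2 * 3` and `a <- 1` arise because operators and assignment are ordinary function calls in R, so `*` and `<-` must be found in the environment just like `system`. A small base R sketch makes this visible by listing the names each snippet refers to; the variable names `snippets` and `called` are chosen for this example only.

```r
# The snippets to inspect (a subset of the unsafe commands above).
snippets <- c("2 * 3", "a <- 1", "system('echo bad stuff')")

# all.names() walks the parsed call and returns every name it contains,
# including the function being called (and any variables, such as `a`).
called <- lapply(snippets, function(code) all.names(parse(text = code)[[1]]))
names(called) <- snippets
```

For these snippets the names are `*`, then `<-` and `a`, then `system`; none of the functions among them is in the whitelist, hence the 'could not find function' errors.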

Conclusion

For the code execution context of “Allow creation of objects”, an environment consisting of whitelisted functions seems to be able to ensure safe evaluation of code.

This means that (with the addition of some wrapper code) I could ensure safe deserialisation of most built-in object types with a better guarantee of re-creating the original object than the yaml package offers.
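
As a rough idea of what such wrapper code might look like, here is a base R sketch of a round trip: the object is serialised to text with `deparse()` and restored by evaluating that text against the whitelist. The helpers `to_text` and `from_text` are hypothetical names for this example, not part of any package, and the sketch skips the error capture that `safe_eval` provides.

```r
# Hypothetical serialiser: turn an object into R source text.
to_text <- function(obj) {
  paste(deparse(obj), collapse = "\n")
}

# Hypothetical deserialiser: evaluate the text with access to only the
# whitelisted constructor functions.
from_text <- function(text,
                      whitelist = c("structure", "c", "list",
                                    "data.frame", "matrix")) {
  env <- new.env(parent = emptyenv())
  for (fn in whitelist) {
    assign(fn, get(fn, envir = baseenv(), mode = "function"), envir = env)
  }
  eval(parse(text = text), envir = env)
}

original <- list(a = 1, b = "hello", c = c(TRUE, NA))
restored <- from_text(to_text(original))
round_trip_ok <- identical(original, restored)
```

Objects whose deparsed text contains operators would need more than this whitelist: for example `1:3` deparses to `1:3`, which calls `:`, and a negative literal such as `-1` parses as a call to `-`.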

Have a look at the Shiny app using this safe(?) evaluation of R code.

If you manage to do something ‘unsafe’ please let me know!