Body modification part 2: Automatically creating type checks for a function

Problem: Can I automatically create type checking code for a function?

This post is mainly scratching an itch to learn about some function internals. I want to work out a way to have R generate all the type checks for a given function given some shorter definition.

Notes:

  • mypy - optional static typing for python recently popped up on twitter and got me thinking about types in R.
  • I know there’s type checking in the ensurer package, but I wanted to attempt this without any prior implementation clouding my learning, so I didn’t look at this until after I made something.
  • There are numerous packages to help with asserting a value is of a particlar type, but you have to write them all manually which I’m aiming to avoid.

Design decisions

Basic ideas:

  • Ability to specify single assertions on a variable only (for simplicity).
  • Not every argument has to be typed.
  • Support default values for an argument e.g. a=1
  • Need the solution to be incremental to an existing function call.

Two user-facing functions needed:

  1. typed_function: A function to take an augmented argument list and create a function which tests the arguments according to that specification
  2. %+body%: A function to merge the checking function into the beginning of the users function

Type specification

Since I’m keeping this simple, type checking is specified by naming a boolean function to check the argument, and it will dangle off the end of a current argument in the function definition. Some examples:

f <- function(a = 1L ~ is_integer)
f <- function(name ~ is_character)
f <- function(a = 1L ~ is_integer, b = 2, c = 3 ~ is_double)

To make things easier, I’m going to rely on rlang’s suite of testing functions e.g. is_integer, is_character etc

Result

The code is included at the end of this post in an appendix. Warning: it’s not pretty.

To create a typed function, the syntax is:

f <- typed_function(argument list) %+body% { body of function }

A simple example - the original function definition:

f <- function(a = 1L, b, c='cool') {
  cat(c, a+b, "\n")
}

Augment the function definition to include type checking

f <- typed_function(a = 1L ~ is_integer, b ~ is_integer, c = 'cool' ~ is_character) %+body% {
  cat(c, a+b, "\n")
}

The resulting function has a section of checks added to the beginning

function (a = 1L, b, c = "cool") 
{
    {
        stopifnot(!is_missing(a))
        stopifnot(is_integer(a))
        stopifnot(!is_missing(b))
        stopifnot(is_integer(b))
        stopifnot(!is_missing(c))
        stopifnot(is_character(c))
    }
    {
        cat(c, a + b, "\n")
    }
}

The resulting function in action, and throwing errors all over the place when the types don’t match

f()
f(b = 1)
f(b = 2L)
f(b = 2L, c = 3)
f(b = 2L, c = 'rstats')
> f()
Error: !is_missing(b) is not TRUE
> f(b = 1)
Error: is_integer(b) is not TRUE
> f(b = 2L)
cool 3 
> f(b = 2L, c = 3)
Error: is_character(c) is not TRUE
> f(b = 2L, c = 'rstats')
rstats 3 

It works!

Comparison to ensurer

After hacking together my idea, I looked at the ensurer package which also offers the (experimental) ability to create type safe functions. You’ll need to install from github for this functionality.

An example of an ensurer call to create a function with type checking

f <- function_(a ~ integer, b ~ character: "Hello, World!", {
  rep(b, a)
})
function (a, b = "Hello, World!") 
{
    type_integer(a)
    type_character(b)
    {
        rep(b, a)
    }
}

Comparison:

  • I like the use of function_() as the name.
  • It might seem confusing putting the actual body inside the function call? Maybe. I don’t think there’s a good solution. I don’t think my use of %+body% is any clearer.
  • ensurer goes to the trouble of defining its own type checking primitives, whereas I rely on rlang
  • I prefer that my specification keeps the layout for defining an argument and puts the type checking stuff after the regular stuff i.e. I think a = 1 ~ is_double is slightly clearer than a ~ double : 1

Conclusion

  • It sort of works, but I don’t think this idea is worth pursuing. The cure seems worse than the disease.
  • I learnt quite a bit about functions.
  • Possible extensions:
    • support other types - at the moment can only really process basic numeric/integers/characters because of the way the alist() is constructed
    • add a way to type check the return value of a function
    • support the ... argument
    • support multiple checks for each variable e.g. a = 1 ~ is_double & is_non_zero

Appendix - the code

#-----------------------------------------------------------------------------
#' Unpack a list element containing type information about an argument
#'
#' @param l an argument to `typed_function` e.g `a = 1 ~ is_integer`
#'
#' @return a single element list containing a list containing the name, value and boolean test function
#-----------------------------------------------------------------------------
unpack_arg <- function(l) {
  stopifnot(length(l) == 1)
  has_outer_name <- !is.null(names(l)) && names(l) != ''
  raw_value      <- l[[1]]
  has_type       <- is.call(raw_value) && raw_value[[1]] == '~'
  if (has_outer_name && has_type) {
    name  <- names(l)
    value <- raw_value[[2]]
    type  <- raw_value[[3]]
  } else if (has_outer_name && !has_type) {
    name  <- names(l)
    value <- raw_value
    type  <- NULL
  } else if (!has_outer_name && has_type) {
    name  <- as.character(raw_value[[2]])
    value <- "..NULL.."
    type  <- raw_value[[3]]
  } else if (!has_outer_name && !has_type) {
    name  <- as.character(raw_value)
    value <- "..NULL.."
    type  <- NULL
  } else {
    stop("Shouldn't happen")
  }

  list(
    list(
      name  = name,
      value = value,
      type  = type
    )
  )
}


#-----------------------------------------------------------------------------
#' Create a character representation of an argument to `alist()` for a single argument
#'
#'
#' @param unpacked_arg a list containing the name, value and boolean test function
#'
#' @return A text representation of an argument to pass to `alist()`
#-----------------------------------------------------------------------------
create_alist_argument <- function(unpacked_arg) {
  if (identical(unpacked_arg$value, "..NULL..")) {
    paste0(unpacked_arg$name, '=')
  } else {
    if (is.integer(unpacked_arg$value)) {
      paste0(unpacked_arg$name, '=', unpacked_arg$value, 'L')
    } else if (is.character(unpacked_arg$value)) {
      paste0(unpacked_arg$name, '= "', unpacked_arg$value, '"')
    } else {
      paste0(unpacked_arg$name, '=', unpacked_arg$value)
    }
  }
}


#-----------------------------------------------------------------------------
#' Create a list of formals for a function.
#'
#' This is a wrapper around the `alist()` command
#'
#' @param unpacked_args a list of argument info, containing one list for each
#'                      argument consisting of a list containing the name, value and boolean test function
#'
#' @return A set of formal arguments created by `alist()`
#-----------------------------------------------------------------------------
create_formals <- function(unpacked_args) {
  alist_args    <- unpacked_args %>% map_chr(create_alist_argument)
  alist_command <- sprintf("alist(%s)", paste(alist_args, collapse=", "))
  eval(parse(text = alist_command))
}






#-----------------------------------------------------------------------------
#' Create a type assertion for the given `typed_arg`
#'
#' @param typed_arg a list containing the name, value and boolean function to test argument
#'
#' @return code to test whether the argument passes the given boolean function
#-----------------------------------------------------------------------------
create_type_assertion <- function(typed_arg) {
  func_name <- as.character(typed_arg$type)
  if (!exists(func_name)) {
    stop("Check function not found: ", func_name)
  }

  func_sym <- as.name(func_name)
  arg_sym  <- as.name(typed_arg$name)

  bquote({
    stopifnot(!is_missing(.(arg_sym)))
    stopifnot(.(func_sym)(.(arg_sym)))
  })
}




#-----------------------------------------------------------------------------
#' Create a function consisting of checks based upon the arguments
#'
#' @param ... named arguments of the form:
#'   -    a
#'   -    a = [default value]
#'   -    a                   ~ [boolean func]
#'   -    a = [default value] ~ [boolean func]
#'
#' Boolean functions are best taken from `rlang` e.g. `rlang::is_integer`
#'
#' e.g.   typed_function(a = 1 ~ is_double, b ~ is_list)
#'
#' @return a function with the formal arguments used to call `typed_function`
#'         whose body only contains functions for checking those arguments
#'         against the specified boolean functions
#-----------------------------------------------------------------------------
typed_function <- function(...) {

  # Use the ... arguments to create a list of information
  # about each argument i.e. name, type,
  ll            <- exprs(...)
  arg_infos     <- ll %>% purrr::lmap(unpack_arg)
  args          <- create_formals(arg_infos)

  # Create a list of commands to check each of the arguments that
  # has type information
  body_calls <- arg_infos %>%
    discard(~is.null(.x$type)) %>%
    map(create_type_assertion) %>%
    map(rlang::call_args) %>%
    flatten()

  # Create the checking function and give it a body and formal arguments
  check_func          <- function() {}
  body(check_func)    <- rlang::call2('{', splice(body_calls))
  formals(check_func) <- args

  check_func
}



#-----------------------------------------------------------------------------
#' Append body contents to an existing function
#'
#' @param fun a function
#' @param new_body new content to be appended to existing body
#-----------------------------------------------------------------------------
`%+body%` <- function(fun, this_body) {
  body(fun) <- rlang::call2('{', body(fun), substitute(this_body))
  fun
}