Memoise with object size limit

Problem

  • I have a function which takes 2 arguments and returns a matrix.
  • Returned values for the most common inputs are under 1MB but some input values result in returned matrices up to 5GB in size (thanks to a combinatorial explosion!)
  • I would like to memoise the function using the memoise package, but there doesn’t seem to be a way to limit memory usage (someone please correct me on twitter if there is a way!)
  • The following is an adapted version of memoise::memoise() which only caches objects if they are less than the given object_size_limit, which is set to a default of 1MB.

Solution

  • I made the tiniest of tweaks the core memoise::memoise() function to:
    • Add an extra function argument object_size_limit
    • Check the object size before storing
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#' A version of 'memoise::memoise' with limits on individual object size
#'
#' @param f     Function of which to create a memoised copy.
#' @param ... optional variables specified as formulas with no RHS to use as
#' additional restrictions on caching. See Examples for usage.
#' @param envir Environment of the returned function.
#' @param cache Cache function.
#' @param object_size_limit maximum size of objects stored in cache.
#'        Default: 1048576 bytes (1MB)
#'
#'
#' @import memoise
#' @importFrom stats setNames
#' @importFrom digest digest
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
memoise_with_size_limit <- function (f, ..., envir = environment(f), 
                                     cache = memoise::cache_memory(), 
                                     object_size_limit = 1048576L) {
  f_formals <- formals(args(f))
  if (memoise::is.memoised(f)) {
    stop("`f` must not be memoised.", call. = FALSE)
  }
  f_formal_names <- names(f_formals)
  f_formal_name_list <- lapply(f_formal_names, as.name)
  init_call_args <- setNames(f_formal_name_list, f_formal_names)
  init_call <- memoise:::make_call(quote(`_f`), init_call_args)
  memoise:::validate_formulas(...)
  additional <- list(...)
  memo_f <- eval(bquote(function(...) {
    called_args <- as.list(match.call())[-1]
    default_args <- Filter(function(x) !identical(x, quote(expr = )),
                           as.list(formals()))
    default_args <- default_args[setdiff(names(default_args),
                                         names(called_args))]
    args <- c(lapply(called_args, eval, parent.frame()),
              lapply(default_args, eval, envir = environment()))
    hash <- `_cache`$digest(c(body(`_f`), args, lapply(`_additional`,
                                                       function(x) eval(x[[2L]], environment(x)))))
    if (`_cache`$has_key(hash)) {
      res <- `_cache`$get(hash)
    }
    else {
      res <- withVisible(.(init_call))
      #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      # Check size and only store if < object_size_limit
      #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      if (pryr::object_size(res) < .(object_size_limit)) {
        `_cache`$set(hash, res)
      }
    }
    if (res$visible) {
      res$value
    }
    else {
      invisible(res$value)
    }
  }, as.environment(list(init_call = init_call, object_size_limit = object_size_limit))))
  formals(memo_f) <- f_formals
  attr(memo_f, "memoised") <- TRUE
  if (is.null(envir)) {
    envir <- baseenv()
  }
  memo_f_env <- new.env(parent = envir)
  memo_f_env$`_cache` <- cache
  memo_f_env$`_f` <- f
  memo_f_env$`_additional` <- additional
  environment(memo_f) <- memo_f_env
  class(memo_f) <- c("memoised", "function")
  memo_f
}

Future

  • Is there a way to do this with the additonal ... argument to memoise()?
  • You could really go-to-town customising memoise:
    • Add ability to remove objects from the cache
    • Add ability to store last access time for an object, and hence remove objects which have not been accessed in some time.
    • Write a version which lets you set the maximum size of the entire cache, and then
      • not adding any more objects to the cache
      • making room by deleting objects from the cache which haven’t been accessed recently