# Memoise with object size limit

## Problem

• I have a function which takes 2 arguments and returns a matrix.
• Returned values for the most common inputs are under 1MB but some input values result in returned matrices up to 5GB in size (thanks to a combinatorial explosion!)
• I would like to memoise the function using the memoise package, but there doesn’t seem to be a way to limit memory usage (someone please correct me on twitter if there is a way!)
• The following is an adapted version of memoise::memoise() which only caches objects if they are less than the given object_size_limit, which is set to a default of 1MB.

## Solution

• I made the tiniest of tweaks the core memoise::memoise() function to:
• Add an extra function argument object_size_limit
• Check the object size before storing
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#' A version of 'memoise::memoise' with limits on individual object size
#'
#' @param f     Function of which to create a memoised copy.
#' @param ... optional variables specified as formulas with no RHS to use as
#' additional restrictions on caching. See Examples for usage.
#' @param envir Environment of the returned function.
#' @param cache Cache function.
#' @param object_size_limit maximum size of objects stored in cache.
#'        Default: 1048576 bytes (1MB)
#'
#'
#' @import memoise
#' @importFrom stats setNames
#' @importFrom digest digest
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
memoise_with_size_limit <- function (f, ..., envir = environment(f),
cache = memoise::cache_memory(),
object_size_limit = 1048576L) {
f_formals <- formals(args(f))
if (memoise::is.memoised(f)) {
stop("f must not be memoised.", call. = FALSE)
}
f_formal_names <- names(f_formals)
f_formal_name_list <- lapply(f_formal_names, as.name)
init_call_args <- setNames(f_formal_name_list, f_formal_names)
init_call <- memoise:::make_call(quote(_f), init_call_args)
memoise:::validate_formulas(...)
memo_f <- eval(bquote(function(...) {
called_args <- as.list(match.call())[-1]
default_args <- Filter(function(x) !identical(x, quote(expr = )),
as.list(formals()))
default_args <- default_args[setdiff(names(default_args),
names(called_args))]
args <- c(lapply(called_args, eval, parent.frame()),
lapply(default_args, eval, envir = environment()))
hash <- _cache$digest(c(body(_f), args, lapply(_additional, function(x) eval(x[[2L]], environment(x))))) if (_cache$has_key(hash)) {
res <- _cache$get(hash) } else { res <- withVisible(.(init_call)) #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Check size and only store if < object_size_limit #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ if (pryr::object_size(res) < .(object_size_limit)) { _cache$set(hash, res)
}
}
if (res$visible) { res$value
}
else {
invisible(res$value) } }, as.environment(list(init_call = init_call, object_size_limit = object_size_limit)))) formals(memo_f) <- f_formals attr(memo_f, "memoised") <- TRUE if (is.null(envir)) { envir <- baseenv() } memo_f_env <- new.env(parent = envir) memo_f_env$_cache <- cache
memo_f_env$_f <- f memo_f_env$_additional <- additional
environment(memo_f) <- memo_f_env
class(memo_f) <- c("memoised", "function")
memo_f
}

## Future

• Is there a way to do this with the additonal ... argument to memoise()?
• You could really go-to-town customising memoise:
• Add ability to remove objects from the cache
• Add ability to store last access time for an object, and hence remove objects which have not been accessed in some time.
• Write a version which lets you set the maximum size of the entire cache, and then
• not adding any more objects to the cache
• making room by deleting objects from the cache which haven’t been accessed recently