Problem
- I have a function which takes 2 arguments and returns a matrix.
- Returned values for the most common inputs are under 1MB but some input values
result in returned matrices up to 5GB in size (thanks to a combinatorial explosion!)
- I would like to memoise the function
using the memoise package, but there doesn’t
seem to be a way to limit memory usage (someone please correct me on twitter if there is a way!)
- The following is an adapted version of
memoise::memoise()
which only caches
objects if they are less than the given object_size_limit
, which is set
to a default of 1MB.
Solution
- I made the tiniest of tweaks the core
memoise::memoise()
function to:
- Add an extra function argument
object_size_limit
- Check the object size before storing
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#' A version of 'memoise::memoise' with limits on individual object size
#'
#' @param f Function of which to create a memoised copy.
#' @param ... optional variables specified as formulas with no RHS to use as
#' additional restrictions on caching. See Examples for usage.
#' @param envir Environment of the returned function.
#' @param cache Cache function.
#' @param object_size_limit maximum size of objects stored in cache.
#' Default: 1048576 bytes (1MB)
#'
#'
#' @import memoise
#' @importFrom stats setNames
#' @importFrom digest digest
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
memoise_with_size_limit <- function (f, ..., envir = environment(f),
cache = memoise::cache_memory(),
object_size_limit = 1048576L) {
f_formals <- formals(args(f))
if (memoise::is.memoised(f)) {
stop("`f` must not be memoised.", call. = FALSE)
}
f_formal_names <- names(f_formals)
f_formal_name_list <- lapply(f_formal_names, as.name)
init_call_args <- setNames(f_formal_name_list, f_formal_names)
init_call <- memoise:::make_call(quote(`_f`), init_call_args)
memoise:::validate_formulas(...)
additional <- list(...)
memo_f <- eval(bquote(function(...) {
called_args <- as.list(match.call())[-1]
default_args <- Filter(function(x) !identical(x, quote(expr = )),
as.list(formals()))
default_args <- default_args[setdiff(names(default_args),
names(called_args))]
args <- c(lapply(called_args, eval, parent.frame()),
lapply(default_args, eval, envir = environment()))
hash <- `_cache`$digest(c(body(`_f`), args, lapply(`_additional`,
function(x) eval(x[[2L]], environment(x)))))
if (`_cache`$has_key(hash)) {
res <- `_cache`$get(hash)
}
else {
res <- withVisible(.(init_call))
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Check size and only store if < object_size_limit
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
if (pryr::object_size(res) < .(object_size_limit)) {
`_cache`$set(hash, res)
}
}
if (res$visible) {
res$value
}
else {
invisible(res$value)
}
}, as.environment(list(init_call = init_call, object_size_limit = object_size_limit))))
formals(memo_f) <- f_formals
attr(memo_f, "memoised") <- TRUE
if (is.null(envir)) {
envir <- baseenv()
}
memo_f_env <- new.env(parent = envir)
memo_f_env$`_cache` <- cache
memo_f_env$`_f` <- f
memo_f_env$`_additional` <- additional
environment(memo_f) <- memo_f_env
class(memo_f) <- c("memoised", "function")
memo_f
}
Future
- Is there a way to do this with the additonal
...
argument to memoise()
?
- You could really go-to-town customising memoise:
- Add ability to remove objects from the cache
- Add ability to store last access time for an object, and hence remove objects
which have not been accessed in some time.
- Write a version which lets you set the maximum size of the entire cache, and then
- not adding any more objects to the cache
- making room by deleting objects from the cache which haven’t been accessed recently