# Memoisation

## Memoisation

Memoisation allows the caching of the output for a function for a given set of inputs.

That is, for a set of arguments the result is only ever calculated once and then stored in a cache. If those set of arguments ever happen again, then the result is pulled from the cache rather than the function being re-run.

This technique is really useful when a function takes a long time to execute.

There are at least 2 memoisation packages in R:

I will use `memoise` simply because I’m more familiar with it.

## My problem

A part of my solution method for nonograms was to generate all possible patterns for a particular row based upon the clues. (First discussed in this post)

The problem boils down to the following:

• Find all integer sequences (of the target length) which add to the target sum
• integer here means positive, non-zero whole number
• target length is
• always greater than 1
• never greater than 35 (for the nonograms I’m attempting to solve)
• target sum is never less than the target length
``````#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#' Find all permutations of all vectors of positive integers of a given length
#' which sum to a given value
#'
#' e.g. find_integers_with_sum(4, 2) gives
#'    1, 3
#'    2, 2
#'    3, 1
#'
#' @param target_length target length
#' @param target_sum target sum
#' @return matrix or raw values, with each row being a solution
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
find_integers_with_sum <- function(target_sum, target_length) {

if (target_length == 1L) {
return(matrix(as.raw(target_sum)))
}

max_choice <- target_sum - target_length + 1L
solutions  <- vector('list', max_choice)
lens       <- integer(max_choice)

for (this_value in seq_len(max_choice)) {
solution                <- find_integers_with_sum(target_sum - this_value, target_length - 1L)
solutions[[this_value]] <- solution
lens[this_value]        <- nrow(solution)
}

cbind(as.raw(rep.int(seq_len(max_choice), lens)), do.call(rbind, solutions), deparse.level = 0L)
}``````
``find_integers_with_sum(target_sum=4, target_length=2)``
``````     [,1] [,2]
[1,]   01   03
[2,]   02   02
[3,]   03   01``````

## The challenge

When the `target_sum` and `target_length` get “large”, the number of solutions becomes huge.

• `find_integers_with_sum(25, 10)`
• 1,307,504 solutions
• 10 seconds to generate.
• 13MB matrix
• `find_integers_with_sum(30, 10)`
• 10 million solutions
• 60 seconds to generate
• 100MB matrix
• `find_integers_with_sum(35, 10)`
• over 52 million solutions
• takes forever to generate (I’ve honestly never done it without memoising!)
• 525MB matrix

The way `find_integers_with_sums()` recurses to find its solution means that it’s calling itself in a nested way to solve smaller and smaller versions of the problem.

This means that inner-most, smallest versions of the problem are solved mutiple times, but are always producing the same result.

This function will benefit from cacheing those inner results.

## Memoisation

To memoise this function, I will be using the `memoise` package.

I will consider three possible backends:

1. In-memory
2. Filesystem in RAM disk (on OSX)
• diskutil erasevolume HFS+ ‘ramdisk’ `hdiutil attach -nomount ram://1165430`
3. Filesystem on SSD

I will be using `find_integers_with_sum(30, 10)` as my test case.

### No cache

``````# Before doing anything else, I'm keeping a copy of the original un-cached function
find_integers_with_sum_ <- find_integers_with_sum

# run without memoisation
system.time({find_integers_with_sum(30, 10)})
system.time({find_integers_with_sum(30, 10)})``````

### In-memory cache

``````find_integers_with_sum  <- memoise(find_integers_with_sum_, cache = cache_memory())
system.time({find_integers_with_sum(30, 10)})
system.time({find_integers_with_sum(30, 10)})``````

### Filesystem on Ram disk

``````path <- "/Volumes/ramdisk"
find_integers_with_sum  <- memoise::memoise(find_integers_with_sum_, cache=cache_filesystem(path))
system.time({find_integers_with_sum(30, 10)})
system.time({find_integers_with_sum(30, 10)})``````

### Filesystem on SSD

``````path <- tempdir()
find_integers_with_sum  <- memoise::memoise(find_integers_with_sum_, cache=cache_filesystem(path))
system.time({find_integers_with_sum(30, 10)})
system.time({find_integers_with_sum(30, 10)})``````

## Timing Summary

Table 1: Run times
cache first_run second_run
None 60.00 60.000
Memory 1.60 0.001
Ram disk 3.28 0.130
SSD 3.46 0.130

## Future

• Consider trying the following modifications to `memoise`
• Add compression option to the filesystem cache (and s3 and gcs caches)
• Revisit a way for caches to have limitations on the maximum size of a result which gets cached. I tried this in a recent post
• Investigate a multi-tiered cache with small results saved to ram and bigger results saved to the filesystem.