mikefc

Problem

This problem popped up as a necessary step to solving a type of puzzle (which I’ll post about soon).

  • From the set of all positive (non-zero) integers, find all the sequences of the specified length which have the given sum.
  • Need all orderings
  • sums usually range from 10 to 30
  • lengths usually range from 2 to 10

Example: All the length=3 vectors which sum to 5 (one sequence per row)

     [,1] [,2] [,3]
[1,]    1    1    3
[2,]    1    2    2
[3,]    1    3    1
[4,]    2    1    2
[5,]    2    2    1
[6,]    3    1    1

Naive solution - generate all sequences, then filter

  • Use expand grid to expand all possible sequences.
  • Then filter where rowSums() is equal to the required target_sum
  • Scales really badly as the sum and length targets get bigger
find_sequences_with_sum_naive <- function(target_sum, target_length) {
  res <- do.call(expand.grid, replicate(n=target_length, seq(target_sum - target_length + 1L), simplify = FALSE))
  res[rowSums(res) == target_sum,]
}

find_sequences_with_sum_naive(5, 3)
   Var1 Var2 Var3
3     3    1    1
5     2    2    1
7     1    3    1
11    2    1    2
13    1    2    2
19    1    1    3

Iterative solution with back-tracking

  • build up vectors one digit at a time
  • stop if the required length or sum is exceeded and try a different sequence
find_sequences_with_sum <- function(target_sum, target_length, current_vec=c(), N=0L, S=0L) {

  stopifnot(target_sum >= target_length)
  
  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  # If we're at the right length, have we found a solution?
  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  if (N == target_length) {
    if (S == target_sum) {
      return(list(current_vec))
    } else {
      return()
    }
  }

  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  # Limit the choices in the next iteration to just ones that are possible
  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  max_choice <- (target_sum - S) - (target_length - N) + 1L
  choices <- 1:max_choice

  
  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  # For each choice, add it to the current_vec and see if that is a pathway
  # to a solution
  #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  solutions <- list()
  for (new_val in choices) {
    this_solution <- find_sequences_with_sum(target_sum, target_length, c(current_vec, new_val), N=N+1L, S=S+new_val)
    solutions <- c(solutions, this_solution)
  }

  solutions
}


do.call(rbind, find_sequences_with_sum(5, 3)) 
     [,1] [,2] [,3]
[1,]    1    1    3
[2,]    1    2    2
[3,]    1    3    1
[4,]    2    1    2
[5,]    2    2    1
[6,]    3    1    1

Quick benchmark

  • For small problems, results are roughly similar.

Small example

expression median itr/sec mem_alloc
find_sequences_with_sum_naive(5, 3) 294µs 3248.245 64.9KB
find_sequences_with_sum(5, 3) 370µs 2640.920 0B

Medium-sized example

  • Once the problem becomes moderately large, the back-tracking solution (to no-one’s surprise) is faster and uses less memory.
  • in this case, ~3x as fast and ~1000x less memory

expression median itr/sec mem_alloc
find_sequences_with_sum_naive(13, 7) 199.9ms 5.509087 104MB
find_sequences_with_sum(13, 7) 66.4ms 15.151297 142KB

Summary

  • Use the back-tracking solution