defaultlist - an R list with a default value

Problem: I want a list with a default value

R’s built-in list is great, but sometimes I’d like it to return something other than NULL when a name isn’t in the list.

Contrived motivating example

Say I want to use a list as a counter for elements in a stream of data, but I’m not sure beforehand which elements are present. Each time I want to increase the count for a particular name, I first have to check manually whether the name is already in the list. If it is, I add 1 to that entry; otherwise I set the counter for that item to 1.

counter <- list()

things <- c('bob', 'david', 'kate', 'susan', 'susan')

for (thing in things) {
  if (thing %in% names(counter)) {
    counter[[thing]] <- counter[[thing]] + 1
  } else {
    counter[[thing]] <- 1
  }
}

counter

$bob
[1] 1

$david
[1] 1

$kate
[1] 1

$susan
[1] 2

This is fine, but if I’ve got a lot of counters, I’d like something a bit less clunky.

Preferred syntax

I want something which looks pretty much like a list, but if the requested name is not in the list, then the default value is returned, rather than NULL.

In the case of a counter, I want this new defaultlist to return 0 if the requested name is not present.

counter <- defaultlist(0)

things <- c('bob', 'david', 'kate', 'susan', 'susan')

for (thing in things) {
  counter[[thing]] <- counter[[thing]] + 1
}

`defaultlist` implementation

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#' Create a list with a default value
#'
#' This behaves like a `list()` object, except that if the requested value
#' does not exist, a default value is returned (instead of NULL).
#'
#' Similar to a `defaultdict` in Python.
#'
#' @param value default value to return if item not in list
#'
#' @return new `defaultlist` object
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
defaultlist <- function(value) {
  structure(list(), class = 'defaultlist', value = value)
}

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Fetch value from defaultlist
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
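`[[.defaultlist` <- `$.defaultlist` <- function(x, y) {
  # Look up `y` in the underlying plain list; a missing name gives NULL
  res <- unclass(x)[[y]]
  # Fall back to the stored default when nothing was found
  if (is.null(res)) attr(x, 'value') else res
}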

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Print like a list
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
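print.defaultlist <- function(x, ...) {
  # Strip the extra attributes from a copy so it prints as a plain list
  plain <- x
  attr(plain, 'value') <- NULL
  attr(plain, 'class') <- NULL
  print(plain, ...)
  invisible(x)
}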
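
One behaviour worth spelling out: a lookup of a missing name returns the default but does not store it, so reading never grows the list. (Python’s `defaultdict`, by contrast, inserts the default on a missing-key lookup.) A minimal sketch, assuming the constructor and lookup method defined above, which are repeated here only so the block runs on its own; `cnt` is an illustrative name.

```r
# Constructor and lookup method repeated from above so this block is self-contained
defaultlist <- function(value) {
  structure(list(), class = 'defaultlist', value = value)
}

`[[.defaultlist` <- `$.defaultlist` <- function(x, y) {
  res <- unclass(x)[[y]]
  if (is.null(res)) attr(x, 'value') else res
}

cnt <- defaultlist(0)   # `cnt` is an illustrative name

cnt[['missing']]        # returns the default, 0
length(cnt)             # still 0: the lookup did not add an entry

# Only an assignment stores anything
cnt[['seen']] <- cnt[['seen']] + 1
names(cnt)              # only "seen" is stored
```

So the list only ever holds names that were explicitly assigned, which keeps `names()` and `length()` meaningful for a counter.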

`defaultlist` in action - list with a default of ‘0’

counter <- defaultlist(0)

things <- c('bob', 'david', 'kate', 'susan', 'susan')

for (thing in things) {
  counter[[thing]] <- counter[[thing]] + 1
}

counter

$bob
[1] 1

$david
[1] 1

$kate
[1] 1

$susan
[1] 2

Example - `defaultlist` with a default of FALSE

haystack <- defaultlist(FALSE)
haystack[['surprise']] <- TRUE

haystack[['hello?']]
[1] FALSE
haystack$mcfly
[1] FALSE
haystack[['anyone home?']]
[1] FALSE
haystack[['surprise']]
[1] TRUE
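
A missing name and a stored value that happens to equal the default look the same through `[[`. When the difference matters, checking `names()` is one way to tell them apart. This is a sketch under the assumption that the `defaultlist` definitions above are in use (repeated so the block runs on its own); `has_name` is a hypothetical helper, not part of the original code.

```r
# Definitions repeated from the implementation section
defaultlist <- function(value) {
  structure(list(), class = 'defaultlist', value = value)
}

`[[.defaultlist` <- `$.defaultlist` <- function(x, y) {
  res <- unclass(x)[[y]]
  if (is.null(res)) attr(x, 'value') else res
}

# Hypothetical helper: TRUE only if `name` was explicitly stored in `x`
has_name <- function(x, name) name %in% names(x)

haystack <- defaultlist(FALSE)
haystack[['surprise']] <- TRUE
haystack[['explicit no']] <- FALSE   # stored value equals the default

haystack[['explicit no']]            # FALSE, read from the list
haystack[['hello?']]                 # FALSE, supplied by the default

has_name(haystack, 'explicit no')    # TRUE
has_name(haystack, 'hello?')         # FALSE
```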

Extra Credit Example - using nested defaultlists to find the most common letter pairs in a stream of unknown characters

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Create nested defaultlists
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
counter <- defaultlist(defaultlist(0))

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Create a stream of characters heavily weighted towards 'a' and 'e'
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
set.seed(1)
stream <- sample(letters[1:5], 1000, replace = TRUE, prob = c(5, 1, 1, 1, 4))
head(stream)

[1] "a" "a" "e" "d" "a" "d"

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Count each pair of consecutive characters (prev, this)
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
for (idx in seq_along(stream)[-1]) {
  this <- stream[[idx]]
  prev <- stream[[idx - 1]]
  counter[[prev]][[this]] <- counter[[prev]][[this]] + 1
}

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# The most common letter pair in the stream is: a-a
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sort(unlist(counter), decreasing = TRUE)

a.a a.e e.a e.e a.b b.a b.e d.a a.c e.b a.d c.e c.a e.c e.d d.e b.b c.b c.c d.b 
178 138 137 108  41  39  37  35  32  32  30  30  30  28  27  19   8   8   8   7 
d.c b.d b.c d.d c.d 
  7   7   5   4   4
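
To make the nested update easier to follow, here is the same loop on a short stream that can be checked by hand. It is only a sketch, assuming the `defaultlist` definitions above (repeated so the block is self-contained); `tiny_stream`, `pair_counts` and `flat` are illustrative names. It also checks that the inner default stored on the outer list is never modified, which follows from R copying lists on modification.

```r
# Definitions repeated from the implementation section
defaultlist <- function(value) {
  structure(list(), class = 'defaultlist', value = value)
}

`[[.defaultlist` <- `$.defaultlist` <- function(x, y) {
  res <- unclass(x)[[y]]
  if (is.null(res)) attr(x, 'value') else res
}

pair_counts <- defaultlist(defaultlist(0))
tiny_stream <- c('a', 'b', 'a', 'b', 'a', 'b')

# Consecutive pairs are: a-b, b-a, a-b, b-a, a-b
for (idx in seq_along(tiny_stream)[-1]) {
  this <- tiny_stream[[idx]]
  prev <- tiny_stream[[idx - 1]]
  pair_counts[[prev]][[this]] <- pair_counts[[prev]][[this]] + 1
}

flat <- unlist(pair_counts)   # named vector; names have the form "prev.this"
flat[['a.b']]                 # 3
flat[['b.a']]                 # 2
names(which.max(flat))        # "a.b"

# The default stored on the outer list is still an empty defaultlist
length(attr(pair_counts, 'value'))   # 0
```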