Introducing {priorityqueue} - two simple priority queues. One written in R, the other in C

priorityqueue

R-CMD-check

priorityqueue is a package with two priority queue implementations for R: one written in C, and the other written in plain R.

Both of these are max-priority-queues i.e. “popping” from the queue returns the value with highest priority.

These priority queues support any R object. The priority can be any finite double() value.

What’s in the box

Two priority queue implementations with identical call APIs.

  • Pure R priority queue:
    • rpq_create(), rpq_insert(rpq, obj, priority), rpq_pop(rpq)
  • C-based priority queue:
    • cpq_create(), cpq_insert(cpq, obj, priority), cpq_pop(cpq)

Installation

You can install from GitHub with:

# install.package('remotes')
remotes::install_github('coolbutuseless/priorityqueue')

Example

Priority queues are a data-structure to store objects along with a priority with each object.

When popping a value off the queue, the object with the largest priority is returned.

library(priorityqueue)

rpq <- rpq_create()
rpq_is_empty(rpq)
#> [1] TRUE
rpq_insert(rpq, obj = "#RStats", priority = 2)
rpq_insert(rpq, obj = "hello"  , priority = 3)
rpq_insert(rpq, obj = 2 * 2    , priority = 1)

rpq_length(rpq)
#> [1] 3
# Retrieve the objects from largest priority to smallest
rpq_pop(rpq)
#> [1] "hello"
rpq_pop(rpq)
#> [1] "#RStats"
rpq_pop(rpq)
#> [1] 4

Rationale

I needed a data structure in R which would allow me to insert/pop items based upon a priority value - what is known as a priority queue.

I wrote a quick solution in R that met my basic needs i.e. an environment containing a list of objects and a vector of values, plus some standard functions to manipulate. This code is the basis of the rpq_* functions in this package.

I then wondered how this could be done with C, and as a learning exercise I wrote a minimal priority queue in C which supported R objects. This was tidied to be the C prioirty queue (cpq_*) in this package.

Timing

In general the C implementation is faster and uses less memory than the R implementation (no surprises there).

For larger queues, the C implementation can be ~10x faster and use 1/30th the memory.

This implementation of a priority queue compares favourably with the priority queue included in the {collections} package on CRAN.

Click to show/hide the timing code
r_priority_queue <- function(N) {
  set.seed(1)
  rpq <- rpq_create()
  
  for (i in 1:(N*2)) {
    rpq_insert(rpq, 1, runif(1))
    rpq_insert(rpq, 1, runif(1))
  }
  
  for (i in 1:N) {
    rpq_insert(rpq, 1, runif(1))
    rpq_insert(rpq, 1, runif(1))
    rpq_pop(rpq)
    rpq_pop(rpq)
    rpq_pop(rpq)
  }
}


cpq_priority_queue <- function(N) {
  set.seed(1)
  cpq <- cpq_create()
  
  for (i in 1:(N*2)) {
    cpq_insert(cpq, 1, runif(1))
    cpq_insert(cpq, 1, runif(1))
  }
  
  for (i in 1:N) {
    cpq_insert(cpq, 1, runif(1))
    cpq_insert(cpq, 1, runif(1))
    cpq_pop(cpq)
    cpq_pop(cpq)
    cpq_pop(cpq)
  }
}



collections_priority_queue <- function(N) {
  set.seed(1)
  colq <- collections::priority_queue()
  
  for (i in 1:(N*2)) {
    colq$push(1, runif(1))
    colq$push(1, runif(1))
  }
  
  for (i in 1:N) {
    colq$push(1, runif(1))
    colq$push(1, runif(1))
    colq$pop()
    colq$pop()
    colq$pop()
  }
}


res <- bench::press(
  N = c(10, 100, 1000),
  bench::mark(
    `Plain R priority queue` = r_priority_queue(N), 
    `C priority queue` = cpq_priority_queue(N),
    `collections::priority_queue()` = collections_priority_queue(N)
  )
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
expression N itr/sec mem_alloc
Plain R priority queue 10 774.412196 408.02KB
C priority queue 10 3443.282577 226.52KB
collections::priority_queue() 10 3271.967561 278.96KB
Plain R priority queue 100 61.281542 7.94MB
C priority queue 100 370.743903 1.48MB
collections::priority_queue() 100 317.896057 1.47MB
Plain R priority queue 1000 1.463592 645.93MB
C priority queue 1000 36.510057 14.73MB
collections::priority_queue() 1000 29.865650 14.71MB

Technical Bits

Plain R priority queue

  • Implemented as an environment with storage for a list of objects and a vector of priorities.
  • New objects and their priorities are appended to the list/vector in the environment.
  • When a value is popped, the maximum priority is found (using whichmax()) and the matching object is returned

C priority queue

  • A heap-based priority queue.
  • Creating a cpq creates an ExternalPointer to a cpq_t object which contains the queue and other information all entirely within C.
  • An R list is attached to this ExternalPointer in C (as a protected item on the pointer). When new R objects are added to the priority queue, they are also added to this list to prevent them from being garbage collected.

Acknowledgements

  • R Core for developing and maintaining the language.
  • CRAN maintainers, for patiently shepherding packages onto CRAN and maintaining the repository