priorityqueue
priorityqueue
is a package with two
priority queue implementations for R:
one written in C, and the other written in plain R.
Both of these are max-priority-queues i.e. “popping” from the queue returns the value with highest priority.
These priority queues support any R object. The priority can be any finite double()
value.
What’s in the box
Two priority queue implementations with identical call APIs.
- Pure R priority queue:
rpq_create()
,rpq_insert(rpq, obj, priority)
,rpq_pop(rpq)
- C-based priority queue:
cpq_create()
,cpq_insert(cpq, obj, priority)
,cpq_pop(cpq)
Installation
You can install from GitHub with:
# install.package('remotes')
remotes::install_github('coolbutuseless/priorityqueue')
Example
Priority queues are a data-structure to store objects along with a priority with each object.
When popping a value off the queue, the object with the largest priority is returned.
library(priorityqueue)
rpq <- rpq_create()
rpq_is_empty(rpq)
#> [1] TRUE
rpq_insert(rpq, obj = "#RStats", priority = 2)
rpq_insert(rpq, obj = "hello" , priority = 3)
rpq_insert(rpq, obj = 2 * 2 , priority = 1)
rpq_length(rpq)
#> [1] 3
# Retrieve the objects from largest priority to smallest
rpq_pop(rpq)
#> [1] "hello"
rpq_pop(rpq)
#> [1] "#RStats"
rpq_pop(rpq)
#> [1] 4
Rationale
I needed a data structure in R which would allow me to insert/pop items based upon a priority value - what is known as a priority queue.
I wrote a quick solution in R that met my basic needs i.e. an environment
containing a list of objects and a vector of values, plus some standard functions
to manipulate. This code is the basis of the rpq_*
functions in this package.
I then wondered how this could be done with C, and as a learning exercise I
wrote a minimal priority queue in C which supported R objects. This was tidied
to be the C prioirty queue (cpq_*
) in this package.
Timing
In general the C implementation is faster and uses less memory than the R implementation (no surprises there).
For larger queues, the C implementation can be ~10x faster and use 1/30th the memory.
This implementation of a priority queue compares favourably with the priority queue included in the {collections} package on CRAN.
Click to show/hide the timing code
r_priority_queue <- function(N) {
set.seed(1)
rpq <- rpq_create()
for (i in 1:(N*2)) {
rpq_insert(rpq, 1, runif(1))
rpq_insert(rpq, 1, runif(1))
}
for (i in 1:N) {
rpq_insert(rpq, 1, runif(1))
rpq_insert(rpq, 1, runif(1))
rpq_pop(rpq)
rpq_pop(rpq)
rpq_pop(rpq)
}
}
cpq_priority_queue <- function(N) {
set.seed(1)
cpq <- cpq_create()
for (i in 1:(N*2)) {
cpq_insert(cpq, 1, runif(1))
cpq_insert(cpq, 1, runif(1))
}
for (i in 1:N) {
cpq_insert(cpq, 1, runif(1))
cpq_insert(cpq, 1, runif(1))
cpq_pop(cpq)
cpq_pop(cpq)
cpq_pop(cpq)
}
}
collections_priority_queue <- function(N) {
set.seed(1)
colq <- collections::priority_queue()
for (i in 1:(N*2)) {
colq$push(1, runif(1))
colq$push(1, runif(1))
}
for (i in 1:N) {
colq$push(1, runif(1))
colq$push(1, runif(1))
colq$pop()
colq$pop()
colq$pop()
}
}
res <- bench::press(
N = c(10, 100, 1000),
bench::mark(
`Plain R priority queue` = r_priority_queue(N),
`C priority queue` = cpq_priority_queue(N),
`collections::priority_queue()` = collections_priority_queue(N)
)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
expression | N | itr/sec | mem_alloc |
---|---|---|---|
Plain R priority queue | 10 | 774.412196 | 408.02KB |
C priority queue | 10 | 3443.282577 | 226.52KB |
collections::priority_queue() | 10 | 3271.967561 | 278.96KB |
Plain R priority queue | 100 | 61.281542 | 7.94MB |
C priority queue | 100 | 370.743903 | 1.48MB |
collections::priority_queue() | 100 | 317.896057 | 1.47MB |
Plain R priority queue | 1000 | 1.463592 | 645.93MB |
C priority queue | 1000 | 36.510057 | 14.73MB |
collections::priority_queue() | 1000 | 29.865650 | 14.71MB |
Technical Bits
Plain R priority queue
- Implemented as an environment with storage for a list of objects and a vector of priorities.
- New objects and their priorities are appended to the list/vector in the environment.
- When a value is popped, the maximum priority is found (using
whichmax()
) and the matching object is returned
C priority queue
- A heap-based priority queue.
- Creating a
cpq
creates an ExternalPointer to acpq_t
object which contains the queue and other information all entirely within C. - An R list is attached to this ExternalPointer in C (as a protected item on the pointer). When new R objects are added to the priority queue, they are also added to this list to prevent them from being garbage collected.
Acknowledgements
- R Core for developing and maintaining the language.
- CRAN maintainers, for patiently shepherding packages onto CRAN and maintaining the repository