Problem: p.which.min()
- Input:
vs
A list of N vectors of numeric values (each the same length, L) - Output: For each index along the vectors (1..L)
- the minimum value out of the N vectors at this index
- the index of the vector that had the minimum (1..N)
- Conditions:
- If there are multiple vectors with the same minimum at the given index, it doesn’t matter which is returned.
- The values are floating points (not integers).
This problem is a combination of:
* find the parallel `min`, and
* find the parallel `which.min`
.. and so it shall be named p.which.min()
!!
Preamble
Some R functions discussed below operate on lists of vectors, others operator on matrixes. The following 2 functions switch between the two data representations.
vs_to_mat <- function(vs) { do.call(cbind, vs) }
mat_to_vs <- function(mat) {lapply(seq_len(ncol(mat)), function(i) mat[,i])}
Example
- An example with N=3, L=5
vs <- list(
c(3, 4, 6, 9, 2),
c(9, 9, 7, 6, 1),
c(2, 2, 7, 4, 8)
)
mat <- vs_to_mat(vs)
mat
[,1] [,2] [,3]
[1,] 3 9 2
[2,] 4 9 2
[3,] 6 7 7
[4,] 9 6 4
[5,] 2 1 8
In this example, the parallel minimum across all vectors is c(2, 2, 6, 4, 1)
and the index of the
vector in which the minimum occurs is c(3, 3, 1, 3, 2)
# desired result
list(
min = c(2, 2, 6, 4, 1),
index = c(3, 3, 1, 3, 2)
)
Create test data
- This is the function used to create benchmarking data, e.g.
create_test_data(N=100, L=100)
#-----------------------------------------------------------------------------
# Create a list of N numeric vectors, each with L elements
#-----------------------------------------------------------------------------
create_test_data <- function(N, L) {
set.seed(1)
seq(N) %>% purrr::map(~runif(L))
}
#-----------------------------------------------------------------------------
# Benchmarking data
#-----------------------------------------------------------------------------
vs.bench <- create_test_data(300, 300)
mat.bench <- vs_to_mat(vs.bench)
Find the parallel minimum: The pmin()
function
The pmin
function is part of base R and are exactly what is needed
for the first part of this problem.
pmin
returns the parallel minima of the input values. That is, it will return the value of
the minimum at each index (but not the index of the vector from which it came)
pmin
takes a ...
argument, so either pass in all the vectors one-at-a-time in the function call
or wrap it with a do.call()
i.e. do.call(pmin.int, vs)
pmin(c(3, 4, 6, 9, 2), c(9, 9, 7, 6, 1), c(2, 2, 7, 4, 8))
[1] 2 2 6 4 1
do.call(pmin, vs)
[1] 2 2 6 4 1
There is also a built-in pmin.int
. The int
suffix stands for internal
, not integer
!
This is a faster version of the pmin
function when all arguments are atomic vectors
and there are no classes.
do.call(pmin.int, vs)
[1] 2 2 6 4 1
Find the parallel which.min
: The max.col
function
The max.col(matrix)
function finds the maximum position for each row of a matrix. For some
reason there is no corresponding min.col
, but this can be faked with max.col(-matrix)
.
This only operates on a matrix, so the vs
list will need to be converted to a matrix
mat <- vs_to_mat(vs)
max.col(-mat)
[1] 3 3 1 3 2
R solution - using pmin.int
and max.col
#-----------------------------------------------------------------------------
# Find parallel min and which.min over a list of vectors
#-----------------------------------------------------------------------------
p.which.min <- function(vs) {
mins <- do.call(pmin.int, vs)
mat <- vs_to_mat(vs)
index <- max.col(-mat)
list(
min = mins,
index = index
)
}
p.which.min(vs)
$min
[1] 2 2 6 4 1
$index
[1] 3 3 1 3 2
Naive Rcpp code
- Initial Rcpp code for
- Anyone have a more idiomatic approach?
Rcpp::cppFunction('
List p_which_min_Rcpp(List vs) {
int N = vs.size();
NumericVector v = vs[0];
int L = v.size();
NumericVector min_so_far(L);
IntegerVector min_index(L); // Automatically initialised to zero
// Initialise the "min so far" to just be the first vector
min_so_far = v;
// Iterate over each vector in turn (starting from the second one)
for (int i=1; i<N; i++) {
NumericVector vi = vs[i];
for (int j=0; j<L; j++) {
if (vi[j] < min_so_far[j]) {
min_so_far[j] = vi[j];
min_index[j] = i;
}
}
}
List ret;
ret["min"] = min_so_far;
ret["index"] = min_index + 1; // convert to R based indexing
return ret;
}')
p_which_min_Rcpp(vs)
$min
[1] 2 2 6 4 1
$index
[1] 3 3 1 3 2
Benchmarking
Unit: microseconds
expr min lq mean median uq max
p.which.min 1034.128 1558.653 1850.7818 1651.3805 1814.9050 9661.718
p_which_min_Rcpp 73.336 84.684 106.9455 93.4695 108.6985 1276.364
neval
200
200
Conclusions and Extensions
- Anyone know of a better pure R solution?
- Would still like an idiomatic Rcpp version.