Problem: %in% isn’t strict enough for some of my use cases
%in% will silently ignore
- arguments of different types
- RHS arguments that don’t match any of the possible LHS inputs
- Empty RHS or LHS
For example, the following is a nonsensical filter in which none of the elements we’re trying to
match against are actually in the Species vector!
iris %>%
filter(Species %in% c('leaf', 'tree', 'butterfly'))
# A tibble: 0 x 5
# … with 5 variables: Sepal.Length <dbl>, Sepal.Width <dbl>,
# Petal.Length <dbl>, Petal.Width <dbl>, Species <fct>
jonocarrol has brought up this issue as a pretty neat PR to the forcats package.
I’m not trying to be as elegant as he is, so I’m just going to make something work in isolation.
My use case:
- comparing atomic vectors only, usually character strings
- lhs and rhs arguments must be of the same class
- lhs and rhs arguments must have at least 1 element each
Some test cases.
Cases which should run without error
1:5 %in% 2:4 # Good
[1] FALSE TRUE TRUE TRUE FALSE
c('a', 'b') %in% 'a' # Good
[1] TRUE FALSE
Cases for which I’d like an explicit error
c('a', 'b') %in% 'A' # Bad: Some RHS values not in LHS
[1] FALSE FALSE
1:5 %in% c('a', 'b') # Bad: type mismatch
[1] FALSE FALSE FALSE FALSE FALSE
1:5 %in% 5:8 # Bad: Some RHS values not in LHS
[1] FALSE FALSE FALSE FALSE TRUE
1 %in% 1:5 # Bad: type mismatch
[1] TRUE
1:5 %in% integer(0) # Bad: empty RHS
[1] FALSE FALSE FALSE FALSE FALSE
integer(0) %in% 1:5 # Bad: empty LHS
logical(0)
1 %in% list(1, 2, 3) # Bad: RHS not atomic
[1] TRUE
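The silent results above follow from how base R documents %in%: it is a thin wrapper around match(), which coerces both sides to a common type before comparing. The sketch below illustrates that. The helper name in_via_match is hypothetical and only used here for comparison.

```r
# %in% is documented as match(x, table, nomatch = 0) > 0.
# `in_via_match` is a hypothetical stand-in written out for comparison.
in_via_match <- function(x, table) match(x, table, nomatch = 0L) > 0L

# Same answer as %in% for a type-mismatched comparison
a <- 1:5 %in% c('a', 'b')
b <- in_via_match(1:5, c('a', 'b'))
identical(a, b)
# [1] TRUE

# match() coerces both sides to a common type, so a number can match a string
1 %in% "1"
# [1] TRUE

# Lists are converted to character before matching, so this is not an error
1 %in% list(1, 2, 3)
# [1] TRUE
```

Because coercion happens before matching, a type mismatch never raises an error by itself, so a stricter operator has to check for it explicitly.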
%IN% - a stricter %in%
%IN% expects the following
- LHS and RHS are atomic
- LHS and RHS are not empty
- LHS and RHS are identical classes
- All elements in the RHS are in the LHS
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#' Stricter version of %in%
#'
#' Check arguments for type mismatch, emptiness, atomic-ness and assert that
#' all elements in RHS are also in the LHS
#'
#' @param lhs left-hand side
#' @param rhs right-hand side
#'
#' @return A logical vector the same length as 'lhs' which is TRUE if the
#' corresponding value in the lhs exists somewhere in the rhs.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'%IN%' <- function(lhs, rhs) {
  if (!is.atomic(lhs) || !is.atomic(rhs)) {
    stop("%IN%: LHS and RHS must be atomic")
  }
  if (length(rhs) == 0) {
    stop("%IN%: RHS has no elements")
  }
  if (length(lhs) == 0) {
    stop("%IN%: LHS has no elements")
  }
  if (!identical(class(lhs), class(rhs))) {
    stop("%IN%: Classes not identical. LHS: ", deparse(class(lhs)), " RHS: ", deparse(class(rhs)))
  }
  if (!all(rhs %in% lhs)) {
    stop("%IN%: Some RHS elements not in LHS - ", deparse(setdiff(rhs, lhs), nlines=1))
  }
  lhs %in% rhs
}
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Negated version
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'%NOTIN%' <- Negate('%IN%')
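As a rough illustration of what Negate() does here, the sketch below applies it to base %in% and to a toy strict predicate. The names %notin%, strict_in and strict_notin are hypothetical and exist only for this example. It suggests that a negated operator keeps the checks of the function it wraps, since errors raised inside are not swallowed.

```r
# Negate() returns a function that computes !f(...)
`%notin%` <- Negate(`%in%`)

1:5 %notin% 2:4
# [1]  TRUE FALSE FALSE FALSE  TRUE

# A toy strict predicate (hypothetical) that errors on an empty RHS
strict_in <- function(lhs, rhs) {
  if (length(rhs) == 0) stop("RHS has no elements")
  lhs %in% rhs
}
strict_notin <- Negate(strict_in)

# The error from the wrapped function still surfaces through the negated one
msg <- tryCatch(
  strict_notin(1:5, integer(0)),
  error = function(e) conditionMessage(e)
)
msg
# [1] "RHS has no elements"
```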
How %IN% handles cases which should run without error
1:5 %IN% 2:4 # Good
[1] FALSE TRUE TRUE TRUE FALSE
c('a', 'b') %IN% 'a' # Good
[1] TRUE FALSE
How %IN% handles cases for which I’d like an explicit error
c('a', 'b') %IN% 'A' # Bad: Some RHS values not in LHS
Error in c("a", "b") %IN% "A": %IN%: Some RHS elements not in LHS - "A"
1:5 %IN% c('a', 'b') # Bad: type mismatch
Error in 1:5 %IN% c("a", "b"): %IN%: Classes not identical. LHS: "integer" RHS: "character"
1:5 %IN% 5:8 # Bad: Some RHS values not in LHS
Error in 1:5 %IN% 5:8: %IN%: Some RHS elements not in LHS - 6:8
1 %IN% 1:5 # Bad: type mismatch
Error in 1 %IN% 1:5: %IN%: Classes not identical. LHS: "numeric" RHS: "integer"
1:5 %IN% integer(0) # Bad: empty RHS
Error in 1:5 %IN% integer(0): %IN%: RHS has no elements
integer(0) %IN% 1:5 # Bad: empty LHS
Error in integer(0) %IN% 1:5: %IN%: LHS has no elements
1 %IN% list(1, 2, 3) # Bad: RHS not atomic
Error in 1 %IN% list(1, 2, 3): %IN%: LHS and RHS must be atomic
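Going back to the opening iris example, here is a minimal sketch of how %IN% might behave on that data. It uses base subsetting rather than dplyr so the block runs on its own, and it repeats the %IN% definition in compact form for the same reason. Note the assumption that the factor column is converted with as.character() first, because Species is a factor and the class check would otherwise reject a character RHS.

```r
# Compact copy of %IN% from above so this block is self-contained
`%IN%` <- function(lhs, rhs) {
  if (!is.atomic(lhs) || !is.atomic(rhs)) stop("%IN%: LHS and RHS must be atomic")
  if (length(rhs) == 0) stop("%IN%: RHS has no elements")
  if (length(lhs) == 0) stop("%IN%: LHS has no elements")
  if (!identical(class(lhs), class(rhs))) {
    stop("%IN%: Classes not identical. LHS: ", deparse(class(lhs)),
         " RHS: ", deparse(class(rhs)))
  }
  if (!all(rhs %in% lhs)) {
    stop("%IN%: Some RHS elements not in LHS - ", deparse(setdiff(rhs, lhs), nlines = 1))
  }
  lhs %in% rhs
}

# Helper (hypothetical) to capture an error message instead of halting
err <- function(expr) tryCatch(expr, error = function(e) conditionMessage(e))

# Factor vs character: rejected by the class check
r1 <- err(iris$Species %IN% c('setosa', 'virginica'))
writeLines(r1)
# %IN%: Classes not identical. LHS: "factor" RHS: "character"

# After converting the factor, valid species pass and can be used to subset
keep <- as.character(iris$Species) %IN% c('setosa', 'virginica')
nrow(iris[keep, ])
# [1] 100

# The nonsensical filter from the start of the post now fails loudly
r2 <- err(as.character(iris$Species) %IN% c('leaf', 'tree', 'butterfly'))
writeLines(r2)
# %IN%: Some RHS elements not in LHS - c("leaf", "tree", "butterfly")
```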
Future
- I feel that this version of %IN% may be a little too strict to be useful.
- Not sure I’m really attached to the name %IN%
- I assume the vctrs package will solve all of this for me soon :)
- Waiting for someone to tell me that %IN% is equivalent to some base R function I’ve never heard of ;)