Shortest unique prefix

Shortest unique prefix - hack

  • I want to find the shortest unique prefix for each string in a vector (N < 20).
  • This isn’t a computational bottleneck, so no need for heroic/drastic solutions.
  • I wasn’t going to write a trie implementation just to solve this problem.

Unfortunately, this got a bit ugly :( .Anyone got a more elegant solution?

library(tidyverse)

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Shortest Unique Prefix
#
# - Number of strings is always < 20
# - 'strings' input does not contain duplicates
# - this isn't a computational bottleneck
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
find_shortest_unique_prefix <- function(strings) {
  
  # Determine uniqueness boolean vector of a set of arguments
  is_unique <- function(...) {x <- list(...); !(duplicated(x) | duplicated(x, fromLast=TRUE))}
  
  # for each possible prefix length, truncate all the strings
  # and see which of them are unique
  prefix_uniqueness <- strings %>% 
    map(str_sub, end=seq(max(nchar(strings)))) %>%
    pmap(is_unique) %>%
    transpose() 
  
  # Find the first occurrence of each string being unique
  shortest_prefix_length <- prefix_uniqueness %>% 
    map_int(
      ~.x %>% flatten_lgl() %>% which.max()
    )
  
  # Trim all the strings to the shortest prefix length
  shortest_prefix <- stringr::str_sub(strings, end=shortest_prefix_length)
  
  shortest_prefix
}
find_shortest_unique_prefix(c("ab", "apple", "apart", 'b', 'ag'))
[1] "ab"  "app" "apa" "b"   "ag" 
find_shortest_unique_prefix(c('blue', 'black', 'bold'))
[1] "blu" "bla" "bo"