# Shortest unique prefix - second attempt

## Shortest unique prefix - hack part #2

• I want to find the shortest unique prefix for each string in a vector (N < 20)
• i.e. in a vector of strings, what’s the shortest a string can be and still match only itself
• This isn’t a computational bottleneck in the overall process, so there’s no need for heroic/drastic solutions.
• I wasn’t going to write a trie implementation just to solve this problem.

Yesterday’s solution got a bit ugly.

Today I remembered there’s always the function you need in base R, and found `pmatch()` and `charmatch()` which do partial string matching.

`charmatch` seeks matches for the elements of its first argument among those of its second. If there is a single exact match or a unique partial match, then the index of the matching value is returned.

`charmatch` does all the heavy lifting of matching a full set of prefixes against the full set of strings. Since the creation of all the prefixes is done from shortest to longest, then the first prefix that matches only a single string is the shortest unique prefix for that string.

``````library(tidyverse)

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Shortest Unique Prefix
#
# - Number of strings is always < 20
# - 'strings' input does not contain duplicates
# - this isn't a computational bottleneck in the process, so any neat solution
#   is acceptable
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
find_shortest_unique_prefix <- function(strings) {

stopifnot(!any(duplicated(strings)))

# Calculate all possible prefixes of all strings
all_prefixes <- strings %>%
map(str_sub, end=seq(max(nchar(strings)))) %>%
flatten_chr()

# find 'charmatch'es
matches <- charmatch(all_prefixes, strings)
idx     <- !duplicated(matches) & matches!=0

all_prefixes[idx]
}``````
``find_shortest_unique_prefix(c("ab", "apple", "apart", 'b', 'ag'))``
``[1] "ab"  "app" "apa" "b"   "ag" ``
``find_shortest_unique_prefix(c('blue', 'black', 'bold'))``
``[1] "blu" "bla" "bo" ``