Finding names in R packages which could be domain names (using rvest and the tidyverse)

Introduction

Twitter likes to turn things which look like URLs into links.

I tweeted some rstats code last week and Twitter decided that seq.int was actually http://seq.int.

According to Wikipedia, the .int TLD has the “strictest application policies of all TLDs”, so there’s no chance of registering that particular domain.

Among the other names in the standard R packages, the tidyverse and a few other popular packages, which ones could be domain names?

Valid Top Level Domains (TLDs)

According to Wikipedia, the list changes frequently, so I’m just going to grab the TLDs today as a snapshot in time.

library(tidyverse)
library(xml2)
library(rvest)

#-----------------------------------------------------------------------------
# - Read in HTML
# - Extract tables with TLDs
#
# - https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains
#-----------------------------------------------------------------------------
html   <- xml2::read_html("../../static/img/TLDs/List_of_Internet_top-level_domains")
tables <- rvest::html_nodes(html, 'table')


#-----------------------------------------------------------------------------
# Extract all the TLDs from the html tables
#-----------------------------------------------------------------------------
tlds <- tables[7:65] %>%
  purrr::map_dfr(~rvest::html_table(.x, fill=TRUE)[,1:2] %>% 
                   set_names(c('name', 'tld_info'))) %>%
  as_tibble() %>%
  filter(grepl("^\\.", name)) %>%
  mutate(
    name     = stringr::str_sub(name, start=2),
    tld_info = gsub("\n", ' ', tld_info)
  ) %>%
  replace_na(list(tld_info=''))

Table 1: Sample of TLDs

name      tld_info
--------  ---------------------------------------
dnp       Dai Nippon Printing Co.
tips      general help topics
rsvp      Invitations and replies
energy    energy industry and marketing
dental    dentists
attorney  attorneys and legal organizations
swatch    The Swatch Group Ltd
style     fashion
rich      businesses catering to the wealthy
hotels
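
The cleaning step above keeps only the rows whose first column starts with a dot, then strips that dot. Here is a minimal base-R sketch of the same idea. The values in `first_column` are made up for illustration and are not scraped from the Wikipedia page.

```r
# Hypothetical first-column values, standing in for the scraped table cells
first_column <- c(".com", "Name", ".int", ".fit", "Notes")

# Keep entries that start with a literal dot, then drop that dot
is_tld    <- grepl("^\\.", first_column)
tld_names <- substring(first_column[is_tld], 2)

print(tld_names)
```

Header rows and footnote rows do not start with a dot, so the same filter removes them without needing to know the layout of each table.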

Names in base, tidyverse and misc packages

#-----------------------------------------------------------------------------
# Packages from which to extract names
#-----------------------------------------------------------------------------
packages <- c(
  'base', 'graphics', 'grDevices', 
  'stats', 'stats4', 'utils', "broom", 
  "cli", "crayon", "dplyr", "dbplyr", "forcats", "ggplot2", 
  "haven", "hms", "httr", "jsonlite", "lubridate", "magrittr", 
  "modelr", "purrr", "readr", "reprex", "rlang", 
  "rstudioapi", "rvest", "stringr", "tibble", "tidyr", "xml2", 
  'Hmisc', 'zoo', 'knitr', 'Rcpp', 'shiny', 'xtable', 
  'data.table', 'devtools', 'testthat', 'roxygen2'
)


#-----------------------------------------------------------------------------
# Helper function for extracting names from a package
#-----------------------------------------------------------------------------
extract_names_from_package <- function(package_name) {
  # tibble() replaces the deprecated data_frame()
  tibble(
    package   = package_name,
    function_ = ls(asNamespace(package_name))
  )
}


#-----------------------------------------------------------------------------
# Extract names from all named packages
#-----------------------------------------------------------------------------
all_names <- packages %>% 
  purrr::map_dfr(extract_names_from_package)


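
A note on what the helper collects: `ls(asNamespace(pkg))` lists the objects defined in a package's namespace, including ones that are not exported. That may explain why S3 methods such as `predict.ar` turn up in the results further down. The sketch below compares the two views for the stats package. The comparison is my own illustration, not part of the original analysis.

```r
# Every object defined in the stats namespace, exported or not
all_objects <- ls(asNamespace("stats"))

# Only the names the package exports
exported <- getNamespaceExports("stats")

# Names that exist in the namespace but are not exported
internal_only <- setdiff(all_objects, exported)

length(all_objects)
length(exported)
head(sort(internal_only))
```

Swapping `ls(asNamespace(...))` for `getNamespaceExports(...)` in the helper would restrict the search to exported names, at the cost of dropping some of the more amusing matches.
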
#-----------------------------------------------------------------------------
# Keep only those which contain a "."
#-----------------------------------------------------------------------------
possible_tlds <- all_names %>%
  filter(grepl("\\.", function_)) %>%
  mutate(
    tld = purrr::map_chr(strsplit(function_, '\\.'), tail, 1)
  )


#-----------------------------------------------------------------------------
# Keep only those whose last part of their name matches a TLD
#-----------------------------------------------------------------------------
possible_tlds %>% 
  inner_join(tlds, by=c(tld='name')) %>%
  knitr::kable()

package    function_                  tld     tld_info
---------  -------------------------  ------  ----------------------------------------
base       as.call                    call
base       as.name                    name    individuals, by name
base       do.call                    call
base       file.info                  info    information
base       file.link                  link    connecting to information[75]
base       file.show                  show    entertainment and vlogs
base       format.info                info    information
base       is.call                    call
base       is.na                      na      Namibia
base       is.name                    name    individuals, by name
base       match.call                 call
base       match.fun                  fun
base       month.abb                  abb     ABB Ltd
base       month.name                 name    individuals, by name
base       print.by                   by      Belarus
base       sys.call                   call
base       Sys.info                   info    information
graphics   layout.show                show    entertainment and vlogs
graphics   plot.design                design  graphic art and fashion
graphics   plot.new                   new     general
grDevices  dev.new                    new     general
grDevices  dev.off                    off
grDevices  graphics.off               off
grDevices  quartz.save                save
stats      ansari.test                test
stats      bartlett.test              test
stats      binom.test                 test
stats      Box.test                   test
stats      chisq.test                 test
stats      cor.test                   test
stats      fisher.test                test
stats      fligner.test               test
stats      friedman.test              test
stats      glm.fit                    fit     Fitness and exercise
stats      kruskal.test               test
stats      ks.test                    test
stats      lm.fit                     fit     Fitness and exercise
stats      make.link                  link    connecting to information[75]
stats      mantelhaen.test            test
stats      mauchly.test               test
stats      mcnemar.test               test
stats      mood.test                  test
stats      na.fail                    fail    general
stats      oneway.test                test
stats      pairwise.prop.test         test
stats      pairwise.t.test            test
stats      pairwise.wilcox.test       test
stats      poisson.test               test
stats      power.anova.test           test
stats      power.prop.test            test
stats      power.t.test               test
stats      PP.test                    test
stats      predict.ar                 ar      Argentina
stats      predict.smooth.spline.fit  fit     Fitness and exercise
stats      print.ar                   ar      Argentina
stats      print.family               family  families
stats      prop.test                  test
stats      prop.trend.test            test
stats      quade.test                 test
stats      shapiro.test               test
stats      spec.ar                    ar      Argentina
stats      t.test                     test
stats      var.test                   test
stats      wilcox.test                test
utils      bug.report                 report  business services
utils      bug.report.info            info    information
utils      create.post                post    postal services
utils      help.search                search
utils      index.search               search
utils      process.events             events  happenings
utils      url.show                   show    entertainment and vlogs
broom      tidy.map                   map
ggplot2    +.gg                       gg      Guernsey
ggplot2    fortify.map                map
ggplot2    is.zero                    zero
rlang      print.box                  box     individuals and businesses, in order to promote personal cloud storage
rvest      format.select              select
Hmisc      sas.codes                  codes   computer and/or encryption code enthusiasts
Hmisc      smean.sd                   sd      Sudan
Hmisc      spearman.test              test
Hmisc      string.bounding.box        box     individuals and businesses, in order to promote personal cloud storage
Hmisc      xy.group                   group
zoo        as.yearmon.date            date    online dating
zoo        as.yearqtr.date            date    online dating
shiny      file.exists.ci             ci      Côte d’Ivoire
shiny      file.path.ci               ci      Côte d’Ivoire
shiny      find.file.ci               ci      Côte d’Ivoire
xtable     as.is                      is      Iceland
xtable     sanitize.final             final
devtools   file.info                  info    information
devtools   print.doctor               doctor
roxygen2   object_defaults.data       data
roxygen2   object_usage.data          data
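
The matching logic comes down to two steps: take whatever follows the final dot in each name, then keep the names where that suffix is a known TLD. Here is a small base-R sketch of those steps on made-up inputs. `function_names` and `tld_names` are illustrative stand-ins for the real `all_names` and `tlds` tables.

```r
# Hypothetical inputs standing in for all_names$function_ and tlds$name
function_names <- c("seq.int", "lm.fit", "pairwise.t.test", "nchar", "read.csv")
tld_names      <- c("int", "fit", "test", "com")

# Keep names containing a dot, then take the text after the final dot
dotted    <- function_names[grepl(".", function_names, fixed = TRUE)]
last_part <- sub("^.*\\.", "", dotted)

# Keep only names whose last part is a known TLD
candidates <- dotted[last_part %in% tld_names]
print(candidates)
```

This sketch uses `sub()` rather than `strsplit()` plus `tail()`. The two can disagree on a name that ends in a dot: `sub()` returns an empty string there, whereas `strsplit()` drops the trailing empty piece and returns the part before the dot.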

Notes

  • Impossible
    • is.na is not possible, as the registry requires at least 3 characters in the first part
    • .call is run by the Amazon registry, and I can’t see how to even try to register a name
    • .off - I couldn’t find a registry that sold it.
    • .test (like .example) is purely for testing
    • .map and .new seem to be associated with the Google registry.
  • Expensive
    • lm.fit was quoted as available for $1200
    • match.fun quoted as available for $7500
  • Taken
    • as.name
    • is.name
    • sys.info
    • na.fail
  • Possible
    • glm.fit for just $64
    • POSIXct.date and POSIXlt.date may be available for $30
    • spline.fit for $33
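
Some of the matches could never be registered exactly as written, whatever the registry. Hostname labels may contain only letters, digits and hyphens, and a name with more than one dot would be a subdomain rather than a domain of its own. The sketch below applies that rule. `is_plausible_domain` and its `min_label_chars` argument are hypothetical names of mine, and the check says nothing about availability or price.

```r
# Hypothetical helper: could `name` be a single-label domain under a TLD?
# Assumes the letters-digits-hyphens rule for the label, no leading or
# trailing hyphen, and exactly one dot separating the label from the TLD.
is_plausible_domain <- function(name, min_label_chars = 1) {
  parts <- strsplit(tolower(name), ".", fixed = TRUE)
  vapply(parts, function(p) {
    length(p) == 2 &&
      nchar(p[1]) >= min_label_chars &&
      grepl("^[a-z0-9]([a-z0-9-]*[a-z0-9])?$", p[1])
  }, logical(1))
}

candidates <- c("glm.fit", "is.na", "+.gg", "object_usage.data", "pairwise.t.test")
is_plausible_domain(candidates)

# A registry-specific minimum length, like the 3-character rule noted for .na
is_plausible_domain("is.na", min_label_chars = 3)
```

Under these assumptions `+.gg` and `object_usage.data` fail on their characters, and `pairwise.t.test` fails because it would be the subdomain `pairwise` of `t.test`.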