Break a string into labelled tokens based upon a set of patterns
lex(text, regexes, verbose = FALSE, ...)
text | a single character string |
---|---|
regexes | a named vector of regex strings. Each string represents a regex to match a token, and the name of the string is the label for the token. Each regex can contain an explicit captured group using the standard |
verbose | print more information about the matching process. default: FALSE |
... | further arguments passed to |
A named character vector: each name is the token type, and each value is the text extracted by the corresponding regular expression.
#>       word whitespace       word whitespace     number
#>    "hello"        " "    "there"        " "   "123.45"
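
To illustrate the idea of labelling tokens from a named vector of patterns, here is a minimal base R sketch. It is not the package's implementation: `lex_sketch` is a hypothetical helper, and the three patterns in `regexes` are assumptions chosen for this illustration. It joins the patterns into one alternation, so earlier patterns take precedence at a given position, then labels each token with the first pattern that matches it in full.

```r
# Hypothetical helper for illustration only; not the package's lex().
lex_sketch <- function(text, regexes) {
  # Join all patterns into a single alternation so the string is scanned
  # once, left to right. At any position, earlier patterns are tried first.
  combined <- paste0("(?:", regexes, ")", collapse = "|")
  tokens   <- regmatches(text, gregexpr(combined, text, perl = TRUE))[[1]]

  # Label each token with the name of the first pattern matching it in full.
  labels <- vapply(tokens, function(tok) {
    hit <- vapply(regexes, function(re) {
      grepl(paste0("^(?:", re, ")$"), tok, perl = TRUE)
    }, logical(1))
    names(regexes)[which(hit)[1]]
  }, character(1), USE.NAMES = FALSE)

  setNames(tokens, labels)
}

# Assumed patterns: names are the token labels, values are the regexes.
regexes <- c(
  number     = "-?\\d*\\.?\\d+",
  word       = "\\w+",
  whitespace = "\\s+"
)

lex_sketch("hello there 123.45", regexes)
```

Under these assumptions the call returns five tokens labelled `word`, `whitespace`, `word`, `whitespace`, `number`. Placing `number` before `word` matters, because `\w+` would otherwise claim the digits of `123.45` first.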