Break a string into labelled tokens based upon a set of patterns
lex(text, regexes, verbose = FALSE)
text: A single character string.
regexes: A named vector of regex strings. Each string is a regex that matches a token, and the name of the string is the label for that token. Each regex can contain an explicit captured group using the standard () brackets. If a regex does not define a captured group, the entire regex is captured. The regexes are processed in order, so an earlier match takes precedence over any later match.
verbose: Print more information about the matching process. Default: FALSE.
Returns a named character vector in which each name is the token type and each value is the text extracted by the corresponding regular expression.
if (FALSE) {
  lex(
    "hello there 123.45",
    regexes = c(number = re$number, word = "(\\w+)", whitespace = "(\\s+)")
  )
}
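The ordering rule for regexes (an earlier pattern wins over a later one) can be illustrated with a minimal base R sketch. This is not the implementation of lex; it is one possible approach under stated assumptions. The number pattern below is a hypothetical stand-in written for this example, and the sketch ignores explicit captured groups, so each token is always the full match.

```r
# Hypothetical patterns for illustration; the names are the token labels.
# Order matters: "number" is listed before "word", so digits are labelled
# as numbers even though \w+ would also match them.
regexes <- c(
  number     = "\\d+\\.?\\d*",
  word       = "\\w+",
  whitespace = "\\s+"
)
text <- "hello there 123.45"

# Join the patterns into a single alternation. With perl = TRUE the
# alternatives are tried left to right at each position.
pattern <- paste0("(", regexes, ")", collapse = "|")
tokens  <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]

# Label each token with the first pattern that matches it in full.
labels <- vapply(tokens, function(tok) {
  hits <- vapply(
    regexes,
    function(rx) grepl(paste0("^(?:", rx, ")$"), tok, perl = TRUE),
    logical(1)
  )
  names(regexes)[which(hits)[1]]
}, character(1))

names(tokens) <- unname(labels)
tokens
```

Here the result has the values "hello", " ", "there", " ", "123.45" with the names word, whitespace, word, whitespace, number. Moving word ahead of number in the vector would split "123.45" differently, because \w+ would then claim the digits first.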