Break a string into labelled tokens based upon a set of patterns
lex(text, regexes, verbose = FALSE)
text | a single character string |
---|---|
regexes | a named vector of regex strings. Each string is a regex that matches one token, and the name of the string is the label for that token. Each regex can contain an explicit captured group using standard parentheses. |
verbose | print more information about the matching process. default: FALSE |
A named character vector. Each name is the token type, and each value is the text extracted by the corresponding regular expression.
if (FALSE) {
  lex(
    "hello there 123.45",
    regexes = c(number = re$number, word = "(\\w+)", whitespace = "(\\s+)")
  )
}
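To make the shape of the return value concrete, here is a minimal base-R sketch of a first-match-wins tokenizer. It is not the package's implementation. `lex_sketch` is a hypothetical helper, the `number` pattern is an assumed stand-in for `re$number`, and the sketch keeps the whole match rather than honouring captured groups.

```r
# Hypothetical illustration only: NOT the implementation behind lex().
lex_sketch <- function(text, regexes) {
  tokens <- character(0)
  pos <- 1L
  n <- nchar(text)
  while (pos <= n) {
    rest <- substr(text, pos, n)
    matched <- FALSE
    # Try the patterns in the order given; the first one that matches wins.
    for (label in names(regexes)) {
      # Anchor the pattern so it must match at the current position.
      m <- regexpr(paste0("^(?:", regexes[[label]], ")"), rest, perl = TRUE)
      len <- attr(m, "match.length")
      if (as.integer(m) == 1L && len > 0L) {
        tok <- substr(rest, 1L, len)
        names(tok) <- label          # label the token with the pattern name
        tokens <- c(tokens, tok)
        pos <- pos + len
        matched <- TRUE
        break
      }
    }
    if (!matched) stop("no pattern matches at position ", pos)
  }
  tokens
}

# Assumed number pattern; the documented example uses re$number instead.
patterns <- c(number = "\\d+\\.?\\d*", word = "(\\w+)", whitespace = "(\\s+)")
tokens <- lex_sketch("hello there 123.45", patterns)
print(tokens)
```

With these patterns the sketch yields five tokens labelled `word`, `whitespace`, `word`, `whitespace`, `number`. In this sketch `number` has to come before `word`, because `\w` also matches digits and an earlier pattern takes precedence.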