Break a string into labelled tokens based upon a set of patterns

lex(text, regexes, verbose = FALSE)

Arguments

text: a single character string
regexes: a named vector of regex strings. Each string is a regex that matches one kind of token, and its name is the label given to that token. A regex can contain an explicit capture group using the standard () brackets, and the captured text becomes the token's value. If a regex doesn't define a capture group, then the entire match is captured. The regexes are processed in order, so when more than one regex could match, the earliest one in the vector takes precedence over any later one.
verbose: print more information about the matching process (default: FALSE).
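The ordering and capture-group rules above can be illustrated with a small base-R sketch. This is not the package's implementation. `lex_sketch()` is a hypothetical helper, and it assumes that at each position in the text the regexes are tried, anchored at that position, in vector order, and that the first non-empty match wins. The `number` pattern in the usage line is a stand-in for `re$number`, whose exact definition is not shown on this page.

```r
# Hypothetical sketch of an ordered-regex lexer; NOT the package's lex().
# Assumption: at each position the regexes are tried anchored, in vector
# order, and the first non-empty match wins.
lex_sketch <- function(text, regexes) {
  stopifnot(is.character(text), length(text) == 1L, !is.null(names(regexes)))
  tokens <- character(0)
  labels <- character(0)
  pos <- 1L
  n <- nchar(text)
  while (pos <= n) {
    rest <- substr(text, pos, n)
    matched <- FALSE
    for (i in seq_along(regexes)) {
      # Anchor the regex at the start of the remaining text
      m <- regexec(paste0("^(?:", regexes[[i]], ")"), rest, perl = TRUE)[[1]]
      len <- attr(m, "match.length")[1]
      if (m[1] == -1L || len == 0L) next  # no match, or an empty match
      starts <- as.integer(m)
      lens <- attr(m, "match.length")
      pieces <- substring(rest, starts, starts + lens - 1L)
      # pieces[1] is the whole match; pieces[2] is the first capture group, if any
      tokens <- c(tokens, if (length(pieces) > 1L) pieces[2] else pieces[1])
      labels <- c(labels, names(regexes)[i])
      pos <- pos + len  # advance past the whole match
      matched <- TRUE
      break  # earlier regexes take precedence
    }
    if (!matched) stop("no regex matches at position ", pos)
  }
  names(tokens) <- labels
  tokens
}

res <- lex_sketch("hello there 123.45",
                  c(number = "(\\d+(?:\\.\\d+)?)", word = "(\\w+)", whitespace = "(\\s+)"))
print(res)
```

Here `res` is `c(word = "hello", whitespace = " ", word = "there", whitespace = " ", number = "123.45")`. Putting `word` before `number` would let `\\w+` claim `123` as a word and leave `.45`, which no pattern matches, so this sketch would stop with an error. That is the precedence rule at work.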

Value

a named character vector in which each name is the token type (the label taken from regexes) and each value is the text extracted by the corresponding regular expression.
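Because the result is an ordinary named character vector, base R subsetting on `names()` is enough to post-process it. The vector below is typed by hand in the shape just described. It is illustrative, not output captured from `lex()`, and the `"whitespace"` label is an assumption carried over from the example call.

```r
# Illustrative value with the documented shape: names are token types,
# values are the extracted text. Typed by hand, not produced by lex().
tokens <- c(word = "hello", whitespace = " ", word = "there",
            whitespace = " ", number = "123.45")

# Drop whitespace tokens by filtering on the token type (the names)
content <- tokens[names(tokens) != "whitespace"]
print(content)

# Count how many tokens of each type are present
print(table(names(tokens)))
```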

Examples

if (FALSE) {
lex("hello there 123.45", regexes=c(number=re$number, word="(\\w+)", whitespace="(\\s+)"))
}