Example parser: obj format for 3d objects

A simple text format for storing 3d objects is the Wavefront obj format. It is well documented on the internet (e.g. 1, 2, 3), and an example octahedron object with 6 vertices and 8 faces is shown below.

octahedron_obj <- '
# OBJ file created by ply_to_obj.c
#
g Object001

v  1  0  0
v  0  -1  0
v  -1  0  0
v  0  1  0
v  0  0  1
v  0  0  -1

f  2  1  5
f  3  2  5
f  4  3  5
f  1  4  5
f  1  2  6
f  2  3  6
f  3  4  6
f  4  1  6
'

The basic structure of a .obj file is:

  • Comments start with # and continue to the end of the line.
  • A symbol at the start of each line tells us what the data on the rest of the line represents, e.g.
    • v means this line defines a vertex and will be followed by 3 numbers representing the x, y, z coordinates.
    • f means this line defines a triangular face, and the following 3 numbers indicate the 3 vertices which make up this face.
    • vn means this line defines the normal vector at a vertex.
  • The format is more complicated than this, and I’m leaving out a lot of details, but this is enough to get the general idea; see the short sketch below for how a single line breaks apart.
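
To make the line-oriented structure concrete, here is a quick base-R sketch (separate from the flexo-based parser below) which splits a single obj line into its leading symbol and its values:

line  <- 'v  1  0  0'
parts <- strsplit(trimws(line), '\\s+')[[1]]
parts[1]                # "v"    -- the line type
as.numeric(parts[-1])   # 1 0 0  -- the x, y, z coordinates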

Use lex() to turn the text into tokens

  1. Start by defining the regular expression patterns for each element in the obj file.
  2. Use flexo::lex() to turn the obj text into tokens
  3. Throw away whitespace, newlines and comments, since I’m not interested in them.

obj_regexes <- c(
  comment    = '(#.*?)\n',  # assume comments take up the whole line
  number     = flexo::re$number, # matches most numeric values
  symbol     = '\\w+',
  newline    = '\n',
  whitespace = '\\s+'
)
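
These hand-written patterns can be spot-checked on small fragments of the file before handing them to the lexer. This is just a quick sketch using base R; the exact pattern behind flexo::re$number comes from the package itself, so it isn’t repeated here.

grepl('#.*?\n', '# OBJ file created by ply_to_obj.c\n', perl = TRUE)  # TRUE: comment runs to the end of the line
grepl('^\\w+$', 'Object001', perl = TRUE)                             # TRUE: names and keywords tokenise as symbols
grepl('^\\s+$', '   ')                                                # TRUE: runs of spaces are whitespace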

Tokenising the obj

Split the obj text into tokens, then remove anything we don’t need to build the data structure representing the 3d object.

tokens <- lex(octahedron_obj, obj_regexes)
tokens <- tokens[!(names(tokens) %in% c('whitespace', 'newline', 'comment'))]
tokens
##      symbol      symbol      symbol      number      number      number 
##         "g" "Object001"         "v"         "1"         "0"         "0" 
##      symbol      number      number      number      symbol      number 
##         "v"         "0"        "-1"         "0"         "v"        "-1" 
##      number      number      symbol      number      number      number 
##         "0"         "0"         "v"         "0"         "1"         "0" 
##      symbol      number      number      number      symbol      number 
##         "v"         "0"         "0"         "1"         "v"         "0" 
##      number      number      symbol      number      number      number 
##         "0"        "-1"         "f"         "2"         "1"         "5" 
##      symbol      number      number      number      symbol      number 
##         "f"         "3"         "2"         "5"         "f"         "4" 
##      number      number      symbol      number      number      number 
##         "3"         "5"         "f"         "1"         "4"         "5" 
##      symbol      number      number      number      symbol      number 
##         "f"         "1"         "2"         "6"         "f"         "2" 
##      number      number      symbol      number      number      number 
##         "3"         "6"         "f"         "3"         "4"         "6" 
##      symbol      number      number      number 
##         "f"         "4"         "1"         "6"

Use TokenStream to help turn the tokens into coherent data structures

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Initialise a TokenStream object so I can manipulate the stream of tokens
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stream <- TokenStream$new(tokens)

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Fast-forward over everything until we get to the first vertex
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
jnk <- stream$consume_until(value = 'v', inclusive = FALSE)

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# A place to store the intermediate data for vertices and faces
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vlist <- list()
flist <- list()


#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Extract the numeric data for each vertex and face until the stream is out of data
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
while (!stream$end_of_stream()) {
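  # Each remaining line starts with its type symbol ('v' or 'f'), followed
  # by the run of numbers holding its coordinates or vertex indices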
  type   <- stream$consume(1)
  values <- stream$consume_while(name = 'number')
  
  if (type == 'v') {
    vlist <- append(vlist, list(as.numeric(values)))
  } else {
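    # only 'v' and 'f' lines remain after the fast-forward above,
    # so anything that isn't a vertex must be a face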
    flist <- append(flist, list(as.numeric(values)))
  }
  
}
 
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Combine intermediate data into matrices
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
verts <- do.call(rbind, vlist)
faces <- do.call(rbind, flist)

verts
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0   -1    0
## [3,]   -1    0    0
## [4,]    0    1    0
## [5,]    0    0    1
## [6,]    0    0   -1
faces
##      [,1] [,2] [,3]
## [1,]    2    1    5
## [2,]    3    2    5
## [3,]    4    3    5
## [4,]    1    4    5
## [5,]    1    2    6
## [6,]    2    3    6
## [7,]    3    4    6
## [8,]    4    1    6
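
As a final check, the parsed matrices can be validated against what we know about the format. This sketch only uses the verts and faces built above:

# 3 coordinates per vertex, 3 vertex indices per face, and every
# face index must refer to an existing vertex (obj indices are 1-based)
stopifnot(
  ncol(verts) == 3,
  ncol(faces) == 3,
  all(faces >= 1),
  max(faces) <= nrow(verts)
)

From here, verts and faces are in a form that a 3d renderer such as rgl can work with.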