YAML vs JSON for saving human-readable R objects (TLDR: Use YAML)

Problem: serialization of R list objects to a human readable/editable file

I often use R list objects to store configuration information within Shiny apps. They define multiple configuration sets and are usually saved to file. I then want to be able to edit the saved objects on the filesystem with a text editor.

What’s a good file format for saving R list objects to file such that they’re easily editable?

Criteria:

  • has an R package I can use
  • easy to read and edit the saved data

Not Criteria:

  • interoperability with other software
  • transmission over a network
  • saving to a database

Possibilities:

  • RDS - even though there’s an ascii option to saveRDS, it doesn’t save anything that’s human parseable
  • CSV - Never going to be able to save nested list data (with NULL values!) to a CSV without a whole lot of pre- and post- data transformation
  • YAML - worth a look.
  • JSON - worth a look.
  • XML - hand editing XML nodes in a text file seems like a good way to shoot yourself in the foot.
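
To see why RDS is ruled out, here is a minimal sketch (the cfg object and the temp file are purely illustrative) that writes a small list with ascii = TRUE and prints the first few lines of the file. The object round-trips fine, but the file contents are serialization tokens rather than anything you would want to hand edit.

```r
# Illustrative only: a tiny config list, not the one used later in this post
cfg <- list(top = "black", size = 1)

path <- tempfile(fileext = ".rds")

# compress = FALSE so the file on disk is plain text rather than gzipped text
saveRDS(cfg, path, ascii = TRUE, compress = FALSE)

# Peek at the raw file contents: type codes and lengths, one per line
rds_lines <- readLines(path)
print(head(rds_lines, 10))

# The object itself is restored without loss
rds_restored <- readRDS(path)
```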

Actual contenders: YAML and JSON.

Example R list object

colour_scheme <- list(
  plain = list(
    top         = 'black',
    bottom      = 'black',
    size        = 1,
    angle       = 45.0,
    sub_options = list(),
    border      = NULL,
    tune        = NA
  ),
  fancy = list(
    top         = 'red',
    bottom      = 'blue',
    size        = 5,
    angle       = 17.5,
    sub_options = list(pieces_of_flair = c(0, 37)),
    border      = 'cursive',
    tune        = 'adagio'
  )
)

jsonlite: JSON encoding the R list

colour_scheme_json <- jsonlite::toJSON(colour_scheme, pretty=TRUE) 
colour_scheme_json
{
  "plain": {
    "top": ["black"],
    "bottom": ["black"],
    "size": [1],
    "angle": [45],
    "sub_options": [],
    "border": {},
    "tune": [null]
  },
  "fancy": {
    "top": ["red"],
    "bottom": ["blue"],
    "size": [5],
    "angle": [17.5],
    "sub_options": {
      "pieces_of_flair": [0, 37]
    },
    "border": ["cursive"],
    "tune": ["adagio"]
  }
} 

jsonlite: Restoring the R list from JSON

When the R object is recreated from the JSON, the result is not identical to the original data.

This is because the original border = NULL was written to JSON as an empty object ({}). Upon restoration, the R object contains border = list() instead of the required NULL.

restored_json <- jsonlite::fromJSON(colour_scheme_json)
identical(colour_scheme, restored_json)
[1] FALSE
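
Part of this output is down to jsonlite's defaults rather than JSON itself. As a hedged sketch (using a cut-down, illustrative list rather than the full colour_scheme), toJSON() has auto_unbox and null arguments that change how single values and NULL are written. I have not checked that these options make the full round trip identical, and NA would still need separate handling.

```r
# Cut-down illustrative list, not the full colour_scheme above
small_scheme <- list(top = "black", border = NULL)

small_json <- jsonlite::toJSON(
  small_scheme,
  auto_unbox = TRUE,   # write length-one vectors as scalars, not [ ] arrays
  null       = "null"  # write NULL as JSON null instead of {}
)
cat(small_json)
```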

RJSONIO: JSON encoding the R list

colour_scheme_json2 <- RJSONIO::toJSON(colour_scheme, pretty=TRUE) 
cat(colour_scheme_json2)
{
    "plain" : {
        "top" : "black",
        "bottom" : "black",
        "size" : 1,
        "angle" : 45,
        "sub_options" : [],
        "border" : null,
        "tune" : null
    },
    "fancy" : {
        "top" : "red",
        "bottom" : "blue",
        "size" : 5,
        "angle" : 17.5,
        "sub_options" : {
            "pieces_of_flair" : [
                0,
                37
            ]
        },
        "border" : "cursive",
        "tune" : "adagio"
    }
}

RJSONIO: Restoring the R list from JSON

When the R object is recreated from the JSON, the result is not identical to the original data.

This is because the original tune = NA was written to JSON as null. Upon restoration, the R object contains tune = NULL instead of the required NA.

RJSONIO allows you to specify how NAs are encoded, but there does not seem to be a direct representation of NA in the JSON format.

restored_json2 <- RJSONIO::fromJSON(colour_scheme_json2)
identical(colour_scheme, restored_json2)
[1] FALSE

YAML encoding the R list

colour_scheme_yaml <- yaml::as.yaml(colour_scheme) 
cat(colour_scheme_yaml)
plain:
  top: black
  bottom: black
  size: 1.0
  angle: 45.0
  sub_options: []
  border: ~
  tune: .na
fancy:
  top: red
  bottom: blue
  size: 5.0
  angle: 17.5
  sub_options:
    pieces_of_flair:
    - 0.0
    - 37.0
  border: cursive
  tune: adagio

Restoring the R list from YAML

When the R object is recreated from the YAML, the result is identical to the original data.

restored_yaml <- yaml::yaml.load(colour_scheme_yaml)
identical(colour_scheme, restored_yaml)
[1] TRUE
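
Since the actual goal is a file I can edit on disk, the same round trip can go via the filesystem. A minimal sketch, assuming a version of the yaml package that provides write_yaml() and read_yaml(); the app_cfg list and the temp file path are illustrative only.

```r
# Illustrative config, not the full colour_scheme above
app_cfg <- list(top = "red", size = 5, border = NULL)

cfg_path <- tempfile(fileext = ".yaml")

# Write the list to disk as YAML; the file can now be opened in a text editor
yaml::write_yaml(app_cfg, cfg_path)
cat(readLines(cfg_path), sep = "\n")

# Read it back (for example after hand editing)
cfg_restored <- yaml::read_yaml(cfg_path)
```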

Named Vectors and Named lists

Both jsonlite and yaml convert a named vector into a plain array, i.e. the names are stripped.

RJSONIO saves a named vector as a map (meaning that names are kept) but as a side effect, restoring a map with only a single primitive datatype produces a vector and not a list.

library(magrittr)  # provides the %>% pipe used below

named_vec   <-    c(a=1, b=2, c=3)
named_list  <- list(a=1, b=2, c=3)
named_list2 <- list(a=1, b='cat')

yaml::as.yaml(named_vec)     %>% cat()
- 1.0
- 2.0
- 3.0
jsonlite::toJSON(named_vec)  %>% cat()
[1,2,3]
RJSONIO::toJSON(named_vec)   %>% cat()
{
 "a":        1,
"b":        2,
"c":        3 
}
RJSONIO::toJSON(named_list)  %>% cat()
{
 "a":        1,
"b":        2,
"c":        3 
}
RJSONIO::toJSON(named_vec)   %>% RJSONIO::fromJSON() %>% class()
[1] "numeric"
RJSONIO::toJSON(named_list)  %>% RJSONIO::fromJSON() %>% class()
[1] "numeric"
RJSONIO::toJSON(named_list2) %>% RJSONIO::fromJSON() %>% class()
[1] "list"
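
If the names of a vector matter, one possible workaround with yaml (a sketch, not something the package does for you) is to convert the vector to a list before encoding and unlist() it after loading.

```r
named_vec <- c(a = 1, b = 2, c = 3)

# Converting to a list first makes yaml write a map, so the names survive
named_vec_yaml <- yaml::as.yaml(as.list(named_vec))
cat(named_vec_yaml)

# yaml.load() returns a named list; unlist() turns it back into a named vector
named_vec_restored <- unlist(yaml::yaml.load(named_vec_yaml))
```

The cost is that the file no longer records whether the original was a vector or a list, so the code reading it back has to know which one to expect.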

Comparison

A table comparing the representations in YAML and JSON is shown below.

Table 1: Comparison
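|                     | JSON (jsonlite) | JSON (RJSONIO)         | YAML                                              |
|---------------------|-----------------|------------------------|---------------------------------------------------|
| strings & names     | quoted          | quoted                 | unquoted                                          |
| single value        | arrays []       | as is                  | as is                                             |
| vector              | arrays []       | arrays []              | one-value-per-line                                |
| NULL                | {}              | null                   | ~                                                 |
| NA                  | [null]          | null (or user defined) | .na                                               |
| whitespace          | not significant | not significant        | used for indentation and syntax. No TABs allowed! |
| named vector        | names lost      | names kept             | names lost                                        |
| restored list       | not identical!  | not identical          | identical                                         |
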
  • The YAML representation when restored to an R object is identical to the original R list.
    The JSON restoration is not identical to the original data.
  • The representations of NULL and NA values are “not like R” in both formats
  • JSON’s quoting of all names and strings seems needlessly labour-intensive
  • YAML’s whitespace usage for indentation and structure will be familiar to Python users
  • YAML’s pickiness of “No TABs only spaces!” could be an issue. Need to ensure text editor always replaces TABs with spaces.
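
As a guard against the TAB problem, a small check can be run on a hand-edited file before loading it. This is only a sketch in base R: find_tab_lines() is a hypothetical helper of my own, and the demo file is a throwaway.

```r
# Hypothetical helper: report which lines of a YAML file contain TAB characters
find_tab_lines <- function(path) {
  which(grepl("\t", readLines(path, warn = FALSE), fixed = TRUE))
}

# Demo on a throwaway file where line 2 is indented with a TAB
demo_path <- tempfile(fileext = ".yaml")
writeLines(c("plain:", "\ttop: black", "  bottom: black"), demo_path)

tab_lines <- find_tab_lines(demo_path)
if (length(tab_lines) > 0) {
  message("TABs found on line(s): ", paste(tab_lines, collapse = ", "))
}
```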

Verdict

YAML.

Reasons:

  • restored object is identical to original list
  • more lightweight - no quoting of strings or names. Single values are not represented as short arrays.
  • NULL and NA representation are a teeny tiny bit closer to how I’d think about them in R
  • Familiarity with Python means I’m OK with whitespace being significant.

Updates

  • 2018-02-03 10:00 - heyaudy gave some helpful feedback that the actual JSON output is dependent upon which package I use, and that RJSONIO handles single values and NULLs better than jsonlite