Problem: serialization of R list objects to a human readable/editable file
I often use R list objects for storing configuration information within shiny apps. These are used to define multiple configuration sets and are usually saved to file. I then want to be able to edit the objects on the filesystem with a text editor.
What’s a good file format for saving R list objects to file such that they’re easily editable?
Criteria:
- has an R package I can use
- easy to read and edit the saved data
Not Criteria:
- interoperability with other software
- transmission over a network
- saving to a database
Possibilities:
- RDS - even though there’s an ascii option to saveRDS it doesn’t save anything that’s human parseable
- CSV - Never going to be able to save nested list data (with NULL values!) to a CSV without a whole lot of pre- and post- data transformation
- YAML - worth a look.
- JSON - worth a look.
- XML - hand editing XML nodes in a text file seems like a good way to shoot yourself in the foot.
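To make the RDS point concrete, here is a minimal sketch that writes a small list with the ascii option and prints the start of the resulting file. The `config` list and the temporary file path are made up for illustration; `compress = FALSE` is used so the file stays plain text on disk.

```r
# A small illustrative list (not the colour_scheme object used later)
config <- list(top = "black", size = 1, border = NULL)
rds_path <- tempfile(fileext = ".rds")

# ascii = TRUE writes text rather than binary; compress = FALSE avoids gzip
saveRDS(config, rds_path, ascii = TRUE, compress = FALSE)

# The file is text, but it is a dump of R's internal serialization format,
# not something you would want to edit by hand
rds_lines <- readLines(rds_path, warn = FALSE)
print(head(rds_lines, 10))

# It does still round-trip through readRDS()
restored_rds <- readRDS(rds_path)
print(identical(config, restored_rds))
```

The printed lines are mostly bare numbers and type codes, which is why RDS is ruled out here despite being lossless.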
Actual contenders:
Example R list object
colour_scheme <- list(
  plain = list(
    top         = 'black',
    bottom      = 'black',
    size        = 1,
    angle       = 45.0,
    sub_options = list(),
    border      = NULL,
    tune        = NA
  ),
  fancy = list(
    top         = 'red',
    bottom      = 'blue',
    size        = 5,
    angle       = 17.5,
    sub_options = list(pieces_of_flair = c(0, 37)),
    border      = 'cursive',
    tune        = 'adagio'
  )
)
jsonlite: JSON encoding the R list
colour_scheme_json <- jsonlite::toJSON(colour_scheme, pretty=TRUE)
colour_scheme_json
{
  "plain": {
    "top": ["black"],
    "bottom": ["black"],
    "size": [1],
    "angle": [45],
    "sub_options": [],
    "border": {},
    "tune": [null]
  },
  "fancy": {
    "top": ["red"],
    "bottom": ["blue"],
    "size": [5],
    "angle": [17.5],
    "sub_options": {
      "pieces_of_flair": [0, 37]
    },
    "border": ["cursive"],
    "tune": ["adagio"]
  }
}
jsonlite: Restoring the R list from JSON
When the R object is recreated from the JSON, the result is not identical to the original data.
This is because the original border = NULL in the data was represented in JSON as an empty object ({}). Upon restoration, the R object contains border = list() instead of the required NULL.
restored_json <- jsonlite::fromJSON(colour_scheme_json)
identical(colour_scheme, restored_json)
[1] FALSE
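When identical() returns FALSE it does not say where the two objects differ. The following is a rough sketch of a recursive comparison that reports the paths of differing elements. `diff_paths` is a hypothetical helper written for this post, not a function from jsonlite or base R, and the two small lists are hand-built stand-ins for the original and restored objects.

```r
# Hypothetical helper: return the paths at which two nested lists differ.
# Only named, non-empty lists are descended into; anything else is
# compared directly with identical().
diff_paths <- function(a, b, path = "") {
  can_descend <- is.list(a) && is.list(b) &&
    length(a) > 0 && length(b) > 0 &&
    !is.null(names(a)) && !is.null(names(b))
  if (!can_descend) {
    return(if (identical(a, b)) character(0) else path)
  }
  keys <- union(names(a), names(b))
  unlist(lapply(keys, function(k) {
    diff_paths(a[[k]], b[[k]], paste0(path, "/", k))
  }))
}

# Hand-built stand-ins: border is NULL in one and list() in the other
original_cfg <- list(plain = list(top = "black", border = NULL,   tune = NA))
restored_cfg <- list(plain = list(top = "black", border = list(), tune = NA))

differing <- diff_paths(original_cfg, restored_cfg)
print(differing)  # "/plain/border"
```

The same call could be made as diff_paths(colour_scheme, restored_json) to list every element that changed during the round trip.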
RJSONIO: JSON encoding the R list
colour_scheme_json2 <- RJSONIO::toJSON(colour_scheme, pretty=TRUE)
cat(colour_scheme_json2)
{
  "plain" : {
    "top" : "black",
    "bottom" : "black",
    "size" : 1,
    "angle" : 45,
    "sub_options" : [],
    "border" : null,
    "tune" : null
  },
  "fancy" : {
    "top" : "red",
    "bottom" : "blue",
    "size" : 5,
    "angle" : 17.5,
    "sub_options" : {
      "pieces_of_flair" : [
        0,
        37
      ]
    },
    "border" : "cursive",
    "tune" : "adagio"
  }
}
RJSONIO: Restoring the R list from JSON
When the R object is recreated from the JSON, the result is not identical to the original data.
This is because the original tune = NA in the data was represented in JSON as a null. Upon restoration, the R object contains tune = NULL instead of the required NA.
RJSONIO allows you to specify how NAs are encoded, but there does not seem to be a direct representation of NA in the JSON format.
restored_json2 <- RJSONIO::fromJSON(colour_scheme_json2)
identical(colour_scheme, restored_json2)
[1] FALSE
YAML encoding the R list
colour_scheme_yaml <- yaml::as.yaml(colour_scheme)
cat(colour_scheme_yaml)
plain:
  top: black
  bottom: black
  size: 1.0
  angle: 45.0
  sub_options: []
  border: ~
  tune: .na
fancy:
  top: red
  bottom: blue
  size: 5.0
  angle: 17.5
  sub_options:
    pieces_of_flair:
    - 0.0
    - 37.0
  border: cursive
  tune: adagio
Restoring the R list from YAML
When the R object is recreated from the YAML, the result is identical to the original data.
restored_yaml <- yaml::yaml.load(colour_scheme_yaml)
identical(colour_scheme, restored_yaml)
[1] TRUE
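Since the goal is a file that can be edited by hand, the round trip can also go through the filesystem. This is a minimal sketch using yaml::write_yaml() and yaml::read_yaml(); the cut-down `colour_scheme_small` list and the file name are made up for illustration, and a temporary directory stands in for wherever the app keeps its config.

```r
# A cut-down config list, for illustration only
colour_scheme_small <- list(
  fancy = list(top = "red", size = 5, border = NULL, tune = NA)
)

# Illustrative location; a real app would use its own config directory
config_path <- file.path(tempdir(), "colour_scheme.yaml")

# Write the list to disk as YAML
yaml::write_yaml(colour_scheme_small, config_path)

# This is the text you would open in an editor
cat(readLines(config_path), sep = "\n")

# Read it back and compare with what was written
reloaded <- yaml::read_yaml(config_path)
print(identical(colour_scheme_small, reloaded))
```

After a hand edit, calling yaml::read_yaml(config_path) again would pick up the changes.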
Named Vectors and Named lists
Both jsonlite and yaml convert a named vector into a plain array i.e. the names are stripped.
RJSONIO saves a named vector as a map (meaning that names are kept) but as a side effect, restoring a map with only a single primitive datatype produces a vector and not a list.
library(magrittr)  # provides the %>% pipe used below
named_vec <- c(a=1, b=2, c=3)
named_list <- list(a=1, b=2, c=3)
named_list2 <- list(a=1, b='cat')
yaml::as.yaml(named_vec) %>% cat()
- 1.0
- 2.0
- 3.0
jsonlite::toJSON(named_vec) %>% cat()
[1,2,3]
RJSONIO::toJSON(named_vec) %>% cat()
{
"a": 1,
"b": 2,
"c": 3
}
RJSONIO::toJSON(named_list) %>% cat()
{
"a": 1,
"b": 2,
"c": 3
}
RJSONIO::toJSON(named_vec) %>% RJSONIO::fromJSON() %>% class()
[1] "numeric"
RJSONIO::toJSON(named_list) %>% RJSONIO::fromJSON() %>% class()
[1] "numeric"
RJSONIO::toJSON(named_list2) %>% RJSONIO::fromJSON() %>% class()
[1] "list"
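One possible workaround for the stripped names, sketched here in base R only, is to turn a named vector into a named list before saving and back into a vector after loading. This assumes the chosen serializer keeps the names of a list, as the named-list examples above suggest; the variable names are illustrative.

```r
named_vec <- c(a = 1, b = 2, c = 3)

# Before saving: a named list carries the same names as the vector
vec_as_list <- as.list(named_vec)
print(identical(vec_as_list, list(a = 1, b = 2, c = 3)))

# ... save vec_as_list, e.g. with yaml::as.yaml(vec_as_list), and restore it ...

# After loading: collapse the list back into a named vector
vec_again <- unlist(vec_as_list)
print(identical(vec_again, named_vec))
```

The cost is that the config code has to know which entries are meant to be vectors, since that information is not stored in the file.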
Comparison
A table comparing the representations in YAML and JSON is shown below.
| | JSON (jsonlite) | JSON (RJSONIO) | YAML |
|---|---|---|---|
| strings & names | quoted | quoted | unquoted |
| single value | arrays [] | as is | as is |
| vector | arrays [] | arrays [] | one-value-per-line |
| NULL | {} | null | ~ |
| NA | [null] | null (or user defined) | .na |
| whitespace | not significant | not significant | used for indentation and syntax. No TABs allowed! |
| named vector | names lost | names kept | names lost |
| restored list | not identical! | not identical | identical |
- The YAML representation when restored to an R object is identical to the original R list. The JSON restoration is not identical to the original data.
- The representations of NULL and NA values are “not like R” in both formats
- JSON’s quoting of all names and strings seems needlessly labour-intensive
- YAML’s whitespace usage for indentation and structure will be familiar to Python users
- YAML’s pickiness of “No TABs only spaces!” could be an issue. Need to ensure text editor always replaces TABs with spaces.
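As a safety net for the TAB issue, the file can be scanned for TAB characters before it is handed to the YAML parser. The sketch below uses base R only; `find_tab_lines` is a hypothetical helper name, and the sample file is written to a temporary path purely to demonstrate the check.

```r
# Hypothetical helper: return the line numbers that contain a TAB character
find_tab_lines <- function(path) {
  lines <- readLines(path, warn = FALSE)
  which(grepl("\t", lines, fixed = TRUE))
}

# Demo file with a TAB sneaking in on the third line
yaml_path <- tempfile(fileext = ".yaml")
writeLines(c("plain:", "  top: black", "\tbottom: black"), yaml_path)

tab_lines <- find_tab_lines(yaml_path)
if (length(tab_lines) > 0) {
  message("TAB characters found on line(s): ",
          paste(tab_lines, collapse = ", "))
}
```

Running a check like this before loading gives a clearer message than whatever the parser reports when it hits the TAB.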
Verdict
YAML.
Reasons:
- restored object is identical to original list
- more lightweight - no quoting of strings or names. Single values are not represented as short arrays.
- NULL and NA representation are a teeny tiny bit closer to how I’d think about them in R
- Familiarity with Python means I’m OK with whitespace being significant.