from_json_options.Rmd
This vignette describes the opts argument for reading JSON with the read_json_x() family of functions, and the opts_read_json() function.
opts argument - Specifying options when reading JSON
All read_json_x() functions have an opts argument. opts takes a named list of options used to configure the way yyjsonr parses JSON into R objects.
The default argument for opts is an empty list, which internally sets the default options for parsing.
The default options for parsing can also be viewed by running opts_read_json().
The following three function calls are all equivalent ways of calling read_json_str() using the default options:
read_json_str(str)
read_json_str(str, opts = list())
read_json_str(str, opts = opts_read_json())
Setting a single option (and keeping all other options at their default value) can be done in a number of ways.
The following three function calls are all equivalent:
read_json_str(str, opts = list(str_specials = 'string'))
read_json_str(str, opts = opts_read_json(str_specials = 'string'))
read_json_str(str, str_specials = 'string')
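When the same non-default options are needed for several calls, one approach is to build the options list once with opts_read_json() and pass it through opts. This is a minimal sketch, assuming yyjsonr is installed; the names json, my_opts, res_opts and res_inline are illustrative only.

```r
library(yyjsonr)

# Illustrative input; variable names are chosen for this sketch only
json <- '["hello", "NA", null]'

# Build the options once ...
my_opts <- opts_read_json(str_specials = 'special')

# ... and reuse them, or set the same option inline
res_opts   <- read_json_str(json, opts = my_opts)
res_inline <- read_json_str(json, str_specials = 'special')

# Both routes are expected to give the same result
identical(res_opts, res_inline)
```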
promote_num_to_string - mixtures of numeric and string types
By default, yyjsonr does not promote numeric values to strings, i.e. promote_num_to_string = FALSE.
If an array contains mixed types, then an R list will be returned, so that all JSON values retain their original type.
json <- '[1,2,3.1,"apple", null]'
read_json_str(json)
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2
#>
#> [[3]]
#> [1] 3.1
#>
#> [[4]]
#> [1] "apple"
#>
#> [[5]]
#> NULL
If promote_num_to_string is set to TRUE, then yyjsonr will promote numeric types to strings if the following conditions are met:
null value
yyjsonr::read_json_str(json, promote_num_to_string = TRUE)
#> [1] "1" "2" "3.100000" "apple" NA
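The difference between the two results above can also be checked programmatically. The sketch below reuses the same JSON and only inspects the type of each result; as_list and as_chr are illustrative names.

```r
library(yyjsonr)

json <- '[1,2,3.1,"apple", null]'

as_list <- read_json_str(json)                                # default
as_chr  <- read_json_str(json, promote_num_to_string = TRUE)  # with promotion

is.list(as_list)      # mixed types are kept in a list by default
is.character(as_chr)  # promotion yields a single character vector
is.na(as_chr[5])      # the JSON null becomes NA
```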
df_missing_list_elem - Missing list elements (when parsing data.frames)
When JSON data is being parsed into an R data.frame some columns become list-columns if there are mixed types in the original JSON.
It is possible that some values are completely missing in the JSON representation, and the df_missing_list_elem option specifies the replacement for such a missing value in the R data.frame. The default value is df_missing_list_elem = NULL.
str <- '[{"a":1, "b":2}, {"a":3, "b":4}]'
read_json_str(str)
#> a b
#> 1 1 2
#> 2 3 4
str <- '[{"a":1, "b":[1,2]}, {"a":3, "b":2}]'
read_json_str(str)
#> a b
#> 1 1 1, 2
#> 2 3 2
str <- '[{"a":1, "b":[1,2]}, {"a":2}]'
read_json_str(str)
#> a b
#> 1 1 1, 2
#> 2 2 NULL
read_json_str(str, df_missing_list_elem = NA)
#> a b
#> 1 1 1, 2
#> 2 2 NA
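Because the printed data.frame hides some detail, it can help to inspect the list-column directly. This sketch reuses the JSON from the last example; df_null and df_na are illustrative names.

```r
library(yyjsonr)

json <- '[{"a":1, "b":[1,2]}, {"a":2}]'

df_null <- read_json_str(json)                             # default fill
df_na   <- read_json_str(json, df_missing_list_elem = NA)  # fill with NA

is.list(df_null$b)       # 'b' is a list-column
is.null(df_null$b[[2]])  # missing element filled with NULL by default
is.na(df_na$b[[2]])      # missing element filled with NA instead
```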
obj_of_arrs_to_df - Reading JSON as a data.frame
By default, if JSON looks like it represents a data.frame it will be loaded as such. That is, a JSON {} object which contains only [] arrays (all of equal length) will be treated as a data.frame. This is the default, i.e. obj_of_arrs_to_df = TRUE.
If obj_of_arrs_to_df = FALSE then this data will be read in as a named list. In addition, if the [] arrays are not all the same length, then the data will also be read in as a named list, as no inference of missing values will be done.
str <- '{"a":[1,2],"b":["apple", "banana"]}'
read_json_str(str)
#> a b
#> 1 1 apple
#> 2 2 banana
read_json_str(str, obj_of_arrs_to_df = FALSE)
#> $a
#> [1] 1 2
#>
#> $b
#> [1] "apple" "banana"
str_unequal <- '{"a":[1,2],"b":["apple", "banana", "carrot"]}'
read_json_str(str_unequal)
#> $a
#> [1] 1 2
#>
#> $b
#> [1] "apple" "banana" "carrot"
arr_of_objs_to_df - Reading JSON as a data.frame
str <- '[{"a":1, "b":2}, {"a":3, "b":4}]'
read_json_str(str)
#> a b
#> 1 1 2
#> 2 3 4
read_json_str(str, arr_of_objs_to_df = FALSE)
#> [[1]]
#> [[1]]$a
#> [1] 1
#>
#> [[1]]$b
#> [1] 2
#>
#>
#> [[2]]
#> [[2]]$a
#> [1] 3
#>
#> [[2]]$b
#> [1] 4
str <- '[{"a":1, "b":2}, {"a":3, "b":4, "c":99}]'
read_json_str(str)
#> a b c
#> 1 1 2 NA
#> 2 3 4 99
str_specials - Reading string "NA" from JSON
JSON only really has the value null for representing special missing values, and this is converted to an R NA_character_ value when it is encountered in a string-ish context.
When yyjsonr encounters a literal "NA" value in a string-ish context, its conversion to an R value is controlled by the str_specials option.
The possible values for the str_specials argument are:
string: read in as the literal character string "NA" (the default behaviour)
special: read in as NA_character_
str <- '["hello", "NA", null]'
read_json_str(str) # default: str_specials = 'string'
#> [1] "hello" "NA" NA
read_json_str(str, str_specials = 'special')
#> [1] "hello" NA NA
num_specials - Reading numeric "NA", "NaN" and "Inf"
JSON only really has the value null for representing special missing values, and this is converted to an R NA_integer_ or NA_real_ value when it is encountered in a number-ish context.
When yyjsonr encounters a literal "NA", "NaN" or "Inf" value in a number-ish context, its conversion to an R value is controlled by the num_specials option.
The possible values for the num_specials argument are:
special: read in as an actual numeric NA, NaN or Inf value (the default behaviour)
string: read in as the literal character string "NA" etc.
str <- '[1.23, "NA", "NaN", "Inf", "-Inf", null]'
read_json_str(str) # default: num_specials = 'special'
#> [1] 1.23 NA NaN Inf -Inf NA
read_json_str(str, num_specials = 'string')
#> [[1]]
#> [1] 1.23
#>
#> [[2]]
#> [1] "NA"
#>
#> [[3]]
#> [1] "NaN"
#>
#> [[4]]
#> [1] "Inf"
#>
#> [[5]]
#> [1] "-Inf"
#>
#> [[6]]
#> NULL
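With the default num_specials = 'special', the result is an ordinary numeric vector, so the usual base R predicates can tell the special values apart. A short sketch using the same JSON as above (x is an illustrative name):

```r
library(yyjsonr)

json <- '[1.23, "NA", "NaN", "Inf", "-Inf", null]'
x <- read_json_str(json)  # default: num_specials = 'special'

is.na(x) & !is.nan(x)  # positions holding a true NA ("NA" and null)
is.nan(x)              # position holding NaN
is.infinite(x)         # positions holding Inf and -Inf
```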
int64 - large integer support
JSON supports large integers outside the range of R’s 32-bit integer type.
When such a large value is encountered in JSON, the int64 option controls the value's representation in R.
The possible values for the int64 option are:
string: store the JSON integer as a string in R
double: store the JSON integer as a double precision numeric. If the integer is outside the range +/- 2^53, then it may not be stored perfectly in the double.
bit64: convert to a 64-bit integer supported by the {bit64} package.
str <- '[1, 274877906944]'
# default: int64 = 'string'
# Since result is a mix of types, a list is returned
read_json_str(str)
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] "274877906944"
# Read large integer as double
robj <- read_json_str(str, int64 = 'double')
class(robj)
#> [1] "numeric"
robj
#> [1] 1 274877906944
# Read large integer as 'bit64::integer64' type
library(bit64)
read_json_str(str, int64 = 'bit64')
#> integer64
#> [1] 1 274877906944
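The 2^53 limit mentioned for int64 = 'double' comes from the 53-bit significand of a double. The following base R sketch (no yyjsonr involved; big is an illustrative name) shows where adjacent integers stop being distinguishable:

```r
# Base R only: doubles represent integers exactly up to 2^53
big <- 2^53

big - 1 == big  # FALSE: values just below 2^53 are still distinct
big + 1 == big  # TRUE: 2^53 + 1 cannot be represented exactly
```

If exact values beyond this range matter, the 'string' or 'bit64' settings described above avoid the rounding.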
length1_array_asis - distinguishing scalars from length-1 vectors
JSON supports the concept of both scalar and vector values, i.e. in JSON the scalar 67 is different from an array of length 1, [67]. The length1_array_asis option is for situations where it is important to distinguish these value types in R.
However, R does not make this distinction between scalars and vectors of length 1.
To assist in translating objects from JSON to R and back to JSON, setting length1_array_asis = TRUE will mark JSON arrays of length 1 with the class AsIs. This option defaults to FALSE.
read_json_str('67') |> str()
#> int 67
read_json_str('[67]') |> str()
#> int 67
read_json_str('67' , length1_array_asis = TRUE) |> str()
#> int 67
read_json_str('[67]', length1_array_asis = TRUE) |> str() # Has 'AsIs' class
#> 'AsIs' int 67
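The class tag can also be tested directly with inherits(), which may be convenient in code that needs to branch on whether a value originated as a length-1 array. A small sketch (scalar and array1 are illustrative names):

```r
library(yyjsonr)

scalar <- read_json_str('67',   length1_array_asis = TRUE)
array1 <- read_json_str('[67]', length1_array_asis = TRUE)

inherits(scalar, "AsIs")  # the JSON scalar is not tagged
inherits(array1, "AsIs")  # the length-1 JSON array is tagged
```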
This option is then used with the option auto_unbox when writing JSON in order to control how length-1 R vectors are written. As shown below, if a length-1 vector is marked with the AsIs class when reading, then when writing out to JSON with auto_unbox = TRUE it is written as a JSON array rather than a scalar.
In the following example, only the second value ([67]) is affected by the option length1_array_asis. When the option is TRUE the value is tagged with a class of AsIs. Then, when the created R object is subsequently written out to a JSON string, its structure is determined by auto_unbox, which understands how to handle this class.
str <- '{"a":67, "b":[67], "c":[1,2]}'
# Length-1 vectors output as JSON arrays
read_json_str(str) |>
write_json_str(auto_unbox = FALSE) |>
cat()
#> {"a":[67],"b":[67],"c":[1,2]}
# Length-1 vectors output as JSON scalars
read_json_str(str) |>
write_json_str(auto_unbox = TRUE) |>
cat()
#> {"a":67,"b":67,"c":[1,2]}
# Length-1 vectors output as JSON arrays
read_json_str(str, length1_array_asis = TRUE) |>
write_json_str(auto_unbox = FALSE) |>
cat()
#> {"a":[67],"b":[67],"c":[1,2]}
# !!!!
# Those values marked with 'AsIs' class when reading are output
# as length-1 JSON arrays
read_json_str(str, length1_array_asis = TRUE) |>
write_json_str(auto_unbox = TRUE) |>
cat()
#> {"a":67,"b":[67],"c":[1,2]}
yyjson_read_flag - internal YYJSON C library options
The yyjson C library supports a number of internal options for reading JSON.
These options are considered advanced, and the user is referred to the yyjson documentation for further explanation on what they control.
Warning: some of these advanced options do not make sense for interfacing with R, or otherwise conflict with how this package converts JSON to R objects.
# A reference list of all the possible YYJSON options
yyjsonr::yyjson_read_flag
#> $YYJSON_READ_NOFLAG
#> [1] 0
#>
#> $YYJSON_READ_INSITU
#> [1] 1
#>
#> $YYJSON_READ_STOP_WHEN_DONE
#> [1] 2
#>
#> $YYJSON_READ_ALLOW_TRAILING_COMMAS
#> [1] 4
#>
#> $YYJSON_READ_ALLOW_COMMENTS
#> [1] 8
#>
#> $YYJSON_READ_ALLOW_INF_AND_NAN
#> [1] 16
#>
#> $YYJSON_READ_NUMBER_AS_RAW
#> [1] 32
#>
#> $YYJSON_READ_ALLOW_INVALID_UNICODE
#> [1] 64
#>
#> $YYJSON_READ_BIGNUM_AS_RAW
#> [1] 128
read_json_str(
"[1, 2, 3, ] // A JSON comment not allowed by the standard",
opts = opts_read_json(yyjson_read_flag = c(
yyjson_read_flag$YYJSON_READ_ALLOW_TRAILING_COMMAS,
yyjson_read_flag$YYJSON_READ_ALLOW_COMMENTS
))
)
#> [1] 1 2 3