Using the 'jsoncons' Library in R
Martin Morgan
Roswell Park Comprehensive Cancer Center, Buffalo, NY, US
Marcel Ramos
CUNY School of Public Health at Hunter College, New York, NY, US
Source: vignettes/rjsoncons.Rmd
Introduction & installation
This package provides the header-only ‘jsoncons’ library for manipulating JSON objects. Use rjsoncons for querying JSON or R objects using JMESpath, JSONpath, or JSONpointer. Link to the package for direct access to the ‘jsoncons’ C++ library.
Install the released package version from CRAN
install.packages("rjsoncons", repos = "https://CRAN.R-project.org")
Install the development version with
if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes", repos = "https://CRAN.R-project.org")
remotes::install_github("mtmorgan/rjsoncons")
Attach the installed package to your R session, and check the version of the C++ library in use
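A minimal sketch of this step, assuming the exported version() function is what reports the release of the bundled 'jsoncons' C++ library (not the R package version); the string printed depends on the installed release.

```r
# Attach the package so that j_query(), as_r(), etc. are on the search path
library(rjsoncons)

# Release of the bundled 'jsoncons' C++ header-only library
cpp_version <- rjsoncons::version()
print(cpp_version)

# The R package version is reported separately
print(packageVersion("rjsoncons"))
```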
JSON Use cases
Select, filter and transform
Here is a simple JSON example document
json <- '{
  "locations": [
    {"name": "Seattle", "state": "WA"},
    {"name": "New York", "state": "NY"},
    {"name": "Bellevue", "state": "WA"},
    {"name": "Olympia", "state": "WA"}
  ]
}'
There are several common use cases. Use rjsoncons to query the JSON string using JSONpath, JMESPath or JSONpointer syntax to filter larger documents to records of interest, e.g., only cities in New York state, using ‘JMESpath’ syntax.
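As a sketch of the New York filter just described, the JMESPath expression below keeps only array elements whose state is 'NY'. The document is repeated so the example runs on its own; the variable name ny is illustrative. By default the result is returned as a JSON string.

```r
library(rjsoncons)

json <- '{
  "locations": [
    {"name": "Seattle", "state": "WA"},
    {"name": "New York", "state": "NY"},
    {"name": "Bellevue", "state": "WA"},
    {"name": "Olympia", "state": "WA"}
  ]
}'

# JMESPath filter: keep elements of 'locations' with state equal to 'NY'
ny <- j_query(json, "locations[?state == 'NY']")

# The default return value is a length-1 character vector containing JSON
cat(ny, "\n")
```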
Use the as = "R" argument to extract deeply nested elements as R objects, e.g., a character vector of city names in Washington state.
j_query(json, "locations[?state == 'WA'].name", as = "R")
## [1] "Seattle" "Bellevue" "Olympia"
The JSON Pointer specification is simpler, indexing a single object in the document. JSON arrays are 0-based.
j_query(json, "/locations/0/state")
## [1] "WA"
The examples above use j_query(), which automatically infers query specification from the form of path using j_path_type(). It may be useful to indicate query specification more explicitly using jsonpointer(), jsonpath(), or jmespath(); examples illustrating features available for each query specification are on the help pages ?jsonpointer, ?jsonpath, and ?jmespath.
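The following sketch addresses the same element with one path per query specification. It assumes j_path_type() returns a single string naming the inferred specification, and that each explicit function accepts as = "R" in the same way as j_query(); the shortened document and variable names are illustrative.

```r
library(rjsoncons)

json <- '{"locations": [
    {"name": "Seattle", "state": "WA"},
    {"name": "New York", "state": "NY"}
]}'

# Three paths addressing the state of the first location
paths <- c(
    "/locations/0/state",      # JSON Pointer
    "$.locations[0].state",    # JSONPath
    "locations[0].state"       # JMESPath
)

# The specification j_query() would infer for each path
path_types <- vapply(paths, j_path_type, character(1))
print(path_types)

# Explicit functions state the specification directly, without inference
from_pointer <- jsonpointer(json, paths[[1]], as = "R")
from_jsonpath <- jsonpath(json, paths[[2]], as = "R")
from_jmespath <- jmespath(json, paths[[3]], as = "R")
print(c(from_pointer, from_jsonpath, from_jmespath))
```

Naming the specification explicitly can help when a path is built programmatically and its form might be misread by the inference rules.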
Array-of-objects to R data.frame
The following transforms a nested JSON document into a format that can be incorporated directly in R as a data.frame.
path <- '{
    name: locations[].name,
    state: locations[].state
}'
j_query(json, path, as = "R") |>
    data.frame()
##       name state
## 1  Seattle    WA
## 2 New York    NY
## 3 Bellevue    WA
## 4  Olympia    WA
The transformation from JSON ‘array-of-objects’ to ‘object-of-arrays’ suitable for direct representation as a data.frame is common, and is implemented directly as j_pivot().
j_pivot(json, "locations", as = "data.frame")
##       name state
## 1  Seattle    WA
## 2 New York    NY
## 3 Bellevue    WA
## 4  Olympia    WA
R objects as input
rjsoncons can filter and transform R objects. These are converted to JSON using jsonlite::toJSON() before queries are made; toJSON() arguments like auto_unbox = TRUE can be added to the function call.
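A minimal sketch of querying an R object, using a hand-built list (the name locations is illustrative) with the same shape as the example document. Passing auto_unbox = TRUE matters here: without it, jsonlite::toJSON() writes length-1 vectors as JSON arrays such as ["WA"], and the comparison with the string 'WA' would not match.

```r
library(rjsoncons)

# An R list mirroring the 'locations' JSON document
locations <- list(
    locations = list(
        list(name = "Seattle", state = "WA"),
        list(name = "New York", state = "NY"),
        list(name = "Bellevue", state = "WA"),
        list(name = "Olympia", state = "WA")
    )
)

# auto_unbox = TRUE is forwarded to jsonlite::toJSON() so that
# length-1 vectors become JSON scalars before the query is applied
wa_names <- j_query(
    locations, "locations[?state == 'WA'].name",
    auto_unbox = TRUE, as = "R"
)
print(wa_names)
```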
NDJSON support
rjsoncons supports ‘NDJSON’ (new-line delimited JSON). NDJSON consists of a file or character vector where each line / element represents a JSON record.
j_query() provides a one-to-one mapping of NDJSON lines / elements to the return value, e.g., j_query(ndjson_file, "@", as = "string") on an NDJSON file with 1000 lines will return a character vector of 1000 elements, or with j_query(ndjson, "@", as = "R") an R list with length 1000.
ndjson_file <- system.file(package = "rjsoncons", "extdata", "example.ndjson")
j_query(ndjson_file, "{state: state, name: name}")
## [1] "{\"state\":\"WA\",\"name\":\"Seattle\"}"
## [2] "{\"state\":\"NY\",\"name\":\"New York\"}"
## [3] "{\"state\":\"WA\",\"name\":\"Bellevue\"}"
## [4] "{\"state\":\"WA\",\"name\":\"Olympia\"}"
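The same one-to-one mapping can be sketched with in-memory records. This assumes, following the description of NDJSON as a character vector, that a character vector with more than one element is treated as one JSON record per element; the vector ndjson below is illustrative data.

```r
library(rjsoncons)

# Each element is a complete JSON record
ndjson <- c(
    '{"name": "Seattle", "state": "WA"}',
    '{"name": "New York", "state": "NY"}',
    '{"name": "Bellevue", "state": "WA"}'
)

# The query is applied to each record in turn
states <- j_query(ndjson, "state", as = "R")

# One result per input record
print(length(states) == length(ndjson))
```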
NDJSON files can be large, but are easy to iterate through line by line. Queries are therefore processed in ‘chunks’, and processing can be limited to a fixed number of records with the n_records argument.
j_query(ndjson_file, "{state: state, name: name}", n_records = 2)
## [1] "{\"state\":\"WA\",\"name\":\"Seattle\"}"
## [2] "{\"state\":\"NY\",\"name\":\"New York\"}"
Note that n_records refers to the number of lines input, rather than the number of lines satisfying path=.
The JSON parser
The package includes a JSON parser, used with the argument as = "R" or directly with as_r().
The main rules of this transformation are outlined here. JSON arrays of a single type (boolean, integer, double, string) are transformed to R vectors of the same length and corresponding type.
as_r('[true, false, true]') # boolean -> logical
## [1] TRUE FALSE TRUE
as_r('[1, 2, 3]') # integer -> integer
## [1] 1 2 3
as_r('[1.0, 2.0, 3.0]') # double -> numeric
## [1] 1 2 3
as_r('["a", "b", "c"]') # string -> character
## [1] "a" "b" "c"
JSON arrays mixing integer and double values are transformed to R numeric vectors.
If a JSON integer array contains a value larger than R’s 32-bit integer representation, the array is transformed to an R numeric vector. NOTE that this results in loss of precision for JSON integer values greater than 2^53.
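The two numeric rules above can be checked with a short sketch; the variable names are illustrative. The last lines parse 2^53 + 1, a value that a double cannot represent exactly, to show where the loss of precision comes from.

```r
library(rjsoncons)

all_int <- as_r('[1, 2]')              # integers only -> integer
mixed <- as_r('[1, 2.0]')              # integer and double -> numeric
large <- as_r('[1, 2147483648]')       # beyond 32-bit integer range -> numeric

print(class(all_int))
print(class(mixed))
print(class(large))

# 9007199254740993 is 2^53 + 1; as a double it cannot be stored exactly
lossy <- as_r('[9007199254740993]')
print(format(lossy, scientific = FALSE))
```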
JSON objects are transformed to R named lists.
as_r('{}')
## named list()
as_r('{"a": 1.0, "b": [2, 3, 4]}') |> str()
## List of 2
## $ a: num 1
## $ b: int [1:3] 2 3 4
There are several additional details. A JSON scalar and a JSON vector of length 1 are represented in the same way in R.
JSON arrays mixing types other than integer and double are transformed to R lists.
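Both details can be illustrated with a brief sketch using as_r() directly; the variable names are illustrative.

```r
library(rjsoncons)

# A JSON scalar and a JSON array of length 1 give the same R value
scalar_equals_vector <- identical(as_r("3.14"), as_r("[3.14]"))
print(scalar_equals_vector)

# Boolean, number, and string in one array: no common vector type, so a list
mixed <- as_r('[true, 1, "a"]')
str(mixed)
```

One consequence is that the distinction between a scalar and a length-1 array is not recoverable from the R value alone.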
JSON null values are represented as R NULL values; arrays of null are transformed to lists.
as_r('null') # NULL
## NULL
as_r('[null]') |> str() # list(NULL)
## List of 1
## $ : NULL
as_r('[null, null]') |> str() # list(NULL, NULL)
## List of 2
## $ : NULL
## $ : NULL
Ordering of object members is controlled by the object_names= argument. The default preserves names as they appear in the JSON definition; use "sort" to sort names alphabetically. This argument is applied recursively.
json <- '{"b": 1, "a": {"d": 2, "c": 3}}'
as_r(json) |> str()
## List of 2
## $ b: int 1
## $ a:List of 2
## ..$ d: int 2
## ..$ c: int 3
as_r(json, object_names = "sort") |> str()
## List of 2
## $ a:List of 2
## ..$ c: int 3
## ..$ d: int 2
## $ b: int 1
The parser corresponds approximately to jsonlite::fromJSON() with arguments simplifyVector = TRUE, simplifyDataFrame = FALSE, simplifyMatrix = FALSE.
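A small side-by-side sketch of the two parsers on an illustrative document (json_doc is not from the package). Since the correspondence is only approximate, the example prints both structures for comparison rather than asserting that they are identical.

```r
library(rjsoncons)

# Illustrative document: an integer array and an array of objects
json_doc <- '{"a": [1, 2, 3], "b": [{"x": 1}, {"x": 2}]}'

from_rjsoncons <- as_r(json_doc)
from_jsonlite <- jsonlite::fromJSON(
    json_doc,
    simplifyVector = TRUE, simplifyDataFrame = FALSE, simplifyMatrix = FALSE
)

# Compare the resulting R structures
str(from_rjsoncons)
str(from_jsonlite)
```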
Unit tests (using the tinytest framework) providing additional details are available at
system.file(package = "rjsoncons", "tinytest", "test_as_r.R")
Using jsonlite::fromJSON()
The built-in parser can be replaced by alternative parsers by returning the query as a JSON string, e.g., using fromJSON() in the jsonlite package.
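## `json` was reassigned in the parser examples, so restore the locations document
json <- '{"locations": [{"name": "Seattle", "state": "WA"}, {"name": "New York", "state": "NY"}, {"name": "Bellevue", "state": "WA"}, {"name": "Olympia", "state": "WA"}]}'
## `fromJSON()` simplifies list-of-objects to data.frame
j_query(json, "locations[?state == 'WA']") |> jsonlite::fromJSON()
##       name state
## 1  Seattle    WA
## 2 Bellevue    WA
## 3  Olympia    WA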
The rjsoncons package is particularly useful when accessing elements that might otherwise require complicated application of nested lapply(), purrr expressions, or tidyr unnest_*() (see R for Data Science chapter ‘Hierarchical data’).
C++ library use in other packages
The package includes the complete ‘jsoncons’ C++ header-only library, available to other R packages by adding
LinkingTo: rjsoncons
SystemRequirements: C++11
to the DESCRIPTION file. Typical use in an R package would also include LinkingTo: specifications for the cpp11 or Rcpp (this package uses cpp11) packages to provide a C / C++ interface between R and the C++ ‘jsoncons’ library.
Session information
This vignette was compiled using the following software versions
sessionInfo()
## R version 4.3.2 (2023-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rjsoncons_1.1.0.9401 BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] vctrs_0.6.5 cli_3.6.2 knitr_1.45
## [4] rlang_1.1.3 xfun_0.41 stringi_1.8.3
## [7] purrr_1.0.2 textshaping_0.3.7 jsonlite_1.8.8
## [10] glue_1.7.0 htmltools_0.5.7 ragg_1.2.7
## [13] sass_0.4.8 rmarkdown_2.25 evaluate_0.23
## [16] jquerylib_0.1.4 fastmap_1.1.1 yaml_2.3.8
## [19] lifecycle_1.0.4 memoise_2.0.1 bookdown_0.37
## [22] BiocManager_1.30.22 stringr_1.5.1 compiler_4.3.2
## [25] fs_1.6.3 systemfonts_1.0.5 digest_0.6.34
## [28] R6_2.5.1 magrittr_2.0.3 bslib_0.6.1
## [31] tools_4.3.2 pkgdown_2.0.7 cachem_1.0.8
## [34] desc_1.4.3