Introduction & installation

This package provides the header-only ‘jsoncons’ library for manipulating JSON objects. Use rjsoncons for querying JSON or R objects using JMESpath, JSONpath, or JSONpointer. Link to the package for direct access to the ‘jsoncons’ C++ library.

Install the released package version from CRAN

install.packages("rjsoncons", repos = "https://CRAN.R-project.org")

Install the development version with

if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes", repos = "https://CRAN.R-project.org")
remotes::install_github("mtmorgan/rjsoncons")

Attach the installed package to your R session, and check the version of the C++ library in use

library(rjsoncons)
rjsoncons::version()
## [1] "0.173.2"

JSON Use cases

Select, filter and transform

Here is a simple JSON example document

json <- '{
  "locations": [
    {"name": "Seattle", "state": "WA"},
    {"name": "New York", "state": "NY"},
    {"name": "Bellevue", "state": "WA"},
    {"name": "Olympia", "state": "WA"}
  ]
}'

There are several common use cases. Use rjsoncons to query a JSON string with JMESpath, JSONpath, or JSONpointer syntax, filtering a larger document to the records of interest. For example, select only the cities in New York state using ‘JMESpath’ syntax.

j_query(json, "locations[?state == 'NY']") |>
    cat("\n")
## [{"name":"New York","state":"NY"}]

Use the as = "R" argument to extract deeply nested elements as R objects, e.g., a character vector of city names in Washington state.

j_query(json, "locations[?state == 'WA'].name", as = "R")
## [1] "Seattle"  "Bellevue" "Olympia"

The JSON Pointer specification is simpler, indexing a single object in the document. JSON arrays are 0-based.

j_query(json, "/locations/0/state")
## [1] "WA"

The examples above use j_query(), which automatically infers query specification from the form of path using j_path_type(). It may be useful to indicate query specification more explicitly using jsonpointer(), jsonpath(), or jmespath(); examples illustrating features available for each query specification are on the help pages ?jsonpointer, ?jsonpath, and ?jmespath.
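
As a minimal sketch of the explicit interface, the following requests the same element under each query specification. The shortened `json` document and the three example paths are illustrative assumptions rather than part of the package; j_path_type() is used only to show which specification j_query() would infer for each path.

```r
library(rjsoncons)

## a shortened copy of the example document (illustrative)
json <- '{
  "locations": [
    {"name": "Seattle", "state": "WA"},
    {"name": "New York", "state": "NY"}
  ]
}'

## one example path per query specification (illustrative)
paths <- c(
    jmespath = "locations[?state == 'NY'].name",
    jsonpath = "$.locations[?(@.state == 'NY')].name",
    jsonpointer = "/locations/1/name"
)

## the specification j_query() would infer from the form of each path
inferred <- vapply(paths, j_path_type, character(1))
print(inferred)

## the same element, requested explicitly under each specification
ny_jmespath <- jmespath(json, paths[["jmespath"]], as = "R")
ny_jsonpath <- jsonpath(json, paths[["jsonpath"]], as = "R")
ny_jsonpointer <- jsonpointer(json, paths[["jsonpointer"]], as = "R")
```

Calling the specification-specific function makes the intent visible to a reader of the code, and avoids relying on inference when a path is built programmatically.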

Array-of-objects to R data.frame

The following transforms a nested JSON document into a format that can be incorporated directly in R as a data.frame.

path <- '{
    name: locations[].name,
    state: locations[].state
}'
j_query(json, path, as = "R") |>
    data.frame()
##       name state
## 1  Seattle    WA
## 2 New York    NY
## 3 Bellevue    WA
## 4  Olympia    WA

The transformation from JSON ‘array-of-objects’ to ‘object-of-arrays’ suitable for direct representation as a data.frame is common, and is implemented directly as j_pivot()

j_pivot(json, "locations", as = "data.frame")
##       name state
## 1  Seattle    WA
## 2 New York    NY
## 3 Bellevue    WA
## 4  Olympia    WA
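
To make the ‘array-of-objects’ to ‘object-of-arrays’ idea concrete, here is a rough base R sketch of the same reshaping done by hand. It is not how j_pivot() is implemented; it assumes every record has the same fields as the first record and that all values are strings.

```r
json <- '{
  "locations": [
    {"name": "Seattle", "state": "WA"},
    {"name": "New York", "state": "NY"},
    {"name": "Bellevue", "state": "WA"},
    {"name": "Olympia", "state": "WA"}
  ]
}'

## array-of-objects: a list with one named list per JSON object
records <- jsonlite::fromJSON(json, simplifyVector = FALSE)$locations

## object-of-arrays: for each field, collect its value from every record
fields <- names(records[[1]])
columns <- lapply(
    setNames(fields, fields),
    function(field) vapply(records, `[[`, character(1), field)
)

## a named list of equal-length vectors converts directly to a data.frame
pivoted <- data.frame(columns)
pivoted
```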

R objects as input

rjsoncons can filter and transform R objects. These are converted to JSON using jsonlite::toJSON() before queries are made; toJSON() arguments like auto_unbox = TRUE can be added to the function call.

## `lst` is an *R* list
lst <- jsonlite::fromJSON(json, simplifyVector = FALSE)
j_query(lst, "locations[?state == 'WA'].name | sort(@)", auto_unbox = TRUE) |>
    cat("\n")
## ["Bellevue","Olympia","Seattle"]

NDJSON support

rjsoncons supports ‘NDJSON’ (new-line delimited JSON). NDJSON consists of a file or character vector where each line / element represents a JSON record.

j_query() provides a one-to-one mapping of NDJSON lines / elements to the return value, e.g., j_query(ndjson_file, "@", as = "string") on an NDJSON file with 1000 lines will return a character vector of 1000 elements, or with j_query(ndjson, "@", as = "R") an R list with length 1000.

ndjson_file <- system.file(package = "rjsoncons", "extdata", "example.ndjson")
j_query(ndjson_file, "{state: state, name: name}")
## [1] "{\"state\":\"WA\",\"name\":\"Seattle\"}" 
## [2] "{\"state\":\"NY\",\"name\":\"New York\"}"
## [3] "{\"state\":\"WA\",\"name\":\"Bellevue\"}"
## [4] "{\"state\":\"WA\",\"name\":\"Olympia\"}"

NDJSON files can be large, and are easy to iterate through. Queries are therefore processed in ‘chunks’, and the n_records= argument limits processing to a fixed number of records.

j_query(ndjson_file, "{state: state, name: name}", n_records = 2)
## [1] "{\"state\":\"WA\",\"name\":\"Seattle\"}" 
## [2] "{\"state\":\"NY\",\"name\":\"New York\"}"

Note that n_records refers to the number of lines input, rather than the number of lines satisfying path=.
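
The one-to-one mapping can also be sketched with an in-memory character vector instead of a file. The `ndjson` vector below is made up for illustration, and the example assumes, as the description above suggests, that a character vector with one JSON record per element is treated as NDJSON.

```r
library(rjsoncons)

## each element is one JSON record (illustrative data)
ndjson <- c(
    '{"name": "Seattle", "state": "WA"}',
    '{"name": "New York", "state": "NY"}',
    '{"name": "Bellevue", "state": "WA"}'
)

## one result per input record
all_names <- j_query(ndjson, "name", as = "R")
length(all_names)

## n_records limits the number of input records that are read
first_two <- j_query(ndjson, "name", as = "R", n_records = 2)
length(first_two)
```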

The JSON parser

The package includes a JSON parser, used with the argument as = "R" or directly with as_r()

as_r('{"a": 1.0, "b": [2, 3, 4]}') |>
    str()
## List of 2
##  $ a: num 1
##  $ b: int [1:3] 2 3 4

The main rules of this transformation are outlined here. JSON arrays of a single type (boolean, integer, double, string) are transformed to R vectors of the same length and corresponding type.

as_r('[true, false, true]') # boolean -> logical
## [1]  TRUE FALSE  TRUE
as_r('[1, 2, 3]')           # integer -> integer
## [1] 1 2 3
as_r('[1.0, 2.0, 3.0]')     # double  -> numeric
## [1] 1 2 3
as_r('["a", "b", "c"]')     # string  -> character
## [1] "a" "b" "c"

JSON arrays mixing integer and double values are transformed to R numeric vectors.

as_r('[1, 2.0]') |> class() # numeric
## [1] "numeric"

If a JSON integer array contains a value larger than R’s 32-bit integer representation, the array is transformed to an R numeric vector. NOTE that this results in loss of precision for JSON integer values greater than 2^53.

as_r('[1, 2147483648]') |> class()  # 64-bit integers -> numeric
## [1] "numeric"
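
The two limits mentioned above can be checked in base R without any JSON involved. This short sketch shows where R's integer type overflows, and where double-precision values stop representing every integer exactly.

```r
## the largest value R can store in its 32-bit integer type
int_max <- .Machine$integer.max          # 2147483647

## adding 1L overflows: the result is NA, with a warning
overflow <- suppressWarnings(int_max + 1L)

## the same value is represented exactly as a double
as_double <- int_max + 1

## doubles represent every integer exactly only up to 2^53;
## beyond that, neighboring integers become indistinguishable
exact_limit <- 2^53
exact_limit == exact_limit + 1           # TRUE
```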

JSON objects are transformed to R named lists.

as_r('{}')
## named list()
as_r('{"a": 1.0, "b": [2, 3, 4]}') |> str()
## List of 2
##  $ a: num 1
##  $ b: int [1:3] 2 3 4

There are several additional details. A JSON scalar and a JSON array of length 1 are represented in the same way in R.

identical(as_r("3.14"), as_r("[3.14]"))
## [1] TRUE

JSON arrays mixing types other than integer and double are transformed to R lists

as_r('[true, 1, "a"]') |> str()
## List of 3
##  $ : logi TRUE
##  $ : int 1
##  $ : chr "a"

JSON null values are represented as R NULL values; arrays of null are transformed to lists

as_r('null')                  # NULL
## NULL
as_r('[null]') |> str()       # list(NULL)
## List of 1
##  $ : NULL
as_r('[null, null]') |> str() # list(NULL, NULL)
## List of 2
##  $ : NULL
##  $ : NULL

Ordering of object members is controlled by the object_names= argument. The default preserves names as they appear in the JSON definition; use "sort" to sort names alphabetically. This argument is applied recursively.

json <- '{"b": 1, "a": {"d": 2, "c": 3}}'
as_r(json) |> str()
## List of 2
##  $ b: int 1
##  $ a:List of 2
##   ..$ d: int 2
##   ..$ c: int 3
as_r(json, object_names = "sort") |> str()
## List of 2
##  $ a:List of 2
##   ..$ c: int 3
##   ..$ d: int 2
##  $ b: int 1

The parser corresponds approximately to jsonlite::fromJSON() with arguments simplifyVector = TRUE, simplifyDataFrame = FALSE, simplifyMatrix = FALSE. Unit tests (using the tinytest framework) providing additional details are available at

system.file(package = "rjsoncons", "tinytest", "test_as_r.R")
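
One way to explore where the two parsers agree is to run both on the same input and compare the structures. The sketch below uses a made-up document `x`; since the correspondence is only approximate, no particular result is claimed for other inputs.

```r
library(rjsoncons)

## a small document mixing an array of integers and an array of objects
## (illustrative input)
x <- '{"a": [1, 2, 3], "b": [{"c": "u"}, {"c": "v"}]}'

from_rjsoncons <- as_r(x)
from_jsonlite <- jsonlite::fromJSON(
    x,
    simplifyVector = TRUE,
    simplifyDataFrame = FALSE,
    simplifyMatrix = FALSE
)

## inspect both results, then summarize any differences
str(from_rjsoncons)
str(from_jsonlite)
all.equal(from_rjsoncons, from_jsonlite)
```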

Using jsonlite::fromJSON()

The built-in parser can be replaced by alternative parsers by returning the query as a JSON string, e.g., using fromJSON() in the jsonlite package.

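## `json` was reassigned in the previous section, so query `lst`, the R
## list that still holds the 'locations' document
j_query(lst, "locations[?state == 'WA']", auto_unbox = TRUE) |>
    ## `fromJSON()` simplifies list-of-objects to data.frame
    jsonlite::fromJSON()
##       name state
## 1  Seattle    WA
## 2 Bellevue    WA
## 3  Olympia    WA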

The rjsoncons package is particularly useful when accessing elements that might otherwise require complicated application of nested lapply(), purrr expressions, or tidyr unnest_*() (see R for Data Science chapter ‘Hierarchical data’).

C++ library use in other packages

The package includes the complete ‘jsoncons’ C++ header-only library, available to other R packages by adding

LinkingTo: rjsoncons
SystemRequirements: C++11

to the DESCRIPTION file. Typical use in an R package would also include LinkingTo: specifications for the cpp11 or Rcpp packages (this package uses cpp11), which provide a C / C++ interface between R and the C++ ‘jsoncons’ library.

Session information

This vignette was compiled using the following software versions

sessionInfo()
## R version 4.3.2 (2023-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] rjsoncons_1.1.0.9401 BiocStyle_2.30.0    
## 
## loaded via a namespace (and not attached):
##  [1] vctrs_0.6.5         cli_3.6.2           knitr_1.45         
##  [4] rlang_1.1.3         xfun_0.41           stringi_1.8.3      
##  [7] purrr_1.0.2         textshaping_0.3.7   jsonlite_1.8.8     
## [10] glue_1.7.0          htmltools_0.5.7     ragg_1.2.7         
## [13] sass_0.4.8          rmarkdown_2.25      evaluate_0.23      
## [16] jquerylib_0.1.4     fastmap_1.1.1       yaml_2.3.8         
## [19] lifecycle_1.0.4     memoise_2.0.1       bookdown_0.37      
## [22] BiocManager_1.30.22 stringr_1.5.1       compiler_4.3.2     
## [25] fs_1.6.3            systemfonts_1.0.5   digest_0.6.34      
## [28] R6_2.5.1            magrittr_2.0.3      bslib_0.6.1        
## [31] tools_4.3.2         pkgdown_2.0.7       cachem_1.0.8       
## [34] desc_1.4.3