This article summarizes examples drawn from StackOverflow and elsewhere. Start by loading the rjsoncons and dplyr packages
Query and pivot
Data frame columns as NDJSON
https://stackoverflow.com/questions/76447100
This question presented a tibble
where one column
contained JSON expressions.
df
#> # A tibble: 2 × 2
#> subject response
#> * <chr> <chr>
#> 1 dtv85251vucquc45 "{\"P0_Q0\":{\"aktiv\":2,\"bekümmert\":3,\"interessiert\":4,…
#> 2 mcj8vdqz7sxmjcr0 "{\"P0_Q0\":{\"aktiv\":1,\"bekümmert\":3,\"interessiert\":1,…
The goal was to extract the fields of each P0_Q0
element
as new columns. The response
column can be viewed as
NDJSON, so we can use pivot(df$response, "P0_Q0")
for the
hard work, and bind_cols()
to prepend subject
bind_cols(
df |> select(subject),
df |> pull(response) |> j_pivot("P0_Q0", as = "tibble")
)
#> # A tibble: 2 × 11
#> subject aktiv bekümmert interessiert `freudig erregt` verärgert stark schuldig
#> <chr> <int> <int> <int> <int> <int> <int> <int>
#> 1 dtv852… 2 3 4 2 2 0 1
#> 2 mcj8vd… 1 3 1 1 0 0 2
#> # ℹ 3 more variables: erschrocken <int>, feindselig <int>, angeregt <int>
My initial
response was early in package development, and motivated
j_pivot()
as an easier way to perform the common operation
of transforming a JSON array-of-objects to and R data
frame.
Constructing a pivot object using JMESPath
https://stackoverflow.com/questions/78029215
This question had an array of objects, each of which had a single unique key-value pair.
json <- '{
"encrypted_values":[
{"name_a":"value_a"},
{"name_b":"value_b"},
{"name_c":"value_c"}
]
}'
The goal was to create a tibble with keys in one column, and values
in another. jsonlite::fromJSON()
or
j_pivot(json, "encrypted_values")
simplify the result to a
tibble with a column for each object key, which is not desired.
jsonlite::fromJSON(json)$encrypted_values
#> name_a name_b name_c
#> 1 value_a <NA> <NA>
#> 2 <NA> value_b <NA>
#> 3 <NA> <NA> value_c
j_pivot(json, "encrypted_values", as = "tibble")
#> # A tibble: 3 × 3
#> name_a name_b name_c
#> <list> <list> <list>
#> 1 <chr [1]> <NULL> <NULL>
#> 2 <NULL> <chr [1]> <NULL>
#> 3 <NULL> <NULL> <chr [1]>
Instead, write a JMESPath query that extracts an object with the keys
as one element, and values as another. This uses @
to
represent the current mode, and keys()
and
values()
functions to extract associated elements. The
trailing []
converts an array-of-arrays of keys (for
example) to a simple array of keys.
query <- '{
name : encrypted_values[].keys(@)[],
value: encrypted_values[].values(@)[]
}'
j_pivot(json, query, as = "tibble")
#> # A tibble: 3 × 2
#> name value
#> <chr> <chr>
#> 1 name_a value_a
#> 2 name_b value_b
#> 3 name_c value_c
Reading from URLs
https://stackoverflow.com/questions/78023560
This question illustrates rjsoncons ability to
read URLs; the query itself extracts from the fixtures
array of objects specific nested elements, and is similar to the
previous question. In practice, I used
json <- readLines(url)
to create a local copy of the
data to use while developing the query.
url <- "https://www.nrl.com/draw//data?competition=111&season=2024"
query <- 'fixtures[].{
homeTeam: homeTeam.nickName,
awayTeam: awayTeam.nickName
}'
j_pivot(url, query, as = "tibble")
#> # A tibble: 8 × 2
#> homeTeam awayTeam
#> <chr> <chr>
#> 1 Dolphins Sea Eagles
#> 2 Panthers Bulldogs
#> 3 Eels Broncos
#> 4 Wests Tigers Knights
#> 5 Dragons Rabbitohs
#> 6 Storm Sharks
#> 7 Roosters Warriors
#> 8 Titans Cowboys
The easiest path to a more general answer (extract all members of ‘homeTeam’ and ‘awayTeam’ as a tibble) might, like the posted answer, combine JSON extraction and tidyr.
query <- 'fixtures[].{ homeTeam: homeTeam, awayTeam: awayTeam }'
j_pivot(url, query, as = "tibble") |>
tidyr::unnest_wider(c("homeTeam", "awayTeam"), names_sep = "_")
#> # A tibble: 8 × 8
#> homeTeam_teamId homeTeam_nickName homeTeam_teamPosition homeTeam_theme
#> <int> <chr> <chr> <list>
#> 1 500723 Dolphins 4th <named list [2]>
#> 2 500014 Panthers 3rd <named list [2]>
#> 3 500031 Eels 12th <named list [2]>
#> 4 500023 Wests Tigers 15th <named list [2]>
#> 5 500022 Dragons 13th <named list [2]>
#> 6 500021 Storm 2nd <named list [2]>
#> 7 500001 Roosters 6th <named list [2]>
#> 8 500004 Titans 16th <named list [2]>
#> # ℹ 4 more variables: awayTeam_teamId <int>, awayTeam_nickName <chr>,
#> # awayTeam_teamPosition <chr>, awayTeam_theme <list>
Deeply nested objects
https://stackoverflow.com/questions/77998013
The details of this question are on StackOverflow, and the following code chunks are not evaluated directly. The example has several interesting elements:
-
The JSON is quite large (about 90 Mb), so processing is not immediate. While developing the query, I focused on a subset of the data for a more interactive experience.
Crime2013 <- j_query(json, "x.calls[9].args")
JSON array indexing is 0-based, in contrast to 1-based R indexing.
-
In developing the JSON query, I spent quite a bit of time viewing results using
listviewer::jsonedit()
, e.g.,listviewer::jsonedit(j_query(Crime2013, "[0][*]"))
The objects of interest are polygon coordinates nested deeply in the JSON, at the location implied by JMESPath. One polygon is at
query <- "x.calls[9].args[0][0][0][0]"
j_pivot(json, query, as = "tibble")
## # A tibble: 27 × 2
## lng lat
## <dbl> <dbl>
## 1 -43.3 -22.9
## 2 -43.3 -22.9
## 3 -43.3 -22.9
## 4 -43.3 -22.9
## 5 -43.3 -22.9
## 6 -43.3 -22.9
## 7 -43.3 -22.9
## 8 -43.3 -22.9
## 9 -43.3 -22.9
## 10 -43.3 -22.9
## # ℹ 17 more rows
## # ℹ Use `print(n = ...)` to see more rows
There are 3618 of these polygons, and they are extracted by using a
wild-card *
in place of an index 0
at a
particular place in the path.
query <- "x.calls[9].args[0][*][0][0]"
j_pivot(json, query, as = "tibble")
## # A tibble: 3,618 × 2
## lng lat
## <list> <list>
## 1 <dbl [27]> <dbl [27]>
## 2 <dbl [6]> <dbl [6]>
## 3 <dbl [5]> <dbl [5]>
## 4 <dbl [42]> <dbl [42]>
## 5 <dbl [6]> <dbl [6]>
## 6 <dbl [8]> <dbl [8]>
## 7 <dbl [4]> <dbl [4]>
## 8 <dbl [6]> <dbl [6]>
## 9 <dbl [13]> <dbl [13]>
## 10 <dbl [12]> <dbl [12]>
## # ℹ 3,608 more rows
## # ℹ Use `print(n = ...)` to see more rows
tidyr::unnest()
could be used to create a ‘long’ version
of this result, as illustrated in my response.
While exploring the data, the JMESPath function length()
suggested two anomalies in the data (not all paths in
Crime2013
have just one polygon; a path used to identify
polygons has only 2022 elements, but there are 3618 polygons); this
could well be a misunderstanding on my part.
Patch
Moving elements
https://stackoverflow.com/questions/78047988
The example is not completely reproducible, but the challenge is that the igraph package produces JSON like
data <- '
{
"nodes": [
{
"name": "something"
},
{
"name": "something_else"
}
],
"links": [
{
"source": "something",
"target": "something_else"
}
],
"attributes": {
"directed": false
}
}'
but the desired data moves the ‘directed’ attribute to a top-level
field. From the JSON patch
documentation, the patch is a single ‘move’ operation from
/attributes/directed
to the top-level /
:
patch <- '[
{"op": "move", "from": "/attributes/directed", "path": "/directed"}
]'
The patch is accomplished with
patched_data <- j_patch_apply(data, patch)
This JSON string could be visualized with
listviwer::jsonedit(patched_data)
, or
patched_data |> as_r() |> str()
, or
patched_data |>
jsonlite::prettify()
#> {
#> "nodes": [
#> {
#> "name": "something"
#> },
#> {
#> "name": "something_else"
#> }
#> ],
#> "links": [
#> {
#> "source": "something",
#> "target": "something_else"
#> }
#> ],
#> "attributes": {
#>
#> },
#> "directed": false
#> }
#>
Session information
sessionInfo()
#> R version 4.4.0 (2024-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dplyr_1.1.4 rjsoncons_1.3.0.9000
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.6.5 cli_3.6.2 knitr_1.46 rlang_1.1.3
#> [5] xfun_0.43 purrr_1.0.2 generics_0.1.3 textshaping_0.3.7
#> [9] jsonlite_1.8.8 glue_1.7.0 htmltools_0.5.8.1 ragg_1.3.0
#> [13] sass_0.4.9 fansi_1.0.6 rmarkdown_2.26 tibble_3.2.1
#> [17] evaluate_0.23 jquerylib_0.1.4 fastmap_1.1.1 yaml_2.3.8
#> [21] lifecycle_1.0.4 memoise_2.0.1 compiler_4.4.0 fs_1.6.4
#> [25] pkgconfig_2.0.3 tidyr_1.3.1 systemfonts_1.0.6 digest_0.6.35
#> [29] R6_2.5.1 tidyselect_1.2.1 utf8_1.2.4 pillar_1.9.0
#> [33] magrittr_2.0.3 bslib_0.7.0 withr_3.0.0 tools_4.4.0
#> [37] pkgdown_2.0.9 cachem_1.0.8 desc_1.4.3