Skip to contents

This article summarizes examples drawn from StackOverflow and elsewhere. Start by loading the rjsoncons and dplyr packages

Query and pivot

Data frame columns as NDJSON

https://stackoverflow.com/questions/76447100

This question presented a tibble where one column contained JSON expressions.

df
#> # A tibble: 2 × 2
#>   subject          response                                                     
#> * <chr>            <chr>                                                        
#> 1 dtv85251vucquc45 "{\"P0_Q0\":{\"aktiv\":2,\"bekümmert\":3,\"interessiert\":4,…
#> 2 mcj8vdqz7sxmjcr0 "{\"P0_Q0\":{\"aktiv\":1,\"bekümmert\":3,\"interessiert\":1,…

The goal was to extract the fields of each P0_Q0 element as new columns. The response column can be viewed as NDJSON, so we can use pivot(df$response, "P0_Q0") for the hard work, and bind_cols() to prepend subject

bind_cols(
    df |> select(subject),
    df |> pull(response) |> j_pivot("P0_Q0", as = "tibble")
)
#> # A tibble: 2 × 11
#>   subject aktiv bekümmert interessiert `freudig erregt` verärgert stark schuldig
#>   <chr>   <int>     <int>        <int>            <int>     <int> <int>    <int>
#> 1 dtv852…     2         3            4                2         2     0        1
#> 2 mcj8vd…     1         3            1                1         0     0        2
#> # ℹ 3 more variables: erschrocken <int>, feindselig <int>, angeregt <int>

My initial response was early in package development, and motivated j_pivot() as an easier way to perform the common operation of transforming a JSON array-of-objects to and R data frame.

Constructing a pivot object using JMESPath

https://stackoverflow.com/questions/78029215

This question had an array of objects, each of which had a single unique key-value pair.

json <- '{
    "encrypted_values":[
        {"name_a":"value_a"}, 
        {"name_b":"value_b"}, 
        {"name_c":"value_c"}
    ]
}'

The goal was to create a tibble with keys in one column, and values in another. jsonlite::fromJSON() or j_pivot(json, "encrypted_values") simplify the result to a tibble with a column for each object key, which is not desired.

jsonlite::fromJSON(json)$encrypted_values
#>    name_a  name_b  name_c
#> 1 value_a    <NA>    <NA>
#> 2    <NA> value_b    <NA>
#> 3    <NA>    <NA> value_c
j_pivot(json, "encrypted_values", as = "tibble")
#> # A tibble: 3 × 3
#>   name_a    name_b    name_c   
#>   <list>    <list>    <list>   
#> 1 <chr [1]> <NULL>    <NULL>   
#> 2 <NULL>    <chr [1]> <NULL>   
#> 3 <NULL>    <NULL>    <chr [1]>

Instead, write a JMESPath query that extracts an object with the keys as one element, and values as another. This uses @ to represent the current mode, and keys() and values() functions to extract associated elements. The trailing [] converts an array-of-arrays of keys (for example) to a simple array of keys.

query <- '{
    name : encrypted_values[].keys(@)[],
    value: encrypted_values[].values(@)[]
}'
j_pivot(json, query, as = "tibble")
#> # A tibble: 3 × 2
#>   name   value  
#>   <chr>  <chr>  
#> 1 name_a value_a
#> 2 name_b value_b
#> 3 name_c value_c

Reading from URLs

https://stackoverflow.com/questions/78023560

This question illustrates rjsoncons ability to read URLs; the query itself extracts from the fixtures array of objects specific nested elements, and is similar to the previous question. In practice, I used json <- readLines(url) to create a local copy of the data to use while developing the query.

url <- "https://www.nrl.com/draw//data?competition=111&season=2024"
query <- 'fixtures[].{
    homeTeam: homeTeam.nickName, 
    awayTeam: awayTeam.nickName
}'
j_pivot(url, query, as = "tibble")
#> # A tibble: 8 × 2
#>   homeTeam     awayTeam  
#>   <chr>        <chr>     
#> 1 Dolphins     Sea Eagles
#> 2 Panthers     Bulldogs  
#> 3 Eels         Broncos   
#> 4 Wests Tigers Knights   
#> 5 Dragons      Rabbitohs 
#> 6 Storm        Sharks    
#> 7 Roosters     Warriors  
#> 8 Titans       Cowboys

The easiest path to a more general answer (extract all members of ‘homeTeam’ and ‘awayTeam’ as a tibble) might, like the posted answer, combine JSON extraction and tidyr.

query <- 'fixtures[].{ homeTeam: homeTeam, awayTeam: awayTeam }'
j_pivot(url, query, as = "tibble") |>
    tidyr::unnest_wider(c("homeTeam", "awayTeam"), names_sep = "_")
#> # A tibble: 8 × 8
#>   homeTeam_teamId homeTeam_nickName homeTeam_teamPosition homeTeam_theme  
#>             <int> <chr>             <chr>                 <list>          
#> 1          500723 Dolphins          4th                   <named list [2]>
#> 2          500014 Panthers          3rd                   <named list [2]>
#> 3          500031 Eels              12th                  <named list [2]>
#> 4          500023 Wests Tigers      15th                  <named list [2]>
#> 5          500022 Dragons           13th                  <named list [2]>
#> 6          500021 Storm             2nd                   <named list [2]>
#> 7          500001 Roosters          6th                   <named list [2]>
#> 8          500004 Titans            16th                  <named list [2]>
#> # ℹ 4 more variables: awayTeam_teamId <int>, awayTeam_nickName <chr>,
#> #   awayTeam_teamPosition <chr>, awayTeam_theme <list>

Deeply nested objects

https://stackoverflow.com/questions/77998013

The details of this question are on StackOverflow, and the following code chunks are not evaluated directly. The example has several interesting elements:

  • The JSON is quite large (about 90 Mb), so processing is not immediate. While developing the query, I focused on a subset of the data for a more interactive experience.

    Crime2013 <- j_query(json, "x.calls[9].args")
  • JSON array indexing is 0-based, in contrast to 1-based R indexing.

  • In developing the JSON query, I spent quite a bit of time viewing results using listviewer::jsonedit(), e.g.,

    listviewer::jsonedit(j_query(Crime2013, "[0][*]"))

The objects of interest are polygon coordinates nested deeply in the JSON, at the location implied by JMESPath. One polygon is at

query <- "x.calls[9].args[0][0][0][0]"
j_pivot(json, query, as = "tibble")
## # A tibble: 27 × 2
##      lng   lat
##    <dbl> <dbl>
##  1 -43.3 -22.9
##  2 -43.3 -22.9
##  3 -43.3 -22.9
##  4 -43.3 -22.9
##  5 -43.3 -22.9
##  6 -43.3 -22.9
##  7 -43.3 -22.9
##  8 -43.3 -22.9
##  9 -43.3 -22.9
## 10 -43.3 -22.9
## # ℹ 17 more rows
## # ℹ Use `print(n = ...)` to see more rows

There are 3618 of these polygons, and they are extracted by using a wild-card * in place of an index 0 at a particular place in the path.

query <- "x.calls[9].args[0][*][0][0]"
j_pivot(json, query, as = "tibble")
## # A tibble: 3,618 × 2
##    lng        lat       
##    <list>     <list>    
##  1 <dbl [27]> <dbl [27]>
##  2 <dbl [6]>  <dbl [6]> 
##  3 <dbl [5]>  <dbl [5]> 
##  4 <dbl [42]> <dbl [42]>
##  5 <dbl [6]>  <dbl [6]> 
##  6 <dbl [8]>  <dbl [8]> 
##  7 <dbl [4]>  <dbl [4]> 
##  8 <dbl [6]>  <dbl [6]> 
##  9 <dbl [13]> <dbl [13]>
## 10 <dbl [12]> <dbl [12]>
## # ℹ 3,608 more rows
## # ℹ Use `print(n = ...)` to see more rows

tidyr::unnest() could be used to create a ‘long’ version of this result, as illustrated in my response.

While exploring the data, the JMESPath function length() suggested two anomalies in the data (not all paths in Crime2013 have just one polygon; a path used to identify polygons has only 2022 elements, but there are 3618 polygons); this could well be a misunderstanding on my part.

Patch

Moving elements

https://stackoverflow.com/questions/78047988

The example is not completely reproducible, but the challenge is that the igraph package produces JSON like

data <- '
{
  "nodes": [
     {
        "name": "something"
     },
     {
        "name": "something_else"
     }
  ],
  "links": [
      {
         "source": "something",
         "target": "something_else"
      }
   ],
  "attributes": {
      "directed": false
  }
}'

but the desired data moves the ‘directed’ attribute to a top-level field. From the JSON patch documentation, the patch is a single ‘move’ operation from /attributes/directed to the top-level /:

patch <- '[
    {"op": "move", "from": "/attributes/directed", "path": "/directed"}
]'

The patch is accomplished with

patched_data <- j_patch_apply(data, patch)

This JSON string could be visualized with listviwer::jsonedit(patched_data), or patched_data |> as_r() |> str(), or

patched_data |>
    jsonlite::prettify()
#> {
#>     "nodes": [
#>         {
#>             "name": "something"
#>         },
#>         {
#>             "name": "something_else"
#>         }
#>     ],
#>     "links": [
#>         {
#>             "source": "something",
#>             "target": "something_else"
#>         }
#>     ],
#>     "attributes": {
#> 
#>     },
#>     "directed": false
#> }
#> 

Session information

sessionInfo()
#> R version 4.4.0 (2024-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dplyr_1.1.4          rjsoncons_1.3.0.9000
#> 
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.6.5       cli_3.6.2         knitr_1.46        rlang_1.1.3      
#>  [5] xfun_0.43         purrr_1.0.2       generics_0.1.3    textshaping_0.3.7
#>  [9] jsonlite_1.8.8    glue_1.7.0        htmltools_0.5.8.1 ragg_1.3.0       
#> [13] sass_0.4.9        fansi_1.0.6       rmarkdown_2.26    tibble_3.2.1     
#> [17] evaluate_0.23     jquerylib_0.1.4   fastmap_1.1.1     yaml_2.3.8       
#> [21] lifecycle_1.0.4   memoise_2.0.1     compiler_4.4.0    fs_1.6.4         
#> [25] pkgconfig_2.0.3   tidyr_1.3.1       systemfonts_1.0.6 digest_0.6.35    
#> [29] R6_2.5.1          tidyselect_1.2.1  utf8_1.2.4        pillar_1.9.0     
#> [33] magrittr_2.0.3    bslib_0.7.0       withr_3.0.0       tools_4.4.0      
#> [37] pkgdown_2.0.9     cachem_1.0.8      desc_1.4.3