Flatten and find keys or values in JSON or NDJSON documents

j_flatten() transforms a JSON document into a list where names are JSONpointer 'paths' and elements are the corresponding 'values' from the JSON document.
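
The flattening idea can be sketched in base R. This is not rjsoncons's implementation: the hypothetical helper flatten_json() assumes the document has already been parsed into nested lists (objects as named lists, arrays as unnamed lists, as jsonlite::parse_json() produces by default), and it drops empty objects and arrays.

```r
# Hypothetical helper: flatten nested R lists into a named list whose names
# are JSONpointer paths and whose elements are the leaf values.
flatten_json <- function(x, path = "") {
    # A non-list value is a leaf: return it under its full path
    if (!is.list(x)) {
        leaf <- list(x)
        names(leaf) <- path
        return(leaf)
    }
    # Empty objects / arrays contribute no path / value pairs
    if (length(x) == 0L)
        return(list())
    # Objects (named lists) use their keys; arrays (unnamed lists) use
    # 0-based indices
    keys <- names(x)
    if (is.null(keys))
        keys <- as.character(seq_along(x) - 1L)
    # JSONpointer (RFC 6901) escaping: '~' -> '~0' first, then '/' -> '~1'
    escaped <- gsub("/", "~1", gsub("~", "~0", keys, fixed = TRUE), fixed = TRUE)
    # Recurse into each child, then concatenate the children's named lists
    parts <- Map(flatten_json, x, paste0(path, "/", escaped))
    do.call(c, unname(parts))
}

doc <- list(
    discards = list("1000" = "Record does not exist", "1004" = "Queue limit exceeded"),
    warnings = list("Phone number missing country code", "State code missing")
)
str(flatten_json(doc))
```

Because warnings is an unnamed list here, its elements get the array-style paths "/warnings/0" and "/warnings/1".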

j_find_values() finds paths to exactly matching values.

j_find_values_grep() finds paths to values matching a regular expression.
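
Both value searches can be sketched in base R as filters over a flattened document. The helpers find_values_sketch() and find_values_grep_sketch() are hypothetical and start from a hand-written named list like the one j_flatten(as = "R") returns. Exact matching here uses R's %in%, which compares after type coercion and so may be looser than rjsoncons's matching; the regular-expression version only tests character values.

```r
# A flattened document: JSONpointer path -> value
flat <- list(
    "/discards/1004" = "Queue limit exceeded",
    "/warnings/1" = "State code missing",
    "/warnings/2" = "Zip code missing"
)

# Exact matching, in the spirit of j_find_values(): keep scalar values
# that appear in 'values'
find_values_sketch <- function(flat, values) {
    hit <- vapply(flat, function(v) length(v) == 1L && v %in% values, logical(1))
    flat[hit]
}

# Pattern matching, in the spirit of j_find_values_grep(): keep character
# values matched by grepl(); 'grep_args' holds extra grepl() arguments
find_values_grep_sketch <- function(flat, pattern, grep_args = list()) {
    hit <- vapply(flat, function(v) {
        is.character(v) && length(v) == 1L &&
            do.call(grepl, c(list(pattern, v), grep_args))
    }, logical(1))
    flat[hit]
}

names(find_values_sketch(flat, c("Queue limit exceeded", "Zip code missing")))
names(find_values_grep_sketch(flat, "missing"))
names(find_values_grep_sketch(flat, "ZIP", list(ignore.case = TRUE)))
```

With these inputs the three calls return the paths "/discards/1004" and "/warnings/2", then "/warnings/1" and "/warnings/2", then only "/warnings/2".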

j_find_keys() finds paths to exactly matching keys.

j_find_keys_grep() finds paths to keys matching a regular expression.
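
The key searches work on the paths rather than the values. In this base-R sketch with hypothetical helpers, exact matching splits each JSONpointer path into its keys, while pattern matching runs grepl() over the whole path string, which is why a pattern can span keys. The sketch ignores JSONpointer ~0/~1 escapes and matches single keys only, a simplification of the "consecutive keys" rule described under Details.

```r
# A flattened document: JSONpointer path -> value
flat <- list(
    "/discards/1000" = "Record does not exist",
    "/discards/1010" = "Discarding timed-out partial msg",
    "/warnings/1" = "State code missing"
)

# Exact key matching: split each path into its keys and keep paths where
# any key equals one of 'keys'
find_keys_sketch <- function(flat, keys) {
    path_keys <- strsplit(sub("^/", "", names(flat)), "/", fixed = TRUE)
    flat[vapply(path_keys, function(k) any(k %in% keys), logical(1))]
}

# Pattern matching on the whole path string, so a pattern can span keys
find_keys_grep_sketch <- function(flat, pattern, grep_args = list()) {
    flat[do.call(grepl, c(list(pattern, names(flat)), grep_args))]
}

names(find_keys_sketch(flat, "1"))               # only "/warnings/1"; "1000" is not "1"
names(find_keys_grep_sketch(flat, "1"))          # all three paths contain "1"
names(find_keys_grep_sketch(flat, "car.*/101"))  # "/discards/1010"
```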

For NDJSON documents, the result is either a character vector (for as = "string") or a list of R objects, with one element for each NDJSON record.
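
The per-record behaviour can be sketched in base R. The records below mirror the flattened example.ndjson output shown in the Examples section, but they are typed in by hand as already-parsed R lists (parsing is not shown), and the hypothetical grep_record() only searches the top-level values of a flat record.

```r
# Records mirroring the NDJSON example, already parsed into R lists
records <- list(
    list(name = "Seattle", state = "WA"),
    list(name = "New York", state = "NY"),
    list(name = "Bellevue", state = "WA"),
    list(name = "Olympia", state = "WA")
)

# Search one flat record's values; return matches named by JSONpointer path
grep_record <- function(record, pattern) {
    hit <- vapply(record, function(v) grepl(pattern, v), logical(1))
    found <- record[hit]
    # sprintf() returns character(0) when nothing matched, keeping names valid
    names(found) <- sprintf("/%s", names(found))
    found
}

# One result per record, in record order, even when a record has no match
result <- lapply(records, grep_record, pattern = "e")
lengths(result)
```

Here lengths(result) is c(1, 1, 1, 0): as in the NDJSON j_find_values_grep() example, the Olympia record has no value containing "e" and yields an empty named list.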

Usage

j_flatten(
  data,
  object_names = "asis",
  as = "string",
  ...,
  n_records = Inf,
  verbose = FALSE,
  data_type = j_data_type(data),
  path_type = "JSONpointer"
)

j_find_values(
  data,
  values,
  object_names = "asis",
  as = "R",
  ...,
  n_records = Inf,
  verbose = FALSE,
  data_type = j_data_type(data),
  path_type = "JSONpointer"
)

j_find_values_grep(
  data,
  pattern,
  object_names = "asis",
  as = "R",
  ...,
  grep_args = list(),
  n_records = Inf,
  verbose = FALSE,
  data_type = j_data_type(data),
  path_type = "JSONpointer"
)

j_find_keys(
  data,
  keys,
  object_names = "asis",
  as = "R",
  ...,
  n_records = Inf,
  verbose = FALSE,
  data_type = j_data_type(data),
  path_type = "JSONpointer"
)

j_find_keys_grep(
  data,
  pattern,
  object_names = "asis",
  as = "R",
  ...,
  grep_args = list(),
  n_records = Inf,
  verbose = FALSE,
  data_type = j_data_type(data),
  path_type = "JSONpointer"
)

Arguments

data: a character() JSON string or NDJSON records, the name of a file or URL containing JSON or NDJSON, or an R object converted to a JSON string using jsonlite::toJSON().
object_names: character(1) order data object elements "asis" (default) or "sort" before filtering on path.
as: character(1) describing the return type. For j_flatten(), either "string" or "R". For other functions on this page, one of "R", "data.frame", or "tibble".
...: passed to jsonlite::toJSON when data is an R object.
n_records: numeric(1) maximum number of NDJSON records parsed.
verbose: logical(1) report progress when parsing large NDJSON files.
data_type: character(1) type of data; one of "json", "ndjson", or a value returned by j_data_type().
path_type: character(1) type of 'path' to be returned; one of "JSONpointer" or "JSONpath"; "JMESpath" is not supported.
values: vector of one or more values to be matched exactly to values in the JSON document.
pattern: character(1) regular expression to match values (j_find_values_grep()) or paths (j_find_keys_grep()).
grep_args: list() additional arguments passed to grepl() when searching on values or paths.
keys: character() vector of one or more keys to be matched exactly to path elements.
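
The two path forms accepted by path_type can be illustrated with a pair of hypothetical base-R helpers that render the same keys as a JSONpointer path and as the bracket-notation JSONpath shown in the examples. The sketch assumes every path element is an object key (array indices, which JSONpath normalized paths typically leave unquoted, are not handled) and does not escape single quotes inside keys.

```r
# Hypothetical helper: keys -> JSONpointer, e.g. "/discards/1000"
as_jsonpointer <- function(keys) {
    if (length(keys) == 0L)
        return("")  # the JSONpointer for the whole document
    # RFC 6901 escaping: '~' -> '~0' first, then '/' -> '~1'
    escaped <- gsub("/", "~1", gsub("~", "~0", keys, fixed = TRUE), fixed = TRUE)
    paste0("/", escaped, collapse = "")
}

# Hypothetical helper: keys -> bracket-notation JSONpath,
# e.g. "$['discards']['1000']" (object keys only, quotes not escaped)
as_jsonpath <- function(keys) {
    if (length(keys) == 0L)
        return("$")  # the JSONpath for the whole document
    paste0("$", paste0("['", keys, "']", collapse = ""))
}

as_jsonpointer(c("discards", "1000"))
as_jsonpath(c("discards", "1000"))
as_jsonpointer("a/b~c")
```

The calls return "/discards/1000", "$['discards']['1000']", and "/a~1b~0c"; the last shows why JSONpointer must escape "/" and "~" inside keys, while a quoted JSONpath name can contain them unchanged.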

Value

j_flatten(as = "string") (default) returns a JSON string representation of the flattened document, i.e., a JSON object whose keys are the JSONpointer paths and whose values are the values at the corresponding paths in the original document.

j_flatten(as = "R") returns a named list, where names() are the JSONpointer paths to each element in the JSON document and list elements are the corresponding values.

j_find_values() and j_find_values_grep() return a list whose names are the JSONpointer paths and whose elements are the matching values, or a data.frame or tibble with columns path and value. When as = "data.frame" or as = "tibble", values are coerced to a common type.

j_find_keys() and j_find_keys_grep() return a list, data.frame, or tibble with the same structure as j_find_values() and j_find_values_grep().

For NDJSON documents, the result is a vector paralleling the NDJSON document, with the function applied to each NDJSON record.
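
The coercion to a common type can be illustrated in base R. This sketch uses a hypothetical helper and R's own unlist() coercion rules (logical, then integer, then double, then character); that rjsoncons coerces in exactly this way is an assumption.

```r
# Hypothetical helper: turn a named list of path -> value results into a
# data.frame with columns 'path' and 'value'. unlist() coerces the values
# to a common type; this assumes every value is a non-null scalar.
to_path_value_df <- function(found) {
    data.frame(
        path = names(found),
        value = unlist(found, use.names = FALSE),
        stringsAsFactors = FALSE
    )
}

found <- list("/a" = 1, "/b" = "two", "/c" = TRUE)
df <- to_path_value_df(found)
str(df)
```

Mixing a number, a string, and a logical therefore yields the character column c("1", "two", "TRUE").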

Details

Functions documented on this page expand data into all path / value pairs. Because every pair is materialized in R, they are not suitable for very large JSON documents.

For j_find_keys(), the key must exactly match one or more consecutive keys in the JSONpointer path returned by j_flatten().

For j_find_keys_grep(), the pattern is matched against the whole path, so it can span several JSONpointer or JSONpath elements (e.g., "car.*/101" matches /discards/1010).

Examples

json <- '{
    "discards": {
        "1000": "Record does not exist",
        "1004": "Queue limit exceeded",
        "1010": "Discarding timed-out partial msg"
    },
    "warnings": {
        "0": "Phone number missing country code",
        "1": "State code missing",
        "2": "Zip code missing"
    }
}'

## JSONpointer
j_flatten(json) |>
    cat("\n")
#> {"/discards/1000":"Record does not exist","/discards/1004":"Queue limit exceeded","/discards/1010":"Discarding timed-out partial msg","/warnings/0":"Phone number missing country code","/warnings/1":"State code missing","/warnings/2":"Zip code missing"} 

## JSONpath
j_flatten(json, as = "R", path_type = "JSONpath") |>
    str()
#> List of 6
#>  $ $['discards']['1000']: chr "Record does not exist"
#>  $ $['discards']['1004']: chr "Queue limit exceeded"
#>  $ $['discards']['1010']: chr "Discarding timed-out partial msg"
#>  $ $['warnings']['0']   : chr "Phone number missing country code"
#>  $ $['warnings']['1']   : chr "State code missing"
#>  $ $['warnings']['2']   : chr "Zip code missing"

j_find_values(json, "Zip code missing", as = "tibble")
#> # A tibble: 1 × 2
#>   path        value           
#>   <chr>       <chr>           
#> 1 /warnings/2 Zip code missing
j_find_values(
    json,
    c("Queue limit exceeded", "Zip code missing"),
    as = "tibble"
)
#> # A tibble: 2 × 2
#>   path           value               
#>   <chr>          <chr>               
#> 1 /discards/1004 Queue limit exceeded
#> 2 /warnings/2    Zip code missing    

j_find_values_grep(json, "missing", as = "tibble")
#> # A tibble: 3 × 2
#>   path        value                            
#>   <chr>       <chr>                            
#> 1 /warnings/0 Phone number missing country code
#> 2 /warnings/1 State code missing               
#> 3 /warnings/2 Zip code missing                 

## JSONpath
j_find_values_grep(json, "missing", as = "tibble", path_type = "JSONpath")
#> # A tibble: 3 × 2
#>   path               value                            
#>   <chr>              <chr>                            
#> 1 $['warnings']['0'] Phone number missing country code
#> 2 $['warnings']['1'] State code missing               
#> 3 $['warnings']['2'] Zip code missing                 

j_find_keys(json, "discards", as = "tibble")
#> # A tibble: 3 × 2
#>   path           value                           
#>   <chr>          <chr>                           
#> 1 /discards/1000 Record does not exist           
#> 2 /discards/1004 Queue limit exceeded            
#> 3 /discards/1010 Discarding timed-out partial msg
j_find_keys(json, "1", as = "tibble")
#> # A tibble: 1 × 2
#>   path        value             
#>   <chr>       <chr>             
#> 1 /warnings/1 State code missing
j_find_keys(json, c("discards", "warnings"), as = "tibble")
#> # A tibble: 6 × 2
#>   path           value                            
#>   <chr>          <chr>                            
#> 1 /discards/1000 Record does not exist            
#> 2 /discards/1004 Queue limit exceeded             
#> 3 /discards/1010 Discarding timed-out partial msg 
#> 4 /warnings/0    Phone number missing country code
#> 5 /warnings/1    State code missing               
#> 6 /warnings/2    Zip code missing                 

## JSONpath
j_find_keys(json, "discards", as = "tibble", path_type = "JSONpath")
#> # A tibble: 3 × 2
#>   path                  value                           
#>   <chr>                 <chr>                           
#> 1 $['discards']['1000'] Record does not exist           
#> 2 $['discards']['1004'] Queue limit exceeded            
#> 3 $['discards']['1010'] Discarding timed-out partial msg

j_find_keys_grep(json, "discard", as = "tibble")
#> # A tibble: 3 × 2
#>   path           value                           
#>   <chr>          <chr>                           
#> 1 /discards/1000 Record does not exist           
#> 2 /discards/1004 Queue limit exceeded            
#> 3 /discards/1010 Discarding timed-out partial msg
j_find_keys_grep(json, "1", as = "tibble")
#> # A tibble: 4 × 2
#>   path           value                           
#>   <chr>          <chr>                           
#> 1 /discards/1000 Record does not exist           
#> 2 /discards/1004 Queue limit exceeded            
#> 3 /discards/1010 Discarding timed-out partial msg
#> 4 /warnings/1    State code missing              
j_find_keys_grep(json, "car.*/101", as = "tibble")
#> # A tibble: 1 × 2
#>   path           value                           
#>   <chr>          <chr>                           
#> 1 /discards/1010 Discarding timed-out partial msg

## JSONpath
j_find_keys_grep(json, "car.*\\['101", as = "tibble", path_type = "JSONpath")
#> # A tibble: 1 × 2
#>   path                  value                           
#>   <chr>                 <chr>                           
#> 1 $['discards']['1010'] Discarding timed-out partial msg

## NDJSON

ndjson_file <-
    system.file(package = "rjsoncons", "extdata", "example.ndjson")
j_flatten(ndjson_file) |>
    noquote()
#> [1] {"/name":"Seattle","/state":"WA"}  {"/name":"New York","/state":"NY"}
#> [3] {"/name":"Bellevue","/state":"WA"} {"/name":"Olympia","/state":"WA"} 
j_find_values_grep(ndjson_file, "e") |>
    str()
#> List of 4
#>  $ :List of 1
#>   ..$ /name: chr "Seattle"
#>  $ :List of 1
#>   ..$ /name: chr "New York"
#>  $ :List of 1
#>   ..$ /name: chr "Bellevue"
#>  $ : Named list()