Facets available for querying cellxgene data

FACETS is a character vector of common fields used to subset cellxgene data.

facets() is used to query the cellxgene database for current values of one or all facets.

facets_filter() provides a convenient way to filter facets based on label or ontology term.

Usage

FACETS

facets(cellxgene_db = db(), facets = FACETS)

facets_filter(facet, key = c("label", "ontology_term_id"), value, exact = TRUE)

Format

FACETS is an object of class character of length 8.

Arguments

cellxgene_db: an (optional) cellxgene_db object, as returned by db().
facets: a character() vector corresponding to one or more of the facets in FACETS.
facet: the column containing faceted information, e.g., sex in datasets(db).
key: character(1) identifying whether value is a label or ontology_term_id.
value: character() label or ontology term to filter on. For exact matches (exact = TRUE, the default), value may be a vector with length(value) > 0; when exact = FALSE, value is a character(1) regular expression.
exact: logical(1) whether values match exactly (default, TRUE) or as a regular expression (FALSE).

Value

facets() returns a tibble with columns facet, label, ontology_term_id, and n, the number of times the facet label is used in the database.

facets_filter() returns a logical vector with one element per row of facet. TRUE indicates that the row contains an entry whose key (label or ontology term) matches value.
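
The matching rule described for facets_filter() can be illustrated without a database connection. The following is a minimal sketch, not the package's implementation: facets_filter_sketch() is a hypothetical helper, and the toy sex column assumes that each row of a faceted column is a list of records with label and ontology_term_id fields.

```r
## Toy stand-in for a faceted list column (assumed structure): one element
## per dataset, each element a list of label / ontology_term_id records.
sex <- list(
    list(list(label = "female", ontology_term_id = "PATO:0000383")),
    list(
        list(label = "female", ontology_term_id = "PATO:0000383"),
        list(label = "male", ontology_term_id = "PATO:0000384")
    ),
    list(list(label = "unknown", ontology_term_id = "unknown"))
)

## Hypothetical re-statement of the documented behaviour; this is NOT the
## function exported by the package.
facets_filter_sketch <- function(
    facet, key = c("label", "ontology_term_id"), value, exact = TRUE
) {
    key <- match.arg(key)
    stopifnot(
        is.character(value), length(value) > 0L,
        is.logical(exact), length(exact) == 1L, !is.na(exact),
        ## a regular expression must be a single string
        exact || length(value) == 1L
    )
    vapply(facet, function(records) {
        ## values of the chosen key for every record in this row
        observed <- vapply(
            records, function(record) record[[key]], character(1)
        )
        if (exact) {
            any(observed %in% value)
        } else {
            any(grepl(value, observed))
        }
    }, logical(1))
}

by_label <- facets_filter_sketch(sex, "label", "female")
by_term <- facets_filter_sketch(sex, "ontology_term_id", "PATO:0000384")
by_regex <- facets_filter_sketch(sex, "label", "^(fe)?male$", exact = FALSE)
```

Each result has one element per row of the toy column, so it can be used directly inside dplyr::filter(). Under the same assumption, the analogous calls to the real function would be facets_filter(sex, "ontology_term_id", "PATO:0000384") and facets_filter(sex, "label", "^(fe)?male$", exact = FALSE).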

Examples

f <- facets()

## levels of each facet
f |>
    dplyr::count(facet)
#> # A tibble: 8 × 2
#>   facet                       n
#>   <chr>                   <int>
#> 1 assay                      45
#> 2 cell_type                1018
#> 3 development_stage         277
#> 4 disease                   169
#> 5 organism                    7
#> 6 self_reported_ethnicity    40
#> 7 sex                         3
#> 8 tissue                    602

## same as facets(, facets = "organism")
f |>
    dplyr::filter(facet == "organism")
#> # A tibble: 7 × 4
#>   facet    label                 ontology_term_id     n
#>   <chr>    <chr>                 <chr>            <int>
#> 1 organism Homo sapiens          NCBITaxon:9606    1365
#> 2 organism Mus musculus          NCBITaxon:10090    386
#> 3 organism Danio rerio           NCBITaxon:7955      32
#> 4 organism Callithrix jacchus    NCBITaxon:9483      27
#> 5 organism Macaca mulatta        NCBITaxon:9544      19
#> 6 organism Pan troglodytes       NCBITaxon:9598       1
#> 7 organism Sus scrofa domesticus NCBITaxon:9825       1

db <- db()
ds <- datasets(db)

## datasets containing both 'African American' and 'female' labels
## (dataset-level match; not necessarily the same donor)
ds |>
    dplyr::filter(
        facets_filter(self_reported_ethnicity, "label", "African American"),
        facets_filter(sex, "label", "female")
    )
#> # A tibble: 115 × 33
#>    dataset_id   dataset_version_id collection_id donor_id assay  batch_condition
#>    <chr>        <chr>              <chr>         <list>   <list> <list>         
#>  1 e763ed0d-0e… 5b916cc5-f01f-42c… b9fc3d70-5a7… <chr>    <list> <lgl [1]>      
#>  2 db0752b9-f2… 67a20819-b7a9-4a0… b9fc3d70-5a7… <chr>    <list> <lgl [1]>      
#>  3 d9b4bc69-ed… 7fde015a-c913-4ae… b9fc3d70-5a7… <chr>    <list> <lgl [1]>      
#>  4 bc2a7b3d-f0… c78a352d-38e5-4b0… b9fc3d70-5a7… <chr>    <list> <lgl [1]>      
#>  5 96a3f64b-0e… 9f073903-c5be-4f6… b9fc3d70-5a7… <chr>    <list> <lgl [1]>      
#>  6 59b69042-47… a06a1d9e-b1e8-452… b9fc3d70-5a7… <chr>    <list> <lgl [1]>      
#>  7 aa6f371d-da… fc5b564e-cdc7-421… a96133de-e95… <chr>    <list> <lgl [1]>      
#>  8 494faa16-c4… e95b40d7-eda7-4f7… a96133de-e95… <chr>    <list> <lgl [1]>      
#>  9 34f5307e-7b… 4257cfff-39f5-453… a96133de-e95… <chr>    <list> <lgl [1]>      
#> 10 32b9bdce-24… c304221a-ff6d-490… bcb61471-2a4… <chr>    <list> <lgl [1]>      
#> # ℹ 105 more rows
#> # ℹ 27 more variables: cell_count <int>, cell_type <list>, citation <chr>,
#> #   default_embedding <chr>, development_stage <list>, disease <list>,
#> #   embeddings <list>, explorer_url <chr>, feature_biotype <list>,
#> #   feature_count <int>, feature_reference <list>, is_primary_data <list>,
#> #   mean_genes_per_cell <dbl>, organism <list>, primary_cell_count <int>,
#> #   raw_data_location <chr>, schema_version <chr>, …

## datasets with no 'European', 'na', or 'unknown' ethnicity labels
facets(db, "self_reported_ethnicity")
#> # A tibble: 40 × 4
#>    facet                   label                          ontology_term_id     n
#>    <chr>                   <chr>                          <chr>            <int>
#>  1 self_reported_ethnicity unknown                        unknown            761
#>  2 self_reported_ethnicity European                       HANCESTRO:0005     707
#>  3 self_reported_ethnicity na                             na                 466
#>  4 self_reported_ethnicity Asian                          HANCESTRO:0008     227
#>  5 self_reported_ethnicity African American               HANCESTRO:0568     124
#>  6 self_reported_ethnicity Hispanic or Latin American     HANCESTRO:0014     124
#>  7 self_reported_ethnicity Native American || Hispanic o… HANCESTRO:0013 …    50
#>  8 self_reported_ethnicity African American or Afro-Cari… HANCESTRO:0016      43
#>  9 self_reported_ethnicity South Asian                    HANCESTRO:0006      31
#> 10 self_reported_ethnicity Greater Middle Eastern  (Midd… HANCESTRO:0015      29
#> # ℹ 30 more rows
ds |>
    dplyr::filter(
        !facets_filter(
            self_reported_ethnicity, "label", c("European", "na", "unknown")
        )
    )
#> # A tibble: 60 × 33
#>    dataset_id   dataset_version_id collection_id donor_id assay  batch_condition
#>    <chr>        <chr>              <chr>         <list>   <list> <list>         
#>  1 2adb1f8a-a6… 5a8e30a8-685c-4e6… 38833785-fac… <chr>    <list> <lgl [1]>      
#>  2 e2824739-ea… 8f0ddb72-fbb9-4cd… a96133de-e95… <chr>    <list> <lgl [1]>      
#>  3 aa6f371d-da… fc5b564e-cdc7-421… a96133de-e95… <chr>    <list> <lgl [1]>      
#>  4 59d14a35-0c… 373e6630-63bd-482… a96133de-e95… <chr>    <list> <lgl [1]>      
#>  5 494faa16-c4… e95b40d7-eda7-4f7… a96133de-e95… <chr>    <list> <lgl [1]>      
#>  6 44941fdb-8a… 5ce097d7-1110-412… a96133de-e95… <chr>    <list> <lgl [1]>      
#>  7 2dd73feb-05… 0d4231cc-5dc1-4a9… a96133de-e95… <chr>    <list> <lgl [1]>      
#>  8 ff995299-bf… 1d68b8ec-fc42-4e4… ba84c7ba-8d8… <chr>    <list> <lgl [1]>      
#>  9 99efb290-fb… 5be5fc91-90a1-473… ba84c7ba-8d8… <chr>    <list> <lgl [1]>      
#> 10 821a8e3e-87… 1c5acd5a-a0ef-4b8… ba84c7ba-8d8… <chr>    <list> <lgl [1]>      
#> # ℹ 50 more rows
#> # ℹ 27 more variables: cell_count <int>, cell_type <list>, citation <chr>,
#> #   default_embedding <chr>, development_stage <list>, disease <list>,
#> #   embeddings <list>, explorer_url <chr>, feature_biotype <list>,
#> #   feature_count <int>, feature_reference <list>, is_primary_data <list>,
#> #   mean_genes_per_cell <dbl>, organism <list>, primary_cell_count <int>,
#> #   raw_data_location <chr>, schema_version <chr>, …