FACETS is a character vector of common fields used to subset cellxgene data.
facets() queries the cellxgene database for current values of one or all facets.
facets_filter() provides a convenient way to filter facets based on label or ontology term.
Arguments
- cellxgene_db
  an (optional) cellxgene_db object, as returned by db().
- facets
  a character() vector corresponding to one of the facets in FACETS.
- facet
  the column containing faceted information, e.g., sex in datasets(db).
- key
  character(1) identifying whether value is a label or ontology_term_id.
- value
  character() value of the label or ontology term to filter on. The value may be a vector with length(value) > 0 for exact matches (exact = TRUE, default), or a character(1) regular expression.
- exact
  logical(1) whether values match exactly (default, TRUE) or as a regular expression (FALSE).
Value
facets() returns a tibble with columns facet, label, ontology_term_id, and n, the number of times the facet label is used in the database.
facets_filter() returns a logical vector with length equal to the length (number of rows) of facet, with TRUE indicating that the value of key is present in the dataset.
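The matching rules described for key, value, and exact can be illustrated with a small, self-contained sketch in base R. This is not the package implementation: toy_facet, its term ids, and the helper facets_filter_sketch() are all hypothetical. The sketch assumes that a facet column holds, for each dataset, a list of label / ontology_term_id pairs.

```r
## Toy stand-in for a facet column such as sex in datasets(db): one element
## per dataset, each holding one or more label / ontology_term_id pairs.
## The structure and the term ids are assumptions made up for illustration.
toy_facet <- list(
    list(list(label = "female", ontology_term_id = "TERM:1")),
    list(
        list(label = "female", ontology_term_id = "TERM:1"),
        list(label = "male", ontology_term_id = "TERM:2")
    ),
    list(list(label = "unknown", ontology_term_id = "unknown"))
)

## Hypothetical helper (NOT the package code): return one logical value per
## dataset, TRUE when any of its terms matches `value` on the chosen `key`.
facets_filter_sketch <- function(facet, key, value, exact = TRUE) {
    vapply(facet, function(terms) {
        ## collect the requested field from every term of this dataset
        found <- vapply(terms, function(term) term[[key]], character(1))
        if (exact) {
            any(found %in% value)     # `value` may have length > 1
        } else {
            any(grepl(value, found))  # `value` is one regular expression
        }
    }, logical(1))
}

## exact match on a single label: FALSE TRUE FALSE
facets_filter_sketch(toy_facet, "label", "male")

## exact match on several labels: FALSE TRUE TRUE
facets_filter_sketch(toy_facet, "label", c("male", "unknown"))

## regular expression; "female" also ends in "male": TRUE TRUE FALSE
facets_filter_sketch(toy_facet, "label", "male$", exact = FALSE)

## match on the ontology term id instead of the label: TRUE TRUE FALSE
facets_filter_sketch(toy_facet, "ontology_term_id", "TERM:1")
```

The third call shows why exact = TRUE is the safer default for short labels: an unanchored or loosely anchored regular expression can match more labels than intended.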
Examples
f <- facets()
## levels of each facet
f |>
    dplyr::count(facet)
#> # A tibble: 8 × 2
#> facet n
#> <chr> <int>
#> 1 assay 45
#> 2 cell_type 1018
#> 3 development_stage 277
#> 4 disease 169
#> 5 organism 7
#> 6 self_reported_ethnicity 40
#> 7 sex 3
#> 8 tissue 602
## same as facets(, facets = "organism")
f |>
    dplyr::filter(facet == "organism")
#> # A tibble: 7 × 4
#> facet label ontology_term_id n
#> <chr> <chr> <chr> <int>
#> 1 organism Homo sapiens NCBITaxon:9606 1365
#> 2 organism Mus musculus NCBITaxon:10090 386
#> 3 organism Danio rerio NCBITaxon:7955 32
#> 4 organism Callithrix jacchus NCBITaxon:9483 27
#> 5 organism Macaca mulatta NCBITaxon:9544 19
#> 6 organism Pan troglodytes NCBITaxon:9598 1
#> 7 organism Sus scrofa domesticus NCBITaxon:9825 1
db <- db()
ds <- datasets(db)
## datasets with African American females
ds |>
    dplyr::filter(
        facets_filter(self_reported_ethnicity, "label", "African American"),
        facets_filter(sex, "label", "female")
    )
#> # A tibble: 115 × 33
#> dataset_id dataset_version_id collection_id donor_id assay batch_condition
#> <chr> <chr> <chr> <list> <list> <list>
#> 1 e763ed0d-0e… 5b916cc5-f01f-42c… b9fc3d70-5a7… <chr> <list> <lgl [1]>
#> 2 db0752b9-f2… 67a20819-b7a9-4a0… b9fc3d70-5a7… <chr> <list> <lgl [1]>
#> 3 d9b4bc69-ed… 7fde015a-c913-4ae… b9fc3d70-5a7… <chr> <list> <lgl [1]>
#> 4 bc2a7b3d-f0… c78a352d-38e5-4b0… b9fc3d70-5a7… <chr> <list> <lgl [1]>
#> 5 96a3f64b-0e… 9f073903-c5be-4f6… b9fc3d70-5a7… <chr> <list> <lgl [1]>
#> 6 59b69042-47… a06a1d9e-b1e8-452… b9fc3d70-5a7… <chr> <list> <lgl [1]>
#> 7 aa6f371d-da… fc5b564e-cdc7-421… a96133de-e95… <chr> <list> <lgl [1]>
#> 8 494faa16-c4… e95b40d7-eda7-4f7… a96133de-e95… <chr> <list> <lgl [1]>
#> 9 34f5307e-7b… 4257cfff-39f5-453… a96133de-e95… <chr> <list> <lgl [1]>
#> 10 32b9bdce-24… c304221a-ff6d-490… bcb61471-2a4… <chr> <list> <lgl [1]>
#> # ℹ 105 more rows
#> # ℹ 27 more variables: cell_count <int>, cell_type <list>, citation <chr>,
#> # default_embedding <chr>, development_stage <list>, disease <list>,
#> # embeddings <list>, explorer_url <chr>, feature_biotype <list>,
#> # feature_count <int>, feature_reference <list>, is_primary_data <list>,
#> # mean_genes_per_cell <dbl>, organism <list>, primary_cell_count <int>,
#> # raw_data_location <chr>, schema_version <chr>, …
## datasets with non-European, known ethnicity
facets(db, "self_reported_ethnicity")
#> # A tibble: 40 × 4
#> facet label ontology_term_id n
#> <chr> <chr> <chr> <int>
#> 1 self_reported_ethnicity unknown unknown 761
#> 2 self_reported_ethnicity European HANCESTRO:0005 707
#> 3 self_reported_ethnicity na na 466
#> 4 self_reported_ethnicity Asian HANCESTRO:0008 227
#> 5 self_reported_ethnicity African American HANCESTRO:0568 124
#> 6 self_reported_ethnicity Hispanic or Latin American HANCESTRO:0014 124
#> 7 self_reported_ethnicity Native American || Hispanic o… HANCESTRO:0013 … 50
#> 8 self_reported_ethnicity African American or Afro-Cari… HANCESTRO:0016 43
#> 9 self_reported_ethnicity South Asian HANCESTRO:0006 31
#> 10 self_reported_ethnicity Greater Middle Eastern (Midd… HANCESTRO:0015 29
#> # ℹ 30 more rows
ds |>
    dplyr::filter(
        !facets_filter(
            self_reported_ethnicity, "label", c("European", "na", "unknown")
        )
    )
#> # A tibble: 60 × 33
#> dataset_id dataset_version_id collection_id donor_id assay batch_condition
#> <chr> <chr> <chr> <list> <list> <list>
#> 1 2adb1f8a-a6… 5a8e30a8-685c-4e6… 38833785-fac… <chr> <list> <lgl [1]>
#> 2 e2824739-ea… 8f0ddb72-fbb9-4cd… a96133de-e95… <chr> <list> <lgl [1]>
#> 3 aa6f371d-da… fc5b564e-cdc7-421… a96133de-e95… <chr> <list> <lgl [1]>
#> 4 59d14a35-0c… 373e6630-63bd-482… a96133de-e95… <chr> <list> <lgl [1]>
#> 5 494faa16-c4… e95b40d7-eda7-4f7… a96133de-e95… <chr> <list> <lgl [1]>
#> 6 44941fdb-8a… 5ce097d7-1110-412… a96133de-e95… <chr> <list> <lgl [1]>
#> 7 2dd73feb-05… 0d4231cc-5dc1-4a9… a96133de-e95… <chr> <list> <lgl [1]>
#> 8 ff995299-bf… 1d68b8ec-fc42-4e4… ba84c7ba-8d8… <chr> <list> <lgl [1]>
#> 9 99efb290-fb… 5be5fc91-90a1-473… ba84c7ba-8d8… <chr> <list> <lgl [1]>
#> 10 821a8e3e-87… 1c5acd5a-a0ef-4b8… ba84c7ba-8d8… <chr> <list> <lgl [1]>
#> # ℹ 50 more rows
#> # ℹ 27 more variables: cell_count <int>, cell_type <list>, citation <chr>,
#> # default_embedding <chr>, development_stage <list>, disease <list>,
#> # embeddings <list>, explorer_url <chr>, feature_biotype <list>,
#> # feature_count <int>, feature_reference <list>, is_primary_data <list>,
#> # mean_genes_per_cell <dbl>, organism <list>, primary_cell_count <int>,
#> # raw_data_location <chr>, schema_version <chr>, …