FACETS is a character vector of common fields used to subset cellxgene data.
facets() is used to query the cellxgene database for current values of one or all facets.
facets_filter() provides a convenient way to filter facets based on label or ontology term.
Arguments
- cellxgene_db
an (optional) cellxgene_db object, as returned by db().
- facets
a character() vector corresponding to one of the facets in FACETS.
- facet
the column containing faceted information, e.g., sex in datasets(db).
- key
character(1) identifying whether value is a label or ontology_term_id.
- value
character() value of the label or ontology term to filter on. The value may be a vector with length(value) > 0 for exact matches (exact = TRUE, default), or a character(1) regular expression.
- exact
logical(1) whether values match exactly (default, TRUE) or as a regular expression (FALSE).
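The difference between exact and regular-expression matching can be illustrated with a small base R sketch. It is not the package's implementation: toy_facets_filter() is a hypothetical helper, the data are made up, and two things are assumed for illustration only: that each element of a facet column is a list of records with label and ontology_term_id fields, and that a dataset matches when any of its records matches.

```r
## Made-up stand-in for a facet column such as `sex` in datasets(db):
## one element per dataset, each a list of records (assumed layout).
toy_sex <- list(
    list(list(label = "female", ontology_term_id = "PATO:0000383")),
    list(
        list(label = "female", ontology_term_id = "PATO:0000383"),
        list(label = "male", ontology_term_id = "PATO:0000384")
    ),
    list(list(label = "unknown", ontology_term_id = "unknown"))
)

## Hypothetical helper mimicking the documented arguments; NOT the
## cellxgenedp implementation.
toy_facets_filter <- function(facet, key, value, exact = TRUE) {
    vapply(facet, function(records) {
        ## pull the requested field out of every record of this dataset
        values <- vapply(records, function(r) r[[key]], character(1))
        if (exact) {
            ## exact: `value` may hold several alternatives
            any(values %in% value)
        } else {
            ## regular expression: `value` is a single pattern
            any(grepl(value, values))
        }
    }, logical(1))
}

## exact match against a vector of labels
toy_exact <- toy_facets_filter(toy_sex, "label", c("male", "unknown"))
toy_exact
#> [1] FALSE  TRUE  TRUE

## the pattern "male$" also matches "female"
toy_regex <- toy_facets_filter(toy_sex, "label", "male$", exact = FALSE)
toy_regex
#> [1]  TRUE  TRUE FALSE
```

The second call shows why an unanchored pattern can select more than intended; anchoring it as "^male$" would restrict the match to the label male in this toy data.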
Value
facets() returns a tibble with columns facet, label, ontology_term_id, and n, the number of times the facet label is used in the database.
facets_filter() returns a logical vector with length equal to the length (number of rows) of facet, with TRUE indicating that the value of key is present in the dataset.
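Because the result of facets_filter() has one entry per row, it behaves like any other logical row index: it can be negated with ! and combined with &. The sketch below uses a made-up table and hand-written logical vectors in place of real facets_filter() results, so every name and value in it is illustrative only.

```r
## Made-up table standing in for datasets(db); ids and counts are invented.
toy_ds <- data.frame(
    dataset_id = c("a", "b", "c", "d"),
    cell_count = c(100L, 250L, 75L, 900L),
    stringsAsFactors = FALSE
)

## Pretend these came from two facets_filter() calls: one entry per row.
is_female <- c(TRUE, TRUE, FALSE, TRUE)
is_european <- c(TRUE, FALSE, FALSE, TRUE)

## Combine with `&`, negate with `!`, then subset rows.
keep <- is_female & !is_european
toy_subset <- toy_ds[keep, , drop = FALSE]
toy_subset$dataset_id
#> [1] "b"
```

Passing several conditions to dplyr::filter() separated by commas combines them in the same way as & does here.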
Examples
f <- facets()
## levels of each facet
f |>
    dplyr::count(facet)
#> # A tibble: 8 × 2
#> facet n
#> <chr> <int>
#> 1 assay 40
#> 2 cell_type 910
#> 3 development_stage 249
#> 4 disease 132
#> 5 organism 7
#> 6 self_reported_ethnicity 34
#> 7 sex 3
#> 8 tissue 428
## same as facets(, facets = "organism")
f |>
    dplyr::filter(facet == "organism")
#> # A tibble: 7 × 4
#> facet label ontology_term_id n
#> <chr> <chr> <chr> <int>
#> 1 organism Homo sapiens NCBITaxon:9606 1088
#> 2 organism Mus musculus NCBITaxon:10090 364
#> 3 organism Callithrix jacchus NCBITaxon:9483 28
#> 4 organism Macaca mulatta NCBITaxon:9544 19
#> 5 organism Sus scrofa domesticus NCBITaxon:9825 3
#> 6 organism Pan troglodytes NCBITaxon:9598 2
#> 7 organism Gorilla gorilla NCBITaxon:9593 1
db <- db()
ds <- datasets(db)
## datasets with African American females
ds |>
    dplyr::filter(
        facets_filter(self_reported_ethnicity, "label", "African American"),
        facets_filter(sex, "label", "female")
    )
#> # A tibble: 58 × 33
#> dataset_id dataset_version_id collection_id donor_id assay batch_condition
#> <chr> <chr> <chr> <list> <list> <list>
#> 1 01ad3cd7-39… 02a1eee1-e290-47d… 7d7cabfd-1d1… <chr> <list> <lgl [1]>
#> 2 de985818-28… f72aae6e-c997-484… c9706a92-0e5… <chr> <list> <lgl [1]>
#> 3 bab7432a-5c… 02a8ff13-a08b-461… 72d37bc9-76c… <chr> <list> <chr [2]>
#> 4 f64e1be1-de… c40911a4-47de-460… 62e8f058-9c3… <chr> <list> <lgl [1]>
#> 5 e9175006-89… db7b4a79-1d96-4aa… 62e8f058-9c3… <chr> <list> <lgl [1]>
#> 6 d4cfefa0-3a… 8f7fa4d2-0bbf-41e… 62e8f058-9c3… <chr> <list> <lgl [1]>
#> 7 d224c8e0-c2… b7d4db11-bca1-4bc… 62e8f058-9c3… <chr> <list> <lgl [1]>
#> 8 a6858c10-c5… 7e57a225-c979-4fa… 62e8f058-9c3… <chr> <list> <lgl [1]>
#> 9 576f193c-75… 1ba7d495-c1a8-480… 62e8f058-9c3… <chr> <list> <lgl [1]>
#> 10 486486d4-94… 090ba5ce-5c7f-473… 62e8f058-9c3… <chr> <list> <lgl [1]>
#> # ℹ 48 more rows
#> # ℹ 27 more variables: cell_count <int>, cell_type <list>, citation <chr>,
#> # default_embedding <chr>, development_stage <list>, disease <list>,
#> # embeddings <list>, explorer_url <chr>, feature_biotype <list>,
#> # feature_count <int>, feature_reference <list>, is_primary_data <list>,
#> # mean_genes_per_cell <dbl>, organism <list>, primary_cell_count <int>,
#> # raw_data_location <chr>, schema_version <chr>, …
## datasets with non-European, known ethnicity
facets(db, "self_reported_ethnicity")
#> # A tibble: 34 × 4
#> facet label ontology_term_id n
#> <chr> <chr> <chr> <int>
#> 1 self_reported_ethnicity European HANCESTRO:0005 588
#> 2 self_reported_ethnicity unknown unknown 563
#> 3 self_reported_ethnicity na na 408
#> 4 self_reported_ethnicity Asian HANCESTRO:0008 153
#> 5 self_reported_ethnicity African American HANCESTRO:0568 67
#> 6 self_reported_ethnicity Hispanic or Latin American HANCESTRO:0014 67
#> 7 self_reported_ethnicity Native American,Hispanic or L… HANCESTRO:0013,… 50
#> 8 self_reported_ethnicity African American or Afro-Cari… HANCESTRO:0016 32
#> 9 self_reported_ethnicity Greater Middle Eastern (Midd… HANCESTRO:0015 23
#> 10 self_reported_ethnicity African HANCESTRO:0010 20
#> # ℹ 24 more rows
ds |>
    dplyr::filter(
        !facets_filter(
            self_reported_ethnicity, "label", c("European", "na", "unknown")
        )
    )
#> # A tibble: 31 × 33
#> dataset_id dataset_version_id collection_id donor_id assay batch_condition
#> <chr> <chr> <chr> <list> <list> <list>
#> 1 cfa3c355-ee… 4dc06a70-6d39-4da… 9c8808ce-113… <chr> <list> <lgl [1]>
#> 2 a9c5aecf-3b… 579db439-a9dc-4fc… 3116d060-0a8… <chr> <list> <lgl [1]>
#> 3 6d4b3d09-f1… 431185b3-45d0-4f9… 3116d060-0a8… <chr> <list> <lgl [1]>
#> 4 1368fad2-91… 724f2fae-92cb-4ed… 3116d060-0a8… <chr> <list> <lgl [1]>
#> 5 e6a11140-25… dbcbe0a6-918a-444… e5f58829-1a6… <chr> <list> <lgl [1]>
#> 6 6ec405bb-47… eaf5be60-06d9-45e… e5f58829-1a6… <chr> <list> <lgl [1]>
#> 7 2ba40233-85… 541ef4e5-8142-496… e5f58829-1a6… <chr> <list> <lgl [1]>
#> 8 2423ce2c-31… 5ee25df3-0ff9-437… e5f58829-1a6… <chr> <list> <lgl [1]>
#> 9 2adb1f8a-a6… 7a455e3b-dd79-499… 38833785-fac… <chr> <list> <lgl [1]>
#> 10 a9bedd04-51… 33bfb460-b474-4b1… 6686ada5-43a… <chr> <list> <lgl [1]>
#> # ℹ 21 more rows
#> # ℹ 27 more variables: cell_count <int>, cell_type <list>, citation <chr>,
#> # default_embedding <chr>, development_stage <list>, disease <list>,
#> # embeddings <list>, explorer_url <chr>, feature_biotype <list>,
#> # feature_count <int>, feature_reference <list>, is_primary_data <list>,
#> # mean_genes_per_cell <dbl>, organism <list>, primary_cell_count <int>,
#> # raw_data_location <chr>, schema_version <chr>, …