FACETS is a character vector of common fields used
to subset cellxgene data.
facets() is used to query the cellxgene database for
current values of one or all facets.
facets_filter() provides a convenient way to filter
facets based on label or ontology term.
Arguments
- cellxgene_db
an (optional) cellxgene_db object, as returned by db().
- facets
a character() vector corresponding to one of the facets in FACETS.
- facet
the column containing faceted information, e.g., sex in datasets(db).
- key
character(1) identifying whether value is a label or an ontology_term_id.
- value
character() value of the label or ontology term to filter on. The value may be a vector with length(value) > 0 for exact matches (exact = TRUE, the default), or a character(1) regular expression.
- exact
logical(1) whether values match exactly (the default, TRUE) or as a regular expression (FALSE).
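The matching rules for key, value, and exact can be illustrated with a small base-R sketch. match_facet() and the toy sex column are hypothetical. This is not the cellxgenedp implementation. It assumes that each element of a facet column is a list of records with label and ontology_term_id fields, and that a dataset matches when any of its records matches.

```r
## Hypothetical sketch of facets_filter()-style matching; NOT the
## cellxgenedp implementation. Assumes each element of `facet` is a list
## of records, each with `label` and `ontology_term_id` fields.
match_facet <- function(facet, key = c("label", "ontology_term_id"),
                        value, exact = TRUE) {
    key <- match.arg(key)
    stopifnot(
        is.character(value), length(value) > 0L,
        is.logical(exact), length(exact) == 1L, !is.na(exact),
        exact || length(value) == 1L  # a regular expression must be character(1)
    )
    ## one logical per dataset: TRUE if any record's `key` field matches
    vapply(facet, function(records) {
        keys <- vapply(records, `[[`, character(1), key)
        if (exact) any(keys %in% value) else any(grepl(value, keys))
    }, logical(1))
}

## Toy facet column: three datasets with one or more sex annotations
sex <- list(
    list(list(label = "female", ontology_term_id = "PATO:0000383")),
    list(list(label = "male", ontology_term_id = "PATO:0000384")),
    list(
        list(label = "female", ontology_term_id = "PATO:0000383"),
        list(label = "male", ontology_term_id = "PATO:0000384")
    )
)

match_facet(sex, "label", "female")                   # exact label match
match_facet(sex, "ontology_term_id", "PATO:0000384")  # match by ontology term
match_facet(sex, "label", "^fe", exact = FALSE)       # regular expression
```

Requiring a character(1) value when exact = FALSE mirrors the argument description above. grepl() given a longer pattern would silently use only its first element, with a warning.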
Value
facets() returns a tibble with columns facet, label,
ontology_term_id, and n, the number of times the facet
label is used in the database.
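Because facets() reports both label and ontology_term_id, its result can serve as a lookup table before filtering on ontology terms. In the sketch below, a small hand-built data.frame stands in for facets() output. Its rows are copied from the organism example, and nothing is queried from the database.

```r
## Stand-in for facets() output (rows copied from the organism example);
## columns: facet, label, ontology_term_id, n
f <- data.frame(
    facet = "organism",
    label = c("Homo sapiens", "Mus musculus", "Danio rerio"),
    ontology_term_id = c("NCBITaxon:9606", "NCBITaxon:10090", "NCBITaxon:7955"),
    n = c(1598L, 421L, 32L),
    stringsAsFactors = FALSE
)

## look up the ontology term for a label of interest
term <- f$ontology_term_id[f$facet == "organism" & f$label == "Mus musculus"]
print(term)

## most frequently used label for this facet
print(f$label[which.max(f$n)])
```

The resulting term could then be supplied as value with key = "ontology_term_id", e.g., facets_filter(organism, "ontology_term_id", term) inside dplyr::filter() on datasets(db).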
facets_filter() returns a logical vector with one element per element of facet (i.e., one per row of the table the column comes from). TRUE indicates that at least one of the dataset's key values matches value.
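facets_filter() returns one logical value per dataset. Results of several calls can therefore be combined element-wise with & and negated with !. The examples rely on this: dplyr::filter() combines comma-separated conditions with &. The base-R sketch below uses hypothetical toy data and a simplified hypothetical helper, has_label(). The helper stores each dataset's labels as a plain character vector and mimics only exact label matching.

```r
## Hypothetical toy data: per-dataset labels for two facets
dataset_id <- c("a", "b", "c")
sex <- list("female", "male", c("female", "male"))
ethnicity <- list("African American", "African American", "unknown")

## simplified stand-in: TRUE for each dataset with any label in `value`
has_label <- function(facet, value) {
    vapply(facet, function(labels) any(labels %in% value), logical(1))
}

## one logical per dataset, so conditions combine element-wise
keep <- has_label(ethnicity, "African American") & has_label(sex, "female")
print(dataset_id[keep])

## negation keeps datasets with none of the listed labels
print(dataset_id[!has_label(ethnicity, c("na", "unknown"))])
```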
Examples
f <- facets()
## levels of each facet
f |>
    dplyr::count(facet)
#> # A tibble: 8 × 2
#>   facet                       n
#>   <chr>                   <int>
#> 1 assay                      50
#> 2 cell_type                1122
#> 3 development_stage         292
#> 4 disease                   311
#> 5 organism                    9
#> 6 self_reported_ethnicity    44
#> 7 sex                         4
#> 8 tissue                    707
## same as facets(, facets = "organism")
f |>
    dplyr::filter(facet == "organism")
#> # A tibble: 9 × 4
#>   facet    label                 ontology_term_id     n
#>   <chr>    <chr>                 <chr>            <int>
#> 1 organism Homo sapiens          NCBITaxon:9606    1598
#> 2 organism Mus musculus          NCBITaxon:10090    421
#> 3 organism Danio rerio           NCBITaxon:7955      32
#> 4 organism Callithrix jacchus    NCBITaxon:9483      27
#> 5 organism Microcebus murinus    NCBITaxon:30608     27
#> 6 organism Macaca mulatta        NCBITaxon:9544      19
#> 7 organism Pan troglodytes       NCBITaxon:9598       1
#> 8 organism Sus scrofa            NCBITaxon:9823       1
#> 9 organism Sus scrofa domesticus NCBITaxon:9825       1
db <- db()
ds <- datasets(db)
## datasets with African American females
ds |>
    dplyr::filter(
        facets_filter(self_reported_ethnicity, "label", "African American"),
        facets_filter(sex, "label", "female")
    )
#> # A tibble: 198 × 36
#> dataset_id dataset_version_id collection_id donor_id assay batch_condition
#> <chr> <chr> <chr> <list> <list> <list>
#> 1 d68a8b48-ab… 83bbeaaf-f5e0-42a… 3a5dbf8a-9b3… <chr> <list> <lgl [1]>
#> 2 0b4a15a7-4e… dad69d7e-fb52-46c… a98b828a-622… <chr> <list> <lgl [1]>
#> 3 fe2eecbc-97… 183c4a39-3a63-4d9… 7c4552fd-8a6… <chr> <list> <chr [1]>
#> 4 82346769-87… 9a6c52d0-4d3f-48c… 7c4552fd-8a6… <chr> <list> <chr [1]>
#> 5 38491c97-ce… 7778e9dd-6c23-4f0… 7c4552fd-8a6… <chr> <list> <chr [1]>
#> 6 15e0ec3e-a0… eb7f7869-eb13-4b2… 7c4552fd-8a6… <chr> <list> <chr [1]>
#> 7 0a2d7e87-c3… f298459d-ac2a-405… 7c4552fd-8a6… <chr> <list> <chr [1]>
#> 8 9813a1d4-d1… 9b87c411-2e17-4de… b953c942-f5d… <chr> <list> <lgl [1]>
#> 9 85c60876-7f… 5015051f-abc4-403… b953c942-f5d… <chr> <list> <lgl [1]>
#> 10 e47c65a8-7d… e9041e47-b470-4c4… 4195ab4c-20b… <chr> <list> <lgl [1]>
#> # ℹ 188 more rows
#> # ℹ 30 more variables: cell_count <int>, cell_type <list>, citation <chr>,
#> # default_embedding <chr>, development_stage <list>, disease <list>,
#> # embeddings <list>, explorer_url <chr>, feature_biotype <list>,
#> # feature_count <int>, feature_reference <list>,
#> # genetic_perturbation_strategy <lgl>, is_pre_analysis <lgl>,
#> # is_primary_data <list>, mean_genes_per_cell <dbl>, organism <list>, …
## datasets with non-European, known ethnicity
facets(db, "self_reported_ethnicity")
#> # A tibble: 44 × 4
#>    facet                   label                          ontology_term_id     n
#>    <chr>                   <chr>                          <chr>            <int>
#>  1 self_reported_ethnicity unknown                        unknown           1437
#>  2 self_reported_ethnicity na                             na                 534
#>  3 self_reported_ethnicity Asian                          HANCESTRO:0847     296
#>  4 self_reported_ethnicity African American               HANCESTRO:0568     209
#>  5 self_reported_ethnicity European American              HANCESTRO:0590     182
#>  6 self_reported_ethnicity Hispanic or Latin              HANCESTRO:0612     177
#>  7 self_reported_ethnicity British                        HANCESTRO:0462      63
#>  8 self_reported_ethnicity Hispanic or Latin || Native A… HANCESTRO:0612 …    50
#>  9 self_reported_ethnicity South Asian                    HANCESTRO:0848      39
#> 10 self_reported_ethnicity Middle Eastern                 HANCESTRO:0852      29
#> # ℹ 34 more rows
ds |>
    dplyr::filter(
        !facets_filter(
            self_reported_ethnicity, "label", c("European", "na", "unknown")
        )
    )
#> # A tibble: 156 × 36
#> dataset_id dataset_version_id collection_id donor_id assay batch_condition
#> <chr> <chr> <chr> <list> <list> <list>
#> 1 ed419b4e-db… c8da6eeb-84d7-437… af893e86-8e9… <chr> <list> <lgl [1]>
#> 2 aad97cb5-f3… 9db639c3-5c9c-4b8… af893e86-8e9… <chr> <list> <lgl [1]>
#> 3 8f10185b-e0… 45e411d4-c103-4c2… af893e86-8e9… <chr> <list> <lgl [1]>
#> 4 359f7af4-87… e3c7aa91-5edd-416… af893e86-8e9… <chr> <list> <lgl [1]>
#> 5 11ef37ee-21… 5903aa1b-c323-4ae… af893e86-8e9… <chr> <list> <lgl [1]>
#> 6 0129dbd9-a7… 7f2413c4-38c2-455… af893e86-8e9… <chr> <list> <lgl [1]>
#> 7 00e5dedd-b9… 3d0dcefd-cdf2-4b1… af893e86-8e9… <chr> <list> <lgl [1]>
#> 8 d68a8b48-ab… 83bbeaaf-f5e0-42a… 3a5dbf8a-9b3… <chr> <list> <lgl [1]>
#> 9 92594117-6e… a1e971da-5c35-46f… ad10cef8-9c6… <chr> <list> <lgl [1]>
#> 10 ca140407-ef… 2ab4df39-876d-4fd… 0540ee09-5b4… <chr> <list> <lgl [1]>
#> # ℹ 146 more rows
#> # ℹ 30 more variables: cell_count <int>, cell_type <list>, citation <chr>,
#> # default_embedding <chr>, development_stage <list>, disease <list>,
#> # embeddings <list>, explorer_url <chr>, feature_biotype <list>,
#> # feature_count <int>, feature_reference <list>,
#> # genetic_perturbation_strategy <lgl>, is_pre_analysis <lgl>,
#> # is_primary_data <list>, mean_genes_per_cell <dbl>, organism <list>, …