
FACETS is a character vector of common fields used to subset cellxgene data.

facets() is used to query the cellxgene database for current values of one or all facets.

facets_filter() provides a convenient way to filter facets based on label or ontology term.

Usage

FACETS

facets(cellxgene_db = db(), facets = FACETS)

facets_filter(facet, key = c("label", "ontology_term_id"), value, exact = TRUE)

Format

FACETS is an object of class character of length 8.

Arguments

cellxgene_db

an (optional) cellxgene_db object, as returned by db().

facets

a character() vector corresponding to one or more of the facets in FACETS.

facet

the column containing faceted information, e.g., sex in datasets(db).

key

character(1) identifying whether value is a label or ontology_term_id.

value

character() value of the label or ontology term to filter on. The value may be a vector with length(value) > 0 for exact matches (exact = TRUE, default), or a character(1) regular expression (exact = FALSE).

exact

logical(1) indicating whether values are matched exactly (default, TRUE) or as a regular expression (FALSE).

Value

facets() returns a tibble with columns facet, label, ontology_term_id, and n, the number of times the facet label is used in the database.
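The shape of this result can be illustrated without a database connection. The sketch below is only an illustration, not the package's implementation: it assumes a facet is stored as a list with one element per dataset, each holding records with label and ontology_term_id fields, and tallies a hand-made organism column with base R. The toy data and the names organism, records, flat, and tally are hypothetical.

```r
## Toy stand-in for the 'organism' facet of four datasets (assumed layout:
## one list element per dataset, each a list of label / ontology_term_id
## records). The values are made up for illustration.
organism <- list(
    list(list(label = "Homo sapiens", ontology_term_id = "NCBITaxon:9606")),
    list(
        list(label = "Homo sapiens", ontology_term_id = "NCBITaxon:9606"),
        list(label = "Mus musculus", ontology_term_id = "NCBITaxon:10090")
    ),
    list(list(label = "Mus musculus", ontology_term_id = "NCBITaxon:10090")),
    list(list(label = "Homo sapiens", ontology_term_id = "NCBITaxon:9606"))
)

## Flatten to one record per (dataset, term) pair
records <- unlist(organism, recursive = FALSE)
flat <- data.frame(
    facet = "organism",
    label = vapply(records, `[[`, character(1), "label"),
    ontology_term_id = vapply(records, `[[`, character(1), "ontology_term_id"),
    n = 1L
)

## Count how often each label occurs, most frequent first
tally <- aggregate(n ~ facet + label + ontology_term_id, data = flat, FUN = sum)
tally <- tally[order(-tally$n), ]
tally
```

The resulting data frame has the same four columns (facet, label, ontology_term_id, n) as the tibble described above.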

facets_filter() returns a logical vector with length equal to the length (number of rows) of facet, with TRUE indicating that value is present under key in the corresponding dataset.
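The difference between exact and regular-expression matching can be sketched on toy data. The helper toy_facets_filter() below is hypothetical and is not the package's implementation; it only mimics the documented behaviour, assuming that each element of facet is a list of records with label and ontology_term_id fields.

```r
## Toy stand-in for the 'sex' column of three datasets (assumed layout)
sex <- list(
    list(list(label = "female", ontology_term_id = "PATO:0000383")),
    list(
        list(label = "female", ontology_term_id = "PATO:0000383"),
        list(label = "male", ontology_term_id = "PATO:0000384")
    ),
    list(list(label = "unknown", ontology_term_id = "unknown"))
)

## Hypothetical helper mimicking the documented arguments
toy_facets_filter <- function(facet, key = c("label", "ontology_term_id"),
                              value, exact = TRUE) {
    key <- match.arg(key)
    vapply(facet, function(records) {
        ## all values of 'key' recorded for this dataset
        found <- vapply(records, function(record) record[[key]], character(1))
        if (exact) {
            any(found %in% value)    # any of 'value' matches exactly
        } else {
            any(grepl(value, found)) # 'value' is a single regular expression
        }
    }, logical(1))
}

## exact match: only the second dataset has the label "male"
toy_facets_filter(sex, "label", "male")

## regular expression: "male" also matches "female"
toy_facets_filter(sex, "label", "male", exact = FALSE)

## regular expression on ontology terms
toy_facets_filter(sex, "ontology_term_id", "^PATO:", exact = FALSE)
```

With exact = FALSE, an unanchored pattern such as "male" also selects datasets labelled "female"; anchoring the pattern ("^male$") or keeping the default exact = TRUE avoids this.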

Examples

f <- facets()

## levels of each facet
f |>
    dplyr::count(facet)
#> # A tibble: 8 × 2
#>   facet                       n
#>   <chr>                   <int>
#> 1 assay                      40
#> 2 cell_type                 910
#> 3 development_stage         249
#> 4 disease                   132
#> 5 organism                    7
#> 6 self_reported_ethnicity    34
#> 7 sex                         3
#> 8 tissue                    428

## same as facets(, facets = "organism")
f |>
    dplyr::filter(facet == "organism")
#> # A tibble: 7 × 4
#>   facet    label                 ontology_term_id     n
#>   <chr>    <chr>                 <chr>            <int>
#> 1 organism Homo sapiens          NCBITaxon:9606    1088
#> 2 organism Mus musculus          NCBITaxon:10090    364
#> 3 organism Callithrix jacchus    NCBITaxon:9483      28
#> 4 organism Macaca mulatta        NCBITaxon:9544      19
#> 5 organism Sus scrofa domesticus NCBITaxon:9825       3
#> 6 organism Pan troglodytes       NCBITaxon:9598       2
#> 7 organism Gorilla gorilla       NCBITaxon:9593       1

db <- db()
ds <- datasets(db)

## datasets with African American females
ds |>
    dplyr::filter(
        facets_filter(self_reported_ethnicity, "label", "African American"),
        facets_filter(sex, "label", "female")
    )
#> # A tibble: 58 × 33
#>    dataset_id   dataset_version_id collection_id donor_id assay  batch_condition
#>    <chr>        <chr>              <chr>         <list>   <list> <list>         
#>  1 01ad3cd7-39… 02a1eee1-e290-47d… 7d7cabfd-1d1… <chr>    <list> <lgl [1]>      
#>  2 de985818-28… f72aae6e-c997-484… c9706a92-0e5… <chr>    <list> <lgl [1]>      
#>  3 bab7432a-5c… 02a8ff13-a08b-461… 72d37bc9-76c… <chr>    <list> <chr [2]>      
#>  4 f64e1be1-de… c40911a4-47de-460… 62e8f058-9c3… <chr>    <list> <lgl [1]>      
#>  5 e9175006-89… db7b4a79-1d96-4aa… 62e8f058-9c3… <chr>    <list> <lgl [1]>      
#>  6 d4cfefa0-3a… 8f7fa4d2-0bbf-41e… 62e8f058-9c3… <chr>    <list> <lgl [1]>      
#>  7 d224c8e0-c2… b7d4db11-bca1-4bc… 62e8f058-9c3… <chr>    <list> <lgl [1]>      
#>  8 a6858c10-c5… 7e57a225-c979-4fa… 62e8f058-9c3… <chr>    <list> <lgl [1]>      
#>  9 576f193c-75… 1ba7d495-c1a8-480… 62e8f058-9c3… <chr>    <list> <lgl [1]>      
#> 10 486486d4-94… 090ba5ce-5c7f-473… 62e8f058-9c3… <chr>    <list> <lgl [1]>      
#> # ℹ 48 more rows
#> # ℹ 27 more variables: cell_count <int>, cell_type <list>, citation <chr>,
#> #   default_embedding <chr>, development_stage <list>, disease <list>,
#> #   embeddings <list>, explorer_url <chr>, feature_biotype <list>,
#> #   feature_count <int>, feature_reference <list>, is_primary_data <list>,
#> #   mean_genes_per_cell <dbl>, organism <list>, primary_cell_count <int>,
#> #   raw_data_location <chr>, schema_version <chr>, …

## datasets with non-European, known ethnicity
facets(db, "self_reported_ethnicity")
#> # A tibble: 34 × 4
#>    facet                   label                          ontology_term_id     n
#>    <chr>                   <chr>                          <chr>            <int>
#>  1 self_reported_ethnicity European                       HANCESTRO:0005     588
#>  2 self_reported_ethnicity unknown                        unknown            563
#>  3 self_reported_ethnicity na                             na                 408
#>  4 self_reported_ethnicity Asian                          HANCESTRO:0008     153
#>  5 self_reported_ethnicity African American               HANCESTRO:0568      67
#>  6 self_reported_ethnicity Hispanic or Latin American     HANCESTRO:0014      67
#>  7 self_reported_ethnicity Native American,Hispanic or L… HANCESTRO:0013,…    50
#>  8 self_reported_ethnicity African American or Afro-Cari… HANCESTRO:0016      32
#>  9 self_reported_ethnicity Greater Middle Eastern  (Midd… HANCESTRO:0015      23
#> 10 self_reported_ethnicity African                        HANCESTRO:0010      20
#> # ℹ 24 more rows
ds |>
    dplyr::filter(
        !facets_filter(
            self_reported_ethnicity, "label", c("European", "na", "unknown")
        )
    )
#> # A tibble: 31 × 33
#>    dataset_id   dataset_version_id collection_id donor_id assay  batch_condition
#>    <chr>        <chr>              <chr>         <list>   <list> <list>         
#>  1 cfa3c355-ee… 4dc06a70-6d39-4da… 9c8808ce-113… <chr>    <list> <lgl [1]>      
#>  2 a9c5aecf-3b… 579db439-a9dc-4fc… 3116d060-0a8… <chr>    <list> <lgl [1]>      
#>  3 6d4b3d09-f1… 431185b3-45d0-4f9… 3116d060-0a8… <chr>    <list> <lgl [1]>      
#>  4 1368fad2-91… 724f2fae-92cb-4ed… 3116d060-0a8… <chr>    <list> <lgl [1]>      
#>  5 e6a11140-25… dbcbe0a6-918a-444… e5f58829-1a6… <chr>    <list> <lgl [1]>      
#>  6 6ec405bb-47… eaf5be60-06d9-45e… e5f58829-1a6… <chr>    <list> <lgl [1]>      
#>  7 2ba40233-85… 541ef4e5-8142-496… e5f58829-1a6… <chr>    <list> <lgl [1]>      
#>  8 2423ce2c-31… 5ee25df3-0ff9-437… e5f58829-1a6… <chr>    <list> <lgl [1]>      
#>  9 2adb1f8a-a6… 7a455e3b-dd79-499… 38833785-fac… <chr>    <list> <lgl [1]>      
#> 10 a9bedd04-51… 33bfb460-b474-4b1… 6686ada5-43a… <chr>    <list> <lgl [1]>      
#> # ℹ 21 more rows
#> # ℹ 27 more variables: cell_count <int>, cell_type <list>, citation <chr>,
#> #   default_embedding <chr>, development_stage <list>, disease <list>,
#> #   embeddings <list>, explorer_url <chr>, feature_biotype <list>,
#> #   feature_count <int>, feature_reference <list>, is_primary_data <list>,
#> #   mean_genes_per_cell <dbl>, organism <list>, primary_cell_count <int>,
#> #   raw_data_location <chr>, schema_version <chr>, …