Case studies
Martin Morgan
Roswell Park Comprehensive Cancer Center
Martin.Morgan@RoswellPark.org
Source: vignettes/b_case_studies.Rmd
Abstract
This article summarizes short case studies and solutions arising from user queries.
Setup
For each case study, ensure that cellxgenedp (see the Bioconductor package landing page, or GitHub.io site) is installed (additional installation options are at https://mtmorgan.github.io/cellxgenedp/).
if (!"BiocManager" %in% rownames(installed.packages()))
    install.packages("BiocManager", repos = "https://CRAN.R-project.org")
BiocManager::install("cellxgenedp")
Load the package.
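A minimal sketch of this step, assuming the packages to attach are the two listed under 'other attached packages' in the session information at the end of this article: cellxgenedp, and dplyr for the joins, counts, and filters used in the case studies.

```r
# Assumption: cellxgenedp and dplyr are the attached packages, as reported
# in the session information; both must already be installed.
library(cellxgenedp) # authors(), datasets(), collections(), facets(), ...
library(dplyr)       # left_join(), count(), filter(), distinct(), ...
```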
Case study: authors & datasets
Challenge and solution
This case study arose from a question on the CZI Science Community Slack. A user asked:
Hi! Is it possible to search CELLxGENE and identify all datasets by a specific author or set of authors?
Unfortunately, this is not possible from the CELLxGENE web site – authors are only associated with collections, and collections can only be sorted or filtered by title (or publication / tissue / disease / organism).
A cellxgenedp solution uses authors() to discover authors and their collections, and joins this information to datasets().
author_datasets <- left_join(
    authors(),
    datasets(),
    by = "collection_id",
    relationship = "many-to-many"
)
author_datasets
#> # A tibble: 59,767 × 36
#> collection_id family given consortium dataset_id dataset_version_id donor_id
#> <chr> <chr> <chr> <chr> <chr> <chr> <list>
#> 1 dc3a5256-5c39… McEvoy Cait… NA 0bae7ebf-… e6b8dce0-19e6-419… <chr>
#> 2 dc3a5256-5c39… Murphy Juli… NA 0bae7ebf-… e6b8dce0-19e6-419… <chr>
#> 3 dc3a5256-5c39… Zhang Lin NA 0bae7ebf-… e6b8dce0-19e6-419… <chr>
#> 4 dc3a5256-5c39… Clote… Sergi NA 0bae7ebf-… e6b8dce0-19e6-419… <chr>
#> 5 dc3a5256-5c39… Mathe… Jess… NA 0bae7ebf-… e6b8dce0-19e6-419… <chr>
#> 6 dc3a5256-5c39… An James NA 0bae7ebf-… e6b8dce0-19e6-419… <chr>
#> 7 dc3a5256-5c39… Karim… Mehr… NA 0bae7ebf-… e6b8dce0-19e6-419… <chr>
#> 8 dc3a5256-5c39… Pouya… Dela… NA 0bae7ebf-… e6b8dce0-19e6-419… <chr>
#> 9 dc3a5256-5c39… Su Shen… NA 0bae7ebf-… e6b8dce0-19e6-419… <chr>
#> 10 dc3a5256-5c39… Zasla… Olga NA 0bae7ebf-… e6b8dce0-19e6-419… <chr>
#> # ℹ 59,757 more rows
#> # ℹ 29 more variables: assay <list>, batch_condition <list>, cell_count <int>,
#> # cell_type <list>, citation <chr>, default_embedding <chr>,
#> # development_stage <list>, disease <list>, embeddings <list>,
#> # explorer_url <chr>, feature_biotype <list>, feature_count <int>,
#> # feature_reference <list>, is_primary_data <list>,
#> # mean_genes_per_cell <dbl>, organism <list>, primary_cell_count <int>, …
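The join above is many-to-many because a collection can have several authors and several datasets. A minimal sketch on toy data, assuming only dplyr; the tables, identifiers, and names are invented for illustration.

```r
library(dplyr)

# Hypothetical toy data: two collections, with authors and datasets keyed
# by collection_id (all identifiers and names are invented)
toy_authors <- tibble(
    collection_id = c("c1", "c1", "c2"),
    family = c("Ito", "Okafor", "Ito")
)
toy_datasets <- tibble(
    collection_id = c("c1", "c1", "c2"),
    dataset_id = c("d1", "d2", "d3")
)

# Each author row is repeated once for every dataset in the same collection
toy_author_datasets <- left_join(
    toy_authors,
    toy_datasets,
    by = "collection_id",
    relationship = "many-to-many"
)
toy_author_datasets
```

Collection c1 contributes 2 authors × 2 datasets and c2 contributes 1 × 1, so the toy result has five rows. Stating relationship = "many-to-many" tells dplyr (version 1.1.0 or later) that the row multiplication is intended, so it does not warn about it.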
author_datasets provides a convenient point from which to make basic queries, e.g., finding the authors contributing the most datasets.
author_datasets |>
    count(family, given, sort = TRUE)
#> # A tibble: 5,594 × 3
#> family given n
#> <chr> <chr> <int>
#> 1 Teichmann Sarah A. 335
#> 2 Casper Tamara 261
#> 3 Dee Nick 261
#> 4 Chen Fei 258
#> 5 Murray Evan 258
#> 6 Keene C. Dirk 252
#> 7 Hirschstein Daniel 241
#> 8 Macosko Evan Z. 237
#> 9 Ding Song-Lin 230
#> 10 Lein Ed S. 226
#> # ℹ 5,584 more rows
Perhaps one is interested in the most prolific authors based on ‘collections’, rather than ‘datasets’. The five most prolific authors by collection are
prolific_authors <-
    authors() |>
    count(family, given, sort = TRUE) |>
    slice(1:5)
prolific_authors
#> # A tibble: 5 × 3
#> family given n
#> <chr> <chr> <int>
#> 1 Teichmann Sarah A. 34
#> 2 Meyer Kerstin B. 16
#> 3 Polanski Krzysztof 16
#> 4 Regev Aviv 15
#> 5 Haniffa Muzlifah 14
The datasets associated with these authors are
right_join(
    author_datasets,
    prolific_authors,
    by = c("family", "given")
)
#> # A tibble: 822 × 37
#> collection_id family given consortium dataset_id dataset_version_id donor_id
#> <chr> <chr> <chr> <chr> <chr> <chr> <list>
#> 1 b1a879f6-5638… Polan… Krzy… NA fd072bc3-… 06c6546e-719b-403… <chr>
#> 2 b1a879f6-5638… Polan… Krzy… NA f1e4a73c-… 9762dea1-fe6b-4e5… <chr>
#> 3 b1a879f6-5638… Polan… Krzy… NA e21487a3-… 14777d63-e4f5-4db… <chr>
#> 4 b1a879f6-5638… Polan… Krzy… NA dcc10fd8-… 6e1268fc-0ae4-4c0… <chr>
#> 5 b1a879f6-5638… Polan… Krzy… NA d0c12af4-… 628d0c56-e77d-48c… <chr>
#> 6 b1a879f6-5638… Polan… Krzy… NA c4e2bde2-… beab2fab-9b39-4c2… <chr>
#> 7 b1a879f6-5638… Polan… Krzy… NA aede7ec2-… d9c3fc9a-299b-4f5… <chr>
#> 8 b1a879f6-5638… Polan… Krzy… NA aa633105-… f75c06fc-4a65-4dd… <chr>
#> 9 b1a879f6-5638… Polan… Krzy… NA 8d73847b-… dc9e1cd4-309d-42f… <chr>
#> 10 b1a879f6-5638… Polan… Krzy… NA 83de2c2c-… 5a1c9d6d-bb96-457… <chr>
#> # ℹ 812 more rows
#> # ℹ 30 more variables: assay <list>, batch_condition <list>, cell_count <int>,
#> # cell_type <list>, citation <chr>, default_embedding <chr>,
#> # development_stage <list>, disease <list>, embeddings <list>,
#> # explorer_url <chr>, feature_biotype <list>, feature_count <int>,
#> # feature_reference <list>, is_primary_data <list>,
#> # mean_genes_per_cell <dbl>, organism <list>, primary_cell_count <int>, …
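The result above has 37 columns rather than 36 because the join also carries along the n column of prolific_authors. If only the rows are wanted, semi_join() is one alternative. A sketch on toy data, with invented names and identifiers:

```r
library(dplyr)

# Hypothetical toy stand-ins for author_datasets and prolific_authors
toy_author_datasets <- tibble(
    family = c("Ito", "Ito", "Okafor", "Silva"),
    given = c("Aiko", "Aiko", "Chidi", "Ana"),
    dataset_id = c("d1", "d2", "d1", "d3")
)
toy_prolific <- tibble(family = "Ito", given = "Aiko", n = 2L)

# right_join() keeps the matching rows and appends the 'n' column
joined <- right_join(
    toy_author_datasets, toy_prolific,
    by = c("family", "given")
)

# semi_join() keeps the same rows, but only the columns of the first table
filtered <- semi_join(
    toy_author_datasets, toy_prolific,
    by = c("family", "given")
)
```

The two functions agree on rows here only because every author in toy_prolific appears in toy_author_datasets; right_join() would also return a row of missing values for an author with no datasets.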
Alternatively, one might be interested in specific authors. This is most easily accomplished with a simple filter on author_datasets, e.g.,
author_datasets |>
    filter(
        family %in% c("Teichmann", "Regev", "Haniffa")
    )
#> # A tibble: 599 × 36
#> collection_id family given consortium dataset_id dataset_version_id donor_id
#> <chr> <chr> <chr> <chr> <chr> <chr> <list>
#> 1 b1a879f6-5638… Hanif… Muzl… NA fd072bc3-… 06c6546e-719b-403… <chr>
#> 2 b1a879f6-5638… Hanif… Muzl… NA f1e4a73c-… 9762dea1-fe6b-4e5… <chr>
#> 3 b1a879f6-5638… Hanif… Muzl… NA e21487a3-… 14777d63-e4f5-4db… <chr>
#> 4 b1a879f6-5638… Hanif… Muzl… NA dcc10fd8-… 6e1268fc-0ae4-4c0… <chr>
#> 5 b1a879f6-5638… Hanif… Muzl… NA d0c12af4-… 628d0c56-e77d-48c… <chr>
#> 6 b1a879f6-5638… Hanif… Muzl… NA c4e2bde2-… beab2fab-9b39-4c2… <chr>
#> 7 b1a879f6-5638… Hanif… Muzl… NA aede7ec2-… d9c3fc9a-299b-4f5… <chr>
#> 8 b1a879f6-5638… Hanif… Muzl… NA aa633105-… f75c06fc-4a65-4dd… <chr>
#> 9 b1a879f6-5638… Hanif… Muzl… NA 8d73847b-… dc9e1cd4-309d-42f… <chr>
#> 10 b1a879f6-5638… Hanif… Muzl… NA 83de2c2c-… 5a1c9d6d-bb96-457… <chr>
#> # ℹ 589 more rows
#> # ℹ 29 more variables: assay <list>, batch_condition <list>, cell_count <int>,
#> # cell_type <list>, citation <chr>, default_embedding <chr>,
#> # development_stage <list>, disease <list>, embeddings <list>,
#> # explorer_url <chr>, feature_biotype <list>, feature_count <int>,
#> # feature_reference <list>, is_primary_data <list>,
#> # mean_genes_per_cell <dbl>, organism <list>, primary_cell_count <int>, …
or, more carefully, by constructing a data.frame of family and given names and performing a join with author_datasets, so that authors who merely share a family name are excluded
authors_of_interest <-
    tibble(
        family = c("Teichmann", "Regev", "Haniffa"),
        given = c("Sarah A.", "Aviv", "Muzlifah")
    )
right_join(
    author_datasets,
    authors_of_interest,
    by = c("family", "given")
)
#> # A tibble: 513 × 36
#> collection_id family given consortium dataset_id dataset_version_id donor_id
#> <chr> <chr> <chr> <chr> <chr> <chr> <list>
#> 1 b1a879f6-5638… Hanif… Muzl… NA fd072bc3-… 06c6546e-719b-403… <chr>
#> 2 b1a879f6-5638… Hanif… Muzl… NA f1e4a73c-… 9762dea1-fe6b-4e5… <chr>
#> 3 b1a879f6-5638… Hanif… Muzl… NA e21487a3-… 14777d63-e4f5-4db… <chr>
#> 4 b1a879f6-5638… Hanif… Muzl… NA dcc10fd8-… 6e1268fc-0ae4-4c0… <chr>
#> 5 b1a879f6-5638… Hanif… Muzl… NA d0c12af4-… 628d0c56-e77d-48c… <chr>
#> 6 b1a879f6-5638… Hanif… Muzl… NA c4e2bde2-… beab2fab-9b39-4c2… <chr>
#> 7 b1a879f6-5638… Hanif… Muzl… NA aede7ec2-… d9c3fc9a-299b-4f5… <chr>
#> 8 b1a879f6-5638… Hanif… Muzl… NA aa633105-… f75c06fc-4a65-4dd… <chr>
#> 9 b1a879f6-5638… Hanif… Muzl… NA 8d73847b-… dc9e1cd4-309d-42f… <chr>
#> 10 b1a879f6-5638… Hanif… Muzl… NA 83de2c2c-… 5a1c9d6d-bb96-457… <chr>
#> # ℹ 503 more rows
#> # ℹ 29 more variables: assay <list>, batch_condition <list>, cell_count <int>,
#> # cell_type <list>, citation <chr>, default_embedding <chr>,
#> # development_stage <list>, disease <list>, embeddings <list>,
#> # explorer_url <chr>, feature_biotype <list>, feature_count <int>,
#> # feature_reference <list>, is_primary_data <list>,
#> # mean_genes_per_cell <dbl>, organism <list>, primary_cell_count <int>, …
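The two results above differ in size (599 rows versus 513), presumably because a filter on family name alone also matches other authors who share that family name. A small sketch of the difference on toy data, with invented names:

```r
library(dplyr)

# Hypothetical toy table: two different people share a family name
toy_author_datasets <- tibble(
    family = c("Morgan", "Morgan", "Lee"),
    given = c("Martin", "Alex", "Sam"),
    dataset_id = c("d1", "d2", "d3")
)

# Filtering on family name matches both 'Morgan' rows
by_family <- toy_author_datasets |>
    filter(family %in% "Morgan")

# Joining on family and given name matches only the intended author
by_name <- right_join(
    toy_author_datasets,
    tibble(family = "Morgan", given = "Martin"),
    by = c("family", "given")
)
```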
Areas of interest
There are several interesting questions that suggest themselves, and several areas where some additional work is required.
It might be interesting to identify authors working on similar diseases, or other areas of interest. The disease column in the author_datasets table is a list.
author_datasets |>
    select(family, given, dataset_id, disease)
#> # A tibble: 59,767 × 4
#> family given dataset_id disease
#> <chr> <chr> <chr> <list>
#> 1 McEvoy Caitriona M. 0bae7ebf-eb54-46a6-be9a-3461cecefa4c <list [1]>
#> 2 Murphy Julia M. 0bae7ebf-eb54-46a6-be9a-3461cecefa4c <list [1]>
#> 3 Zhang Lin 0bae7ebf-eb54-46a6-be9a-3461cecefa4c <list [1]>
#> 4 Clotet-Freixas Sergi 0bae7ebf-eb54-46a6-be9a-3461cecefa4c <list [1]>
#> 5 Mathews Jessica A. 0bae7ebf-eb54-46a6-be9a-3461cecefa4c <list [1]>
#> 6 An James 0bae7ebf-eb54-46a6-be9a-3461cecefa4c <list [1]>
#> 7 Karimzadeh Mehran 0bae7ebf-eb54-46a6-be9a-3461cecefa4c <list [1]>
#> 8 Pouyabahar Delaram 0bae7ebf-eb54-46a6-be9a-3461cecefa4c <list [1]>
#> 9 Su Shenghui 0bae7ebf-eb54-46a6-be9a-3461cecefa4c <list [1]>
#> 10 Zaslaver Olga 0bae7ebf-eb54-46a6-be9a-3461cecefa4c <list [1]>
#> # ℹ 59,757 more rows
This is because a single dataset may involve more than one disease. Furthermore, each entry in the list contains two elements, the label and ontology_term_id of the disease.
There are two approaches to working with this data.
The first approach uses facilities in cellxgenedp, as outlined in an accompanying article. Start by discovering the possible diseases.
facets(db(), "disease")
#> # A tibble: 169 × 4
#> facet label ontology_term_id n
#> <chr> <chr> <chr> <int>
#> 1 disease normal PATO:0000461 1541
#> 2 disease COVID-19 MONDO:0100096 66
#> 3 disease dementia MONDO:0001627 50
#> 4 disease breast cancer MONDO:0007254 34
#> 5 disease myocardial infarction MONDO:0005068 30
#> 6 disease diabetic kidney disease MONDO:0005016 26
#> 7 disease Alzheimer disease MONDO:0004975 24
#> 8 disease autosomal dominant polycystic kidney disease MONDO:0004691 24
#> 9 disease nonpapillary renal cell carcinoma MONDO:0007763 20
#> 10 disease colorectal cancer MONDO:0005575 17
#> # ℹ 159 more rows
Focus on COVID-19, and use facets_filter() to select relevant author-dataset combinations.
author_datasets |>
    filter(facets_filter(disease, "label", "COVID-19"))
#> # A tibble: 1,912 × 36
#> collection_id family given consortium dataset_id dataset_version_id donor_id
#> <chr> <chr> <chr> <chr> <chr> <chr> <list>
#> 1 b9fc3d70-5a72… Jin Kang NA e763ed0d-… 5b916cc5-f01f-42c… <chr>
#> 2 b9fc3d70-5a72… Jin Kang NA db0752b9-… 67a20819-b7a9-4a0… <chr>
#> 3 b9fc3d70-5a72… Jin Kang NA d9b4bc69-… 7fde015a-c913-4ae… <chr>
#> 4 b9fc3d70-5a72… Jin Kang NA bc2a7b3d-… c78a352d-38e5-4b0… <chr>
#> 5 b9fc3d70-5a72… Jin Kang NA ae5341b8-… 497ab773-4fd5-426… <chr>
#> 6 b9fc3d70-5a72… Jin Kang NA 96a3f64b-… 9f073903-c5be-4f6… <chr>
#> 7 b9fc3d70-5a72… Jin Kang NA 5e717147-… ad80600f-5ae0-496… <chr>
#> 8 b9fc3d70-5a72… Jin Kang NA 59b69042-… a06a1d9e-b1e8-452… <chr>
#> 9 b9fc3d70-5a72… Jin Kang NA 4c4cd77c-… c23f6f92-873d-46b… <chr>
#> 10 b9fc3d70-5a72… Jin Kang NA 055ca631-… 1401027b-5283-4ca… <chr>
#> # ℹ 1,902 more rows
#> # ℹ 29 more variables: assay <list>, batch_condition <list>, cell_count <int>,
#> # cell_type <list>, citation <chr>, default_embedding <chr>,
#> # development_stage <list>, disease <list>, embeddings <list>,
#> # explorer_url <chr>, feature_biotype <list>, feature_count <int>,
#> # feature_reference <list>, is_primary_data <list>,
#> # mean_genes_per_cell <dbl>, organism <list>, primary_cell_count <int>, …
Authors contributing to these datasets are
author_datasets |>
    filter(facets_filter(disease, "label", "COVID-19")) |>
    count(family, given, sort = TRUE)
#> # A tibble: 836 × 3
#> family given n
#> <chr> <chr> <int>
#> 1 Farber Donna L. 29
#> 2 Guo Xinzheng V. 28
#> 3 Saqi Anjali 28
#> 4 Baldwin Matthew R. 27
#> 5 Chait Michael 27
#> 6 Connors Thomas J. 27
#> 7 Davis-Porada Julia 27
#> 8 Dogra Pranay 27
#> 9 Gray Joshua I. 27
#> 10 Idzikowski Emma 27
#> # ℹ 826 more rows
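facets_filter() is provided by cellxgenedp. To make the structure of the disease column concrete, here is a minimal sketch of the same kind of filter on toy data, written with base R vapply(). has_label() is a hypothetical helper for illustration, not the cellxgenedp implementation, and the toy table is invented (the two ontology terms are taken from the facets() output above).

```r
library(dplyr)

# Hypothetical toy table: 'disease' is a list column; each dataset has a
# list of diseases, each with a 'label' and an 'ontology_term_id'
toy <- tibble(
    dataset_id = c("d1", "d2"),
    disease = list(
        list(list(label = "normal", ontology_term_id = "PATO:0000461")),
        list(
            list(label = "normal", ontology_term_id = "PATO:0000461"),
            list(label = "COVID-19", ontology_term_id = "MONDO:0100096")
        )
    )
)

# Hypothetical helper: TRUE for datasets where any disease has this label
has_label <- function(facet, label) {
    vapply(facet, function(entries) {
        any(vapply(
            entries,
            function(entry) entry[["label"]] == label,
            logical(1)
        ))
    }, logical(1))
}

covid <- toy |>
    filter(has_label(disease, "COVID-19"))
covid
```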
A second approach follows the practices in R for Data Science: the disease column can be ‘unnested’ twice, the first time to expand the author_datasets table for each disease, and the second time to separate the two columns of each disease.
author_dataset_diseases <-
    author_datasets |>
    select(family, given, dataset_id, disease) |>
    tidyr::unnest_longer(disease) |>
    tidyr::unnest_wider(disease)
author_dataset_diseases
#> # A tibble: 77,220 × 5
#> family given dataset_id label ontology_term_id
#> <chr> <chr> <chr> <chr> <chr>
#> 1 McEvoy Caitriona M. 0bae7ebf-eb54-46a6-be9a-3… norm… PATO:0000461
#> 2 Murphy Julia M. 0bae7ebf-eb54-46a6-be9a-3… norm… PATO:0000461
#> 3 Zhang Lin 0bae7ebf-eb54-46a6-be9a-3… norm… PATO:0000461
#> 4 Clotet-Freixas Sergi 0bae7ebf-eb54-46a6-be9a-3… norm… PATO:0000461
#> 5 Mathews Jessica A. 0bae7ebf-eb54-46a6-be9a-3… norm… PATO:0000461
#> 6 An James 0bae7ebf-eb54-46a6-be9a-3… norm… PATO:0000461
#> 7 Karimzadeh Mehran 0bae7ebf-eb54-46a6-be9a-3… norm… PATO:0000461
#> 8 Pouyabahar Delaram 0bae7ebf-eb54-46a6-be9a-3… norm… PATO:0000461
#> 9 Su Shenghui 0bae7ebf-eb54-46a6-be9a-3… norm… PATO:0000461
#> 10 Zaslaver Olga 0bae7ebf-eb54-46a6-be9a-3… norm… PATO:0000461
#> # ℹ 77,210 more rows
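The effect of the two unnesting steps is easier to see on a small table. The following sketch assumes dplyr and tidyr are installed; the toy table is invented, with ontology terms taken from the facets() output earlier in this section.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy table: dataset 'd2' is annotated with two diseases
toy <- tibble(
    dataset_id = c("d1", "d2"),
    disease = list(
        list(list(label = "normal", ontology_term_id = "PATO:0000461")),
        list(
            list(label = "normal", ontology_term_id = "PATO:0000461"),
            list(label = "COVID-19", ontology_term_id = "MONDO:0100096")
        )
    )
)

toy_diseases <-
    toy |>
    unnest_longer(disease) |> # one row per dataset-disease combination
    unnest_wider(disease)     # one column per element of each disease

# With ordinary character columns, a plain filter() is enough
toy_diseases |>
    filter(label == "COVID-19")
```

The row count grows from two datasets to three dataset-disease combinations, which mirrors the increase from 59,767 to 77,220 rows above.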
Author-dataset combinations associated with COVID-19, and contributors to these datasets, are
author_dataset_diseases |>
    filter(label == "COVID-19")
author_dataset_diseases |>
    filter(label == "COVID-19") |>
    count(family, given, sort = TRUE)
These computations give the same results as the earlier approach using functionality in cellxgenedp.
A further resource that might be of interest is the OLSr package article illustrating how the ontologies used by CELLxGENE can be manipulated to, e.g., identify studies with terms that derive from a common term (e.g., all disease terms related to ‘carcinoma’).
Collaboration
TODO.
It might be interesting to know which authors have collaborated with one another. This can be computed from the author_datasets table, following approaches developed in the grantpubcite package to identify collaborations between projects in the NIH-funded ITCR program. See the graph visualization in the ITCR collaboration section for inspiration.
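One possible starting point, sketched here on toy data rather than taken from grantpubcite, is to treat two authors as collaborators when they appear on the same collection, and to count shared collections with a self-join. The table and names below are invented, and the author label column is an assumption made for illustration.

```r
library(dplyr)

# Hypothetical toy stand-in for authors(): one row per collection-author
toy_authors <- tibble(
    collection_id = c("c1", "c1", "c1", "c2", "c2"),
    family = c("Ito", "Okafor", "Silva", "Ito", "Okafor"),
    given = c("Aiko", "Chidi", "Ana", "Aiko", "Chidi")
) |>
    distinct() |>
    mutate(author = paste(family, given, sep = ", "))

# Pair every author with every other author in the same collection
coauthors <- inner_join(
    toy_authors, toy_authors,
    by = "collection_id",
    suffix = c("_1", "_2"),
    relationship = "many-to-many"
) |>
    filter(author_1 < author_2) |> # keep each unordered pair once
    count(author_1, author_2, sort = TRUE, name = "n_collections")
coauthors
```

The resulting table is an edge list (two authors and a weight), which is the usual input for a graph visualization. On the full data the self-join grows with the square of the number of authors per collection, so large consortium collections may need to be handled separately.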
Duplicate collection-author combinations
Here are the authors
authors <- authors()
authors
#> # A tibble: 7,305 × 4
#> collection_id family given consortium
#> <chr> <chr> <chr> <chr>
#> 1 dc3a5256-5c39-4a21-ac0c-4ede3e7b2323 McEvoy Caitriona M. NA
#> 2 dc3a5256-5c39-4a21-ac0c-4ede3e7b2323 Murphy Julia M. NA
#> 3 dc3a5256-5c39-4a21-ac0c-4ede3e7b2323 Zhang Lin NA
#> 4 dc3a5256-5c39-4a21-ac0c-4ede3e7b2323 Clotet-Freixas Sergi NA
#> 5 dc3a5256-5c39-4a21-ac0c-4ede3e7b2323 Mathews Jessica A. NA
#> 6 dc3a5256-5c39-4a21-ac0c-4ede3e7b2323 An James NA
#> 7 dc3a5256-5c39-4a21-ac0c-4ede3e7b2323 Karimzadeh Mehran NA
#> 8 dc3a5256-5c39-4a21-ac0c-4ede3e7b2323 Pouyabahar Delaram NA
#> 9 dc3a5256-5c39-4a21-ac0c-4ede3e7b2323 Su Shenghui NA
#> 10 dc3a5256-5c39-4a21-ac0c-4ede3e7b2323 Zaslaver Olga NA
#> # ℹ 7,295 more rows
There are 7305 collection-author combinations. We expect these to be distinct (each row identifying a unique collection-author combination), but this is not the case.
Duplicated data are
authors |>
    count(collection_id, family, given, consortium, sort = TRUE) |>
    filter(n > 1)
#> # A tibble: 24 × 5
#> collection_id family given consortium n
#> <chr> <chr> <chr> <chr> <int>
#> 1 51544e44-293b-4c2b-8c26-560678423380 Betts Michael R. NA 2
#> 2 51544e44-293b-4c2b-8c26-560678423380 Faryabi Robert B. NA 2
#> 3 51544e44-293b-4c2b-8c26-560678423380 Fasolino Maria NA 2
#> 4 51544e44-293b-4c2b-8c26-560678423380 Feldman Michael NA 2
#> 5 51544e44-293b-4c2b-8c26-560678423380 Goldman Naomi NA 2
#> 6 51544e44-293b-4c2b-8c26-560678423380 Golson Maria L. NA 2
#> 7 51544e44-293b-4c2b-8c26-560678423380 Japp Alberto S. NA 2
#> 8 51544e44-293b-4c2b-8c26-560678423380 Kaestner Klaus H. NA 2
#> 9 51544e44-293b-4c2b-8c26-560678423380 Kondo Ayano NA 2
#> 10 51544e44-293b-4c2b-8c26-560678423380 Liu Chengyang NA 2
#> # ℹ 14 more rows
Discover details of one duplicated collection, e5f58829-1a66-40b5-a624-9046778e74f5
duplicate_authors <-
    collections() |>
    filter(collection_id == "e5f58829-1a66-40b5-a624-9046778e74f5")
duplicate_authors
#> # A tibble: 1 × 18
#> collection_id collection_version_id collection_url consortia contact_email
#> <chr> <chr> <chr> <list> <chr>
#> 1 e5f58829-1a66-40… d9c9abfb-ddd1-4172-b… https://cellx… <chr [2]> angela.olive…
#> # ℹ 13 more variables: contact_name <chr>, curator_name <chr>,
#> # description <chr>, doi <chr>, links <list>, name <chr>,
#> # publisher_metadata <list>, revising_in <lgl>, revision_of <lgl>,
#> # visibility <chr>, created_at <date>, published_at <date>, revised_at <date>
The author information comes from the publisher_metadata column
publisher_metadata <-
    duplicate_authors |>
    pull(publisher_metadata)
This is a ‘list-of-lists’, with relevant information as elements in the first list
names(publisher_metadata[[1]])
#> [1] "authors" "is_preprint" "journal" "published_at"
#> [5] "published_day" "published_month" "published_year"
and relevant information in the authors field, of which there are 164
length(publisher_metadata[[1]][["authors"]])
#> [1] 164
Inspection shows that there are four authors with family name Pisco and given name Angela Oliveira: it appears that the data provided by CZI indeed includes duplicate author names.
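This kind of inspection can also be done programmatically. A minimal sketch follows, using an invented list in place of publisher_metadata[[1]][["authors"]] and assuming every entry has family and given elements; entries in the real data that lack them (e.g., consortium entries) would need separate handling.

```r
library(dplyr)

# Hypothetical stand-in for publisher_metadata[[1]][["authors"]]
toy_authors <- list(
    list(family = "Ito", given = "Aiko"),
    list(family = "Okafor", given = "Chidi"),
    list(family = "Ito", given = "Aiko")
)

# Flatten the list-of-lists to a tibble with one row per author entry
author_names <- tibble(
    family = vapply(toy_authors, function(a) a[["family"]], character(1)),
    given = vapply(toy_authors, function(a) a[["given"]], character(1))
)

# Names that occur more than once
duplicated_names <- author_names |>
    count(family, given, sort = TRUE) |>
    filter(n > 1)
duplicated_names
```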
From a pragmatic perspective, it might make sense to remove duplicate entries from authors before down-stream analysis.
deduplicated_authors <- distinct(authors)
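A quick check on toy data that distinct() has the intended effect; the table below is invented and only mimics the columns of authors().

```r
library(dplyr)

# Hypothetical toy table with one repeated collection-author row
toy_authors <- tibble(
    collection_id = c("c1", "c1", "c1"),
    family = c("Ito", "Ito", "Okafor"),
    given = c("Aiko", "Aiko", "Chidi"),
    consortium = NA_character_
)

toy_deduplicated <- distinct(toy_authors)

# Re-running the duplicate check should now return zero rows
remaining <- toy_deduplicated |>
    count(collection_id, family, given, consortium) |>
    filter(n > 1)
```

distinct() with no column arguments compares entire rows, so rows that differ in any column, such as consortium, are kept.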
Tools that I have found useful when working with list-of-lists style data are listviewer::jsonedit() for visualization, and rjsoncons for filtering and querying these data using JSONpointer, JSONpath, or JMESpath expressions (a more R-centric tool is the purrr package).
What is an ‘author’?
The combination of family and given name may refer to two (or more) different individuals (e.g., two individuals named ‘Martin Morgan’), or a single individual may be recorded under two different names (e.g., given name sometimes ‘Martin’ and sometimes ‘Martin T.’). It is not clear how this could be resolved; recording ORCID identifiers might help with disambiguation.
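Name variants of the second kind can at least be flagged for manual review. The following sketch is a heuristic assumed here for illustration, not a resolution of the problem: within a family name, it reports pairs of given names where one is a prefix of the other. The toy names are invented.

```r
library(dplyr)

# Hypothetical toy table of distinct author names
toy_names <- tibble(
    family = c("Morgan", "Morgan", "Lee"),
    given = c("Martin", "Martin T.", "Sam")
)

# Candidate variants: same family name, one given name extends the other
candidates <- inner_join(
    toy_names, toy_names,
    by = "family",
    suffix = c("_short", "_long"),
    relationship = "many-to-many"
) |>
    filter(
        given_short != given_long,
        startsWith(given_long, given_short)
    )
candidates
```

A flagged pair may still be two different people, and the heuristic cannot detect two individuals recorded under an identical name, so it only narrows down what needs to be checked by hand.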
Case study: using ontology to identify datasets
This case study was developed in response to the following Slack question:
CELLxGENE’s webpage is using different ontologies and displaying them in an easy to interogate manner (choosing amongst 3 possible coarseness for cell types, tissues and age) I was wondering if this simplified tree of the 3 subgroups for cell type, tissue and age categories was available somewhere?
As indicated in the question, CELLxGENE provides some access to ontologies through a hand-curated three-tiered classification of specific facets; the tiers can be retrieved from publicly available code, but one might want to develop a more flexible or principled approach.
CELLxGENE dataset facets like ‘disease’ and ‘cell type’ use terms from ontologies. Ontologies arrange terms in directed acyclic graphs, and this structure can be used to identify related datasets. For instance, one might be interested in cancer-related datasets (derived from the ‘carcinoma’ term in the corresponding ontology) in general, rather than, e.g., ‘B-cell non-Hodgkins lymphoma’.
In exploring this question in R, I found myself developing the OLSr package to query and process ontologies from the EMBL-EBI Ontology Lookup Service. See the ‘Case Study: CELLxGENE Ontologies’ article in the OLSr package for full details.
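To illustrate the idea of terms that derive from a common term, here is a minimal base R sketch that walks a toy directed acyclic graph. The edge list is invented for illustration and is not taken from a real ontology, and descendants() is a hypothetical helper, not an OLSr function.

```r
# Hypothetical toy ontology, stored as child -> parent edges
edges <- data.frame(
    child = c(
        "carcinoma", "renal cell carcinoma",
        "nonpapillary renal cell carcinoma", "lymphoma"
    ),
    parent = c(
        "cancer", "carcinoma",
        "renal cell carcinoma", "cancer"
    )
)

# Hypothetical helper: all terms reachable from 'term' by following
# parent -> child edges, found one generation at a time
descendants <- function(term, edges) {
    found <- character()
    frontier <- term
    while (length(frontier) > 0) {
        children <- edges$child[edges$parent %in% frontier]
        frontier <- setdiff(children, found) # only terms not yet seen
        found <- union(found, frontier)
    }
    found
}

carcinoma_terms <- descendants("carcinoma", edges)
carcinoma_terms
```

The resulting vector of terms could then be matched against the label of a facet such as disease to select datasets annotated with any term under the chosen one.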
Session information
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] cellxgenedp_1.13.1 dplyr_1.1.4 BiocStyle_2.36.0
#>
#> loaded via a namespace (and not attached):
#> [1] jsonlite_2.0.0 compiler_4.5.1 BiocManager_1.30.26
#> [4] promises_1.3.3 Rcpp_1.1.0 tidyselect_1.2.1
#> [7] tidyr_1.3.1 later_1.4.2 jquerylib_0.1.4
#> [10] systemfonts_1.2.3 textshaping_1.0.1 yaml_2.3.10
#> [13] fastmap_1.2.0 mime_0.13 R6_2.6.1
#> [16] rjsoncons_1.3.2 generics_0.1.4 curl_6.4.0
#> [19] knitr_1.50 htmlwidgets_1.6.4 tibble_3.3.0
#> [22] bookdown_0.43 desc_1.4.3 shiny_1.11.1
#> [25] bslib_0.9.0 pillar_1.11.0 rlang_1.1.6
#> [28] utf8_1.2.6 DT_0.33 cachem_1.1.0
#> [31] httpuv_1.6.16 xfun_0.52 fs_1.6.6
#> [34] sass_0.4.10 cli_3.6.5 withr_3.0.2
#> [37] pkgdown_2.1.3 magrittr_2.0.3 digest_0.6.37
#> [40] xtable_1.8-4 lifecycle_1.0.4 vctrs_0.6.5
#> [43] evaluate_1.0.4 glue_1.8.0 ragg_1.4.0
#> [46] purrr_1.0.4 rmarkdown_2.29 httr_1.4.7
#> [49] tools_4.5.1 pkgconfig_2.0.3 htmltools_0.5.8.1