single-cell



WormBase has developed two tools for exploring published C. elegans single cell RNA sequencing (scRNAseq) data: scdefg for interactive differential expression on integrated datasets and wormcells-viz for visualization of gene expression. These tools have been deployed at WormBase with public C. elegans datasets and will continue to be updated as new datasets are published. Source code is available at github.com/WormBase/scdefg and github.com/WormBase/wormcells-viz, together with instructions on how to deploy these tools with any scRNAseq dataset.

For a detailed overview, see the Single cell tools for WormBase preprint (July 2021).

For additional discussion see this 45 min talk from May 2021: [talk, slides].

Integrated Differential Expression: scdefg.textpressolab.com

Three datasets (CeNGEN, Packer 2019, Ben-David 2021) have been integrated and can be compared with differential expression. More information about each dataset is at the bottom of this page. You can also visualize gene expression on the annotated cell types of each dataset using the links below.

Visualize CeNGEN L4 neuron dataset: cengen.textpressolab.com

Visualize Packer 2019 embryogenesis dataset: packer2019.textpressolab.com

Visualize Ben-David 2021 L2 larvae dataset: bendavid2021.textpressolab.com

About the apps


The scdefg app is written in Python using Flask and provides a single web page for selecting two groups of cells according to the existing annotations in the data. For example, the user can define a group by a combination of cell type, sample, tissue and experimental group. Results are displayed as an interactive volcano plot (log fold change vs p-value) and MA plot (log fold change vs mean expression) that show gene descriptions on mouseover, together with two sortable tables of p-values and log fold changes listing enriched and depleted genes. The tables can be downloaded in CSV or Excel format or copied to the clipboard. The app is launched from the command line by specifying the path to a trained scVI model, and the user may specify the data annotations by which groups can be stratified (e.g. cell type, experiment). Differential expression is computed on the fly and runs in reasonable time without GPUs. We have deployed the app on a cloud instance with only 8 GB RAM and 2 vCPUs and observed that this configuration is sufficient for a few concurrent users, with results returned in about 15 seconds.
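To make the two plots concrete, here is a minimal sketch of how cells might be grouped by annotation and how per-gene results map onto volcano and MA plot coordinates. It is not the scdefg implementation: the function names, the dictionary layout, the gene and annotation labels, and the use of log2 are all assumptions for illustration, and the per-gene means and p-values are taken as given rather than computed by scVI.

```python
import math


def select_cells(cells, **criteria):
    """Return the cells whose annotations match every given criterion.

    `cells` is a list of dicts of annotations (hypothetical layout).
    """
    return [
        cell for cell in cells
        if all(cell.get(key) == value for key, value in criteria.items())
    ]


def volcano_and_ma_points(results):
    """Convert per-gene results into plot coordinates.

    Each row needs `gene`, `mean_a`, `mean_b` (strictly positive mean
    expression in groups A and B) and `p_value` (strictly positive).
    """
    points = []
    for row in results:
        points.append({
            "gene": row["gene"],
            # Shared axis of both plots: log fold change of A over B.
            "log_fold_change": math.log2(row["mean_a"] / row["mean_b"]),
            # Volcano plot: significance shown as -log10(p-value).
            "neg_log10_p": -math.log10(row["p_value"]),
            # MA plot: average expression across the two groups.
            "mean_expression": (row["mean_a"] + row["mean_b"]) / 2,
        })
    return points


# Hypothetical annotations and results, for illustration only.
cells = [
    {"cell_type": "neuron", "sample": "s1"},
    {"cell_type": "neuron", "sample": "s2"},
    {"cell_type": "muscle", "sample": "s1"},
]
group_a = select_cells(cells, cell_type="neuron", sample="s1")
points = volcano_and_ma_points(
    [{"gene": "gene-a", "mean_a": 8.0, "mean_b": 2.0, "p_value": 0.001}]
)
```

Keeping the group selection separate from the plotting math mirrors the description above: the annotations decide which cells are compared, and the same log fold change value is reused on both plots.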

The wormcells-viz app is written in JavaScript and Python and uses React.js and D3.js to provide interactive and responsive visualizations of heatmaps, gene expression histograms and swarm plots (see below). Deploying the app requires the pre-computed gene expression values to be stored in three custom anndata files, as described in the wormcells-viz repository. The following visualizations are currently implemented.

Heatmap

Visualization of scVI inferred expression rates for a selection of cell types and genes. The expression rates can be shown as either a traditional heatmap, or as a monochrome dotplot.

Gene expression histogram

Histograms of the scVI inferred expression rates for a given gene across all cell types in the data. The histogram bin counts are computed from the scVI inferred expression rates for each cell.
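As a rough illustration of the binning step, the sketch below counts cells per bin for one gene, separately for each cell type. It is not the wormcells-viz code: the function name, the fixed-width bins, the clamping of out-of-range values, and the assumption that the rates were log10-transformed beforehand are all choices made for this example.

```python
from collections import defaultdict


def histogram_counts(values, cell_types, n_bins, lo, hi):
    """Count cells per bin for one gene, grouped by cell type.

    `values` holds one expression value per cell (assumed here to be
    log10 rates) and `cell_types` the matching annotation for each cell.
    Values outside [lo, hi) are clamped into the first or last bin.
    """
    width = (hi - lo) / n_bins
    counts = defaultdict(lambda: [0] * n_bins)
    for value, cell_type in zip(values, cell_types):
        index = int((value - lo) / width)
        index = min(max(index, 0), n_bins - 1)
        counts[cell_type][index] += 1
    return dict(counts)


# Hypothetical values for four cells, for illustration only.
counts = histogram_counts(
    values=[-4.5, -3.5, -3.2, -1.0],
    cell_types=["neuron", "neuron", "muscle", "muscle"],
    n_bins=5,
    lo=-5.0,
    hi=0.0,
)
```

Using the same bin edges for every cell type keeps the per-type histograms directly comparable when they are drawn side by side.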

Swarm plot

For a given cell type, swarm plots visualize the relative expression of a set of genes across all cells annotated in a dataset. These plots are useful for identifying candidate marker genes.

The Y axis displays the set of selected genes, and the X axis displays the log fold change in gene expression between the cell type of interest and all other cell types. This is computed by doing pairwise differential expression of each annotated cell type vs the cell type of interest.
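The layout of a swarm plot can be sketched as follows: for each selected gene, one point per other cell type, positioned by the log fold change against the cell type of interest. This is a simplified stand-in, not the method used by the app: it takes the ratio of mean expression values instead of running scVI differential expression, and the function name, data layout, gene labels, log2 base and sign convention (positive means higher in the cell type of interest) are assumptions.

```python
import math


def swarm_log_fold_changes(mean_expression, cell_type_of_interest, genes):
    """Return {gene: {other_cell_type: log2 fold change}}.

    `mean_expression` maps cell type -> gene -> strictly positive mean
    expression. Each gene becomes one row of the swarm plot and each
    other cell type contributes one point on that row.
    """
    target = mean_expression[cell_type_of_interest]
    swarm = {}
    for gene in genes:
        swarm[gene] = {
            other: math.log2(target[gene] / rates[gene])
            for other, rates in mean_expression.items()
            if other != cell_type_of_interest
        }
    return swarm


# Hypothetical mean expression values, for illustration only.
mean_expression = {
    "neuron": {"gene-a": 8.0, "gene-b": 1.0},
    "muscle": {"gene-a": 2.0, "gene-b": 1.0},
    "intestine": {"gene-a": 1.0, "gene-b": 4.0},
}
swarm = swarm_log_fold_changes(mean_expression, "neuron", ["gene-a", "gene-b"])
```

In this toy input, gene-a sits to the right of zero for every comparison, which is the pattern one would look for in a candidate marker gene for the cell type of interest.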

A Colab tutorial on how to make swarm plots is available here

How WormBase processes single cell RNA data: scvi-tools

There are currently hundreds of software tools and pipelines developed for scRNAseq data (see https://www.scrna-tools.org). For processing single cell data at WormBase we have chosen to use the scvi-tools.org framework. scvi-tools differs from most other scRNAseq tools in that it uses variational autoencoders to learn the distribution underlying the input data and create a generative model. Interested readers can learn more about the framework in the scvi-tools documentation. Here we briefly highlight a few considerations that led to our choice of the framework for driving scRNAseq analysis.

WormBase deployment philosophy for single cell tools

At the moment, the majority of scRNAseq data is generated using the 10X Genomics Chromium technology, with v2 and v3 chemistry. This is also true for C. elegans scRNAseq data. For the time being, WormBase will focus its scRNAseq tool development on 10X Genomics data. Two considerations drive this:

List of all C. elegans single cell datasets in anndata format

The anndata format (extension .h5ad) was published in 2018 as a generic class for handling annotated data matrices, with a focus on scRNA-seq data, Python support for machine learning, and integration with the SCANPY analysis framework. Anndata is an efficient storage format because it uses HDF5 compression, and it has become the standard format for manipulating scRNAseq data in Python. It is also supported in R (see also zellkonverter).

Owing to the advantages of anndata and its popularity, WormBase adopted a convention for structuring published C. elegans scRNAseq data into anndata files with standard field names, to streamline their reuse in code pipelines. The guidelines used when wrangling data into the WormBase anndata convention are described in the supplemental tables and maintained at github.com/WormBase/anndata-wrangling.

Here we provide a curated collection of all high-throughput C. elegans scRNAseq data wrangled into the WormBase anndata standard fields. For completeness, we also list other low-throughput single cell datasets that were not wrangled.

| Short Name | Total cells | Method | h5ad | Summary | Article/preprint | Original Data | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Taylor 2020 | 100,955 | 10x v2/v3 | Download at Caltech Data | L4 larvae neurons selected via flow cytometry | Molecular topography of an entire nervous system. | GSE136049 | CeNGEN website; Shiny R app to explore the data |
| Ben-David 2021 | 55,508 | 10x v2 | Download at Caltech Data | L2 larvae | Whole-organism mapping of the genetics of gene expression at cellular resolution biorxiv 2020. | PRJNA658829 | Gene count matrix was kindly provided by the authors on request |
| Packer 2019 | 89,701 | 10x v2 | Download at Caltech Data | Several timepoints of embryo development | A lineage-resolved molecular atlas of C. elegans embryogenesis at single-cell resolution Science 2019. | GSE126954 | VisCello app for data exploration |
| Cao 2017 | 35,987 | sci-RNA-seq | Download at Caltech Data | L2 larvae | A lineage-resolved molecular atlas of C. elegans embryogenesis at single-cell resolution Science 2019. | GSE98561 and GSM4318946 (reprocessed) | GSM4318946 release was a reannotation of the data |
| Tintori 2016 | 216 | SMARTer kit | Not wrangled | Embryo through the 16-cell stage | A Transcriptional Lineage of the Early C. elegans Embryo Dev Cell 2016. | GSE77944 | They made a custom visualizer at tintori.bio.unc.edu. |
| Hashimshony 2012 | 96 | CEL-Seq | Not wrangled | Blastomere cells | CEL-Seq: single-cell RNA-Seq by multiplexed linear amplification Cell Rep. 2012 | SRP014672 | This was one of the pioneering works in scRNAseq and introduced the CEL-Seq technique. |