AUCell Tutorial

In this notebook, we introduce a tutorial to calculate the AUCell score with our library: bioalpha.

The implementation of bioalpha.sc.tl.aucell is based on pySCENIC.

1. Prepare

This notebook uses gene set of mouse downloaded from GSEA and a 10k cell single-cell ma-seq dataset obtained from 10xGENOMICS.

Dataset: 10k Heart Cells from an E18 mouse (v3 chemistry)

Gene sets: m1.all.v2023.1.Mm

1.1. Load required library

import bioalpha

1.2. Download data

!wget -P ./data/enrichment/ https://data.broadinstitute.org/gsea-msigdb/msigdb/release/2023.1.Mm/m1.all.v2023.1.Mm.json
!wget -P ./data/datasets/ https://cf.10xgenomics.com/samples/cell-exp/3.0.0/heart_10k_v3/heart_10k_v3_raw_feature_bc_matrix.h5

1.3. Read data

adata = bioalpha.sc.read_10x_h5("./data/datasets/heart_10k_v3_raw_feature_bc_matrix.h5")
adata.var_names_make_unique()
adata
AnnData object with n_obs × n_vars = 6794880 × 31053
    var: 'gene_ids', 'feature_types', 'genome'

BioAlpha supports read .json file enrichment downloaded from GSEA.

ENRICHMENT_PATH = "./data/enrichment/m1.all.v2023.1.Mm.json"
# gene_sigs = bioalpha.sc.read_gene_signatures(ENRICHMENT_PATH)

Or you can save gene signatures directly to adata.uns["gene_sigs"] by passing argument to parameter adata.

bioalpha.sc.read_gene_signatures(ENRICHMENT_PATH, adata=adata)

2. Basic filter

bioalpha.sc.pp.filter_cells(adata, min_genes=200)
bioalpha.sc.pp.filter_genes(adata, min_cells=3)

3. Calc AUCell

In this notebook, we set normalize = True in order to visualize heatmap easier (which will introduce in next step).

bioalpha.sc.tl.aucell(adata, normalize=True, gene_sigs="gene_sigs")

If your have your own gene signature, pass argument to parameter gene_sigs.

# bioalpha.sc.tl.aucell(adata, gene_sigs=gene_sigs, normalize=True)

4. Plot AUCell

4.1. Heatmap

Note that we only show the first 1000 cells in each group.

bioalpha.sc.pl.aucell_heatmap(adata, cmap="YlOrRd", dendorgram=False)
WARNING: groups size is larger than 1000, only use first 1000 cells
../_images/5940f8058ddfa1b051f5a19f3c68d886fe2106fdd1dfb75a6a21e8b0a0cf8afb.png

4.2. Plot with define gene sets

We now want to draw the AUCell score on a t-SNE or UMAP coordinate.

# Get 2D-array of t-SNE
bioalpha.sc.pp.pca(adata, n_comps=50)
bioalpha.sc.tl.tsne(adata)

bioalpha.sc.pl.aucell(adata, gene_sets=['MM79', 'MM80'], basis="tsne")
../_images/e6968b6e77393ab3062a8e547e322561419e06124dd9370bd1ab263aedbfc7e0.png