AUCell Mapping File Tutorial

In this notebook, we introduce a tutorial to calculate the AUCell score with our library: bioalpha.

The implementation of bioalpha.sc.tl.aucell is based on pySCENIC.

1. Prepare

This notebook uses gene set of mouse downloaded from GSEA and a 10k cell single-cell ma-seq dataset obtained from 10xGENOMICS.

Dataset: 10k Heart Cells from an E18 mouse (v3 chemistry)

Gene sets: m1.all.v2023.1.Mm

1.1. Load required library

import bioalpha

1.2. Download data

!wget -P ./data/enrichment/ https://data.broadinstitute.org/gsea-msigdb/msigdb/release/2023.1.Mm/m1.all.v2023.1.Mm.json
!wget -P ./data/datasets/ https://cf.10xgenomics.com/samples/cell-exp/3.0.0/heart_10k_v3/heart_10k_v3_raw_feature_bc_matrix.h5

1.3. Read data

adata = bioalpha.sc.read_10x_h5("./data/datasets/heart_10k_v3_raw_feature_bc_matrix.h5")
adata.var_names_make_unique()
adata.write_h5ad("./data/datasets/heart_10k_v3_raw_feature_bc_matrix.h5ad")
adata = bioalpha.h5ad_map.H5ADMap("./data/datasets/heart_10k_v3_raw_feature_bc_matrix.h5ad")
UserWarnings: H5ADMap may write direct to disk. Please backup raw file.
UserWarnings: H5ADMap read data from disk. Please download data to local.
adata
AnnData object with n_obs × n_vars = 6794880 × 31053 backed at './data/datasets/heart_10k_v3_raw_feature_bc_matrix.h5ad'
    var: 'gene_ids', 'feature_types', 'genome'

Or you can save gene signatures directly to adata.uns["gene_sigs"] by passing argument to parameter adata.

2. Basic filter

bioalpha.sc.pp.filter_cells(adata, min_genes=200)
bioalpha.sc.pp.filter_genes(adata, min_cells=3)
adata
AnnData object with n_obs × n_vars = 6794880 × 31053 backed at './data/datasets/heart_10k_v3_raw_feature_bc_matrix.h5ad'
    obs: 'n_genes', 'filter_cells_mask'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'filter_genes_mask'

Subset data and copy to a new file. It helps us back up and filter raw data.

adata = adata.diet_subset(
    "./data/datasets/heart_10k_v3_raw_feature_bc_matrix_processed.h5ad",
    row_index="filter_cells_mask", col_index="filter_genes_mask")
adata
AnnData object with n_obs × n_vars = 8028 × 20193 backed at './data/datasets/heart_10k_v3_raw_feature_bc_matrix_processed.h5ad'
    obs: 'n_genes', 'filter_cells_mask'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'filter_genes_mask'

3. Read gene signatue

BioAlpha supports read .json file enrichment downloaded from GSEA.

ENRICHMENT_PATH = "./data/enrichment/m1.all.v2023.1.Mm.json"
# gene_sigs = bioalpha.sc.read_gene_signatures(ENRICHMENT_PATH)
bioalpha.sc.read_gene_signatures(ENRICHMENT_PATH, adata=adata)

4. Calc AUCell

In this notebook, we set normalize = True in order to visualize heatmap easier (which will introduce in next step).

bioalpha.sc.tl.aucell(adata, normalize=True, gene_sigs="gene_sigs")

If your have your own gene signature, pass argument to parameter gene_sigs.

# bioalpha.sc.tl.aucell(adata, gene_sigs=gene_sigs, normalize=True)

5. Plot AUCell

5.1. Heatmap

Note that we only show the first 1000 cells in each group.

bioalpha.sc.pl.aucell_heatmap(adata, cmap="YlOrRd", dendorgram=False)
../_images/bf1cede3423c06ebbaf9d8a459b4673dfec5070f342962fcc4b51feeaccca9b1.png

5.2. Plot with define gene sets

We now want to draw the AUCell score on a t-SNE or UMAP coordinate.

# Get 2D-array of t-SNE
bioalpha.sc.pp.pca(adata, n_comps=50)
bioalpha.sc.tl.tsne(adata)

bioalpha.sc.pl.aucell(adata, gene_sets=['MM79', 'MM80'], basis="tsne")
../_images/b7a3146ad4d9a9293b9597f9241a71187667b4bad39ca528c21e01c2f86c3423.png
adata.file.close()