AUCell Mapping File Tutorial
In this notebook, we introduce a tutorial to calculate the AUCell score with our library: bioalpha
.
The implementation of bioalpha.sc.tl.aucell
is based on pySCENIC.
1. Prepare
This notebook uses gene set of mouse downloaded from GSEA and a 10k cell single-cell ma-seq dataset obtained from 10xGENOMICS.
Dataset: 10k Heart Cells from an E18 mouse (v3 chemistry)
Gene sets: m1.all.v2023.1.Mm
1.1. Load required library
import bioalpha
1.2. Download data
!wget -P ./data/enrichment/ https://data.broadinstitute.org/gsea-msigdb/msigdb/release/2023.1.Mm/m1.all.v2023.1.Mm.json
!wget -P ./data/datasets/ https://cf.10xgenomics.com/samples/cell-exp/3.0.0/heart_10k_v3/heart_10k_v3_raw_feature_bc_matrix.h5
1.3. Read data
adata = bioalpha.sc.read_10x_h5("./data/datasets/heart_10k_v3_raw_feature_bc_matrix.h5")
adata.var_names_make_unique()
adata.write_h5ad("./data/datasets/heart_10k_v3_raw_feature_bc_matrix.h5ad")
adata = bioalpha.h5ad_map.H5ADMap("./data/datasets/heart_10k_v3_raw_feature_bc_matrix.h5ad")
UserWarnings: H5ADMap may write direct to disk. Please backup raw file.
UserWarnings: H5ADMap read data from disk. Please download data to local.
adata
AnnData object with n_obs × n_vars = 6794880 × 31053 backed at './data/datasets/heart_10k_v3_raw_feature_bc_matrix.h5ad'
var: 'gene_ids', 'feature_types', 'genome'
Or you can save gene signatures directly to adata.uns["gene_sigs"]
by passing argument to parameter adata
.
2. Basic filter
bioalpha.sc.pp.filter_cells(adata, min_genes=200)
bioalpha.sc.pp.filter_genes(adata, min_cells=3)
adata
AnnData object with n_obs × n_vars = 6794880 × 31053 backed at './data/datasets/heart_10k_v3_raw_feature_bc_matrix.h5ad'
obs: 'n_genes', 'filter_cells_mask'
var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'filter_genes_mask'
Subset data and copy to a new file. It helps us back up and filter raw data.
adata = adata.diet_subset(
"./data/datasets/heart_10k_v3_raw_feature_bc_matrix_processed.h5ad",
row_index="filter_cells_mask", col_index="filter_genes_mask")
adata
AnnData object with n_obs × n_vars = 8028 × 20193 backed at './data/datasets/heart_10k_v3_raw_feature_bc_matrix_processed.h5ad'
obs: 'n_genes', 'filter_cells_mask'
var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'filter_genes_mask'
3. Read gene signatue
BioAlpha
supports read .json
file enrichment downloaded from GSEA.
ENRICHMENT_PATH = "./data/enrichment/m1.all.v2023.1.Mm.json"
# gene_sigs = bioalpha.sc.read_gene_signatures(ENRICHMENT_PATH)
bioalpha.sc.read_gene_signatures(ENRICHMENT_PATH, adata=adata)
4. Calc AUCell
In this notebook, we set normalize = True
in order to visualize heatmap easier (which will introduce in next step).
bioalpha.sc.tl.aucell(adata, normalize=True, gene_sigs="gene_sigs")
If your have your own gene signature, pass argument to parameter gene_sigs
.
# bioalpha.sc.tl.aucell(adata, gene_sigs=gene_sigs, normalize=True)
5. Plot AUCell
5.1. Heatmap
Note that we only show the first 1000 cells in each group.
bioalpha.sc.pl.aucell_heatmap(adata, cmap="YlOrRd", dendorgram=False)
5.2. Plot with define gene sets
We now want to draw the AUCell score on a t-SNE or UMAP coordinate.
# Get 2D-array of t-SNE
bioalpha.sc.pp.pca(adata, n_comps=50)
bioalpha.sc.tl.tsne(adata)
bioalpha.sc.pl.aucell(adata, gene_sets=['MM79', 'MM80'], basis="tsne")
adata.file.close()