bioalpha.singlecell.preprocessing.calculate_qc_metrics
- bioalpha.singlecell.preprocessing.calculate_qc_metrics(adata: AnnData | H5ADMap, *, expr_type: str = 'counts', var_type: str = 'genes', qc_vars: Collection[str] = (), percent_top: Collection[int] | None = (50, 100, 200, 500), layer: str | None = None, use_raw: bool = False, inplace: bool = False, log1p: bool = True, parallel: bool | None = None, obs_mask: str | None = None, var_mask: str | None = None) → Tuple[DataFrame, DataFrame] | None
Calculate quality control metrics. Calculates a number of QC metrics for an AnnData object or H5ADMap; see Returns for specifics. Largely based on calculateQCMetrics from scater. It is currently most efficient on a sparse CSR or dense matrix. Note that this method can take a while to compile on the first call; the compiled result is then cached to disk and reused on later calls.
- Parameters:
adata (AnnData | H5ADMap) – The annotated data matrix, or an H5ADMap, of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
expr_type (str, default = “counts”) – Name of the kind of values in X; used in the output column names.
var_type (str, default = “genes”) – The kind of thing the variables are; used in the output column names.
qc_vars (Collection[str], default = ()) – Keys for boolean columns of .var that identify variables you could want to control for (e.g. “ERCC” or “mito”).
percent_top (Collection[int] | None, default = (50, 100, 200, 500)) – Ranks of top genes at which to report the cumulative proportion of expression. If empty or None, this metric is not calculated. Values are 1-indexed: percent_top=[50] finds the cumulative proportion up to the 50th most expressed gene.
layer (str | None, default = None) – If provided, use adata.layers[layer] for expression values instead of adata.X.
use_raw (bool, default = False) – If True, use adata.raw.X for expression values instead of adata.X.
inplace (bool, default = False) – Whether to place the calculated metrics in adata.obs and adata.var.
log1p (bool, default = True) – Set to False to skip computing log1p-transformed annotations.
obs_mask (str | None, default = None) – If obs_mask is not None, filter cells by adata.obs[obs_mask].
var_mask (str | None, default = None) – If var_mask is not None, filter genes by adata.var[var_mask].
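To make the 1-indexed percent_top semantics concrete, here is a minimal pure-Python sketch for a single cell. It is not bioalpha's implementation: the function name pct_in_top is hypothetical, it works on a plain list instead of a matrix, and returning NaN for an all-zero cell is an assumption.

```python
import math
from typing import Dict, Iterable, List


def pct_in_top(cell_counts: List[float], percent_top: Iterable[int]) -> Dict[int, float]:
    """Cumulative percentage of one cell's counts held by its n most expressed genes."""
    total = sum(cell_counts)
    ranked = sorted(cell_counts, reverse=True)  # most expressed gene first
    result = {}
    for n in percent_top:
        # Ranks are 1-indexed, so n=50 covers ranks 1..50, i.e. ranked[:50].
        # This sketch does not check n against the number of genes.
        top_sum = sum(ranked[:n])
        # Assumption: an all-zero cell gets NaN rather than a number.
        result[n] = 100.0 * top_sum / total if total > 0 else math.nan
    return result


print(pct_in_top([5, 0, 3, 2], [1, 2]))  # {1: 50.0, 2: 80.0}
```

Each entry corresponds to one pct_{expr_type}_in_top_{n}_{var_type} value for that cell: the top gene holds 5 of 10 counts, the top two hold 8 of 10.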
- Returns:
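Depending on inplace, either returns the calculated metrics as a tuple of two DataFrames (one with observation-level and one with variable-level metrics), or updates adata.obs and adata.var in place and returns None.
Observation-level metrics include: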
total_{var_type}_by_{expr_type}: E.g. “total_genes_by_counts”. Number of genes with positive counts in a cell.
total_{expr_type}: E.g. “total_counts”. Total number of counts for a cell.
pct_{expr_type}_in_top_{n}_{var_type}: For n in percent_top. E.g. “pct_counts_in_top_50_genes”. Cumulative percentage of counts for the 50 most expressed genes in a cell.
total_{expr_type}_{qc_var}: For qc_var in qc_vars. E.g. “total_counts_mito”. Total number of counts for the variables flagged by qc_var.
pct_{expr_type}_{qc_var}: For qc_var in qc_vars. E.g. “pct_counts_mito”. Percentage of a cell's total counts that come from the variables flagged by qc_var (for “mito”, the mitochondrial genes).
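The observation-level definitions above can be illustrated with a minimal pure-Python sketch on a tiny dense matrix, assuming the column-name templates listed here. obs_qc_metrics and its qc_masks argument (a stand-in for the boolean .var columns named in qc_vars) are hypothetical; the real function operates on AnnData/H5ADMap objects and sparse matrices, and the NaN for all-zero cells is an assumption.

```python
import math
from typing import Dict, List, Sequence


def obs_qc_metrics(
    X: List[List[float]],
    qc_masks: Dict[str, Sequence[bool]],
    expr_type: str = "counts",
    var_type: str = "genes",
) -> List[Dict[str, float]]:
    """Per-cell QC metrics; rows of X are cells, columns are genes.

    qc_masks is a hypothetical stand-in for the boolean .var columns
    named in qc_vars, e.g. {"mito": [False, True, ...]}, one flag per gene.
    """
    results = []
    for cell in X:
        total = sum(cell)
        metrics: Dict[str, float] = {
            # Number of genes with a positive count in this cell
            f"total_{var_type}_by_{expr_type}": sum(1 for v in cell if v > 0),
            # Total counts in this cell
            f"total_{expr_type}": total,
        }
        for qc_var, mask in qc_masks.items():
            # Sum only the genes flagged True for this qc_var
            qc_total = sum(v for v, flagged in zip(cell, mask) if flagged)
            metrics[f"total_{expr_type}_{qc_var}"] = qc_total
            # Percentage of the cell's counts coming from flagged genes
            # (assumption: NaN when the cell has no counts at all)
            metrics[f"pct_{expr_type}_{qc_var}"] = (
                100.0 * qc_total / total if total > 0 else math.nan
            )
        results.append(metrics)
    return results


X = [[5, 0, 3, 2],
     [0, 1, 1, 0]]
mito = [False, False, True, True]
for row in obs_qc_metrics(X, {"mito": mito}):
    print(row)
# {'total_genes_by_counts': 3, 'total_counts': 10, 'total_counts_mito': 5, 'pct_counts_mito': 50.0}
# {'total_genes_by_counts': 2, 'total_counts': 2, 'total_counts_mito': 1, 'pct_counts_mito': 50.0}
```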
Variable-level metrics include:
total_{expr_type}: E.g. “total_counts”. Sum of counts for a gene.
n_genes_by_{expr_type}: E.g. “n_genes_by_counts”. The number of genes with at least 1 count in a cell. Calculated for all cells.
mean_{expr_type}: E.g. “mean_counts”. Mean expression over all cells.
n_cells_by_{expr_type}: E.g. “n_cells_by_counts”. Number of cells in which this gene has a positive count.
pct_dropout_by_{expr_type}: E.g. “pct_dropout_by_counts”. Percentage of cells in which this feature does not appear.
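A matching pure-Python sketch of the per-gene metrics, under the same assumptions (tiny dense matrix, column-name templates as listed). var_qc_metrics is a hypothetical name, not part of bioalpha, and the sketch omits n_genes_by_{expr_type}, which the description above defines per cell rather than per gene.

```python
from typing import Dict, List


def var_qc_metrics(X: List[List[float]], expr_type: str = "counts") -> List[Dict[str, float]]:
    """Per-gene QC metrics; rows of X are cells, columns are genes."""
    n_cells = len(X)
    n_genes = len(X[0]) if X else 0
    results = []
    for j in range(n_genes):
        column = [row[j] for row in X]           # this gene's value in every cell
        n_pos = sum(1 for v in column if v > 0)  # cells in which the gene is detected
        total = sum(column)
        results.append({
            f"total_{expr_type}": total,
            f"mean_{expr_type}": total / n_cells,
            f"n_cells_by_{expr_type}": n_pos,
            # Cells with a zero count for this gene, as a percentage
            f"pct_dropout_by_{expr_type}": 100.0 * (n_cells - n_pos) / n_cells,
        })
    return results


X = [[5, 0, 3, 2],
     [0, 1, 1, 0]]
print(var_qc_metrics(X)[0])
# {'total_counts': 5, 'mean_counts': 2.5, 'n_cells_by_counts': 1, 'pct_dropout_by_counts': 50.0}
```

Note that pct_dropout_by_{expr_type} is simply 100 × (1 − n_cells_by_{expr_type} / n_obs), so the two columns carry the same information in different units.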