bioalpha.singlecell.tools.rank_genes_groups

bioalpha.singlecell.tools.rank_genes_groups(adata: AnnData | H5ADMap, groupby: str, use_raw: bool | None = None, groups: Literal['all'] | Iterable[str] = 'all', reference: str = 'rest', n_genes: int | None = None, rankby_abs: bool = False, pts: bool = False, key_added: str | None = None, copy: bool | str = False, method: Literal['venice', 'logreg', 't-test', 'wilcoxon', 't-test_overestim_var'] | None = 'venice', corr_method: Literal['benjamini-hochberg', 'bonferroni'] = 'benjamini-hochberg', tie_correct: bool = False, dscore_correct: bool = True, layer: str | None = None, **kwds) → AnnData | None

Rank genes for characterizing groups.

Expects logarithmized data.
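
As a rough illustration of what "logarithmized" means here, the following minimal sketch scales each cell to a common total and applies log(1 + x). The helper name `log_normalize` and the `target_sum` value are assumptions made for this illustration only; in practice the library's own preprocessing routines should be used.

```python
import math

def log_normalize(counts, target_sum=1e4):
    """Scale each cell (row) to target_sum total counts, then apply log(1 + x)."""
    normalized = []
    for row in counts:
        total = sum(row)
        # All-zero cells stay at zero instead of triggering a division by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in row])
    return normalized

# Two cells (rows) by three genes (columns) of made-up raw counts.
raw_counts = [[0, 5, 15], [2, 0, 8]]
log_data = log_normalize(raw_counts)
```

Zero counts stay at zero after the transform, which keeps the "fraction of cells expressing" statistics meaningful.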

Parameters:

adata (AnnData or H5ADMap) – The annotated data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to genes.
groupby (str) – The key of the observations grouping to consider.
use_raw (Optional[bool], default = None) – Use the raw attribute of adata if present.
groups (Union["all", Iterable[str]], default = "all") – Subset of groups, e.g. ["g1", "g2", "g3"], to which the comparison is restricted, or "all" (default) for all groups.
reference (str, default = "rest") – If "rest", compare each group to the union of all other groups. If a group identifier, compare each group with respect to that group.
n_genes (Optional[int], default = None) – The number of genes that appear in the returned tables. Defaults to all genes.
rankby_abs (bool, default = False) – Rank genes by the absolute value of the score, not by the score. The returned scores are never the absolute values.
pts (bool, default = False) – Compute the fraction of cells expressing the genes.
key_added (Optional[str], default = None) – The key in adata.uns under which the results are saved.
copy (bool or str, default = False) – Whether to operate on a copy of the input object instead of modifying it in place. If adata is a mapped AnnData (H5ADMap), copy must be either False or a path.
method (Optional[Literal["venice", "logreg", "t-test", "wilcoxon", "t-test_overestim_var"]], default = "venice") – The test used to rank genes:
- "venice" uses the algorithm described in https://www.biorxiv.org/content/10.1101/2020.11.16.384479v1.full,
- "t-test" uses a t-test,
- "t-test_overestim_var" overestimates the variance of each group,
- "wilcoxon" uses the Wilcoxon rank-sum test,
- "logreg" uses logistic regression.
corr_method (Literal["benjamini-hochberg", "bonferroni"], default = "benjamini-hochberg") – p-value correction method. Used only for "t-test", "t-test_overestim_var", and "wilcoxon".
tie_correct (bool, default = False) – Use tie correction for "wilcoxon" scores. Used only for method = "wilcoxon".
dscore_correct (bool, default = True) – Use dscore correction for "venice" scores. Used only for method = "venice".
layer (Optional[str], default = None) – Key in adata.layers whose values are used to perform the tests.
kwds (dict) – Additional keyword arguments passed to the test method.
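
To make the two corr_method options more concrete, here is a minimal pure-Python sketch of the textbook Bonferroni and Benjamini-Hochberg adjustments. The function names are hypothetical and this is not the library's implementation; it only shows how the two corrections differ on the same p-values.

```python
def bonferroni(pvals):
    """Multiply every p-value by the number of tests, capped at 1."""
    n = len(pvals)
    return [min(p * n, 1.0) for p in pvals]

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for four genes
adj_bonf = bonferroni(pvals)
adj_bh = benjamini_hochberg(pvals)
```

Bonferroni is the more conservative of the two: every Benjamini-Hochberg adjusted value is less than or equal to its Bonferroni counterpart.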

Returns:

**names** (structured np.ndarray (.uns["rank_genes_groups"])) – Structured array to be indexed by group id storing the gene names. Ordered according to scores.
**scores** (structured np.ndarray (.uns["rank_genes_groups"])) – Structured array to be indexed by group id storing the z-score underlying the computation of a p-value for each gene for each group. Ordered according to scores.
**logfoldchanges** (structured np.ndarray (.uns["rank_genes_groups"])) – Structured array to be indexed by group id storing the log2 fold change for each gene for each group. Ordered according to scores. Only provided if method is "venice" or one of the t-test-like methods. Note: this is an approximation calculated from mean-log values.
**pvals** (structured np.ndarray (.uns["rank_genes_groups"])) – p-values.
**pvals_adj** (structured np.ndarray (.uns["rank_genes_groups"])) – Corrected p-values.
**pts** (pandas.DataFrame (.uns["rank_genes_groups"])) – Fraction of cells expressing the genes for each group.
**pts_rest** (pandas.DataFrame (.uns["rank_genes_groups"])) – Only if reference is set to "rest". For each group, the fraction of cells in the union of all other groups that express the genes.
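
The pts, pts_rest, and logfoldchanges fields can be pictured with the following minimal sketch for a single gene. It assumes that "expressing" means a value greater than zero and that the fold change is derived by undoing log1p on the group means with a small pseudocount `eps`; both are assumptions for illustration, since the exact formulas used by the library are not stated here.

```python
import math

def fraction_expressing(values):
    """Fraction of cells with non-zero expression for one gene."""
    return sum(1 for v in values if v > 0) / len(values)

def approx_log2_fold_change(group_values, rest_values, eps=1e-9):
    """Approximate log2 fold change from means of log1p-transformed values."""
    mean_group = sum(group_values) / len(group_values)
    mean_rest = sum(rest_values) / len(rest_values)
    # log1p is undone on the means, not on each cell, hence "approximation".
    return math.log2((math.expm1(mean_group) + eps) / (math.expm1(mean_rest) + eps))

# One gene measured in six cells (log1p scale, made-up values).
labels = ["g1", "g1", "g2", "g2", "g3", "g3"]
expr = [1.2, 0.0, 0.0, 0.0, 0.7, 2.1]

in_group = [x for x, g in zip(expr, labels) if g == "g1"]
in_rest = [x for x, g in zip(expr, labels) if g != "g1"]

pts = fraction_expressing(in_group)       # group "g1"
pts_rest = fraction_expressing(in_rest)   # union of "g2" and "g3"
lfc = approx_log2_fold_change(in_group, in_rest)
```

Here half of the cells express the gene both inside and outside "g1", while the fold change is negative because the mean expression is lower in "g1" than in the remaining cells.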

Example

>>> from bioalpha import sc
>>> adata = sc.datasets.pbmc68k_reduced()
>>> sc.tl.rank_genes_groups(adata, "bulk_labels", method="venice")
>>> # to visualize the results
>>> sc.pl.rank_genes_groups_volcano(adata)
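
To inspect the results programmatically rather than visually, the fields described under Returns can be read per group. The sketch below uses a hypothetical helper, `top_genes`, and a mock dictionary standing in for `adata.uns["rank_genes_groups"]`; it assumes the `names`, `scores`, and `pvals_adj` fields are present for the chosen method and can be indexed by group id as documented.

```python
def top_genes(result, group, n=5):
    """Return the first n ranked genes of one group as a list of dicts."""
    names = result["names"][group]
    scores = result["scores"][group]
    pvals_adj = result["pvals_adj"][group]
    rows = []
    # The fields are already ordered by score, so no extra sorting is needed.
    for name, score, padj in zip(names, scores, pvals_adj):
        rows.append({"gene": str(name), "score": float(score), "pval_adj": float(padj)})
    return rows[:n]

# Mock stand-in for adata.uns["rank_genes_groups"]; gene names and values are made up.
mock_result = {
    "names": {"g1": ["GeneA", "GeneB", "GeneC"]},
    "scores": {"g1": [5.2, 3.1, -0.4]},
    "pvals_adj": {"g1": [1e-6, 2e-3, 0.9]},
}

top = top_genes(mock_result, "g1", n=2)
```

Because the helper only relies on indexing by field and then by group, the same call pattern should apply to the structured arrays described above, e.g. `top_genes(adata.uns["rank_genes_groups"], "g1")` for a group named "g1".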