bioalpha.singlecell.tools.rank_genes_groups

bioalpha.singlecell.tools.rank_genes_groups(adata: AnnData | H5ADMap, groupby: str, use_raw: bool | None = None, groups: Literal['all'] | Iterable[str] = 'all', reference: str = 'rest', n_genes: int | None = None, rankby_abs: bool = False, pts: bool = False, key_added: str | None = None, copy: bool | str = False, method: Literal['venice', 'logreg', 't-test', 'wilcoxon', 't-test_overestim_var'] | None = 'venice', corr_method: Literal['benjamini-hochberg', 'bonferroni'] = 'benjamini-hochberg', tie_correct: bool = False, dscore_correct: bool = True, layer: str | None = None, **kwds) → AnnData | None

Rank genes for characterizing groups.

Expects logarithmized data.

Parameters:
  • adata (AnnData | H5ADMap) – The annotated data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to genes.

  • groupby (str) – The key of the observations grouping to consider.

  • use_raw (Optional[bool], default = None) – Use the raw attribute of adata if present.

  • groups (Union["all", Iterable[str]], default = "all") – Subset of groups, e.g. ["g1", "g2", "g3"], to which the comparison shall be restricted, or "all" (default) for all groups.

  • reference (str, default = "rest") – If "rest", compare each group to the union of all other groups. If a group identifier, compare each group with respect to that group.

  • n_genes (Optional[int], default = None) – The number of genes that appear in the returned tables. Defaults to all genes.

  • rankby_abs (bool, default = False) – Rank genes by the absolute value of the score, not by the score. The returned scores are never the absolute values.

  • pts (bool, default = False) – Compute the fraction of cells expressing the genes.

  • key_added (Optional[str], default = None) – The key in adata.uns under which the results are saved.

  • copy (bool or str, default = False) – Whether to operate on a copy of the input object rather than modifying it in place. If adata is a mapped AnnData (H5ADMap), copy should be either False or a path.

  • method (Optional[Literal["venice", "logreg", "t-test", "wilcoxon", "t-test_overestim_var"]], default = "venice") –

    The default method is "venice".

  • corr_method (Literal["benjamini-hochberg", "bonferroni"], default = "benjamini-hochberg") – p-value correction method. Used only for "t-test", "t-test_overestim_var", and "wilcoxon".

  • tie_correct (bool, default = False) – Use tie correction for "wilcoxon" scores. Used only for method = "wilcoxon".

  • dscore_correct (bool, default = True) – Use dscore correction for "venice" scores. Used only for method = "venice".

  • layer (Optional[str], default = None) – Key from adata.layers whose value will be used to perform tests on.

  • kwds (dict) – Additional keyword arguments, passed on to the test methods.
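
To make the two corr_method options concrete, the sketch below applies the textbook Bonferroni and Benjamini-Hochberg adjustments to a small list of p-values. It is a plain-Python illustration of the standard definitions only; the helper names and the sample p-values are made up for this example, and nothing here is taken from the bioalpha implementation.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(pvals)
    return [min(p * n, 1.0) for p in pvals]


def benjamini_hochberg(pvals):
    """Textbook Benjamini-Hochberg step-up adjustment."""
    n = len(pvals)
    # Indices of the p-values, smallest p-value first.
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five genes in one group.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
print("bonferroni:        ", bonf)
print("benjamini-hochberg:", bh)
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate, which is usually the more practical choice when thousands of genes are tested at once.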

Returns:

  • **names** (structured np.ndarray (.uns["rank_genes_groups"])) – Structured array to be indexed by group id storing the gene names. Ordered according to scores.

  • **scores** (structured np.ndarray (.uns["rank_genes_groups"])) – Structured array to be indexed by group id storing the z-score underlying the computation of a p-value for each gene for each group. Ordered according to scores.

  • **logfoldchanges** (structured np.ndarray (.uns["rank_genes_groups"])) – Structured array to be indexed by group id storing the log2 fold change for each gene for each group. Ordered according to scores. Only provided if method is "venice" or one of the t-test variants ("t-test", "t-test_overestim_var"). Note: this is an approximation calculated from mean-log values.

  • **pvals** (structured np.ndarray (.uns["rank_genes_groups"])) – p-values.

  • **pvals_adj** (structured np.ndarray (.uns["rank_genes_groups"])) – Corrected p-values.

  • **pts** (pandas.DataFrame (.uns["rank_genes_groups"])) – Fraction of cells expressing the genes for each group.

  • **pts_rest** (pandas.DataFrame (.uns["rank_genes_groups"])) – Only if reference is set to "rest". Fraction of cells from the union of the rest of each group expressing the genes.
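
The note that logfoldchanges is "an approximation calculated from mean-log values" can be illustrated with a small sketch. It assumes the data were transformed with log1p and that the fold change is obtained by undoing that transform on the per-group means; this is an assumption for illustration, not a confirmed description of what bioalpha computes. The function names, the eps constant, and the counts are hypothetical.

```python
import math


def approx_log2_fold_change(log_group, log_rest, eps=1e-9):
    """log2 fold change estimated from the means of log1p-transformed values."""
    mean_group = sum(log_group) / len(log_group)
    mean_rest = sum(log_rest) / len(log_rest)
    # expm1 is applied to the mean of the logs, not to each value,
    # which is why the result only approximates the true fold change.
    return math.log2((math.expm1(mean_group) + eps) / (math.expm1(mean_rest) + eps))


def exact_log2_fold_change(raw_group, raw_rest):
    """log2 of the ratio of mean raw expression, for comparison."""
    return math.log2((sum(raw_group) / len(raw_group)) / (sum(raw_rest) / len(raw_rest)))


# Hypothetical raw counts of one gene in a group and in the remaining cells.
raw_group = [1.0, 7.0]
raw_rest = [1.0, 1.0]

approx = approx_log2_fold_change(
    [math.log1p(x) for x in raw_group],
    [math.log1p(x) for x in raw_rest],
)
exact = exact_log2_fold_change(raw_group, raw_rest)
print(f"approximate: {approx:.3f}, exact: {exact:.3f}")
```

Because the mean of logarithms is never larger than the logarithm of the mean, the two numbers differ whenever expression varies within a group, so the reported fold changes are best read as a ranking aid rather than exact effect sizes.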

Example

>>> from bioalpha import sc
>>> adata = sc.datasets.pbmc68k_reduced()
>>> sc.tl.rank_genes_groups(adata, "bulk_labels", method="venice")
>>> # to visualize the results
>>> sc.pl.rank_genes_groups_volcano(adata)
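
The effect of rankby_abs and n_genes on the returned tables can be pictured with the following plain-Python sketch. It only mimics the ordering behaviour described in the parameter list (rank by absolute value, but report the signed score); the function, gene names, and scores are invented for illustration and do not reflect bioalpha internals.

```python
def rank_genes(scores, n_genes=None, rankby_abs=False):
    """Return (gene, score) pairs, best first.

    `scores` maps a gene name to its signed score for one group.
    """
    if rankby_abs:
        sort_key = lambda item: abs(item[1])
    else:
        sort_key = lambda item: item[1]
    ranked = sorted(scores.items(), key=sort_key, reverse=True)
    # n_genes=None keeps every gene, mirroring the documented default.
    return ranked if n_genes is None else ranked[:n_genes]


# Hypothetical signed scores for four genes in one group.
scores = {"geneA": 4.2, "geneB": -6.1, "geneC": 0.3, "geneD": -1.5}

by_score = rank_genes(scores, n_genes=2)
by_abs = rank_genes(scores, n_genes=2, rankby_abs=True)
print(by_score)
print(by_abs)
```

With rankby_abs=True the strongly down-regulated geneB moves to the top, yet its score stays negative, matching the statement that the returned scores are never the absolute values.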