bioalpha.singlecell.tools.rank_genes_groups
- bioalpha.singlecell.tools.rank_genes_groups(adata: AnnData | H5ADMap, groupby: str, use_raw: bool | None = None, groups: Literal['all'] | Iterable[str] = 'all', reference: str = 'rest', n_genes: int | None = None, rankby_abs: bool = False, pts: bool = False, key_added: str | None = None, copy: bool | str = False, method: Literal['venice', 'logreg', 't-test', 'wilcoxon', 't-test_overestim_var'] | None = 'venice', corr_method: Literal['benjamini-hochberg', 'bonferroni'] = 'benjamini-hochberg', tie_correct: bool = False, dscore_correct: bool = True, layer: str | None = None, **kwds) → AnnData | None
Rank genes for characterizing groups.
Expects logarithmized data.
- Parameters:
adata (AnnData) – The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
groupby (str) – The key of the observations grouping to consider.
use_raw (Optional[bool], default = None) – Use the raw attribute of adata if present.
groups (Union["all", Iterable[str]], default = "all") – Subset of groups, e.g. ["g1", "g2", "g3"], to which the comparison is restricted, or "all" (default) for all groups.
reference (str, default = "rest") – If "rest", compare each group to the union of the rest of the groups. If a group identifier, compare with respect to this group.
n_genes (Optional[int], default = None) – The number of genes that appear in the returned tables. Defaults to all genes.
rankby_abs (bool, default = False) – Rank genes by the absolute value of the score, not by the score. The returned scores are never the absolute values.
pts (bool, default = False) – Compute the fraction of cells expressing the genes.
key_added (Optional[str], default = None) – The key in adata.uns under which the information is saved.
copy (bool or str, default = False) – Whether to work on a copy of the input object instead of modifying it. If adata is a mapping AnnData, copy is either False or a path.
method (Optional[Literal["venice", "logreg", "t-test", "wilcoxon", "t-test_overestim_var"]], default = "venice") – The test method. The default method is "venice". "venice" uses the algorithm described in this paper: https://www.biorxiv.org/content/10.1101/2020.11.16.384479v1.full. "t-test_overestim_var" overestimates the variance of each group. "wilcoxon" uses the Wilcoxon rank-sum test. "logreg" uses logistic regression.
corr_method (Literal["benjamini-hochberg", "bonferroni"], default = "benjamini-hochberg") – p-value correction method. Used only for "t-test", "t-test_overestim_var", and "wilcoxon".
tie_correct (bool, default = False) – Use tie correction for "wilcoxon" scores. Used only for method = "wilcoxon".
dscore_correct (bool, default = True) – Use dscore correction for "venice" scores. Used only for method = "venice".
layer (Optional[str], default = None) – Key from adata.layers whose value will be used to perform tests on.
kwds (dict) – Additional keyword arguments passed to the test methods.
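The two corr_method options can be illustrated with a small, library-independent sketch. The helper names bonferroni and benjamini_hochberg are hypothetical and are not part of bioalpha; they follow the textbook definitions of the two corrections and may differ in detail from what the library does internally.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment, returned in input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for five genes, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
print(bonf)
print(bh)
```

Bonferroni is the more conservative of the two: on the same input its adjusted values are never smaller than the Benjamini-Hochberg ones.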
- Returns:
**names** (structured np.ndarray (.uns["rank_genes_groups"])) – Structured array to be indexed by group id storing the gene names. Ordered according to scores.
**scores** (structured np.ndarray (.uns["rank_genes_groups"])) – Structured array to be indexed by group id storing the z-score underlying the computation of a p-value for each gene for each group. Ordered according to scores.
**logfoldchanges** (structured np.ndarray (.uns["rank_genes_groups"])) – Structured array to be indexed by group id storing the log2 fold change for each gene for each group. Ordered according to scores. Only provided if method is "t-test"-like or "venice". Note: this is an approximation calculated from mean-log values.
**pvals** (structured np.ndarray (.uns["rank_genes_groups"])) – p-values.
**pvals_adj** (structured np.ndarray (.uns["rank_genes_groups"])) – Corrected p-values.
**pts** (pandas.DataFrame (.uns["rank_genes_groups"])) – Fraction of cells expressing the genes for each group.
**pts_rest** (pandas.DataFrame (.uns["rank_genes_groups"])) – Only if reference is set to "rest". Fraction of cells from the union of the rest of each group expressing the genes.
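To make "structured array to be indexed by group id" concrete, the following sketch builds a small stand-in for the stored result with plain NumPy and reads it back per group. The group ids ("B", "T"), gene names, scores, and field dtypes are invented for illustration; only the overall layout (one field per group, rows ordered by score) is taken from the description above.

```python
import numpy as np

# Hypothetical stand-in for adata.uns["rank_genes_groups"]:
# one field per group id, one row per rank position.
names = np.array(
    [("CD79A", "CD3D"), ("MS4A1", "IL7R"), ("CD74", "CD3E")],
    dtype=[("B", "U10"), ("T", "U10")],
)
scores = np.array(
    [(12.5, 10.1), (9.8, 8.7), (4.2, 7.9)],
    dtype=[("B", "f8"), ("T", "f8")],
)
result = {"names": names, "scores": scores}

# The field names of the structured array are the group ids.
groups = result["names"].dtype.names

# Indexing by group id returns that group's column, already ordered by score.
top_b = result["names"]["B"][:2]

# Pair each gene with its score, per group.
table = {
    g: list(zip(result["names"][g].tolist(), result["scores"][g].tolist()))
    for g in groups
}
print(groups)
print(top_b)
print(table["T"])
```

Because every field has the same length, the same pattern works for the other structured entries such as pvals and pvals_adj.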
Example
>>> from bioalpha import sc
>>> adata = sc.datasets.pbmc68k_reduced()
>>> sc.tl.rank_genes_groups(adata, "bulk_labels", method="venice")
>>> # to visualize the results
>>> sc.pl.rank_genes_groups_volcano(adata)