bioalpha.singlecell.tools.rank_genes_groups
- bioalpha.singlecell.tools.rank_genes_groups(adata: AnnData | H5ADMap, groupby: str, use_raw: bool | None = None, groups: Literal['all'] | Iterable[str] = 'all', reference: str = 'rest', n_genes: int | None = None, rankby_abs: bool = False, pts: bool = False, key_added: str | None = None, copy: bool | str = False, method: Literal['venice', 'logreg', 't-test', 'wilcoxon', 't-test_overestim_var'] | None = 'venice', corr_method: Literal['benjamini-hochberg', 'bonferroni'] = 'benjamini-hochberg', tie_correct: bool = False, dscore_correct: bool = True, layer: str | None = None, **kwds) → AnnData | None
Rank genes for characterizing groups.
Expects logarithmized data.
- Parameters:
  - adata (AnnData) – The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
  - groupby (str) – The key of the observations grouping to consider.
  - use_raw (Optional[bool], default = None) – Use the raw attribute of adata if present.
  - groups (Union["all", Iterable[str]], default = "all") – Subset of groups, e.g. ["g1", "g2", "g3"], to which the comparison shall be restricted, or "all" (default) for all groups.
  - reference (str, default = "rest") – If "rest", compare each group to the union of all other groups. If a group identifier, compare with respect to this group.
  - n_genes (Optional[int], default = None) – The number of genes that appear in the returned tables. Defaults to all genes.
  - rankby_abs (bool, default = False) – Rank genes by the absolute value of the score, not by the score. The returned scores are never the absolute values.
  - pts (bool, default = False) – Compute the fraction of cells expressing the genes.
  - key_added (Optional[str], default = None) – The key in adata.uns under which the results are saved.
  - copy (bool or str, default = False) – Whether to operate on a copy of the input object instead of modifying it in place. If adata is a mapping AnnData, copy should be False or a path.
  - method (Optional[Literal["venice", "logreg", "t-test", "wilcoxon", "t-test_overestim_var"]], default = "venice") – The default method is "venice". "venice" uses the algorithm described in this paper: https://www.biorxiv.org/content/10.1101/2020.11.16.384479v1.full; "t-test_overestim_var" overestimates the variance of each group; "wilcoxon" uses the Wilcoxon rank-sum test; "logreg" uses logistic regression.
  - corr_method (Literal["benjamini-hochberg", "bonferroni"], default = "benjamini-hochberg") – p-value correction method. Used only for "t-test", "t-test_overestim_var", and "wilcoxon".
  - tie_correct (bool, default = False) – Use tie correction for "wilcoxon" scores. Used only for method = "wilcoxon".
  - dscore_correct (bool, default = True) – Use dscore correction for "venice" scores. Used only for method = "venice".
  - layer (Optional[str], default = None) – Key from adata.layers whose value will be used to perform tests on.
  - kwds (dict) – Additional keyword arguments passed to the test methods.
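To make the two corr_method options concrete, the sketch below implements the textbook Bonferroni and Benjamini-Hochberg adjustments with NumPy. It illustrates the standard procedures only; the helper names `bonferroni` and `benjamini_hochberg` and the p-values are made up for this example, and the library's internal implementation may differ.

```python
import numpy as np


def bonferroni(pvals):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    pvals = np.asarray(pvals, dtype=float)
    return np.minimum(pvals * pvals.size, 1.0)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    order = np.argsort(pvals)
    # Scale the i-th smallest p-value by n / i.
    ranked = pvals[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


# Hypothetical raw p-values for four genes.
pvals = [0.01, 0.04, 0.03, 0.20]
bh = benjamini_hochberg(pvals)
bonf = bonferroni(pvals)
print("benjamini-hochberg:", bh)
print("bonferroni:        ", bonf)
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate, so its adjusted values are never larger than the Bonferroni ones.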
- Returns:
  - **names** (structured np.ndarray (.uns["rank_genes_groups"])) – Structured array to be indexed by group id storing the gene names. Ordered according to scores.
  - **scores** (structured np.ndarray (.uns["rank_genes_groups"])) – Structured array to be indexed by group id storing the z-score underlying the computation of a p-value for each gene for each group. Ordered according to scores.
  - **logfoldchanges** (structured np.ndarray (.uns["rank_genes_groups"])) – Structured array to be indexed by group id storing the log2 fold change for each gene for each group. Ordered according to scores. Only provided if method is "t-test"-like or "venice". Note: this is an approximation calculated from mean-log values.
  - **pvals** (structured np.ndarray (.uns["rank_genes_groups"])) – p-values.
  - **pvals_adj** (structured np.ndarray (.uns["rank_genes_groups"])) – Corrected p-values.
  - **pts** (pandas.DataFrame (.uns["rank_genes_groups"])) – Fraction of cells expressing the genes for each group.
  - **pts_rest** (pandas.DataFrame (.uns["rank_genes_groups"])) – Only if reference is set to "rest". Fraction of cells from the union of the rest of each group expressing the genes.
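The quantities behind pts, pts_rest and logfoldchanges can be illustrated on a toy matrix. The sketch below assumes log1p-transformed input and a fold change computed by back-transforming the per-group mean of the log values, a common approximation in similar tooling; whether this function uses exactly that formula is an assumption. The helper `summarize_group`, the counts and the labels are all made up.

```python
import numpy as np

# Toy counts for 6 cells x 3 genes (made-up values), then log1p-transformed.
counts = np.array(
    [
        [4, 0, 1],
        [6, 0, 0],
        [5, 1, 0],
        [0, 3, 0],
        [1, 2, 0],
        [0, 4, 1],
    ],
    dtype=float,
)
X = np.log1p(counts)
labels = np.array(["g1", "g1", "g1", "g2", "g2", "g2"])


def summarize_group(X, labels, group):
    """Hypothetical helper: expressing fractions and approximate log2 fold change."""
    in_group = labels == group
    # Fraction of cells with non-zero expression, inside and outside the group.
    pts = (X[in_group] > 0).mean(axis=0)
    pts_rest = (X[~in_group] > 0).mean(axis=0)
    # Approximate fold change from the means of the log values (assumed formula).
    mean_group = X[in_group].mean(axis=0)
    mean_rest = X[~in_group].mean(axis=0)
    eps = 1e-9  # guards against division by zero
    lfc = np.log2((np.expm1(mean_group) + eps) / (np.expm1(mean_rest) + eps))
    return pts, pts_rest, lfc


pts, pts_rest, lfc = summarize_group(X, labels, "g1")
print("pts:     ", pts)
print("pts_rest:", pts_rest)
print("log2 fc: ", lfc)
```

Because the mean is taken over log values before back-transforming, the result is not the exact ratio of mean expression, which is why the value is described as an approximation.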
Example
>>> from bioalpha import sc
>>> adata = sc.datasets.pbmc68k_reduced()
>>> sc.tl.rank_genes_groups(adata, "bulk_labels", method="venice")
>>> # to visualize the results
>>> sc.pl.rank_genes_groups_volcano(adata)
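The results are stored as structured arrays indexed by group id. The sketch below shows how such arrays can be read with plain NumPy. It builds a stand-in dictionary with the same layout as the documented names and scores fields rather than calling the library; the gene names, scores, group ids and the helper `top_genes` are invented for illustration.

```python
import numpy as np

groups = ["g1", "g2"]

# Stand-in for adata.uns["rank_genes_groups"]: one field per group id,
# rows ordered by score within each group (made-up values).
names = np.array(
    [("GeneA", "GeneC"), ("GeneB", "GeneA"), ("GeneC", "GeneB")],
    dtype=[(g, "U10") for g in groups],
)
scores = np.array(
    [(5.2, 4.1), (1.3, 0.2), (-3.8, -2.9)],
    dtype=[(g, "f4") for g in groups],
)
result = {"names": names, "scores": scores}


def top_genes(result, group, n=2):
    """Hypothetical helper: the first n (gene, score) pairs for one group."""
    genes = result["names"][group][:n]
    vals = result["scores"][group][:n]
    return [(str(g), float(v)) for g, v in zip(genes, vals)]


print(names.dtype.names)        # field names are the group ids
print(top_genes(result, "g1"))  # top two genes for group "g1"
```

With a real result, the same indexing would apply to the dictionary stored under the key given by key_added, or "rank_genes_groups" as shown in the Returns section.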