bioalpha.singlecell.tools.kmeans
- bioalpha.singlecell.tools.kmeans(data: AnnData, k: int, restrict_to: Tuple[str, Sequence[str]] | None = None, random_state: int = 0, key_added: str = 'kmeans', use_rep: str | None = None, n_pcs: int | None = None, copy: bool = False, return_info: bool = True, **kwargs) → AnnData | None
Cluster cells into subgroups using the K-means algorithm.
- Parameters:
  - data (AnnData) – The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
  - k (int) – The number of clusters.
  - restrict_to (Optional[Tuple[str, Sequence[str]]], default = None) – Restrict the clustering to the categories within the key for sample annotation. The tuple needs to contain (obs_key, list_of_categories).
  - random_state (Optional[Union[int, RandomState]], default = 0) – Change the initialization of the optimization.
  - key_added (str, default = "kmeans") – data.obs key under which to add the cluster labels.
  - n_pcs (Optional[int], default = None) – Use this many PCs. If n_pcs == 0, use .X if use_rep is None.
  - use_rep (Optional[str], default = None) – Use the indicated representation. "X" or any key for .obsm is valid. If None, the representation is chosen automatically: for .n_vars < 50, .X is used; otherwise "X_pca" is used. If "X_pca" is not present, it is computed with default parameters.
  - copy (bool, default = False) – Whether to copy data or modify it in place.
  - return_info (bool, default = True) – Whether to return centroids and total_distances.
  - kwargs (dict) – Any further arguments to pass to _sctools.clustering.kmeans (which in turn passes arguments to the partition_type).
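The use_rep and n_pcs rules described above can be sketched in plain NumPy. This is only an illustration of the documented selection logic, not the library's code: choose_representation is a hypothetical helper, obsm is modelled as an ordinary dict, and treating n_pcs as a column truncation of "X_pca" is an assumption.

```python
import numpy as np


def choose_representation(X, obsm, use_rep=None, n_pcs=None):
    """Hypothetical helper mirroring the documented use_rep / n_pcs rules."""
    if use_rep is None:
        # Documented rule: with n_pcs == 0 or fewer than 50 variables, use .X.
        if n_pcs == 0 or X.shape[1] < 50:
            return X
        use_rep = "X_pca"
    if use_rep == "X":
        return X
    if use_rep not in obsm:
        # The documented function computes a missing "X_pca" itself;
        # this sketch only reports the missing key.
        raise KeyError(f"{use_rep!r} not found in obsm")
    rep = obsm[use_rep]
    # Assumption: n_pcs keeps the first n_pcs columns of the PCA representation.
    if use_rep == "X_pca" and n_pcs:
        rep = rep[:, :n_pcs]
    return rep


X_small = np.zeros((10, 20))    # fewer than 50 variables
X_large = np.zeros((10, 100))   # 50 or more variables
obsm = {"X_pca": np.zeros((10, 50))}

print(choose_representation(X_small, obsm).shape)            # (10, 20)
print(choose_representation(X_large, obsm).shape)            # (10, 50)
print(choose_representation(X_large, obsm, n_pcs=15).shape)  # (10, 15)
```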
- Returns:
adata – If copy=True, a copy of data with the fields below added is returned; otherwise the fields are added to data in place and None is returned:
  - .obs[key_added] – Array of dim (number of samples) that stores the subgroup id ("0", "1", …) for each cell.
  - .uns[key_added]["centroids"] – The centroids of each cluster. Only for return_info = True.
  - .uns[key_added]["total_distances"] – Total distances from cells to centroids. Only for return_info = True.
- Return type:
AnnData | None
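
To make the returned fields concrete, here is a minimal NumPy sketch of Lloyd's K-means that produces string cluster labels, centroids, and a total distance. It is not the bioalpha implementation: kmeans_sketch is a hypothetical name, and computing the total as the sum of Euclidean distances from each cell to its assigned centroid is an assumption (the library may define total_distances differently, for example with squared distances).

```python
import numpy as np


def kmeans_sketch(X, k, random_state=0, n_iter=100):
    """Plain Lloyd's K-means for illustration; NOT the bioalpha implementation."""
    rng = np.random.default_rng(random_state)
    # Initialise centroids from k distinct rows (cells) of X.
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Distance from every cell to every centroid, shape (n_obs, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the converged centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Assumption: total distance = sum of cell-to-assigned-centroid distances.
    total_distance = float(dists[np.arange(X.shape[0]), labels].sum())
    return labels.astype(str), centroids, total_distance


# Two well-separated groups of 20 "cells" with 2 features each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 2)),
    rng.normal(5.0, 0.1, size=(20, 2)),
])
labels, centroids, total_distance = kmeans_sketch(X, k=2)
print(sorted(set(labels)))  # ['0', '1']
print(centroids.shape)      # (2, 2)
```

In the documented function, the analogous values are stored under .obs[key_added] and .uns[key_added] rather than returned as a tuple.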