bioalpha.singlecell.tools.kmeans

bioalpha.singlecell.tools.kmeans(data: AnnData, k: int, restrict_to: Tuple[str, Sequence[str]] | None = None, random_state: int = 0, key_added: str = 'kmeans', use_rep: str | None = None, n_pcs: int | None = None, copy: bool = False, return_info: bool = True, **kwargs) → AnnData | None

Cluster cells into subgroups using the k-means algorithm.
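As a rough illustration of the algorithm named above, the sketch below implements plain Lloyd's k-means in NumPy. It is not bioalpha's implementation. The random choice of initial centroids, the convergence test, and the definition of the total distance (the sum of Euclidean cell-to-centroid distances, rather than, for example, squared distances) are assumptions made for the example. The function name `kmeans_sketch` is hypothetical.

```python
import numpy as np


def kmeans_sketch(X, k, random_state=0, n_iter=100):
    """Hypothetical Lloyd's k-means: returns labels, centroids, total distance."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(random_state)
    # Assumed initialization: k distinct cells picked at random as centroids.
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: each cell joins its nearest centroid (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its cells;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Assumed definition: sum of each cell's distance to its own centroid.
    total_distance = float(dists[np.arange(X.shape[0]), labels].sum())
    return labels, centroids, total_distance
```

On two well-separated groups of three points, the three points of each group end up sharing a label, and each centroid is the mean of its group.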

Parameters:

data (AnnData) – The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
k (int) – The number of clusters to form.
restrict_to (Optional[Tuple[str, Sequence[str]]], default = None) – Restrict the clustering to the given categories of a sample annotation. The tuple must have the form (obs_key, list_of_categories).
random_state (Optional[Union[int, RandomState]], default = 0) – Seed that controls the random initialization of the optimization.
key_added (str, default = "kmeans") – Key in adata.obs under which to store the cluster labels.
n_pcs (Optional[int], default = None) – Number of principal components to use. If n_pcs == 0 and use_rep is None, .X is used.
use_rep (Optional[str], default = None) – The representation to use: "X" or any key of .obsm. If None, the representation is chosen automatically: if .n_vars < 50, .X is used; otherwise "X_pca" is used, and if "X_pca" is not present, it is computed with default parameters.
copy (bool, default = False) – Whether to return a modified copy of adata instead of modifying it in place.
return_info (bool, default = True) – Whether to also store the cluster centroids and total distances in .uns[key_added].
kwargs (dict) – Any further arguments to pass to _sctools.clustering.kmeans.
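The automatic choice described for use_rep and n_pcs can be written as a small decision function. `choose_representation` is a hypothetical helper that only encodes the documented rules. It returns the representation name and whether "X_pca" would first have to be computed. How n_pcs limits the number of PCA components actually used is not modeled here.

```python
from typing import Optional, Sequence, Tuple


def choose_representation(
    n_vars: int,
    obsm_keys: Sequence[str],
    use_rep: Optional[str] = None,
    n_pcs: Optional[int] = None,
) -> Tuple[str, bool]:
    """Hypothetical helper encoding the documented use_rep / n_pcs rules.

    Returns (representation, needs_pca): representation is "X" or an .obsm
    key; needs_pca is True when "X_pca" is chosen automatically but missing.
    """
    needs_pca = False
    if use_rep is not None:
        # An explicit choice wins: "X" or any key of .obsm.
        rep = use_rep
    elif n_pcs == 0:
        # n_pcs == 0 together with use_rep=None means: use .X directly.
        rep = "X"
    elif n_vars < 50:
        # Few variables: cluster on .X without dimensionality reduction.
        rep = "X"
    else:
        # Otherwise use the PCA embedding, computing it if it is absent.
        rep = "X_pca"
        needs_pca = "X_pca" not in obsm_keys
    return rep, needs_pca
```

For example, a 2,000-gene matrix without a stored PCA yields ("X_pca", True), while the same matrix with n_pcs=0 yields ("X", False).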

Returns:

adata – If copy=True, returns a copy of adata with the fields below added; otherwise adds them to adata in place and returns None:

.obs[key_added] Array of length n_obs that stores the cluster ID ("0", "1", …) of each cell.
.uns[key_added]["centroids"] The centroid of each cluster. Only present if return_info=True.
.uns[key_added]["total_distances"] Total distance from the cells to their cluster centroids. Only present if return_info=True.

Return type:

AnnData | None
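The output layout and the copy semantics described under Returns can be mimicked with plain dictionaries standing in for adata.obs and adata.uns. `store_kmeans_result` is hypothetical, and the dict-based stand-in for AnnData is an assumption made for the example; the real function writes to an AnnData object.

```python
import copy as copy_module
from typing import Optional

import numpy as np


def store_kmeans_result(
    adata: dict,
    labels: np.ndarray,
    centroids: np.ndarray,
    total_distances: float,
    key_added: str = "kmeans",
    return_info: bool = True,
    copy: bool = False,
) -> Optional[dict]:
    """Hypothetical helper: write k-means output into an AnnData-like dict."""
    # copy=True works on a deep copy and returns it; otherwise mutate in place.
    target = copy_module.deepcopy(adata) if copy else adata
    # One string cluster ID ("0", "1", ...) per cell.
    target["obs"][key_added] = [str(int(label)) for label in labels]
    if return_info:
        # Extra information is only stored when return_info=True.
        target["uns"][key_added] = {
            "centroids": centroids,
            "total_distances": total_distances,
        }
    return target if copy else None
```

With copy=False the call returns None and the input dict gains the new keys; with copy=True the input is left untouched and the modified copy is returned.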