bioalpha.singlecell.preprocessing.harmony_integrate
- bioalpha.singlecell.preprocessing.harmony_integrate(adata: AnnData, key: str, basis: str = 'X_pca', adjusted_basis: str = 'X_pca_harmony', use_centroids: str | ndarray | None = None, copy=False, **kwargs) → AnnData | None
Use the Harmony algorithm to integrate different experiments.
Harmony is an algorithm for integrating single-cell data from multiple experiments. As Harmony works by adjusting the principal components, this function should be run after performing PCA but before computing the neighbor graph, as illustrated in the example below.
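To make the idea of "adjusting the principal components" concrete, here is a minimal NumPy sketch. It is not Harmony: it swaps in a much cruder technique, per-batch mean centering, purely to show that the correction operates on the cells × PCs embedding and writes its result under a separate key. The helper name `center_per_batch`, the key `"X_pca_toy_adjusted"`, and the plain dict standing in for `adata.obsm` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy PCA embedding: 6 cells x 2 PCs, with batch "b" shifted by a constant offset.
batch = np.array(["a", "a", "a", "b", "b", "b"])
pcs = rng.normal(size=(6, 2))
pcs[batch == "b"] += 5.0  # artificial batch effect


def center_per_batch(embedding, labels):
    """Crude stand-in for batch correction (NOT Harmony):
    shift each batch so that its mean matches the global mean."""
    adjusted = embedding.copy()
    global_mean = embedding.mean(axis=0)
    for label in np.unique(labels):
        mask = labels == label
        adjusted[mask] += global_mean - embedding[mask].mean(axis=0)
    return adjusted


# A plain dict stands in for adata.obsm: the input embedding is left as is,
# and the adjusted table is stored under a different key.
obsm = {"X_pca": pcs}
obsm["X_pca_toy_adjusted"] = center_per_batch(obsm["X_pca"], batch)
```

Any downstream step that should see the corrected data (such as a neighbor graph) would then need to read the adjusted key rather than the original one.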
- Parameters:
adata (AnnData) – The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
key (str) – The name of the column in adata.obs that differentiates among experiments/batches.
basis (str, default = "X_pca") – The name of the field in adata.obsm where the PCA table is stored. Defaults to "X_pca", which is the default for sc.tl.pca().
adjusted_basis (str, default = "X_pca_harmony") – The name of the field in adata.obsm where the adjusted PCA table will be stored after running this function.
use_centroids (Optional[str, np.ndarray], default = None) – Precomputed centroids to use. If a str is passed, it must be the name of a key in .uns. By default, centroids computed with sc.tl.kmeans are used.
copy (bool, default = False) – Whether to return a copy of the data.
kwargs (dict) – Any additional arguments are passed to _sctools.batch_correction.run_harmony.
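The three accepted forms of use_centroids (a key name, an array, or None) can be pictured with a small sketch. The helper `resolve_centroids` below is hypothetical and is not part of bioalpha; a plain dict stands in for adata.uns, and the KeyError on a missing name is an assumption about how such a lookup could reasonably fail.

```python
import numpy as np


def resolve_centroids(use_centroids, uns):
    """Hypothetical helper: map the three documented input kinds
    to an array, or to None when no centroids were supplied."""
    if use_centroids is None:
        # The caller would fall back to computing centroids itself.
        return None
    if isinstance(use_centroids, str):
        # A string is treated as the name of an entry in .uns.
        if use_centroids not in uns:
            raise KeyError(f"{use_centroids!r} not found in .uns")
        return np.asarray(uns[use_centroids])
    # Anything else is assumed to be array-like centroids.
    return np.asarray(use_centroids)


uns = {"my_centroids": np.zeros((3, 2))}  # stand-in for adata.uns
by_name = resolve_centroids("my_centroids", uns)
by_array = resolve_centroids(np.ones((4, 2)), uns)
by_default = resolve_centroids(None, uns)
```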
- Returns:
adata – If copy=True, a copy of adata is returned; otherwise the following field is added to adata:
.obsm[adjusted_basis] – Principal components adjusted by Harmony such that different experiments are integrated.
- Return type:
AnnData | None
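The return convention can be sketched with a toy stand-in. This assumes, consistent with the `AnnData | None` annotation in the signature, that the in-place path returns None. `toy_integrate` is a hypothetical function that performs no real adjustment, the `copy_data` argument plays the role of `copy`, and a `SimpleNamespace` with an `obsm` dict stands in for an AnnData object.

```python
import copy
from types import SimpleNamespace


def toy_integrate(data, adjusted_basis="X_pca_harmony", copy_data=False):
    """Hypothetical stand-in that only mimics the copy / in-place convention."""
    target = copy.deepcopy(data) if copy_data else data
    # Placeholder "adjustment": just duplicate the PCA table under the new key.
    target.obsm[adjusted_basis] = list(target.obsm["X_pca"])
    return target if copy_data else None


data = SimpleNamespace(obsm={"X_pca": [[0.0, 1.0], [2.0, 3.0]]})

# copy_data=True: a new object is returned and the input is left untouched.
returned_copy = toy_integrate(data, copy_data=True)
copy_left_input_untouched = "X_pca_harmony" not in data.obsm

# Default: the input is modified in place and nothing is returned.
returned_in_place = toy_integrate(data)
```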
Examples
First, load libraries and example dataset, and preprocess.
>>> from bioalpha import sc
>>> adata = sc.datasets.pbmc3k()
>>> sc.pp.recipe_zheng17(adata)
>>> sc.tl.pca(adata)
We now arbitrarily assign a batch metadata variable to each cell for the sake of example, but during real usage there would already be a column in adata.obs giving the experiment each cell came from.
>>> adata.obs["batch"] = 1350*["a"] + 1350*["b"]
Finally, run Harmony. Afterwards, there will be a new table in adata.obsm containing the adjusted PCs.
>>> # For backward compatibility with Scanpy, this can also be called as sc.external.pp.harmony_integrate(adata, "batch")
>>> sc.pp.harmony_integrate(adata, "batch")
>>> "X_pca_harmony" in adata.obsm
True