bioalpha.singlecell.preprocessing.pca

bioalpha.singlecell.preprocessing.pca(data: AnnData | ndarray | spmatrix, n_comps: int | None = None, return_info: bool = False, use_highly_variable: bool | None = None, layer: str | None = None, rep_name='X_pca', copy: bool | str = False, csr_key: str | None = None, obs_mask: str | None = None, var_mask: str | None = None, metric: Literal['exact_pca', 'auto'] = 'auto') → AnnData | None

Principal component analysis.

Computes PCA coordinates, loadings, and the variance decomposition, using the _sctools.dimred.pca implementation.
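
To make the three outputs concrete, the following is a minimal NumPy sketch of exact PCA via a singular value decomposition of the column-centered matrix. It is not the _sctools.dimred.pca implementation, whose internals are not described here; the function name pca_sketch and the random test matrix are hypothetical and serve only as illustration.

```python
import numpy as np


def pca_sketch(X, n_comps):
    """Exact PCA via SVD of the column-centered matrix (illustrative only)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each gene (column)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = U[:, :n_comps] * S[:n_comps]        # n_obs x n_comps coordinates
    loadings = Vt[:n_comps].T                    # n_vars x n_comps loadings
    variance = S[:n_comps] ** 2 / (X.shape[0] - 1)
    total_variance = np.sum(S ** 2) / (X.shape[0] - 1)
    variance_ratio = variance / total_variance   # share of variance per component
    return coords, loadings, variance, variance_ratio


# Hypothetical toy input: 20 cells x 5 genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
coords, loadings, variance, variance_ratio = pca_sketch(X, n_comps=3)
```

In this sketch the coordinates equal the centered data projected onto the loadings, which is the usual relationship between a PCA representation and its principal components.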

Parameters:
  • data (Union[AnnData, ndarray, spmatrix]) – The (annotated) data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to genes.

  • n_comps (Optional[int], default = None) – Number of principal components to compute. If None, defaults to 50, or to one less than the smallest dimension of the selected representation when that is smaller than 50.

  • return_info (bool, default = False) – Only relevant when not passing an AnnData. See “Returns”.

  • use_highly_variable (bool, default = None) – Whether to use highly variable genes only, stored in .var["highly_variable"]. By default uses them if they have been determined beforehand.

  • rep_name (str, default = "X_pca") – Name under which the representation is saved in adata.obsm.

  • layer (Optional[str], default = None) – Layer to run PCA on instead of X. If None, X is used.

  • copy (Union[bool, str], default = False) – If an AnnData is passed, determines whether a copy is returned; ignored otherwise. If data is a mapping AnnData, copy must be either False or a path.

  • csr_key (Optional[str], default = None) – Key under which the result of to_csr() is stored when running on disk. If csr_key already exists, the stored matrix is loaded and to_csr() is not rerun. Be careful not to reuse the same csr_key for different matrices.

  • obs_mask (Optional[str], default = None) – If obs_mask is not None, filter cells by adata.obs[obs_mask].

  • var_mask (Optional[str], default = None) – If var_mask is not None, filter genes by adata.var[var_mask].

  • metric (Literal["exact_pca", "auto"], default = "auto") – If metric is "auto", history_pca is run when the data has more than 100,000 cells. This parameter has no effect with H5ADMap.
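
The default handling of n_comps and metric can be sketched in plain Python. This is only one reading of the parameter descriptions above, not the library's code: it takes the n_comps default to be the smaller of 50 and one less than the smallest matrix dimension, and it assumes that "auto" falls back to exact PCA at or below the 100,000-cell threshold. The helper names resolve_n_comps and choose_method are hypothetical.

```python
def resolve_n_comps(n_obs, n_vars, n_comps=None, default=50):
    """Return the number of components under the assumed default rule."""
    if n_comps is not None:
        return n_comps
    # Assumed rule: cap the default at (smallest dimension - 1).
    return min(default, min(n_obs, n_vars) - 1)


def choose_method(n_obs, metric="auto", threshold=100_000):
    """Pick a method name from the cell count (assumed reading of 'auto')."""
    if metric == "auto" and n_obs > threshold:
        return "history_pca"
    # Assumption: "exact_pca", or "auto" on smaller data, uses exact PCA.
    return "exact_pca"


n_large = resolve_n_comps(n_obs=3000, n_vars=2000)    # plenty of rows and columns
n_small = resolve_n_comps(n_obs=30, n_vars=2000)      # limited by 30 cells
method_big = choose_method(n_obs=200_000)             # above the threshold
method_forced = choose_method(n_obs=200_000, metric="exact_pca")
```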

Returns:

  • X_pca (spmatrix, ndarray) – If data is array-like and return_info=False was passed, this function returns only X_pca.

  • adata (AnnData) –

    Returned if copy=True; otherwise the following fields are added to adata in place:

    • .obsm[rep_name]: PCA representation of data.

    • .varm["PCs"] or .varm[rep_name+"_PCs"]: The principal components containing the loadings.