bioalpha.singlecell.preprocessing.log_normalize

bioalpha.singlecell.preprocessing.log_normalize(adata: AnnData | H5ADMap, target_sum: float = 10000.0, base: int | float | None = None, layer: str | None = None, inplace: bool = True, copy: bool = False, key_added: str | None = None, obs_mask: str | None = None, var_mask: str | None = None, **kwargs) → dict | AnnData | None

Normalize counts per cell and calculate log1p.

Parameters:
  • adata (AnnData | H5ADMap) – The annotated data matrix of shape n_obs x n_vars. Rows correspond to cells and columns to genes.

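  • target_sum (float, default = 1e4) – Total count that each cell is scaled to before the log transform. With the default, the counts of every cell sum to 10,000 after normalization.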

  • base (Optional[Union[int, float]], default = None) – Base of the logarithm. Natural logarithm is used by default.

  • layer (Optional[str], default = None) – Layer to normalize instead of X. If None, X is normalized.

  • inplace (bool, default = True) – Whether to update adata in place or return a dictionary with normalized copies of adata.X and adata.layers.

  • copy (bool, default = False) – Whether to operate on a copy of the input object instead of modifying the original. Not compatible with inplace = False.

  • key_added (Optional[str], default = None) – Name of the field in adata.layers where the normalized data is stored. None means do not add to adata.layers.

  • obs_mask (Optional[str], default = None) – If obs_mask is not None, filter cells by adata.obs[obs_mask].

  • var_mask (Optional[str], default = None) – If var_mask is not None, filter genes by adata.var[var_mask].

  • **kwargs – Other arguments passed to BatchReader

Return type:
  dict | AnnData | None

Returns a dictionary with log1p-normalized copies of adata.X and adata.layers, or updates adata with the log1p-normalized version of the original adata.X and adata.layers, depending on inplace.

Example

>>> import numpy as np
>>> from anndata import AnnData
>>> from bioalpha import sc
>>> sc.settings.verbosity = 2
>>> np.set_printoptions(precision=2)
>>> adata = AnnData(np.array([
...    [3, 3, 3, 6, 6],
...    [1, 1, 1, 2, 2],
...    [1, 22, 1, 2, 2],
... ]))
>>> adata.X
array([[ 3.,  3.,  3.,  6.,  6.],
       [ 1.,  1.,  1.,  2.,  2.],
       [ 1., 22.,  1.,  2.,  2.]], dtype=float32)
>>> X_norm = sc.pp.log_normalize(adata, target_sum=1, inplace=False)["X"]
>>> X_norm.toarray()
array([[0.13, 0.13, 0.13, 0.25, 0.25],
       [0.13, 0.13, 0.13, 0.25, 0.25],
       [0.04, 0.58, 0.04, 0.07, 0.07]], dtype=float32)
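
The numbers in the example can be checked by hand with plain NumPy. The sketch below is not bioalpha's implementation. The helper name log_normalize_reference is hypothetical, and the sketch assumes that each cell (row) is scaled so its counts sum to target_sum, that log1p is then applied, and that a non-None base divides the result by log(base). The handling of empty cells is also an assumption.

```python
import numpy as np


def log_normalize_reference(counts, target_sum=1e4, base=None):
    """Hypothetical dense NumPy reference; not part of bioalpha."""
    counts = np.asarray(counts, dtype=np.float64)
    # Total counts per cell (row), kept as a column for broadcasting.
    totals = counts.sum(axis=1, keepdims=True)
    # Assumption: cells with zero total counts stay all-zero.
    totals[totals == 0] = 1.0
    # Scale each cell so that its counts sum to target_sum.
    scaled = counts / totals * target_sum
    # log(1 + x), natural logarithm by default.
    out = np.log1p(scaled)
    if base is not None:
        # Change of base: log_b(1 + x) = ln(1 + x) / ln(b).
        out /= np.log(base)
    return out


counts = [
    [3, 3, 3, 6, 6],
    [1, 1, 1, 2, 2],
    [1, 22, 1, 2, 2],
]
result = log_normalize_reference(counts, target_sum=1)
print(np.round(result, 2))
```

The first two rows come out identical because they have the same proportions (1:1:1:2:2), so they no longer differ once each cell is scaled to the same total.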