bioalpha.singlecell.preprocessing.subsample
- bioalpha.singlecell.preprocessing.subsample(data: AnnData | ndarray, fraction: float | None = None, n_obs: int | None = None, random_state: None | int | RandomState = 0, copy: bool = False, method: Literal['geosketching', 'random'] = 'geosketching', use_rep: str | None = None, subset_path: str | None = None, **kwargs) → AnnData | Tuple[ndarray, ndarray] | None
Subsample to a fraction of the number of observations.
- Parameters:
  - data (Union[AnnData, ndarray]) – The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
  - fraction (Optional[float], default = None) – Subsample to this fraction of the number of observations.
  - n_obs (Optional[int], default = None) – Subsample to this number of observations. Not compatible with fraction.
  - random_state (AnyRandom, default = 0) – Random seed to change subsampling.
  - copy (bool, default = False) – Whether to subsample a copy of the input object rather than the input itself.
  - method (Literal["geosketching", "random"], default = "geosketching") – Which subsampling method to use. "geosketching" requires an ndarray.
  - use_rep (Optional[str], default = None) – Use the indicated representation. "X" or any key for .obsm is valid. If None, the representation is chosen automatically: for .n_vars < 50, .X is used, otherwise "X_pca" is used. If "X_pca" is not present, it is computed with default parameters. Ignored when the input is an AnnData instance and method="random".
  - subset_path (str, default = None) – H5ADMap data do not support in-place subsetting, so subset_path is passed to the .diet_subset function of the H5ADMap data. This parameter is ignored if copy=True.
  - kwargs (dict) – Any additional arguments are passed to _sctools.sampling.geosketching.
- Return type:
  AnnData | Tuple[ndarray, ndarray] | None
- Returns:
  X[obs_indices], obs_indices if data is array-like; otherwise subsamples the passed AnnData in place (copy == False) or returns a subsampled copy of it (copy == True).
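The array-like return contract described above can be illustrated with a small NumPy-only sketch. It is not bioalpha's implementation: the helper name random_subsample is hypothetical, and the rounding of fraction, the requirement that exactly one of fraction and n_obs be given, and sampling without replacement are all assumptions made for illustration. It only mimics what method="random" might do for an ndarray input.

```python
import numpy as np


def random_subsample(X, fraction=None, n_obs=None, random_state=0):
    """Hypothetical stand-in for method="random" on an array input."""
    total = X.shape[0]

    # Assumption: exactly one of `fraction` / `n_obs` must be provided,
    # since the two are documented as incompatible.
    if (fraction is None) == (n_obs is None):
        raise ValueError("Pass exactly one of `fraction` or `n_obs`.")

    if fraction is not None:
        if not 0 <= fraction <= 1:
            raise ValueError("`fraction` must be within [0, 1].")
        # Assumption: the target count is truncated toward zero.
        new_n_obs = int(fraction * total)
    else:
        if not 0 <= n_obs <= total:
            raise ValueError("`n_obs` must be within [0, number of rows].")
        new_n_obs = n_obs

    # Accept None, an int seed, or an existing RandomState instance.
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        rng = np.random.RandomState(random_state)

    # Draw row indices without replacement, then index the matrix with them.
    obs_indices = rng.choice(total, size=new_n_obs, replace=False)
    return X[obs_indices], obs_indices


# Usage: keep 10% of 100 "cells" with 2 "genes" each.
X = np.arange(200, dtype=float).reshape(100, 2)
X_sub, obs_indices = random_subsample(X, fraction=0.1, random_state=0)
```

With a fixed integer random_state, repeated calls select the same rows, which is the role the seed plays in the parameter description. The returned obs_indices can be used to subset any per-observation annotations kept alongside the array.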