bioalpha.singlecell.preprocessing.subsample
- bioalpha.singlecell.preprocessing.subsample(data: AnnData | ndarray, fraction: float | None = None, n_obs: int | None = None, random_state: None | int | RandomState = 0, copy: bool = False, method: Literal['geosketching', 'random'] = 'geosketching', use_rep: str | None = None, subset_path: str | None = None, **kwargs) → AnnData | Tuple[ndarray, ndarray] | None
Subsample the observations, either to a fraction of their number or to a fixed number of observations.
- Parameters:
data (Union[AnnData, ndarray]) – The (annotated) data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
fraction (Optional[float], default = None) – Subsample to this fraction of the number of observations.
n_obs (Optional[int], default = None) – Subsample to this number of observations. Not compatible with fraction.
random_state (AnyRandom, default = 0) – Random seed used for subsampling. Change it to draw a different subsample.
copy (bool, default = False) – Whether to subsample a copy of the input object instead of modifying it in place.
method (Literal["geosketching", "random"], default = "geosketching") – Subsampling method. “geosketching” requires an ndarray.
use_rep (Optional[str], default = None) – Use the indicated representation. "X" or any key for .obsm is valid. If None, the representation is chosen automatically: if .n_vars < 50, .X is used; otherwise “X_pca” is used. If “X_pca” is not present, it is computed with default parameters. Ignored when the input is an AnnData instance and method="random".
subset_path (Optional[str], default = None) – H5ADMap data do not support in-place subsetting, so subset_path is passed to the .diet_subset function of the H5ADMap data. Ignored if copy=True.
kwargs (dict) – Any additional arguments are passed to _sctools.sampling.geosketching.
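The automatic choice of representation described for use_rep can be illustrated with a small NumPy sketch. It is a hypothetical helper, not bioalpha code: `choose_rep` is a made-up name, a plain dict stands in for `.obsm`, and "computed with default parameters" is approximated by a truncated SVD of the centered matrix keeping 50 components, which is an assumption rather than the library's documented PCA call.

```python
import numpy as np


def choose_rep(X, obsm, use_rep=None, n_comps=50):
    """Hypothetical helper mirroring the documented use_rep rule.

    X is an (n_obs, n_vars) array; obsm is a plain dict standing in for
    AnnData.obsm (assumption, not bioalpha's data structure).
    """
    if use_rep is not None:
        # "X" selects the data matrix itself; any other key is looked up in obsm.
        return X if use_rep == "X" else obsm[use_rep]
    if X.shape[1] < 50:
        # Fewer than 50 variables: use X directly.
        return X
    if "X_pca" not in obsm:
        # Stand-in for the default PCA (assumption): principal-component
        # scores U * S from an SVD of the column-centered matrix.
        centered = X - X.mean(axis=0)
        u, s, _ = np.linalg.svd(centered, full_matrices=False)
        k = min(n_comps, s.size)
        obsm["X_pca"] = u[:, :k] * s[:k]
    # Cached scores are reused on later calls.
    return obsm["X_pca"]


obsm = {}
rep = choose_rep(np.random.default_rng(0).normal(size=(100, 200)), obsm)
print(rep.shape)  # (100, 50)
```

With fewer than 50 variables the matrix is returned unchanged; otherwise the stored "X_pca" entry is computed once and then reused.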
- Returns:
X[obs_indices], obs_indices if data is array-like; otherwise, subsamples the passed-in adata in place (copy == False, returning None) or returns a subsampled copy of it (copy == True).
- Return type:
Union[AnnData, Tuple[ndarray, ndarray], None]
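The array-input path, which returns X[obs_indices], obs_indices, can be sketched in NumPy. This is a hypothetical illustration under stated assumptions, not bioalpha's implementation: `subsample_array` and `cell_size` are made-up names; fraction is converted with int(fraction * number of observations) and passing both or neither of fraction/n_obs raises an error, following the scanpy convention; and method="grid" is a simplified stand-in for geometric sketching (visit occupied grid hypercubes in random order and take one cell from each per pass), not the algorithm behind _sctools.sampling.geosketching.

```python
import numpy as np


def subsample_array(X, fraction=None, n_obs=None, random_state=0,
                    method="random", cell_size=1.0):
    """Hypothetical sketch: return X[obs_indices], obs_indices.

    method="random": rows drawn uniformly without replacement.
    method="grid":   simplified stand-in for geometric sketching; rows are
                     binned into hypercubes of side `cell_size` and occupied
                     cubes are visited in random order, one row per visit.
    """
    n_total = X.shape[0]
    # Assumption (scanpy convention): exactly one of fraction / n_obs.
    if (fraction is None) == (n_obs is None):
        raise ValueError("Pass exactly one of fraction or n_obs.")
    if fraction is not None:
        if not 0 <= fraction <= 1:
            raise ValueError("fraction must lie in [0, 1].")
        n_obs = int(fraction * n_total)
    if n_obs > n_total:
        raise ValueError("n_obs exceeds the number of observations.")
    # Accept a seed (int or None) or an existing RandomState.
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        rng = np.random.RandomState(random_state)

    if method == "random":
        obs_indices = rng.choice(n_total, size=n_obs, replace=False)
    elif method == "grid":
        # Label each row with the hypercube its coordinates fall into.
        cells = np.floor(X / cell_size).astype(np.int64)
        uniq, labels = np.unique(cells, axis=0, return_inverse=True)
        labels = labels.ravel()  # inverse shape varies across NumPy versions
        # One shuffled list of row indices per occupied hypercube.
        buckets = [list(rng.permutation(np.flatnonzero(labels == c)))
                   for c in range(len(uniq))]
        chosen = []
        while len(chosen) < n_obs:
            # Each pass takes at most one row from every non-empty cube.
            for c in rng.permutation(len(buckets)):
                if buckets[c] and len(chosen) < n_obs:
                    chosen.append(int(buckets[c].pop()))
        obs_indices = np.array(chosen, dtype=np.int64)
    else:
        raise ValueError(f"Unknown method: {method!r}")
    return X[obs_indices], obs_indices


X = np.random.default_rng(1).normal(size=(200, 2))
X_sub, idx = subsample_array(X, fraction=0.1)
X_geo, idx_geo = subsample_array(X, n_obs=30, method="grid", cell_size=0.5)
```

Because each pass of the grid variant draws at most one row per occupied hypercube, sparse regions end up represented roughly as often as dense ones, which is the motivation behind geometric sketching; uniform random sampling instead reproduces the density of the data.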