bioalpha.singlecell.preprocessing.filter_genes

bioalpha.singlecell.preprocessing.filter_genes(adata: AnnData, min_counts: int | None = None, min_cells: int | None = None, max_counts: int | None = None, max_cells: int | None = None, inplace: bool = True, key_added: str | None = 'filter_genes_mask', layer: str | None = None, obs_mask: str | None = None, var_mask: str | None = None, **kwargs) → Tuple[ndarray, ndarray] | None

Filter genes based on number of cells or counts.

Parameters:
  • adata (AnnData) – The annotated data matrix of shape n_obs * n_vars. Rows correspond to cells and columns to genes.

  • csr_mtx (csr_matrix) – (n_cells x n_genes) The csr sparse expression matrix.

  • min_counts (Optional[np.float32], default = None) – Minimum number of counts required for a gene to be kept.

  • min_cells (Optional[np.float32], default = None) – Minimum number of cells in which a gene must be expressed to be kept.

  • max_counts (Optional[np.float32], default = None) – Maximum number of counts allowed for a gene to be kept.

  • max_cells (Optional[np.float32], default = None) – Maximum number of cells in which a gene may be expressed to be kept.

  • inplace (bool, default = True) – Perform computation inplace or return result.

  • key_added (Optional[str], default = 'filter_genes_mask') – Name of the field in adata.var where the filter array is stored. Only for mapping data.

  • layer (Optional[str], default = None) – Layer to filter instead of X. If None, X is used. Only for mapping data.

  • obs_mask (Optional[str], default = None) – If obs_mask is not None, filter cells by adata.obs[obs_mask].

  • var_mask (Optional[str], default = None) – If var_mask is not None, filter genes by adata.var[var_mask].

  • **kwargs – Other parameters passed to BatchReader.
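
The threshold parameters above can be illustrated with a minimal NumPy sketch. It is not bioalpha's implementation: the helper name sketch_gene_mask is hypothetical, and the sketch assumes that thresholds are inclusive and that several thresholds given together are combined with a logical AND, which the documentation above does not state.

```python
import numpy as np


def sketch_gene_mask(X, min_counts=None, min_cells=None,
                     max_counts=None, max_cells=None):
    """Hypothetical helper (not part of bioalpha): build a keep-mask over genes.

    X is a dense (n_cells x n_genes) count matrix.
    Assumption: thresholds are inclusive and combined with logical AND.
    """
    X = np.asarray(X)
    counts_per_gene = X.sum(axis=0)       # total counts of each gene
    cells_per_gene = (X > 0).sum(axis=0)  # number of cells expressing each gene

    keep = np.ones(X.shape[1], dtype=bool)
    if min_counts is not None:
        keep &= counts_per_gene >= min_counts
    if max_counts is not None:
        keep &= counts_per_gene <= max_counts
    if min_cells is not None:
        keep &= cells_per_gene >= min_cells
    if max_cells is not None:
        keep &= cells_per_gene <= max_cells
    return keep, counts_per_gene, cells_per_gene


# Toy matrix: 3 cells (rows) x 4 genes (columns).
X = np.array([
    [0, 3, 1, 0],
    [0, 2, 0, 0],
    [5, 1, 0, 0],
])

# Keep only genes expressed in at least 2 cells.
keep, counts_per_gene, cells_per_gene = sketch_gene_mask(X, min_cells=2)
print(keep)             # boolean mask, one entry per gene
print(cells_per_gene)   # per-gene number of expressing cells
```

In this toy matrix only the second gene is expressed in two or more cells, so it is the only one whose mask entry is True.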

Returns:

  Depending on inplace, returns the following arrays or directly subsets and annotates the data matrix:

  • gene_subset (ndarray) – Boolean index mask that does filtering. True means that the gene is kept. False means the gene is removed.

  • number_per_gene (ndarray) – Depending on what was thresholded (counts or cells), the array stores n_counts or n_cells per gene.
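
As a rough illustration of how a returned pair of this shape might be used, the sketch below applies a boolean gene mask to the columns of a small matrix. The arrays are hand-written stand-ins for gene_subset and number_per_gene, not output from bioalpha, and the gene names are made up for the example.

```python
import numpy as np

# Toy (n_cells x n_genes) matrix and made-up gene names.
X = np.array([
    [0, 3, 1, 0],
    [0, 2, 0, 0],
    [5, 1, 0, 0],
])
gene_names = np.array(["g0", "g1", "g2", "g3"])

# Stand-ins for the documented return values (assumed, not produced by bioalpha):
gene_subset = np.array([False, True, False, False])  # True means the gene is kept
number_per_gene = np.array([1, 3, 1, 0])             # e.g. n_cells per gene

# Boolean indexing on the gene (column) axis keeps only the True entries.
X_kept = X[:, gene_subset]
kept_names = gene_names[gene_subset]
removed_names = gene_names[~gene_subset]

print(X_kept.shape)
print(kept_names, number_per_gene[gene_subset])
print(removed_names)
```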