INFER Reference

API Documentation for the INFER submodule

Functions for computing the Information Entropy of Ranks

gnatpy.infer_functions.infer_gene_set_entropy(expression_data: ndarray[Tuple[int, int], dtype[float32 | float64 | int64]] | DataFrame, sample_group1, sample_group2, gene_network, kernel_density_estimate: bool = True, bw_method: str | float | Callable[[gaussian_kde], float] | None = None, iterations: int = 1000, replace: bool = True, seed: int | None = None, processes=1) → Tuple[float, float]

Calculate the difference in information entropy of ranks and its significance

Parameters:
  • expression_data (np.ndarray | pd.DataFrame) – Gene expression data, either a numpy array or a pandas DataFrame, with rows representing different samples, and columns representing different genes

  • sample_group1 – Which samples belong to group1. If expression_data is a numpy array, this should be something able to index the rows of the array. If expression_data is a pandas DataFrame, this should be something that can index rows of a DataFrame inside a .loc (see pandas documentation for details)

  • sample_group2 – Which samples belong to group2; see sample_group1 for details

  • gene_network – Which genes belong to the gene network. If expression_data is a numpy array, this should be something able to index the columns of the array. If expression_data is a pandas DataFrame, this should be something that can index columns of a DataFrame inside a .loc (see pandas documentation for details)

  • kernel_density_estimate (bool) – Whether to use a kernel density estimate when calculating the p-value. If True, a Gaussian kernel density estimate is used; if False, an empirical CDF is used

  • bw_method (Optional[Union[str|float|Callable[[gaussian_kde], float]]]) – Bandwidth method, see scipy.stats.gaussian_kde for details

  • iterations (int) – Number of iterations to perform when bootstrapping the null distribution

  • replace (bool) – Whether to sample with replacement when randomly sampling from the sample groups during bootstrapping

  • seed (int | None) – Seed to use for random number generation during bootstrapping

  • processes (int) – Number of processes to use during the bootstrapping, default 1
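
To make the kernel_density_estimate option more concrete, the sketch below contrasts the two ways of turning a bootstrapped null distribution into a p-value. It is not gnatpy's implementation: the helper names, the one-sided (upper-tail) test, and the synthetic null values are assumptions made for illustration. The default bandwidth follows Scott's rule, which is also the default of scipy.stats.gaussian_kde.

```python
import math
import random
import statistics


def empirical_p_value(null_values, observed):
    """Upper-tail p-value from the empirical distribution of the null values."""
    hits = sum(1 for v in null_values if v >= observed)
    return hits / len(null_values)


def gaussian_kde_p_value(null_values, observed, bandwidth=None):
    """Upper-tail p-value under a Gaussian KDE fitted to the null values."""
    n = len(null_values)
    if bandwidth is None:
        # Scott's rule for one-dimensional data: sample std * n**(-1/5)
        bandwidth = statistics.stdev(null_values) * n ** (-1 / 5)
    # The KDE is an equal-weight mixture of normals centred on each null
    # value, so its tail probability is the mean of the per-kernel tails.
    tail = 0.0
    for v in null_values:
        z = (observed - v) / bandwidth
        tail += 0.5 * math.erfc(z / math.sqrt(2))
    return tail / n


# Synthetic stand-in for a bootstrapped null distribution (assumption)
rng = random.Random(0)
null = [rng.gauss(0.0, 1.0) for _ in range(1000)]
observed = max(null) + 0.1  # more extreme than every bootstrap value

p_empirical = empirical_p_value(null, observed)
p_kde = gaussian_kde_p_value(null, observed)
print(p_empirical, p_kde)
```

When the observed value lies beyond every bootstrap value, the empirical estimate is exactly zero, while the smoothed estimate stays small but positive. That difference is one common reason to prefer a kernel density estimate when the number of iterations is limited.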

Returns:

Tuple of the difference in information entropy of ranks, and the significance level found via bootstrapping

Return type:

Tuple[float,float]
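
Example:

The following is a minimal usage sketch based only on the signature documented above. The expression matrix is random synthetic data, and the group and network indices are arbitrary choices made for illustration. The import is guarded so that the snippet still runs where gnatpy is not installed, in which case the call is skipped.

```python
import numpy as np

# Synthetic data (assumption): 20 samples (rows) by 50 genes (columns)
rng = np.random.default_rng(0)
expression = rng.normal(size=(20, 50))

# Row indexers for the two sample groups, column indexer for the network
group1 = list(range(0, 10))
group2 = list(range(10, 20))
network = [0, 3, 7, 12, 21]

try:
    from gnatpy.infer_functions import infer_gene_set_entropy
except ImportError:
    infer_gene_set_entropy = None  # gnatpy not available here

result = None
if infer_gene_set_entropy is not None:
    result = infer_gene_set_entropy(
        expression,
        sample_group1=group1,
        sample_group2=group2,
        gene_network=network,
        kernel_density_estimate=True,
        iterations=200,  # fewer than the default 1000, to keep the demo quick
        replace=True,
        seed=42,         # fixed seed for repeatable bootstrapping
        processes=1,
    )
    entropy_difference, p_value = result
    print(entropy_difference, p_value)
```

With a pandas DataFrame, the same call would take row labels for the sample groups and column labels for the gene network, in any form accepted by .loc.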