INFER Reference

API Documentation for the INFER submodule

Functions for computing the Information Entropy of Ranks

gnatpy.infer_functions.infer_gene_set_entropy(expression_data: ndarray[Tuple[int, int], dtype[float32 | float64 | int64]] | DataFrame, sample_group1, sample_group2, gene_network, kernel_density_estimate: bool = True, bw_method: str | float | Callable[[gaussian_kde], float] | None = None, iterations: int = 1000, replace: bool = True, seed: int | None = None, processes=1) -> Tuple[float, float]

Calculate the difference in information entropy of ranks, and its significance

Parameters:
  • expression_data (np.ndarray | pd.DataFrame) – Gene expression data, either a numpy array or a pandas DataFrame, with rows representing different samples, and columns representing different genes

  • sample_group1 – Which samples belong to group1. If expression_data is a numpy array, this should be something able to index the rows of the array. If expression_data is a pandas DataFrame, this should be something that can index rows of a DataFrame inside a .loc (see pandas documentation for details)

  • sample_group2 – Which samples belong to group2; see sample_group1 for details.

  • gene_network – Which genes belong to the gene network. If expression_data is a numpy array, this should be something able to index the columns of the array. If expression_data is a pandas DataFrame, this should be something that can index columns of a DataFrame inside a .loc (see pandas documentation for details)

  • kernel_density_estimate (bool) – Whether to use a kernel density estimate for calculating the p-value. If True, a Gaussian kernel density estimate is used; if False, an empirical CDF is used

  • bw_method (Optional[Union[str|float|Callable[[gaussian_kde], float]]]) – Bandwidth method, see scipy.stats.gaussian_kde for details

  • iterations (int) – Number of bootstrap iterations used to build the null distribution

  • replace (bool) – Whether to sample with replacement when randomly sampling from the sample groups during bootstrapping

  • seed (Optional[int]) – Seed for the random number generator used during bootstrapping

  • processes (int) – Number of processes to use during the bootstrapping, default 1
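As a rough illustration of the indexing requirements for sample_group1, sample_group2, and gene_network, the sketch below builds a small synthetic dataset and shows what each kind of indexer selects. The data, sample names, and gene names are invented for the example, and the final call is left commented out because it is only inferred from the signature above, not executed here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: 6 samples (rows) x 5 genes (columns)
expr_array = rng.normal(size=(6, 5))

# numpy input: positional indexers for rows (samples) and columns (genes)
group1_rows = [0, 1, 2]
group2_rows = [3, 4, 5]
network_cols = [0, 2, 4]
sub1_array = expr_array[group1_rows][:, network_cols]

# pandas input: label-based indexers that work inside .loc
expr_df = pd.DataFrame(
    expr_array,
    index=[f"sample_{i}" for i in range(6)],
    columns=[f"gene_{j}" for j in range(5)],
)
group1_labels = ["sample_0", "sample_1", "sample_2"]
group2_labels = ["sample_3", "sample_4", "sample_5"]
network_labels = ["gene_0", "gene_2", "gene_4"]
sub1_df = expr_df.loc[group1_labels, network_labels]

# Assumed usage, based only on the documented signature (not run here):
# from gnatpy.infer_functions import infer_gene_set_entropy
# diff, p_value = infer_gene_set_entropy(
#     expr_df, group1_labels, group2_labels, network_labels,
#     iterations=1000, seed=0,
# )
```

Both selections pick out the same 3 x 3 block of values; the only difference is whether samples and genes are addressed by position or by label.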

Returns:

Tuple of the difference in information entropy of ranks, and the significance level found via bootstrapping

Return type:

Tuple[float,float]
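The significance value is obtained by comparing the observed difference against a bootstrapped null distribution. The sketch below illustrates, on a synthetic stand-in for that null distribution, how a p-value might be read off an empirical CDF versus a Gaussian kernel density estimate. It assumes a one-sided upper-tail test; whether the library uses a one-sided or two-sided test is not stated here, and the variable names are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)

# Stand-in for a bootstrapped null distribution of entropy differences
null_diffs = rng.normal(loc=0.0, scale=1.0, size=1000)
observed = 2.0  # hypothetical observed difference

# Empirical approach: fraction of null values at least as large as observed
p_empirical = float(np.mean(null_diffs >= observed))

# KDE approach: integrate the smoothed density over the upper tail.
# bw_method=None falls back to scipy's default bandwidth rule.
kde = gaussian_kde(null_diffs, bw_method=None)
p_kde = float(kde.integrate_box_1d(observed, np.inf))
```

The empirical estimate can only take values in steps of 1/iterations, while the KDE estimate is smooth and can be non-zero even when no bootstrap sample exceeds the observed value.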

Notes

With INFER, sample groups of different sizes will artificially inflate the significance of rank entropy differences between the groups. This method should only be used when the two sample groups are the same size.
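One possible way to satisfy the equal-size requirement is to randomly subsample both groups down to the size of the smaller one before calling the function. The helper below is a minimal sketch of that idea; balance_groups is a hypothetical name, not part of gnatpy, and subsampling discards data, so it may not suit every analysis.

```python
import numpy as np

def balance_groups(group1, group2, seed=None):
    """Subsample both groups (without replacement) to the smaller group's size.

    Hypothetical helper for illustration; not part of gnatpy.
    """
    rng = np.random.default_rng(seed)
    g1 = np.asarray(group1)
    g2 = np.asarray(group2)
    n = min(len(g1), len(g2))
    # Draw n distinct members from each group
    g1_balanced = rng.choice(g1, size=n, replace=False)
    g2_balanced = rng.choice(g2, size=n, replace=False)
    return g1_balanced, g2_balanced

# Example with row positions: 8 samples in group 1, 5 in group 2
g1_rows = np.arange(0, 8)
g2_rows = np.arange(8, 13)
g1_bal, g2_bal = balance_groups(g1_rows, g2_rows, seed=0)
```

The returned arrays can then be passed as sample_group1 and sample_group2. The same helper works with sample labels for DataFrame input, since it only selects from whatever identifiers it is given.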