INFER Reference
API Documentation for the INFER submodule
Functions for computing the Information Entropy of Ranks
- gnatpy.infer_functions.infer_gene_set_entropy(expression_data: ndarray[Tuple[int, int], dtype[float32 | float64 | int64]] | DataFrame, sample_group1, sample_group2, gene_network, kernel_density_estimate: bool = True, bw_method: str | float | Callable[[gaussian_kde], float] | None = None, iterations: int = 1000, replace: bool = True, seed: int | None = None, processes=1) → Tuple[float, float]
Calculate the difference in information entropy of ranks, and its significance
- Parameters:
expression_data (np.ndarray | pd.DataFrame) – Gene expression data, either a numpy array or a pandas DataFrame, with rows representing different samples, and columns representing different genes
sample_group1 – Which samples belong to group1. If expression_data is a numpy array, this should be something able to index the rows of the array. If expression_data is a pandas DataFrame, this should be something that can index the rows of a DataFrame inside a .loc (see pandas documentation for details)
sample_group2 – Which samples belong to group2. See sample_group1 for details.
gene_network – Which genes belong to the gene network. If expression_data is a numpy array, this should be something able to index the columns of the array. If expression_data is a pandas DataFrame, this should be something that can index the columns of a DataFrame inside a .loc (see pandas documentation for details)
kernel_density_estimate (bool) – Whether to use a kernel density estimate for calculating the p-value. If True, a Gaussian kernel density estimate is used; if False, an empirical CDF is used
bw_method (Optional[Union[str|float|Callable[[gaussian_kde], float]]]) – Bandwidth method, see scipy.stats.gaussian_kde for details
iterations (int) – Number of bootstrap iterations used to build the null distribution
replace (bool) – Whether to sample with replacement when randomly sampling from the sample groups during bootstrapping
seed (int | None) – Seed to use for the random number generation during bootstrapping
processes (int) – Number of processes to use during the bootstrapping, default 1
- Returns:
Tuple of the difference in information entropy of ranks, and the significance level found via bootstrapping
- Return type:
Tuple[float,float]
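The selector arguments described above (sample_group1, sample_group2, gene_network) depend on the type of expression_data. The following is a minimal sketch of selectors that satisfy the stated requirements, using only numpy and pandas. The toy data and all sample and gene names are hypothetical, and how the function applies the selectors internally is not covered by this reference.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy data: 6 samples (rows) x 5 genes (columns). All names are hypothetical.
samples = [f"s{i}" for i in range(6)]
genes = [f"g{j}" for j in range(5)]
expr_array = rng.normal(size=(6, 5))
expr_frame = pd.DataFrame(expr_array, index=samples, columns=genes)

# numpy input: positional selectors (integer lists, slices, or boolean masks).
group1_pos = [0, 1, 2]
group2_pos = [3, 4, 5]
network_pos = [0, 2, 4]
sub_array = expr_array[group1_pos][:, network_pos]

# DataFrame input: label-based selectors that are valid inside .loc.
group1_labels = ["s0", "s1", "s2"]
group2_labels = ["s3", "s4", "s5"]
network_labels = ["g0", "g2", "g4"]
sub_frame = expr_frame.loc[group1_labels, network_labels]

# Both selections pick out the same 3 samples x 3 genes block.
print(sub_array.shape, sub_frame.shape)
```

Selectors of this kind would then be passed as sample_group1, sample_group2, and gene_network alongside the matching expression_data object.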
Notes
With INFER, differently sized sample groups artificially inflate the significance of rank entropy differences between the groups. Use this method only when both sample groups contain the same number of samples.
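One possible way to meet the equal-size requirement is to randomly subsample the larger group down to the size of the smaller one before calling the function. The sketch below illustrates that idea with numpy only. The helper balance_groups is hypothetical and not part of gnatpy, and subsampling discards data, so it may not suit every analysis.

```python
import numpy as np


def balance_groups(group1, group2, seed=None):
    """Subsample both groups to the size of the smaller one (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    group1 = np.asarray(group1)
    group2 = np.asarray(group2)
    n = min(len(group1), len(group2))
    # Draw without replacement so no sample appears twice, then sort
    # so the selectors keep a stable, readable order.
    balanced1 = np.sort(rng.choice(group1, size=n, replace=False))
    balanced2 = np.sort(rng.choice(group2, size=n, replace=False))
    return balanced1, balanced2


# Row positions of 12 samples in one group and 8 in the other.
group1 = np.arange(0, 12)
group2 = np.arange(12, 20)
balanced1, balanced2 = balance_groups(group1, group2, seed=0)

# Both selectors now have the same length and can index rows of a numpy array.
print(len(balanced1), len(balanced2))
```

The same helper works with sample labels for a DataFrame, since it only shuffles and truncates whatever identifiers it is given.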