DIRAC Reference
API Documentation for the DIRAC submodule
Functions for computing differential rank conservation (DIRAC)
- gnatpy.dirac_functions.dirac_gene_set_classification(expression_data: ndarray[Tuple[int, int], dtype[float32 | float64 | int64]] | DataFrame, sample_group1, sample_group2, gene_network, kernel_density_estimate: bool = True, bw_method: str | float | Callable[[gaussian_kde], float] | None = None, iterations: int = 10000, replace: bool = True, seed: int | None = None, processes=1) → Tuple[float, float]
Calculate the classification rate, and its significance, using DIRAC rank difference scores for a given gene network
- Parameters:
expression_data (np.ndarray | pd.DataFrame) – Gene expression data, either a numpy array or a pandas DataFrame, with rows representing different samples, and columns representing different genes
sample_group1 – Which samples belong to the first group. If expression_data is a numpy array, this should be something able to index the rows of the array. If expression_data is a pandas DataFrame, this should be something that can index the rows of the DataFrame inside .loc (see the pandas documentation for details)
sample_group2 – Which samples belong to the second group. If expression_data is a numpy array, this should be something able to index the rows of the array. If expression_data is a pandas DataFrame, this should be something that can index the rows of the DataFrame inside .loc (see the pandas documentation for details)
gene_network – Which genes belong to the gene network. If expression_data is a numpy array, this should be something able to index the columns of the array. If expression_data is a pandas DataFrame, this should be something that can index the columns of the DataFrame inside .loc (see the pandas documentation for details)
kernel_density_estimate (bool) – Whether to use a kernel density estimate for calculating the p-value. If True, a Gaussian kernel density estimate is used; if False, an empirical CDF is used
bw_method (Optional[Union[str, float, Callable[[gaussian_kde], float]]]) – Bandwidth method, see scipy.stats.gaussian_kde for details
iterations (int) – Number of iterations to perform when bootstrapping the null distribution
replace (bool) – Whether to sample with replacement when randomly sampling from the sample groups during bootstrapping
seed (int | None) – Seed for the random number generator used during bootstrapping
processes (int) – Number of processes to use during the bootstrapping, default 1
- Returns:
Tuple of the classification rate, and the significance level found via bootstrapping
- Return type:
Tuple[float, float]
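The classification rate described above can be illustrated with a minimal sketch. It assumes the usual DIRAC definitions: a rank template is the majority pairwise ordering of the network genes within a group, a rank matching score is the fraction of gene pairs in a sample that agree with a template, and the rank difference score is the difference between a sample's matching scores against the two templates. The helper names are hypothetical, the tie rule (a score of zero is assigned to the second group) is an assumption, and the rate is computed on the same samples used to build the templates. This is plain Python for illustration only, not gnatpy's implementation.

```python
from itertools import combinations


def pairwise_orderings(sample):
    """For each gene pair (i, j) with i < j, record whether gene i < gene j."""
    pairs = combinations(range(len(sample)), 2)
    return tuple(sample[i] < sample[j] for i, j in pairs)


def rank_template(samples):
    """Majority pairwise ordering across the samples of one group."""
    orderings = [pairwise_orderings(s) for s in samples]
    return tuple(2 * sum(col) > len(orderings) for col in zip(*orderings))


def rank_matching_score(sample, template):
    """Fraction of gene pairs in the sample that agree with the template."""
    ordering = pairwise_orderings(sample)
    return sum(o == t for o, t in zip(ordering, template)) / len(template)


def classification_rate(group1, group2):
    """Fraction of samples assigned to their own group by the rank difference score."""
    t1, t2 = rank_template(group1), rank_template(group2)

    def rank_difference(sample):
        return rank_matching_score(sample, t1) - rank_matching_score(sample, t2)

    # Assumption: a positive score means group 1, anything else means group 2.
    correct = sum(rank_difference(s) > 0 for s in group1)
    correct += sum(rank_difference(s) <= 0 for s in group2)
    return correct / (len(group1) + len(group2))


# Toy data: rows are samples, columns are the three genes of one network.
group1 = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5], [1.0, 3.0, 2.0]]
group2 = [[3.0, 2.0, 1.0], [3.2, 2.1, 1.0], [1.0, 2.0, 3.0]]

rate = classification_rate(group1, group2)
print(rate)
```

In this toy data the last sample of group2 has the ordering typical of group1, so five of the six samples are classified correctly.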
- gnatpy.dirac_functions.dirac_gene_set_entropy(expression_data: ndarray[Tuple[int, int], dtype[float32 | float64 | int64]] | DataFrame, sample_group1, sample_group2, gene_network, kernel_density_estimate: bool = True, bw_method: str | float | Callable[[gaussian_kde], float] | None = None, iterations: int = 1000, replace: bool = True, seed: int | None = None, processes=1) → Tuple[float, float]
Calculate the difference in rank conservation indices, and its significance
- Parameters:
expression_data (np.ndarray or pd.DataFrame) – Gene expression data, either a numpy array or a pandas DataFrame, with rows representing different samples, and columns representing different genes
sample_group1 – Which samples belong to the first group. If expression_data is a numpy array, this should be something able to index the rows of the array. If expression_data is a pandas DataFrame, this should be something that can index the rows of the DataFrame inside .loc (see the pandas documentation for details)
sample_group2 – Which samples belong to the second group. If expression_data is a numpy array, this should be something able to index the rows of the array. If expression_data is a pandas DataFrame, this should be something that can index the rows of the DataFrame inside .loc (see the pandas documentation for details)
gene_network – Which genes belong to the gene network. If expression_data is a numpy array, this should be something able to index the columns of the array. If expression_data is a pandas DataFrame, this should be something that can index the columns of the DataFrame inside .loc (see the pandas documentation for details)
kernel_density_estimate (bool) – Whether to use a kernel density estimate for calculating the p-value. If True, a Gaussian kernel density estimate is used; if False, an empirical CDF is used
bw_method (str or float or Callable[[gaussian_kde], float], optional) – Bandwidth method, see scipy.stats.gaussian_kde for details
iterations (int) – Number of iterations to perform when bootstrapping the null distribution
replace (bool) – Whether to sample with replacement when randomly sampling from the sample groups during bootstrapping
seed (int | None) – Seed for the random number generator used during bootstrapping
processes (int) – Number of processes to use during the bootstrapping, default 1
- Returns:
Tuple of the difference in rank conservation indices, and the significance level found via bootstrapping
- Return type:
Tuple[float, float]
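As a rough illustration of the quantity being compared, the sketch below computes a rank conservation index for each of two groups and takes their difference. It assumes the usual DIRAC definition, in which the index is the mean rank matching score of a group's samples against that group's own rank template (the majority pairwise ordering of the network genes). The helper names are hypothetical, and whether gnatpy reports a signed or absolute difference is not stated here, so the sketch simply shows the signed value. It is plain Python for illustration, not gnatpy's implementation.

```python
from itertools import combinations


def pairwise_orderings(sample):
    """For each gene pair (i, j) with i < j, record whether gene i < gene j."""
    pairs = combinations(range(len(sample)), 2)
    return tuple(sample[i] < sample[j] for i, j in pairs)


def rank_conservation_index(samples):
    """Mean agreement of each sample with the group's majority pairwise ordering."""
    orderings = [pairwise_orderings(s) for s in samples]
    # Rank template: the ordering seen in more than half of the samples, per pair.
    template = tuple(2 * sum(col) > len(orderings) for col in zip(*orderings))
    # Rank matching score of each sample against the template.
    scores = [
        sum(o == t for o, t in zip(ordering, template)) / len(template)
        for ordering in orderings
    ]
    return sum(scores) / len(scores)


# Toy data: rows are samples, columns are the three genes of one network.
group1 = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5], [1.0, 3.0, 2.0]]  # mostly consistent
group2 = [[3.0, 2.0, 1.0], [1.0, 2.0, 3.0], [2.0, 3.0, 1.0]]  # less consistent

rci1 = rank_conservation_index(group1)
rci2 = rank_conservation_index(group2)
difference = rci1 - rci2
print(rci1, rci2, difference)
```

Here group1 keeps nearly the same gene ordering in every sample and so has the higher index, while the orderings in group2 disagree with each other more often.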
- gnatpy.dirac_functions.dirac_multiway_classification(expression_data: ndarray[Tuple[int, int], dtype[float32 | float64 | int64]] | DataFrame, sample_groups: Iterable[ndarray[Tuple[int], dtype[float32 | float64 | int64]]] | Iterable[Iterable[Hashable]], gene_network: ndarray[Tuple[int], dtype[float32 | float64 | int64]] | Iterable[Hashable], kernel_density_estimate: bool = True, bw_method: str | float | Callable[[gaussian_kde], float] | None = None, iterations: int = 1000, replace: bool = True, seed: int | None = None, processes: int = -1) → Tuple[float, float]
Calculate the DIRAC multiway rank classification, an extension of the DIRAC classification rate to more than two groups
- Parameters:
expression_data (Array2D or pd.DataFrame) – Gene expression data, either a numpy array or a pandas dataframe, with rows representing different samples, and columns representing different genes
sample_groups (Iterable of Array1D or Iterable of Iterable of Hashable) – The sample groups to compare, can be an iterable of numpy arrays with integer indices, or an iterable of iterables of values used to index a pandas DataFrame (if the expression data is a DataFrame)
gene_network (Array1D or Iterable of Hashable) – Which genes belong to the gene network, can be a numpy array with integer indices, or an iterable of values used to index a pandas DataFrame (if the expression data is a DataFrame)
kernel_density_estimate (bool) – Whether to use a kernel density estimate for calculating the p-value. If True, a Gaussian kernel density estimate is used; if False, an empirical CDF is used
bw_method (Optional[Union[str, float, Callable[[gaussian_kde], float]]]) – Bandwidth method, see scipy.stats.gaussian_kde for details
iterations (int) – Number of iterations to perform when bootstrapping the null distribution
replace (bool) – Whether to sample with replacement when randomly sampling from the sample groups during bootstrapping
seed (int | None) – Seed for the random number generator used during bootstrapping
processes (int) – Number of processes to use during the bootstrapping, default -1
- Returns:
Tuple of the multiway DIRAC statistic, and the significance level found via bootstrapping
- Return type:
Tuple[float, float]
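All three functions report a significance level obtained by resampling, controlled by the iterations, replace, and seed parameters. The sketch below shows one way such a null distribution and an empirical p-value could be built. It is an assumption about the general technique, not gnatpy's actual procedure: the helper names are hypothetical, the statistic is a toy stand-in (absolute difference in means) rather than a DIRAC statistic, and the Gaussian kernel density option is not shown.

```python
import random


def abs_mean_difference(values1, values2):
    """Toy statistic standing in for a DIRAC statistic."""
    mean1 = sum(values1) / len(values1)
    mean2 = sum(values2) / len(values2)
    return abs(mean1 - mean2)


def resampled_null(values1, values2, statistic, iterations=1000, replace=True, seed=None):
    """Recompute the statistic on groups drawn at random from the pooled samples."""
    rng = random.Random(seed)  # a fixed seed makes the null distribution reproducible
    pooled = list(values1) + list(values2)
    n1 = len(values1)
    null = []
    for _ in range(iterations):
        if replace:
            draw = rng.choices(pooled, k=len(pooled))  # sampling with replacement
        else:
            draw = rng.sample(pooled, k=len(pooled))  # random relabelling of samples
        null.append(statistic(draw[:n1], draw[n1:]))
    return null


def empirical_p_value(observed, null):
    """Fraction of null statistics at least as large as the observed one."""
    return sum(value >= observed for value in null) / len(null)


values1 = [5.1, 4.8, 5.5, 5.0]
values2 = [1.0, 1.2, 0.9, 1.1]

observed = abs_mean_difference(values1, values2)
null_a = resampled_null(values1, values2, abs_mean_difference, iterations=200, seed=0)
null_b = resampled_null(values1, values2, abs_mean_difference, iterations=200, seed=0)
p_value = empirical_p_value(observed, null_a)
print(observed, p_value)
```

Passing the same seed twice yields identical null distributions, which is the practical reason to set seed when results need to be reproduced. Increasing iterations refines the resolution of the empirical p-value at the cost of run time.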