qbm.utils package
Submodules
qbm.utils.discretization module
- class qbm.utils.discretization.Discretizer(df, n_bits, epsilon={})
Bases: object
- bit_array_to_df(bit_array)
Converts a bit array to a dataframe of floats.
- Parameters:
bit_array – Bit array to convert.
- Returns:
Dataframe of shape (bit_array.shape[0], len(self.columns)).
- static bit_vector_to_int(bit_vector)
Converts a bit vector to an integer.
- Parameters:
bit_vector – Input bit vector.
- Returns:
Integer value of the input bit vector.
- static bit_vector_to_string(bit_vector)
Converts a bit vector to a bit string.
- Parameters:
bit_vector – Input bit vector.
- Returns:
Bit string of the input bit vector.
- df_to_bit_array(df)
Converts a dataframe of floats to a bit array.
- Parameters:
df – Dataframe to convert.
- Returns:
Array of bits of shape (df.shape[0], self.n_bits_total).
- discretize = <numpy.vectorize object>
- discretize_df(df)
Convert all columns of a dataframe to bit representation.
- Parameters:
df – Dataframe to convert.
- Returns:
A discretized version of df.
- static int_to_bit_vector(x, n_bits)
Converts the integer x to an n_bits-bit bit vector.
- Parameters:
x – Integer value to convert.
n_bits – Length of the bit vector.
- Returns:
Bit vector of length n_bits.
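The three static bit-vector helpers above can be illustrated with a small standalone sketch. It is not the package's implementation: it uses plain Python lists where the package probably uses NumPy arrays, and it assumes the most significant bit comes first, which the documentation does not state.

```python
def int_to_bit_vector(x, n_bits):
    """Return x as a list of n_bits bits, most significant bit first (assumed order)."""
    if x < 0 or x >= 2 ** n_bits:
        raise ValueError(f"{x} does not fit in {n_bits} bits")
    return [(x >> shift) & 1 for shift in range(n_bits - 1, -1, -1)]


def bit_vector_to_string(bit_vector):
    """Join the bits into a string such as '0101'."""
    return "".join(str(int(bit)) for bit in bit_vector)


def bit_vector_to_int(bit_vector):
    """Interpret the bit vector as a base-2 integer."""
    return int(bit_vector_to_string(bit_vector), 2)


bits = int_to_bit_vector(5, 4)
bit_string = bit_vector_to_string(bits)
value = bit_vector_to_int(bits)
```

Under these assumptions, converting an integer to a bit vector and back returns the original integer for any value that fits in n_bits.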
- undiscretize = <numpy.vectorize object>
- undiscretize_df(df)
Convert all columns of a dataframe to floats from bit representation.
- Parameters:
df – Dataframe to convert.
- Returns:
An undiscretized version of df.
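The documentation does not say how discretize and undiscretize map between floats and integer levels. One common scheme, shown here purely as an assumption, is uniform binning over a fixed range. The arguments x_min and x_max are hypothetical and stand in for whatever per-column range the Discretizer derives from df and epsilon.

```python
def discretize(x, n_bits, x_min, x_max):
    """Map a float in [x_min, x_max] to an integer level in [0, 2**n_bits - 1]."""
    n_levels = 2 ** n_bits
    step = (x_max - x_min) / (n_levels - 1)
    level = round((x - x_min) / step)
    # Clamp values that fall outside the assumed range.
    return min(max(level, 0), n_levels - 1)


def undiscretize(level, n_bits, x_min, x_max):
    """Map an integer level back to the float at the centre of its bin."""
    step = (x_max - x_min) / (2 ** n_bits - 1)
    return x_min + level * step


level = discretize(0.3, 3, -1.0, 1.0)
recovered = undiscretize(level, 3, -1.0, 1.0)
```

With this scheme the round trip is lossy: the recovered value differs from the input by at most half a step, so more bits give a finer reconstruction.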
qbm.utils.misc module
- qbm.utils.misc.df_ensemble_stats(dfs)
Computes the means, medians, and standard deviations column/row-wise over the input list of dataframes.
- Parameters:
dfs – List of dataframes with identical row/column names.
- Returns:
Dictionary of dataframes with the means, medians, and standard deviations.
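As a rough illustration of element-wise statistics over a list of dataframes, the following sketch stacks the values into a 3D array and reduces over the first axis. The dictionary keys and the use of the population standard deviation are assumptions; the package may choose differently.

```python
import numpy as np
import pandas as pd


def df_ensemble_stats(dfs):
    """Element-wise mean, median, and std over dataframes with identical labels."""
    # Shape: (number of dataframes, rows, columns).
    stacked = np.stack([df.to_numpy(dtype=float) for df in dfs])
    index, columns = dfs[0].index, dfs[0].columns

    def wrap(values):
        return pd.DataFrame(values, index=index, columns=columns)

    return {
        "means": wrap(stacked.mean(axis=0)),
        "medians": wrap(np.median(stacked, axis=0)),
        "stds": wrap(stacked.std(axis=0)),  # population std (ddof=0), an assumption
    }


dfs = [
    pd.DataFrame({"a": [1.0, 2.0]}),
    pd.DataFrame({"a": [3.0, 4.0]}),
    pd.DataFrame({"a": [5.0, 9.0]}),
]
ensemble = df_ensemble_stats(dfs)
```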
- qbm.utils.misc.df_stats(df)
Compute the min, max, mean, median, and standard deviation of the columns in the dataframe.
- Parameters:
df – Dataframe.
- Returns:
Dataframe of the statistics.
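A per-column summary like the one described can be produced with pandas' DataFrame.agg. This is a sketch of the idea rather than the package's code; the row labels and the sample standard deviation (the pandas default) are assumptions.

```python
import pandas as pd


def df_stats(df):
    """Return one row per statistic and one column per input column."""
    return df.agg(["min", "max", "mean", "median", "std"])


stats = df_stats(pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]}))
```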
- qbm.utils.misc.filter_df_on_values(df, column_values, drop_filter_columns=True)
Return a copy of the dataframe filtered conditionally on provided column values.
- Parameters:
df – Dataframe to filter.
column_values – Dictionary where the keys are column names, and the values are values on which to filter the dataframe.
drop_filter_columns – If True returns a copy of the dataframe with the filtered columns dropped.
- Returns:
A dataframe filtered conditionally on the provided column values.
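The filtering behaviour described above could look roughly like the following. This is a minimal sketch that assumes equality matching on every key of column_values; the real function may handle other cases.

```python
import pandas as pd


def filter_df_on_values(df, column_values, drop_filter_columns=True):
    """Keep the rows where every listed column equals its given value."""
    mask = pd.Series(True, index=df.index)
    for column, value in column_values.items():
        mask &= df[column] == value
    filtered = df[mask].copy()
    if drop_filter_columns:
        # The filtered columns are now constant, so they carry no information.
        filtered = filtered.drop(columns=list(column_values))
    return filtered


df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "y", "x"], "c": [0.1, 0.2, 0.3]})
result = filter_df_on_values(df, {"a": 1, "b": "x"})
kept = filter_df_on_values(df, {"a": 1}, drop_filter_columns=False)
```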
- qbm.utils.misc.get_project_dir()
Gets the project directory path from the environment and checks if it is valid.
- Returns:
Path object of the project directory.
- qbm.utils.misc.get_rng(seed=None)
Creates a random number generator with the specified seed value.
- Parameters:
seed – Seed value for the rng.
- Returns:
Numpy RandomState object.
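Since the return value is documented as a NumPy RandomState, the function is presumably a thin wrapper around the constructor. The sketch below makes that assumption and shows why a fixed seed matters: two generators built from the same seed produce identical samples.

```python
import numpy as np


def get_rng(seed=None):
    """Return a RandomState; seed=None draws fresh entropy from the OS."""
    return np.random.RandomState(seed)


samples_a = get_rng(42).normal(size=3)
samples_b = get_rng(42).normal(size=3)
```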
- qbm.utils.misc.kl_divergence(p_data, q_data, n_bins=32, epsilon_smooth=None, relative_smooth=False)
Computes D_KL(p || q), where p and q are histogram estimates of the distributions of p_data and q_data.
Note: this is a crude approximation of the KL divergence.
- Parameters:
p_data – Array of data values to compute the p distribution from.
q_data – Array of data values to compute the q distribution from.
n_bins – Number of bins to use in histograms.
epsilon_smooth – Value to use with q distribution smoothing.
relative_smooth – Whether or not the smoothed values are relative to the p distribution.
- Returns:
D_KL(p || q).
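A histogram-based estimate of the KL divergence can be sketched as follows. The function name kl_divergence_sketch is hypothetical, the additive smoothing rule and its default of 1e-6 are assumptions (the documented default is None), and relative_smooth is not modelled.

```python
import numpy as np


def kl_divergence_sketch(p_data, q_data, n_bins=32, epsilon_smooth=1e-6):
    """Estimate D_KL(p || q) from two samples using shared histogram bins."""
    p_data = np.asarray(p_data, dtype=float)
    q_data = np.asarray(q_data, dtype=float)
    # Both histograms must use the same bin edges to be comparable.
    low = min(p_data.min(), q_data.min())
    high = max(p_data.max(), q_data.max())
    edges = np.linspace(low, high, n_bins + 1)
    p_counts, _ = np.histogram(p_data, bins=edges)
    q_counts, _ = np.histogram(q_data, bins=edges)
    p = p_counts / p_counts.sum()
    q = q_counts / q_counts.sum()
    # Smooth q so that empty q bins do not produce a division by zero.
    q = (q + epsilon_smooth) / (q + epsilon_smooth).sum()
    support = p > 0  # bins with p == 0 contribute nothing to the sum
    return float(np.sum(p[support] * np.log(p[support] / q[support])))


rng = np.random.RandomState(0)
p_samples = rng.normal(0.0, 1.0, size=5000)
q_samples = rng.normal(2.0, 1.0, size=5000)
kl_same = kl_divergence_sketch(p_samples, p_samples)
kl_shifted = kl_divergence_sketch(p_samples, q_samples)
```

The estimate depends on n_bins and on the smoothing value, which is one reason the note above calls it a crude approximation.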
- qbm.utils.misc.load_artifact(file_path)
Loads a pickle or json artifact (depending on the file extension).
- Parameters:
file_path – Path of the file to load.
- Returns:
Loaded python object.
- qbm.utils.misc.save_artifact(artifact, file_path)
Saves a pickle or json artifact (depending on the file extension).
- Parameters:
artifact – Python object to save.
file_path – Path of the file to save.
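Dispatching on the file extension, as load_artifact and save_artifact are described as doing, might be implemented along these lines. The accepted suffixes (.json, .pkl, .pickle) and the error raised for anything else are assumptions, not documented behaviour.

```python
import json
import pickle
import tempfile
from pathlib import Path

PICKLE_SUFFIXES = (".pkl", ".pickle")  # assumed set of pickle extensions


def save_artifact(artifact, file_path):
    """Write the artifact as JSON or pickle, chosen by the file suffix."""
    file_path = Path(file_path)
    if file_path.suffix == ".json":
        with open(file_path, "w") as f:
            json.dump(artifact, f)
    elif file_path.suffix in PICKLE_SUFFIXES:
        with open(file_path, "wb") as f:
            pickle.dump(artifact, f)
    else:
        raise ValueError(f"Unsupported extension: {file_path.suffix!r}")


def load_artifact(file_path):
    """Read a JSON or pickle artifact, chosen by the file suffix."""
    file_path = Path(file_path)
    if file_path.suffix == ".json":
        with open(file_path) as f:
            return json.load(f)
    if file_path.suffix in PICKLE_SUFFIXES:
        with open(file_path, "rb") as f:
            return pickle.load(f)
    raise ValueError(f"Unsupported extension: {file_path.suffix!r}")


artifact = {"n_bits": 4, "columns": ["a", "b"]}
with tempfile.TemporaryDirectory() as tmp_dir:
    save_artifact(artifact, Path(tmp_dir) / "config.json")
    save_artifact(artifact, Path(tmp_dir) / "config.pkl")
    restored_json = load_artifact(Path(tmp_dir) / "config.json")
    restored_pickle = load_artifact(Path(tmp_dir) / "config.pkl")
```

Pickle files can execute arbitrary code when loaded, so only load pickled artifacts from trusted sources.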
qbm.utils.transformations module
- class qbm.utils.transformations.PowerTransformer(df, threshold=1, power=0.5, columns=None)
Bases: object
Transforms data points that lie beyond the provided threshold by raising them to a power (< 1), which scales them closer to the mean.
- inverse_transform(df, inplace=False)
Transforms the data back from the scaled space.
- Parameters:
df – Dataframe to scale.
inplace – If True then it operates on the same dataframe, if False then it creates a copy.
- Returns:
Dataframe of untransformed data (if inplace == False).
- transform(df, inplace=False)
Transforms the data to the scaled space.
- Parameters:
df – Dataframe to scale.
inplace – If True then it operates on the same dataframe, if False then it creates a copy.
- Returns:
Dataframe of transformed data (if inplace == False).
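The exact formula used by PowerTransformer is not given here. The sketch below shows one invertible way to compress values beyond a threshold, assuming the data is already centred on zero. The function names and the formula sign(x) * threshold * (|x| / threshold) ** power are illustrative assumptions, chosen so that the mapping is continuous at the threshold.

```python
import numpy as np


def power_transform(x, threshold=1.0, power=0.5):
    """Compress values whose magnitude exceeds threshold; leave the rest unchanged."""
    x = np.asarray(x, dtype=float)
    magnitude = np.abs(x)
    # np.maximum keeps the ratio >= 1 so the power is only applied meaningfully.
    scaled = threshold * np.power(np.maximum(magnitude, threshold) / threshold, power)
    return np.where(magnitude > threshold, np.sign(x) * scaled, x)


def inverse_power_transform(y, threshold=1.0, power=0.5):
    """Undo power_transform by applying the reciprocal exponent."""
    return power_transform(y, threshold=threshold, power=1.0 / power)


values = np.array([-4.0, -0.5, 0.0, 0.5, 4.0])
transformed = power_transform(values)
restored = inverse_power_transform(transformed)
```

Because transformed outliers still lie beyond the threshold, applying the reciprocal exponent to the same region recovers the original values.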
Module contents
- class qbm.utils.Discretizer(df, n_bits, epsilon={})
Bases: object
- bit_array_to_df(bit_array)
Converts a bit array to a dataframe of floats.
- Parameters:
bit_array – Bit array to convert.
- Returns:
Dataframe of shape (bit_array.shape[0], len(self.columns)).
- static bit_vector_to_int(bit_vector)
Converts a bit vector to an integer.
- Parameters:
bit_vector – Input bit vector.
- Returns:
Integer value of the input bit vector.
- static bit_vector_to_string(bit_vector)
Converts a bit vector to a bit string.
- Parameters:
bit_vector – Input bit vector.
- Returns:
Bit string of the input bit vector.
- df_to_bit_array(df)
Converts a dataframe of floats to a bit array.
- Parameters:
df – Dataframe to convert.
- Returns:
Array of bits of shape (df.shape[0], self.n_bits_total).
- discretize = <numpy.vectorize object>
- discretize_df(df)
Convert all columns of a dataframe to bit representation.
- Parameters:
df – Dataframe to convert.
- Returns:
A discretized version of df.
- static int_to_bit_vector(x, n_bits)
Converts the integer x to an n_bits-bit bit vector.
- Parameters:
x – Integer value to convert.
n_bits – Length of the bit vector.
- Returns:
Bit vector of length n_bits.
- undiscretize = <numpy.vectorize object>
- undiscretize_df(df)
Convert all columns of a dataframe to floats from bit representation.
- Parameters:
df – Dataframe to convert.
- Returns:
An undiscretized version of df.
- class qbm.utils.PowerTransformer(df, threshold=1, power=0.5, columns=None)
Bases: object
Transforms data points that lie beyond the provided threshold by raising them to a power (< 1), which scales them closer to the mean.
- inverse_transform(df, inplace=False)
Transforms the data back from the scaled space.
- Parameters:
df – Dataframe to scale.
inplace – If True then it operates on the same dataframe, if False then it creates a copy.
- Returns:
Dataframe of untransformed data (if inplace == False).
- transform(df, inplace=False)
Transforms the data to the scaled space.
- Parameters:
df – Dataframe to scale.
inplace – If True then it operates on the same dataframe, if False then it creates a copy.
- Returns:
Dataframe of transformed data (if inplace == False).
- qbm.utils.df_ensemble_stats(dfs)
Computes the means, medians, and standard deviations column/row-wise over the input list of dataframes.
- Parameters:
dfs – List of dataframes with identical row/column names.
- Returns:
Dictionary of dataframes with the means, medians, and standard deviations.
- qbm.utils.df_stats(df)
Compute the min, max, mean, median, and standard deviation of the columns in the dataframe.
- Parameters:
df – Dataframe.
- Returns:
Dataframe of the statistics.
- qbm.utils.filter_df_on_values(df, column_values, drop_filter_columns=True)
Return a copy of the dataframe filtered conditionally on provided column values.
- Parameters:
df – Dataframe to filter.
column_values – Dictionary where the keys are column names, and the values are values on which to filter the dataframe.
drop_filter_columns – If True returns a copy of the dataframe with the filtered columns dropped.
- Returns:
A dataframe filtered conditionally on the provided column values.
- qbm.utils.get_project_dir()
Gets the project directory path from the environment and checks if it is valid.
- Returns:
Path object of the project directory.
- qbm.utils.get_rng(seed=None)
Creates a random number generator with the specified seed value.
- Parameters:
seed – Seed value for the rng.
- Returns:
Numpy RandomState object.
- qbm.utils.kl_divergence(p_data, q_data, n_bins=32, epsilon_smooth=None, relative_smooth=False)
Computes D_KL(p || q), where p and q are histogram estimates of the distributions of p_data and q_data.
Note: this is a crude approximation of the KL divergence.
- Parameters:
p_data – Array of data values to compute the p distribution from.
q_data – Array of data values to compute the q distribution from.
n_bins – Number of bins to use in histograms.
epsilon_smooth – Value to use with q distribution smoothing.
relative_smooth – Whether or not the smoothed values are relative to the p distribution.
- Returns:
D_KL(p || q).
- qbm.utils.load_artifact(file_path)
Loads a pickle or json artifact (depending on the file extension).
- Parameters:
file_path – Path of the file to load.
- Returns:
Loaded python object.
- qbm.utils.save_artifact(artifact, file_path)
Saves a pickle or json artifact (depending on the file extension).
- Parameters:
artifact – Python object to save.
file_path – Path of the file to save.