qbm.utils package

Submodules

qbm.utils.discretization module

class qbm.utils.discretization.Discretizer(df, n_bits, epsilon={})

Bases: object

bit_array_to_df(bit_array)

Converts a bit array to a dataframe of floats.

Parameters:

bit_array – Bit array to convert.

Returns:

Dataframe of shape (bit_array.shape[0], len(self.columns)).

static bit_vector_to_int(bit_vector)

Converts a bit vector to an integer.

Parameters:

bit_vector – Input bit vector.

Returns:

Integer value of the input bit vector.

static bit_vector_to_string(bit_vector)

Converts a bit vector to a bit string.

Parameters:

bit_vector – Input bit vector.

Returns:

Bit string of the input bit vector.
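The two conversions above can be illustrated with a minimal standalone sketch. It does not use the qbm API: the helper names bits_to_int and bits_to_string are hypothetical, and the most-significant-bit-first ordering is an assumption that the source does not confirm.

```python
import numpy as np


def bits_to_int(bit_vector):
    """Interpret a bit vector as an unsigned integer (MSB first, assumed)."""
    value = 0
    for bit in bit_vector:
        # Shift the accumulated value left and append the next bit.
        value = (value << 1) | int(bit)
    return value


def bits_to_string(bit_vector):
    """Join the bits into a string such as '1011'."""
    return "".join(str(int(bit)) for bit in bit_vector)


bits = np.array([1, 0, 1, 1])
as_int = bits_to_int(bits)
as_string = bits_to_string(bits)
```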

df_to_bit_array(df)

Converts a dataframe of floats to a bit array.

Parameters:

df – Dataframe to convert.

Returns:

Array of bits of shape (df.shape[0], self.n_bits_total).

discretize = <numpy.vectorize object>

discretize_df(df)

Convert all columns of a dataframe to bit representation.

Parameters:

df – Dataframe to convert.

Returns:

A discretized version of df.

static int_to_bit_vector(x, n_bits)

Converts the integer x to an n_bits-bit bit vector.

Parameters:
  • x – Integer value to convert.

  • n_bits – Length of the bit vector.

Returns:

Bit vector of length n_bits.
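As an illustration of the integer-to-bit-vector conversion, here is a minimal standalone sketch. The helper name int_to_bits is hypothetical, and both the MSB-first ordering and the range check are assumptions rather than documented behavior of the library.

```python
import numpy as np


def int_to_bits(x, n_bits):
    """Return the n_bits-long binary representation of x, MSB first (assumed)."""
    if not 0 <= x < 2 ** n_bits:
        raise ValueError(f"{x} does not fit into {n_bits} bits")
    # Extract each bit by shifting, from the highest position down to zero.
    shifts = range(n_bits - 1, -1, -1)
    return np.array([(x >> shift) & 1 for shift in shifts], dtype=np.int8)


bit_vector = int_to_bits(5, n_bits=4)
```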

undiscretize = <numpy.vectorize object>

undiscretize_df(df)

Convert all columns of a dataframe to floats from bit representation.

Parameters:

df – Dataframe to convert.

Returns:

An undiscretized version of df.
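The source does not say how floats are mapped to discrete levels. One common scheme, shown here purely as an assumption, is uniform binning of a value range into 2**n_bits levels, with each level mapped back to its bin center. The names discretize_value, undiscretize_value, x_min and x_max are hypothetical and are not part of the documented API.

```python
import numpy as np


def discretize_value(x, x_min, x_max, n_bits):
    """Map a float in [x_min, x_max] to an integer level in [0, 2**n_bits - 1]."""
    n_levels = 2 ** n_bits
    scaled = (x - x_min) / (x_max - x_min)
    # Clip so that x == x_max falls into the last bin instead of overflowing.
    return int(np.clip(np.floor(scaled * n_levels), 0, n_levels - 1))


def undiscretize_value(level, x_min, x_max, n_bits):
    """Map an integer level back to the center of its bin."""
    bin_width = (x_max - x_min) / 2 ** n_bits
    return x_min + (level + 0.5) * bin_width


level = discretize_value(0.3, x_min=0.0, x_max=1.0, n_bits=3)
restored = undiscretize_value(level, x_min=0.0, x_max=1.0, n_bits=3)
```

Under this scheme the round trip is lossy: the restored value differs from the input by at most half a bin width.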

qbm.utils.misc module

qbm.utils.misc.df_ensemble_stats(dfs)

Computes the means, medians, and standard deviations column/row-wise over the input list of dataframes.

Parameters:

dfs – List of dataframes with identical row/column names.

Returns:

Dictionary of dataframes with the means, medians, and standard deviations.
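A minimal sketch of this kind of ensemble statistic is given below. It is written independently of the library: the function name ensemble_stats, the dictionary keys, and the choice of a population standard deviation are all assumptions, and the dataframes are assumed to share the same row and column order.

```python
import numpy as np
import pandas as pd


def ensemble_stats(dfs):
    """Element-wise mean, median, and standard deviation over a list of dataframes."""
    # Stack into an array of shape (n_dataframes, n_rows, n_columns).
    stacked = np.stack([df.to_numpy(dtype=float) for df in dfs])
    index, columns = dfs[0].index, dfs[0].columns

    def as_frame(values):
        return pd.DataFrame(values, index=index, columns=columns)

    return {
        "means": as_frame(stacked.mean(axis=0)),
        "medians": as_frame(np.median(stacked, axis=0)),
        "stds": as_frame(stacked.std(axis=0)),  # population std (ddof=0), assumed
    }


dfs = [
    pd.DataFrame({"a": [1.0], "b": [2.0]}),
    pd.DataFrame({"a": [3.0], "b": [4.0]}),
    pd.DataFrame({"a": [5.0], "b": [9.0]}),
]
stats = ensemble_stats(dfs)
```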

qbm.utils.misc.df_stats(df)

Compute the min, max, mean, median, and standard deviation of the columns in the dataframe.

Parameters:

df – Dataframe.

Returns:

Dataframe of the statistics.
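Comparable per-column statistics can be obtained directly with pandas. The sketch below is not the library's implementation. The name column_stats is hypothetical, and the layout of the result (statistics as rows, columns as columns) is an assumption.

```python
import pandas as pd


def column_stats(df):
    """Min, max, mean, median, and sample standard deviation of each column."""
    return df.agg(["min", "max", "mean", "median", "std"])


df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]})
summary = column_stats(df)
```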

qbm.utils.misc.filter_df_on_values(df, column_values, drop_filter_columns=True)

Return a copy of the dataframe filtered conditionally on provided column values.

Parameters:
  • df – Dataframe to filter.

  • column_values – Dictionary where the keys are column names, and the values are values on which to filter the dataframe.

  • drop_filter_columns – If True returns a copy of the dataframe with the filtered columns dropped.

Returns:

A dataframe filtered conditionally on the provided column values.
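The filtering described above might look roughly like the following sketch. It assumes that each dictionary value is matched by equality, which the source does not state explicitly. The name filter_on_values is hypothetical.

```python
import pandas as pd


def filter_on_values(df, column_values, drop_filter_columns=True):
    """Keep rows where every listed column equals its given value."""
    mask = pd.Series(True, index=df.index)
    for column, value in column_values.items():
        mask &= df[column] == value
    filtered = df[mask].copy()
    if drop_filter_columns:
        # The filter columns are constant after filtering, so they can be dropped.
        filtered = filtered.drop(columns=list(column_values))
    return filtered


df = pd.DataFrame(
    {"year": [2020, 2021, 2020], "kind": ["a", "a", "b"], "x": [1.0, 2.0, 3.0]}
)
result = filter_on_values(df, {"year": 2020, "kind": "a"})
```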

qbm.utils.misc.get_project_dir()

Gets the project directory path from the environment and checks if it is valid.

Returns:

Path object of the project directory.

qbm.utils.misc.get_rng(seed=None)

Creates a random number generator with the specified seed value.

Parameters:

seed – Seed value for the rng.

Returns:

Numpy RandomState object.
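Since the return value is documented as a NumPy RandomState, the effect of seeding can be illustrated with NumPy alone. This sketch builds the generators directly rather than calling get_rng, and shows that two generators created with the same seed produce identical samples.

```python
import numpy as np

# Two generators created with the same seed yield the same stream of numbers.
rng_a = np.random.RandomState(42)
rng_b = np.random.RandomState(42)

sample_a = rng_a.normal(size=3)
sample_b = rng_b.normal(size=3)
```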

qbm.utils.misc.kl_divergence(p_data, q_data, n_bins=32, epsilon_smooth=None, relative_smooth=False)

Computes D_KL(p || q), where p and q are the distributions estimated from p_data and q_data.

Note: this is a crude approximation of the KL divergence.

Parameters:
  • p_data – Array of data values to compute the p distribution from.

  • q_data – Array of data values to compute the q distribution from.

  • n_bins – Number of bins to use in histograms.

  • epsilon_smooth – Value to use with q distribution smoothing.

  • relative_smooth – Whether or not the smoothed values are relative to the p distribution.

Returns:

D_KL(p || q).
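A histogram-based estimate of this kind can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the name histogram_kl is hypothetical, both histograms share bin edges spanning the combined data range, and q is smoothed by adding a small constant to every bin before renormalizing. The source's epsilon_smooth and relative_smooth options may behave differently.

```python
import numpy as np


def histogram_kl(p_data, q_data, n_bins=32, epsilon=1e-8):
    """Crude estimate of D_KL(p || q) from two samples using shared histogram bins."""
    lo = min(np.min(p_data), np.min(q_data))
    hi = max(np.max(p_data), np.max(q_data))
    p_counts, _ = np.histogram(p_data, bins=n_bins, range=(lo, hi))
    q_counts, _ = np.histogram(q_data, bins=n_bins, range=(lo, hi))

    p = p_counts / p_counts.sum()
    q = q_counts / q_counts.sum()
    # Additive smoothing (assumed scheme) avoids division by zero in empty q bins.
    q = (q + epsilon) / (q + epsilon).sum()

    # Bins with p == 0 contribute nothing to the sum.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


rng = np.random.RandomState(0)
p_data = rng.normal(size=1000)
q_data = rng.normal(loc=1.0, size=1000)

kl_same = histogram_kl(p_data, p_data)
kl_shifted = histogram_kl(p_data, q_data)
```

The estimate depends on the number of bins and on the smoothing constant, which is one reason to treat it as a rough diagnostic rather than an exact divergence.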

qbm.utils.misc.load_artifact(file_path)

Loads a pickle or json artifact (depending on the file extension).

Parameters:

file_path – Path of the file to load.

Returns:

Loaded python object.

qbm.utils.misc.save_artifact(artifact, file_path)

Saves a pickle or json artifact (depending on the file extension).

Parameters:
  • artifact – Python object to save.

  • file_path – Path of the file to save.
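Dispatching on the file extension, as both functions above are described as doing, could be sketched like this. The helper names save_by_extension and load_by_extension are hypothetical, and the accepted suffixes (.json, .pkl, .pickle) are assumptions, since the source does not list them.

```python
import json
import pickle
import tempfile
from pathlib import Path

PICKLE_SUFFIXES = (".pkl", ".pickle")  # assumed set of pickle extensions


def save_by_extension(artifact, file_path):
    """Write the artifact as JSON or pickle, chosen by the file suffix."""
    file_path = Path(file_path)
    if file_path.suffix == ".json":
        with open(file_path, "w") as f:
            json.dump(artifact, f)
    elif file_path.suffix in PICKLE_SUFFIXES:
        with open(file_path, "wb") as f:
            pickle.dump(artifact, f)
    else:
        raise ValueError(f"Unsupported extension: {file_path.suffix}")


def load_by_extension(file_path):
    """Read an artifact written by save_by_extension."""
    file_path = Path(file_path)
    if file_path.suffix == ".json":
        with open(file_path) as f:
            return json.load(f)
    if file_path.suffix in PICKLE_SUFFIXES:
        with open(file_path, "rb") as f:
            return pickle.load(f)
    raise ValueError(f"Unsupported extension: {file_path.suffix}")


with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = Path(tmp_dir) / "config.json"
    save_by_extension({"n_bits": 8}, json_path)
    loaded_json = load_by_extension(json_path)

    pkl_path = Path(tmp_dir) / "samples.pkl"
    save_by_extension([1, 2, 3], pkl_path)
    loaded_pkl = load_by_extension(pkl_path)
```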

qbm.utils.transformations module

class qbm.utils.transformations.PowerTransformer(df, threshold=1, power=0.5, columns=None)

Bases: object

Transforms data points that lie beyond the provided threshold by raising them to a power (<1), which scales them closer to the mean.

inverse_transform(df, inplace=False)

Transforms the data back from the scaled space.

Parameters:
  • df – Dataframe of scaled data to transform back.

  • inplace – If True then it operates on the same dataframe, if False then it creates a copy.

Returns:

Dataframe of untransformed data (if inplace == False).

transform(df, inplace=False)

Transforms the data to the scaled space.

Parameters:
  • df – Dataframe to scale.

  • inplace – If True then it operates on the same dataframe, if False then it creates a copy.

Returns:

Dataframe of transformed data (if inplace == False).
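The exact formula used by the transformer is not given in the source. The sketch below shows one possible form, under the assumptions that the values are already centered at zero and that only the magnitude beyond the threshold is compressed. Dividing by the threshold before taking the power keeps the mapping continuous and invertible. The names power_scale and power_unscale are hypothetical.

```python
import numpy as np


def power_scale(x, threshold=1.0, power=0.5):
    """Compress values whose magnitude exceeds the threshold (assumed formula)."""
    x = np.asarray(x, dtype=float)
    magnitude = np.abs(x)
    beyond = magnitude > threshold
    scaled = x.copy()
    scaled[beyond] = (
        np.sign(x[beyond]) * threshold * (magnitude[beyond] / threshold) ** power
    )
    return scaled


def power_unscale(y, threshold=1.0, power=0.5):
    """Invert power_scale by applying the reciprocal power."""
    return power_scale(y, threshold=threshold, power=1.0 / power)


values = np.array([0.5, 4.0, -9.0])
scaled = power_scale(values)
restored = power_unscale(scaled)
```

Values inside the threshold pass through unchanged, so only outliers are pulled toward the center.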

Module contents

class qbm.utils.Discretizer(df, n_bits, epsilon={})

Bases: object

bit_array_to_df(bit_array)

Converts a bit array to a dataframe of floats.

Parameters:

bit_array – Bit array to convert.

Returns:

Dataframe of shape (bit_array.shape[0], len(self.columns)).

static bit_vector_to_int(bit_vector)

Converts a bit vector to an integer.

Parameters:

bit_vector – Input bit vector.

Returns:

Integer value of the input bit vector.

static bit_vector_to_string(bit_vector)

Converts a bit vector to a bit string.

Parameters:

bit_vector – Input bit vector.

Returns:

Bit string of the input bit vector.

df_to_bit_array(df)

Converts a dataframe of floats to a bit array.

Parameters:

df – Dataframe to convert.

Returns:

Array of bits of shape (df.shape[0], self.n_bits_total).

discretize = <numpy.vectorize object>

discretize_df(df)

Convert all columns of a dataframe to bit representation.

Parameters:

df – Dataframe to convert.

Returns:

A discretized version of df.

static int_to_bit_vector(x, n_bits)

Converts the integer x to an n_bits-bit bit vector.

Parameters:
  • x – Integer value to convert.

  • n_bits – Length of the bit vector.

Returns:

Bit vector of length n_bits.

undiscretize = <numpy.vectorize object>

undiscretize_df(df)

Convert all columns of a dataframe to floats from bit representation.

Parameters:

df – Dataframe to convert.

Returns:

An undiscretized version of df.

class qbm.utils.PowerTransformer(df, threshold=1, power=0.5, columns=None)

Bases: object

Transforms data points that lie beyond the provided threshold by raising them to a power (<1), which scales them closer to the mean.

inverse_transform(df, inplace=False)

Transforms the data back from the scaled space.

Parameters:
  • df – Dataframe of scaled data to transform back.

  • inplace – If True then it operates on the same dataframe, if False then it creates a copy.

Returns:

Dataframe of untransformed data (if inplace == False).

transform(df, inplace=False)

Transforms the data to the scaled space.

Parameters:
  • df – Dataframe to scale.

  • inplace – If True then it operates on the same dataframe, if False then it creates a copy.

Returns:

Dataframe of transformed data (if inplace == False).

qbm.utils.df_ensemble_stats(dfs)

Computes the means, medians, and standard deviations column/row-wise over the input list of dataframes.

Parameters:

dfs – List of dataframes with identical row/column names.

Returns:

Dictionary of dataframes with the means, medians, and standard deviations.

qbm.utils.df_stats(df)

Compute the min, max, mean, median, and standard deviation of the columns in the dataframe.

Parameters:

df – Dataframe.

Returns:

Dataframe of the statistics.

qbm.utils.filter_df_on_values(df, column_values, drop_filter_columns=True)

Return a copy of the dataframe filtered conditionally on provided column values.

Parameters:
  • df – Dataframe to filter.

  • column_values – Dictionary where the keys are column names, and the values are values on which to filter the dataframe.

  • drop_filter_columns – If True returns a copy of the dataframe with the filtered columns dropped.

Returns:

A dataframe filtered conditionally on the provided column values.

qbm.utils.get_project_dir()

Gets the project directory path from the environment and checks if it is valid.

Returns:

Path object of the project directory.

qbm.utils.get_rng(seed=None)

Creates a random number generator with the specified seed value.

Parameters:

seed – Seed value for the rng.

Returns:

Numpy RandomState object.

qbm.utils.kl_divergence(p_data, q_data, n_bins=32, epsilon_smooth=None, relative_smooth=False)

Computes D_KL(p || q), where p and q are the distributions estimated from p_data and q_data.

Note: this is a crude approximation of the KL divergence.

Parameters:
  • p_data – Array of data values to compute the p distribution from.

  • q_data – Array of data values to compute the q distribution from.

  • n_bins – Number of bins to use in histograms.

  • epsilon_smooth – Value to use with q distribution smoothing.

  • relative_smooth – Whether or not the smoothed values are relative to the p distribution.

Returns:

D_KL(p || q).

qbm.utils.load_artifact(file_path)

Loads a pickle or json artifact (depending on the file extension).

Parameters:

file_path – Path of the file to load.

Returns:

Loaded python object.

qbm.utils.save_artifact(artifact, file_path)

Saves a pickle or json artifact (depending on the file extension).

Parameters:
  • artifact – Python object to save.

  • file_path – Path of the file to save.