snowmobile.core.qa

Derived Statement classes.

These objects derive from snowmobile.core.statement.Statement and override its process() method to perform additional post-processing of the statement’s results in conjunction with any parameters provided within the statement’s tags.

s.process() sets a statement’s outcome attribute (a bool), on which an assertion is run before execution of the script continues.
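The process-then-assert pattern can be sketched in plain Python as below. ToyStatement, ToyEmpty and run() are hypothetical stand-ins, not snowmobile's actual Statement/QA classes; they only show an overridden process() setting a boolean outcome that is asserted before execution moves on.

```python
from typing import Any, List


class ToyStatement:
    """Hypothetical stand-in for a base statement holding its results."""

    def __init__(self, results: List[Any]):
        self.results = results
        self.outcome = True  # a plain statement has nothing to verify

    def process(self) -> "ToyStatement":
        return self  # no post-processing by default


class ToyEmpty(ToyStatement):
    """Hypothetical derived class: the outcome is True only if results are empty."""

    def process(self) -> "ToyEmpty":
        self.outcome = len(self.results) == 0
        return self


def run(statement: ToyStatement) -> None:
    # Post-process first, then check the outcome before continuing.
    statement.process()
    if not statement.outcome:
        raise AssertionError(f"QA failed for {type(statement).__name__}")


run(ToyEmpty([]))         # passes silently
# run(ToyEmpty([1, 2]))   # would raise AssertionError
```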

Note

The on_exception and on_failure parameters of script.run() are passed directly to these derived statement classes and apply only to them.

on_exception controls how errors raised during the post-processing invoked by s.process() are handled.

on_failure controls how a failed assertion on the outcome of the post-processing invoked by s.process() is handled.
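The distinction between the two parameters can be sketched as follows. This is a minimal illustration, not snowmobile's implementation: run_qa() is hypothetical, the process callable stands in for s.process() returning the outcome, and the "raise"/"continue" values are assumptions rather than the options script.run() actually accepts.

```python
from typing import Callable


def run_qa(process: Callable[[], bool], on_exception: str = "raise",
           on_failure: str = "raise") -> str:
    """Run one hypothetical QA step, handling the two failure modes separately."""
    try:
        outcome = process()
    except Exception:
        # An error raised *during* post-processing is governed by on_exception.
        if on_exception == "raise":
            raise
        return "exception-skipped"
    if outcome:
        return "passed"
    # Post-processing succeeded but the check failed: governed by on_failure.
    if on_failure == "raise":
        raise AssertionError("QA assertion failed")
    return "failure-skipped"


def broken() -> bool:
    raise ValueError("bad column")


print(run_qa(lambda: True))                         # passed
print(run_qa(lambda: False, on_failure="continue"))  # failure-skipped
print(run_qa(broken, on_exception="continue"))       # exception-skipped
```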

Module Contents

Classes

QA

Base class for QA statements.

Empty

QA class for verification that a statement’s results are empty.

Diff

QA class for comparison of values within a table based on partitioning on a field.

class snowmobile.core.qa.QA(sn: snowmobile.core.connection.Snowmobile, **kwargs)

Bases: snowmobile.core.Statement

Base class for QA statements.

Initialize self. See help(type(self)) for accurate signature.

set_outcome(self)

Updates ._outcome upon completion of processing invoked by .process().

class snowmobile.core.qa.Empty(sn: snowmobile.core.connection.Snowmobile, **kwargs)

Bases: snowmobile.core.qa.QA

QA class for verification that a statement’s results are empty.

The most widely applicable use of Empty is for simple verification that a table’s dimensions are as expected.

Initialize self. See help(type(self)) for accurate signature.

process(self) → snowmobile.core.qa.QA

Overrides process(); checks whether the results are empty and updates the outcome accordingly.
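The emptiness check itself can be sketched with pandas, assuming the statement's results are held in a DataFrame. is_empty() is a hypothetical helper, not part of snowmobile; the example data assumes a query written to return only offending rows (here, duplicated keys), so zero rows means the check passes.

```python
import pandas as pd


def is_empty(results: pd.DataFrame) -> bool:
    """Outcome of an Empty-style check: True only when no rows came back."""
    return results.empty


# Hypothetical results of a duplicate-key query.
no_dupes = pd.DataFrame(columns=["id", "n"])
dupes = pd.DataFrame({"id": [7], "n": [2]})

print(is_empty(no_dupes))  # True
print(is_empty(dupes))     # False
```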

class snowmobile.core.qa.Diff(sn: snowmobile.core.connection.Snowmobile = None, **kwargs)

Bases: snowmobile.core.qa.QA

QA class for comparison of values within a table based on partitioning on a field.

partition_on

Column name to partition data on before comparing the partitioned datasets; defaults to ‘src_description’.

Type

str

end_index_at

Column name that marks the last column to use as an index column when joining the partitioned datasets back together.

Type

str

compare_patterns

Regex patterns matching the columns to include in the comparison (i.e., the numeric columns you are running QA on).

Type

list

ignore_patterns

Regex patterns matching columns to ignore in both the comparison and the index.

Type

list

generic_metric_col_nm

Column name to use for the melted field names; defaults to ‘Metric’.

Type

str

compare_cols

Columns used in the comparison once the statement is executed and parsing is applied.

Type

list

drop_cols

Columns dropped once the statement is executed and parsing is applied.

Type

list

idx_cols

Columns used as the index to join the data back together once the statement is executed and parsing is applied.

Type

list

ub_raw

Maximum absolute raw difference (upper bound) allowed between two compared fields without causing a failure.

Type

float

ub_perc

Maximum absolute percentage difference (upper bound) allowed between two compared fields without causing a failure.

Type

float
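How the two upper bounds might apply to a single pair of compared values can be sketched as below. within_bounds() is hypothetical, and two points are assumptions rather than documented behaviour: the percentage difference is taken relative to the first value and expressed in percent, and exceeding either bound counts as a failure.

```python
def within_bounds(a: float, b: float, ub_raw: float, ub_perc: float) -> bool:
    """Check one pair of compared values against both upper bounds.

    Assumption: percentage difference is relative to `a`, in percent.
    """
    raw = abs(a - b)
    if a != 0:
        perc = raw / abs(a) * 100
    else:
        perc = 0.0 if b == 0 else float("inf")
    return raw <= ub_raw and perc <= ub_perc


print(within_bounds(200.0, 201.0, ub_raw=2.0, ub_perc=1.0))  # True  (raw 1.0, 0.5%)
print(within_bounds(200.0, 204.0, ub_raw=5.0, ub_perc=1.0))  # False (raw 4.0, 2.0%)
```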

Instantiates a qa-diff statement.

Parameters
  • delta_column_suffix (str) – Suffix added to the columns being compared; defaults to ‘Delta’.

  • partition_on (str) – Column to partition the data on in order to compare.

  • end_index_at (str) – Column name that marks the last column to use as an index when joining the partitioned datasets back together.

  • compare_patterns (list) – Regex patterns matching columns to be included in comparison.

  • ignore_patterns (list) – Regex patterns matching columns to ignore in both the comparison and the index.

  • generic_metric_col_nm (str) – Column name to use for the melted field names; defaults to ‘Metric’.

  • raw_upper_bound (float) – Maximum absolute raw difference allowed between two compared fields without causing a failure.

  • percentage_upper_bound (float) – Maximum absolute percentage difference allowed between two compared fields without causing a failure.

split_cols(self) → snowmobile.core.qa.Diff

Post-processes results returned from a qa-diff statement.

Executes private methods to split columns into:
  • Index columns

  • Drop columns

  • Comparison columns

Then runs the checks needed to ensure that the minimum requirements for a valid partition/comparison are met.
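The column split can be sketched as a standalone function. This is a hypothetical reconstruction, not snowmobile's private methods; the assumed rules are: columns up to and including end_index_at form the index (excluding partition_on), later columns matching a compare pattern are compared, ignored columns and everything else are dropped, and at least one index and one comparison column must remain.

```python
import re
from typing import List, Tuple


def split_cols(columns: List[str], partition_on: str, end_index_at: str,
               compare_patterns: List[str], ignore_patterns: List[str]
               ) -> Tuple[List[str], List[str], List[str]]:
    """Hypothetical split of result columns into (index, drop, compare)."""

    def matches(col: str, patterns: List[str]) -> bool:
        return any(re.search(p, col) for p in patterns)

    cutoff = columns.index(end_index_at)  # ValueError if the column is missing
    idx: List[str] = []
    drop: List[str] = []
    compare: List[str] = []
    for pos, col in enumerate(columns):
        if col == partition_on:
            continue  # used to partition, not to index or compare
        if matches(col, ignore_patterns):
            drop.append(col)
        elif pos <= cutoff:
            idx.append(col)
        elif matches(col, compare_patterns):
            compare.append(col)
        else:
            drop.append(col)
    # Minimum requirements for a meaningful comparison.
    if not idx or not compare:
        raise ValueError("need at least one index column and one comparison column")
    return idx, drop, compare


cols = ["src_description", "month", "region", "load_ts", "revenue", "units", "comment"]
print(split_cols(cols, "src_description", "region", [r"revenue|units"], [r"_ts$"]))
# (['month', 'region'], ['load_ts', 'comment'], ['revenue', 'units'])
```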

property partitioned_by(self) → Set[Any]

Distinct values within the partition_on column that data is partitioned by.
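With pandas, the distinct partition values and the per-partition datasets can be sketched as below. Both functions are hypothetical free-function equivalents written for illustration (partition() in particular has no documented counterpart here), and the sample data is made up.

```python
from typing import Any, Dict, Set

import pandas as pd


def partitioned_by(df: pd.DataFrame, partition_on: str) -> Set[Any]:
    """Distinct values of the partition column."""
    return set(df[partition_on])


def partition(df: pd.DataFrame, partition_on: str) -> Dict[Any, pd.DataFrame]:
    """One DataFrame per partition value, with the partition column dropped."""
    return {
        key: group.drop(columns=partition_on).reset_index(drop=True)
        for key, group in df.groupby(partition_on, sort=True)
    }


df = pd.DataFrame({
    "src_description": ["prod", "prod", "dev", "dev"],
    "month": [1, 2, 1, 2],
    "revenue": [100.0, 110.0, 100.0, 111.0],
})
print(sorted(partitioned_by(df, "src_description")))  # ['dev', 'prod']
print(list(partition(df, "src_description")))          # ['dev', 'prod']
```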

static partitions_are_equal(partitions: Dict[str, pd.DataFrame], abs_tol: float, rel_tol: float) → bool

Evaluates whether the DataFrames in a dictionary are equal within the given tolerances.

Parameters
  • partitions (Dict[str, pd.DataFrame]) – A dictionary of DataFrames returned by snowmobile.DataFrame().

  • abs_tol (float) – Absolute tolerance for difference in any value amongst the DataFrames being compared.

  • rel_tol (float) – Relative tolerance for difference in any value amongst the DataFrames being compared.

Returns (bool):

Indication of equality amongst all the DataFrames contained in partitions.
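A tolerance-based equality check with the same signature can be sketched using numpy.isclose, comparing every pair of partitions element-wise. This is an illustrative sketch, not snowmobile's implementation; it assumes all partitions hold numeric values with matching shape and column order.

```python
from itertools import combinations
from typing import Dict

import numpy as np
import pandas as pd


def partitions_are_equal(partitions: Dict[str, pd.DataFrame],
                         abs_tol: float, rel_tol: float) -> bool:
    """True if every pair of partitions matches element-wise within tolerance."""
    frames = list(partitions.values())
    for left, right in combinations(frames, 2):
        if left.shape != right.shape:
            return False
        if not np.isclose(left.to_numpy(), right.to_numpy(),
                          rtol=rel_tol, atol=abs_tol).all():
            return False
    return True


a = pd.DataFrame({"revenue": [100.0, 110.0]})
b = pd.DataFrame({"revenue": [100.0, 110.5]})
print(partitions_are_equal({"prod": a, "dev": b}, abs_tol=1.0, rel_tol=0.0))  # True
print(partitions_are_equal({"prod": a, "dev": b}, abs_tol=0.1, rel_tol=0.0))  # False
```

Note that numpy.isclose tests |a - b| <= atol + rtol * |b|, so the relative tolerance is scaled by the second frame of each pair; math.isclose, by contrast, is symmetric.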

process(self) → snowmobile.core.qa.Diff

Post-processing for Diff-specific results.
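One way the melt, re-join and delta steps could fit together is sketched below with pandas. diff_report() is hypothetical and assumes exactly two partitions; the melted column defaults to ‘Metric’ and the delta column is simply named with the ‘Delta’ suffix, which is an assumption about naming rather than snowmobile's actual output layout.

```python
from typing import List

import pandas as pd


def diff_report(df: pd.DataFrame, partition_on: str, idx_cols: List[str],
                compare_cols: List[str], metric_col: str = "Metric",
                suffix: str = "Delta") -> pd.DataFrame:
    """Hypothetical melt -> pivot -> delta pipeline for exactly two partitions."""
    # 1. Melt comparison columns so each metric becomes a row labelled by metric_col.
    long = df.melt(id_vars=[partition_on, *idx_cols], value_vars=compare_cols,
                   var_name=metric_col, value_name="Value")
    # 2. Pivot so each partition gets its own column, aligned on the index columns.
    wide = long.pivot(index=[*idx_cols, metric_col], columns=partition_on,
                      values="Value").reset_index()
    wide.columns.name = None
    # 3. Signed difference between the two partitions (unpacking fails unless two).
    first, second = sorted(df[partition_on].unique())
    wide[suffix] = wide[first] - wide[second]
    return wide


df = pd.DataFrame({
    "src_description": ["prod", "prod", "dev", "dev"],
    "month": [1, 2, 1, 2],
    "revenue": [100.0, 110.0, 100.0, 111.0],
})
report = diff_report(df, "src_description", ["month"], ["revenue"])
print(report)
outcome = bool((report["Delta"].abs() <= 0.5).all())
print(outcome)  # False: month 2 differs by 1.0
```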