`snowmobile.core.qa`¶

Derived Statement classes.

These objects derive from snowmobile.core.statement.Statement and override its process() method to perform additional post-processing of the statement’s results in conjunction with any parameters provided within the statement’s tags.

s.process() sets a statement’s outcome attribute (bool); an assertion is then run on outcome before execution of the script continues.

Note

The on_exception and on_failure parameters of script.run() are passed directly to, and are only applicable to, these derived statement classes.

on_exception controls how exceptions encountered in the post-processing invoked by s.process() are handled.

on_failure controls how a failed assertion run on the outcome of the post-processing invoked by s.process() is handled.
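The flow described above (post-process, set an outcome, assert on it, then continue or stop) can be sketched without snowmobile itself. Everything below is hypothetical: `QASketch`, `run`, and the boolean flags `continue_on_exception` and `continue_on_failure` are illustrative stand-ins, not snowmobile's classes or the actual values accepted by on_exception and on_failure.

```python
from typing import Callable, List, Optional


class QASketch:
    """Hypothetical stand-in for a QA-style statement (not snowmobile's class)."""

    def __init__(self, check: Callable[[], bool]):
        self.check = check  # post-processing step that yields True/False
        self.outcome: Optional[bool] = None
        self.error: Optional[Exception] = None

    def process(self) -> "QASketch":
        """Run the check, recording either its outcome or the exception it raised."""
        self.error = None
        try:
            self.outcome = bool(self.check())
        except Exception as e:
            self.outcome = False
            self.error = e
        return self


def run(
    statements: List[QASketch],
    continue_on_exception: bool = False,  # assumption: stand-in for on_exception
    continue_on_failure: bool = False,    # assumption: stand-in for on_failure
) -> List[bool]:
    """Process each statement in order and assert on its outcome before moving on."""
    outcomes = []
    for s in statements:
        s.process()
        if s.error is not None:
            # An exception occurred inside the post-processing itself.
            if not continue_on_exception:
                raise s.error
        elif not s.outcome and not continue_on_failure:
            # Post-processing ran cleanly, but the assertion on its outcome failed.
            raise AssertionError("QA check failed")
        outcomes.append(s.outcome)
    return outcomes
```

The two flags are kept separate because they cover different situations: one is an error while computing the outcome, the other is a cleanly computed outcome that is False.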

Module Contents¶

Classes¶

`QA`	Base class for QA statements.
`Empty`	QA class for verification that a statement’s results are empty.
`Diff`	QA class for comparison of values within a table based on partitioning on a field.

class snowmobile.core.qa.QA(sn: snowmobile.core.connection.Snowmobile, **kwargs)¶

Bases: snowmobile.core.Statement

Base class for QA statements.

set_outcome(self)¶: Updates ._outcome upon completion of processing invoked by .process().

class snowmobile.core.qa.Empty(sn: snowmobile.core.connection.Snowmobile, **kwargs)¶

Bases: snowmobile.core.qa.QA

QA class for verification that a statement’s results are empty.

The most widely applicable use of Empty is for simple verification that a table’s dimensions are as expected.

process(self) → snowmobile.core.qa.QA ¶: Override method; checks whether the results are empty and updates outcome.

class snowmobile.core.qa.Diff(sn: snowmobile.core.connection.Snowmobile = None, **kwargs)¶

Bases: snowmobile.core.qa.QA

QA class for comparison of values within a table based on partitioning on a field.

partition_on¶

Column name to partition data on before comparing the partitioned datasets; defaults to ‘src_description’.

Type: str

end_index_at¶

Column name that marks the last column to use as an index column when joining the partitioned datasets back together.

Type: str

compare_patterns¶

Regex patterns matching the columns to include in the comparison (the numeric columns you’re running QA on).

Type: list

ignore_patterns¶

Regex patterns matching the columns to ignore for both the comparison and the index.

Type: list

generic_metric_col_nm¶

Column name to use for the melted field names; defaults to ‘Metric’.

Type: str

compare_cols¶

Columns that are used in comparison once statement is executed and parsing is applied.

Type: list

drop_cols¶

Columns that are dropped once statement is executed and parsing is applied.

Type: list

idx_cols¶

Columns that are used for the index to join the data back together once statement is executed and parsing is applied.

Type: list

ub_raw¶

Maximum absolute raw difference (upper bound) allowed between two compared fields without causing a failure.

Type: float

ub_perc¶

Maximum absolute percentage difference (upper bound) allowed between two compared fields without causing a failure.

Type: float

Instantiates a qa-diff statement.

Parameters

delta_column_suffix (str) – Suffix to add to columns that comparison is being run on; defaults to ‘Delta’.
partition_on (str) – Column to partition the data on in order to compare.
end_index_at (str) – Column name that marks the last column to use as an index when joining the partitioned datasets back together.
compare_patterns (list) – Regex patterns matching columns to be included in comparison.
ignore_patterns (list) – Regex patterns matching columns to be ignored for both the comparison and the index.
generic_metric_col_nm (str) – Column name to use for the melted field names; defaults to ‘Metric’.
raw_upper_bound (float) – Maximum absolute raw difference allowed between two compared fields without causing a failure.
percentage_upper_bound (float) – Maximum absolute percentage difference allowed between two compared fields without causing a failure.
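
As a rough illustration of how a raw and a percentage upper bound might be applied to a pair of compared values, consider the sketch below. It is not taken from snowmobile's source: the helper name `within_bounds`, the choice to measure the percentage difference relative to the first value, and the requirement that both bounds hold at once are all assumptions.

```python
def within_bounds(a: float, b: float, ub_raw: float, ub_perc: float) -> bool:
    """Return True if b differs from a by no more than both upper bounds.

    Hypothetical helper; the percentage delta is taken relative to `a`.
    """
    raw_delta = b - a
    if a == 0:
        # With a zero base, only an exact match is treated as a 0% difference.
        perc_delta = 0.0 if b == 0 else float("inf")
    else:
        perc_delta = raw_delta / a
    # Assumption: a pair passes only when it satisfies both bounds.
    return abs(raw_delta) <= ub_raw and abs(perc_delta) <= ub_perc


# A difference of 0.5 on a base of 100 is 0.5%, inside both bounds.
print(within_bounds(100.0, 100.5, ub_raw=1.0, ub_perc=0.01))
# A difference of 2.0 exceeds the raw bound of 1.0.
print(within_bounds(100.0, 102.0, ub_raw=1.0, ub_perc=0.05))
```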

split_cols(self) → snowmobile.core.qa.Diff ¶

Post-processes results returned from a qa-diff statement.

Executes private methods to split columns into:

Index columns
Drop columns
Comparison columns

Then runs checks needed to ensure minimum requirements are met in order for a valid partition/comparison to be made.
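
The three-way split described above can be approximated on a plain list of column names. This is a sketch under stated assumptions rather than snowmobile's private implementation: the function name `split_columns`, the precedence (ignore patterns first, then compare patterns, then index columns up to and including end_index_at), and the case-insensitive matching are all illustrative choices.

```python
import re
from typing import Dict, List


def split_columns(
    columns: List[str],
    end_index_at: str,
    compare_patterns: List[str],
    ignore_patterns: List[str],
) -> Dict[str, List[str]]:
    """Split column names into index, drop, and comparison groups (hypothetical)."""

    def matches(col: str, patterns: List[str]) -> bool:
        return any(re.search(p, col, flags=re.IGNORECASE) for p in patterns)

    if end_index_at not in columns:
        raise ValueError(f"'{end_index_at}' is not among the columns")
    last_idx = columns.index(end_index_at)

    # Assumption: ignored columns take precedence over compared ones.
    drop = [c for c in columns if matches(c, ignore_patterns)]
    compare = [c for c in columns if c not in drop and matches(c, compare_patterns)]
    # Index columns: everything up to and including end_index_at that is left over.
    index = [
        c
        for i, c in enumerate(columns)
        if i <= last_idx and c not in drop and c not in compare
    ]

    # Minimum requirements for a meaningful comparison.
    if not compare:
        raise ValueError("no columns matched compare_patterns")
    if not index:
        raise ValueError("no index columns found")
    return {"index": index, "drop": drop, "compare": compare}
```

In this sketch, a column that sits after end_index_at and matches neither pattern list ends up in no group; how the library treats such columns is not stated in this section.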

property partitioned_by(self) → Set[Any]¶: Distinct values within the partition_on column that data is partitioned by.

static partitions_are_equal(partitions: Dict[str, pd.DataFrame], abs_tol: float, rel_tol: float) → bool ¶

Evaluates whether the DataFrames in a dictionary are identical.

Parameters

partitions (Dict[str, pd.DataFrame]) – A dictionary of DataFrames returned by snowmobile.DataFrame().
abs_tol (float) – Absolute tolerance for difference in any value amongst the DataFrames being compared.
rel_tol (float) – Relative tolerance for difference in any value amongst the DataFrames being compared.

Returns (bool): Indication of equality amongst all the DataFrames contained in partitions.
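
To make the role of abs_tol and rel_tol concrete, here is a simplified sketch that compares partitions held as plain nested lists instead of DataFrames, using the standard library's `math.isclose`. The function name `partitions_are_close` and the list-of-rows layout are assumptions for illustration; the actual method operates on pandas DataFrames and may apply the tolerances differently.

```python
import math
from itertools import combinations
from typing import Dict, List


def partitions_are_close(
    partitions: Dict[str, List[List[float]]],
    abs_tol: float,
    rel_tol: float,
) -> bool:
    """Return True if every pair of partitions matches within the given tolerances."""
    for (_, left), (_, right) in combinations(partitions.items(), 2):
        if len(left) != len(right):
            return False  # different number of rows
        for row_l, row_r in zip(left, right):
            if len(row_l) != len(row_r):
                return False  # different number of columns
            if not all(
                math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
                for a, b in zip(row_l, row_r)
            ):
                return False
    return True


parts = {
    "old": [[1.0, 2.0], [3.0, 4.0]],
    "new": [[1.0, 2.0005], [3.0, 4.0]],
}
print(partitions_are_close(parts, abs_tol=0.001, rel_tol=0.0))  # small gap tolerated
print(partitions_are_close(parts, abs_tol=0.0, rel_tol=0.0))    # exact match required
```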

process(self) → snowmobile.core.qa.Diff ¶: Post-processing for Diff-specific results.
