`snowmobile.core`¶

snowmobile lives in snowmobile.core to keep from cluttering intellisense/autocomplete while interacting with the API.

Subpackages¶

snowmobile.core.cfg

Submodules¶

Package Contents¶

Classes¶

`Generic`	Generic dunder implementation for `snowmobile` objects.
`ExceptionHandler`	All snowmobile classes contain a `ExceptionHandler`.
`Configuration`	A parsed snowmobile.toml file.
`Snowmobile`	Primary method of statement execution and accessor to parsed snowmobile.toml.
`connect`	Primary method of statement execution and accessor to parsed snowmobile.toml.
`Section`	Represents any (1-6 level) header section within Script Name (doc).md.
`Scope`	Handles the scope/context for `Statement` objects and derived classes.
`Name`	Handles the decomposition/parsing of statement name.
`Statement`	Base class for all `Statement` objects.
`Column`	A single column within a `SnowFrame`.
`Diff`	QA class for comparison of values within a table based on
`Empty`	QA class for verification that a statement’s results are empty.
`SnowFrame`	Extends a `DataFrame` with a `.snf` entry point.
`SQL`	SQL class for generation & execution of common sql commands.
`Markup`	Contains all sections within the context of a `Script`.
`Script`	Parser and operator of local sql files.
`Table`	Constructed with a `DataFrame` and a table name to load into.

class snowmobile.core.Generic¶

Bases: object

Generic dunder implementation for snowmobile objects.

Base class for all snowmobile objects that do not inherit from pydantic’s BaseModel or configuration class, Config.

class snowmobile.core.ExceptionHandler(within: snowmobile.core.errors.Optional[Any] = None, ctx_id: snowmobile.core.errors.Optional[int] = None, in_context: bool = False, children: snowmobile.core.errors.Dict[int, Any] = None, is_active_parent: bool = False, to_mirror: snowmobile.core.errors.Optional[List[Any]] = None)¶

Bases: snowmobile.core.Generic

All snowmobile classes contain a ExceptionHandler.

Parameters

within (Optional[Any]) – Class for which the ExceptionHandler is intended.
ctx_id (Optional[int]) – Context ID; set/unset by methods when entering/exiting certain contexts.
in_context (bool) – Class is currently within a specific ctx_id
children (Dict[int, Any]) – Attributes of the within class for which the ExceptionHandler should mirror the methods called on the parent class. # TODO: Refactor this out; it’s essentially janky multi inheritance
is_active_parent (bool) – The within class is currently enforcing the context rules on its children
to_mirror (Optional[List[Any]]) – Methods called in the attr:`within class that should be applied to its children (i.e. set/reset context ID, etc)

property current(self)¶: All exceptions in the current context.

collect(self, e: Any[snowmobile_errors])¶: Stores an exception.

property first(self) → snowmobile.core.errors.Error ¶: First exception encountered.

property last(self) → snowmobile.core.errors.Error ¶: Last exception encountered.

seen(self, from_ctx: snowmobile.core.errors.Optional[int] = None, of_type: snowmobile.core.errors.Optional[Any[snowmobile_errors], List[snowmobile_errors]] = None, to_raise: snowmobile.core.errors.Optional[bool] = None, with_ids: snowmobile.core.errors.Optional[int, List[int], Set[int]] = None, all_time: bool = False) → bool ¶: Boolean indicator of if an exception has been seen.

get(self, from_ctx: snowmobile.core.errors.Optional[int] = None, of_type: snowmobile.core.errors.Optional[Any[snowmobile_errors], List[snowmobile_errors]] = None, to_raise: snowmobile.core.errors.Optional[bool] = None, with_ids: snowmobile.core.errors.Optional[int, List[int], Set[int]] = None, all_time: bool = False, last: bool = False, first: bool = False, _raise: bool = False)¶: Boolean indicator of if an exception has been seen.

property ctx_id(self)¶: Current context id.

set(self, ctx_id: snowmobile.core.errors.Optional[int] = None, in_context: bool = False, outcome: snowmobile.core.errors.Optional[int] = None)¶: Set attributes on self.

set_from(self, other: snowmobile.core.exception_handler.ExceptionHandler) → snowmobile.core.exception_handler.ExceptionHandler ¶: Updates attributes of self with those from ‘other’.

reset(self, ctx_id: bool = False, in_context: bool = False, outcome: bool = False) → snowmobile.core.exception_handler.ExceptionHandler ¶: Resets attributes on self.

property by_tmstmp(self)¶: All exceptions by timestamp, ordered by most to least recent.

class snowmobile.core.Configuration(creds: Optional[str] = None, config_file_nm: Optional[str] = None, from_config: Optional[Path, str] = None, export_dir: Optional[Path, str] = None, silence: bool = False)¶

Bases: snowmobile.core.base.Generic

A parsed snowmobile.toml file.

All keyword arguments optional.

Parameters

config_file_nm (Optional[str]) – Name of configuration file to use; defaults to snowmobile.toml.
creds (Optional[str]) – Alias for the set of credentials to authenticate with; default behavior will fall back to the connection.default-creds specified in snowmobile.toml, or the first set of credentials stored if this configuration option is left blank.
from_config (Optional[str, Path]) – A full path to a specific configuration file to use; bypasses any checks for a cached file location and can be useful for container-based processes with restricted access to the local file system.
export_dir (Optional[Path]) – Path to save a template snowmobile.toml file to; if pr, the file will be exported within the __init__ method and nothing else will be instantiated.

file_nm¶

Configuration file name; defaults to ‘snowmobile.toml’.

Type: str

cache¶

Persistent cache; caches location.

Type: snowmobile.core.cache.Cache

location¶

Full path to configuration file.

Type: pathlib.Path

connection :Optional[cfg.Connection]¶

[connection] from snowmobile.toml.

Type: snowmobile.core.cfg.Connection

loading :Optional[cfg.Loading]¶

[loading] from snowmobile.toml.

Type: snowmobile.core.cfg.Loading

script :Optional[cfg.Script]¶

[script] from snowmobile.toml.

Type: snowmobile.core.cfg.Script

sql :Optional[cfg.SQL]¶

[sql] from snowmobile-ext.toml.

Type: snowmobile.core.cfg.SQL

ext_sources :Optional[cfg.Location]¶

[external-sources] from snowmobile.toml.

Type: snowmobile.core.cfg.Location

property markdown(self) → snowmobile.core.cfg.Markup ¶: Accessor for cfg.script.markdown.

property attrs(self) → snowmobile.core.cfg.Attributes ¶: Accessor for cfg.script.markdown.attributes.

property wildcards(self) → snowmobile.core.cfg.Wildcard ¶: Accessor for cfg.script.patterns.wildcards.

static batch_set_attrs(obj: Any, attrs: dict, to_none: bool = False)¶

Batch sets attributes on an object from a dictionary.

Parameters

obj (Any) – Object to set attributes on.
attrs (dict) – Dictionary containing attributes.
to_none (bool) – Set all of the object’s attributes batching a key in wrap to None; defaults ot False.

Returns (Any):: Object post-setting attributes.

static attrs_from_obj(obj: Any, within: Optional[List[str]] = None) → Dict[str, MethodType]¶: Utility to return attributes/properties from an object as a dictionary.

static methods_from_obj(obj: Any, within: Optional[List[str]] = None) → Dict[str, MethodType]¶: Returns callable components of an object as a dictionary.

property scopes(self)¶: All combinations of scope type and scope attribute.

scopes_from_kwargs(self, only_populated: bool = False, **kwargs) → Dict¶

Turns script.filter() arguments into a valid set of kwargs for Scope.

Returns dictionary of all combinations of ‘arg’ (“kw”, “obj”, “desc”, “anchor” and “nm”), including empty sets for any ‘arg’ not included in the keyword arguments provided.

scopes_from_tag(self, t: Any)¶: Generates list of keyword arguments to instantiate all scopes for a wrap.

json(self, by_alias: bool = False, **kwargs)¶: Serialization method for core object model.

class snowmobile.core.Snowmobile(creds: Optional[str] = None, delay: bool = False, ensure_alive: bool = True, config_file_nm: Optional[str] = None, from_config: Optional[str, Path] = None, silence: bool = False, **connect_kwargs)¶

Bases: snowmobile.core.sql.SQL

Primary method of statement execution and accessor to parsed snowmobile.toml.

Parameters

creds (Optional[str]) – Alias for the set of credentials to authenticate with; default behavior will fall back to the connection.default-creds specified in snowmobile.toml, or the first set of credentials stored if this configuration option is left blank.
delay (bool) – Optionally delay establishing a connection when the object is instantiated, enabling access to the configuration object model through the Connection.cfg attribute; defaults to False.
ensure_alive (bool) – Establish a new connection if a method requiring a connection against the database is called while alive is False; defaults to True.
config_file_nm (Optional[str]) – Name of configuration file to use; defaults to snowmobile.toml.
from_config (Optional[str, Path]) – A full path to a specific configuration file to use; bypasses any checks for a cached file location and can be useful for container-based processes with restricted access to the local file system.
**connect_kwargs –
Additional arguments to provide to snowflake.connector.connect(); arguments provided here will over-ride connection arguments specified in snowmobile.toml, including:
- Connection parameters in connection.default-arguments
- Credentials parameters associated with a given alias
- Connection parameters associated with a given alias

Initializes a snowmobile.SQL object.

cfg :Configuration¶

snowmobile.toml

Type: snowmobile.core.configuration.Configuration

con :Optional[SnowflakeConnection]¶

Can be None until set by Snowmobile.connect()

Type: SnowflakeConnection

e :ExceptionHandler¶

Exception / context management

Type: snowmobile.core.exception_handler.ExceptionHandler

ensure_alive :bool¶

Reconnect to Snowflake if connection is lost

Type: bool

connect(self, **kwargs) → snowmobile.core.connection.Snowmobile ¶

Establishes connection to Snowflake.

Re-implements snowflake.connector.connect() with connection arguments sourced from snowmobile’s object model, specifically:

Credentials from snowmobile.toml.

Default connection arguments from snowmobile.toml.

Optional keyword arguments either passed to snowmobile.connect() or directly to this method.

kwargs:
Optional keyword arguments to pass to snowflake.connector.connect(); arguments passed here will over-ride connection.default-arguments specified in snowmobile.toml.

disconnect(self) → snowmobile.core.connection.Snowmobile ¶: Disconnect from connection with which Connection() was instantiated.

property alive(self) → bool ¶: Check if the connection is alive.

property cursor(self) → snowflake.connector.connection.SnowflakeCursor¶: SnowflakeCursor accessor.

property dictcursor(self) → snowflake.connector.DictCursor¶: DictCursor accessor.

ex(self, sql: str, on_error: Optional[str] = None, **kwargs) → snowflake.connector.connection.SnowflakeCursor¶

Executes a command via SnowflakeCursor.

Parameters

sql (str) – sql command as a string.
on_error (str) – String value to impose a specific behavior if an error occurs during the execution of sql.
**kwargs – Optional keyword arguments for SnowflakeCursor.execute().

Returns (SnowflakeCursor):: SnowflakeCursor object that executed the command.

exd(self, sql: str, on_error: Optional[str] = None, **kwargs) → snowflake.connector.DictCursor¶

Executes a command via DictCursor.

Parameters

sql (str) – sql command as a string.
on_error (str) – String value to impose a specific behavior if an error occurs during the execution of sql.
**kwargs – Optional keyword arguments for SnowflakeCursor.execute().

Returns (DictCursor):: DictCursor object that executed the command.

query(self, sql: str, as_df: bool = False, as_cur: bool = False, as_dcur: bool = False, as_scalar: bool = False, lower: bool = True, on_error: Optional[str] = None) → Union[pd.DataFrame, SnowflakeCursor]¶

Execute a query and return results.

Default behavior of results=True will return results as a pandas.DataFrame, otherwise will execute the sql provided with a SnowflakeCursor and return the cursor object.

Parameters

sql (str) – Raw SQL to execute.
as_df (bool) – Return results in DataFrame.
as_cur (bool) – Return results in Cursor.
as_dcur (bool) – Return results in a DictCursor.
as_scalar (bool) – Return results as a single scalar value.
lower (bool) – Boolean value indicating whether or not to return results with columns lower-cased.
on_error (str) – String value to impose a specific behavior if an error occurs during the execution of sql.

Returns (Union[pd.DataFrame, SnowflakeCursor]):: Results from sql as a DataFrame by default or the SnowflakeCursor object if results=False.

class snowmobile.core.connect(creds: Optional[str] = None, delay: bool = False, ensure_alive: bool = True, config_file_nm: Optional[str] = None, from_config: Optional[str, Path] = None, silence: bool = False, **connect_kwargs)¶

Bases: snowmobile.core.sql.SQL

Primary method of statement execution and accessor to parsed snowmobile.toml.

Parameters

creds (Optional[str]) – Alias for the set of credentials to authenticate with; default behavior will fall back to the connection.default-creds specified in snowmobile.toml, or the first set of credentials stored if this configuration option is left blank.
delay (bool) – Optionally delay establishing a connection when the object is instantiated, enabling access to the configuration object model through the Connection.cfg attribute; defaults to False.
ensure_alive (bool) – Establish a new connection if a method requiring a connection against the database is called while alive is False; defaults to True.
config_file_nm (Optional[str]) – Name of configuration file to use; defaults to snowmobile.toml.
from_config (Optional[str, Path]) – A full path to a specific configuration file to use; bypasses any checks for a cached file location and can be useful for container-based processes with restricted access to the local file system.
**connect_kwargs –
Additional arguments to provide to snowflake.connector.connect(); arguments provided here will over-ride connection arguments specified in snowmobile.toml, including:
- Connection parameters in connection.default-arguments
- Credentials parameters associated with a given alias
- Connection parameters associated with a given alias

Initializes a snowmobile.SQL object.

cfg :Configuration¶

snowmobile.toml

Type: snowmobile.core.configuration.Configuration

con :Optional[SnowflakeConnection]¶

Can be None until set by Snowmobile.connect()

Type: SnowflakeConnection

e :ExceptionHandler¶

Exception / context management

Type: snowmobile.core.exception_handler.ExceptionHandler

ensure_alive :bool¶

Reconnect to Snowflake if connection is lost

Type: bool

connect(self, **kwargs) → snowmobile.core.connection.Snowmobile ¶

Establishes connection to Snowflake.

Re-implements snowflake.connector.connect() with connection arguments sourced from snowmobile’s object model, specifically:

Credentials from snowmobile.toml.

Default connection arguments from snowmobile.toml.

Optional keyword arguments either passed to snowmobile.connect() or directly to this method.

kwargs:
Optional keyword arguments to pass to snowflake.connector.connect(); arguments passed here will over-ride connection.default-arguments specified in snowmobile.toml.

disconnect(self) → snowmobile.core.connection.Snowmobile ¶: Disconnect from connection with which Connection() was instantiated.

property alive(self) → bool ¶: Check if the connection is alive.

property cursor(self) → snowflake.connector.connection.SnowflakeCursor¶: SnowflakeCursor accessor.

property dictcursor(self) → snowflake.connector.DictCursor¶: DictCursor accessor.

ex(self, sql: str, on_error: Optional[str] = None, **kwargs) → snowflake.connector.connection.SnowflakeCursor¶

Executes a command via SnowflakeCursor.

Parameters

sql (str) – sql command as a string.
on_error (str) – String value to impose a specific behavior if an error occurs during the execution of sql.
**kwargs – Optional keyword arguments for SnowflakeCursor.execute().

Returns (SnowflakeCursor):: SnowflakeCursor object that executed the command.

exd(self, sql: str, on_error: Optional[str] = None, **kwargs) → snowflake.connector.DictCursor¶

Executes a command via DictCursor.

Parameters

sql (str) – sql command as a string.
on_error (str) – String value to impose a specific behavior if an error occurs during the execution of sql.
**kwargs – Optional keyword arguments for SnowflakeCursor.execute().

Returns (DictCursor):: DictCursor object that executed the command.

query(self, sql: str, as_df: bool = False, as_cur: bool = False, as_dcur: bool = False, as_scalar: bool = False, lower: bool = True, on_error: Optional[str] = None) → Union[pd.DataFrame, SnowflakeCursor]¶

Execute a query and return results.

Default behavior of results=True will return results as a pandas.DataFrame, otherwise will execute the sql provided with a SnowflakeCursor and return the cursor object.

Parameters

sql (str) – Raw SQL to execute.
as_df (bool) – Return results in DataFrame.
as_cur (bool) – Return results in Cursor.
as_dcur (bool) – Return results in a DictCursor.
as_scalar (bool) – Return results as a single scalar value.
lower (bool) – Boolean value indicating whether or not to return results with columns lower-cased.
on_error (str) – String value to impose a specific behavior if an error occurs during the execution of sql.

Returns (Union[pd.DataFrame, SnowflakeCursor]):: Results from sql as a DataFrame by default or the SnowflakeCursor object if results=False.

class snowmobile.core.Section(cfg: snowmobile.core.Configuration, is_marker: bool = None, h_contents: Optional[str] = None, index: Optional[int] = None, parsed: Optional[Dict] = None, raw: Optional[str] = None, sql: Optional[str] = None, results: Optional[pd.DataFrame] = None, incl_sql_tag: bool = False, is_multiline: bool = False, result_wrap: Optional[str] = None)¶

Bases: snowmobile.core.Generic

Represents any (1-6 level) header section within Script Name (doc).md.

Class is created with a call to the as_section() method or by the snowmobile.core.markup.Markup class in the case of a Marker.

In order to include execution metadata if available without sacrificing base-case parsing, the below implementation heavily relies properties over attributes to reconcile what’s populated in the st vs executed attributes of Script.

Parameters

is_marker (bool) – Information provided is associated with a marker as opposed to a statement; defaults to False.
h_contents (str) – String representation of header contents.
index (int) – Statement index position or None if marker.
parsed (dict) – Parsed arguments from the statement or marker within the script.
raw (str) – Raw wrap as parsed was parsed from.
sql (str) – Statement’s raw sql or None if marker.
results (pd.DataFrame) – Results returned by statement’s sql as a DataFrame; will be None if statement hasn’t been executed or if a marker.

hx¶

String representation of header level (e.g. ‘#’ for h1), based on the script/statement header-level specifications in snowmobile.toml.

Type: str

Instantiation of a script.Section object.

reorder_attrs(self, parsed: dict, cfg: snowmobile.core.Configuration) → Dict¶: Re-orders parsed attributes based on configuration.

parse_contents(self, cfg: snowmobile.core.Configuration) → List[Item]¶: Unpacks sorted dictionary of parsed attributes into formatted Items.

property header(self) → str ¶

Constructs the header for a section.

Uses specifications in snowmobile.toml to determine:

The level of the header depending on whether it’s a statement section or a script section.
Whether or not to include the statement index as part of the header.

Returns: Formatted header line as a string.

property sql_md(self) → str ¶: Returns renderable sql or an empty string if script-level section.

property body(self) → str ¶: All section content except for header.

property md(self) → str ¶

Constructs a full section as a string from various components.

Returns: Full string of valid markdown for the section.

class snowmobile.core.Scope(arg: str, base: str)¶

Bases: snowmobile.core.Generic

Handles the scope/context for Statement objects and derived classes.

Should never be interacted with from the user-facing API.

base¶

The left-most word within a statement wrap. For generic st this will be the keyword and for QA statements this will be the literal word qa.

Type: str

component¶

The component within a given wrap that is being evaluated; this will be exactly one of kw, obj, anchor, desc, or nm.

Type: str

incl_arg¶

The keyword argument that would be used to exclude a given component;

e.g. if component is kw, incl_arg would be incl_kw.

Type: str

excl_arg¶

The keyword argument that would be used to exclude a given component; this would be the same as the above example except the value would be excl_kw as opposed to incl_kw.

Type: str

fallback_to¶

The default values to fall back to for incl_arg and excl_arg if they are not passed as a keyword argument by the user in Script; defaults to including the base and excluding an empty list.

type
dict

provided_args (dict):

The set of keyword arguments provided at the time of the last call to eval().

check_against_args (dict):: The set of keyword arguments checked against at the time of the last call to eval(); will use provided arguments if they exist and the arguments from fallback_to otherwise.
is_included (bool):: Name is included based on the results of the last call to eval().
is_excluded (bool):: Name is excluded based on the results of the last call to eval().

Instantiates a Scope object.

parse_kwargs(self, **kwargs) → None ¶

Parses all filter arguments looking for those that match its base.

Looks for include/exclude arguments within kwargs, populating provided_args with those that were provided and populates check_against_args with the same values if they were provided and fills in defaults from fallback_to otherwise.

Parameters: **kwargs – Keyword arguments passed to Script.filter() (e.g. incl_kw, excl_kw, ..)

matches_patterns(self, arg: str) → bool ¶

Returns indication of if base matches a given set of patterns.

Parameters: arg (str) – Will either be the value of incl_arg or exclude_arg.

Returns (bool):: Indication of whether any matches were found.

property included(self)¶: Name is included based on results of last eval().

eval(self, **kwargs) → bool ¶

Evaluates filter arguments and updates context accordingly.

Updates the values of is_included, is_excluded, and included.

Parameters: **kwargs – Keyword arguments passed to Script.filter() (e.g. incl_kw, excl_kw, ..)

Returns (bool):: Indicator of whether or not the statement should be included/excluded based on the context/keyword arguments pr.

class snowmobile.core.Name(configuration: snowmobile.core.Configuration, nm_pr: Optional[str] = None, sql: Optional[str] = None, index: Optional[int] = None)¶

Bases: snowmobile.core.Generic

Handles the decomposition/parsing of statement name.

Should never be instantiated directly by the user-facing API but its attributes are likely to be accessed often as part of Statement and derived classes.

cfg¶

snowmobile.Configuration object; represents fully parsed snowmobile.toml file.

Type: snowmobile.Configuration

patt¶

snowmobile.Schema.Pattern object; represents script.patterns section of snowmobile.toml.

Type: snowmobile.Schema.Pattern

_nm_pr¶

Provided wrap name for a given Statement; can be empty.

Type: str

index¶

Statement index position within Script; can be empty.

Type: int

is_included¶

Indicator of whether or not the combination of all scopes for this statement wrap is included within a given context.

Type: bool

incl_idx_in_desc¶

Indicator of whether or not to include the statement index in the description component of the wrap; defaults to True so that all generated statement tags are guaranteed to be unique for a given script.

Mainly included for testing purposes where setting to False enables comparing generated to provided statement tags without having to change the index position of the hard-coded/pr statement wrap when adding/removing tests.

Type: bool

first_line_remainder¶

The remainder of the first line once excluding the first_keyword and stripping repeating whitespace.

Type: str

scopes¶

Combination of all scopes for a given wrap; this is essentially the all possible combinations of including/excluding any of the kw, nm, obj, desc, and anchor for a given instance of Name.

Type: set[Scope]

scope(self, **kwargs) → bool ¶

Evaluates all component’s of a wrap’s scope against a set of filter args.

**kwargs:
Keyword arguments passed to Script.filter() (e.g. incl_kw, excl_kw, ..)

Returns (bool):: Value indicating whether or not the statement should be included based on the outcome of the evaluation of all of its components.

nm(self, ge: bool = False, pr: bool = False, og: bool = True) → str ¶

The final statement’s name that is used by the API.

This will be the full statement name if a tag exists and a parsed/generated name otherwise.

kw(self, ge: bool = False, pr: bool = False)¶

The final statement’s keyword that is used by the API.

This will be the provided keyword if a statement wrap exists and a parsed/ge keyword otherwise.

obj(self, ge: bool = False, pr: bool = False)¶

The final statement’s object that is used by the API.

This will be the object within a wrap if a statement wrap exists and follows the correct structure and a parsed/ge object otherwise.

desc(self, ge: bool = False, pr: bool = False)¶

The final statement’s description that is used by the API.

This will be the description within a wrap if a statement wrap exists and follows the correct structure and a parsed/ge description otherwise.

anchor(self, ge: bool = False, pr: bool = False)¶

The final statement’s anchor that is used by the API.

This will be the anchor within a wrap if a statement wrap exists and follows the correct structure and a parsed/ge wrap name otherwise.

set(self, key, value) → snowmobile.core.name.Name ¶: Custom attribute setting.

class snowmobile.core.Statement(sn: snowmobile.core.connection.Snowmobile, statement: Union[sqlparse.sql.Statement, str], index: Optional[int] = None, attrs_raw: Optional[str] = None, e: Optional[ExceptionHandler] = None, **kwargs)¶

Bases: snowmobile.core.tag.Attrs, snowmobile.core.Name, snowmobile.core.Generic

Base class for all Statement objects.

Home for attributes and methods that are associated with all statement objects, generic or QA.

sn¶

snowmobile.connect object.

Type: snowmobile.connect

statement¶

A sqlparse.sql.Statement object.

Type: Union[sqlparse.sql.Statement, str]

index¶

The context-specific index position of a statement within a script; can be None.

Type: int

patterns¶

config.Pattern object for more succinct access to values specified in snowmobile.toml.

Type: config.Pattern

results¶

The results of the statement if executed as a pandas.DataFrame.

Type: pd.DataFrame

outcome¶

Numeric indicator of outcome; defaults to 0 and is modified based on the outcome of statement execution and/or QA validation for derived classes.

Type: int

outcome_txt¶

Plain text of outcome (‘skipped’, ‘failed’, ‘completed’, ‘passed’).

Type: str

outcome_html¶

HTML text for the outcome as an admonition/information banner based on the following mapping of outcome_txt to admonition argument:

failed ——> warning

completed –> info

passed —–> success

Type: str

start_time¶

Unix timestamp of the query start time if executed; 0 otherwise.

Type: int

end_time¶

Unix timestamp of the query end time if executed; 0 otherwise.

Type: int

execution_time¶

Execution time of the query in seconds if executed; 0 otherwise.

Type: int

execution_time_txt¶

Plain text description of execution time if executed; returned in seconds if execution time is less than 60 seconds, minutes otherwise.

Type: str

first_keyword¶

The first keyword within the statement as a sqlparse.sql.Token.

Type: sqlparse.sql.Token

sql¶

The sql associated with the statement as a raw string.

Type: str

Initialize self. See help(type(self)) for accurate signature.

sql(self, set_as: Optional[str] = None, tag: bool = False) → Union[str, Statement]¶: Raw sql from statement, including result limit if enabled.

parse(self) → Tuple[Dict, str]¶

Parses tag contents into a valid dictionary.

Uses the values specified in snowmobile.toml to parse a raw string of statement attributes into a valid dictionary.

Note

If is_multiline is True and name is not included within the arguments, an assertion error will be thrown.
If is_multiline is False, the raw string within the wrap will be treated as the name.
The wrap attribute is set once parsing is completed and name has been validated.

Returns (dict):: Parsed wrap arguments as a dictionary.

start(self)¶: Sets start_time attribute.

end(self)¶

Updates execution time attributes.

In namespace, sets:

end_time
execution_time
execution_time_txt

trim(self) → str ¶: Statement as a string including only the sql and a single-line wrap name.

Note

The wrap name used here will be the user-pr wrap from the original script or a generated Name.nm if a wrap was not provided for a given statement.

property is_derived(self)¶: Indicates whether or not it’s a generic or derived (QA) statement.

property lines(self) → List[str]¶: Returns each line within the statement as a list.

as_section(self, incl_sql_tag: Optional[bool] = None, result_wrap: Optional[str] = None) → snowmobile.core.Section ¶: Returns current statement as a Section object.

set_state(self, index: Optional[int] = None, ctx_id: Optional[int] = None, in_context: Optional[bool] = None, filters: dict = None) → snowmobile.core.statement.Statement ¶

Sets current state/context on a statement object.

Parameters

ctx_id (int) – Unix timestamp the script.filter() context manager was invoked.
filters (dict) – Kwargs passed to script.filter().
index (int) – Integer to set as the statement’s index position.

reset(self, index: bool = False, ctx_id: bool = False, in_context: bool = False, scope: bool = False) → snowmobile.core.statement.Statement ¶

Resets attributes on the statement object to reflect as if read from source.

In its current form, includes:

Resetting the statement/wrap’s index to their original values.
Resetting the is_included attribute of the statement’s wrap to True.
Populating error_last with errors from current context.
Caching current context’s timestamp and resetting back to None.

process(self)¶: Used by derived classes for post-processing the returned results.

run(self, as_df: bool = True, lower: bool = True, render: bool = False, on_error: Optional[str] = None, on_exception: Optional[str] = None, on_failure: Optional[str] = None, ctx_id: Optional[int] = None) → snowmobile.core.statement.Statement ¶

Run method for all statement objects.

Parameters

as_df (bool) – Store results of query as pandas.DataFrame or SnowflakeCursor.
lower (bool) – Lower case column names in results DataFrame if as_df=True.
render (bool) – Render the sql executed as markdown.
on_error (str) –
Behavior if an execution/database error is encountered
- None: default behavior, exception will be raised
- c: continue with execution
on_exception (str) –
Behavior if an exception is raised in the post-processing of results from a derived class of Statement ( Empty and Diff).
- None: default behavior, exception will be raised
- c: continue with execution
on_failure (str) –
Behavior if no error is encountered in execution or post-processing but the result of the post-processing has turned the statement’s outcome attribute to False, indicating the results returned by the statement have failed validation.
- None: default behavior, exception will be raised
- c: continue with execution

Returns (Statement):: Statement object post-executing query.

outcome_txt(self, _id: Optional[int] = None) → str ¶: Outcome as a string.

property outcome_html(self) → str ¶: Outcome as an html admonition banner.

class snowmobile.core.Column(original: str, current: Optional[str] = None, prior: Optional[str] = None, src: Optional[str] = None)¶

Bases: snowmobile.core.Generic

A single column within a SnowFrame.

original¶

Original column name.

Type: str

current¶

Current version of the column name.

Type: str

prior¶

Prior version of the column name (version n-1).

Type: str

src¶

Column source (original df or added by snowmobile).

Type: str

original¶

Original column name.

Type: str

src¶

Source of column; ‘df’ if from source DataFrame, ‘snowmobile’ otherwise.

Type: str

current¶

Current column name.

Type: str

prior¶

Prior (version of) column name.

Type: str

update(self)¶: Migrate from prior to current context within this context.

lower(self) → str ¶: Lower case column.

upper(self) → str ¶: Upper case column.

static dedupe(current: str, char: Optional[str] = None) → str ¶

Dedupes consecutive characters within a string.

Note

Must iterate through matches and perform replacements in the order of the largest to the smallest by number of characters; this is to avoid altering the matches found before replacing them.

Parameters

current (str) – String containing characters to dedupe.
char (str) – Character to dedupe.

reformat(self, fill_char: Optional[str] = None, dedupe_special: bool = True) → str ¶

Reformat column for a load to the database.

Parameters

fill_char (str) – Character to replace special characters and whitespace with; defaults to _.
dedupe_special (bool) – Dedupe consecutive special characters; defaults to True.

class snowmobile.core.Diff(sn: snowmobile.core.connection.Snowmobile = None, **kwargs)¶

Bases: snowmobile.core.qa.QA

QA class for comparison of values within a table based on partitioning on a field.

partition_on¶

Column name to partition data on before comparing the partitioned datasets; defaults to ‘src_description`.

Type: str

end_index_at¶

Column name that marks the last column to use as an index column when joining the partitioned datasets back together.

Type: str

compare_patterns¶

Regex patterns to match columns on that should be included in comparison (numeric columns you’re running QA on).

Type: list

ignore_patterns¶

Regex patterns to match columns on that should be ignored both for the comparison and the index.

Type: list

generic_metric_col_nm¶

Column name to use for the melted field names; defaults to ‘Metric’.

Type: str

compare_cols¶

Columns that are used in comparison once statement is executed and parsing is applied.

Type: list

drop_cols¶

Columns that are dropped once statement is executed and parsing is applied.

Type: list

idx_cols¶

Columns that are used for the index to join the data back together once statement is executed and parsing is applied.

Type: list

ub_raw¶

Maximum absolute raw difference (upper bound) that two fields that are being compared can differ from each other without causing a failure.

Type: float

ub_perc¶

Maximum absolute percentage difference (upper bound) that two comparison fields can differ from each other without causing a failure.

Type: float

Instantiates a qa-diff statement.

Parameters

delta_column_suffix (str) – Suffix to add to columns that comparison is being run on; defaults to ‘Delta’.
partition_on (str) – Column to partition the data on in order to compare.
end_index_at (str) – Column name that marks the last column to use as an index when joining the partitioned datasets back together.
compare_patterns (list) – Regex patterns matching columns to be included in comparison.
ignore_patterns (list) – Regex patterns to match columns on that should be ignored both for the comparison and the index.
generic_metric_col_nm (str) – Column name to use for the melted field names; defaults to ‘Metric’.
raw_upper_bound (float) – Maximum absolute raw difference that two fields that are being compared can differ from each other without causing a failure.
percentage_upper_bound (float) – Maximum absolute percentage difference that two comparison fields can differ from each other without causing a failure.

split_cols(self) → snowmobile.core.qa.Diff ¶

Post-processes results returned from a qa-diff statement.

Executes private methods to split columns into:

Index columns
Drop columns
Comparison columns

Then runs checks needed to ensure minimum requirements are met in order for a valid partition/comparison to be made.

property partitioned_by(self) → Set[Any]¶: Distinct values within the partition_on column that data is partitioned by.

static partitions_are_equal(partitions: Dict[str, pd.DataFrame], abs_tol: float, rel_tol: float) → bool ¶

Evaluates if a dictionary of DataFrames are identical.

Parameters

partitions (Dict[str, pd.DataFrame]) – A dictionary of DataFrames returned by snowmobile.DataFrame().
abs_tol (float) – Absolute tolerance for difference in any value amongst the DataFrames being compared.
rel_tol (float) – Relative tolerance for difference in any value amongst the DataFrames being compared.

Returns (bool):: Indication of equality amongst all the DataFrames contained in partitions.

process(self) → snowmobile.core.qa.Diff ¶: Post-processing for Diff-specific results.

class snowmobile.core.Empty(sn: snowmobile.core.connection.Snowmobile, **kwargs)¶

Bases: snowmobile.core.qa.QA

QA class for verification that a statement’s results are empty.

The most widely applicable use of Empty is for simple verification that a table’s dimensions are as expected.

Initialize self. See help(type(self)) for accurate signature.

process(self) → snowmobile.core.qa.QA ¶: Over-ride method; checks if results are empty and updates outcome

class snowmobile.core.SnowFrame(df: pandas.DataFrame)¶

Bases: snowmobile.core.Generic

Extends a DataFrame with a .snf entry point.

shared_cols(self, df2: pandas.DataFrame) → List[Tuple[pd.Series, pd.Series]]¶: Returns list of tuples containing column pairs that are common between two DataFrames.

static series_max_diff_abs(col1: pandas.Series, col2: pandas.Series, tolerance: float) → bool ¶: Determines if the max absolute difference between two pandas.Series is within a tolerance level.

static series_max_diff_rel(col1: pandas.Series, col2: pandas.Series, tolerance: float) → bool ¶: Determines if the maximum relative difference between two pandas.Series is within a tolerance level.

df_max_diff_abs(self, df2: pandas.DataFrame, tolerance: float) → bool ¶: Determines if the maximum absolute difference between any value in the shared columns of 2 DataFrames is within a tolerance level.

df_max_diff_rel(self, df2: pandas.DataFrame, tolerance: float) → bool ¶: Determines if the maximum relative difference between any value in the shared columns of 2 DataFrames is within a tolerance level.

df_diff(self, df2: pandas.DataFrame, abs_tol: Optional[float] = None, rel_tol: Optional[float] = None) → bool ¶

Determines if the column-wise difference between two DataFrames is within a relative or absolute tolerance level.

Note

df1 and df2 are assumed to have a shared, pre-defined index.
Exactly one of abs_tol and rel_tol is expected to be a a valid float; the other is expected to be None.
If valid float values are provided for both abs_tol and rel_tol, the outcome of the maximum absolute difference with respect to abs_tol will be returned regardless of the value of rel_tol.

Parameters

df2 (pd.DataFrame) – 2nd DataFrame for comparison.
abs_tol (float) – Absolute tolerance; default is None.
rel_tol (float) – Relative tolerance; default is None.

Returns (bool):: Boolean indicating whether or not difference is within tolerance.

partitions(self, on: str) → Dict[str, pd.DataFrame]¶

Returns a dictionary of DataFrames given a DataFrame and a partition column.

Note

The number of distinct values within partition_on column will be 1:1 with the number of partitions that are returned.
The partition_on column is dropped from the partitions that are returned.
The depth of a vertical concatenation of all partitions should equal the depth of the original DataFrame.

Parameters: on (str) – The column name to use for partitioning the data.

Returns (Dict[str, pd.DataFrame]):: Dictionary of {(str) partition_value: (pd.DataFrame) associated subset of df}

ddl(self, table: str) → str ¶: Returns a string containing ‘create table’ DDL given a table name

lower(self, col: Optional[str] = None) → pandas.DataFrame ¶: Lower cases all column names or all values within col if pr.

upper(self, col: Optional[str] = None) → pandas.DataFrame ¶: Upper cases all column names or all values within col if pr.

reformat(self)¶: Re-formats DataFrame’s columns via Column.reformat().

append_dupe_suffix(self)¶: Adds a trailing index number ‘_i’ to duplicate column names.

to_list(self, col: Optional[str] = None, n: Optional[int] = None) → List¶

Succinctly retrieves a column as a list.

Parameters

col (str) – Name of column.
n (int) – Number of records to return; defaults to full depth of column.

add_tmstmp(self, col_nm: Optional[str] = None) → pandas.DataFrame ¶

Adds a column containing the current timestamp to a DataFrame.

Parameters: col_nm (str) – Name for column; defaults to LOADED_TMSTMP.

property original(self) → pandas.DataFrame ¶: Returns the DataFrame in its original form (drops columns added by SnowFrame and reverts to original column names).

property has_dupes(self) → bool ¶: DataFrame has duplicate column names.

cols_matching(self, patterns: List[str], ignore_patterns: List[str] = None) → List[str]¶

Returns a list of columns given a list of patterns to find.

Parameters

patterns (List[str]) – List of regex patterns to match columns on.
ignore_patterns (List[str]) – Optional list of regex patterns to exclude.

Returns (List[str]):: List of columns found/excluded.

cols_ending(self, nm: str, ignore_patterns: Optional[List] = None) → List[str]¶

Returns all columns up to nm in a DataFrame.

Parameters

nm (str) – Name of column to end index at.
ignore_patterns (List[str]) – Optional list of regex patterns to exclude in the list that’s returned; primarily used to for getting end-index-at list while excluding src_description.

Returns (List[str]):: List of column names matching criterion.

class snowmobile.core.SQL(_query_func: Callable, _cfg: snowmobile.core.configuration.Configuration, nm: Optional[str] = None, schema: Optional[str] = None, obj: Optional[str] = None, auto_run: Optional[bool] = True)¶

Bases: snowmobile.core.Generic

SQL class for generation & execution of common sql commands.

Intended to be interacted with as a parent of Snowmobile.

Note

All arguments except for sn are optional.
The benefit of setting the other attributes on an instance of SQL is to (optionally) avoid passing the same information to multiple methods when generating a variety of statements around the same object.

nm¶

Object name to use in generated sql (e.g. ‘some_table_name’)

Type: str

obj¶

Object type to use in generated sql (e.g. ‘table’)

Type: str

schema¶

Schema to use when dot-prefixing sql; defaults to the schema with which the sn is connected to.

Type: str

auto_run¶

Indicates whether to automatically execute the sql generated by a given method; defaults to True

Type: bool

Initializes a snowmobile.SQL object.

info_schema(self, loc: str, where: Optional[List[str]] = None, fields: Optional[List[str]] = None, order_by: Optional[List] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶: Generic case of selecting from information schema location.

table_info(self, nm: Optional[str] = None, fields: List[str] = None, restrictions: Dict[str, str] = None, order_by: List[Optional[str, int]] = None, all_schemas: bool = False, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶

Query information_schema.tables for a given table or view.

Parameters

nm (str) – Table name, including schema if creating a stage outside of the current schema.
fields (List[str]) – List of fields to include in returned results (e.g. [‘table_name’, ‘table_type’, ‘last_altered’])
restrictions (List[str]) – List of conditionals typed as literal components of a where clause (e.g. [“table_type = ‘base table’”, ‘last_altered::date = current_date()’] ).
order_by (List[str]) – List of fields or their ordinal positions to order the results by.
all_schemas (bool) – Include tables/views from all schemas; defaults to False.
run (bool) – Determines whether to run the generated sql or not; defaults to None which will reference the current value of the auto_run attribute which defaults to True.

Returns (Union[str, pd.DataFrame]):

Either:

The results of the query as a pandas.DataFrame, or
The generated query as a str of sql.

column_info(self, nm: Optional[str] = None, fields: Optional[List] = None, restrictions: Optional[Dict] = None, order_by: Optional[List] = None, all_schemas: bool = False, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶

Query information_schema.columns for a given table or view.

Parameters

nm (str) – Table name, including schema if creating a stage outside of the current schema.
fields (List[str]) – List of fields to include in returned results (e.g. [‘ordinal_position’, ‘column_name’, ‘data_type’])
restrictions (List[str]) – List of conditionals typed as literal components of a where clause (e.g.[“regexp_count(lower(column_name), ‘tmstmp’) = 0”]).
order_by (List[str]) – List of fields or their ordinal positions to order the results by.
all_schemas (bool) – Include tables/views from all schemas; defaults to False.
run (bool) – Determines whether to run the generated sql or not; defaults to None which will reference the current value of the auto_run attribute which defaults to True.

Returns (Union[str, pd.DataFrame]):

Either:

The results of the query as a pandas.DataFrame, or
The generated query as a str of sql.

columns(self, nm: Optional[str] = None, from_info_schema: bool = False, lower: bool = False, run: Optional[bool] = None) → Union[str, List]¶

Returns an ordered list of columns for a table or view.

Note

Default behavior is to retrieve the columns for a table or view by selecting a single sample record and returning the column index from the DataFrame that’s returned which is much faster than selecting the column_names from information_schema.columns pulling column names from the information schema
This can be changed by passing from_info_schema=True.

Parameters

nm (str) – Name of table or view, including schema if the table or view is outside of the current schema.
from_info_schema (bool) – Indicates whether to retrieve columns via the information_schema.columns or by selecting a sample record from the table or view; defaults to False.
lower (bool) – Lower case each column in the list that’s returned.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

Returns (Union[str, List]):

Either:

An ordered list of columns for the table or view, or
The query against the table or view as a str of sql.

select(self, nm: Optional[str] = None, fields: Optional[List[str]] = None, apply: Optional[List[Tuple[str, str]]] = None, n: Optional[int] = None, run: Optional[bool] = None, **kwargs) → Union[str, pd.DataFrame]¶

Generic select statement.

Parameters

nm (str) – Table to select from, including schema if the table is outside of the current schema
fields (Optional[List[str]]) – Select these fields (optional).
apply (Optional[List[Tuple[str, str]]]) –
Select aggregations of these fields.

apply [
(this_func, to_this_field, [as_alias]), (.., .., [..]),

]
- apply should be provided as a list of tuples, each containing a minimum of 2 items (respectively) representing the aggregate function to apply and the field to which it should be applied
- By default, the aggregated result inherits the name of the field being aggregated, including any qualifier (optionally) provided with the field name or an explicit alias included as a 3rd item within the tuple
  
  The following snippet exhaustively illustrates the functionality described above
```
sn.select(
   nm='sandbox.sample_table',
   apply=[
       ('count', 'col1'),
       ('count', 'distinct col1'),
       ('count', 'distinct col1', 'col1_dst'),
   ],
   run=False,
)
>>>
select
 count(col1)
   as col1
 ,count(distinct col1)
   as distinct_col1
 ,count(distinct col1)
   as col1_dst
from sandbox.sample_table
```
n (int) – Number of records to return, implemented as a ‘limit’ clause in the query; defaults to 1.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

Returns (Union[str, pd.DataFrame]):

Either:

The results of the query as a pandas.DataFrame, or
The generated query as a str of sql.

exists(self, nm: Optional[str] = None) → bool ¶

Checks the existence of a table or view.

Parameters: nm (str) – Name of table or view, including schema if the table or view is outside of the current schema.

Returns (bool):: Boolean indication of whether or not the table or view exists.

is_distinct(self, nm: Optional[str] = None, field: Optional[str] = None) → bool ¶

Checks if table nm is distinct on column on_col

Parameters

nm (str) – Table name.
field (str) – Column name.

count(self, nm: Optional[str] = None, of: Optional[str] = None, dst_of: Optional[str] = None, as_perc: Optional[bool] = None, run: Optional[bool] = None) → Union[int, float]¶

Number of records within a table or view.

Parameters

nm (str) – Table name, including schema if querying outside current schema.
of (str) – Column name (indistinct).
dst_of (str) – Column name (distinct).
as_perc (bool) – Option to return distinct count of the dst_of column as a percentage of the namespace depth of the table or view.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

Returns (Union[str, pd.DataFrame]):

Either:

The results of the query as a pandas.DataFrame, or
The generated query as a str of sql.

show(self, obj: str, in_loc: Optional[str] = None, names: bool = False, run: Optional[bool] = None, **kwargs) → Union[pd.DataFrame, List[str], str]¶

Show schema objects of typ ‘obj’, optionally ‘in_loc’.

Parameters

obj (str) – Schema object type (‘tables’, ‘file formats’, etc).
in_loc (str) – Snowflake location (‘in schema sandbox’, ‘in database prod’, etc).
names (bool) – Return a list of schema object names only (‘name’ field).
run (bool) – Execute the generated sql or return it as a string.

Returns (Union[pd.DataFrame, str]):

Either:

The results of the query as a pandas.DataFrame
The ‘names’ column of the results returned as a list
The generated query as a str of sql

ddl(self, nm: Optional[str] = None, obj: Optional[str] = None, run: Optional[bool] = None) → str ¶

Query the DDL for an schema object.

Parameters

nm (str) – Name of the object to get DDL for, including schema if object is outside of the current schema.
obj (str) – Type of object to get DDL for (e.g. ‘table’, ‘view’, ‘file-format’).
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

Returns (str):

Either:

The results of the query as a pandas.DataFrame, or
The generated query as a str of sql.

comment(self, nm: Optional[str] = None, obj: Optional[str] = None, set_as: Optional[str] = None, from_json: bool = False, as_json: bool = False, run: Optional[bool] = None, **kwargs) → Union[str, Dict]¶

Get or set comment on a schema object.

Parameters

nm (str) – Name of the schema object, including schema prefix if object is outside implicit scope of the current connection.
obj (str) – Type of schema object (e.g. ‘table’, ‘schema’, etc).
set_as (str) – Content to set as comment on schema object.
from_json (bool) – Parse schema object comment as a string of json and return it as a dictionary.
as_json (bool) – Dump contents of ‘set_as’ to a string of json prior to setting comment.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
**kwargs – Keyword argument to pass to json.loads(comment) if from_json=True.

Returns (Union[str, pd.DataFrame]):

Either:

The schema object comment as a str
The generated query as a str of sql.
The schema object comment as a dictionary if from_json=True

last_altered(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.Timestamp]¶

Last altered timestamp for a table or view.

Parameters

nm (str) – Table name, including schema if creating a stage outside of the current schema.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

Returns (Union[str, pd.DataFrame]):

Either:

The results of the query as a pandas.DataFrame, or
The generated query as a str of sql.

truncate(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶

Truncate a table.

Parameters

nm (str) – Name of table, including schema if the table is outside of the current schema.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

Returns (Union[str, pd.DataFrame]):

Either:

The results of the query as a pandas.DataFrame, or
The generated query as a str of sql.

drop(self, nm: Optional[str] = None, obj: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶

Drop a Snowflake object.

Parameters

nm (str) – Schema object’s name.
obj (str) – Type of schema object (e.g. ‘table’, ‘view’, or ‘schema’)
run (bool) – Execute generated statement; defaults to True, otherwise returns sql as a string.

Returns (Union[str, pd.DataFrame]):

Either:

The results of the query as a pandas.DataFrame, or
The generated query as a str of sql.

clone(self, nm: Optional[str] = None, to: Optional[str] = None, obj: Optional[str] = None, run: Optional[bool] = None, replace: bool = False) → Union[str, pd.DataFrame]¶

Clone a Snowflake object.

Warning

Make sure to read Snowflake’s documentation for restrictions and considerations when cloning objects.

Note

In this specific method, the value provided to nm and to can be a single object name, a single schema, or both in the form of obj_schema.obj_name depending on the desired outcome.
Additionally, at least one of the nm or to arguments must be pr.
The defaults for the target object are constructed such that users can either:
1. Clone objects to other schemas that inherit the source object’s name without specifying so in the to argument, or
2. Clone objects within the current schema that inherit the source object’s schema without specifying so in the to argument.
If providing a schema without a name to either argument, prefix the value provided with __ to signify it’s a schema and not a lower-level object to be cloned.
- e.g. providing nm=’sample_table’ and to=’__sandbox’ will clone sample_table from the current schema to sandbox.sample_table.
An assertion error will be raised raised if neither argument is specified as this would result in a command to clone an object and store it in an object that has the same name & schema as the object being cloned.

Parameters

nm (str) – Name of the object to clone, including schema if cloning an object outside of the current schema.
to (str) – Target name for cloned object, including schema if cloning an object outside of the current schema.
obj (str) – Type of object to clone (e.g. ‘table’, ‘view’, ‘file-format’); defaults to table.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
replace (bool) – Indicates whether to replace an existing stage if pre-existing; default is False.

Returns (Union[str, pd.DataFrame]):

Either:

The results of the query as a pandas.DataFrame, or
The generated query as a str of sql.

create_stage(self, nm_stage: str, nm_format: str, replace: bool = False, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶

Create a staging table.

Parameters

nm_stage (str) – Name of stage to create, including schema if creating a stage outside of the current schema.
nm_format (str) – Name of file format to specify for the stage, including schema if using a format from outside of the current schema.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
replace (bool) – Indicates whether to replace an existing stage if pre-existing; default is False.

Returns (Union[str, pd.DataFrame]):

Either:

The results of the query as a pandas.DataFrame, or
The generated query as a str of sql.

put_file_from_stage(self, path: Union[Path, str], nm_stage: str, options: Optional[Dict] = None, ignore_defaults: bool = False, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶

Generates a ‘put’ command into a staging table from a local file.

Parameters

path (Union[Path, str]) – Path to local data file as a pathlib.Path or string.
nm_stage (str) – Name of the staging table to load into.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
options (dict) – Optional arguments to add to put statement in addition to the values specified in the loading.put section of snowmobile.toml.
ignore_defaults (bool) – Option to ignore the values specified in snowmobile.toml; defaults to False.

Returns (Union[str, pd.DataFrame]):

Either:

The results of the query as a pandas.DataFrame, or
The generated query as a str of sql.

copy_into_table_from_stage(self, nm: str, nm_stage: str, options: Optional[Dict] = None, ignore_defaults: bool = False, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶

Generates a command to copy data into a table from a staging table.

Parameters

nm (str) – Name of the object to drop, including schema if creating a stage outside of the current schema.
nm_stage (str) – Name of the staging table to load from.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
options (dict) – Optional arguments to add to put statement in addition to the values specified in the loading.put section of snowmobile.toml.
ignore_defaults (bool) – Option to ignore the values specified in snowmobile.toml; defaults to False.

Returns (Union[str, pd.DataFrame]):

Either:

The results of the query as a pandas.DataFrame, or
The generated query as a str of sql.

current(self, obj: str, run: Optional[bool] = None) → Union[str, Union[str, int]]¶

Generic implementation of ‘select current’ for session-based objects.

Parameters

obj (str) – Type of object to retrieve information for (schema, session, ..).
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

Returns (Union[str, pd.DataFrame]):

Either:

The results of the query as a pandas.DataFrame, or
The generated query as a str of sql.

current_session(self, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶: Select the current session.

current_schema(self, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶: Select the current schema.

current_database(self, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶: Select the current database.

current_warehouse(self, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶: Select the current warehouse.

current_role(self, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶: Select the current role.

use(self, obj: str, nm: str, run: Optional[bool] = None)¶

Generic implementation of ‘use’ command for schema objects.

Parameters

nm (str) – Name of object to use (schema name, warehouse name, role name, ..).
obj (str) – Type of object to use (schema, warehouse, role, ..).
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

Returns (Union[str, pd.DataFrame]):

Either:

The results of the query as a pandas.DataFrame, or
The generated query as a str of sql.

use_schema(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶: Use schema command.

use_database(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶: Use database command.

use_warehouse(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶: Use warehouse command.

use_role(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶: Use role command.

static order(by: List[Union[int, str]]) → str ¶: Generates ‘order by’ clause from a list of fields or field ordinal positions.

static where(restrictions: Dict) → str ¶

Generates a ‘where’ clause based on a dictionary of restrictions.

Parameters: restrictions (dict) – A dictionary of conditionals where each key/value pair respectively represents the left/right side of a condition within a ‘where’ clause.

Returns (str):: Formatted where clause.

static fields(fields: Optional[List[str]] = None) → str ¶: Utility to generate fields within a ‘select’ statement.

class snowmobile.core.Markup(sn: snowmobile.core.connection.Snowmobile, path: pathlib.Path, contents: Dict[int, Union[Statement, Marker]], nm: Optional[str] = None, prefix: Optional[str] = None, suffix: Optional[str] = None, root_dir: Optional[Union[str, Path]] = None, sub_dir: Optional[str] = None, incl_sql: bool = True, incl_markers: bool = True, incl_sql_tag: bool = False, incl_exp_ctx: bool = True, result_wrap: Optional[str] = None)¶

Bases: snowmobile.core.Generic

Contains all sections within the context of a Script.

Parameters

sn (Snowmobile) – A Snowmobile instance.
path (Path) – A full path to the sql file that script was instantiated from.
contents (Dict[int, Union[Statement, Marker]]) – A dictionary of the script’s contents (st and markers) by index position.
nm (Optional[str]) – Alternate file name to use; defaults to path.name.
prefix (Optional[str]) – Prefix to prepend to original file name when exporting.
suffix (Optional[str]) – Suffix to append to original file name when exporting.
root_dir (Optional[Union[str, Path]]) – Alternate target directory for exports; defaults to ./.snowmobile where . is the directory containing the sql file that the script was created from.
sub_dir (Optional[str]) – Alternate sub-directory name; defaults to path.name where path is a full Path to the sql file that the script was created from.
incl_sql (bool) – Include statements in export.
incl_markers (bool) – Include markers in export.
incl_sql_tag (bool) – Include the raw wrap in the sql that is rendered in the md export.
incl_exp_ctx (bool) – Include (configurable) disclaimer at the top of exported sql file.

exported¶

List of file paths that current instance has exported to.

Type: List[Path]

created¶

List of directory paths that current instance has created (should mostly apply for initial scaffolding build on first run only).

Type: List[Path]

property export_dir(self) → pathlib.Path ¶: Documentation sub-directory; .snowmobile by default.

property sections(self) → Dict[int, Section]¶: Dictionary of all sections by index position.

property markdown(self) → str ¶: Full markdown file as a string.

property sql(self)¶: SQL for save.

save(self, md: bool = True, sql: bool = True) → None ¶

Save files to disk.

Parameters

md (bool) – Export a generated markdown file.
sql (bool) – Export a generated sql file.

class snowmobile.core.Script(sn: Optional[Snowmobile] = None, path: Optional[Path, str] = None, sql: Optional[str] = None, as_generic: bool = False, delay: bool = True, **kwargs)¶

Bases: snowmobile.core.Generic

Parser and operator of local sql files.

Parameters

sn (snowmobile.core.connection.Snowmobile) – An instance of Snowmobile.
path (Optional[str]) – A full path to a sql file or readable text file containing valid sql code.
path – A raw string of valid sql code as opposed to reading from a path.
as_generic (bool) – Instantiate all statements as generic st; skips all checks for a mapping of a statement anchor to a derived statement class to instantiate in the place of a generic Statement.
delay (bool) – Delay connection of the Snowmobile; only applicable if the sn argument is omitted and Script is instantiating a Snowmobile in its absence.
**kwargs – Any keyword arguments to pass to Snowmobile; only applicable if the sn argument is omitted and Script is instantiating a Snowmobile in its absence

sn¶

An instance of Snowmobile

Type: snowmobile.core.connection.Snowmobile

patterns¶

Configured patterns from snowmobile.toml.

Type: snowmobile.core.cfg.script.Pattern

as_generic¶

Instantiate all statements as generic st; skips all checks for a mapping of a statement anchor to a derived statement class to instantiate in the place of a generic Statement.

Type: bool

filters¶

Dictionary of filters that have been passed to the current instance of snowmobile.core.Script.

Type: Dict[Any[str, int], Dict[str, Set]]

markers¶

Dictionary of all markers found in the script.

Type: Dict[int, cfg.Marker]

path¶

Path to sql file (e.g. full/path/to/script.sql).

Type: Path

name¶

Name of sql file (e.g. script.sql).

Type: str

source¶

Raw sql text of script; will be the text contained in the raw sql file when initially read from source and reflect any modifications to the script’s contents made post-instantiation.

Type: str

read(self, path: pathlib.Path = None) → snowmobile.core.script.Script ¶

Runs quick path validation and reads in a sql file as a string.

A valid path must be provided if the script.path attribute hasn’t been set; ValueErrors will be thrown if neither is valid.

Parameters: path (pathlib.Path) – Full path to a sql object.

from_str(self, sql: str, name: str, directory: pathlib.Path = Path.cwd()) → snowmobile.core.script.Script ¶: Instantiates a raw string of sql as a script.

source(self, original: bool = False) → str ¶: The script’s sql as a raw string.

parse_one(self, s: Union[sqlparse.sql.Statement, str], index: Optional[int] = None, nm: Optional[str] = None) → None ¶

Adds a statement object to the script.

Default behavior will only add sqlparse.sql.Statement objects returned from script.source_stream.

clean_parse() utility function is utilized so that generated sql within Python can be inserted back into the script as raw strings.

Parameters

s (Union[sqlparse.sql.Statement, str]) – A sqlparse.sql.Statement object or a raw string of SQL for an individual statement.
index (int) – Index position of the statement within the script; defaults to n + 1 if index is not provided where n is the number of statements within the script at the time parse_one() is called.
nm (Optional[str]) – Optionally provided the name of the statement being added; the script instance will treat this value as if it were provided within an in-script wrap.

parse_stream(self, stream: str) → None ¶: Parses a stream of sql and adds onto existing Script contents.

filter(self, incl_kw: Optional[List[str], str] = None, incl_obj: Optional[List[str], str] = None, incl_desc: Optional[List[str], str] = None, incl_anchor: Optional[List[str], str] = None, incl_nm: Optional[List[str], str] = None, excl_kw: Optional[List[str], str] = None, excl_obj: Optional[List[str], str] = None, excl_desc: Optional[List[str], str] = None, excl_anchor: Optional[List[str], str] = None, excl_nm: Optional[List[str], str] = None, as_id: Optional[Union[str, int]] = None, from_id: Optional[Union[str, int]] = None, last: bool = False) → ContextManager[Script]¶

Subset the script based on attributes of its st.

script.filter() returns a modified instance of script that can be operated on within the context defined.

Note

Keyword arguments beginning with incl or excl expect a string or a list of strings containing regex patterns with which to check for a match against the associated attribute of its st’ Name.

Parameters

incl_kw – Include only kw
incl_obj – Include only obj
incl_desc – Include only desc
incl_anchor – Include only anchor
incl_nm – Include only nm
excl_kw – Exclude kw
excl_obj – Exclude obj
excl_desc – Exclude desc
excl_anchor – Exclude anchor
excl_nm – Exclude nm
as_id – ID to assign the filters passed to method; used to populated the filters attribute
from_id – ID previously used on the same instance of Script from which to populate filtered arguments
last – Re-use the last set of filters passed to context manager.

Returns (Script):: The instance of script based on the context imposed by arguments pr.

property depth(self) → int ¶: Count of statements in the script.

property lines(self) → int ¶: Number of lines in the script

property excluded(self)¶: All statements by index position excluded from the current context.

property executed(self) → Dict[int, Statement]¶: Executed statements by index position included in the current context.

reset(self, index: bool = False, ctx_id: bool = False, in_context: bool = False, scope: bool = False, _filter: bool = False) → snowmobile.core.script.Script ¶

Resets indices and scope on all statements to their state as read from source.

Invoked before exiting filter() context manger to reverse the revised indices set by index_to() and inclusion/ exclusion scope set by Statement.Name.scope().

property duplicates(self) → Dict[str, int]¶: Dictionary of indistinct statement names/tags within script.

s(self, _id: Optional[str, int] = None) → Any[Statement, Empty, Diff]¶: Fetch a single statement by _id.

property st(self) → Dict[Union[int, str], Statement]¶: Accessor for all statements.

dtl(self, full: bool = False, excluded: bool = False, title: bool = True, r: bool = False) → Union[str, None]¶: Prints summary of statements within the current scope to console.

property first_s(self)¶: First statement by index position.

property last_s(self)¶: Last statement by index position

property first(self) → Union[Statement, Empty, Diff]¶: First statement executed.

property last(self) → Union[Statement, Empty, Diff]¶: Last statement executed.

doc(self, nm: Optional[str] = None, prefix: Optional[str] = None, suffix: Optional[str] = None, incl_markers: Optional[bool] = True, incl_sql: Optional[bool] = True, incl_exp_ctx: Optional[bool] = True, result_wrap: Optional[str] = None) → snowmobile.core.Markup ¶

Returns a Markup from the script.

Parameters

nm (Optional[str]) – Alternate file name to use.
prefix (Optional[str]) – Prefix for file name.
suffix (Optional[str]) – Suffix for file name.
incl_markers (Optional[bool]) – Include markers in exported files.
incl_sql (Optional[bool]) – Include sql in exported files.
incl_exp_ctx (Optional[bool]) – Include disclaimer of programmatic save in exported sql file.

Returns

A Markup instance based on the contents included in the script’s context.

ids(self, _id: Optional[Union[Tuple, List]] = None) → List[int]¶

Utility function to get a list of statement IDs given an _id.

Invoked within script.run() if the _id parameter is either a:

tuple of integers (lower and upper bound of statement indices to run)
list of integers or strings (statement names or indices to run)
default=None; returns all statement indices within scope if so

Parameters: _id (Union[Tuple, List]) – _id field provided to script.run() if it’s neither an integer or a string.

Returns (List[int]):: A list of statement indices to run.

run(self, _id: Optional[str, int, Tuple[int, int], List] = None, as_df: bool = True, on_error: Optional[str] = None, on_exception: Optional[str] = None, on_failure: Optional[str] = None, lower: bool = True, render: bool = False, **kwargs) → None ¶

Performs statement-by-statement execution of the script’s contents.

Executes script’s contents that are included within its current context and any (optional) value passed to the _id argument.

Note

Keyword arguments on_exception and on_failure are only applicable to derived classes of Statement (e.g., those within snowmobile.core.qa by default).

Parameters

_id (Optional[str, int, Tuple[int, int], List]) –
Identifier for statement(s) to execute, can be either:
- None (default); execute all statements
- A single statement’s nm
- A single statement’s index position
- A tuple of lower/upper index bounds of statements to execute
- A list of statement names or index positions to execute
as_df (bool) – Store statement’s results as a DataFrame; defaults to True
on_error (Optional[str]) – Action to take on execution error; providing c will continue execution as opposed to raising exception.
on_exception (Optional[str]) – Action to take on post-processing error from a derived Statement; providing c will continue execution as opposed to raising exception.
on_failure (Optional[str]) – Action to take on failure of post-processing assertion from a derived Statement; providing c will continue execution as opposed to raising exception.
lower (bool) – Lower-case columns in results returned if as_df=True.
render (bool) – Render sql executed as markdown; only applicable in Jupyter/iPython environments.
**kwargs –

items(self, by_index: bool = True, ignore_scope: bool = False, statements: bool = True, markers: bool = False, validate: bool = True) → ItemsView[Union[int, str], Union[Statement, Marker]]¶: Dunder items.

keys(self, **kwargs) → KeysView[Union[int, str]]¶: Access keys of items only.

values(self, **kwargs) → ValuesView[Union[int, str]]¶: Access values of items only.

dict(self, **kwargs) → Dict¶: Unpacking items view into an actual dictionary.

class snowmobile.core.Table(df: pandas.DataFrame, table: str, sn: Optional[Snowmobile] = None, if_exists: Optional[str] = None, as_is: bool = False, path_ddl: Optional[Union[str, Path]] = None, path_output: Optional[str, Path] = None, file_format: Optional[str] = None, incl_tmstmp: Optional[bool] = None, tmstmp_col_nm: Optional[str] = None, reformat_cols: Optional[bool] = None, validate_format: Optional[bool] = None, validate_table: Optional[bool] = None, upper_case_cols: Optional[bool] = None, lower_case_table: Optional[bool] = None, keep_local: Optional[bool] = None, on_error: Optional[str] = None, check_dupes: Optional[bool] = None, load_copy: Optional[bool] = None, **kwargs)¶

Bases: snowmobile.core.Generic

Constructed with a DataFrame and a table name to load into.

The df and table’s compatibility can be inspected prior to calling the Table.load() method or by providing as_is=True` when instantiating the object; the latter will kick off the loading process invoked by .load() based on the parameters provided to snowmobile.Table().

Parameters

df (DataFrame) – The DataFrame to load.
table (str) – The table name to load df into.
sn (Optional[Snowmobile]) – An instance of Snowmobile; can be used to load a table on a specific connection or from a specific snowmobile.toml file.
if_exists (Optional[str]) – Action to take if table already exists - options are fail, replace, append, and truncate; defaults to append.
as_is (bool) – Load df into table based on the parameters provided to Table without further pre-inspection by the user; defaults to False.
path_ddl (Optional[Path]) – Alternate path to file format DDL to use for load.
keep_local (Optional[bool]) – Keep local file that is written out as part of the bulk loading process; defaults to False.
path_output (Optional[str Path]) – Path to write output local file to; defaults to a generated file name exported in the current working directory.
file_format (Optional[str]) – The name of the file_format to use when loading df; defaults to snowmobile_default_psv.
incl_tmstmp (Optional[bool]) – Include timestamp of load as part of table; defaults to True.
tmstmp_col_nm (Optional[str]) – Name to use for load timestamp if incl_tmstmp=True; defaults to loaded_tmstmp.
upper_case_cols (Optional[bool]) – Upper case columns of df when loading into table; defaults to True.
reformat_cols (Optional[bool]) –
Reformat applicable columns of df to be DB-compliant; defaults to True.
Reformatting primarily entails:
- Replacing spaces and special characters with underscores
- De-duping consecutive special characters
- De-duping repeated column names; adds an _i suffix to duplicate fields where i is the nth duplicate name for a field
validate_format (Optional[bool]) –
Validate the file format being used prior to kicking off the load; defaults to True.
Validation entails:
- Checking if the file format being used already exists based on formats accessible to the current connection
- Executing DDL for the file format being used if not, pulled from the DDL ext-location and the statement name create file format~{format name}
Tip

Providing validate_format=False will speed up loading time when batch-loading into an existing table by skipping this step
validate_table (Optional[bool]) –
Perform validations of df against table prior to kicking off the loading process; defaults to True.
Validation entails:
- Checking the existence of table; no further validation is performed if it does not exist
- Compares the columns of df to the columns of table and stores results for use during loading process
Note

Table validation results are used in conjunction with the if_exists parameter to determine the desired behavior based on the (potential) existence of table and its compatibility with df.

Tip

Providing validate_table=False will speed up loading time time when batch-loading into an existing table
lower_case_table (Optional[bool]) – Lower case table name; defaults to False.
on_error (Optional[str]) – Action to take if an exception is encountered as part of the validating or loading process - providing on_error='c' will continue past an exception as opposed to raising it; defaults to None meaning any exception encountered will be raised
check_dupes (Optional[bool]) – Check for duplicate field names in df; defaults to True.
load_copy (Optional[bool]) – Alter and load a deep copy of df as opposed to the df in-memory as passed to the parameter; defaults to True.

db_responses¶

Responses from database during loading process.

Type: Dict[str, str]

loaded¶

Table was loaded successfully.

Type: bool

load(self, if_exists: Optional[str] = None, from_script: pathlib.Path = None, verbose: bool = True, **kwargs) → snowmobile.core.table.Table ¶

Loads df into table.

Parameters

if_exists (Optional[str]) – Determines behavior to take if the table being loaded into already exists; defaults to append; options are replace, append, truncate, and fail
from_script (Optional[Union[Path, str]]) – Path to sql file containing custom DDL for table; DDL is assumed to have a valid statement name as is parsed by Script and following the naming convention of create table~TABLE where TABLE is equal to the value provided to the table keyword argument
verbose (bool) – Verbose console output; defaults to True

Returns (Table):: The Table after attempting load of df into table; a successful load can be verified by inspecting loaded

property exists(self) → bool ¶: Indicates if the target table exists.

col_diff(self, mismatched: bool = False) → Dict[int, Tuple[str, str]]¶: Returns diff detail of local DataFrame to in-warehouse table.

property cols_match(self) → bool ¶: Indicates if columns match between DataFrame and table.

load_statements(self, from_script: pathlib.Path) → List[str]¶: Generates exhaustive list of the statements to execute for a given instance of loading a DataFrame.

to_local(self, quote_all: bool = True) → None ¶: Export to local file via configuration in snowmobile.toml.

property tm_load(self) → int ¶: Seconds elapsed during loading.

property tm_validate_load(self) → int ¶: Seconds elapsed during validation.

property tm_total(self) → int ¶: Total seconds elapsed for load.

validate(self, if_exists: str) → None ¶

Validates load based on current state through a variety of operations.

Parameters: if_exists (str) – Desired behavior if table already exists; intended to be passed in from table.load() by default.

snowmobile.core¶

Subpackages¶

Submodules¶

Package Contents¶

Classes¶

`snowmobile.core`¶