snowmobile.core
¶
snowmobile
lives in snowmobile.core
to keep from cluttering intellisense/autocomplete while interacting with the API.
Subpackages¶
Submodules¶
Package Contents¶
Classes¶
Generic dunder implementation for | |
All snowmobile classes contain a | |
A parsed snowmobile.toml file. | |
Primary method of statement execution and accessor to parsed snowmobile.toml. | |
Primary method of statement execution and accessor to parsed snowmobile.toml. | |
Represents any (1-6 level) header section within Script Name (doc).md. | |
Handles the scope/context for | |
Handles the decomposition/parsing of statement name. | |
Base class for all | |
A single column within a | |
QA class for comparison of values within a table based on | |
QA class for verification that a statement’s results are empty. | |
Extends a | |
SQL class for generation & execution of common sql commands. | |
Contains all sections within the context of a | |
Parser and operator of local sql files. | |
Constructed with a |
- class
snowmobile.core.
Generic
¶ Bases:
object
Generic dunder implementation for
snowmobile
objects.Base class for all
snowmobile
objects that do not inherit from pydantic’s BaseModel or configuration class,Config
.
- class
snowmobile.core.
ExceptionHandler
(within: snowmobile.core.errors.Optional[Any] = None, ctx_id: snowmobile.core.errors.Optional[int] = None, in_context: bool = False, children: snowmobile.core.errors.Dict[int, Any] = None, is_active_parent: bool = False, to_mirror: snowmobile.core.errors.Optional[List[Any]] = None)¶ Bases:
snowmobile.core.Generic
All snowmobile classes contain a
ExceptionHandler
.- Parameters
within (Optional[Any]) – Class for which the ExceptionHandler is intended.
ctx_id (Optional[int]) – Context ID; set/unset by methods when entering/exiting certain contexts.
in_context (bool) – Class is currently within a specific
ctx_id
children (Dict[int, Any]) – Attributes of the
within
class for which theExceptionHandler
should mirror the methods called on the parent class. # TODO: Refactor this out; it’s essentially janky multi inheritanceis_active_parent (bool) – The
within
class is currently enforcing the context rules on itschildren
to_mirror (Optional[List[Any]]) – Methods called in the attr:`within class that should be applied to its
children
(i.e. set/reset context ID, etc)
- property
current
(self)¶ All exceptions in the current context.
-
collect
(self, e: Any[snowmobile_errors])¶ Stores an exception.
- property
first
(self) → snowmobile.core.errors.Error¶ First exception encountered.
- property
last
(self) → snowmobile.core.errors.Error¶ Last exception encountered.
-
seen
(self, from_ctx: snowmobile.core.errors.Optional[int] = None, of_type: snowmobile.core.errors.Optional[Any[snowmobile_errors], List[snowmobile_errors]] = None, to_raise: snowmobile.core.errors.Optional[bool] = None, with_ids: snowmobile.core.errors.Optional[int, List[int], Set[int]] = None, all_time: bool = False) → bool¶ Boolean indicator of if an exception has been seen.
-
get
(self, from_ctx: snowmobile.core.errors.Optional[int] = None, of_type: snowmobile.core.errors.Optional[Any[snowmobile_errors], List[snowmobile_errors]] = None, to_raise: snowmobile.core.errors.Optional[bool] = None, with_ids: snowmobile.core.errors.Optional[int, List[int], Set[int]] = None, all_time: bool = False, last: bool = False, first: bool = False, _raise: bool = False)¶ Boolean indicator of if an exception has been seen.
- property
ctx_id
(self)¶ Current context id.
-
set
(self, ctx_id: snowmobile.core.errors.Optional[int] = None, in_context: bool = False, outcome: snowmobile.core.errors.Optional[int] = None)¶ Set attributes on self.
-
set_from
(self, other: snowmobile.core.exception_handler.ExceptionHandler) → snowmobile.core.exception_handler.ExceptionHandler¶ Updates attributes of self with those from ‘other’.
-
reset
(self, ctx_id: bool = False, in_context: bool = False, outcome: bool = False) → snowmobile.core.exception_handler.ExceptionHandler¶ Resets attributes on self.
- property
by_tmstmp
(self)¶ All exceptions by timestamp, ordered by most to least recent.
- class
snowmobile.core.
Configuration
(creds: Optional[str] = None, config_file_nm: Optional[str] = None, from_config: Optional[Path, str] = None, export_dir: Optional[Path, str] = None, silence: bool = False)¶ Bases:
snowmobile.core.base.Generic
A parsed snowmobile.toml file.
All keyword arguments optional.
- Parameters
config_file_nm (Optional[str]) – Name of configuration file to use; defaults to snowmobile.toml.
creds (Optional[str]) – Alias for the set of credentials to authenticate with; default behavior will fall back to the connection.default-creds specified in snowmobile.toml, or the first set of credentials stored if this configuration option is left blank.
from_config (Optional[str, Path]) – A full path to a specific configuration file to use; bypasses any checks for a cached file location and can be useful for container-based processes with restricted access to the local file system.
export_dir (Optional[Path]) – Path to save a template snowmobile.toml file to; if pr, the file will be exported within the __init__ method and nothing else will be instantiated.
-
location
¶ Full path to configuration file.
- Type
-
connection
:Optional[cfg.Connection]¶ [connection] from snowmobile.toml.
-
loading
:Optional[cfg.Loading]¶ [loading] from snowmobile.toml.
-
script
:Optional[cfg.Script]¶ [script] from snowmobile.toml.
-
sql
:Optional[cfg.SQL]¶ [sql] from snowmobile-ext.toml.
-
ext_sources
:Optional[cfg.Location]¶ [external-sources] from snowmobile.toml.
- property
markdown
(self) → snowmobile.core.cfg.Markup¶ Accessor for cfg.script.markdown.
- property
attrs
(self) → snowmobile.core.cfg.Attributes¶ Accessor for cfg.script.markdown.attributes.
- property
wildcards
(self) → snowmobile.core.cfg.Wildcard¶ Accessor for cfg.script.patterns.wildcards.
- static
batch_set_attrs
(obj: Any, attrs: dict, to_none: bool = False)¶ Batch sets attributes on an object from a dictionary.
- Parameters
- Returns (Any):
Object post-setting attributes.
- static
attrs_from_obj
(obj: Any, within: Optional[List[str]] = None) → Dict[str, MethodType]¶ Utility to return attributes/properties from an object as a dictionary.
- static
methods_from_obj
(obj: Any, within: Optional[List[str]] = None) → Dict[str, MethodType]¶ Returns callable components of an object as a dictionary.
- property
scopes
(self)¶ All combinations of scope type and scope attribute.
-
scopes_from_kwargs
(self, only_populated: bool = False, **kwargs) → Dict¶ Turns script.filter() arguments into a valid set of kwargs for
Scope
.Returns dictionary of all combinations of ‘arg’ (“kw”, “obj”, “desc”, “anchor” and “nm”), including empty sets for any ‘arg’ not included in the keyword arguments provided.
-
scopes_from_tag
(self, t: Any)¶ Generates list of keyword arguments to instantiate all scopes for a wrap.
- class
snowmobile.core.
Snowmobile
(creds: Optional[str] = None, delay: bool = False, ensure_alive: bool = True, config_file_nm: Optional[str] = None, from_config: Optional[str, Path] = None, silence: bool = False, **connect_kwargs)¶ Bases:
snowmobile.core.sql.SQL
Primary method of statement execution and accessor to parsed snowmobile.toml.
- Parameters
creds (Optional[str]) – Alias for the set of credentials to authenticate with; default behavior will fall back to the
connection.default-creds
specified in snowmobile.toml, or the first set of credentials stored if this configuration option is left blank.delay (bool) – Optionally delay establishing a connection when the object is instantiated, enabling access to the configuration object model through the
Connection.cfg
attribute; defaults to False.ensure_alive (bool) – Establish a new connection if a method requiring a connection against the database is called while
alive
is False; defaults to True.config_file_nm (Optional[str]) – Name of configuration file to use; defaults to snowmobile.toml.
from_config (Optional[str, Path]) – A full path to a specific configuration file to use; bypasses any checks for a cached file location and can be useful for container-based processes with restricted access to the local file system.
**connect_kwargs –
Additional arguments to provide to snowflake.connector.connect(); arguments provided here will over-ride connection arguments specified in snowmobile.toml, including:
Connection parameters in connection.default-arguments
Credentials parameters associated with a given alias
Connection parameters associated with a given alias
Initializes a
snowmobile.SQL
object.-
cfg
:Configuration¶ snowmobile.toml
-
con
:Optional[SnowflakeConnection]¶ Can be None until set by
Snowmobile.connect()
- Type
SnowflakeConnection
-
e
:ExceptionHandler¶ Exception / context management
-
connect
(self, **kwargs) → snowmobile.core.connection.Snowmobile¶ Establishes connection to Snowflake.
Re-implements snowflake.connector.connect() with connection arguments sourced from snowmobile’s object model, specifically:
Credentials from snowmobile.toml.
Default connection arguments from snowmobile.toml.
Optional keyword arguments either passed to
snowmobile.connect()
or directly to this method.
- kwargs:
Optional keyword arguments to pass to snowflake.connector.connect(); arguments passed here will over-ride
connection.default-arguments
specified insnowmobile.toml
.
-
disconnect
(self) → snowmobile.core.connection.Snowmobile¶ Disconnect from connection with which Connection() was instantiated.
- property
cursor
(self) → snowflake.connector.connection.SnowflakeCursor¶ SnowflakeCursor
accessor.
- property
dictcursor
(self) → snowflake.connector.DictCursor¶ DictCursor
accessor.
-
ex
(self, sql: str, on_error: Optional[str] = None, **kwargs) → snowflake.connector.connection.SnowflakeCursor¶ Executes a command via
SnowflakeCursor
.- Parameters
- Returns (SnowflakeCursor):
SnowflakeCursor
object that executed the command.
-
exd
(self, sql: str, on_error: Optional[str] = None, **kwargs) → snowflake.connector.DictCursor¶ Executes a command via
DictCursor
.- Parameters
- Returns (DictCursor):
DictCursor
object that executed the command.
-
query
(self, sql: str, as_df: bool = False, as_cur: bool = False, as_dcur: bool = False, as_scalar: bool = False, lower: bool = True, on_error: Optional[str] = None) → Union[pd.DataFrame, SnowflakeCursor]¶ Execute a query and return results.
Default behavior of results=True will return results as a
pandas.DataFrame
, otherwise will execute the sql provided with aSnowflakeCursor
and return the cursor object.- Parameters
sql (str) – Raw SQL to execute.
as_df (bool) – Return results in DataFrame.
as_cur (bool) – Return results in Cursor.
as_dcur (bool) – Return results in a DictCursor.
as_scalar (bool) – Return results as a single scalar value.
lower (bool) – Boolean value indicating whether or not to return results with columns lower-cased.
on_error (str) – String value to impose a specific behavior if an error occurs during the execution of
sql
.
- Returns (Union[pd.DataFrame, SnowflakeCursor]):
Results from
sql
as aDataFrame
by default or theSnowflakeCursor
object if results=False.
- class
snowmobile.core.
connect
(creds: Optional[str] = None, delay: bool = False, ensure_alive: bool = True, config_file_nm: Optional[str] = None, from_config: Optional[str, Path] = None, silence: bool = False, **connect_kwargs)¶ Bases:
snowmobile.core.sql.SQL
Primary method of statement execution and accessor to parsed snowmobile.toml.
- Parameters
creds (Optional[str]) – Alias for the set of credentials to authenticate with; default behavior will fall back to the
connection.default-creds
specified in snowmobile.toml, or the first set of credentials stored if this configuration option is left blank.delay (bool) – Optionally delay establishing a connection when the object is instantiated, enabling access to the configuration object model through the
Connection.cfg
attribute; defaults to False.ensure_alive (bool) – Establish a new connection if a method requiring a connection against the database is called while
alive
is False; defaults to True.config_file_nm (Optional[str]) – Name of configuration file to use; defaults to snowmobile.toml.
from_config (Optional[str, Path]) – A full path to a specific configuration file to use; bypasses any checks for a cached file location and can be useful for container-based processes with restricted access to the local file system.
**connect_kwargs –
Additional arguments to provide to snowflake.connector.connect(); arguments provided here will over-ride connection arguments specified in snowmobile.toml, including:
Connection parameters in connection.default-arguments
Credentials parameters associated with a given alias
Connection parameters associated with a given alias
Initializes a
snowmobile.SQL
object.-
cfg
:Configuration¶ snowmobile.toml
-
con
:Optional[SnowflakeConnection]¶ Can be None until set by
Snowmobile.connect()
- Type
SnowflakeConnection
-
e
:ExceptionHandler¶ Exception / context management
-
connect
(self, **kwargs) → snowmobile.core.connection.Snowmobile¶ Establishes connection to Snowflake.
Re-implements snowflake.connector.connect() with connection arguments sourced from snowmobile’s object model, specifically:
Credentials from snowmobile.toml.
Default connection arguments from snowmobile.toml.
Optional keyword arguments either passed to
snowmobile.connect()
or directly to this method.
- kwargs:
Optional keyword arguments to pass to snowflake.connector.connect(); arguments passed here will over-ride
connection.default-arguments
specified insnowmobile.toml
.
-
disconnect
(self) → snowmobile.core.connection.Snowmobile¶ Disconnect from connection with which Connection() was instantiated.
- property
cursor
(self) → snowflake.connector.connection.SnowflakeCursor¶ SnowflakeCursor
accessor.
- property
dictcursor
(self) → snowflake.connector.DictCursor¶ DictCursor
accessor.
-
ex
(self, sql: str, on_error: Optional[str] = None, **kwargs) → snowflake.connector.connection.SnowflakeCursor¶ Executes a command via
SnowflakeCursor
.- Parameters
- Returns (SnowflakeCursor):
SnowflakeCursor
object that executed the command.
-
exd
(self, sql: str, on_error: Optional[str] = None, **kwargs) → snowflake.connector.DictCursor¶ Executes a command via
DictCursor
.- Parameters
- Returns (DictCursor):
DictCursor
object that executed the command.
-
query
(self, sql: str, as_df: bool = False, as_cur: bool = False, as_dcur: bool = False, as_scalar: bool = False, lower: bool = True, on_error: Optional[str] = None) → Union[pd.DataFrame, SnowflakeCursor]¶ Execute a query and return results.
Default behavior of results=True will return results as a
pandas.DataFrame
, otherwise will execute the sql provided with aSnowflakeCursor
and return the cursor object.- Parameters
sql (str) – Raw SQL to execute.
as_df (bool) – Return results in DataFrame.
as_cur (bool) – Return results in Cursor.
as_dcur (bool) – Return results in a DictCursor.
as_scalar (bool) – Return results as a single scalar value.
lower (bool) – Boolean value indicating whether or not to return results with columns lower-cased.
on_error (str) – String value to impose a specific behavior if an error occurs during the execution of
sql
.
- Returns (Union[pd.DataFrame, SnowflakeCursor]):
Results from
sql
as aDataFrame
by default or theSnowflakeCursor
object if results=False.
- class
snowmobile.core.
Section
(cfg: snowmobile.core.Configuration, is_marker: bool = None, h_contents: Optional[str] = None, index: Optional[int] = None, parsed: Optional[Dict] = None, raw: Optional[str] = None, sql: Optional[str] = None, results: Optional[pd.DataFrame] = None, incl_sql_tag: bool = False, is_multiline: bool = False, result_wrap: Optional[str] = None)¶ Bases:
snowmobile.core.Generic
Represents any (1-6 level) header section within Script Name (doc).md.
Class is created with a call to the
as_section()
method or by thesnowmobile.core.markup.Markup
class in the case of aMarker
.In order to include execution metadata if available without sacrificing base-case parsing, the below implementation heavily relies properties over attributes to reconcile what’s populated in the
st
vsexecuted
attributes ofScript
.- Parameters
is_marker (bool) – Information provided is associated with a marker as opposed to a statement; defaults to False.
h_contents (str) – String representation of header contents.
index (int) – Statement index position or None if marker.
parsed (dict) – Parsed arguments from the statement or marker within the script.
raw (str) – Raw wrap as
parsed
was parsed from.sql (str) – Statement’s raw sql or None if marker.
results (pd.DataFrame) – Results returned by statement’s sql as a
DataFrame
; will be None if statement hasn’t been executed or if a marker.
-
hx
¶ String representation of header level (e.g. ‘#’ for h1), based on the script/statement header-level specifications in snowmobile.toml.
- Type
Instantiation of a
script.Section
object.-
reorder_attrs
(self, parsed: dict, cfg: snowmobile.core.Configuration) → Dict¶ Re-orders parsed attributes based on configuration.
-
parse_contents
(self, cfg: snowmobile.core.Configuration) → List[Item]¶ Unpacks sorted dictionary of parsed attributes into formatted Items.
- property
header
(self) → str¶ Constructs the header for a section.
- Uses specifications in snowmobile.toml to determine:
The level of the header depending on whether it’s a statement section or a script section.
Whether or not to include the statement index as part of the header.
- Returns
Formatted header line as a string.
- class
snowmobile.core.
Scope
(arg: str, base: str)¶ Bases:
snowmobile.core.Generic
Handles the scope/context for
Statement
objects and derived classes.Should never be interacted with from the user-facing API.
-
base
¶ The left-most word within a statement wrap. For generic st this will be the keyword and for QA statements this will be the literal word
qa
.- Type
-
component
¶ The component within a given wrap that is being evaluated; this will be exactly one of kw, obj, anchor, desc, or nm.
- Type
-
excl_arg
¶ The keyword argument that would be used to exclude a given component; this would be the same as the above example except the value would be
excl_kw
as opposed toincl_kw
.- Type
-
fallback_to
¶ - provided_args (dict):
The set of keyword arguments provided at the time of the last call to
eval()
.- check_against_args (dict):
The set of keyword arguments checked against at the time of the last call to
eval()
; will use provided arguments if they exist and the arguments fromfallback_to
otherwise.- is_included (bool):
Name is included based on the results of the last call to
eval()
.- is_excluded (bool):
Name is excluded based on the results of the last call to
eval()
.
Instantiates a
Scope
object.-
parse_kwargs
(self, **kwargs) → None¶ Parses all filter arguments looking for those that match its base.
Looks for include/exclude arguments within kwargs, populating
provided_args
with those that were provided and populatescheck_against_args
with the same values if they were provided and fills in defaults fromfallback_to
otherwise.- Parameters
**kwargs – Keyword arguments passed to
Script.filter()
(e.g. incl_kw, excl_kw, ..)
-
matches_patterns
(self, arg: str) → bool¶ Returns indication of if
base
matches a given set of patterns.- Returns (bool):
Indication of whether any matches were found.
-
eval
(self, **kwargs) → bool¶ Evaluates filter arguments and updates context accordingly.
Updates the values of
is_included
,is_excluded
, andincluded
.- Parameters
**kwargs – Keyword arguments passed to
Script.filter()
(e.g. incl_kw, excl_kw, ..)
- Returns (bool):
Indicator of whether or not the statement should be included/excluded based on the context/keyword arguments pr.
-
- class
snowmobile.core.
Name
(configuration: snowmobile.core.Configuration, nm_pr: Optional[str] = None, sql: Optional[str] = None, index: Optional[int] = None)¶ Bases:
snowmobile.core.Generic
Handles the decomposition/parsing of statement name.
Should never be instantiated directly by the user-facing API but its attributes are likely to be accessed often as part of
Statement
and derived classes.-
cfg
¶ snowmobile.Configuration
object; represents fully parsed snowmobile.toml file.
-
patt
¶ snowmobile.Schema.Pattern
object; representsscript.patterns
section of snowmobile.toml.- Type
snowmobile.Schema.Pattern
-
is_included
¶ Indicator of whether or not the combination of all scopes for this statement wrap is included within a given context.
- Type
-
incl_idx_in_desc
¶ Indicator of whether or not to include the statement index in the description component of the wrap; defaults to True so that all generated statement tags are guaranteed to be unique for a given script.
Mainly included for testing purposes where setting to False enables comparing generated to provided statement tags without having to change the index position of the hard-coded/pr statement wrap when adding/removing tests.
- Type
-
first_line_remainder
¶ The remainder of the first line once excluding the
first_keyword
and stripping repeating whitespace.- Type
-
scopes
¶ Combination of all scopes for a given wrap; this is essentially the all possible combinations of including/excluding any of the kw, nm, obj, desc, and anchor for a given instance of
Name
.
-
scope
(self, **kwargs) → bool¶ Evaluates all component’s of a wrap’s scope against a set of filter args.
- **kwargs:
Keyword arguments passed to
Script.filter()
(e.g. incl_kw, excl_kw, ..)
- Returns (bool):
Value indicating whether or not the statement should be included based on the outcome of the evaluation of all of its components.
-
nm
(self, ge: bool = False, pr: bool = False, og: bool = True) → str¶ The final statement’s name that is used by the API.
This will be the full statement name if a tag exists and a parsed/generated name otherwise.
-
kw
(self, ge: bool = False, pr: bool = False)¶ The final statement’s keyword that is used by the API.
This will be the provided keyword if a statement wrap exists and a parsed/ge keyword otherwise.
-
obj
(self, ge: bool = False, pr: bool = False)¶ The final statement’s object that is used by the API.
This will be the object within a wrap if a statement wrap exists and follows the correct structure and a parsed/ge object otherwise.
-
desc
(self, ge: bool = False, pr: bool = False)¶ The final statement’s description that is used by the API.
This will be the description within a wrap if a statement wrap exists and follows the correct structure and a parsed/ge description otherwise.
-
anchor
(self, ge: bool = False, pr: bool = False)¶ The final statement’s anchor that is used by the API.
This will be the anchor within a wrap if a statement wrap exists and follows the correct structure and a parsed/ge wrap name otherwise.
-
set
(self, key, value) → snowmobile.core.name.Name¶ Custom attribute setting.
-
- class
snowmobile.core.
Statement
(sn: snowmobile.core.connection.Snowmobile, statement: Union[sqlparse.sql.Statement, str], index: Optional[int] = None, attrs_raw: Optional[str] = None, e: Optional[ExceptionHandler] = None, **kwargs)¶ Bases:
snowmobile.core.tag.Attrs
,snowmobile.core.Name
,snowmobile.core.Generic
Base class for all
Statement
objects.Home for attributes and methods that are associated with all statement objects, generic or QA.
-
sn
¶ snowmobile.connect
object.- Type
-
statement
¶ A
sqlparse.sql.Statement
object.- Type
Union[sqlparse.sql.Statement, str]
-
patterns
¶ config.Pattern
object for more succinct access to values specified in snowmobile.toml.- Type
config.Pattern
-
results
¶ The results of the statement if executed as a
pandas.DataFrame
.- Type
pd.DataFrame
-
outcome
¶ Numeric indicator of outcome; defaults to 0 and is modified based on the outcome of statement execution and/or QA validation for derived classes.
- Type
-
outcome_html
¶ HTML text for the outcome as an admonition/information banner based on the following mapping of
outcome_txt
to admonition argument:failed ——> warning
completed –> info
passed —–> success
- Type
-
execution_time_txt
¶ Plain text description of execution time if executed; returned in seconds if execution time is less than 60 seconds, minutes otherwise.
- Type
-
first_keyword
¶ The first keyword within the statement as a
sqlparse.sql.Token
.- Type
Initialize self. See help(type(self)) for accurate signature.
-
sql
(self, set_as: Optional[str] = None, tag: bool = False) → Union[str, Statement]¶ Raw sql from statement, including result limit if enabled.
-
parse
(self) → Tuple[Dict, str]¶ Parses tag contents into a valid dictionary.
Uses the values specified in snowmobile.toml to parse a raw string of statement attributes into a valid dictionary.
Note
If
is_multiline
is True and name is not included within the arguments, an assertion error will be thrown.If
is_multiline
is False, the raw string within the wrap will be treated as the name.The
wrap
attribute is set once parsing is completed and name has been validated.
- Returns (dict):
Parsed wrap arguments as a dictionary.
-
start
(self)¶ Sets
start_time
attribute.
-
end
(self)¶ Updates execution time attributes.
- In namespace, sets:
-
trim
(self) → str¶ Statement as a string including only the sql and a single-line wrap name.
Note
The wrap name used here will be the user-pr wrap from the original script or a generated
Name.nm
if a wrap was not provided for a given statement.
- property
is_derived
(self)¶ Indicates whether or not it’s a generic or derived (QA) statement.
-
as_section
(self, incl_sql_tag: Optional[bool] = None, result_wrap: Optional[str] = None) → snowmobile.core.Section¶ Returns current statement as a
Section
object.
-
set_state
(self, index: Optional[int] = None, ctx_id: Optional[int] = None, in_context: Optional[bool] = None, filters: dict = None) → snowmobile.core.statement.Statement¶ Sets current state/context on a statement object.
-
reset
(self, index: bool = False, ctx_id: bool = False, in_context: bool = False, scope: bool = False) → snowmobile.core.statement.Statement¶ Resets attributes on the statement object to reflect as if read from source.
- In its current form, includes:
Resetting the statement/wrap’s index to their original values.
Resetting the
is_included
attribute of the statement’swrap
to True.Populating
error_last
with errors from current context.Caching current context’s timestamp and resetting back to None.
-
process
(self)¶ Used by derived classes for post-processing the returned results.
-
run
(self, as_df: bool = True, lower: bool = True, render: bool = False, on_error: Optional[str] = None, on_exception: Optional[str] = None, on_failure: Optional[str] = None, ctx_id: Optional[int] = None) → snowmobile.core.statement.Statement¶ Run method for all statement objects.
- Parameters
as_df (bool) – Store results of query as
pandas.DataFrame
orSnowflakeCursor
.lower (bool) – Lower case column names in
results
DataFrame if as_df=True.render (bool) – Render the sql executed as markdown.
on_error (str) –
- Behavior if an execution/database error is encountered
None: default behavior, exception will be raised
c: continue with execution
on_exception (str) –
Behavior if an exception is raised in the post-processing of results from a derived class of
Statement
(Empty
andDiff
).None: default behavior, exception will be raised
c: continue with execution
on_failure (str) –
Behavior if no error is encountered in execution or post-processing but the result of the post-processing has turned the statement’s
outcome
attribute to False, indicating the results returned by the statement have failed validation.None: default behavior, exception will be raised
c: continue with execution
- Returns (Statement):
Statement object post-executing query.
-
- class
snowmobile.core.
Column
(original: str, current: Optional[str] = None, prior: Optional[str] = None, src: Optional[str] = None)¶ Bases:
snowmobile.core.Generic
A single column within a
SnowFrame
.-
update
(self)¶ Migrate from prior to current context within this context.
- static
dedupe
(current: str, char: Optional[str] = None) → str¶ Dedupes consecutive characters within a string.
Note
Must iterate through matches and perform replacements in the order of the largest to the smallest by number of characters; this is to avoid altering the matches found before replacing them.
-
- class
snowmobile.core.
Diff
(sn: snowmobile.core.connection.Snowmobile = None, **kwargs)¶ Bases:
snowmobile.core.qa.QA
QA class for comparison of values within a table based on partitioning on a field.
-
partition_on
¶ Column name to partition data on before comparing the partitioned datasets; defaults to ‘src_description`.
- Type
-
end_index_at
¶ Column name that marks the last column to use as an index column when joining the partitioned datasets back together.
- Type
-
compare_patterns
¶ Regex patterns to match columns on that should be included in comparison (numeric columns you’re running QA on).
- Type
-
ignore_patterns
¶ Regex patterns to match columns on that should be ignored both for the comparison and the index.
- Type
-
generic_metric_col_nm
¶ Column name to use for the melted field names; defaults to ‘Metric’.
- Type
-
compare_cols
¶ Columns that are used in comparison once statement is executed and parsing is applied.
- Type
-
idx_cols
¶ Columns that are used for the index to join the data back together once statement is executed and parsing is applied.
- Type
-
ub_raw
¶ Maximum absolute raw difference (upper bound) that two fields that are being compared can differ from each other without causing a failure.
- Type
-
ub_perc
¶ Maximum absolute percentage difference (upper bound) that two comparison fields can differ from each other without causing a failure.
- Type
Instantiates a
qa-diff
statement.- Parameters
delta_column_suffix (str) – Suffix to add to columns that comparison is being run on; defaults to ‘Delta’.
partition_on (str) – Column to partition the data on in order to compare.
end_index_at (str) – Column name that marks the last column to use as an index when joining the partitioned datasets back together.
compare_patterns (list) – Regex patterns matching columns to be included in comparison.
ignore_patterns (list) – Regex patterns to match columns on that should be ignored both for the comparison and the index.
generic_metric_col_nm (str) – Column name to use for the melted field names; defaults to ‘Metric’.
raw_upper_bound (float) – Maximum absolute raw difference that two fields that are being compared can differ from each other without causing a failure.
percentage_upper_bound (float) – Maximum absolute percentage difference that two comparison fields can differ from each other without causing a failure.
-
split_cols
(self) → snowmobile.core.qa.Diff¶ Post-processes results returned from a
qa-diff
statement.- Executes private methods to split columns into:
Index columns
Drop columns
Comparison columns
Then runs checks needed to ensure minimum requirements are met in order for a valid partition/comparison to be made.
- property
partitioned_by
(self) → Set[Any]¶ Distinct values within the
partition_on
column that data is partitioned by.
- static
partitions_are_equal
(partitions: Dict[str, pd.DataFrame], abs_tol: float, rel_tol: float) → bool¶ Evaluates if a dictionary of DataFrames are identical.
- Parameters
partitions (Dict[str, pd.DataFrame]) – A dictionary of DataFrames returned by
snowmobile.DataFrame()
.abs_tol (float) – Absolute tolerance for difference in any value amongst the DataFrames being compared.
rel_tol (float) – Relative tolerance for difference in any value amongst the DataFrames being compared.
- Returns (bool):
Indication of equality amongst all the DataFrames contained in
partitions
.
-
process
(self) → snowmobile.core.qa.Diff¶ Post-processing for
Diff
-specific results.
-
- class
snowmobile.core.
Empty
(sn: snowmobile.core.connection.Snowmobile, **kwargs)¶ Bases:
snowmobile.core.qa.QA
QA class for verification that a statement’s results are empty.
The most widely applicable use of
Empty
is for simple verification that a table’s dimensions are as expected.Initialize self. See help(type(self)) for accurate signature.
-
process
(self) → snowmobile.core.qa.QA¶ Over-ride method; checks if results are empty and updates outcome
-
- class
snowmobile.core.
SnowFrame
(df: pandas.DataFrame)¶ Bases:
snowmobile.core.Generic
Extends a
DataFrame
with a.snf
entry point.Returns list of tuples containing column pairs that are common between two DataFrames.
- static
series_max_diff_abs
(col1: pandas.Series, col2: pandas.Series, tolerance: float) → bool¶ Determines if the max absolute difference between two
pandas.Series
is within a tolerance level.
- static
series_max_diff_rel
(col1: pandas.Series, col2: pandas.Series, tolerance: float) → bool¶ Determines if the maximum relative difference between two
pandas.Series
is within a tolerance level.
-
df_max_diff_abs
(self, df2: pandas.DataFrame, tolerance: float) → bool¶ Determines if the maximum absolute difference between any value in the shared columns of 2 DataFrames is within a tolerance level.
-
df_max_diff_rel
(self, df2: pandas.DataFrame, tolerance: float) → bool¶ Determines if the maximum relative difference between any value in the shared columns of 2 DataFrames is within a tolerance level.
-
df_diff
(self, df2: pandas.DataFrame, abs_tol: Optional[float] = None, rel_tol: Optional[float] = None) → bool¶ Determines if the column-wise difference between two DataFrames is within a relative or absolute tolerance level.
Note
df1
anddf2
are assumed to have a shared, pre-defined index.Exactly one of
abs_tol
andrel_tol
is expected to be a a valid float; the other is expected to be None.If valid float values are provided for both
abs_tol
andrel_tol
, the outcome of the maximum absolute difference with respect toabs_tol
will be returned regardless of the value ofrel_tol
.
- Parameters
- Returns (bool):
Boolean indicating whether or not difference is within tolerance.
-
partitions
(self, on: str) → Dict[str, pd.DataFrame]¶ Returns a dictionary of DataFrames given a DataFrame and a partition column.
Note
The number of distinct values within
partition_on
column will be 1:1 with the number of partitions that are returned.The
partition_on
column is dropped from the partitions that are returned.The depth of a vertical concatenation of all partitions should equal the depth of the original DataFrame.
- Parameters
on (str) – The column name to use for partitioning the data.
- Returns (Dict[str, pd.DataFrame]):
Dictionary of {(str) partition_value: (pd.DataFrame) associated subset of df}
-
lower
(self, col: Optional[str] = None) → pandas.DataFrame¶ Lower cases all column names or all values within col if pr.
-
upper
(self, col: Optional[str] = None) → pandas.DataFrame¶ Upper cases all column names or all values within col if pr.
-
reformat
(self)¶ Re-formats DataFrame’s columns via
Column.reformat()
.
-
append_dupe_suffix
(self)¶ Adds a trailing index number ‘_i’ to duplicate column names.
-
to_list
(self, col: Optional[str] = None, n: Optional[int] = None) → List¶ Succinctly retrieves a column as a list.
-
add_tmstmp
(self, col_nm: Optional[str] = None) → pandas.DataFrame¶ Adds a column containing the current timestamp to a DataFrame.
- Parameters
col_nm (str) – Name for column; defaults to LOADED_TMSTMP.
- property
original
(self) → pandas.DataFrame¶ Returns the DataFrame in its original form (drops columns added by
SnowFrame
and reverts to original column names).
-
cols_matching
(self, patterns: List[str], ignore_patterns: List[str] = None) → List[str]¶ Returns a list of columns given a list of patterns to find.
- Parameters
- Returns (List[str]):
List of columns found/excluded.
- class
snowmobile.core.
SQL
(_query_func: Callable, _cfg: snowmobile.core.configuration.Configuration, nm: Optional[str] = None, schema: Optional[str] = None, obj: Optional[str] = None, auto_run: Optional[bool] = True)¶ Bases:
snowmobile.core.Generic
SQL class for generation & execution of common sql commands.
Intended to be interacted with as a parent of
Snowmobile
.Note
All arguments except for
sn
are optional.The benefit of setting the other attributes on an instance of
SQL
is to (optionally) avoid passing the same information to multiple methods when generating a variety of statements around the same object.
-
schema
¶ Schema to use when dot-prefixing sql; defaults to the schema with which the
sn
is connected to.- Type
-
auto_run
¶ Indicates whether to automatically execute the sql generated by a given method; defaults to True
- Type
Initializes a
snowmobile.SQL
object.-
info_schema
(self, loc: str, where: Optional[List[str]] = None, fields: Optional[List[str]] = None, order_by: Optional[List] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Generic case of selecting from information schema location.
-
table_info
(self, nm: Optional[str] = None, fields: List[str] = None, restrictions: Dict[str, str] = None, order_by: List[Optional[str, int]] = None, all_schemas: bool = False, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Query
information_schema.tables
for a given table or view.- Parameters
nm (str) – Table name, including schema if creating a stage outside of the current schema.
fields (List[str]) – List of fields to include in returned results (e.g. [‘table_name’, ‘table_type’, ‘last_altered’])
restrictions (List[str]) – List of conditionals typed as literal components of a where clause (e.g. [“table_type = ‘base table’”, ‘last_altered::date = current_date()’] ).
order_by (List[str]) – List of fields or their ordinal positions to order the results by.
all_schemas (bool) – Include tables/views from all schemas; defaults to False.
run (bool) – Determines whether to run the generated sql or not; defaults to None which will reference the current value of the
auto_run
attribute which defaults to True.
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
column_info
(self, nm: Optional[str] = None, fields: Optional[List] = None, restrictions: Optional[Dict] = None, order_by: Optional[List] = None, all_schemas: bool = False, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Query
information_schema.columns
for a given table or view.- Parameters
nm (str) – Table name, including schema if creating a stage outside of the current schema.
fields (List[str]) – List of fields to include in returned results (e.g. [‘ordinal_position’, ‘column_name’, ‘data_type’])
restrictions (List[str]) – List of conditionals typed as literal components of a where clause (e.g.[“regexp_count(lower(column_name), ‘tmstmp’) = 0”]).
order_by (List[str]) – List of fields or their ordinal positions to order the results by.
all_schemas (bool) – Include tables/views from all schemas; defaults to False.
run (bool) – Determines whether to run the generated sql or not; defaults to None which will reference the current value of the
auto_run
attribute which defaults to True.
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
columns
(self, nm: Optional[str] = None, from_info_schema: bool = False, lower: bool = False, run: Optional[bool] = None) → Union[str, List]¶ Returns an ordered list of columns for a table or view.
Note
Default behavior is to retrieve the columns for a table or view by selecting a single sample record and returning the column index from the DataFrame that’s returned which is much faster than selecting the column_names from
information_schema.columns
pulling column names from the information schemaThis can be changed by passing from_info_schema=True.
- Parameters
nm (str) – Name of table or view, including schema if the table or view is outside of the current schema.
from_info_schema (bool) – Indicates whether to retrieve columns via the
information_schema.columns
or by selecting a sample record from the table or view; defaults to False.lower (bool) – Lower case each column in the list that’s returned.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
- Returns (Union[str, List]):
- Either:
An ordered list of columns for the table or view, or
The query against the table or view as a
str
of sql.
-
select
(self, nm: Optional[str] = None, fields: Optional[List[str]] = None, apply: Optional[List[Tuple[str, str]]] = None, n: Optional[int] = None, run: Optional[bool] = None, **kwargs) → Union[str, pd.DataFrame]¶ Generic select statement.
- Parameters
nm (str) – Table to select from, including schema if the table is outside of the current schema
fields (Optional[List[str]]) – Select these fields (optional).
apply (Optional[List[Tuple[str, str]]]) –
Select aggregations of these fields.
- apply [
(this_func, to_this_field, [as_alias]), (.., .., [..]),
]
apply
should be provided as a list of tuples, each containing a minimum of 2 items (respectively) representing the aggregate function to apply and the field to which it should be appliedBy default, the aggregated result inherits the name of the field being aggregated, including any qualifier (optionally) provided with the field name or an explicit alias included as a 3rd item within the tuple
The following snippet exhaustively illustrates the functionality described above
sn.select( nm='sandbox.sample_table', apply=[ ('count', 'col1'), ('count', 'distinct col1'), ('count', 'distinct col1', 'col1_dst'), ], run=False, ) >>> select count(col1) as col1 ,count(distinct col1) as distinct_col1 ,count(distinct col1) as col1_dst from sandbox.sample_table
n (int) – Number of records to return, implemented as a ‘limit’ clause in the query; defaults to 1.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
- Returns (Union[str, pd.DataFrame]):
Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
exists
(self, nm: Optional[str] = None) → bool¶ Checks the existence of a table or view.
- Parameters
nm (str) – Name of table or view, including schema if the table or view is outside of the current schema.
- Returns (bool):
Boolean indication of whether or not the table or view exists.
-
is_distinct
(self, nm: Optional[str] = None, field: Optional[str] = None) → bool¶ Checks if table nm is distinct on column on_col
-
count
(self, nm: Optional[str] = None, of: Optional[str] = None, dst_of: Optional[str] = None, as_perc: Optional[bool] = None, run: Optional[bool] = None) → Union[int, float]¶ Number of records within a table or view.
- Parameters
nm (str) – Table name, including schema if querying outside current schema.
of (str) – Column name (indistinct).
dst_of (str) – Column name (distinct).
as_perc (bool) – Option to return distinct count of the dst_of column as a percentage of the namespace depth of the table or view.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
show
(self, obj: str, in_loc: Optional[str] = None, names: bool = False, run: Optional[bool] = None, **kwargs) → Union[pd.DataFrame, List[str], str]¶ Show schema objects of typ ‘obj’, optionally ‘in_loc’.
- Parameters
- Returns (Union[pd.DataFrame, str]):
- Either:
The results of the query as a
pandas.DataFrame
The ‘names’ column of the results returned as a list
The generated query as a
str
of sql
-
ddl
(self, nm: Optional[str] = None, obj: Optional[str] = None, run: Optional[bool] = None) → str¶ Query the DDL for an schema object.
- Parameters
- Returns (str):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
comment
(self, nm: Optional[str] = None, obj: Optional[str] = None, set_as: Optional[str] = None, from_json: bool = False, as_json: bool = False, run: Optional[bool] = None, **kwargs) → Union[str, Dict]¶ Get or set comment on a schema object.
- Parameters
nm (str) – Name of the schema object, including schema prefix if object is outside implicit scope of the current connection.
obj (str) – Type of schema object (e.g. ‘table’, ‘schema’, etc).
set_as (str) – Content to set as comment on schema object.
from_json (bool) – Parse schema object comment as a string of json and return it as a dictionary.
as_json (bool) – Dump contents of ‘set_as’ to a string of json prior to setting comment.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
**kwargs – Keyword argument to pass to json.loads(comment) if from_json=True.
-
last_altered
(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.Timestamp]¶ Last altered timestamp for a table or view.
- Parameters
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
truncate
(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Truncate a table.
- Parameters
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
drop
(self, nm: Optional[str] = None, obj: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Drop a
Snowflake
object.- Parameters
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
clone
(self, nm: Optional[str] = None, to: Optional[str] = None, obj: Optional[str] = None, run: Optional[bool] = None, replace: bool = False) → Union[str, pd.DataFrame]¶ Clone a
Snowflake
object.Warning
Make sure to read Snowflake’s documentation for restrictions and considerations when cloning objects.
Note
In this specific method, the value provided to
nm
andto
can be a single object name, a single schema, or both in the form of obj_schema.obj_name depending on the desired outcome.Additionally, at least one of the
nm
orto
arguments must be pr.The defaults for the target object are constructed such that users can either:
Clone objects to other schemas that inherit the source object’s name without specifying so in the
to
argument, orClone objects within the current schema that inherit the source object’s schema without specifying so in the
to
argument.
If providing a schema without a name to either argument, prefix the value provided with __ to signify it’s a schema and not a lower-level object to be cloned.
e.g. providing nm=’sample_table’ and to=’__sandbox’ will clone sample_table from the current schema to sandbox.sample_table.
An assertion error will be raised raised if neither argument is specified as this would result in a command to clone an object and store it in an object that has the same name & schema as the object being cloned.
- Parameters
nm (str) – Name of the object to clone, including schema if cloning an object outside of the current schema.
to (str) – Target name for cloned object, including schema if cloning an object outside of the current schema.
obj (str) – Type of object to clone (e.g. ‘table’, ‘view’, ‘file-format’); defaults to table.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
replace (bool) – Indicates whether to replace an existing stage if pre-existing; default is False.
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
create_stage
(self, nm_stage: str, nm_format: str, replace: bool = False, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Create a staging table.
- Parameters
nm_stage (str) – Name of stage to create, including schema if creating a stage outside of the current schema.
nm_format (str) – Name of file format to specify for the stage, including schema if using a format from outside of the current schema.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
replace (bool) – Indicates whether to replace an existing stage if pre-existing; default is False.
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
put_file_from_stage
(self, path: Union[Path, str], nm_stage: str, options: Optional[Dict] = None, ignore_defaults: bool = False, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Generates a ‘put’ command into a staging table from a local file.
- Parameters
path (Union[Path, str]) – Path to local data file as a
pathlib.Path
or string.nm_stage (str) – Name of the staging table to load into.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
options (dict) – Optional arguments to add to put statement in addition to the values specified in the
loading.put
section of snowmobile.toml.ignore_defaults (bool) – Option to ignore the values specified in snowmobile.toml; defaults to False.
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
copy_into_table_from_stage
(self, nm: str, nm_stage: str, options: Optional[Dict] = None, ignore_defaults: bool = False, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Generates a command to copy data into a table from a staging table.
- Parameters
nm (str) – Name of the object to drop, including schema if creating a stage outside of the current schema.
nm_stage (str) – Name of the staging table to load from.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
options (dict) – Optional arguments to add to put statement in addition to the values specified in the
loading.put
section of snowmobile.toml.ignore_defaults (bool) – Option to ignore the values specified in snowmobile.toml; defaults to False.
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
current
(self, obj: str, run: Optional[bool] = None) → Union[str, Union[str, int]]¶ Generic implementation of ‘select current’ for session-based objects.
- Parameters
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
current_session
(self, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Select the current session.
-
current_schema
(self, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Select the current schema.
-
current_database
(self, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Select the current database.
-
current_warehouse
(self, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Select the current warehouse.
-
use
(self, obj: str, nm: str, run: Optional[bool] = None)¶ Generic implementation of ‘use’ command for schema objects.
- Parameters
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
use_schema
(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Use schema command.
-
use_database
(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Use database command.
-
use_warehouse
(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Use warehouse command.
-
use_role
(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Use role command.
- static
order
(by: List[Union[int, str]]) → str¶ Generates ‘order by’ clause from a list of fields or field ordinal positions.
- static
where
(restrictions: Dict) → str¶ Generates a ‘where’ clause based on a dictionary of restrictions.
- Parameters
restrictions (dict) – A dictionary of conditionals where each key/value pair respectively represents the left/right side of a condition within a ‘where’ clause.
- Returns (str):
Formatted where clause.
- class
snowmobile.core.
Markup
(sn: snowmobile.core.connection.Snowmobile, path: pathlib.Path, contents: Dict[int, Union[Statement, Marker]], nm: Optional[str] = None, prefix: Optional[str] = None, suffix: Optional[str] = None, root_dir: Optional[Union[str, Path]] = None, sub_dir: Optional[str] = None, incl_sql: bool = True, incl_markers: bool = True, incl_sql_tag: bool = False, incl_exp_ctx: bool = True, result_wrap: Optional[str] = None)¶ Bases:
snowmobile.core.Generic
Contains all sections within the context of a
Script
.- Parameters
sn (Snowmobile) – A
Snowmobile
instance.path (Path) – A full path to the sql file that script was instantiated from.
contents (Dict[int, Union[Statement, Marker]]) – A dictionary of the script’s contents (st and markers) by index position.
nm (Optional[str]) – Alternate file name to use; defaults to
path.name
.prefix (Optional[str]) – Prefix to prepend to original file name when exporting.
suffix (Optional[str]) – Suffix to append to original file name when exporting.
root_dir (Optional[Union[str, Path]]) – Alternate target directory for exports; defaults to
./.snowmobile
where.
is the directory containing the sql file that thescript
was created from.sub_dir (Optional[str]) – Alternate sub-directory name; defaults to
path.name
wherepath
is a fullPath
to the sql file that thescript
was created from.incl_sql (bool) – Include statements in export.
incl_markers (bool) – Include markers in export.
incl_sql_tag (bool) – Include the raw wrap in the sql that is rendered in the md export.
incl_exp_ctx (bool) – Include (configurable) disclaimer at the top of exported sql file.
-
exported
¶ List of file paths that current instance has exported to.
- Type
List[Path]
-
created
¶ List of directory paths that current instance has created (should mostly apply for initial scaffolding build on first run only).
- Type
List[Path]
- property
export_dir
(self) → pathlib.Path¶ Documentation sub-directory; .snowmobile by default.
- property
sql
(self)¶ SQL for save.
- class
snowmobile.core.
Script
(sn: Optional[Snowmobile] = None, path: Optional[Path, str] = None, sql: Optional[str] = None, as_generic: bool = False, delay: bool = True, **kwargs)¶ Bases:
snowmobile.core.Generic
Parser and operator of local sql files.
- Parameters
sn (snowmobile.core.connection.Snowmobile) – An instance of
Snowmobile
.path (Optional[str]) – A full path to a sql file or readable text file containing valid sql code.
path – A raw string of valid sql code as opposed to reading from a
path
.as_generic (bool) – Instantiate all statements as generic st; skips all checks for a mapping of a statement anchor to a derived statement class to instantiate in the place of a generic
Statement
.delay (bool) – Delay connection of the
Snowmobile
; only applicable if thesn
argument is omitted andScript
is instantiating aSnowmobile
in its absence.**kwargs – Any keyword arguments to pass to
Snowmobile
; only applicable if thesn
argument is omitted andScript
is instantiating aSnowmobile
in its absence
-
sn
¶ An instance of
Snowmobile
-
patterns
¶ Configured patterns from snowmobile.toml.
-
as_generic
¶ Instantiate all statements as generic st; skips all checks for a mapping of a statement anchor to a derived statement class to instantiate in the place of a generic
Statement
.- Type
-
filters
¶ Dictionary of filters that have been passed to the current instance of
snowmobile.core.Script
.
-
markers
¶ Dictionary of all markers found in the script.
- Type
Dict[int, cfg.Marker]
-
path
¶ Path to sql file (e.g. full/path/to/script.sql).
- Type
Path
-
source
¶ Raw sql text of script; will be the text contained in the raw sql file when initially read from source and reflect any modifications to the script’s contents made post-instantiation.
- Type
-
read
(self, path: pathlib.Path = None) → snowmobile.core.script.Script¶ Runs quick path validation and reads in a sql file as a string.
A valid path must be provided if the script.path attribute hasn’t been set;
ValueErrors
will be thrown if neither is valid.- Parameters
path (pathlib.Path) – Full path to a sql object.
-
from_str
(self, sql: str, name: str, directory: pathlib.Path = Path.cwd()) → snowmobile.core.script.Script¶ Instantiates a raw string of sql as a script.
-
parse_one
(self, s: Union[sqlparse.sql.Statement, str], index: Optional[int] = None, nm: Optional[str] = None) → None¶ Adds a statement object to the script.
Default behavior will only add
sqlparse.sql.Statement
objects returned fromscript.source_stream
.clean_parse()
utility function is utilized so that generated sql within Python can be inserted back into the script as raw strings.- Parameters
s (Union[sqlparse.sql.Statement, str]) – A sqlparse.sql.Statement object or a raw string of SQL for an individual statement.
index (int) – Index position of the statement within the script; defaults to
n + 1
if index is not provided wheren
is the number of statements within the script at the timeparse_one()
is called.nm (Optional[str]) – Optionally provided the name of the statement being added; the script instance will treat this value as if it were provided within an in-script wrap.
-
parse_stream
(self, stream: str) → None¶ Parses a stream of sql and adds onto existing Script contents.
-
filter
(self, incl_kw: Optional[List[str], str] = None, incl_obj: Optional[List[str], str] = None, incl_desc: Optional[List[str], str] = None, incl_anchor: Optional[List[str], str] = None, incl_nm: Optional[List[str], str] = None, excl_kw: Optional[List[str], str] = None, excl_obj: Optional[List[str], str] = None, excl_desc: Optional[List[str], str] = None, excl_anchor: Optional[List[str], str] = None, excl_nm: Optional[List[str], str] = None, as_id: Optional[Union[str, int]] = None, from_id: Optional[Union[str, int]] = None, last: bool = False) → ContextManager[Script]¶ Subset the script based on attributes of its st.
script.filter()
returns a modified instance of script that can be operated on within the context defined.Note
Keyword arguments beginning with
incl
orexcl
expect a string or a list of strings containing regex patterns with which to check for a match against the associated attribute of its st’Name
.- Parameters
incl_kw – Include only
kw
incl_obj – Include only
obj
incl_desc – Include only
desc
incl_anchor – Include only
anchor
incl_nm – Include only
nm
excl_kw – Exclude
kw
excl_obj – Exclude
obj
excl_desc – Exclude
desc
excl_anchor – Exclude
anchor
excl_nm – Exclude
nm
as_id – ID to assign the filters passed to method; used to populated the
filters
attributefrom_id – ID previously used on the same instance of
Script
from which to populate filtered argumentslast – Re-use the last set of filters passed to context manager.
- Returns (Script):
The instance of script based on the context imposed by arguments pr.
- property
excluded
(self)¶ All statements by index position excluded from the current context.
- property
executed
(self) → Dict[int, Statement]¶ Executed statements by index position included in the current context.
-
reset
(self, index: bool = False, ctx_id: bool = False, in_context: bool = False, scope: bool = False, _filter: bool = False) → snowmobile.core.script.Script¶ Resets indices and scope on all statements to their state as read from source.
Invoked before exiting
filter()
context manger to reverse the revised indices set byindex_to()
and inclusion/ exclusion scope set byStatement.Name.scope()
.
- property
duplicates
(self) → Dict[str, int]¶ Dictionary of indistinct statement names/tags within script.
-
s
(self, _id: Optional[str, int] = None) → Any[Statement, Empty, Diff]¶ Fetch a single statement by _id.
-
dtl
(self, full: bool = False, excluded: bool = False, title: bool = True, r: bool = False) → Union[str, None]¶ Prints summary of statements within the current scope to console.
- property
first_s
(self)¶ First statement by index position.
- property
last_s
(self)¶ Last statement by index position
-
doc
(self, nm: Optional[str] = None, prefix: Optional[str] = None, suffix: Optional[str] = None, incl_markers: Optional[bool] = True, incl_sql: Optional[bool] = True, incl_exp_ctx: Optional[bool] = True, result_wrap: Optional[str] = None) → snowmobile.core.Markup¶ Returns a
Markup
from the script.- Parameters
nm (Optional[str]) – Alternate file name to use.
prefix (Optional[str]) – Prefix for file name.
suffix (Optional[str]) – Suffix for file name.
incl_markers (Optional[bool]) – Include markers in exported files.
incl_sql (Optional[bool]) – Include sql in exported files.
incl_exp_ctx (Optional[bool]) – Include disclaimer of programmatic save in exported sql file.
- Returns
A
Markup
instance based on the contents included in the script’s context.
-
ids
(self, _id: Optional[Union[Tuple, List]] = None) → List[int]¶ Utility function to get a list of statement IDs given an _id.
- Invoked within script.run() if the _id parameter is either a:
tuple of integers (lower and upper bound of statement indices to run)
list of integers or strings (statement names or indices to run)
default=None; returns all statement indices within scope if so
- Parameters
_id (Union[Tuple, List]) – _id field provided to script.run() if it’s neither an integer or a string.
- Returns (List[int]):
A list of statement indices to run.
-
run
(self, _id: Optional[str, int, Tuple[int, int], List] = None, as_df: bool = True, on_error: Optional[str] = None, on_exception: Optional[str] = None, on_failure: Optional[str] = None, lower: bool = True, render: bool = False, **kwargs) → None¶ Performs statement-by-statement execution of the script’s contents.
Executes script’s contents that are included within its current context and any (optional) value passed to the
_id
argument.Note
Keyword arguments
on_exception
andon_failure
are only applicable to derived classes ofStatement
(e.g., those withinsnowmobile.core.qa
by default).- Parameters
_id (Optional[str, int, Tuple[int, int], List]) –
- Identifier for statement(s) to execute, can be either:
None (default); execute all statements
A single statement’s
nm
A single statement’s index position
A tuple of lower/upper index bounds of statements to execute
A list of statement names or index positions to execute
as_df (bool) – Store statement’s results as a
DataFrame
; defaults to Trueon_error (Optional[str]) – Action to take on execution error; providing c will continue execution as opposed to raising exception.
on_exception (Optional[str]) – Action to take on post-processing error from a derived
Statement
; providing c will continue execution as opposed to raising exception.on_failure (Optional[str]) – Action to take on failure of post-processing assertion from a derived
Statement
; providing c will continue execution as opposed to raising exception.lower (bool) – Lower-case columns in results returned if
as_df=True
.render (bool) – Render sql executed as markdown; only applicable in Jupyter/iPython environments.
**kwargs –
-
items
(self, by_index: bool = True, ignore_scope: bool = False, statements: bool = True, markers: bool = False, validate: bool = True) → ItemsView[Union[int, str], Union[Statement, Marker]]¶ Dunder items.
-
dict
(self, **kwargs) → Dict¶ Unpacking items view into an actual dictionary.
- class
snowmobile.core.
Table
(df: pandas.DataFrame, table: str, sn: Optional[Snowmobile] = None, if_exists: Optional[str] = None, as_is: bool = False, path_ddl: Optional[Union[str, Path]] = None, path_output: Optional[str, Path] = None, file_format: Optional[str] = None, incl_tmstmp: Optional[bool] = None, tmstmp_col_nm: Optional[str] = None, reformat_cols: Optional[bool] = None, validate_format: Optional[bool] = None, validate_table: Optional[bool] = None, upper_case_cols: Optional[bool] = None, lower_case_table: Optional[bool] = None, keep_local: Optional[bool] = None, on_error: Optional[str] = None, check_dupes: Optional[bool] = None, load_copy: Optional[bool] = None, **kwargs)¶ Bases:
snowmobile.core.Generic
Constructed with a
DataFrame
and a table name to load into.The
df
andtable
’s compatibility can be inspected prior to calling theTable.load()
method or by providing as_is=True` when instantiating the object; the latter will kick off the loading process invoked by.load()
based on the parameters provided tosnowmobile.Table()
.- Parameters
df (DataFrame) – The
DataFrame
to load.table (str) – The table name to load
df
into.sn (Optional[Snowmobile]) – An instance of
Snowmobile
; can be used to load a table on a specific connection or from a specificsnowmobile.toml
file.if_exists (Optional[str]) – Action to take if
table
already exists - options are fail, replace, append, and truncate; defaults toappend
.as_is (bool) – Load
df
intotable
based on the parameters provided toTable
without further pre-inspection by the user; defaults to False.path_ddl (Optional[Path]) – Alternate path to file format DDL to use for load.
keep_local (Optional[bool]) – Keep local file that is written out as part of the bulk loading process; defaults to False.
path_output (Optional[str Path]) – Path to write output local file to; defaults to a generated file name exported in the current working directory.
file_format (Optional[str]) – The name of the file_format to use when loading
df
; defaults tosnowmobile_default_psv
.incl_tmstmp (Optional[bool]) – Include timestamp of load as part of
table
; defaults to True.tmstmp_col_nm (Optional[str]) – Name to use for load timestamp if
incl_tmstmp=True
; defaults to loaded_tmstmp.upper_case_cols (Optional[bool]) – Upper case columns of
df
when loading intotable
; defaults to True.reformat_cols (Optional[bool]) –
Reformat applicable columns of
df
to be DB-compliant; defaults to True.- Reformatting primarily entails:
Replacing spaces and special characters with underscores
De-duping consecutive special characters
De-duping repeated column names; adds an
_i
suffix to duplicate fields wherei
is the nth duplicate name for a field
validate_format (Optional[bool]) –
Validate the file format being used prior to kicking off the load; defaults to True.
- Validation entails:
Checking if the file format being used already exists based on formats accessible to the current connection
Executing DDL for the file format being used if not, pulled from the
DDL
ext-location and the statement namecreate file format~{format name}
Tip
Providing validate_format=False will speed up loading time when batch-loading into an existing table by skipping this step
validate_table (Optional[bool]) –
Perform validations of
df
againsttable
prior to kicking off the loading process; defaults to True.- Validation entails:
Checking the existence of
table
; no further validation is performed if it does not existCompares the columns of
df
to the columns oftable
and stores results for use during loading process
Note
Table validation results are used in conjunction with the
if_exists
parameter to determine the desired behavior based on the (potential) existence oftable
and its compatibility withdf
.Tip
Providing validate_table=False will speed up loading time time when batch-loading into an existing table
lower_case_table (Optional[bool]) – Lower case
table
name; defaults to False.on_error (Optional[str]) – Action to take if an exception is encountered as part of the validating or loading process - providing
on_error='c'
will continue past an exception as opposed to raising it; defaults to None meaning any exception encountered will be raisedcheck_dupes (Optional[bool]) – Check for duplicate field names in
df
; defaults to True.load_copy (Optional[bool]) – Alter and load a deep copy of
df
as opposed to thedf
in-memory as passed to the parameter; defaults to True.
-
load
(self, if_exists: Optional[str] = None, from_script: pathlib.Path = None, verbose: bool = True, **kwargs) → snowmobile.core.table.Table¶ Loads
df
intotable
.- Parameters
if_exists (Optional[str]) – Determines behavior to take if the table being loaded into already exists; defaults to append; options are replace, append, truncate, and fail
from_script (Optional[Union[Path, str]]) – Path to sql file containing custom DDL for
table
; DDL is assumed to have a valid statement name as is parsed byScript
and following the naming convention ofcreate table~TABLE
whereTABLE
is equal to the value provided to thetable
keyword argumentverbose (bool) – Verbose console output; defaults to True
-
col_diff
(self, mismatched: bool = False) → Dict[int, Tuple[str, str]]¶ Returns diff detail of local DataFrame to in-warehouse table.
-
load_statements
(self, from_script: pathlib.Path) → List[str]¶ Generates exhaustive list of the statements to execute for a given instance of loading a DataFrame.