snowmobile.core.sql
¶
SQL
contains utility methods to generate common SQL commands; Snowmobile
inherits everything from this object and passes along its query()
method for statement execution.
Note
The auto_run
attribute defaults to True, meaning that the generated sql will execute when a method is called; if set to False the method will return the sql as a string without executing.
The SQL
object is primarily interacted with as a pre-instantiated attribute of Snowmobile
; in these instances users can fetch the generated sql as a string either by:
Providing run=False to any method called; this will override all behavior set by the current value of
auto_run
Setting the
auto_run
attribute to False on an existing instance ofSQL
, which will replicate the behavior of (1) without needing to provide run=False to each method called on that instance
Module Contents¶
Classes¶
SQL class for generation & execution of common sql commands. |
- class
snowmobile.core.sql.
SQL
(_query_func: Callable, _cfg: snowmobile.core.configuration.Configuration, nm: Optional[str] = None, schema: Optional[str] = None, obj: Optional[str] = None, auto_run: Optional[bool] = True)¶ Bases:
snowmobile.core.Generic
SQL class for generation & execution of common sql commands.
Intended to be interacted with as a parent of
Snowmobile
.Note
All arguments except for
sn
are optional.The benefit of setting the other attributes on an instance of
SQL
is to (optionally) avoid passing the same information to multiple methods when generating a variety of statements around the same object.
-
schema
¶ Schema to use when dot-prefixing sql; defaults to the schema with which the
sn
is connected to.- Type
-
auto_run
¶ Indicates whether to automatically execute the sql generated by a given method; defaults to True
- Type
Initializes a
snowmobile.SQL
object.-
info_schema
(self, loc: str, where: Optional[List[str]] = None, fields: Optional[List[str]] = None, order_by: Optional[List] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Generic case of selecting from information schema location.
-
table_info
(self, nm: Optional[str] = None, fields: List[str] = None, restrictions: Dict[str, str] = None, order_by: List[Optional[str, int]] = None, all_schemas: bool = False, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Query
information_schema.tables
for a given table or view.- Parameters
nm (str) – Table name, including schema if creating a stage outside of the current schema.
fields (List[str]) – List of fields to include in returned results (e.g. [‘table_name’, ‘table_type’, ‘last_altered’])
restrictions (List[str]) – List of conditionals typed as literal components of a where clause (e.g. [“table_type = ‘base table’”, ‘last_altered::date = current_date()’] ).
order_by (List[str]) – List of fields or their ordinal positions to order the results by.
all_schemas (bool) – Include tables/views from all schemas; defaults to False.
run (bool) – Determines whether to run the generated sql or not; defaults to None which will reference the current value of the
auto_run
attribute which defaults to True.
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
column_info
(self, nm: Optional[str] = None, fields: Optional[List] = None, restrictions: Optional[Dict] = None, order_by: Optional[List] = None, all_schemas: bool = False, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Query
information_schema.columns
for a given table or view.- Parameters
nm (str) – Table name, including schema if creating a stage outside of the current schema.
fields (List[str]) – List of fields to include in returned results (e.g. [‘ordinal_position’, ‘column_name’, ‘data_type’])
restrictions (List[str]) – List of conditionals typed as literal components of a where clause (e.g.[“regexp_count(lower(column_name), ‘tmstmp’) = 0”]).
order_by (List[str]) – List of fields or their ordinal positions to order the results by.
all_schemas (bool) – Include tables/views from all schemas; defaults to False.
run (bool) – Determines whether to run the generated sql or not; defaults to None which will reference the current value of the
auto_run
attribute which defaults to True.
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
columns
(self, nm: Optional[str] = None, from_info_schema: bool = False, lower: bool = False, run: Optional[bool] = None) → Union[str, List]¶ Returns an ordered list of columns for a table or view.
Note
Default behavior is to retrieve the columns for a table or view by selecting a single sample record and returning the column index from the DataFrame that’s returned which is much faster than selecting the column_names from
information_schema.columns
pulling column names from the information schemaThis can be changed by passing from_info_schema=True.
- Parameters
nm (str) – Name of table or view, including schema if the table or view is outside of the current schema.
from_info_schema (bool) – Indicates whether to retrieve columns via the
information_schema.columns
or by selecting a sample record from the table or view; defaults to False.lower (bool) – Lower case each column in the list that’s returned.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
- Returns (Union[str, List]):
- Either:
An ordered list of columns for the table or view, or
The query against the table or view as a
str
of sql.
-
select
(self, nm: Optional[str] = None, fields: Optional[List[str]] = None, apply: Optional[List[Tuple[str, str]]] = None, n: Optional[int] = None, run: Optional[bool] = None, **kwargs) → Union[str, pd.DataFrame]¶ Generic select statement.
- Parameters
nm (str) – Table to select from, including schema if the table is outside of the current schema
fields (Optional[List[str]]) – Select these fields (optional).
apply (Optional[List[Tuple[str, str]]]) –
Select aggregations of these fields.
- apply [
(this_func, to_this_field, [as_alias]), (.., .., [..]),
]
apply
should be provided as a list of tuples, each containing a minimum of 2 items (respectively) representing the aggregate function to apply and the field to which it should be appliedBy default, the aggregated result inherits the name of the field being aggregated, including any qualifier (optionally) provided with the field name or an explicit alias included as a 3rd item within the tuple
The following snippet exhaustively illustrates the functionality described above
sn.select( nm='sandbox.sample_table', apply=[ ('count', 'col1'), ('count', 'distinct col1'), ('count', 'distinct col1', 'col1_dst'), ], run=False, ) >>> select count(col1) as col1 ,count(distinct col1) as distinct_col1 ,count(distinct col1) as col1_dst from sandbox.sample_table
n (int) – Number of records to return, implemented as a ‘limit’ clause in the query; defaults to 1.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
- Returns (Union[str, pd.DataFrame]):
Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
exists
(self, nm: Optional[str] = None) → bool¶ Checks the existence of a table or view.
- Parameters
nm (str) – Name of table or view, including schema if the table or view is outside of the current schema.
- Returns (bool):
Boolean indication of whether or not the table or view exists.
-
is_distinct
(self, nm: Optional[str] = None, field: Optional[str] = None) → bool¶ Checks if table nm is distinct on column on_col
-
count
(self, nm: Optional[str] = None, of: Optional[str] = None, dst_of: Optional[str] = None, as_perc: Optional[bool] = None, run: Optional[bool] = None) → Union[int, float]¶ Number of records within a table or view.
- Parameters
nm (str) – Table name, including schema if querying outside current schema.
of (str) – Column name (indistinct).
dst_of (str) – Column name (distinct).
as_perc (bool) – Option to return distinct count of the dst_of column as a percentage of the namespace depth of the table or view.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
show
(self, obj: str, in_loc: Optional[str] = None, names: bool = False, run: Optional[bool] = None, **kwargs) → Union[pd.DataFrame, List[str], str]¶ Show schema objects of typ ‘obj’, optionally ‘in_loc’.
- Parameters
- Returns (Union[pd.DataFrame, str]):
- Either:
The results of the query as a
pandas.DataFrame
The ‘names’ column of the results returned as a list
The generated query as a
str
of sql
-
ddl
(self, nm: Optional[str] = None, obj: Optional[str] = None, run: Optional[bool] = None) → str¶ Query the DDL for an schema object.
- Parameters
- Returns (str):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
comment
(self, nm: Optional[str] = None, obj: Optional[str] = None, set_as: Optional[str] = None, from_json: bool = False, as_json: bool = False, run: Optional[bool] = None, **kwargs) → Union[str, Dict]¶ Get or set comment on a schema object.
- Parameters
nm (str) – Name of the schema object, including schema prefix if object is outside implicit scope of the current connection.
obj (str) – Type of schema object (e.g. ‘table’, ‘schema’, etc).
set_as (str) – Content to set as comment on schema object.
from_json (bool) – Parse schema object comment as a string of json and return it as a dictionary.
as_json (bool) – Dump contents of ‘set_as’ to a string of json prior to setting comment.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
**kwargs – Keyword argument to pass to json.loads(comment) if from_json=True.
-
last_altered
(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.Timestamp]¶ Last altered timestamp for a table or view.
- Parameters
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
truncate
(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Truncate a table.
- Parameters
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
drop
(self, nm: Optional[str] = None, obj: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Drop a
Snowflake
object.- Parameters
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
clone
(self, nm: Optional[str] = None, to: Optional[str] = None, obj: Optional[str] = None, run: Optional[bool] = None, replace: bool = False) → Union[str, pd.DataFrame]¶ Clone a
Snowflake
object.Warning
Make sure to read Snowflake’s documentation for restrictions and considerations when cloning objects.
Note
In this specific method, the value provided to
nm
andto
can be a single object name, a single schema, or both in the form of obj_schema.obj_name depending on the desired outcome.Additionally, at least one of the
nm
orto
arguments must be pr.The defaults for the target object are constructed such that users can either:
Clone objects to other schemas that inherit the source object’s name without specifying so in the
to
argument, orClone objects within the current schema that inherit the source object’s schema without specifying so in the
to
argument.
If providing a schema without a name to either argument, prefix the value provided with __ to signify it’s a schema and not a lower-level object to be cloned.
e.g. providing nm=’sample_table’ and to=’__sandbox’ will clone sample_table from the current schema to sandbox.sample_table.
An assertion error will be raised raised if neither argument is specified as this would result in a command to clone an object and store it in an object that has the same name & schema as the object being cloned.
- Parameters
nm (str) – Name of the object to clone, including schema if cloning an object outside of the current schema.
to (str) – Target name for cloned object, including schema if cloning an object outside of the current schema.
obj (str) – Type of object to clone (e.g. ‘table’, ‘view’, ‘file-format’); defaults to table.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
replace (bool) – Indicates whether to replace an existing stage if pre-existing; default is False.
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
create_stage
(self, nm_stage: str, nm_format: str, replace: bool = False, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Create a staging table.
- Parameters
nm_stage (str) – Name of stage to create, including schema if creating a stage outside of the current schema.
nm_format (str) – Name of file format to specify for the stage, including schema if using a format from outside of the current schema.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
replace (bool) – Indicates whether to replace an existing stage if pre-existing; default is False.
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
put_file_from_stage
(self, path: Union[Path, str], nm_stage: str, options: Optional[Dict] = None, ignore_defaults: bool = False, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Generates a ‘put’ command into a staging table from a local file.
- Parameters
path (Union[Path, str]) – Path to local data file as a
pathlib.Path
or string.nm_stage (str) – Name of the staging table to load into.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
options (dict) – Optional arguments to add to put statement in addition to the values specified in the
loading.put
section of snowmobile.toml.ignore_defaults (bool) – Option to ignore the values specified in snowmobile.toml; defaults to False.
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
copy_into_table_from_stage
(self, nm: str, nm_stage: str, options: Optional[Dict] = None, ignore_defaults: bool = False, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Generates a command to copy data into a table from a staging table.
- Parameters
nm (str) – Name of the object to drop, including schema if creating a stage outside of the current schema.
nm_stage (str) – Name of the staging table to load from.
run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.
options (dict) – Optional arguments to add to put statement in addition to the values specified in the
loading.put
section of snowmobile.toml.ignore_defaults (bool) – Option to ignore the values specified in snowmobile.toml; defaults to False.
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
current
(self, obj: str, run: Optional[bool] = None) → Union[str, Union[str, int]]¶ Generic implementation of ‘select current’ for session-based objects.
- Parameters
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
current_session
(self, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Select the current session.
-
current_schema
(self, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Select the current schema.
-
current_database
(self, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Select the current database.
-
current_warehouse
(self, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Select the current warehouse.
-
use
(self, obj: str, nm: str, run: Optional[bool] = None)¶ Generic implementation of ‘use’ command for schema objects.
- Parameters
- Returns (Union[str, pd.DataFrame]):
- Either:
The results of the query as a
pandas.DataFrame
, orThe generated query as a
str
of sql.
-
use_schema
(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Use schema command.
-
use_database
(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Use database command.
-
use_warehouse
(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Use warehouse command.
-
use_role
(self, nm: Optional[str] = None, run: Optional[bool] = None) → Union[str, pd.DataFrame]¶ Use role command.
- static
order
(by: List[Union[int, str]]) → str¶ Generates ‘order by’ clause from a list of fields or field ordinal positions.
- static
where
(restrictions: Dict) → str¶ Generates a ‘where’ clause based on a dictionary of restrictions.
- Parameters
restrictions (dict) – A dictionary of conditionals where each key/value pair respectively represents the left/right side of a condition within a ‘where’ clause.
- Returns (str):
Formatted where clause.