snowmobile.core.sql

SQL contains utility methods to generate common SQL commands; Snowmobile inherits everything from this object and passes along its query() method for statement execution.

Note

The auto_run attribute defaults to True, meaning that the generated sql will execute when a method is called; if set to False the method will return the sql as a string without executing.

The SQL object is primarily interacted with as a pre-instantiated attribute of Snowmobile; in these instances users can fetch the generated sql as a string either by:

  1. Providing run=False to any method called; this will override all behavior set by the current value of auto_run

  2. Setting the auto_run attribute to False on an existing instance of SQL, which will replicate the behavior of (1) without needing to provide run=False to each method called on that instance

Module Contents

Classes

SQL

SQL class for generation & execution of common sql commands.

class snowmobile.core.sql.SQL(_query_func: Callable, _cfg: snowmobile.core.configuration.Configuration, nm: Optional[str] = None, schema: Optional[str] = None, obj: Optional[str] = None, auto_run: Optional[bool] = True)

Bases: snowmobile.core.Generic

SQL class for generation & execution of common sql commands.

Intended to be interacted with as a parent of Snowmobile.

Note

  • All arguments except for sn are optional.

  • The benefit of setting the other attributes on an instance of SQL is to (optionally) avoid passing the same information to multiple methods when generating a variety of statements around the same object.

nm

Object name to use in generated sql (e.g. ‘some_table_name’)

Type

str

obj

Object type to use in generated sql (e.g. ‘table’)

Type

str

schema

Schema to use when dot-prefixing sql; defaults to the schema with which the sn is connected to.

Type

str

auto_run

Indicates whether to automatically execute the sql generated by a given method; defaults to True

Type

bool

Initializes a snowmobile.SQL object.

info_schema(self, loc: str, where: Optional[List[str]] = None, fields: Optional[List[str]] = None, order_by: Optional[List] = None, run: Optional[bool] = None)Union[str, pd.DataFrame]

Generic case of selecting from information schema location.

table_info(self, nm: Optional[str] = None, fields: List[str] = None, restrictions: Dict[str, str] = None, order_by: List[Optional[str, int]] = None, all_schemas: bool = False, run: Optional[bool] = None)Union[str, pd.DataFrame]

Query information_schema.tables for a given table or view.

Parameters
  • nm (str) – Table name, including schema if creating a stage outside of the current schema.

  • fields (List[str]) – List of fields to include in returned results (e.g. [‘table_name’, ‘table_type’, ‘last_altered’])

  • restrictions (List[str]) – List of conditionals typed as literal components of a where clause (e.g. [“table_type = ‘base table’”, ‘last_altered::date = current_date()’] ).

  • order_by (List[str]) – List of fields or their ordinal positions to order the results by.

  • all_schemas (bool) – Include tables/views from all schemas; defaults to False.

  • run (bool) – Determines whether to run the generated sql or not; defaults to None which will reference the current value of the auto_run attribute which defaults to True.

Returns (Union[str, pd.DataFrame]):
Either:
  1. The results of the query as a pandas.DataFrame, or

  2. The generated query as a str of sql.

column_info(self, nm: Optional[str] = None, fields: Optional[List] = None, restrictions: Optional[Dict] = None, order_by: Optional[List] = None, all_schemas: bool = False, run: Optional[bool] = None)Union[str, pd.DataFrame]

Query information_schema.columns for a given table or view.

Parameters
  • nm (str) – Table name, including schema if creating a stage outside of the current schema.

  • fields (List[str]) – List of fields to include in returned results (e.g. [‘ordinal_position’, ‘column_name’, ‘data_type’])

  • restrictions (List[str]) – List of conditionals typed as literal components of a where clause (e.g.[“regexp_count(lower(column_name), ‘tmstmp’) = 0”]).

  • order_by (List[str]) – List of fields or their ordinal positions to order the results by.

  • all_schemas (bool) – Include tables/views from all schemas; defaults to False.

  • run (bool) – Determines whether to run the generated sql or not; defaults to None which will reference the current value of the auto_run attribute which defaults to True.

Returns (Union[str, pd.DataFrame]):
Either:
  1. The results of the query as a pandas.DataFrame, or

  2. The generated query as a str of sql.

columns(self, nm: Optional[str] = None, from_info_schema: bool = False, lower: bool = False, run: Optional[bool] = None)Union[str, List]

Returns an ordered list of columns for a table or view.

Note

  • Default behavior is to retrieve the columns for a table or view by selecting a single sample record and returning the column index from the DataFrame that’s returned which is much faster than selecting the column_names from information_schema.columns pulling column names from the information schema

  • This can be changed by passing from_info_schema=True.

Parameters
  • nm (str) – Name of table or view, including schema if the table or view is outside of the current schema.

  • from_info_schema (bool) – Indicates whether to retrieve columns via the information_schema.columns or by selecting a sample record from the table or view; defaults to False.

  • lower (bool) – Lower case each column in the list that’s returned.

  • run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

Returns (Union[str, List]):
Either:
  1. An ordered list of columns for the table or view, or

  2. The query against the table or view as a str of sql.

select(self, nm: Optional[str] = None, fields: Optional[List[str]] = None, apply: Optional[List[Tuple[str, str]]] = None, n: Optional[int] = None, run: Optional[bool] = None, **kwargs)Union[str, pd.DataFrame]

Generic select statement.

Parameters
  • nm (str) – Table to select from, including schema if the table is outside of the current schema

  • fields (Optional[List[str]]) – Select these fields (optional).

  • apply (Optional[List[Tuple[str, str]]]) –

    Select aggregations of these fields.

    apply [

    (this_func, to_this_field, [as_alias]), (.., .., [..]),

    ]

    • apply should be provided as a list of tuples, each containing a minimum of 2 items (respectively) representing the aggregate function to apply and the field to which it should be applied

    • By default, the aggregated result inherits the name of the field being aggregated, including any qualifier (optionally) provided with the field name or an explicit alias included as a 3rd item within the tuple

      The following snippet exhaustively illustrates the functionality described above

    sn.select(
       nm='sandbox.sample_table',
       apply=[
           ('count', 'col1'),
           ('count', 'distinct col1'),
           ('count', 'distinct col1', 'col1_dst'),
       ],
       run=False,
    )
    >>>
    select
     count(col1)
       as col1
     ,count(distinct col1)
       as distinct_col1
     ,count(distinct col1)
       as col1_dst
    from sandbox.sample_table
    

  • n (int) – Number of records to return, implemented as a ‘limit’ clause in the query; defaults to 1.

  • run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

Returns (Union[str, pd.DataFrame]):

Either:

  1. The results of the query as a pandas.DataFrame, or

  2. The generated query as a str of sql.

exists(self, nm: Optional[str] = None)bool

Checks the existence of a table or view.

Parameters

nm (str) – Name of table or view, including schema if the table or view is outside of the current schema.

Returns (bool):

Boolean indication of whether or not the table or view exists.

is_distinct(self, nm: Optional[str] = None, field: Optional[str] = None)bool

Checks if table nm is distinct on column on_col

Parameters
  • nm (str) – Table name.

  • field (str) – Column name.

count(self, nm: Optional[str] = None, of: Optional[str] = None, dst_of: Optional[str] = None, as_perc: Optional[bool] = None, run: Optional[bool] = None)Union[int, float]

Number of records within a table or view.

Parameters
  • nm (str) – Table name, including schema if querying outside current schema.

  • of (str) – Column name (indistinct).

  • dst_of (str) – Column name (distinct).

  • as_perc (bool) – Option to return distinct count of the dst_of column as a percentage of the namespace depth of the table or view.

  • run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

Returns (Union[str, pd.DataFrame]):
Either:
  1. The results of the query as a pandas.DataFrame, or

  2. The generated query as a str of sql.

show(self, obj: str, in_loc: Optional[str] = None, names: bool = False, run: Optional[bool] = None, **kwargs)Union[pd.DataFrame, List[str], str]

Show schema objects of typ ‘obj’, optionally ‘in_loc’.

Parameters
  • obj (str) – Schema object type (‘tables’, ‘file formats’, etc).

  • in_loc (str) – Snowflake location (‘in schema sandbox’, ‘in database prod’, etc).

  • names (bool) – Return a list of schema object names only (‘name’ field).

  • run (bool) – Execute the generated sql or return it as a string.

Returns (Union[pd.DataFrame, str]):
Either:
  1. The results of the query as a pandas.DataFrame

  2. The ‘names’ column of the results returned as a list

  3. The generated query as a str of sql

ddl(self, nm: Optional[str] = None, obj: Optional[str] = None, run: Optional[bool] = None)str

Query the DDL for an schema object.

Parameters
  • nm (str) – Name of the object to get DDL for, including schema if object is outside of the current schema.

  • obj (str) – Type of object to get DDL for (e.g. ‘table’, ‘view’, ‘file-format’).

  • run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

Returns (str):
Either:
  1. The results of the query as a pandas.DataFrame, or

  2. The generated query as a str of sql.

comment(self, nm: Optional[str] = None, obj: Optional[str] = None, set_as: Optional[str] = None, from_json: bool = False, as_json: bool = False, run: Optional[bool] = None, **kwargs)Union[str, Dict]

Get or set comment on a schema object.

Parameters
  • nm (str) – Name of the schema object, including schema prefix if object is outside implicit scope of the current connection.

  • obj (str) – Type of schema object (e.g. ‘table’, ‘schema’, etc).

  • set_as (str) – Content to set as comment on schema object.

  • from_json (bool) – Parse schema object comment as a string of json and return it as a dictionary.

  • as_json (bool) – Dump contents of ‘set_as’ to a string of json prior to setting comment.

  • run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

  • **kwargs – Keyword argument to pass to json.loads(comment) if from_json=True.

Returns (Union[str, pd.DataFrame]):
Either:
  1. The schema object comment as a str

  2. The generated query as a str of sql.

  3. The schema object comment as a dictionary if from_json=True

last_altered(self, nm: Optional[str] = None, run: Optional[bool] = None)Union[str, pd.Timestamp]

Last altered timestamp for a table or view.

Parameters
  • nm (str) – Table name, including schema if creating a stage outside of the current schema.

  • run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

Returns (Union[str, pd.DataFrame]):
Either:
  1. The results of the query as a pandas.DataFrame, or

  2. The generated query as a str of sql.

truncate(self, nm: Optional[str] = None, run: Optional[bool] = None)Union[str, pd.DataFrame]

Truncate a table.

Parameters
  • nm (str) – Name of table, including schema if the table is outside of the current schema.

  • run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

Returns (Union[str, pd.DataFrame]):
Either:
  1. The results of the query as a pandas.DataFrame, or

  2. The generated query as a str of sql.

drop(self, nm: Optional[str] = None, obj: Optional[str] = None, run: Optional[bool] = None)Union[str, pd.DataFrame]

Drop a Snowflake object.

Parameters
  • nm (str) – Schema object’s name.

  • obj (str) – Type of schema object (e.g. ‘table’, ‘view’, or ‘schema’)

  • run (bool) – Execute generated statement; defaults to True, otherwise returns sql as a string.

Returns (Union[str, pd.DataFrame]):
Either:
  1. The results of the query as a pandas.DataFrame, or

  2. The generated query as a str of sql.

clone(self, nm: Optional[str] = None, to: Optional[str] = None, obj: Optional[str] = None, run: Optional[bool] = None, replace: bool = False)Union[str, pd.DataFrame]

Clone a Snowflake object.

Warning

Note

  • In this specific method, the value provided to nm and to can be a single object name, a single schema, or both in the form of obj_schema.obj_name depending on the desired outcome.

  • Additionally, at least one of the nm or to arguments must be pr.

  • The defaults for the target object are constructed such that users can either:

    1. Clone objects to other schemas that inherit the source object’s name without specifying so in the to argument, or

    2. Clone objects within the current schema that inherit the source object’s schema without specifying so in the to argument.

  • If providing a schema without a name to either argument, prefix the value provided with __ to signify it’s a schema and not a lower-level object to be cloned.

    • e.g. providing nm=’sample_table’ and to=’__sandbox’ will clone sample_table from the current schema to sandbox.sample_table.

  • An assertion error will be raised raised if neither argument is specified as this would result in a command to clone an object and store it in an object that has the same name & schema as the object being cloned.

Parameters
  • nm (str) – Name of the object to clone, including schema if cloning an object outside of the current schema.

  • to (str) – Target name for cloned object, including schema if cloning an object outside of the current schema.

  • obj (str) – Type of object to clone (e.g. ‘table’, ‘view’, ‘file-format’); defaults to table.

  • run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

  • replace (bool) – Indicates whether to replace an existing stage if pre-existing; default is False.

Returns (Union[str, pd.DataFrame]):
Either:
  1. The results of the query as a pandas.DataFrame, or

  2. The generated query as a str of sql.

create_stage(self, nm_stage: str, nm_format: str, replace: bool = False, run: Optional[bool] = None)Union[str, pd.DataFrame]

Create a staging table.

Parameters
  • nm_stage (str) – Name of stage to create, including schema if creating a stage outside of the current schema.

  • nm_format (str) – Name of file format to specify for the stage, including schema if using a format from outside of the current schema.

  • run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

  • replace (bool) – Indicates whether to replace an existing stage if pre-existing; default is False.

Returns (Union[str, pd.DataFrame]):
Either:
  1. The results of the query as a pandas.DataFrame, or

  2. The generated query as a str of sql.

put_file_from_stage(self, path: Union[Path, str], nm_stage: str, options: Optional[Dict] = None, ignore_defaults: bool = False, run: Optional[bool] = None)Union[str, pd.DataFrame]

Generates a ‘put’ command into a staging table from a local file.

Parameters
  • path (Union[Path, str]) – Path to local data file as a pathlib.Path or string.

  • nm_stage (str) – Name of the staging table to load into.

  • run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

  • options (dict) – Optional arguments to add to put statement in addition to the values specified in the loading.put section of snowmobile.toml.

  • ignore_defaults (bool) – Option to ignore the values specified in snowmobile.toml; defaults to False.

Returns (Union[str, pd.DataFrame]):
Either:
  1. The results of the query as a pandas.DataFrame, or

  2. The generated query as a str of sql.

copy_into_table_from_stage(self, nm: str, nm_stage: str, options: Optional[Dict] = None, ignore_defaults: bool = False, run: Optional[bool] = None)Union[str, pd.DataFrame]

Generates a command to copy data into a table from a staging table.

Parameters
  • nm (str) – Name of the object to drop, including schema if creating a stage outside of the current schema.

  • nm_stage (str) – Name of the staging table to load from.

  • run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

  • options (dict) – Optional arguments to add to put statement in addition to the values specified in the loading.put section of snowmobile.toml.

  • ignore_defaults (bool) – Option to ignore the values specified in snowmobile.toml; defaults to False.

Returns (Union[str, pd.DataFrame]):
Either:
  1. The results of the query as a pandas.DataFrame, or

  2. The generated query as a str of sql.

current(self, obj: str, run: Optional[bool] = None)Union[str, Union[str, int]]

Generic implementation of ‘select current’ for session-based objects.

Parameters
  • obj (str) – Type of object to retrieve information for (schema, session, ..).

  • run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

Returns (Union[str, pd.DataFrame]):
Either:
  1. The results of the query as a pandas.DataFrame, or

  2. The generated query as a str of sql.

current_session(self, run: Optional[bool] = None)Union[str, pd.DataFrame]

Select the current session.

current_schema(self, run: Optional[bool] = None)Union[str, pd.DataFrame]

Select the current schema.

current_database(self, run: Optional[bool] = None)Union[str, pd.DataFrame]

Select the current database.

current_warehouse(self, run: Optional[bool] = None)Union[str, pd.DataFrame]

Select the current warehouse.

current_role(self, run: Optional[bool] = None)Union[str, pd.DataFrame]

Select the current role.

use(self, obj: str, nm: str, run: Optional[bool] = None)

Generic implementation of ‘use’ command for schema objects.

Parameters
  • nm (str) – Name of object to use (schema name, warehouse name, role name, ..).

  • obj (str) – Type of object to use (schema, warehouse, role, ..).

  • run (bool) – Execute generated sql; defaults to True, otherwise returns sql as a string.

Returns (Union[str, pd.DataFrame]):
Either:
  1. The results of the query as a pandas.DataFrame, or

  2. The generated query as a str of sql.

use_schema(self, nm: Optional[str] = None, run: Optional[bool] = None)Union[str, pd.DataFrame]

Use schema command.

use_database(self, nm: Optional[str] = None, run: Optional[bool] = None)Union[str, pd.DataFrame]

Use database command.

use_warehouse(self, nm: Optional[str] = None, run: Optional[bool] = None)Union[str, pd.DataFrame]

Use warehouse command.

use_role(self, nm: Optional[str] = None, run: Optional[bool] = None)Union[str, pd.DataFrame]

Use role command.

static order(by: List[Union[int, str]])str

Generates ‘order by’ clause from a list of fields or field ordinal positions.

static where(restrictions: Dict)str

Generates a ‘where’ clause based on a dictionary of restrictions.

Parameters

restrictions (dict) – A dictionary of conditionals where each key/value pair respectively represents the left/right side of a condition within a ‘where’ clause.

Returns (str):

Formatted where clause.

static fields(fields: Optional[List[str]] = None)str

Utility to generate fields within a ‘select’ statement.