Script


snowmobile.core.script

snowmobile.Script parses a raw sql file into a composition of objects that can be leveraged for:

Documentation and standardization of sql

Access to individual statements within a script

Lightweight control flow and QA

Code generation and warehouse cleanup




Overview


Note: If you’re just wanting to run some sql

The most straight-forward way to execute a local sql file is through the SnowflakeConnection.execute_stream() method, the API for which can be accessed from an instance of with:

import snowmobile

from codecs import open

sn = snowmobile.connect()
with open(sqlfile, 'r', encoding='utf-8') as f:
   for cur in sn.con.execute_stream(f):
       for ret in cur:
           print(ret)

Model Intro



intro1.sql

This section creates a Script from the following file, intro1.sql, containing 3 bare sql statements:

-- ./docs/snippets/script/intro/intro1.sql

create or replace table sample_table (
  col1 number(18,0),
  col2 number(18,0)
);

insert into sample_table (col1, col2) values(1, 2);

select * from sample_table;

Crash Course




Creating a Script

snowmobile.Script identifies sql and metadata in a sql file; assuming path is a full path to intro1.sql , script can be created with:

import snowmobile

script = snowmobile.Script(path=path)

Each command is instantiated as its own Statement and stored according to its position in the original script; script.dtl() is used to send a summary of the contents parsed by script to the console:

script.dtl()
intro1.sql
==========
1: Statement('create table~s1')
2: Statement('insert into~s2')
3: Statement('select data~s3')

FYI

script.dtl() is generating its
output with something like:


for i, s in script.items():
    print(f"{i}: {s}")
1: Statement('create table~s1')
2: Statement('insert into~s2')
3: Statement('select data~s3')

Missing Content


Because these are bare sql statements..

s3 = script(3)
print(s3.index)     #> 3
print(s3.sql())     #> select * from sample_table
print(s3.kw())      #> select
print(s3.anchor())  #> select data
print(s3.desc())    #> s3
print(s3.nm())      #> select data~s3


Core Objects



When Script parses a string of sql, it identifies and stores statements, tags, and markers:


Statement
A valid sql command, a standard set of attributes, and any information (optionally) provided in a tag

Tag
An arbitrary amount of information wrapped in a pre-defined, sql-compliant pattern that is parsable by snowmobile

Marker
A collection of information within a tag that is associated with the script (or a subset of it) as opposed to an individual statement


Note

The simple zen is to enable the consistent, clear annotation of sql in a way that is:
(1) easily human-readable / writable
(2) syntactically (& idiomatically) compliant
(3) identifiable and parsable by snowmobile

To that end, snowmobile.Script intentionally ignores all comments that are not part of a tag.


Sections & Markup



A Section can be instantiated from a Statement or a Marker, and the Markup class combines multiple sections into a single document:


Section
Performs additional operations on the attributes from a Statement or a Marker, typically to generate a ‘headered’ section in a markdown file or a sql statement stripped of surrounding comments

Markup
A context-specific collection of all sections within a script; capable of exporting markdown and tidied sql files


Calling the doc() method on a Script will return a Markup of its contents.

The Markup.save() method will (by default) export a pair of files into a .snowmobile folder directly adjacent to the file with which the Script was instantiated.

A base case for this in practice is outlined in Example: intro.sql

Note

The following options can be configured on an instance of Markup prior to calling save():

  • Target location

  • File names

  • File types

  • File contents



Example: intro.sql


/*-
__intro.sql__
__authored-by: Some Chap or Lass
__authored-on: Some Day or Year
__p*_***:
**Impetus**: *SQL is older than time and isn't going anywhere; might we allow a simple markup syntax?*
-*/

/*-
create table~sample_table; DDL
__description: This is an example statement description
-*/
create or replace table sample_table (
	col1 number(18,0),
	col2 number(18,0)
);

With a path to intro.sql, the following can be run:

import snowmobile

script = snowmobile.Script(path=path)
markup = script.doc()

print(script)  #> snowmobile.Script('intro.sql')
print(markup)  #> snowmobile.core.Markup('intro.sql')

markup.save()

Given intro.sql is here:

sql/
└─ intro.sql

markup.save() created the .snowmobile directory and exported the following files:

sql/
├─ intro.sql
└─ .snowmobile/
   ├─ intro.md
   └─ intro.sql

intro.sql

  • Authored-By: Some Chap or Lass

  • Authored-On: Some Day or Year

Impetus: SQL is older than time and isn’t going anywhere; might we allow a simple markup syntax?


(1) create table~sample_table; DDL

  • Description: This is an example statement description

create or replace table sample_table (
    col1 number(18,0),
    col2 number(18,0)
);
/*: -----------------------------------------------------------------------
    ** This file was stripped of all comments and exported by Snowmobile **
    ----------------------------------------------------------------------- :*/

/*-
__intro.sql__
__authored-by: Some Chap or Lass
__authored-on: Some Day or Year
__p*_***:
**Impetus**: *SQL is older than time and isn't going anywhere; might we allow a simple markup syntax?*
-*/

/*-create table~sample_table; DDL-*/
create or replace table sample_table (
	col1 number(18,0),
	col2 number(18,0)
);

Statements



script

This section performs operations on the following :

script = snowmobile.Script(path=path)

Where path (pathlib.Path or str) is a full path to overview.sql.


The 7 generic sql statements within overview.sql are arbitrary and chosen based only on the loose criteria of:

  1. Includes the minimum variety of Statements and Markup to demonstrate the fundamentals of how Script parses sql

  2. Is executable from top to bottom without requiring external setup


create or replace table sample_table (
	col1 number(18,0),
	col2 number(18,0)
);

insert into sample_table with
sample_data as (
  select
    uniform(1, 10, random(1)) as rand_int
  from table(generator(rowcount => 3)) v
)
  select
    row_number() over (order by a.rand_int) as col1
    ,(col1 * col1) as col2
  from sample_data a;

select * from sample_table;

/*-select all~sample_table-*/
select * from sample_table;

create or replace transient table any_other_table clone sample_table;

insert into any_other_table (
  select
    a.*
  from sample_table a
);

drop table if exists sample_table;

Intro



When a sql file is parsed by Script, each statement is identified and instantiated as its own Statement.

An overview of the statements within a script’s context can be sent to the console with script.dtl(); in the case of , this looks like:

script.dtl()
overview.sql
============
1: Statement('create table~s1')
2: Statement('insert into~s2')
3: Statement('select data~s3')
4: Statement('select all~sample_table')
5: Statement('create transient table~s5')
6: Statement('insert into~s6')
7: Statement('drop table~s7')

Accessing the first and last statements of and inspecting a few of their attributes can be done with:

# Store a few st, accessed by index position
s_first, s_last = script(1), script(-1)

# first sql keyword
print(s_first.kw)  #> create
print(s_last.kw)   #> drop

# position within `script`
print(s_first.index)  #> 1
print(s_last.index)   #> 7

A Statement can be interacted with off the Script or stored and used independently; for example, here are two ways that the first statement in overview.sql can be executed:

script.run(1)    # .run() from `script`
script(1).run()  # .run() from `statement`

Those above are several amongst a set of Statement attributes that can be used to alter the scope of a Script.

For example, the following snippet filters out drop and select statements based on their kw attribute and returns a modified , s, that can be operated on within that context:

print(script.depth)   #> 7
print(script(1).nm)   #> create table~s1
print(script(-1).nm)  #> drop table~s7

with script.filter(excl_kw=['select', 'drop']) as s:
    print(s.depth)   #> 4
    print(s(1).nm)   #> create table~s1
    print(s(-1).nm)  #> insert into~s4
    s.dtl()
overview.sql
============
1: Statement('create table~s1')
2: Statement('insert into~s2')
3: Statement('create transient table~s3')
4: Statement('insert into~s4')

The following section outlines how these components are constructed.


Statement Names



The intent of the following taxonomy is to define a standard such that the name for a given statement is:

  1. Constructed from attributes that can be unambiguously parsed from a piece of raw sql

  2. Structured such that user provided names can be easily implemented and loosely parsed into the same set of attributes as those generated from (1)


Every statement has a Name with a set of underlying properties that are used by the rest of the API; for each property, there is a generated (_ge) and provided (_pr) attribute from which its final value is sourced.

Generated attributes are populated for all st, whereas only those with a name specified in a tag have populated provided attributes; consequently, a provided value takes precedent over its generated counterpart.

Example: nm

The nm value for a given statement will be equivalent to its nm_pr if present and its nm_ge otherwise.

This resolution order is repeated across the underlying components of nm, documented in the following sections.



s1 & s4

The below statements, s1 and s4, from are used throughout the remaining examples in this section.

# Store statements 1 and 4 for inspection
s1, s4 = script(1), script(4)

nm

{anchor}{delimiter}{desc}

anchor
what operation is a statement performing

on what kind of object is it operating

delimiter

a configured value with which to delimit the anchor and desc

desc
A free-form piece of text associated with the statement


nm is the highest-level accessor for a Statement.

Its values for s1 & s4 (for example) can be inspected with:

print(s1.nm)      #> create table~s1
print(s4.nm)      #> select all~sample_table

In determining the nm for s1 specifically, is considering the following two lines of overview.sql:

/*-select all~sample_table-*/
select * from sample_table;

Each of these two lines above is the respective source for provided and generated information about the statement called out in Example: nm , the underlying values for which can be inspected in the same way:

print(s4.anchor_ge)  #> select data
print(s4.anchor_pr)  #> select all
print(s4.anchor)     #> select all

print(s4.desc_ge)    #> s4
print(s4.desc_pr)    #> sample_table
print(s4.desc)       #> sample_table

print(s4.nm_ge)      #> select data~s4
print(s4.nm_pr)      #> select all~sample_table
print(s4.nm)         #> select all~sample_table

anchor

{kw} {obj}

kw
the literal first sql keyword the statement contains

obj
the in-warehouse object found in the first line of the statement


anchor represents all text to the left of the first delimiter and when generated will fit the above structure to a varying degree depending on the sql being parsed and configurations in snowmobile.toml.

For s1 & s4 :

print(s1.anchor)  #> create table
print(s4.anchor)  #> select all

kw

kw is the literal first keyword within the command being executed by a statement’s sql.

For s1 & s4:

print(s1.kw)  #> create
print(s4.kw)  #> select

The keyword-exceptions section in the [sql] block of snowmobile-ext.toml enables specifying an alternate keyword for a literal keyword parsed from a statement’s sql; alternate keywords will populate the statement’s
kw_ge as opposed to the literal keyword identified at the start of the statement:

   [sql.keyword-exceptions]
       "with" = "select"

The default included above is the reason that the kw for both the following statements is select as opposed to select and with respectively:

-- kw = 'select'
select * from any_table;

-- kw = 'select'
with some_cte as (
 select * from any_table
)
 select * from some_cte;

obj

obj is determined by a case-insensitive, full (‘word-boundaried’) search through the first line of a statement’s sql for a match within a pre-defined set of values.

The values for which a match is checked are configured in the named-objects section within the [sql] block of snowmobile-ext.toml, included below.

Matching is peformed against values in the literal order as they are configured in snowmobile-ext.toml until a match is found or the list is exhausted; it is enforced that the object found cannot be equal to the kw for the statement.

   named-objects = [
       # 'grant' statements         "select",
       "all",
       "drop",

       # base objects
       "temp table",
       "transient table",
       "table",
       "view",
       "schema",
       "warehouse",
       "file format",

       # plural bases
       "tables",
       "views",
       "schemas",
   ]

Note

The above order is as such so that table qualifiers for the following three (types of) statements are reflected in the obj for each.


-- obj = 'table'
create table any_table as 
select 1 as any_col;

-- obj = 'transient table'
create transient table any_table2 as 
select 1 as any_col;

-- obj = 'temp table'
create temp table any_table3 as 
select 1 as any_col;

A mapping of sql keywords to generic anchor names can be configured in the generic-anchors block within the [sql] section of snowmobile-ext.toml, included below.

[sql.generic-anchors]
    "select" = "select data"
    "set" = "set param"
    "unset" = "unset param"
    "insert" = "insert into"
    "delete" = "delete from"

delimiter

delimiter is a literal constant specified in the description-delimiter field within the [script.patterns.core] section of snowmobile.toml, the value for which can be accessed directly off with:

print(sn.cfg.script.patterns.core.delimiter)  #> ~

desc

desc is a free-form text field loosely intended to be short-hand description for the statement.


The generated description for a statement, desc_ge, is a concatenation of a constant prefix and its index position within the script.

The prefix used is configurable in the description-index-prefix field within the [script.patterns.core] section of snowmobile.toml, the value for which can be accessed directly off with:

print(sn.cfg.script.patterns.core.prefix)  #> s

The provided description for a statement, desc_pr, is all text to the right of the first character found matching the delimiter within a statement’s nm_pr.



Warning

The functionality outlined below is experimental and not under test.

Using parsed values for the obj_ge and desc_ge can be enabled by setting the desc-is-simple field to true in snowmobile-ext.toml or by modifying the attribute’s value on an instance of .

In the case of , this looks like:

# alter default value of 'desc_is_simple'
sn.cfg.sql.desc_is_simple = False

# re-inspect the script's contents
script.dtl()
overview.sql
============
1: Statement('create table~sample_table: s1')
2: Statement('insert into~sample_table: s2')
3: Statement('select data~sample_table: s3')
4: Statement('select all~sample_table')
5: Statement('create transient table~any_other_table clone sample_table: s5')
6: Statement('insert into~any_other_table: s6')
7: Statement('drop table~sample_table: s7')



Tags


A tag contains an abitrary amount of information wrapped in a pre-defined opening/closing pattern. It can be associated with a Statement, identified by its literal position relative to the statement’s sql, or with a Marker, identified by its contents.

The default pattern, highlighted in the below snippet from snowmobile.toml, mirrors that of a standard sql block comment with an additional dash (-) on the inside of each component:

64
65
    export-dir-name = '.snowmobile'
    result-limit = -1


Markers


TODO

Missing

MORE CONTENT GOES HERE



Markup


Using markup within a script enables:

  • Defining accessors for individual statements - Adding descriptive information to individual statements or to the script itself

  • Finer-grained control of the script’s execution

  • Generating documentation and cleansed sql files from the working version of a script

snowmobile introduces two sql-compliant forms of adding markup to a sql file:

  1. Tags enable constructing collections of attributes amidst sql st, including those directly associated with a particular statement

  2. Markers are a collection of attributes that are not associated with a particular statement

The following sections outline the different ways that Tags and Markers are implemented and utilized.



Single-Line Tags


Single-line tags are the simplest form of markup and can be used to succinctly denote a name for a given statement.

When a single-line string directly precedes a statement and is wrapped in a valid open/close pattern, it will be recognized as the provided name (nm_pr) and used as the statement’s name (nm) as opposed to its generated name (nm_ge).

Consider the sql file, tags_single-line.sql, containing two st, the first and second of which have valid and invalid single-line tags respectively:

-- ..docs/snippets/script/tags_single-line.sql

/*-I am a wrap-*/
select * from sample_table;

/*-I am a wrap that isn't positioned correctly-*/

select * from sample_table;

Given a path to tags_single-line.sql and , the following script can be created:

# Instantiate a Script from sql file
script = snowmobile.Script(path=path, sn=sn)

# Store individual statements for inspection
s1, s2 = script(1), script(2)

print(s1)        #> Statement('I am a tag')
print(s1.nm_ge)  #> select data~s1
print(s1.nm_pr)  #> I am a tag
print(s1.nm)     #> I am a tag

print(s2)        #> Statement('select data~s2')
print(s2.nm_ge)  #> select data~s2
print(s2.nm_pr)  #> ''
print(s2.nm)     #> select data~s2

Note

The first statement has a valid tag directly preceding it, so its name (nm) is populated by the provided name within the tag (nm_pr) as opposed to the name that was generated for the statement (nm_ge).

The second statement does not have a valid tag directly preceding it, so its generated name, select data~s2, is used and the line /*-I am a tag that isn't positioned correctly-*/ is ignored.



Multi-Line Tags


Multi-line tags provide a method of associating multiple attributes with a Statement according to the following syntax:

  • Attribute names must:

    1. Start at the beginning of a new line

    2. Have leading double underscores (__)

    3. End with a single colon (:)

  • Attribute values have no restrictions except for several reserved attributes documented in the reserved attributes (LINK NEEDED) section below

In practice, this looks something like the following:

-- ..docs/snippets/script/tags_multi-line.sql

/*-
__name: I am a wrap
__description: This is an example of a wrap with the name explicitly declared.
-*/
select * from sample_table;

/*-
I am another wrap
__description: This is an example of a wrap with the name implicitly declared.
-*/
select * from sample_table;

Tip

Trailing wildcards can be appended to attribute names to denote how information will be rendered in generated documentation; this is covered in Patterns - Wildcards below.



Patterns


TODO

Missing

MORE CONTENT GOES HERE



Core

TODO

Missing

MORE CONTENT GOES HERE



Wildcards

TODO

Missing

MORE CONTENT GOES HERE