Script¶
snowmobile.core.script
snowmobile.Script
parses a raw sql file into a composition of objects that can be leveraged for:
Documentation and standardization of sql
Access to individual statements within a script
Lightweight control flow and QA
Code generation and warehouse cleanup
Overview¶
Note: If you’re just wanting to run some sql
The most straight-forward way to execute a local sql file is through the SnowflakeConnection.execute_stream() method, the API for which can be accessed from an instance of with:
import snowmobile
from codecs import open
sn = snowmobile.connect()
with open(sqlfile, 'r', encoding='utf-8') as f:
for cur in sn.con.execute_stream(f):
for ret in cur:
print(ret)
Model Intro¶
intro1.sql
This section creates a Script from the following file, intro1.sql, containing 3 bare sql statements:
-- ./docs/snippets/script/intro/intro1.sql
create or replace table sample_table (
col1 number(18,0),
col2 number(18,0)
);
insert into sample_table (col1, col2) values(1, 2);
select * from sample_table;
Crash Course¶
Creating a Script¶
snowmobile.Script identifies sql and metadata in a sql file; assuming path is a full path to intro1.sql , script can be created with:
import snowmobile
script = snowmobile.Script(path=path)
Each command is instantiated as its own Statement
and stored according to its position in the original script; script.dtl()
is used to send a summary of the contents parsed by script to the console:
script.dtl()
intro1.sql
==========
1: Statement('create table~s1')
2: Statement('insert into~s2')
3: Statement('select data~s3')
FYI
script.dtl()
is generating its
output with something like:
for i, s in script.items():
print(f"{i}: {s}")
1: Statement('create table~s1')
2: Statement('insert into~s2')
3: Statement('select data~s3')
Missing Content
Because these are bare sql statements..
s3 = script(3)
print(s3.index) #> 3
print(s3.sql()) #> select * from sample_table
print(s3.kw()) #> select
print(s3.anchor()) #> select data
print(s3.desc()) #> s3
print(s3.nm()) #> select data~s3
Core Objects¶
When Script parses a string of sql, it identifies and stores statements, tags, and markers:
Statement
A valid sql command, a standard set of attributes, and any information (optionally) provided in a tag
Tag
An arbitrary amount of information wrapped in a pre-defined, sql-compliant pattern that is parsable by snowmobile
Marker
A collection of information within a tag that is associated with the script (or a subset of it) as opposed to an individual statement
Note
The simple zen is to enable the consistent, clear annotation of sql in a way that is:
(1) easily human-readable / writable
(2) syntactically (& idiomatically) compliant
(3) identifiable and parsable by snowmobile
To that end, snowmobile.Script intentionally ignores all comments that are not part of a tag.
Sections & Markup¶
A Section
can be instantiated from a Statement
or a Marker, and the Markup
class combines multiple sections into a single document:
Section
Performs additional operations on the attributes from a Statement
or a Marker, typically to generate a ‘headered’ section in a markdown file or a sql statement stripped of surrounding comments
Markup
A context-specific collection of all sections within a script; capable of exporting markdown and tidied sql files
Calling the doc()
method on a Script will return a Markup
of its contents.
The Markup.save()
method will (by default) export a pair of files into a .snowmobile
folder directly adjacent to the file with which the Script was instantiated.
A base case for this in practice is outlined in Example: intro.sql
Example: intro.sql
/*-
__intro.sql__
__authored-by: Some Chap or Lass
__authored-on: Some Day or Year
__p*_***:
**Impetus**: *SQL is older than time and isn't going anywhere; might we allow a simple markup syntax?*
-*/
/*-
create table~sample_table; DDL
__description: This is an example statement description
-*/
create or replace table sample_table (
col1 number(18,0),
col2 number(18,0)
);
With a path
to intro.sql, the following can be run:
import snowmobile
script = snowmobile.Script(path=path)
markup = script.doc()
print(script) #> snowmobile.Script('intro.sql')
print(markup) #> snowmobile.core.Markup('intro.sql')
markup.save()
Given intro.sql is here:
sql/
└─ intro.sql
markup.save()
created the .snowmobile
directory and exported the following files:
sql/
├─ intro.sql
└─ .snowmobile/
├─ intro.md
└─ intro.sql
intro.sql
Authored-By: Some Chap or Lass
Authored-On: Some Day or Year
Impetus: SQL is older than time and isn’t going anywhere; might we allow a simple markup syntax?
(1) create table~sample_table; DDL
Description: This is an example statement description
create or replace table sample_table (
col1 number(18,0),
col2 number(18,0)
);
/*: -----------------------------------------------------------------------
** This file was stripped of all comments and exported by Snowmobile **
----------------------------------------------------------------------- :*/
/*-
__intro.sql__
__authored-by: Some Chap or Lass
__authored-on: Some Day or Year
__p*_***:
**Impetus**: *SQL is older than time and isn't going anywhere; might we allow a simple markup syntax?*
-*/
/*-create table~sample_table; DDL-*/
create or replace table sample_table (
col1 number(18,0),
col2 number(18,0)
);
Statements¶
script
This section performs operations on the following :
script = snowmobile.Script(path=path)
Where path
(pathlib.Path
or str
) is a full path to overview.sql.
The 7 generic sql statements within overview.sql are arbitrary and chosen based only on the loose criteria of:
create or replace table sample_table (
col1 number(18,0),
col2 number(18,0)
);
insert into sample_table with
sample_data as (
select
uniform(1, 10, random(1)) as rand_int
from table(generator(rowcount => 3)) v
)
select
row_number() over (order by a.rand_int) as col1
,(col1 * col1) as col2
from sample_data a;
select * from sample_table;
/*-select all~sample_table-*/
select * from sample_table;
create or replace transient table any_other_table clone sample_table;
insert into any_other_table (
select
a.*
from sample_table a
);
drop table if exists sample_table;
Intro¶
When a sql file is parsed by Script
, each statement is identified and instantiated as its own Statement
.
An overview of the statements within a script’s context can be sent to the console with script.dtl()
; in the case of , this looks like:
script.dtl()
overview.sql
============
1: Statement('create table~s1')
2: Statement('insert into~s2')
3: Statement('select data~s3')
4: Statement('select all~sample_table')
5: Statement('create transient table~s5')
6: Statement('insert into~s6')
7: Statement('drop table~s7')
Accessing the first and last statements of and inspecting a few of their attributes can be done with:
# Store a few st, accessed by index position
s_first, s_last = script(1), script(-1)
# first sql keyword
print(s_first.kw) #> create
print(s_last.kw) #> drop
# position within `script`
print(s_first.index) #> 1
print(s_last.index) #> 7
A Statement
can be interacted with off the Script or stored and used independently; for example, here are two ways that the first statement in overview.sql can be executed:
script.run(1) # .run() from `script`
script(1).run() # .run() from `statement`
Those above are several amongst a set of Statement
attributes that can be used to alter the scope of a Script.
For example, the following snippet filters out drop
and select
statements based on their kw
attribute and returns a modified , s
, that can be operated on within that context:
print(script.depth) #> 7
print(script(1).nm) #> create table~s1
print(script(-1).nm) #> drop table~s7
with script.filter(excl_kw=['select', 'drop']) as s:
print(s.depth) #> 4
print(s(1).nm) #> create table~s1
print(s(-1).nm) #> insert into~s4
s.dtl()
overview.sql
============
1: Statement('create table~s1')
2: Statement('insert into~s2')
3: Statement('create transient table~s3')
4: Statement('insert into~s4')
The following section outlines how these components are constructed.
Statement Names¶
The intent of the following taxonomy is to define a standard such that the name for a given statement is:
Constructed from attributes that can be unambiguously parsed from a piece of raw sql
Structured such that user provided names can be easily implemented and loosely parsed into the same set of attributes as those generated from (1)
Every statement has a Name
with a set of underlying properties that are used by the rest of the API; for each property, there is a generated (_ge) and provided (_pr) attribute from which its final value is sourced.
Generated attributes are populated for all st, whereas only those with a name specified in a tag have populated provided attributes; consequently, a provided value takes precedent over its generated counterpart.
Example: nm
The nm
value for a given statement will be equivalent to its nm_pr
if present and its nm_ge
otherwise.
This resolution order is repeated across the underlying components of nm
, documented in the following sections.
s1 & s4
The below statements, s1
and s4
, from are used throughout the remaining examples in this section.
# Store statements 1 and 4 for inspection
s1, s4 = script(1), script(4)
nm¶
{anchor}{delimiter}{desc}
anchor
what operation is a statement performing
on what kind of object is it operating
a configured value with which to delimit the anchor
and desc
desc
A free-form piece of text associated with the statement
nm
is the highest-level accessor for a Statement
.
Its values for s1 & s4 (for example) can be inspected with:
print(s1.nm) #> create table~s1
print(s4.nm) #> select all~sample_table
In determining the nm
for s1 specifically, is considering the following two lines of overview.sql:
/*-select all~sample_table-*/
select * from sample_table;
Each of these two lines above is the respective source for provided and generated information about the statement called out in Example: nm , the underlying values for which can be inspected in the same way:
print(s4.anchor_ge) #> select data
print(s4.anchor_pr) #> select all
print(s4.anchor) #> select all
print(s4.desc_ge) #> s4
print(s4.desc_pr) #> sample_table
print(s4.desc) #> sample_table
print(s4.nm_ge) #> select data~s4
print(s4.nm_pr) #> select all~sample_table
print(s4.nm) #> select all~sample_table
anchor¶
{kw} {obj}
kw
the literal first sql keyword the statement contains
obj
the in-warehouse object found in the first line of the statement
anchor
represents all text to the left of the first delimiter
and when generated will fit the above structure to a varying degree depending on the sql being parsed and configurations in snowmobile.toml.
For s1 & s4 :
print(s1.anchor) #> create table
print(s4.anchor) #> select all
kw¶
kw
is the literal first keyword within the command being executed by a statement’s sql.
For s1 & s4:
print(s1.kw) #> create
print(s4.kw) #> select
The keyword-exceptions section in the [sql] block of snowmobile-ext.toml enables specifying an alternate keyword for a literal keyword parsed from a statement’s sql; alternate keywords will populate the statement’s
kw_ge
as opposed to the literal keyword identified at the start of the statement:
[sql.keyword-exceptions]
"with" = "select"
The default included above is the reason that the kw
for both the following statements is select
as opposed to select
and with
respectively:
-- kw = 'select'
select * from any_table;
-- kw = 'select'
with some_cte as (
select * from any_table
)
select * from some_cte;
obj¶
obj
is determined by a case-insensitive, full (‘word-boundaried’) search through the first line of a statement’s sql for a match within a pre-defined set of values.
The values for which a match is checked are configured in the named-objects section within the [sql] block of snowmobile-ext.toml, included below.
Matching is peformed against values in the literal order as they are configured in snowmobile-ext.toml until a match is found or the list is exhausted; it is enforced that the object found cannot be equal to the kw
for the statement.
named-objects = [
# 'grant' statements "select",
"all",
"drop",
# base objects
"temp table",
"transient table",
"table",
"view",
"schema",
"warehouse",
"file format",
# plural bases
"tables",
"views",
"schemas",
]
Note
The above order is as such so that table qualifiers for the following three (types of) statements are reflected in the obj
for each.
-- obj = 'table'
create table any_table as
select 1 as any_col;
-- obj = 'transient table'
create transient table any_table2 as
select 1 as any_col;
-- obj = 'temp table'
create temp table any_table3 as
select 1 as any_col;
A mapping of sql keywords to generic anchor names can be configured in the generic-anchors block within the [sql] section of snowmobile-ext.toml, included below.
[sql.generic-anchors]
"select" = "select data"
"set" = "set param"
"unset" = "unset param"
"insert" = "insert into"
"delete" = "delete from"
delimiter¶
delimiter
is a literal constant specified in the description-delimiter
field within the [script.patterns.core] section of snowmobile.toml, the value for which can be accessed directly off with:
print(sn.cfg.script.patterns.core.delimiter) #> ~
desc¶
desc
is a free-form text field loosely intended to be short-hand description for the statement.
The generated description for a statement, desc_ge
, is a concatenation of a constant prefix and its index position within the script.
The prefix used is configurable in the description-index-prefix
field within the [script.patterns.core] section of snowmobile.toml, the value for which can be accessed directly off with:
print(sn.cfg.script.patterns.core.prefix) #> s
The provided description for a statement, desc_pr
, is all text to the right of the first character found matching the delimiter within a statement’s nm_pr
.
Warning
The functionality outlined below is experimental and not under test.
Using parsed values for the obj_ge
and desc_ge
can be enabled by setting the desc-is-simple field to true
in snowmobile-ext.toml or by modifying the attribute’s value on an instance of .
In the case of , this looks like:
# alter default value of 'desc_is_simple'
sn.cfg.sql.desc_is_simple = False
# re-inspect the script's contents
script.dtl()
overview.sql
============
1: Statement('create table~sample_table: s1')
2: Statement('insert into~sample_table: s2')
3: Statement('select data~sample_table: s3')
4: Statement('select all~sample_table')
5: Statement('create transient table~any_other_table clone sample_table: s5')
6: Statement('insert into~any_other_table: s6')
7: Statement('drop table~sample_table: s7')
Tags¶
A tag contains an abitrary amount of information wrapped in a pre-defined opening/closing pattern. It can be associated with a Statement
, identified by its literal position relative to the statement’s sql, or with a Marker
, identified by its contents.
The default pattern, highlighted in the below snippet from snowmobile.toml, mirrors that of a standard sql block comment with an additional dash (-
) on the inside of each component:
64 65 | export-dir-name = '.snowmobile'
result-limit = -1
|
Markers¶
TODO
Missing
MORE CONTENT GOES HERE
Markup¶
Using markup within a script enables:
Defining accessors for individual statements - Adding descriptive information to individual statements or to the script itself
Finer-grained control of the script’s execution
Generating documentation and cleansed sql files from the working version of a script
snowmobile introduces two sql-compliant forms of adding markup to a sql file:
Tags enable constructing collections of attributes amidst sql st, including those directly associated with a particular statement
Markers are a collection of attributes that are not associated with a particular statement
The following sections outline the different ways that Tags and Markers are implemented and utilized.
Single-Line Tags¶
Single-line tags are the simplest form of markup and can be used to succinctly denote a name for a given statement.
When a single-line string directly precedes a statement and is wrapped in a valid open/close pattern, it will be recognized as the provided name (nm_pr
) and used as the statement’s name (nm
) as opposed to its generated name (nm_ge
).
Consider the sql file, tags_single-line.sql, containing two st, the first and second of which have valid and invalid single-line tags respectively:
-- ..docs/snippets/script/tags_single-line.sql
/*-I am a wrap-*/
select * from sample_table;
/*-I am a wrap that isn't positioned correctly-*/
select * from sample_table;
Given a path
to tags_single-line.sql and , the following script
can be created:
# Instantiate a Script from sql file
script = snowmobile.Script(path=path, sn=sn)
# Store individual statements for inspection
s1, s2 = script(1), script(2)
print(s1) #> Statement('I am a tag')
print(s1.nm_ge) #> select data~s1
print(s1.nm_pr) #> I am a tag
print(s1.nm) #> I am a tag
print(s2) #> Statement('select data~s2')
print(s2.nm_ge) #> select data~s2
print(s2.nm_pr) #> ''
print(s2.nm) #> select data~s2
Note
The first statement has a valid tag directly preceding it, so its name (nm
) is populated by the provided name within the tag (nm_pr
) as opposed to the name that was generated for the statement (nm_ge
).
The second statement does not have a valid tag directly preceding it, so its generated name, select data~s2
, is used and the line /*-I am a tag that isn't positioned correctly-*/
is ignored.
Multi-Line Tags¶
Multi-line tags provide a method of associating multiple attributes with a Statement
according to the following syntax:
Attribute names must:
Start at the beginning of a new line
Have leading double underscores (
__
)End with a single colon (
:
)
Attribute values have no restrictions except for several reserved attributes documented in the reserved attributes (LINK NEEDED) section below
In practice, this looks something like the following:
-- ..docs/snippets/script/tags_multi-line.sql
/*-
__name: I am a wrap
__description: This is an example of a wrap with the name explicitly declared.
-*/
select * from sample_table;
/*-
I am another wrap
__description: This is an example of a wrap with the name implicitly declared.
-*/
select * from sample_table;
Tip
Trailing wildcards can be appended to attribute names to denote how information will be rendered in generated documentation; this is covered in Patterns - Wildcards below.
Patterns¶
TODO
Missing
MORE CONTENT GOES HERE
Core¶
TODO
Missing
MORE CONTENT GOES HERE
Wildcards¶
TODO
Missing
MORE CONTENT GOES HERE