snowmobile.core.table
¶
snowmobile.Table
is a canned implementation of the Bulk Loading from a Local File System standard and is intended to provide a predictable, no-nonsense method of loading a DataFrame, df
, into a table
(str
).
Note
Core functionality includes:
Generating and executing generic DDL for
df
if the table doesn’t yet existExecuting DDL for the file format being used if it doesn’t yet exist in the current schema, or (optionally) specifying an alias for a file format in its
file_format
argument; in the case of the latter:An absolute
path
to an independent, user-defined sql file must be specified within the external-sources.ddl field ofsnowmobile.toml
Prior to attempting the load of
df
,snowmobile.Table
will create aScript
from the configuredpath
and execute the (file format DDL) statement whose tagged name maps to the value provided to itsfile_format
argumentAn error will be thrown during the creation of the
Table
if theScript
associated with the configuredpath
does not contain a statement whose tagged name matches the value offile_format
or if an error is raised when the file is parsedBypassed by creating the
Table
with:snowmobile.Table(validate_format=False, **kwargs)
Dimensional compatibility checks between
df
and the table being loaded intoBypassed by creating the
Table
with:snowmobile.Table(validate_table=False, **kwargs)
Coercing column names of
df
into a generic database standard prior to loading, including de-duplication of field names when applicableArgument or configuration based handling of action to take if table being loaded into already exists (respectively) via the
if_exists
argument tosnowmobile.Table()
or its associated section insnowmobile.toml
; valid values are replace, truncate, append, fail
Module Contents¶
Classes¶
Constructed with a |
- class
snowmobile.core.table.
Table
(df: pandas.DataFrame, table: str, sn: Optional[Snowmobile] = None, if_exists: Optional[str] = None, as_is: bool = False, path_ddl: Optional[Union[str, Path]] = None, path_output: Optional[str, Path] = None, file_format: Optional[str] = None, incl_tmstmp: Optional[bool] = None, tmstmp_col_nm: Optional[str] = None, reformat_cols: Optional[bool] = None, validate_format: Optional[bool] = None, validate_table: Optional[bool] = None, upper_case_cols: Optional[bool] = None, lower_case_table: Optional[bool] = None, keep_local: Optional[bool] = None, on_error: Optional[str] = None, check_dupes: Optional[bool] = None, load_copy: Optional[bool] = None, **kwargs)¶ Bases:
snowmobile.core.Generic
Constructed with a
DataFrame
and a table name to load into.The
df
andtable
’s compatibility can be inspected prior to calling theTable.load()
method or by providing as_is=True` when instantiating the object; the latter will kick off the loading process invoked by.load()
based on the parameters provided tosnowmobile.Table()
.- Parameters
df (DataFrame) – The
DataFrame
to load.table (str) – The table name to load
df
into.sn (Optional[Snowmobile]) – An instance of
Snowmobile
; can be used to load a table on a specific connection or from a specificsnowmobile.toml
file.if_exists (Optional[str]) – Action to take if
table
already exists - options are fail, replace, append, and truncate; defaults toappend
.as_is (bool) – Load
df
intotable
based on the parameters provided toTable
without further pre-inspection by the user; defaults to False.path_ddl (Optional[Path]) – Alternate path to file format DDL to use for load.
keep_local (Optional[bool]) – Keep local file that is written out as part of the bulk loading process; defaults to False.
path_output (Optional[str Path]) – Path to write output local file to; defaults to a generated file name exported in the current working directory.
file_format (Optional[str]) – The name of the file_format to use when loading
df
; defaults tosnowmobile_default_psv
.incl_tmstmp (Optional[bool]) – Include timestamp of load as part of
table
; defaults to True.tmstmp_col_nm (Optional[str]) – Name to use for load timestamp if
incl_tmstmp=True
; defaults to loaded_tmstmp.upper_case_cols (Optional[bool]) – Upper case columns of
df
when loading intotable
; defaults to True.reformat_cols (Optional[bool]) –
Reformat applicable columns of
df
to be DB-compliant; defaults to True.- Reformatting primarily entails:
Replacing spaces and special characters with underscores
De-duping consecutive special characters
De-duping repeated column names; adds an
_i
suffix to duplicate fields wherei
is the nth duplicate name for a field
validate_format (Optional[bool]) –
Validate the file format being used prior to kicking off the load; defaults to True.
- Validation entails:
Checking if the file format being used already exists based on formats accessible to the current connection
Executing DDL for the file format being used if not, pulled from the
DDL
ext-location and the statement namecreate file format~{format name}
Tip
Providing validate_format=False will speed up loading time when batch-loading into an existing table by skipping this step
validate_table (Optional[bool]) –
Perform validations of
df
againsttable
prior to kicking off the loading process; defaults to True.- Validation entails:
Checking the existence of
table
; no further validation is performed if it does not existCompares the columns of
df
to the columns oftable
and stores results for use during loading process
Note
Table validation results are used in conjunction with the
if_exists
parameter to determine the desired behavior based on the (potential) existence oftable
and its compatibility withdf
.Tip
Providing validate_table=False will speed up loading time time when batch-loading into an existing table
lower_case_table (Optional[bool]) – Lower case
table
name; defaults to False.on_error (Optional[str]) – Action to take if an exception is encountered as part of the validating or loading process - providing
on_error='c'
will continue past an exception as opposed to raising it; defaults to None meaning any exception encountered will be raisedcheck_dupes (Optional[bool]) – Check for duplicate field names in
df
; defaults to True.load_copy (Optional[bool]) – Alter and load a deep copy of
df
as opposed to thedf
in-memory as passed to the parameter; defaults to True.
-
load
(self, if_exists: Optional[str] = None, from_script: pathlib.Path = None, verbose: bool = True, **kwargs) → snowmobile.core.table.Table¶ Loads
df
intotable
.- Parameters
if_exists (Optional[str]) – Determines behavior to take if the table being loaded into already exists; defaults to append; options are replace, append, truncate, and fail
from_script (Optional[Union[Path, str]]) – Path to sql file containing custom DDL for
table
; DDL is assumed to have a valid statement name as is parsed byScript
and following the naming convention ofcreate table~TABLE
whereTABLE
is equal to the value provided to thetable
keyword argumentverbose (bool) – Verbose console output; defaults to True
-
col_diff
(self, mismatched: bool = False) → Dict[int, Tuple[str, str]]¶ Returns diff detail of local DataFrame to in-warehouse table.
-
load_statements
(self, from_script: pathlib.Path) → List[str]¶ Generates exhaustive list of the statements to execute for a given instance of loading a DataFrame.