snowmobile.core.table
snowmobile.Table is a canned implementation of the Bulk Loading from a Local File System standard and is intended to provide a predictable, no-nonsense method of loading a DataFrame, df, into a table (str).
Note
Core functionality includes:
- Generating and executing generic DDL for df if the table doesn't yet exist
- Executing DDL for the file format being used if it doesn't yet exist in the current schema, or (optionally) specifying an alias for a file format in its file_format argument; in the case of the latter:
  - An absolute path to an independent, user-defined sql file must be specified within the external-sources.ddl field of snowmobile.toml
  - Prior to attempting the load of df, snowmobile.Table will create a Script from the configured path and execute the (file format DDL) statement whose tagged name maps to the value provided to its file_format argument
  - An error will be thrown during the creation of the Table if the Script associated with the configured path does not contain a statement whose tagged name matches the value of file_format, or if an error is raised when the file is parsed
  - Bypassed by creating the Table with: snowmobile.Table(validate_format=False, **kwargs)
- Dimensional compatibility checks between df and the table being loaded into
  - Bypassed by creating the Table with: snowmobile.Table(validate_table=False, **kwargs)
- Coercing column names of df into a generic database standard prior to loading, including de-duplication of field names when applicable
- Argument or configuration based handling of the action to take if the table being loaded into already exists, via (respectively) the if_exists argument to snowmobile.Table() or its associated section in snowmobile.toml; valid values are replace, truncate, append, fail
Module Contents
Classes
Table – Constructed with a DataFrame and a table name to load into.
class snowmobile.core.table.Table(df: pandas.DataFrame, table: str, sn: Optional[Snowmobile] = None, if_exists: Optional[str] = None, as_is: bool = False, path_ddl: Optional[Union[str, Path]] = None, path_output: Optional[Union[str, Path]] = None, file_format: Optional[str] = None, incl_tmstmp: Optional[bool] = None, tmstmp_col_nm: Optional[str] = None, reformat_cols: Optional[bool] = None, validate_format: Optional[bool] = None, validate_table: Optional[bool] = None, upper_case_cols: Optional[bool] = None, lower_case_table: Optional[bool] = None, keep_local: Optional[bool] = None, on_error: Optional[str] = None, check_dupes: Optional[bool] = None, load_copy: Optional[bool] = None, **kwargs)
Bases: snowmobile.core.Generic
Constructed with a DataFrame and a table name to load into.
The compatibility of df and table can be inspected prior to calling the Table.load() method. Alternatively, providing as_is=True when instantiating the object kicks off the loading process invoked by .load() immediately, based on the parameters provided to snowmobile.Table().
Parameters
df (DataFrame) – The DataFrame to load.
table (str) – The table name to load df into.
sn (Optional[Snowmobile]) – An instance of Snowmobile; can be used to load a table on a specific connection or from a specific snowmobile.toml file.
if_exists (Optional[str]) – Action to take if table already exists; options are fail, replace, append, and truncate; defaults to append.
as_is (bool) – Load df into table based on the parameters provided to Table without further pre-inspection by the user; defaults to False.
path_ddl (Optional[Path]) – Alternate path to file format DDL to use for load.
keep_local (Optional[bool]) – Keep local file that is written out as part of the bulk loading process; defaults to False.
path_output (Optional[Union[str, Path]]) – Path to write the local output file to; defaults to a generated file name in the current working directory.
file_format (Optional[str]) – The name of the file_format to use when loading df; defaults to snowmobile_default_psv.
incl_tmstmp (Optional[bool]) – Include timestamp of load as part of table; defaults to True.
tmstmp_col_nm (Optional[str]) – Name to use for load timestamp if incl_tmstmp=True; defaults to loaded_tmstmp.
upper_case_cols (Optional[bool]) – Upper case columns of df when loading into table; defaults to True.
reformat_cols (Optional[bool]) – Reformat applicable columns of df to be DB-compliant; defaults to True.
Reformatting primarily entails:
- Replacing spaces and special characters with underscores
- De-duping consecutive special characters
- De-duping repeated column names; adds an _i suffix to duplicate fields where i is the nth duplicate name for a field
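The three reformatting steps listed for reformat_cols can be sketched in plain Python. This is only an illustration of the described behavior, not the library's code: reformat_columns is a hypothetical helper, and stripping leading and trailing underscores is an added assumption.

```python
import re
from collections import Counter
from typing import List


def reformat_columns(columns: List[str], upper: bool = True) -> List[str]:
    """Hypothetical helper: coerce column names to a generic DB-friendly form."""
    cleaned = []
    for col in columns:
        # Replace each run of non-alphanumeric characters with one underscore;
        # this handles spaces, special characters, and consecutive special
        # characters in a single pass.
        name = re.sub(r"[^0-9a-zA-Z]+", "_", col.strip())
        # Assumption: leading/trailing underscores are dropped.
        name = name.strip("_")
        cleaned.append(name.upper() if upper else name)

    # Suffix repeated names with _i, where i counts the earlier occurrences.
    seen: Counter = Counter()
    result = []
    for name in cleaned:
        count = seen[name]
        result.append(f"{name}_{count}" if count else name)
        seen[name] += 1
    return result


print(reformat_columns(["col 1", "col--1", "Amount ($)", "id"]))
# → ['COL_1', 'COL_1_1', 'AMOUNT', 'ID']
```

Note that a suffixed name could in principle collide with an existing column (for example, a literal COL_1_1); the sketch does not guard against that.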
validate_format (Optional[bool]) – Validate the file format being used prior to kicking off the load; defaults to True.
Validation entails:
- Checking if the file format being used already exists based on formats accessible to the current connection
- If it does not, executing DDL for the file format, pulled from the DDL ext-location under the statement name create file format~{format name}
Tip
Providing validate_format=False will speed up loading time when batch-loading into an existing table by skipping this step.
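As a rough illustration of the statement-name convention used during file format validation, the lookup can be thought of as a keyed search over parsed statements. The sketch below assumes the DDL source has already been parsed into a mapping of statement name to SQL; resolve_format_ddl, the ddl dict, and the SQL text are all hypothetical stand-ins rather than snowmobile objects.

```python
from typing import Dict


def resolve_format_ddl(statements: Dict[str, str], file_format: str) -> str:
    """Hypothetical helper: find the DDL statement for a file format by name."""
    # Naming convention from the docs: create file format~{format name}
    name = f"create file format~{file_format}"
    try:
        return statements[name]
    except KeyError:
        raise KeyError(
            f"no statement named '{name}' in the configured DDL source"
        ) from None


# Stand-in for the parsed contents of a DDL file (statement name -> SQL).
# The SQL is illustrative only, not the DDL shipped with the library.
ddl = {
    "create file format~snowmobile_default_psv": (
        "create file format snowmobile_default_psv "
        "type = csv field_delimiter = '|'"
    ),
}

sql = resolve_format_ddl(ddl, "snowmobile_default_psv")
print(sql)
```

Requesting a format with no matching statement raises an error, which mirrors the documented failure when the tagged name does not match the value of file_format.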
validate_table (Optional[bool]) – Perform validations of df against table prior to kicking off the loading process; defaults to True.
Validation entails:
- Checking the existence of table; no further validation is performed if it does not exist
- Comparing the columns of df to the columns of table and storing the results for use during the loading process
Note
Table validation results are used in conjunction with the if_exists parameter to determine the desired behavior based on the (potential) existence of table and its compatibility with df.
Tip
Providing validate_table=False will speed up loading time when batch-loading into an existing table.
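One way to picture how validation results might combine with if_exists is a small decision function. The exact rules are not spelled out here, so everything below is an assumption: plan_load is hypothetical, the returned labels are descriptive strings, and the choice to reject append or truncate when columns do not match is a guess at reasonable behavior.

```python
from typing import Optional

VALID_IF_EXISTS = ("replace", "truncate", "append", "fail")


def plan_load(if_exists: str, table_exists: bool, cols_match: Optional[bool]) -> str:
    """Hypothetical decision table; returns a label for the action to take.

    cols_match is None when table validation was skipped.
    """
    if if_exists not in VALID_IF_EXISTS:
        raise ValueError(f"if_exists must be one of {VALID_IF_EXISTS}")
    if not table_exists:
        # Nothing to reconcile: generic DDL is generated for df.
        return "create table, then load"
    if if_exists == "fail":
        raise RuntimeError("table already exists and if_exists='fail'")
    if if_exists == "replace":
        return "drop and re-create table, then load"
    # Assumption: append and truncate keep the existing table definition,
    # so the columns of df need to line up with it.
    if cols_match is False:
        raise RuntimeError(f"columns of df do not match table; cannot {if_exists}")
    if if_exists == "truncate":
        return "truncate table, then load"
    return "load into existing table"


print(plan_load("append", table_exists=True, cols_match=True))
```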
lower_case_table (Optional[bool]) – Lower case table name; defaults to False.
on_error (Optional[str]) – Action to take if an exception is encountered as part of the validating or loading process; providing on_error='c' will continue past an exception as opposed to raising it; defaults to None, meaning any exception encountered will be raised.
check_dupes (Optional[bool]) – Check for duplicate field names in df; defaults to True.
load_copy (Optional[bool]) – Alter and load a deep copy of df as opposed to the in-memory df passed to the parameter; defaults to True.
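The on_error behavior described above follows a common continue-or-raise pattern. A minimal sketch of that pattern, assuming the load is a sequence of callables; run_steps and the step functions are hypothetical and not part of snowmobile:

```python
from typing import Callable, List, Optional


def run_steps(steps: List[Callable[[], None]], on_error: Optional[str] = None) -> List[Exception]:
    """Hypothetical runner: raise on the first error unless on_error == 'c'."""
    errors: List[Exception] = []
    for step in steps:
        try:
            step()
        except Exception as exc:
            if on_error != "c":
                raise  # default (None): any exception encountered is raised
            errors.append(exc)  # 'c': record the exception and continue
    return errors


completed: List[str] = []


def validate() -> None:
    raise ValueError("validation failed")


def load() -> None:
    completed.append("load")


errors = run_steps([validate, load], on_error="c")
print(len(errors), completed)
```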
load(self, if_exists: Optional[str] = None, from_script: pathlib.Path = None, verbose: bool = True, **kwargs) → snowmobile.core.table.Table
Loads df into table.
Parameters
if_exists (Optional[str]) – Determines the action to take if the table being loaded into already exists; options are replace, append, truncate, and fail; defaults to append.
from_script (Optional[Union[Path, str]]) – Path to sql file containing custom DDL for table; the DDL is assumed to have a valid statement name as parsed by Script, following the naming convention create table~TABLE, where TABLE is equal to the value provided to the table keyword argument.
verbose (bool) – Verbose console output; defaults to True.
col_diff(self, mismatched: bool = False) → Dict[int, Tuple[str, str]]
Returns diff detail of local DataFrame to in-warehouse table.
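The return type suggests a mapping from column position to a pair of names. A plausible reading, sketched below, is (df column, table column) keyed by position; this interpretation, the 1-based index, and the empty-string fill for missing columns are all assumptions, and col_diff_sketch is a hypothetical stand-in operating on plain lists.

```python
from itertools import zip_longest
from typing import Dict, List, Tuple


def col_diff_sketch(
    df_cols: List[str], table_cols: List[str], mismatched: bool = False
) -> Dict[int, Tuple[str, str]]:
    """Hypothetical positional diff of DataFrame columns vs. table columns."""
    # Pair columns by position; pad the shorter side with an empty string.
    pairs = zip_longest(df_cols, table_cols, fillvalue="")
    diff = {i: (a, b) for i, (a, b) in enumerate(pairs, start=1)}
    if mismatched:
        # Keep only positions where the two names differ.
        diff = {i: pair for i, pair in diff.items() if pair[0] != pair[1]}
    return diff


print(col_diff_sketch(["A", "B", "C"], ["A", "X"], mismatched=True))
# → {2: ('B', 'X'), 3: ('C', '')}
```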
load_statements(self, from_script: pathlib.Path) → List[str]
Generates exhaustive list of the statements to execute for a given instance of loading a DataFrame.
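Putting the documented pieces together, a typical inspect-then-load flow might look like the sketch below. It uses only the constructor arguments and methods listed on this page; inspect_then_load is a hypothetical wrapper, the imports are deferred so the definition does not require snowmobile to be installed, and actually calling it assumes valid credentials and a reachable warehouse.

```python
def inspect_then_load(df, table_name, sn=None):
    """Sketch: build a Table, inspect column differences, then load.

    Calling this requires snowmobile to be installed and configured;
    defining it does not.
    """
    import snowmobile  # deferred import, see docstring

    # as_is defaults to False, so nothing is loaded at construction time.
    t = snowmobile.Table(df=df, table=table_name, sn=sn)

    # Show only the columns that differ between df and the warehouse table.
    print(t.col_diff(mismatched=True))

    # load() returns the Table instance, per its documented signature.
    return t.load(if_exists="append")
```

Passing as_is=True to the constructor instead would skip the inspection step and start the load immediately.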