snowmobile.core.table

snowmobile.Table is a canned implementation of the Bulk Loading from a Local File System standard and is intended to provide a predictable, no-nonsense method of loading a DataFrame, df, into a table (str).


Note

Core functionality includes:

  1. Generating and executing generic DDL for df if the table doesn’t yet exist

  2. Executing DDL for the file format being used if it doesn’t yet exist in the current schema, or (optionally) specifying an alias for a file format in its file_format argument; in the case of the latter:

    • An absolute path to an independent, user-defined sql file must be specified within the external-sources.ddl field of snowmobile.toml

    • Prior to attempting the load of df, snowmobile.Table will create a Script from the configured path and execute the (file format DDL) statement whose tagged name maps to the value provided to its file_format argument

    • An error will be thrown during the creation of the Table if the Script associated with the configured path does not contain a statement whose tagged name matches the value of file_format or if an error is raised when the file is parsed

    • Bypassed by creating the Table with:

      snowmobile.Table(validate_format=False, **kwargs)
      
  3. Dimensional compatibility checks between df and the table being loaded into

    • Bypassed by creating the Table with:

      snowmobile.Table(validate_table=False, **kwargs)
      
  4. Coercing column names of df into a generic database standard prior to loading, including de-duplication of field names when applicable

  5. Argument or configuration based handling of action to take if table being loaded into already exists (respectively) via the if_exists argument to snowmobile.Table() or its associated section in snowmobile.toml; valid values are replace, truncate, append, fail

Module Contents

Classes

Table

Constructed with a DataFrame and a table name to load into.

class snowmobile.core.table.Table(df: pandas.DataFrame, table: str, sn: Optional[Snowmobile] = None, if_exists: Optional[str] = None, as_is: bool = False, path_ddl: Optional[Union[str, Path]] = None, path_output: Optional[str, Path] = None, file_format: Optional[str] = None, incl_tmstmp: Optional[bool] = None, tmstmp_col_nm: Optional[str] = None, reformat_cols: Optional[bool] = None, validate_format: Optional[bool] = None, validate_table: Optional[bool] = None, upper_case_cols: Optional[bool] = None, lower_case_table: Optional[bool] = None, keep_local: Optional[bool] = None, on_error: Optional[str] = None, check_dupes: Optional[bool] = None, load_copy: Optional[bool] = None, **kwargs)

Bases: snowmobile.core.Generic

Constructed with a DataFrame and a table name to load into.

The df and table’s compatibility can be inspected prior to calling the Table.load() method or by providing as_is=True` when instantiating the object; the latter will kick off the loading process invoked by .load() based on the parameters provided to snowmobile.Table().

Parameters
  • df (DataFrame) – The DataFrame to load.

  • table (str) – The table name to load df into.

  • sn (Optional[Snowmobile]) – An instance of Snowmobile; can be used to load a table on a specific connection or from a specific snowmobile.toml file.

  • if_exists (Optional[str]) – Action to take if table already exists - options are fail, replace, append, and truncate; defaults to append.

  • as_is (bool) – Load df into table based on the parameters provided to Table without further pre-inspection by the user; defaults to False.

  • path_ddl (Optional[Path]) – Alternate path to file format DDL to use for load.

  • keep_local (Optional[bool]) – Keep local file that is written out as part of the bulk loading process; defaults to False.

  • path_output (Optional[str Path]) – Path to write output local file to; defaults to a generated file name exported in the current working directory.

  • file_format (Optional[str]) – The name of the file_format to use when loading df; defaults to snowmobile_default_psv.

  • incl_tmstmp (Optional[bool]) – Include timestamp of load as part of table; defaults to True.

  • tmstmp_col_nm (Optional[str]) – Name to use for load timestamp if incl_tmstmp=True; defaults to loaded_tmstmp.

  • upper_case_cols (Optional[bool]) – Upper case columns of df when loading into table; defaults to True.

  • reformat_cols (Optional[bool]) –

    Reformat applicable columns of df to be DB-compliant; defaults to True.

    Reformatting primarily entails:
    • Replacing spaces and special characters with underscores

    • De-duping consecutive special characters

    • De-duping repeated column names; adds an _i suffix to duplicate fields where i is the nth duplicate name for a field

  • validate_format (Optional[bool]) –

    Validate the file format being used prior to kicking off the load; defaults to True.

    Validation entails:
    • Checking if the file format being used already exists based on formats accessible to the current connection

    • Executing DDL for the file format being used if not, pulled from the DDL ext-location and the statement name create file format~{format name}

    Tip

    Providing validate_format=False will speed up loading time when batch-loading into an existing table by skipping this step

  • validate_table (Optional[bool]) –

    Perform validations of df against table prior to kicking off the loading process; defaults to True.

    Validation entails:
    • Checking the existence of table; no further validation is performed if it does not exist

    • Compares the columns of df to the columns of table and stores results for use during loading process

    Note

    Table validation results are used in conjunction with the if_exists parameter to determine the desired behavior based on the (potential) existence of table and its compatibility with df.

    Tip

    Providing validate_table=False will speed up loading time time when batch-loading into an existing table

  • lower_case_table (Optional[bool]) – Lower case table name; defaults to False.

  • on_error (Optional[str]) – Action to take if an exception is encountered as part of the validating or loading process - providing on_error='c' will continue past an exception as opposed to raising it; defaults to None meaning any exception encountered will be raised

  • check_dupes (Optional[bool]) – Check for duplicate field names in df; defaults to True.

  • load_copy (Optional[bool]) – Alter and load a deep copy of df as opposed to the df in-memory as passed to the parameter; defaults to True.

db_responses

Responses from database during loading process.

Type

Dict[str, str]

loaded

Table was loaded successfully.

Type

bool

load(self, if_exists: Optional[str] = None, from_script: pathlib.Path = None, verbose: bool = True, **kwargs)snowmobile.core.table.Table

Loads df into table.

Parameters
  • if_exists (Optional[str]) – Determines behavior to take if the table being loaded into already exists; defaults to append; options are replace, append, truncate, and fail

  • from_script (Optional[Union[Path, str]]) – Path to sql file containing custom DDL for table; DDL is assumed to have a valid statement name as is parsed by Script and following the naming convention of create table~TABLE where TABLE is equal to the value provided to the table keyword argument

  • verbose (bool) – Verbose console output; defaults to True

Returns (Table):

The Table after attempting load of df into table; a successful load can be verified by inspecting loaded

property exists(self)bool

Indicates if the target table exists.

col_diff(self, mismatched: bool = False)Dict[int, Tuple[str, str]]

Returns diff detail of local DataFrame to in-warehouse table.

property cols_match(self)bool

Indicates if columns match between DataFrame and table.

load_statements(self, from_script: pathlib.Path)List[str]

Generates exhaustive list of the statements to execute for a given instance of loading a DataFrame.

to_local(self, quote_all: bool = True)None

Export to local file via configuration in snowmobile.toml.

property tm_load(self)int

Seconds elapsed during loading.

property tm_validate_load(self)int

Seconds elapsed during validation.

property tm_total(self)int

Total seconds elapsed for load.

validate(self, if_exists: str)None

Validates load based on current state through a variety of operations.

Parameters

if_exists (str) – Desired behavior if table already exists; intended to be passed in from table.load() by default.