:orphan:

:mod:`snowmobile.core.snowframe`
================================

.. py:module:: snowmobile.core.snowframe

.. autoapi-nested-parse::

   :class:`~pandas.DataFrame` extensions; primarily includes comparison operators.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   snowmobile.core.snowframe.SnowFrame


.. class:: SnowFrame(df: pandas.DataFrame)


   Bases: :class:`snowmobile.core.Generic`

   Extends a :class:`~pandas.DataFrame` with a ``.snf`` entry point.

   .. method:: shared_cols(self, df2: pandas.DataFrame) -> List[Tuple[pd.Series, pd.Series]]

      Returns a list of tuples, one per column common to both DataFrames, each holding that column from the DataFrame and from ``df2``.
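
      A minimal pandas-only sketch of this pairing follows. It assumes columns are
      matched by exact name and kept in the first DataFrame's order;
      ``shared_col_pairs`` is a hypothetical standalone helper, not snowmobile's
      implementation.

      .. code-block:: python

         from typing import List, Tuple

         import pandas as pd


         def shared_col_pairs(
             df1: pd.DataFrame, df2: pd.DataFrame
         ) -> List[Tuple[pd.Series, pd.Series]]:
             """Pair up same-named columns present in both DataFrames (hypothetical helper)."""
             # Walk df1's columns in order and keep only names that df2 also has.
             return [(df1[c], df2[c]) for c in df1.columns if c in df2.columns]


         left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
         right = pd.DataFrame({"b": [3, 5], "c": [0, 0]})
         pairs = shared_col_pairs(left, right)
         print([s1.name for s1, _ in pairs])  # ['b']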


   .. method:: series_max_diff_abs(col1: pandas.Series, col2: pandas.Series, tolerance: float) -> bool
      :staticmethod:

      Determines if the maximum **absolute** difference between two
      :class:`pandas.Series` is within a tolerance level.
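
      A minimal pandas sketch of the check, assuming "within" means less than or
      equal to the tolerance and that both Series share an index. The function
      below only mirrors the method name and is not snowmobile's code.

      .. code-block:: python

         import pandas as pd


         def series_max_diff_abs(col1: pd.Series, col2: pd.Series, tolerance: float) -> bool:
             """True if max |col1 - col2| <= tolerance (sketch)."""
             # Subtraction aligns on the index; the largest absolute gap is compared to the tolerance.
             return bool((col1 - col2).abs().max() <= tolerance)


         s1 = pd.Series([1.0, 2.0, 3.0])
         s2 = pd.Series([1.0, 2.5, 3.25])
         print(series_max_diff_abs(s1, s2, tolerance=0.5))   # True
         print(series_max_diff_abs(s1, s2, tolerance=0.25))  # False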


   .. method:: series_max_diff_rel(col1: pandas.Series, col2: pandas.Series, tolerance: float) -> bool
      :staticmethod:

      Determines if the maximum **relative** difference between two
      :class:`pandas.Series` is within a tolerance level.
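
      A minimal sketch, assuming the relative difference is measured against the
      second Series, i.e. ``|col1 - col2| / |col2|``. That definition is an
      assumption, not confirmed by this page, and zeros in ``col2`` produce
      ``inf`` (or ``NaN`` for ``0 / 0``).

      .. code-block:: python

         import pandas as pd


         def series_max_diff_rel(col1: pd.Series, col2: pd.Series, tolerance: float) -> bool:
             """True if max |col1 - col2| / |col2| <= tolerance (sketch; col2 is the reference)."""
             # Element-wise relative gap against col2, then compare the worst case to the tolerance.
             rel = (col1 - col2).abs() / col2.abs()
             return bool(rel.max() <= tolerance)


         s1 = pd.Series([100.0, 200.0])
         s2 = pd.Series([100.0, 250.0])
         print(series_max_diff_rel(s1, s2, tolerance=0.2))  # True  (max gap is 50 / 250 = 0.2)
         print(series_max_diff_rel(s1, s2, tolerance=0.1))  # False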


   .. method:: df_max_diff_abs(self, df2: pandas.DataFrame, tolerance: float) -> bool

      Determines if the maximum **absolute** difference between any value
      in the shared columns of two DataFrames is within a tolerance level.


   .. method:: df_max_diff_rel(self, df2: pandas.DataFrame, tolerance: float) -> bool

      Determines if the maximum **relative** difference between any value
      in the shared columns of two DataFrames is within a tolerance level.


   .. method:: df_diff(self, df2: pandas.DataFrame, abs_tol: Optional[float] = None, rel_tol: Optional[float] = None) -> bool

      Determines if the column-wise difference between two DataFrames is
      within a relative **or** absolute tolerance level.

      .. note::

         *   The wrapped DataFrame (``df1``) and ``df2`` are assumed to share a
             pre-defined index.
         *   Exactly **one** of ``abs_tol`` and ``rel_tol`` is expected to be a
             valid float; the other is expected to be **None**.
         *   If valid float values are provided for both ``abs_tol`` and ``rel_tol``,
             the outcome of the maximum **absolute** difference with respect to
             ``abs_tol`` is returned, regardless of the value of ``rel_tol``.

      :param df2: 2nd DataFrame for comparison.
      :type df2: pd.DataFrame
      :param abs_tol: Absolute tolerance; default is None.
      :type abs_tol: float
      :param rel_tol: Relative tolerance; default is None.
      :type rel_tol: float

      Returns (bool):
          Boolean indicating whether the difference is within tolerance.
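
      A pandas-only sketch of the precedence rule in the note above, covering the
      frame-level absolute and relative checks as well. It assumes shared columns
      are matched by name, indexes are already aligned, and the relative gap is
      measured against ``df2``; raising ``ValueError`` when both tolerances are
      ``None`` is a hypothetical choice, since that case is not documented here.

      .. code-block:: python

         from typing import List, Optional

         import pandas as pd


         def _shared(df1: pd.DataFrame, df2: pd.DataFrame) -> List[str]:
             # Names of columns present in both frames, in df1's order.
             return [c for c in df1.columns if c in df2.columns]


         def df_max_diff_abs(df1: pd.DataFrame, df2: pd.DataFrame, tolerance: float) -> bool:
             cols = _shared(df1, df2)
             # Column-wise max, then the max across columns: the single largest absolute gap.
             return bool((df1[cols] - df2[cols]).abs().max().max() <= tolerance)


         def df_max_diff_rel(df1: pd.DataFrame, df2: pd.DataFrame, tolerance: float) -> bool:
             cols = _shared(df1, df2)
             rel = (df1[cols] - df2[cols]).abs() / df2[cols].abs()
             return bool(rel.max().max() <= tolerance)


         def df_diff(
             df1: pd.DataFrame,
             df2: pd.DataFrame,
             abs_tol: Optional[float] = None,
             rel_tol: Optional[float] = None,
         ) -> bool:
             # abs_tol takes precedence whenever it is given, as described in the note.
             if abs_tol is not None:
                 return df_max_diff_abs(df1, df2, abs_tol)
             if rel_tol is not None:
                 return df_max_diff_rel(df1, df2, rel_tol)
             raise ValueError("Provide abs_tol or rel_tol.")  # hypothetical behavior


         a = pd.DataFrame({"x": [1.0, 2.0], "y": [10.0, 20.0], "only_a": [0, 0]})
         b = pd.DataFrame({"x": [1.0, 2.5], "y": [10.0, 25.0]})
         print(df_diff(a, b, abs_tol=5.0))               # True  (largest gap is 5.0)
         print(df_diff(a, b, rel_tol=0.2))               # True  (largest relative gap is 0.2)
         print(df_diff(a, b, abs_tol=1.0, rel_tol=0.5))  # False (abs_tol wins)

      ``only_a`` is ignored because it exists in only one of the two frames.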


   .. method:: partitions(self, on: str) -> Dict[str, pd.DataFrame]

      Splits the DataFrame into a dictionary of DataFrames, one per distinct value of the ``on`` column.

      .. note::

         *   The number of distinct values within the ``on`` column equals the
             number of partitions that are returned.
         *   The ``on`` column is dropped from the partitions that are returned.
         *   A vertical concatenation of all partitions should have the same number
             of rows as the original DataFrame.

      :param on: The column name to use for partitioning the data.
      :type on: str

      Returns (Dict[str, pd.DataFrame]):
          Dictionary of {(str) partition_value: (pd.DataFrame) associated subset of df}
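
      A minimal sketch of the same idea with :meth:`pandas.DataFrame.groupby`. The
      standalone ``partitions`` function and the ``str()`` conversion of keys are
      assumptions made to match the documented return type; ``dropna=False`` keeps
      rows with a missing partition value so the row counts still add up.

      .. code-block:: python

         from typing import Dict

         import pandas as pd


         def partitions(df: pd.DataFrame, on: str) -> Dict[str, pd.DataFrame]:
             """One sub-frame per distinct value of `on`, with `on` dropped (sketch)."""
             # groupby yields (value, sub-frame) pairs; sort=False keeps first-seen order.
             return {
                 str(value): group.drop(columns=[on])
                 for value, group in df.groupby(on, sort=False, dropna=False)
             }


         df = pd.DataFrame({"region": ["east", "west", "east"], "sales": [1, 2, 3]})
         parts = partitions(df, "region")
         print(sorted(parts))                  # ['east', 'west']
         print(parts["east"]["sales"].tolist())  # [1, 3]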


   .. method:: ddl(self, table: str) -> str

      Returns a string containing ``CREATE TABLE`` DDL for the DataFrame, given a table name.
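
      For illustration only, pandas can produce comparable DDL through
      ``pandas.io.sql.get_schema``. Without a connection it falls back to SQLite
      type names (``INTEGER``, ``TEXT``, ``REAL``), which differ from Snowflake's,
      so this sketch is not a stand-in for the method; the ``ddl`` function name is
      hypothetical.

      .. code-block:: python

         import pandas as pd
         from pandas.io import sql as pd_sql


         def ddl(df: pd.DataFrame, table: str) -> str:
             """CREATE TABLE DDL derived from df's dtypes (pandas sketch, SQLite flavor)."""
             # With con=None, pandas uses its built-in SQLite type mapping.
             return pd_sql.get_schema(df, table)


         df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
         out = ddl(df, "sample")
         print(out)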


   .. method:: lower(self, col: Optional[str] = None) -> pandas.DataFrame

      Lower-cases all column names, **or** all values within ``col`` if ``col`` is provided.


   .. method:: upper(self, col: Optional[str] = None) -> pandas.DataFrame

      Upper-cases all column names, **or** all values within ``col`` if ``col`` is provided.
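
      A minimal sketch of both ``lower`` and ``upper`` as one hypothetical
      ``recase`` helper. Returning a modified copy rather than changing the
      DataFrame in place is an assumption.

      .. code-block:: python

         from typing import Optional

         import pandas as pd


         def recase(df: pd.DataFrame, how: str, col: Optional[str] = None) -> pd.DataFrame:
             """Apply str.lower/str.upper to column names, or to the values of `col` (sketch)."""
             if how not in ("lower", "upper"):
                 raise ValueError("how must be 'lower' or 'upper'")
             out = df.copy()  # leave the caller's frame untouched (assumption)
             if col is None:
                 # Re-case every column name.
                 out.columns = [getattr(str(c), how)() for c in out.columns]
             else:
                 # Re-case the string values of a single column via the .str accessor.
                 out[col] = getattr(out[col].str, how)()
             return out


         df = pd.DataFrame({"Name": ["Ann", "bob"]})
         print(recase(df, "lower").columns.tolist())            # ['name']
         print(recase(df, "upper", col="Name")["Name"].tolist())  # ['ANN', 'BOB']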


   .. method:: reformat(self)

      Reformats the DataFrame's columns via :meth:`Column.reformat`.


   .. method:: append_dupe_suffix(self)

      Adds a trailing index number ``_i`` to duplicate column names.
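
      A minimal sketch of one possible numbering scheme: every occurrence of a
      duplicated name gets ``_1``, ``_2``, and so on, in column order, while unique
      names are left alone. The exact scheme snowmobile uses is not stated here, so
      treat this as an assumption.

      .. code-block:: python

         from collections import Counter

         import pandas as pd


         def append_dupe_suffix(df: pd.DataFrame) -> pd.DataFrame:
             """Suffix duplicated column names with _1, _2, ... (sketch)."""
             totals = Counter(df.columns)  # how many times each name appears
             seen: Counter = Counter()     # running occurrence count per name
             new_cols = []
             for c in df.columns:
                 if totals[c] > 1:
                     seen[c] += 1
                     new_cols.append(f"{c}_{seen[c]}")
                 else:
                     new_cols.append(c)
             out = df.copy()
             out.columns = new_cols
             return out


         df = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "a"])
         print(append_dupe_suffix(df).columns.tolist())  # ['a_1', 'b', 'a_2']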


   .. method:: to_list(self, col: Optional[str] = None, n: Optional[int] = None) -> List

      Succinctly retrieves a column as a list.

      :param col: Name of column.
      :type col: str
      :param n: Number of records to return; defaults to full depth of column.
      :type n: int
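
      A minimal sketch, assuming that when ``col`` is omitted the first column is
      used; that fallback is hypothetical and not documented above.

      .. code-block:: python

         from typing import List, Optional

         import pandas as pd


         def to_list(df: pd.DataFrame, col: Optional[str] = None, n: Optional[int] = None) -> List:
             """Return a column (or its first n values) as a plain list (sketch)."""
             # Hypothetical fallback: use the first column when no name is given.
             series = df[col] if col is not None else df.iloc[:, 0]
             # iloc[:None] keeps every row, so n=None returns the full column.
             return series.iloc[:n].tolist()


         df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
         print(to_list(df, "b"))       # [4, 5, 6]
         print(to_list(df, "b", n=2))  # [4, 5]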


   .. method:: add_tmstmp(self, col_nm: Optional[str] = None) -> pandas.DataFrame

      Adds a column containing the current timestamp to a DataFrame.

      :param col_nm: Name for column; defaults to ``LOADED_TMSTMP``.
      :type col_nm: str
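
      A minimal pandas sketch using :meth:`pandas.DataFrame.assign`. It assumes a
      single timezone-naive local timestamp, captured once per call, is broadcast
      to every row and that a copy is returned; neither detail is confirmed here.

      .. code-block:: python

         from typing import Optional

         import pandas as pd


         def add_tmstmp(df: pd.DataFrame, col_nm: Optional[str] = None) -> pd.DataFrame:
             """Return a copy of df with a current-timestamp column (sketch)."""
             # assign() broadcasts the scalar timestamp to every row of the new column.
             return df.assign(**{col_nm or "LOADED_TMSTMP": pd.Timestamp.now()})


         df = pd.DataFrame({"a": [1, 2]})
         print(add_tmstmp(df).columns.tolist())  # ['a', 'LOADED_TMSTMP']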


   .. method:: original(self) -> pandas.DataFrame
      :property:

      Returns the DataFrame in its original form (drops columns added by
      :class:`SnowFrame` and reverts to original column names).


   .. method:: has_dupes(self) -> bool
      :property:

      Indicates whether the DataFrame has duplicate column names.
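
      The same check can be sketched with :meth:`pandas.Index.duplicated`; the
      standalone ``has_dupes`` function below is illustrative, not snowmobile's
      property.

      .. code-block:: python

         import pandas as pd


         def has_dupes(df: pd.DataFrame) -> bool:
             # duplicated() flags each repeat after a name's first occurrence; any() collapses to one flag.
             return bool(df.columns.duplicated().any())


         print(has_dupes(pd.DataFrame([[1, 2]], columns=["a", "a"])))  # True
         print(has_dupes(pd.DataFrame([[1, 2]], columns=["a", "b"])))  # False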


   .. method:: cols_matching(self, patterns: List[str], ignore_patterns: List[str] = None) -> List[str]

      Returns a list of column names that match a list of regex patterns.

      :param patterns: List of regex patterns to match columns on.
      :type patterns: List[str]
      :param ignore_patterns: Optional list of regex patterns to exclude.
      :type ignore_patterns: List[str]

      Returns (List[str]):
          List of columns matching ``patterns``, excluding any that match ``ignore_patterns``.
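
      A minimal sketch, assuming a column is kept when **any** pattern matches
      somewhere in its name (:func:`re.search`) and dropped when any ignore pattern
      does. The any-match and search-anywhere semantics are assumptions.

      .. code-block:: python

         import re
         from typing import List, Optional

         import pandas as pd


         def cols_matching(
             df: pd.DataFrame, patterns: List[str], ignore_patterns: Optional[List[str]] = None
         ) -> List[str]:
             """Columns matching any pattern and no ignore pattern (sketch)."""
             ignore_patterns = ignore_patterns or []
             found = [c for c in df.columns if any(re.search(p, str(c)) for p in patterns)]
             # Filter out anything an ignore pattern also matches.
             return [c for c in found if not any(re.search(p, str(c)) for p in ignore_patterns)]


         df = pd.DataFrame(columns=["src_id", "src_description", "tgt_id"])
         print(cols_matching(df, ["^src_"]))                            # ['src_id', 'src_description']
         print(cols_matching(df, ["^src_"], ignore_patterns=["desc"]))  # ['src_id']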


   .. method:: cols_ending(self, nm: str, ignore_patterns: Optional[List] = None) -> List[str]

      Returns all columns up to ``nm`` in a DataFrame.

      :param nm: Name of column to end index at.
      :type nm: str
      :param ignore_patterns: Optional list of regex patterns to exclude from the returned
                              list; primarily used for getting an end-index-at list while
                              excluding ``src_description``.
      :type ignore_patterns: List[str]

      Returns (List[str]):
          List of column names matching the criterion.
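
      A minimal sketch, assuming ``nm`` itself is included in the result and that
      ignore patterns are applied with :func:`re.search`; both are assumptions, and
      a missing ``nm`` raises ``ValueError`` from :meth:`list.index`.

      .. code-block:: python

         import re
         from typing import List, Optional

         import pandas as pd


         def cols_ending(
             df: pd.DataFrame, nm: str, ignore_patterns: Optional[List[str]] = None
         ) -> List[str]:
             """Columns from the first one up to and including `nm`, minus ignored ones (sketch)."""
             ignore_patterns = ignore_patterns or []
             cols = list(df.columns)
             upto = cols[: cols.index(nm) + 1]  # inclusive of nm (assumption)
             return [c for c in upto if not any(re.search(p, str(c)) for p in ignore_patterns)]


         df = pd.DataFrame(columns=["id", "src_description", "amount", "loaded_at"])
         print(cols_ending(df, "amount"))                                       # ['id', 'src_description', 'amount']
         print(cols_ending(df, "amount", ignore_patterns=["src_description"]))  # ['id', 'amount']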