:orphan:

:mod:`snowmobile.core.snowframe`
================================

.. py:module:: snowmobile.core.snowframe

.. autoapi-nested-parse::

   :class:`~pandas.DataFrame` extensions; primarily includes comparison operators.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   snowmobile.core.snowframe.SnowFrame


.. class:: SnowFrame(df: pandas.DataFrame)

   Bases: :class:`snowmobile.core.Generic`

   Extends a :class:`~pandas.DataFrame` with a ``.snf`` entry point.

   .. method:: shared_cols(self, df2: pandas.DataFrame) -> List[Tuple[pd.Series, pd.Series]]

      Returns a list of tuples containing the column pairs that are common
      to both DataFrames.

   .. method:: series_max_diff_abs(col1: pandas.Series, col2: pandas.Series, tolerance: float) -> bool
      :staticmethod:

      Determines if the maximum **absolute** difference between two
      :class:`pandas.Series` is within a tolerance level.

   .. method:: series_max_diff_rel(col1: pandas.Series, col2: pandas.Series, tolerance: float) -> bool
      :staticmethod:

      Determines if the maximum **relative** difference between two
      :class:`pandas.Series` is within a tolerance level.

   .. method:: df_max_diff_abs(self, df2: pandas.DataFrame, tolerance: float) -> bool

      Determines if the maximum **absolute** difference between any value in
      the shared columns of two DataFrames is within a tolerance level.

   .. method:: df_max_diff_rel(self, df2: pandas.DataFrame, tolerance: float) -> bool

      Determines if the maximum **relative** difference between any value in
      the shared columns of two DataFrames is within a tolerance level.

   .. method:: df_diff(self, df2: pandas.DataFrame, abs_tol: Optional[float] = None, rel_tol: Optional[float] = None) -> bool

      Determines if the column-wise difference between two DataFrames is
      within a relative **or** absolute tolerance level.

      .. note::

         * ``df1`` and ``df2`` are assumed to have a shared, pre-defined index.
         * Exactly **one** of ``abs_tol`` and ``rel_tol`` is expected to be a
           valid float; the other is expected to be **None**.
         * If valid float values are provided for both ``abs_tol`` and
           ``rel_tol``, the outcome of the maximum **absolute** difference
           with respect to ``abs_tol`` will be returned regardless of the
           value of ``rel_tol``.

      :param df2: Second DataFrame for comparison.
      :type df2: pd.DataFrame
      :param abs_tol: Absolute tolerance; default is None.
      :type abs_tol: float
      :param rel_tol: Relative tolerance; default is None.
      :type rel_tol: float

      Returns (bool):
          Boolean indicating whether or not the difference is within tolerance.

   .. method:: partitions(self, on: str) -> Dict[(str, pd.DataFrame)]

      Returns a dictionary of DataFrames given a DataFrame and a partition column.

      .. note::

         * The number of distinct values within the ``on`` column will be
           1:1 with the number of partitions that are returned.
         * The ``on`` column is dropped from the partitions that are returned.
         * The depth of a vertical concatenation of all partitions should
           equal the depth of the original DataFrame.

      :param on: The column name to use for partitioning the data.
      :type on: str

      Returns (Dict[str, pd.DataFrame]):
          Dictionary of {(str) partition_value: (pd.DataFrame) associated subset of df}

   .. method:: ddl(self, table: str) -> str

      Returns a string containing 'create table' DDL given a table name.

   .. method:: lower(self, col: Optional[str] = None) -> pandas.DataFrame

      Lower cases all column names **or** all values within `col` if provided.

   .. method:: upper(self, col: Optional[str] = None) -> pandas.DataFrame

      Upper cases all column names **or** all values within `col` if provided.

   .. method:: reformat(self)

      Re-formats DataFrame's columns via :class:`Column.reformat()`.

   .. method:: append_dupe_suffix(self)

      Adds a trailing index number '_i' to duplicate column names.

   .. method:: to_list(self, col: Optional[str] = None, n: Optional[int] = None) -> List

      Succinctly retrieves a column as a list.

      :param col: Name of column.
      :type col: str
      :param n: Number of records to return; defaults to full depth of column.
      :type n: int

   .. method:: add_tmstmp(self, col_nm: Optional[str] = None) -> pandas.DataFrame

      Adds a column containing the current timestamp to a DataFrame.

      :param col_nm: Name for column; defaults to `LOADED_TMSTMP`.
      :type col_nm: str

   .. method:: original(self) -> pandas.DataFrame
      :property:

      Returns the DataFrame in its original form (drops columns added by
      :class:`SnowFrame` and reverts to original column names).

   .. method:: has_dupes(self) -> bool
      :property:

      DataFrame has duplicate column names.

   .. method:: cols_matching(self, patterns: List[str], ignore_patterns: List[str] = None) -> List[str]

      Returns a list of columns given a list of patterns to find.

      :param patterns: List of regex patterns to match columns on.
      :type patterns: List[str]
      :param ignore_patterns: Optional list of regex patterns to exclude.
      :type ignore_patterns: List[str]

      Returns (List[str]):
          List of columns found/excluded.

   .. method:: cols_ending(self, nm: str, ignore_patterns: Optional[List] = None) -> List[str]

      Returns all columns up to ``nm`` in a DataFrame.

      :param nm: Name of column to end index at.
      :type nm: str
      :param ignore_patterns: Optional list of regex patterns to exclude in
         the list that's returned; primarily used for getting an
         `end-index-at` list while excluding `src_description`.
      :type ignore_patterns: List[str]

      Returns (List[str]):
          List of column names matching criterion.
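

The tolerance rules described for ``df_diff`` can be illustrated with a small, dependency-free sketch. This is not the ``SnowFrame`` implementation: plain dictionaries of lists stand in for DataFrames, and the names ``max_diff_abs``, ``max_diff_rel`` and ``diff_within`` are hypothetical. Two further assumptions are made that the documentation above does not state: the relative difference is measured against the values of the first column, and a ``ValueError`` is raised when neither tolerance is supplied.

```python
from typing import Dict, List, Optional, Sequence


def max_diff_abs(col1: Sequence[float], col2: Sequence[float], tolerance: float) -> bool:
    """True if the largest absolute element-wise difference is within tolerance."""
    return max((abs(a - b) for a, b in zip(col1, col2)), default=0.0) <= tolerance


def max_diff_rel(col1: Sequence[float], col2: Sequence[float], tolerance: float) -> bool:
    """True if the largest relative element-wise difference is within tolerance.

    Assumption: the difference is taken relative to the value in col1.
    """
    def rel(a: float, b: float) -> float:
        if a == 0:
            # Avoid division by zero: identical zeros differ by 0, otherwise unbounded.
            return 0.0 if b == 0 else float("inf")
        return abs(a - b) / abs(a)

    return max((rel(a, b) for a, b in zip(col1, col2)), default=0.0) <= tolerance


def diff_within(
    cols1: Dict[str, List[float]],
    cols2: Dict[str, List[float]],
    abs_tol: Optional[float] = None,
    rel_tol: Optional[float] = None,
) -> bool:
    """Compare only the shared columns; abs_tol takes precedence over rel_tol."""
    shared = [name for name in cols1 if name in cols2]
    if abs_tol is not None:
        # Mirrors the documented rule: with both tolerances set, the absolute check decides.
        return all(max_diff_abs(cols1[c], cols2[c], abs_tol) for c in shared)
    if rel_tol is not None:
        return all(max_diff_rel(cols1[c], cols2[c], rel_tol) for c in shared)
    # Assumption of this sketch only; the documented behavior for this case is not stated.
    raise ValueError("one of abs_tol or rel_tol must be provided")


left = {"x": [1.0, 2.0], "y": [10.0, 20.0]}
right = {"x": [1.5, 2.0], "y": [10.0, 22.0], "z": [0.0, 0.0]}

within_abs = diff_within(left, right, abs_tol=2.0)
within_rel = diff_within(left, right, rel_tol=0.25)
both_given = diff_within(left, right, abs_tol=1.0, rel_tol=0.5)
```

In this example the column ``z`` is ignored because it is not shared. The largest absolute difference is 2.0 and the largest relative difference is 0.5, so ``within_abs`` is true, ``within_rel`` is false, and ``both_given`` is false because the absolute check decides whenever ``abs_tol`` is set.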