snowmobile.core.snowframe
DataFrame extensions; primarily includes comparison operators.
Module Contents
Classes
- class snowmobile.core.snowframe.SnowFrame(df: pandas.DataFrame)
Bases: snowmobile.core.Generic
Extends a DataFrame with a .snf entry point.
Returns list of tuples containing column pairs that are common between two DataFrames.
- static series_max_diff_abs(col1: pandas.Series, col2: pandas.Series, tolerance: float) → bool
Determines if the maximum absolute difference between two pandas.Series is within a tolerance level.
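The check described above can be sketched in plain pandas. This is a minimal illustration, not the library's implementation: the function name max_abs_diff_within is hypothetical, and treating "within" as less-than-or-equal is an assumption.

```python
import pandas as pd


def max_abs_diff_within(col1: pd.Series, col2: pd.Series, tolerance: float) -> bool:
    """Hypothetical helper: True if the largest absolute difference is <= tolerance."""
    # Subtraction aligns the two Series on their index before comparing.
    max_diff = (col1 - col2).abs().max()
    # Assumption: a difference exactly equal to the tolerance counts as "within".
    return bool(max_diff <= tolerance)


a = pd.Series([1.0, 2.0, 3.0])
b = pd.Series([1.0, 2.5, 3.0])

within_loose = max_abs_diff_within(a, b, tolerance=1.0)  # largest gap is 0.5
within_tight = max_abs_diff_within(a, b, tolerance=0.1)
```

Here the largest gap between the two Series is 0.5, so the looser tolerance passes and the tighter one does not.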
- static series_max_diff_rel(col1: pandas.Series, col2: pandas.Series, tolerance: float) → bool
Determines if the maximum relative difference between two pandas.Series is within a tolerance level.
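A relative comparison needs a reference value to divide by, which this page does not specify. The sketch below assumes the difference is taken relative to col1; the function name max_rel_diff_within is hypothetical and zero values in col1 are not handled.

```python
import pandas as pd


def max_rel_diff_within(col1: pd.Series, col2: pd.Series, tolerance: float) -> bool:
    """Hypothetical helper: True if the largest relative difference is <= tolerance."""
    # Assumption: differences are expressed relative to col1.
    # Zeros in col1 would produce inf/NaN and are not handled in this sketch.
    rel_diff = ((col1 - col2) / col1).abs()
    return bool(rel_diff.max() <= tolerance)


a = pd.Series([100.0, 200.0])
b = pd.Series([101.0, 210.0])

rel_within_loose = max_rel_diff_within(a, b, tolerance=0.10)  # largest gap is 5%
rel_within_tight = max_rel_diff_within(a, b, tolerance=0.02)
```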
- df_max_diff_abs(self, df2: pandas.DataFrame, tolerance: float) → bool
Determines if the maximum absolute difference between any value in the shared columns of two DataFrames is within a tolerance level.
- df_max_diff_rel(self, df2: pandas.DataFrame, tolerance: float) → bool
Determines if the maximum relative difference between any value in the shared columns of two DataFrames is within a tolerance level.
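The DataFrame-level checks restrict the comparison to columns present in both frames. A rough sketch of the absolute variant, assuming numeric shared columns and a common index, might look like the following; df_max_abs_diff_within is a hypothetical stand-alone function, not the SnowFrame method itself.

```python
import pandas as pd


def df_max_abs_diff_within(df1: pd.DataFrame, df2: pd.DataFrame, tolerance: float) -> bool:
    """Hypothetical helper: compare only the columns both DataFrames share."""
    shared = [col for col in df1.columns if col in df2.columns]
    # Element-wise absolute differences, aligned on index and column name.
    diffs = (df1[shared] - df2[shared]).abs()
    # First max() reduces each column, second max() reduces across columns.
    return bool(diffs.max().max() <= tolerance)


df1 = pd.DataFrame({"a": [1.0, 2.0], "b": [10.0, 20.0], "only_in_df1": [0.0, 0.0]})
df2 = pd.DataFrame({"a": [1.0, 2.1], "b": [10.0, 20.3], "only_in_df2": [9.0, 9.0]})

df_within_loose = df_max_abs_diff_within(df1, df2, tolerance=0.5)
df_within_tight = df_max_abs_diff_within(df1, df2, tolerance=0.2)
```

The unshared columns are ignored, so only the gaps in a and b (at most about 0.3) affect the result.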
- df_diff(self, df2: pandas.DataFrame, abs_tol: Optional[float] = None, rel_tol: Optional[float] = None) → bool
Determines if the column-wise difference between two DataFrames is within a relative or absolute tolerance level.
Note
df1 and df2 are assumed to have a shared, pre-defined index.
Exactly one of abs_tol and rel_tol is expected to be a valid float; the other is expected to be None.
If valid float values are provided for both abs_tol and rel_tol, the outcome of the maximum absolute difference with respect to abs_tol will be returned regardless of the value of rel_tol.
- Parameters
- Returns (bool):
Boolean indicating whether or not difference is within tolerance.
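The precedence rule in the note (abs_tol wins when both tolerances are given) can be illustrated with a small stand-alone sketch. The function name diff_within, the choice to divide by df1 for the relative case, and raising ValueError when neither tolerance is given are all assumptions made for illustration.

```python
from typing import Optional

import pandas as pd


def diff_within(
    df1: pd.DataFrame,
    df2: pd.DataFrame,
    abs_tol: Optional[float] = None,
    rel_tol: Optional[float] = None,
) -> bool:
    """Hypothetical helper mirroring the abs_tol / rel_tol precedence described above."""
    if abs_tol is None and rel_tol is None:
        # Assumption: the source only says one tolerance is "expected".
        raise ValueError("Provide abs_tol or rel_tol.")
    shared = [col for col in df1.columns if col in df2.columns]
    abs_diff = (df1[shared] - df2[shared]).abs()
    if abs_tol is not None:
        # abs_tol takes precedence, so rel_tol is ignored when both are given.
        return bool(abs_diff.max().max() <= abs_tol)
    # Assumption: relative differences are measured against df1.
    rel_diff = abs_diff / df1[shared].abs()
    return bool(rel_diff.max().max() <= rel_tol)


left = pd.DataFrame({"x": [100.0, 200.0]})
right = pd.DataFrame({"x": [101.0, 204.0]})

abs_only = diff_within(left, right, abs_tol=5.0)                # max abs diff is 4.0
rel_only = diff_within(left, right, rel_tol=0.01)               # max rel diff is 0.02
both = diff_within(left, right, abs_tol=5.0, rel_tol=0.01)      # abs_tol decides
```

With both tolerances supplied, the result follows the absolute check even though the relative check alone would fail.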
- partitions(self, on: str) → Dict[str, pd.DataFrame]
Returns a dictionary of DataFrames given a DataFrame and a partition column.
Note
The number of partitions returned equals the number of distinct values in the partition_on column.
The partition_on column is dropped from the partitions that are returned.
The depth (row count) of a vertical concatenation of all partitions should equal the depth of the original DataFrame.
- Parameters
on (str) – The column name to use for partitioning the data.
- Returns (Dict[str, pd.DataFrame]):
Dictionary of {(str) partition_value: (pd.DataFrame) associated subset of df}
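The three properties listed in the note can be reproduced with a groupby-based sketch. This is an illustration under assumptions rather than the method's actual code: partition_by is a hypothetical name, and keys are converted with str() to match the documented Dict[str, pd.DataFrame] return type.

```python
from typing import Dict

import pandas as pd


def partition_by(df: pd.DataFrame, on: str) -> Dict[str, pd.DataFrame]:
    """Hypothetical helper: one sub-DataFrame per distinct value of `on`."""
    return {
        # The partition column is dropped from each returned subset.
        str(value): group.drop(columns=[on])
        for value, group in df.groupby(on)
    }


df = pd.DataFrame({"region": ["east", "west", "east"], "sales": [1, 2, 3]})
parts = partition_by(df, on="region")

total_rows = sum(len(part) for part in parts.values())
```

Note that groupby skips rows whose key is missing by default, so the row-count property only holds in this sketch when the partition column has no null values.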
- lower(self, col: Optional[str] = None) → pandas.DataFrame
Lower cases all column names, or all values within col if provided.
- upper(self, col: Optional[str] = None) → pandas.DataFrame
Upper cases all column names, or all values within col if provided.
- reformat(self)
Re-formats DataFrame’s columns via Column.reformat().
- append_dupe_suffix(self)
Adds a trailing index number ‘_i’ to duplicate column names.
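The exact numbering scheme is not stated here, so the following is only one plausible reading: the first occurrence of a name is left alone and each later duplicate receives _1, _2, and so on. The function suffix_duplicates is hypothetical and works on a plain list of column names.

```python
from collections import Counter
from typing import List


def suffix_duplicates(names: List[str]) -> List[str]:
    """Hypothetical helper: append '_i' to the i-th repeat of a column name."""
    seen: Counter = Counter()
    result = []
    for name in names:
        count = seen[name]
        # Assumption: the first occurrence keeps its original name.
        result.append(name if count == 0 else f"{name}_{count}")
        seen[name] += 1
    return result


renamed = suffix_duplicates(["id", "amt", "id", "id", "amt"])
```

This sketch does not guard against a generated name such as id_1 colliding with a column that already carries that name.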
- to_list(self, col: Optional[str] = None, n: Optional[int] = None) → List
Succinctly retrieves a column as a list.
- add_tmstmp(self, col_nm: Optional[str] = None) → pandas.DataFrame
Adds a column containing the current timestamp to a DataFrame.
- Parameters
col_nm (str) – Name for column; defaults to LOADED_TMSTMP.
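A stand-alone approximation of this behaviour is shown below. The default column name LOADED_TMSTMP comes from the parameter description; the function name add_timestamp, the use of a copy rather than in-place mutation, and the naive local timestamp are assumptions.

```python
from typing import Optional

import pandas as pd


def add_timestamp(df: pd.DataFrame, col_nm: Optional[str] = None) -> pd.DataFrame:
    """Hypothetical helper: add one column holding the current timestamp."""
    # Assumption: work on a copy so the caller's DataFrame is left untouched.
    out = df.copy()
    # A single scalar timestamp is broadcast to every row.
    out[col_nm or "LOADED_TMSTMP"] = pd.Timestamp.now()
    return out


raw = pd.DataFrame({"id": [1, 2, 3]})
stamped = add_timestamp(raw)
custom = add_timestamp(raw, col_nm="load_time")
```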
- property original(self) → pandas.DataFrame
Returns the DataFrame in its original form (drops columns added by SnowFrame and reverts to original column names).
- cols_matching(self, patterns: List[str], ignore_patterns: List[str] = None) → List[str]
Returns a list of columns given a list of patterns to find.
- Parameters
- Returns (List[str]):
List of columns found/excluded.
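The matching rules are not documented on this page, so the sketch below assumes the patterns are regular expressions applied with re.search, and that a column is kept when it matches any pattern and none of the ignore patterns. The function columns_matching is a hypothetical helper operating on a plain list of column names.

```python
import re
from typing import List, Optional


def columns_matching(
    columns: List[str],
    patterns: List[str],
    ignore_patterns: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: keep columns matching any pattern and no ignore pattern."""
    ignore_patterns = ignore_patterns or []
    return [
        col
        for col in columns
        # Assumption: patterns are regular expressions, matched case-sensitively.
        if any(re.search(pattern, col) for pattern in patterns)
        and not any(re.search(pattern, col) for pattern in ignore_patterns)
    ]


cols = ["order_id", "customer_id", "order_dt", "amount"]
found = columns_matching(cols, patterns=["_id$", "^order"], ignore_patterns=["^customer"])
```

Here customer_id matches the first pattern but is excluded by the ignore pattern, and the result preserves the original column order.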