snowmobile.core.snowframe
DataFrame extensions; primarily includes comparison operators.
Module Contents
Classes
SnowFrame | Extends a DataFrame with a .snf entry point.
- class
snowmobile.core.snowframe.SnowFrame(df: pandas.DataFrame)
Bases: snowmobile.core.Generic
Extends a DataFrame with a .snf entry point.
-
Returns a list of tuples containing column pairs that are common between two DataFrames.
- static
series_max_diff_abs(col1: pandas.Series, col2: pandas.Series, tolerance: float) → bool
Determines if the maximum absolute difference between two pandas.Series is within a tolerance level.
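The sketch below shows the check that series_max_diff_abs describes, using plain pandas. It is a hypothetical re-implementation, not snowmobile's source. It assumes that "within a tolerance level" means the largest element-wise absolute difference is less than or equal to `tolerance`. The inclusive comparison is an assumption.

```python
import pandas as pd


def series_max_diff_abs(col1: pd.Series, col2: pd.Series, tolerance: float) -> bool:
    """Hypothetical sketch (not snowmobile's source): max |col1 - col2| <= tolerance."""
    # Element-wise absolute differences; pandas aligns the two Series on their index.
    abs_diff = (col1 - col2).abs()
    # Only the largest deviation matters; the inclusive "<=" is an assumption.
    return bool(abs_diff.max() <= tolerance)


a = pd.Series([1.0, 2.0, 3.0])
b = pd.Series([1.0, 2.05, 2.98])
print(series_max_diff_abs(a, b, tolerance=0.1))   # True  (largest gap is about 0.05)
print(series_max_diff_abs(a, b, tolerance=0.01))  # False
```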
- static
series_max_diff_rel(col1: pandas.Series, col2: pandas.Series, tolerance: float) → bool
Determines if the maximum relative difference between two pandas.Series is within a tolerance level.
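The documentation does not define "relative difference". This hypothetical sketch assumes it means |col1 − col2| / |col2|, which treats `col2` as the reference series. It also assumes the same inclusive comparison against `tolerance`.

```python
import pandas as pd


def series_max_diff_rel(col1: pd.Series, col2: pd.Series, tolerance: float) -> bool:
    """Hypothetical sketch: max(|col1 - col2| / |col2|) <= tolerance."""
    # Assumption: col2 is the reference, so each gap is scaled by |col2|.
    # A zero in col2 yields inf (or NaN for 0/0); this sketch does not special-case either.
    rel_diff = (col1 - col2).abs() / col2.abs()
    return bool(rel_diff.max() <= tolerance)


a = pd.Series([100.0, 200.0])
b = pd.Series([101.0, 198.0])
print(series_max_diff_rel(a, b, tolerance=0.02))   # True  (largest ratio is 2/198, about 0.0101)
print(series_max_diff_rel(a, b, tolerance=0.005))  # False
```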
-
df_max_diff_abs(self, df2: pandas.DataFrame, tolerance: float) → bool
Determines if the maximum absolute difference between any value in the shared columns of two DataFrames is within a tolerance level.
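The sketch below writes the DataFrame-level absolute check as a free function over two plain DataFrames. It is hypothetical and is not how SnowFrame exposes the method. It assumes the shared columns are numeric and that both frames share an index, since pandas aligns rows on the index when subtracting.

```python
import pandas as pd


def df_max_diff_abs(df1: pd.DataFrame, df2: pd.DataFrame, tolerance: float) -> bool:
    """Hypothetical sketch: compare only the columns both DataFrames share."""
    shared = df1.columns.intersection(df2.columns)
    # Subtraction aligns rows on the index, so both frames should share one.
    abs_diff = (df1[shared] - df2[shared]).abs()
    # .max() gives a per-column maximum; the second .max() reduces to one scalar.
    return bool(abs_diff.max().max() <= tolerance)


df1 = pd.DataFrame({"x": [1.0, 2.0], "y": [10.0, 20.0], "only_in_1": [0, 0]})
df2 = pd.DataFrame({"x": [1.0, 2.5], "y": [10.0, 20.1], "only_in_2": [9, 9]})
print(df_max_diff_abs(df1, df2, tolerance=1.0))  # True  (largest gap is 0.5, in "x")
print(df_max_diff_abs(df1, df2, tolerance=0.2))  # False
```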
-
df_max_diff_rel(self, df2: pandas.DataFrame, tolerance: float) → bool
Determines if the maximum relative difference between any value in the shared columns of two DataFrames is within a tolerance level.
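This is the relative counterpart of the absolute check, again written as a hypothetical free function. The assumption that `df2` serves as the reference frame (the divisor) is illustrative only; the documentation does not say which frame is used.

```python
import pandas as pd


def df_max_diff_rel(df1: pd.DataFrame, df2: pd.DataFrame, tolerance: float) -> bool:
    """Hypothetical sketch: largest |df1 - df2| / |df2| over shared columns."""
    shared = df1.columns.intersection(df2.columns)
    # Assumption: df2 is the reference frame for the relative difference.
    rel_diff = (df1[shared] - df2[shared]).abs() / df2[shared].abs()
    return bool(rel_diff.max().max() <= tolerance)


df1 = pd.DataFrame({"x": [100.0, 200.0], "y": [1.0, 2.0]})
df2 = pd.DataFrame({"x": [100.0, 210.0], "y": [1.0, 2.0]})
print(df_max_diff_rel(df1, df2, tolerance=0.05))  # True  (10 / 210 is about 0.048)
print(df_max_diff_rel(df1, df2, tolerance=0.01))  # False
```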
-
df_diff(self, df2: pandas.DataFrame, abs_tol: Optional[float] = None, rel_tol: Optional[float] = None) → bool
Determines if the column-wise difference between two DataFrames is within a relative or absolute tolerance level.
Note
df1 and df2 are assumed to have a shared, pre-defined index.
Exactly one of abs_tol and rel_tol is expected to be a valid float; the other is expected to be None.
If valid float values are provided for both abs_tol and rel_tol, the outcome of the maximum absolute difference with respect to abs_tol will be returned regardless of the value of rel_tol.
- Parameters
- Returns (bool):
Boolean indicating whether or not the difference is within tolerance.
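The hypothetical sketch below illustrates the precedence rule: when `abs_tol` is provided, the absolute check decides the result regardless of `rel_tol`. Raising `ValueError` when neither tolerance is given is an assumption, not documented behavior. Dividing by |df2| for the relative check is also an assumption.

```python
from typing import Optional

import pandas as pd


def df_diff(
    df1: pd.DataFrame,
    df2: pd.DataFrame,
    abs_tol: Optional[float] = None,
    rel_tol: Optional[float] = None,
) -> bool:
    """Hypothetical sketch of the abs_tol / rel_tol precedence rule."""
    shared = df1.columns.intersection(df2.columns)
    abs_diff = (df1[shared] - df2[shared]).abs()
    if abs_tol is not None:
        # abs_tol wins whenever it is provided, even if rel_tol is also set.
        return bool(abs_diff.max().max() <= abs_tol)
    if rel_tol is not None:
        # Assumption: the relative difference is scaled by |df2|.
        rel_diff = abs_diff / df2[shared].abs()
        return bool(rel_diff.max().max() <= rel_tol)
    # Assumption: raise when neither tolerance is given (not stated in the docs).
    raise ValueError("Provide a float for abs_tol or rel_tol.")


df1 = pd.DataFrame({"v": [100.0, 200.0]})
df2 = pd.DataFrame({"v": [100.0, 210.0]})
print(df_diff(df1, df2, abs_tol=5.0))                # False (largest gap is 10)
print(df_diff(df1, df2, rel_tol=0.05))               # True  (10 / 210 is about 0.048)
print(df_diff(df1, df2, abs_tol=5.0, rel_tol=0.05))  # False (abs_tol takes precedence)
```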
-
partitions(self, on: str) → Dict[str, pd.DataFrame]
Returns a dictionary of DataFrames given a DataFrame and a partition column.
Note
The number of distinct values within the on column will be 1:1 with the number of partitions that are returned.
The on column is dropped from the partitions that are returned.
The row count of a vertical concatenation of all partitions should equal the row count of the original DataFrame.
- Parameters
on (str) – The column name to use for partitioning the data.
- Returns (Dict[str, pd.DataFrame]):
Dictionary of {(str) partition_value: (pd.DataFrame) associated subset of df}
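The behavior in the notes above maps naturally onto `DataFrame.groupby`. The following is a hypothetical sketch, not snowmobile's implementation. Converting keys with `str()` to match the `Dict[str, ...]` return type is an assumption. Passing `dropna=False` keeps rows with a missing key, so the total row count is preserved.

```python
from typing import Dict

import pandas as pd


def partitions(df: pd.DataFrame, on: str) -> Dict[str, pd.DataFrame]:
    """Hypothetical sketch: one sub-DataFrame per distinct value of `on`."""
    return {
        # Assumption: keys are stringified to match the Dict[str, ...] return type.
        str(value): group.drop(columns=[on])
        # dropna=False keeps rows whose key is missing, so no rows are lost.
        for value, group in df.groupby(on, dropna=False)
    }


df = pd.DataFrame({"region": ["east", "west", "east"], "sales": [1, 2, 3]})
parts = partitions(df, on="region")
print(sorted(parts))                        # ['east', 'west']
print(parts["east"]["sales"].tolist())      # [1, 3]
print(sum(len(p) for p in parts.values()))  # 3, same as len(df)
```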
-
lower(self, col: Optional[str] = None) → pandas.DataFrame
Lowercases all column names, or all values within col if col is provided.
-
upper(self, col: Optional[str] = None) → pandas.DataFrame
Uppercases all column names, or all values within col if col is provided.
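This hypothetical sketch covers both lower and upper. With no `col`, the column names are re-cased; with a `col`, the string values in that column are re-cased. Returning a modified copy instead of mutating the input is an assumption. The helper name `_recase` is illustrative only.

```python
from typing import Optional

import pandas as pd


def _recase(df: pd.DataFrame, col: Optional[str], to_upper: bool) -> pd.DataFrame:
    """Hypothetical shared helper for the lower/upper sketches; returns a copy."""
    out = df.copy()
    if col is None:
        # No column given: re-case the column names (assumed to be strings).
        out.columns = [c.upper() if to_upper else c.lower() for c in out.columns]
    else:
        # Column given: re-case the string values in that column only.
        out[col] = out[col].str.upper() if to_upper else out[col].str.lower()
    return out


def lower(df: pd.DataFrame, col: Optional[str] = None) -> pd.DataFrame:
    return _recase(df, col, to_upper=False)


def upper(df: pd.DataFrame, col: Optional[str] = None) -> pd.DataFrame:
    return _recase(df, col, to_upper=True)


df = pd.DataFrame({"Name": ["Ada", "bob"]})
print(lower(df).columns.tolist())              # ['name']
print(upper(df, col="Name")["Name"].tolist())  # ['ADA', 'BOB']
```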
-
reformat(self)
Re-formats the DataFrame’s columns via Column.reformat().
-
append_dupe_suffix(self)
Adds a trailing index number ‘_i’ to duplicate column names.
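The documentation does not say which duplicates receive the suffix. This hypothetical sketch assumes the first occurrence keeps its name and each later repeat gets `_1`, `_2`, and so on. The sketch does not guard against a collision with an existing name such as `a_1`.

```python
from collections import Counter

import pandas as pd


def append_dupe_suffix(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: rename repeated column names to name_1, name_2, ..."""
    seen: Counter = Counter()
    new_cols = []
    for col in df.columns:
        # Assumption: the first occurrence keeps its name; later ones get "_i".
        new_cols.append(f"{col}_{seen[col]}" if seen[col] else col)
        seen[col] += 1
    out = df.copy()
    out.columns = new_cols
    return out


df = pd.DataFrame([[1, 2, 3, 4]], columns=["a", "b", "a", "a"])
print(append_dupe_suffix(df).columns.tolist())  # ['a', 'b', 'a_1', 'a_2']
```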
-
to_list(self, col: Optional[str] = None, n: Optional[int] = None) → List
Succinctly retrieves a column as a list.
-
add_tmstmp(self, col_nm: Optional[str] = None) → pandas.DataFrame
Adds a column containing the current timestamp to a DataFrame.
- Parameters
col_nm (str) – Name for column; defaults to LOADED_TMSTMP.
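Here is a hypothetical sketch of add_tmstmp. Every row receives the same `pd.Timestamp.now()` value, and the column falls back to the documented default name `LOADED_TMSTMP`. Returning a copy instead of modifying the input is an assumption.

```python
from typing import Optional

import pandas as pd


def add_tmstmp(df: pd.DataFrame, col_nm: Optional[str] = None) -> pd.DataFrame:
    """Hypothetical sketch: stamp every row with the same current timestamp."""
    out = df.copy()  # assumption: return a copy rather than mutating df
    # Fall back to the documented default column name.
    out[col_nm or "LOADED_TMSTMP"] = pd.Timestamp.now()
    return out


df = pd.DataFrame({"id": [1, 2]})
stamped = add_tmstmp(df)
print(stamped.columns.tolist())  # ['id', 'LOADED_TMSTMP']
```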
- property
original(self) → pandas.DataFrame
Returns the DataFrame in its original form (drops columns added by SnowFrame and reverts to original column names).
-
cols_matching(self, patterns: List[str], ignore_patterns: List[str] = None) → List[str]
Returns a list of columns given a list of patterns to find.
- Parameters
- Returns (List[str]):
List of columns found/excluded.
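The documentation does not specify the pattern syntax. This hypothetical sketch assumes the patterns are regular expressions tested with `re.search`. It keeps a column if the column matches at least one entry in `patterns` and no entry in `ignore_patterns`.

```python
import re
from typing import List, Optional

import pandas as pd


def cols_matching(
    df: pd.DataFrame, patterns: List[str], ignore_patterns: Optional[List[str]] = None
) -> List[str]:
    """Hypothetical sketch: keep columns matching any pattern and no ignore pattern."""
    ignore_patterns = ignore_patterns or []
    return [
        col
        for col in df.columns
        # Assumption: patterns are regular expressions tested with re.search.
        if any(re.search(p, col) for p in patterns)
        and not any(re.search(p, col) for p in ignore_patterns)
    ]


df = pd.DataFrame(columns=["sales_2020", "sales_2021", "sales_total", "region"])
print(cols_matching(df, ["^sales_"], ignore_patterns=["total"]))  # ['sales_2020', 'sales_2021']
```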