Training
Data selector
When training models, it is common to try out different
subsets of features or subpopulations. DataSelector
lets you define a subsetting pipeline succinctly, as a series of
dictionaries that each describe one transformation of your data.
class sklearn_evaluation.training.DataSelector(*steps)
Subset a pandas.DataFrame by passing a series of steps
- Parameters
*steps – Steps to apply to the data sequentially (order matters). Each step must be a dictionary with a key “kind” whose value must be one of “column_drop”, “row_drop” or “column_keep”. The remaining key-value pairs must match the signature of the corresponding Step object (ColumnDrop, RowDrop or ColumnKeep, documented below).
transform(df, return_summary: bool = False)
Apply steps
- Parameters
df – Data frame to transform
return_summary – If False, the function returns only the output data frame; if True, it also returns a summary table
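To make the idea of a pipeline described by dictionaries concrete, here is a minimal pandas-only sketch. It is not the library's implementation: apply_steps and the three handler functions are hypothetical names introduced for illustration, and only the names and if_nas parameters are handled.

```python
import pandas as pd


# Hypothetical handlers, one per "kind". The remaining keys of each
# step dictionary are passed to the handler as keyword arguments.
def column_drop(df, names):
    return df.drop(columns=names)


def row_drop(df, if_nas=False):
    return df.dropna(axis=0) if if_nas else df


def column_keep(df, names):
    return df[names]


HANDLERS = {
    "column_drop": column_drop,
    "row_drop": row_drop,
    "column_keep": column_keep,
}


def apply_steps(df, *steps):
    """Apply each step dictionary in order (hypothetical helper)."""
    for step in steps:
        params = dict(step)        # copy so the caller's dict is untouched
        kind = params.pop("kind")  # the "kind" key selects the handler
        df = HANDLERS[kind](df, **params)
    return df


df = pd.DataFrame({"id": [1, 2, 3], "age": [30.0, None, 45.0], "tmp": [0, 0, 0]})

steps = [
    {"kind": "column_drop", "names": ["tmp"]},
    {"kind": "row_drop", "if_nas": True},
    {"kind": "column_keep", "names": ["age"]},
]

result = apply_steps(df, *steps)
```

Because the steps run sequentially, reordering them can change the result: dropping rows with NAs before dropping a column that contains NAs removes more rows than doing it the other way around.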
class sklearn_evaluation.training.selector.ColumnDrop(names: Optional[list] = None, prefix: Optional[str] = None, suffix: Optional[str] = None, contains: Optional[str] = None, max_na_prop: Optional[float] = None)
Drop columns
- Parameters
names – List of columns to drop
prefix – Drop columns whose names start with this prefix (or with any prefix in a list)
suffix – Drop columns whose names end with this suffix (or with any suffix in a list)
contains – Drop columns whose names contain this substring
max_na_prop – Drop columns whose proportion of NAs (a value in [0, 1]) is larger than this
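The selection criteria above can be approximated with plain pandas, as in the sketch below. This is an illustration of the described behaviour, not the library's code; the column names are made up, and combining the criteria as a union is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "raw_a": [1, 2, 3, 4],
    "b_tmp": [1, 2, 3, 4],
    "mostly_na": [None, None, None, 1.0],
    "x": [1.0, None, 3.0, 4.0],
})

# prefix / suffix / contains: match against the column names
by_prefix = [c for c in df.columns if c.startswith("raw_")]
by_suffix = [c for c in df.columns if c.endswith("_tmp")]
by_contains = [c for c in df.columns if "na" in c]

# max_na_prop: proportion of NAs per column, in [0, 1];
# select columns whose proportion is strictly larger than the threshold
na_prop = df.isna().mean()
by_na = na_prop[na_prop > 0.5].index.tolist()

# Assumption: criteria are combined as a union of the selected columns
to_drop = sorted(set(by_prefix) | set(by_suffix) | set(by_na))
out = df.drop(columns=to_drop)
```

Here the column x survives because only a quarter of its values are missing, which is below the 0.5 threshold.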
class sklearn_evaluation.training.selector.RowDrop(if_nas: bool = False, query: Optional[str] = None)
Drop rows
- Parameters
if_nas – If True, deletes all rows where there is at least one NA
query – Drops all rows matching the query (passed via pandas.DataFrame.query)
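Note that query describes the rows to drop, not the rows to keep. A pandas-only sketch of both parameters follows; it mirrors the behaviour described above and is not taken from the library's source. The data and the query string are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25.0, None, 70.0, 40.0],
    "income": [10.0, 20.0, None, 30.0],
})

# if_nas=True: drop every row that has at least one NA
no_nas = df.dropna(axis=0, how="any")

# query: rows matching the expression are dropped, so keep the complement
matches = df.query("age > 60").index
without_matches = df.drop(index=matches)
```

Rows where age is NA do not match "age > 60", so they are kept by the query-based drop and have to be removed separately if needed.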
class sklearn_evaluation.training.selector.ColumnKeep(names: Optional[list] = None, dotted_path: Optional[str] = None)
Subset columns
- Parameters
names – List of columns to keep