Training

Data selector

When training models, it is common to try out different subsets of features or subpopulations. DataSelector lets you express such a subsetting pipeline succinctly as a sequence of dictionaries, each describing one transformation of your data.

class sklearn_evaluation.training.DataSelector(*steps)

Subset a pandas.DataFrame by passing a series of steps.

Parameters: *steps – Steps to apply to the data sequentially (order matters). Each step must be a dictionary with a key “kind” whose value is one of “column_drop”, “row_drop” or “column_keep”. The remaining key-value pairs must match the signature of the corresponding step class (ColumnDrop, RowDrop or ColumnKeep).

transform(df, return_summary: bool = False)

Apply the steps to a data frame, in the order they were passed.

Parameters

df – Data frame to transform
return_summary – If False (the default), only the output data frame is returned. If True, a summary table is returned as well.
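The step-dictionary format and the "order matters" note can be illustrated with a small sketch. It uses plain pandas only: `apply_steps` is a hypothetical stand-in written for this example, not the library's implementation, and its summary columns (`rows_removed`, `columns_removed`) are an assumption about what a summary might contain. With the library itself, the same `steps` would be passed as `DataSelector(*steps).transform(df)`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "age": [34, None, 51, 29],
        "income": [52000, 61000, None, 48000],
        "notes": ["a", None, "c", None],
    }
)

# Step dictionaries in the documented format: a "kind" key plus the
# keyword arguments of the corresponding step class.
steps = [
    {"kind": "column_drop", "names": ["notes"]},
    {"kind": "row_drop", "if_nas": True},
    {"kind": "column_keep", "names": ["age", "income"]},
]


def apply_steps(frame, steps):
    """Hypothetical stand-in for DataSelector(*steps).transform(frame)."""
    rows = []
    for step in steps:
        n_rows, n_cols = frame.shape
        kind = step["kind"]
        if kind == "column_drop":
            frame = frame.drop(columns=step["names"])
        elif kind == "row_drop":
            # Only the if_nas option is sketched here.
            if step.get("if_nas"):
                frame = frame.dropna(how="any")
        elif kind == "column_keep":
            frame = frame[step["names"]]
        else:
            raise ValueError(f"unknown kind: {kind!r}")
        # Record how much each step removed (assumed summary layout).
        rows.append(
            {
                "kind": kind,
                "rows_removed": n_rows - frame.shape[0],
                "columns_removed": n_cols - frame.shape[1],
            }
        )
    return frame, pd.DataFrame(rows)


result, summary = apply_steps(df, steps)

# Same steps, but rows are dropped before the "notes" column is removed,
# so the NAs in "notes" now cost additional rows.
reordered, _ = apply_steps(df, [steps[1], steps[0], steps[2]])

print(result)
print(summary)
print(len(result), len(reordered))
```

Dropping the sparse `notes` column before removing rows with NAs keeps more rows than doing it the other way around, which is why the order of the steps is part of the pipeline definition.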

ColumnDrop

class sklearn_evaluation.training.selector.ColumnDrop(names: list = None, prefix: str = None, suffix: str = None, contains: str = None, max_na_prop: float = None)

Drop columns

Parameters

names – List of columns to drop
prefix – Drop columns whose names start with this prefix (or with any prefix in a list)
suffix – Drop columns whose names end with this suffix (or with any suffix in a list)
contains – Drop columns whose names contain this substring
max_na_prop – Drop columns whose proportion of NAs (a value in [0, 1]) is larger than this
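As a rough illustration of the selection criteria above, the following plain-pandas sketch computes which columns each criterion would pick. The column names and the 0.5 threshold are made up for the example, and dropping the union of all matches is an assumption; the parameter descriptions do not say how ColumnDrop combines several criteria.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "raw_height": [1.7, 1.8, 1.6, 1.9],
        "weight_tmp": [70, 80, 60, 90],
        "debug_flag": [0, 1, 0, 1],
        "mostly_missing": [None, None, None, 1.0],
        "target": [0, 1, 0, 1],
    }
)

# Roughly what this step dictionary asks for:
# {"kind": "column_drop", "prefix": "raw_", "suffix": "_tmp",
#  "contains": "debug", "max_na_prop": 0.5}
by_prefix = [c for c in df.columns if c.startswith("raw_")]
by_suffix = [c for c in df.columns if c.endswith("_tmp")]
by_contains = [c for c in df.columns if "debug" in c]

# Proportion of NAs per column, a value in [0, 1].
na_prop = df.isna().mean()
by_na = na_prop[na_prop > 0.5].index.tolist()

# Assumption: drop every column matched by at least one criterion.
to_drop = sorted(set(by_prefix + by_suffix + by_contains + by_na))
result = df.drop(columns=to_drop)

print(to_drop)
print(result.columns.tolist())
```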

RowDrop

class sklearn_evaluation.training.selector.RowDrop(if_nas: bool = False, query: str = None)

Drop rows

Parameters

if_nas – If True, drops all rows that contain at least one NA
query – Drops all rows matching the query (passed to pandas.DataFrame.query)
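Note that `query` describes the rows to remove, not the rows to keep, which is the opposite of how `pandas.DataFrame.query` is normally used for filtering. A minimal pandas sketch of both options, with made-up data and an illustrative query string:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "age": [34, None, 51, 17],
        "income": [52000, 61000, None, 48000],
    }
)

# Roughly what {"kind": "row_drop", "if_nas": True} asks for:
# remove every row that has at least one NA.
no_nas = df.dropna(how="any")

# Roughly what {"kind": "row_drop", "query": "age < 18"} asks for:
# find the rows matching the query, then remove them.
matches = df.query("age < 18").index
no_minors = df.drop(index=matches)

print(no_nas)
print(no_minors)
```

In this sketch the row with a missing `age` survives the query-based drop, because a comparison against NaN evaluates to False and so that row does not match.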

ColumnKeep

class sklearn_evaluation.training.selector.ColumnKeep(names: Optional[list] = None, dotted_path: Optional[str] = None)

Subset columns

Parameters: names – List of columns to keep
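For comparison with ColumnDrop, keeping a list of columns corresponds to ordinary column selection in pandas. The sketch below uses made-up column names and covers only the `names` parameter.

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 2], "age": [34, 51], "income": [52000, 48000]}
)

# Roughly what {"kind": "column_keep", "names": ["age", "income"]} asks for.
names = ["age", "income"]
kept = df[names]

# Keeping these names is the complement of dropping all the others.
dropped_others = df.drop(columns=[c for c in df.columns if c not in names])

print(kept.columns.tolist())
```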
