The .ipynb format is capable of storing tables and charts in a standalone file. This makes it a great choice for model evaluation reports. NotebookCollection allows you to retrieve results from previously executed notebooks to compare them.
We use papermill to execute the notebook with different parameters; we'll train four models: two random forests, a linear regression, and a support vector regression:
[3]:
import papermill as pm

# models with their corresponding parameters
params = [
    {'model': 'sklearn.ensemble.RandomForestRegressor', 'params': {'n_estimators': 50}},
    {'model': 'sklearn.ensemble.RandomForestRegressor', 'params': {'n_estimators': 100}},
    {'model': 'sklearn.linear_model.LinearRegression', 'params': {'normalize': True}},
    {'model': 'sklearn.svm.LinearSVR', 'params': {}},
]

# ids to identify each experiment
ids = ['random_forest_1', 'random_forest_2',
       'linear_regression', 'support_vector_regression']

# output files
files = [f'{i}.ipynb' for i in ids]

# execute notebooks using papermill
for f, p in zip(files, params):
    pm.execute_notebook('train.ipynb', output_path=f, parameters=p)
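papermill injects each parameters dictionary into train.ipynb by overriding the values defined in a cell tagged parameters. As a rough sketch only (the variable names below are an assumption based on the keys passed above, not taken from the actual notebook):

# hypothetical "parameters" cell in train.ipynb; papermill overrides these
# default values with the ones passed to pm.execute_notebook
model = 'sklearn.ensemble.RandomForestRegressor'
params = {'n_estimators': 50}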
To use NotebookCollection, we pass a list of paths and, optionally, ids for each notebook (if ids are not provided, the paths are used as identifiers).
The only requirement is that cells whose output we want to extract must have tags; each tag then becomes a key in the notebook collection. For instructions on adding tags, see this.
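Tags are stored in each cell's metadata inside the .ipynb file. As a quick sanity check (not part of the original example), nbformat can list which cells of train.ipynb are tagged:

# list tagged cells and the beginning of their source
import nbformat

nb = nbformat.read('train.ipynb', as_version=4)
for cell in nb.cells:
    tags = cell.metadata.get('tags', [])
    if tags:
        print(tags, cell.source[:40])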
Extracted tables add colors to certain cells to identify the best and worst metrics. By default, it assumes that metrics are errors (smaller is better). If you are using scores (larger is better), pass scores=True; if you have both, pass a list with the names of the score columns:
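The cell that builds the collection is not shown in this excerpt; assuming NotebookCollection is imported from the sklearn_evaluation package, the construction looks roughly like this (r2 is the only score column, matching the tables below):

from sklearn_evaluation import NotebookCollection

# mae and mse are errors (smaller is better), r2 is a score (larger is better)
nbs = NotebookCollection(paths=files, ids=ids, scores=['r2'])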
On each notebook, metrics outputs a data frame with a single row, with mean absolute error (mae), mean squared error (mse), and R² (r2) as columns.
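A minimal sketch of what the tagged metrics cell in train.ipynb might compute (the y_test and y_pred values below are placeholders, only there to make the snippet self-contained):

import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# placeholder predictions; in train.ipynb these come from the fitted model
y_test = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# single-row data frame: one column per metric
metrics = pd.DataFrame({
    'mae': [mean_absolute_error(y_test, y_pred)],
    'mse': [mean_squared_error(y_test, y_pred)],
    'r2': [r2_score(y_test, y_pred)],
})
metrics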
For single-row tables, a “Compare” tab shows all results at once:
[8]:
nbs['metrics']
[8]:
       random_forest_1  random_forest_2  linear_regression  support_vector_regression
mae           2.195940         2.187443           3.148256                   5.637194
mse          10.660879        10.239911          20.724023                  43.524603
r2            0.859129         0.864692           0.726157                   0.424875

(The remaining tabs show the same metrics as an individual single-row table per notebook.)
We can see that the second random forest performs best across all metrics.
river contains a multi-row table with error metrics broken down by the CHAS indicator feature. Multi-row tables do not display the “Compare” tab:
[9]:
nbs['river']
[9]:
random_forest_1:

            mae        mse        r2
CHAS
0.0    2.232430  11.068710  0.858899
1.0    1.555333   3.501173  0.868473

random_forest_2:

            mae        mse        r2
CHAS
0.0    2.201354  10.542307  0.865609
1.0    1.943222   4.931181  0.814753

linear_regression:

            mae        mse        r2
CHAS
0.0    3.145562  21.137297  0.730547
1.0    3.195546  13.468775  0.494026

support_vector_regression:

            mae        mse        r2
CHAS
0.0    5.819687  45.465308  0.420420
1.0    2.433424   9.454464  0.644829
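For reference, a per-group breakdown like the one stored under river can be produced with a pandas groupby; a self-contained sketch with made-up data (only the shape of the computation matters):

import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# made-up targets and predictions with a binary CHAS indicator
df = pd.DataFrame({
    'CHAS': [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    'y_true': [20.0, 23.5, 18.0, 30.0, 28.5, 33.0],
    'y_pred': [21.0, 22.0, 19.5, 29.0, 29.5, 31.0],
})

# one row per CHAS value, with mae, mse and r2 as columns
river = df.groupby('CHAS').apply(
    lambda g: pd.Series({
        'mae': mean_absolute_error(g['y_true'], g['y_pred']),
        'mse': mean_squared_error(g['y_true'], g['y_pred']),
        'r2': r2_score(g['y_true'], g['y_pred']),
    })
)
river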
If we only compare two notebooks, the output is a bit different:
[10]:
# only compare two notebooks
nbs_two = NotebookCollection(paths=files[:2], ids=ids[:2], scores=['r2'])
Comparing single-row tables includes a diff column with the error difference between experiments. Error reductions are shown in green, increases in red:
[11]:
nbs_two['metrics']
[11]:
     random_forest_1  random_forest_2      diff  diff_relative     ratio
mae         2.195940         2.187443 -0.008497         -0.39%  0.996131
mse        10.660879        10.239911 -0.420968         -4.11%  0.960513
r2          0.859129         0.864692  0.005563          0.64%  1.006475

(The remaining tabs show each notebook's individual single-row table.)
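Judging from the numbers above, diff is the second experiment minus the first, diff_relative expresses that difference relative to the first experiment, and ratio divides the second by the first; a quick check with the mae values:

# mae for random_forest_1 and random_forest_2, taken from the table above
a, b = 2.195940, 2.187443

print(b - a)                 # diff          -> -0.008497
print(f'{(b - a) / a:.2%}')  # diff_relative -> -0.39%
print(b / a)                 # ratio         -> 0.996131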
When comparing multi-row tables, the “Compare” tab appears, showing the difference between the tables:
[12]:
nbs_two['river']
[12]:
Compare (difference between the two tables):

            mae        mse        r2
CHAS
0.0   -0.031076  -0.526403  0.006710
1.0    0.387889   1.430008 -0.053720

random_forest_1:

            mae        mse        r2
CHAS
0.0    2.232430  11.068710  0.858899
1.0    1.555333   3.501173  0.868473

random_forest_2:

            mae        mse        r2
CHAS
0.0    2.201354  10.542307  0.865609
1.0    1.943222   4.931181  0.814753
When displaying dictionaries, the “Compare” tab shows a diff view: