Tracking Machine Learning experiments
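SQLiteTracker provides a simple way to track ML experiments using a SQLite database. Each experiment is stored as a row with an ID, a creation timestamp, a JSON dictionary of parameters, and an optional comment.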
[1]:
from sklearn_evaluation import SQLiteTracker
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
[4]:
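# create the tracker; experiments are stored in this SQLite database file
tracker = SQLiteTracker('my_experiments.db')
iris = load_iris(as_frame=True)
X, y = iris['data'], iris['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
models = [RandomForestRegressor(), LinearRegression(), Lasso()]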
[5]:
for m in models:
    model = type(m).__name__
    print(f'Fitting {model}')
    # .new() returns a uuid and creates an entry in the db
    uuid = tracker.new()
    m.fit(X_train, y_train)
    y_pred = m.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    # add data with the .update(uuid, {'param': 'value'}) method
    tracker.update(uuid, {'mse': mse, 'model': model, **m.get_params()})
Fitting RandomForestRegressor
Fitting LinearRegression
Fitting Lasso
Alternatively, use .insert(uuid, params) to supply your own ID:
[6]:
svr = SVR()
svr.fit(X_train, y_train)
y_pred = svr.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
tracker.insert('my_uuid', {'mse': mse, 'model': type(svr).__name__, **svr.get_params()})
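To make the two workflows above concrete, the following is a minimal sketch of the same idea using only Python's standard library. It is not SQLiteTracker's implementation: the schema is an assumption based on the column names the tracker displays (uuid, created, parameters, comment), and the helper names new_experiment, update_experiment, and insert_experiment are hypothetical.

```python
import json
import sqlite3
import uuid

# Hypothetical schema mirroring the columns the tracker displays
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiments ("
    "uuid TEXT PRIMARY KEY, "
    "created TIMESTAMP DEFAULT CURRENT_TIMESTAMP, "
    "parameters TEXT, "
    "comment TEXT)"
)


def new_experiment(conn):
    """Create an empty entry and return its generated ID."""
    experiment_id = uuid.uuid4().hex
    conn.execute("INSERT INTO experiments (uuid) VALUES (?)", (experiment_id,))
    return experiment_id


def update_experiment(conn, experiment_id, parameters):
    """Attach parameters (serialized as JSON) to an existing entry."""
    conn.execute(
        "UPDATE experiments SET parameters = ? WHERE uuid = ?",
        (json.dumps(parameters), experiment_id),
    )


def insert_experiment(conn, experiment_id, parameters):
    """Create an entry with a caller-supplied ID in a single step."""
    conn.execute(
        "INSERT INTO experiments (uuid, parameters) VALUES (?, ?)",
        (experiment_id, json.dumps(parameters)),
    )


# Two-step flow: reserve an ID first, fill in results after training
generated_id = new_experiment(conn)
update_experiment(conn, generated_id, {"mse": 0.04, "model": "LinearRegression"})

# One-step flow: the caller chooses the ID
insert_experiment(conn, "my_uuid", {"mse": 0.03, "model": "SVR"})

rows = conn.execute("SELECT uuid, parameters FROM experiments").fetchall()
```

The two-step flow is convenient when an ID is needed before results exist, for example to name output files during training. The one-step flow suits cases where an external system already assigns identifiers.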
Displaying tracker shows the most recent experiments by default:
[7]:
tracker
[7]:
SQLiteTracker
uuid | created | parameters | comment |
---|---|---|---|
d0c38a7cf23141409f272b92fe38fac0 | 2021-03-14 15:55:47 | {"mse": 0.010483999999999999, "model": "RandomForestRegressor", "bootstrap": true, "ccp_alpha": 0.0, "criterion": "mse", "max_depth": null, "max_features": "auto", "max_leaf_nodes": null, "max_samples": null, "min_impurity_decrease": 0.0, "min_impurity_split": null, "min_samples_leaf": 1, "min_samples_split": 2, "min_weight_fraction_leaf": 0.0, "n_estimators": 100, "n_jobs": null, "oob_score": false, "random_state": null, "verbose": 0, "warm_start": false} | |
54c5022230674bc390c18aa2a96a85a9 | 2021-03-14 15:55:47 | {"mse": 0.042600341137617896, "model": "LinearRegression", "copy_X": true, "fit_intercept": true, "n_jobs": null, "normalize": false, "positive": false} | |
6a1bf08c756b4091b6f27b9598ef2b49 | 2021-03-14 15:55:47 | {"mse": 0.4317655183287657, "model": "Lasso", "alpha": 1.0, "copy_X": true, "fit_intercept": true, "max_iter": 1000, "normalize": false, "positive": false, "precompute": false, "random_state": null, "selection": "cyclic", "tol": 0.0001, "warm_start": false} | |
my_uuid | 2021-03-14 15:55:47 | {"mse": 0.030419125413621413, "model": "SVR", "C": 1.0, "cache_size": 200, "coef0": 0.0, "degree": 3, "epsilon": 0.1, "gamma": "scale", "kernel": "rbf", "max_iter": -1, "shrinking": true, "tol": 0.001, "verbose": false} | |
(Most recent experiments)
Querying experiments
[8]:
ordered = tracker.query("""
SELECT uuid,
json_extract(parameters, '$.model') AS model,
json_extract(parameters, '$.mse') AS mse
FROM experiments
ORDER BY json_extract(parameters, '$.mse') ASC
""")
ordered
[8]:
uuid | model | mse |
---|---|---|
d0c38a7cf23141409f272b92fe38fac0 | RandomForestRegressor | 0.010484 |
my_uuid | SVR | 0.030419 |
54c5022230674bc390c18aa2a96a85a9 | LinearRegression | 0.042600 |
6a1bf08c756b4091b6f27b9598ef2b49 | Lasso | 0.431766 |
The query method returns a data frame with “uuid” as the index:
[9]:
type(ordered)
[9]:
pandas.core.frame.DataFrame
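Because parameters are stored as JSON, SQLite's json_extract function can also be used to filter experiments, not only to sort them. The sketch below shows this with the standard library's sqlite3 module on an in-memory database. It assumes the SQLite build bundled with Python includes the JSON functions, and the table contents are illustrative values rather than real results.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiments (uuid TEXT PRIMARY KEY, parameters TEXT)")

# Illustrative experiments only; IDs and scores are made up for this sketch
examples = {
    "exp_a": {"model": "Lasso", "mse": 0.43},
    "exp_b": {"model": "SVR", "mse": 0.03},
    "exp_c": {"model": "LinearRegression", "mse": 0.04},
}
conn.executemany(
    "INSERT INTO experiments VALUES (?, ?)",
    [(key, json.dumps(value)) for key, value in examples.items()],
)

# Keep only experiments below an error threshold, best first
best_first = conn.execute(
    """
    SELECT uuid,
           json_extract(parameters, '$.model') AS model,
           json_extract(parameters, '$.mse') AS mse
    FROM experiments
    WHERE json_extract(parameters, '$.mse') < 0.1
    ORDER BY mse ASC
    """
).fetchall()

for experiment_id, model, mse in best_first:
    print(experiment_id, model, mse)
```

The same SELECT statement, including the WHERE clause, could be passed as a string to tracker.query.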
Adding comments
[10]:
tracker.comment(ordered.index[0], 'Best performing experiment')
Use tracker[uuid] to get a single experiment:
[11]:
tracker[ordered.index[0]]
[11]:
uuid | created | parameters | comment |
---|---|---|---|
d0c38a7cf23141409f272b92fe38fac0 | 2021-03-14 15:55:47 | {"mse": 0.010483999999999999, "model": "Random... | Best performing experiment |
Getting recent experiments
The recent method also returns a data frame:
[12]:
df = tracker.recent()
df
[12]:
uuid | created | parameters | comment |
---|---|---|---|
d0c38a7cf23141409f272b92fe38fac0 | 2021-03-14 15:55:47 | {"mse": 0.010483999999999999, "model": "Random... | Best performing experiment |
54c5022230674bc390c18aa2a96a85a9 | 2021-03-14 15:55:47 | {"mse": 0.042600341137617896, "model": "Linear... | None |
6a1bf08c756b4091b6f27b9598ef2b49 | 2021-03-14 15:55:47 | {"mse": 0.4317655183287657, "model": "Lasso", ... | None |
my_uuid | 2021-03-14 15:55:47 | {"mse": 0.030419125413621413, "model": "SVR", ... | None |
Pass normalize=True to convert the nested JSON dictionary into columns:
[13]:
df = tracker.recent(normalize=True)
df
[13]:
uuid | created | mse | model | bootstrap | ccp_alpha | criterion | max_depth | max_features | max_leaf_nodes | max_samples | ... | tol | C | cache_size | coef0 | degree | epsilon | gamma | kernel | shrinking | comment |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
d0c38a7cf23141409f272b92fe38fac0 | 2021-03-14 15:55:47 | 0.010484 | RandomForestRegressor | True | 0.0 | mse | NaN | auto | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Best performing experiment |
54c5022230674bc390c18aa2a96a85a9 | 2021-03-14 15:55:47 | 0.042600 | LinearRegression | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | None |
6a1bf08c756b4091b6f27b9598ef2b49 | 2021-03-14 15:55:47 | 0.431766 | Lasso | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0.0001 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | None |
my_uuid | 2021-03-14 15:55:47 | 0.030419 | SVR | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 0.0010 | 1.0 | 200.0 | 0.0 | 3.0 | 0.1 | scale | rbf | True | None |
4 rows × 39 columns
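The NaN cells appear because each model has its own set of parameters, so the flattened table needs one column per parameter name seen in any experiment. The following standard-library sketch illustrates that flattening step on two made-up rows. It is an illustration of the idea, not the tracker's code, and it uses None where pandas would show NaN.

```python
import json

# Illustrative rows: (uuid, parameters stored as a JSON string)
rows = [
    ("exp_a", '{"mse": 0.43, "model": "Lasso", "alpha": 1.0}'),
    ("exp_b", '{"mse": 0.03, "model": "SVR", "C": 1.0, "kernel": "rbf"}'),
]

# Decode each JSON string into a dictionary keyed by experiment ID
parsed = {experiment_id: json.loads(params) for experiment_id, params in rows}

# Union of all parameter names, in first-seen order
columns = []
for params in parsed.values():
    for name in params:
        if name not in columns:
            columns.append(name)

# One flat record per experiment; parameters a model lacks become None
table = {
    experiment_id: {name: params.get(name) for name in columns}
    for experiment_id, params in parsed.items()
}

for experiment_id, record in table.items():
    print(experiment_id, record)
```

With many different model types the flattened table becomes wide and sparse, which is why the output above has 39 columns for only 4 rows.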
[14]:
# delete our example database
from pathlib import Path
Path('my_experiments.db').unlink()