# Who needs MLflow when you have SQLite?

Eduardo Blancas, Nov 15, 2022 (8 min read)

I spent about six years working as a data scientist and tried several times to use MLflow (and other tools) to track my experiments; however, every time I tried, I abandoned it a few days later. There were a few things I didn't like: having to start a web server just to look at my experiments felt like too much, and I found the query feature extremely limiting (if my experiments are stored in a SQL table, why not let me query them with SQL?).

I also found the experiment comparison features limited. I rarely have a project where a single metric (or a couple of them) is enough to evaluate a model; it's usually a combination of metrics and evaluation plots that I need to look at to assess it. Furthermore, the numbers and plots have little value in isolation; I need to benchmark them against a baseline model, and doing comparisons at that level was pretty slow from the GUI.

So I decided to develop an experiment tracker that satisfies those needs: no server to run, SQL as the query language, flexible comparison capabilities, and no custom code required. This is the first release! Please take it for a spin and share your feedback with us on GitHub or Slack; we want to build the most powerful and flexible experiment tracker!

## Setup

First, let's install the required packages:

```bash
pip install ploomber-engine sklearn-evaluation jupytext --upgrade
```

Now, let's download the sample script we'll use:

```bash
curl -O https://raw.githubusercontent.com/ploomber/posts/master/experiment-tracking/fit.py
```

For this example, we'll train several random forest models with scikit-learn and then use the experiment tracker to evaluate the results. However, keep in mind that the tool is generic enough to be used in other data domains, such as bioinformatics or analytics.

Let's create our grid of parameters:

```python
from ploomber_engine.tracking import track_execution
from sklearn.model_selection import ParameterGrid

grid = ParameterGrid(
    dict(
        n_estimators=[5, 10, 15, 25, 50, 100],
        model_type=["sklearn.ensemble.RandomForestClassifier"],
        max_depth=[5, 10, None],
        criterion=["gini", "entropy", "log_loss"],
    )
)
```

Let's now execute the script multiple times, once per set of parameters, and store the results in the experiments.db SQLite database:

```python
for idx, p in enumerate(grid):
    if (idx + 1) % 10 == 0:
        print(f"Executed {idx + 1}/{len(grid)} so far...")

    track_execution("fit.py", parameters=p, quiet=True, database="experiments.db")
```

Console output:

```
Executed 10/54 so far...
Executed 20/54 so far...
Executed 30/54 so far...
Executed 40/54 so far...
Executed 50/54 so far...
```

Note: we're running a Python script, but Jupyter notebooks (.ipynb) are also supported.
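For context, fit.py is a plain training script: it declares its parameters, fits a model, and computes metrics and plots, with no tracking-specific code (the exact logging rules are described in the next section). The sketch below shows what such a script might look like; it is an illustration under assumptions (synthetic data via make_classification, importlib to resolve model_type, and parameter defaults declared at the top that we assume the tracker overrides at runtime), not the exact contents of the downloaded file:

```python
# Hypothetical sketch of a vanilla training script (not the actual fit.py).
import importlib

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn_evaluation import plot

# default parameters; we assume the tracker replaces these with the values
# passed to track_execution
n_estimators = 5
model_type = "sklearn.ensemble.RandomForestClassifier"
max_depth = 5
criterion = "gini"

# resolve the model class from its dotted path
module_name, _, class_name = model_type.rpartition(".")
Model = getattr(importlib.import_module(module_name), class_name)

X, y = make_classification(n_samples=5_000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = Model(n_estimators=n_estimators, max_depth=max_depth, criterion=criterion)
y_pred = model.fit(X_train, y_train).predict(X_test)

# variables that appear on a line by themselves get logged
accuracy = accuracy_score(y_test, y_pred)
accuracy

f1 = f1_score(y_test, y_pred)
f1

precision = precision_score(y_test, y_pred)
precision

recall = recall_score(y_test, y_pred)
recall

# function calls without an assignment get logged too
plot.confusion_matrix(y_test, y_pred)
```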
If you prefer, you can also execute scripts from the terminal:

```bash
python -m ploomber_engine.tracking fit.py \
    -p n_estimators=100,model_type=sklearn.ensemble.RandomForestClassifier,max_depth=10
```

## Experiment tracking with no extra code

If you look at the script's source code, you'll see that there are no extra imports or calls to log the experiments; it's a vanilla Python script. The tracker runs the code and logs everything that meets any of the following conditions:

1. A line that contains a variable name by itself:

```python
accuracy = accuracy_score(y_test, y_pred)
accuracy
```

2. A line that calls a function without assigning the result:

```python
plot.confusion_matrix(y_test, y_pred)
```

Note that logging happens in groups (not line by line). A group is a contiguous set of lines separated from other groups by blank lines:

```python
# first group
accuracy = accuracy_score(y_test, y_pred)
accuracy

# second group
plot.confusion_matrix(y_test, y_pred)
```

Regular Python objects are supported (numbers, strings, lists, dictionaries), and any object with a rich representation will also work (e.g., matplotlib charts).

## Querying experiments with SQL

Once the experiments finish executing, we can load our database (experiments.db) and explore the results:

```python
from sklearn_evaluation import SQLiteTracker

tracker = SQLiteTracker("experiments.db")
```

Since this is a SQLite database, we can use SQL to query our data via the .query method. By default, we get a pandas.DataFrame:

```python
df = tracker.query(
    """
    SELECT *
    FROM experiments
    LIMIT 5
    """
)

type(df)
```

Console output:

```
pandas.core.frame.DataFrame
```

```python
df.head(3)
```

Console output:

| uuid     | created             | parameters                                         | comment |
|----------|---------------------|----------------------------------------------------|---------|
| 15b173ac | 2022-11-15 20:09:15 | {"criterion": "gini", "max_depth": 5, "model_t...  | None    |
| f30a65f0 | 2022-11-15 20:09:17 | {"criterion": "gini", "max_depth": 5, "model_t...  | None    |
| 85684951 | 2022-11-15 20:09:18 | {"criterion": "gini", "max_depth": 5, "model_t...  | None    |

The logged parameters are stored in the parameters column.
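Because .query returns a plain pandas.DataFrame by default, the usual pandas tooling is available for post-processing. Here's a small sketch (the output filename is only an example):

```python
# pull the raw experiments table as a DataFrame
df = tracker.query("SELECT uuid, created, parameters FROM experiments")

print(f"{len(df)} experiments logged")

# regular pandas operations apply: sort, filter, export, etc.
latest = df.sort_values("created", ascending=False).head(5)
latest.to_csv("latest-experiments.csv")
```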
The parameters column holds a JSON object; we can inspect its keys with the .get_parameters_keys() method:

```python
tracker.get_parameters_keys()
```

Console output:

```
['accuracy', 'classification_report', 'confusion_matrix', 'criterion', 'f1',
 'max_depth', 'model_type', 'n_estimators', 'precision', 'precision_recall',
 'recall']
```

To get a sample query, we can use the .get_sample_query() method:

```python
print(tracker.get_sample_query())
```

Console output:

```sql
SELECT
    uuid,
    json_extract(parameters, '$.accuracy') as accuracy,
    json_extract(parameters, '$.classification_report') as classification_report,
    json_extract(parameters, '$.confusion_matrix') as confusion_matrix,
    json_extract(parameters, '$.criterion') as criterion,
    json_extract(parameters, '$.f1') as f1,
    json_extract(parameters, '$.max_depth') as max_depth,
    json_extract(parameters, '$.model_type') as model_type,
    json_extract(parameters, '$.n_estimators') as n_estimators,
    json_extract(parameters, '$.precision') as precision,
    json_extract(parameters, '$.precision_recall') as precision_recall,
    json_extract(parameters, '$.recall') as recall
FROM experiments
LIMIT 10
```

Let's run a simple query that extracts our metrics, sorts by F1 score, and returns the top three experiments:

```python
tracker.query(
    """
    SELECT
        uuid,
        json_extract(parameters, '$.f1') as f1,
        json_extract(parameters, '$.accuracy') as accuracy,
        json_extract(parameters, '$.criterion') as criterion,
        json_extract(parameters, '$.n_estimators') as n_estimators,
        json_extract(parameters, '$.precision') as precision,
        json_extract(parameters, '$.recall') as recall
    FROM experiments
    ORDER BY f1 DESC
    LIMIT 3
    """,
    as_frame=False,
    render_plots=False,
)
```

Console output:

| uuid     | f1       | accuracy | criterion | n_estimators | precision | recall   |
|----------|----------|----------|-----------|--------------|-----------|----------|
| efa894dd | 0.806188 | 0.798788 | gini      | 100          | 0.774103  | 0.841048 |
| ad8fb523 | 0.804184 | 0.795758 | entropy   | 100          | 0.768889  | 0.842875 |
| 93154ceb | 0.800583 | 0.792727 | log_loss  | 100          | 0.767897  | 0.836175 |

## Rendering plots

Plots are essential for evaluating our analysis, and the experiment tracker can log them automatically. Let's pull the top two experiments (by F1 score) and get their confusion matrices, precision-recall curves, and classification reports. Note that we pass as_frame=False and render_plots=True to the .query method:

```python
top_two = tracker.query(
    """
    SELECT
        uuid,
        json_extract(parameters, '$.f1') as f1,
        json_extract(parameters, '$.confusion_matrix') as confusion_matrix,
        json_extract(parameters, '$.precision_recall') as precision_recall,
        json_extract(parameters, '$.classification_report') as classification_report
    FROM experiments
    ORDER BY f1 DESC
    LIMIT 2
    """,
    as_frame=False,
    render_plots=True,
)

top_two
```

Console output:

| uuid     | f1       | confusion_matrix | precision_recall | classification_report |
|----------|----------|------------------|------------------|------------------------|
| efa894dd | 0.806188 | [rendered plot]  | [rendered plot]  | [rendered plot]        |
| ad8fb523 | 0.804184 | [rendered plot]  | [rendered plot]  | [rendered plot]        |

## Comparing plots

If you want to zoom into a couple of experiment plots, you can select one and create a tab view. First, let's retrieve the confusion matrices from the top two experiments:

```python
top_two.get("confusion_matrix")
```

Console output: a tab view with the confusion matrices for experiments efa894dd and ad8fb523.

The buttons in the top bar let us switch between the two plots of the selected experiments. We can do the same with the precision-recall curve:

```python
top_two.get("precision_recall")
```

Console output: a tab view with the precision-recall curves for experiments efa894dd and ad8fb523.
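This workflow combines nicely with SQL filters. For instance, if you only want to compare the best runs that used a particular criterion, you can restrict the query before rendering the plots. A sketch (the filter value is just an example):

```python
# compare the two best entropy-based models
best_entropy = tracker.query(
    """
    SELECT
        uuid,
        json_extract(parameters, '$.f1') as f1,
        json_extract(parameters, '$.confusion_matrix') as confusion_matrix
    FROM experiments
    WHERE json_extract(parameters, '$.criterion') = 'entropy'
    ORDER BY f1 DESC
    LIMIT 2
    """,
    as_frame=False,
    render_plots=True,
)

best_entropy.get("confusion_matrix")
```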
Note that the tabs take their labels from the first column in our query (in our case, the experiment ID). To label them with the F1 score instead, pass the index_by argument:

```python
top_two.get("classification_report", index_by="f1")
```

Console output: a tab view with the classification reports, with tabs labeled 0.806188 and 0.804184.

## Aggregation and plotting

Using SQL gives us the power and flexibility to answer sophisticated questions about our experiments. For example, let's say we want to know the effect of the number of trees (n_estimators) on our metrics. We can quickly write a SQL query that aggregates by n_estimators and computes the mean of each metric; then, we can plot the values.

First, let's run the query. Note that we're passing as_frame=True so we get a pandas.DataFrame we can manipulate:

```python
df = tracker.query(
    """
    SELECT
        json_extract(parameters, '$.n_estimators') as n_estimators,
        AVG(json_extract(parameters, '$.accuracy')) as accuracy,
        AVG(json_extract(parameters, '$.precision')) as precision,
        AVG(json_extract(parameters, '$.recall')) as recall,
        AVG(json_extract(parameters, '$.f1')) as f1
    FROM experiments
    GROUP BY n_estimators
    """,
    as_frame=True,
).set_index("n_estimators")

df
```

Console output:

| n_estimators | accuracy | precision | recall   | f1       |
|--------------|----------|-----------|----------|----------|
| 5            | 0.736296 | 0.714476  | 0.785966 | 0.747860 |
| 10           | 0.751178 | 0.731587  | 0.798011 | 0.761641 |
| 15           | 0.758653 | 0.726269  | 0.831303 | 0.774417 |
| 25           | 0.765354 | 0.730959  | 0.839965 | 0.781178 |
| 50           | 0.770168 | 0.735536  | 0.844296 | 0.785606 |
| 100          | 0.773805 | 0.737420  | 0.850386 | 0.789462 |

Now, let's create the plot:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

for metric in ["accuracy", "f1", "recall", "precision"]:
    df[metric].plot(ax=ax, marker="o", style="--")

ax.legend()
ax.grid()
ax.set_title("Effect of increasing n_estimators")
ax.set_ylabel("Metric")
_ = ax.set_xticks(df.index)
```

Console output: [line plot showing accuracy, f1, recall, and precision as n_estimators increases]

We can see that increasing n_estimators drives our metrics up; however, after 25 estimators, the improvements become increasingly small. Given that a larger n_estimators increases the runtime of each experiment, we can use this plot to inform our next experiments: cap n_estimators and focus on other parameters.
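The same GROUP BY pattern works for any other parameter in the grid. As a quick sketch, here's how we might aggregate by max_depth instead (how the None value in our grid is serialized in the JSON is an assumption here; it may show up as a NULL group):

```python
# average metrics per max_depth value (None/NULL handling is an assumption)
df_depth = tracker.query(
    """
    SELECT
        json_extract(parameters, '$.max_depth') as max_depth,
        AVG(json_extract(parameters, '$.accuracy')) as accuracy,
        AVG(json_extract(parameters, '$.f1')) as f1
    FROM experiments
    GROUP BY max_depth
    """,
    as_frame=True,
)

df_depth
```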
## Closing remarks

In this blog post, we introduced a new experiment tracker that does not require a server running in the background or any extra code, and that gives you the power of SQL to query and compare your experiments. Let us know what you think by opening an issue on the GitHub repository or sending us a message on Slack!