
Get Started: Experiments

⚠️ This feature is only available in DVC 2.0 ⚠️

Experiments proliferate quickly in ML projects where there are many parameters to tune or other permutations of the code. We can organize such projects and only keep what we ultimately need with dvc experiments. DVC can track experiments for you so there's no need to commit each one to Git. This way your repo doesn't become polluted with all of them. You can discard experiments once they're no longer needed.

📖 See Experiment Management for more information on DVC's approach.

Running experiments

On the previous page, we learned how to tune ML pipelines and compare the changes. Let's further increase the number of features in the featurize stage to see how it compares.

dvc exp run makes it easy to change hyperparameters and run a new experiment:

$ dvc exp run --set-param featurize.max_features=3000

dvc exp run is similar to dvc repro but with some added conveniences for running experiments. The --set-param (or -S) flag sets the values for parameters as a shortcut to editing params.yaml.

Check that the featurize.max_features value has been updated in params.yaml:

 featurize:
-  max_features: 1500
+  max_features: 3000

Any edits to dependencies (parameters or source code) will be reflected in the experiment run.
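
One way to confirm the update from the command line is to read the value back out of params.yaml. The sketch below uses a hypothetical helper, param_value, which is not part of DVC. It assumes the simple two-level layout shown above (a top-level section with indented key: value pairs) and is not a general YAML parser.

```shell
# Hypothetical helper (not a DVC command): print the value of `key` under
# the top-level `section` of a simple two-level YAML file.
#   $1 = file, $2 = section (e.g. featurize), $3 = key (e.g. max_features)
param_value() {
  awk -v section="$2" -v key="$3" '
    # A line that does not start with a space or "#" opens a new section.
    /^[^ #]/ { in_section = ($1 == (section ":")) }
    # Inside the wanted section, print the value of the wanted key.
    in_section && $1 == (key ":") { print $2; exit }
  ' "$1"
}

# Usage, from the project root:
#   param_value params.yaml featurize max_features
```

For anything beyond a quick check, a Git diff of params.yaml or a real YAML parser is the safer choice.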

dvc exp diff compares experiments:

$ dvc exp diff
Path         Metric    Value    Change
scores.json  avg_prec  0.56191  0.009322
scores.json  roc_auc   0.93345  0.018087

Path         Param                   Value    Change
params.yaml  featurize.max_features  3000     1500
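
To read this output, it helps to assume that the Change column is the experiment's value minus the baseline value. Under that assumption, the baseline can be recovered by subtraction, and the results agree with the master row in the experiments table later on this page. The helper name baseline_from_diff is hypothetical and only illustrates the arithmetic.

```shell
# Recover the baseline from a `dvc exp diff` row, assuming
# change = value - baseline, i.e. baseline = value - change.
#   $1 = Value column, $2 = Change column
baseline_from_diff() {
  LC_ALL=C awk -v value="$1" -v change="$2" \
    'BEGIN { printf "%.5f\n", value - change }'
}

baseline_from_diff 0.56191 0.009322   # avg_prec -> 0.55259
baseline_from_diff 0.93345 0.018087   # roc_auc  -> 0.91536
```

The same reading applies to the parameter row: a value of 3000 with a change of 1500 implies a baseline of 1500.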

Queueing experiments

So far, we have been tuning the featurize stage, but there are also parameters for the train stage, which trains a random forest classifier.

These are the train parameters in params.yaml:

train:
  seed: 20170428
  n_est: 50
  min_split: 2

Let's set up experiments with different hyperparameters. We can define all the combinations we want to try without executing anything, by using the --queue flag:

$ dvc exp run --queue -S train.min_split=8
Queued experiment 'd3f6d1e' for future execution.
$ dvc exp run --queue -S train.min_split=64
Queued experiment 'f1810e0' for future execution.
$ dvc exp run --queue -S train.min_split=2 -S train.n_est=100
Queued experiment '7323ea2' for future execution.
$ dvc exp run --queue -S train.min_split=8 -S train.n_est=100
Queued experiment 'c605382' for future execution.
$ dvc exp run --queue -S train.min_split=64 -S train.n_est=100
Queued experiment '0cdee86' for future execution.
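
Typing each combination by hand gets tedious as the grid grows. As a rough sketch, a shell loop can generate the same five commands. The function name queue_grid is hypothetical, and the loop passes train.n_est explicitly even when it equals the default of 50, which should be equivalent to omitting it. The function only prints the commands, so they can be reviewed before anything is queued.

```shell
# Print one `dvc exp run --queue` command per hyperparameter combination.
# The value lists mirror the combinations queued above; the baseline
# (n_est=50, min_split=2) is skipped because it is already in params.yaml.
queue_grid() {
  for n_est in 50 100; do
    for min_split in 2 8 64; do
      if [ "$n_est" = 50 ] && [ "$min_split" = 2 ]; then
        continue
      fi
      echo "dvc exp run --queue -S train.min_split=$min_split -S train.n_est=$n_est"
    done
  done
}

# Dry run: print the generated commands for review.
queue_grid

# To actually queue them, pipe the output to the shell:
#   queue_grid | sh
```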

Next, run all queued experiments using --run-all (and in parallel with --jobs):

$ dvc exp run --run-all --jobs 2

Comparing many experiments

To compare all of these experiments, we need more than diff. dvc exp show compares any number of experiments in one table:

$ dvc exp show --no-timestamp \
               --include-params train.n_est,train.min_split
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Experiment    ┃ avg_prec ┃ roc_auc ┃ train.n_est┃ train.min_split ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ workspace     │  0.56191 │ 0.93345 │ 50         │ 2               │
│ master        │  0.55259 │ 0.91536 │ 50         │ 2               │
│ ├── exp-bfe64 │  0.57833 │ 0.95555 │ 50         │ 8               │
│ ├── exp-b8082 │  0.59806 │ 0.95287 │ 50         │ 64              │
│ ├── exp-c7250 │  0.58876 │ 0.94524 │ 100        │ 2               │
│ ├── exp-b9cd4 │  0.57953 │ 0.95732 │ 100        │ 8               │
│ ├── exp-98a96 │  0.60405 │  0.9608 │ 100        │ 64              │
│ └── exp-ad5b1 │  0.56191 │ 0.93345 │ 50         │ 2               │
└───────────────┴──────────┴─────────┴────────────┴─────────────────┘

Each experiment is given an arbitrary name by default (although we can specify one with dvc exp run -n). We can see that exp-98a96 performed best on both of our metrics, with 100 estimators and a minimum of 64 samples to split a node.
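
With more rows than this, picking the winner by eye becomes error-prone. The sketch below ranks the experiments with standard tools, using the names and metric values copied from the table above. The function name best_by_column is hypothetical, and the inline data stands in for real output.

```shell
# Print the experiment with the highest value in the given column.
#   $1 = 2 for avg_prec, 3 for roc_auc
# Rows are copied from the experiments table: name, avg_prec, roc_auc.
best_by_column() {
  LC_ALL=C sort -k"$1","$1"nr <<'EOF' | head -n 1 | cut -d' ' -f1
exp-bfe64 0.57833 0.95555
exp-b8082 0.59806 0.95287
exp-c7250 0.58876 0.94524
exp-b9cd4 0.57953 0.95732
exp-98a96 0.60405 0.9608
exp-ad5b1 0.56191 0.93345
EOF
}

best_by_column 2   # best avg_prec -> exp-98a96
best_by_column 3   # best roc_auc  -> exp-98a96
```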

See dvc exp show --help for more info on its options.

Persisting experiments

Now that we know the best parameters, let's keep that experiment and ignore the rest.

dvc exp apply rolls back the workspace to the specified experiment:

$ dvc exp apply exp-98a96
Changes for experiment 'exp-98a96' have been applied to your workspace.

dvc exp apply is similar to dvc checkout but it works with experiments. DVC tracks everything in the pipeline for each experiment (parameters, metrics, dependencies, and outputs) and can later retrieve it as needed.

Check that scores.json reflects the metrics in the table above:

{ "avg_prec": 0.6040544652105823, "roc_auc": 0.9608017142900953 }

Once an experiment has been applied to the workspace, it is no different from reproducing the result without dvc exp run. Let's make it persistent in our regular pipeline by committing it in our Git branch:

$ git add dvc.lock params.yaml prc.json roc.json scores.json
$ git commit -a -m "Preserve best random forest experiment"

dvc push only uploads persistent experiments that have been committed to Git. The other experiments will not be pushed to the remote. See dvc exp push and dvc exp pull for how to share other experiments.

Cleaning up

After committing the best experiment to Git, let's take another look at the experiments table:

$ dvc exp show --no-timestamp \
               --include-params train.n_est,train.min_split
┏━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Experiment ┃ avg_prec ┃ roc_auc ┃ train.n_est┃ train.min_split ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ workspace  │  0.60405 │  0.9608 │ 100        │ 64              │
│ master     │  0.60405 │  0.9608 │ 100        │ 64              │
└────────────┴──────────┴─────────┴────────────┴─────────────────┘

Where did all the experiments go? By default, dvc exp show only shows experiments since the last commit, but don't worry. The experiments remain cached and can be shown or applied. For example, use -n to show experiments from the previous n commits:

$ dvc exp show -n 2 --no-timestamp \
                    --include-params train.n_est,train.min_split
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Experiment    ┃ avg_prec ┃ roc_auc ┃ train.n_est┃ train.min_split ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ workspace     │  0.60405 │  0.9608 │ 100        │ 64              │
│ master        │  0.60405 │  0.9608 │ 100        │ 64              │
│ 64d74b2       │  0.55259 │ 0.91536 │ 50         │ 2               │
│ ├── exp-bfe64 │  0.57833 │ 0.95555 │ 50         │ 8               │
│ ├── exp-b8082 │  0.59806 │ 0.95287 │ 50         │ 64              │
│ ├── exp-c7250 │  0.58876 │ 0.94524 │ 100        │ 2               │
│ ├── exp-98a96 │  0.60405 │  0.9608 │ 100        │ 64              │
│ ├── exp-b9cd4 │  0.57953 │ 0.95732 │ 100        │ 8               │
│ └── exp-ad5b1 │  0.56191 │ 0.93345 │ 50         │ 2               │
└───────────────┴──────────┴─────────┴────────────┴─────────────────┘

Eventually, old experiments may clutter the experiments table.

dvc exp gc removes all references to old experiments:

$ dvc exp gc --workspace
$ dvc exp show -n 2 --no-timestamp \
                    --include-params train.n_est,train.min_split
┏━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Experiment ┃ avg_prec ┃ roc_auc ┃ train.n_est┃ train.min_split ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ workspace  │  0.60405 │  0.9608 │ 100        │ 64              │
│ master     │  0.60405 │  0.9608 │ 100        │ 64              │
│ 64d74b2    │  0.55259 │ 0.91536 │ 50         │ 2               │
└────────────┴──────────┴─────────┴────────────┴─────────────────┘

dvc exp gc only removes references to the experiments, not the cached objects associated with them. To clean up the cache, use dvc gc.
