We explain how DVC codifies and executes experiments, setting their parameters, using multiple jobs to run them in parallel, and running them in queues, among other details.
📖 If this is your first introduction to data science experimentation, you may want to check the basics in Get Started: Experiments first.
DVC relies on pipelines that codify experiment workflows (code,
stages, parameters, outputs, etc.) in a
dvc.yaml file, which contains the commands to run the experiments.
You can run the pipeline using default settings with
dvc exp run:
$ dvc exp run
DVC keeps track of the dependency graph and runs only the stages with changed dependencies or missing outputs.
Example: for a pipeline composed of prepare, train, and evaluate stages, if a dependency of the prepare stage has changed, the downstream stages (train and evaluate) are also run.
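For illustration, such a pipeline could be codified in dvc.yaml roughly like this (the commands, dependencies, and outputs below are hypothetical placeholders, not part of any real project):

```yaml
stages:
  prepare:
    cmd: python prepare.py     # hypothetical command
    deps:
      - data/raw
    outs:
      - data/prepared
  train:
    cmd: python train.py
    deps:
      - data/prepared
    outs:
      - model.pkl
  evaluate:
    cmd: python evaluate.py
    deps:
      - model.pkl
    outs:
      - metrics.json
```

With this layout, changing data/raw (a dependency of prepare) would cause all three stages to run again, since train and evaluate sit downstream of prepare in the dependency graph.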
By default DVC uses
./dvc.yaml (in the current directory). You can specify
dvc.yaml files in other directories, or even specific stages to run. These are
given as the last argument to
dvc exp run. Examples:
$ dvc exp run my-project/dvc.yaml          # a specific dvc.yaml file
$ dvc exp run extract                      # a specific stage (from `./dvc.yaml`)
$ dvc exp run my-project/dvc.yaml:extract  # a stage from a specific dvc.yaml file
📖 See reproduction targets for all the details.
In some cases you may need to run a stage without invoking its dependents. The
--single-stage (-s) flag runs the command of a single stage.
Example: for a pipeline composed of prepare, train, and evaluate stages, if you only want to run the
train stage to check its outputs, you can do so with:
$ dvc exp run --single-stage train
DVC projects support more than a single pipeline in one or more
dvc.yaml files. In this case, you can run all pipelines with a single command:
$ dvc exp run --all-pipelines
Note that the order in which pipelines are executed is not guaranteed; only the internal order of stage execution is.
When you want more granular control over which stages are run, you can use the
--interactive option. This flag lets you confirm each stage before it runs:
$ dvc exp run --interactive
Going to reproduce stage: 'train'... continue? [y/n]
Parameters are the values that modify the underlying code's behavior, producing different experiment results. Machine learning experimentation, for example, involves searching hyperparameters that improve the resulting model metrics.
In DVC projects, parameters should be read by the code from parameter files
(params.yaml by default). DVC parses these files to track individual param
values. When a tracked param is changed,
dvc exp run invalidates any stages
that depend on it, and reruns the experiment.
For a params file named
params.yaml with the contents:

model:
  learning_rate: 0.0001
You can specify the parameter dependency as:

$ dvc stage add -n train \
      --parameter model.learning_rate \
      --outs ...
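DVC tracks the param value but doesn't pass it to your program; the stage's code is expected to load it itself. A minimal sketch of that in Python, assuming the PyYAML package and the params.yaml contents shown above (embedded here as a string so the snippet is self-contained):

```python
import yaml  # PyYAML, a common choice for reading params.yaml

# Contents of the params.yaml example above:
PARAMS = """\
model:
  learning_rate: 0.0001
"""

params = yaml.safe_load(PARAMS)
learning_rate = params["model"]["learning_rate"]
print(learning_rate)  # 0.0001
```

In a real stage you would read the file instead, e.g. `yaml.safe_load(open("params.yaml"))`, so that the values DVC tracks are the same ones your code uses.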
⚠️ DVC does not check whether the parameters are actually used in your code.
DVC allows updating parameters from the command line when running experiments. The
--set-param (-S) option takes a parameter name and
its value, and updates the params file before the run.
$ dvc exp run --set-param model.learning_rate=0.0002
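Assuming the params.yaml shown earlier, after this command the file would contain the updated value (DVC rewrites it before the run):

```yaml
model:
  learning_rate: 0.0002
```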
To set more than one param for the same experiment, use the
-S option multiple times:
$ dvc exp run -S learning_rate=0.001 -S units=128
⚠️ Note that DVC doesn't check whether parameters given to
--set-param are already in the parameters file. If there is a typo, a new or different param will be added/changed.
The --queue option of
dvc exp run tells DVC to append an experiment for
later execution. Nothing is actually run yet:
$ dvc exp run --queue -S units=10
Queued experiment '1cac8ca' for future execution.
$ dvc exp run --queue -S units=64
Queued experiment '23660bb' for future execution.
$ dvc exp run --queue -S units=128
Queued experiment '3591a5c' for future execution.
$ dvc exp run --queue -S units=256
Queued experiment '4109ead' for future execution.
Each experiment is derived from the workspace at the time it's queued. If you make changes in the workspace afterwards, they won't be reflected in queued experiments (once run).
Run them all one-by-one with the
--run-all flag. The order of execution is
independent of their creation order.
$ dvc exp run --run-all
To remove all experiments from the queue and start over, you can use
dvc exp remove --queue.
DVC allows running queued experiments in parallel by specifying a number of
execution processes (--jobs):
$ dvc exp run --run-all --jobs 4
Note that since each experiment runs in an independent temporary directory, common stages may sometimes be executed several times depending on the state of the run-cache at that time.
⚠️ Parallel runs are experimental and may be unstable at this time.
⚠️ Make sure you're using a number of jobs that your environment can handle (no more than the CPU cores).
To track successive steps in a longer or deeper experiment, you can register checkpoints from your code.
📖 See Checkpoints to learn about this feature.
Running experiments containing checkpoints is no different from running regular ones, e.g.:
$ dvc exp run -S param=value
All checkpoints registered at runtime will be preserved, even if the process
gets interrupted (e.g. with
Ctrl+C, or by an error). Without interruption, a
"wrap-up" checkpoint will be added (if needed), so that changes to pipeline
outputs don't remain in the workspace.
Subsequent uses of
dvc exp run will continue from the latest checkpoint (using
the latest cached versions of all outputs). To resume from a previous checkpoint
(list them with
dvc exp show), you must first apply it with
dvc exp apply before the next
dvc exp run. For
--temp runs, use
--rev to specify the
checkpoint to continue from.
Use --reset to start over (discards previous checkpoints and
their outputs). This is useful for re-training ML models, for example.
Note that queuing an experiment that uses checkpoints implies
--reset, unless a
--rev is provided (refer to the previous section).