DVC makes it easy to track metrics, update parameters, and visualize performance with plots. These concepts are introduced below.
All of the above can be combined into experiments to run and compare many iterations of your ML project.
First, let's see the mechanism DVC provides to capture values for these ML attributes. Let's add a final evaluation stage to our pipeline from before:
$ dvc run -n evaluate \
          -d src/evaluate.py -d model.pkl -d data/features \
          -M scores.json \
          --plots-no-cache prc.json \
          --plots-no-cache roc.json \
          python src/evaluate.py model.pkl \
                 data/features scores.json prc.json roc.json
The -M option here specifies a metrics file, while --plots-no-cache specifies a plots file (produced by this stage) which will not be cached by DVC. dvc run generates a new stage in the dvc.yaml file:
evaluate:
  cmd: python src/evaluate.py model.pkl data/features ...
  deps:
    - data/features
    - model.pkl
    - src/evaluate.py
  metrics:
    - scores.json:
        cache: false
  plots:
    - prc.json:
        cache: false
    - roc.json:
        cache: false
The biggest difference from previous stages in our pipeline is in two new sections: metrics and plots. These are used to mark certain files containing ML "telemetry". Metrics files contain scalar values (e.g. AUC) and plots files contain matrices and data series (e.g. ROC curves or model loss plots) meant to be visualized and compared.
With cache: false, DVC skips caching the output, as we want scores.json, prc.json, and roc.json to be versioned by Git.
evaluate.py writes the model's ROC-AUC and average precision to scores.json, which in turn is marked as a metrics file with -M. Its contents are:
{ "avg_prec": 0.5204838673030754, "roc_auc": 0.9032012604172255 }
evaluate.py also writes precision, recall, and thresholds arrays (obtained using precision_recall_curve) into the plots file prc.json:
{
    "prc": [
        { "precision": 0.021473008227975116, "recall": 1.0, "threshold": 0.0 },
        ...,
        { "precision": 1.0, "recall": 0.009345794392523364, "threshold": 0.6 }
    ]
}
Similarly, it writes arrays for the roc_curve into roc.json for an additional plot.
DVC doesn't force you to use any specific file names, nor does it enforce the format or structure of metrics or plots files. It's completely user/case-defined. Refer to dvc metrics and dvc plots for more details.
You can view tracked metrics and plots with DVC. Let's start with the metrics:
$ dvc metrics show
Path         avg_prec    roc_auc
scores.json  0.52048     0.9032
To view plots, first specify which arrays to use as the plot axes. We only need to do this once, and DVC will save our plot configurations.
$ dvc plots modify prc.json -x recall -y precision
Modifying stage 'evaluate' in 'dvc.yaml'
$ dvc plots modify roc.json -x fpr -y tpr
Modifying stage 'evaluate' in 'dvc.yaml'
Now let's view the plots:
$ dvc plots show
file:///Users/dvc/example-get-started/plots.html
Let's save this iteration, so we can compare it later:
$ git add scores.json prc.json roc.json
$ git commit -a -m "Create evaluation stage"
Later we will see how to compare and visualize different pipeline iterations. For now, let's see how we can capture another important piece of information which will be useful for comparison: parameters.
It's pretty common for data science pipelines to include configuration files that define adjustable parameters to train a model, do pre-processing, etc. DVC provides a mechanism for stages to depend on the values of specific sections of such a config file (YAML, JSON, TOML, and Python formats are supported).
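For example, parameters can be read from a custom file by prefixing each parameter name with the file name. The myparams.toml below is hypothetical and not part of this project:

$ dvc run -n train \
          -p myparams.toml:train.n_est,train.min_split \
          -d src/train.py -d data/features \
          -o model.pkl \
          python src/train.py data/features model.pkl

DVC would then track train.n_est and train.min_split from myparams.toml instead of the default params.yaml.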
Luckily, we should already have a stage with parameters in dvc.yaml:
featurize:
  cmd: python src/featurization.py data/prepared data/features
  deps:
    - data/prepared
    - src/featurization.py
  params:
    - featurize.max_features
    - featurize.ngrams
  outs:
    - data/features
The featurize stage was created with this dvc run command. Notice the argument sent to the -p option (short for --params):
$ dvc run -n featurize \
          -p featurize.max_features,featurize.ngrams \
          -d src/featurization.py -d data/prepared \
          -o data/features \
          python src/featurization.py data/prepared data/features
The params section defines the parameter dependencies of the featurize stage. By default, DVC reads those values (featurize.max_features and featurize.ngrams) from a params.yaml file. But as with metrics and plots, parameter file names and structure can also be user- and case-defined. Here are the contents of our params.yaml file:
prepare:
  split: 0.20
  seed: 20170428

featurize:
  max_features: 500
  ngrams: 1

train:
  seed: 20170428
  n_est: 50
  min_split: 2
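Inside a stage script, these values can be read with any YAML parser. Here's a minimal sketch of how featurization.py might load its section (the real script may differ in its details):

import yaml

# Read only the "featurize" section of params.yaml.
with open("params.yaml") as fd:
    params = yaml.safe_load(fd)["featurize"]

max_features = params["max_features"]  # 500
ngrams = params["ngrams"]              # 1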
We are definitely not happy with the AUC value we got so far! Let's edit the params.yaml file to use bigrams and increase the number of features:
 featurize:
-  max_features: 500
-  ngrams: 1
+  max_features: 1500
+  ngrams: 2
The beauty of dvc.yaml is that all you need to do now is run:
$ dvc repro
It'll analyze the changes, use existing results from the run-cache, and execute only the commands needed to produce new results (model, metrics, plots).
The same logic applies to other possible adjustments (editing source code, updating datasets): you make the changes, run dvc repro, and DVC executes only what needs to be run.
Finally, let's see how the updates improved performance. DVC has a few commands to inspect and visualize changes in metrics, parameters, and plots. These commands work for a single iteration or across multiple pipeline iterations. Let's compare the current "bigrams" run with the last committed "baseline" iteration:
$ dvc params diff
Path         Param                   HEAD  workspace
params.yaml  featurize.max_features  500   1500
params.yaml  featurize.ngrams        1     2
dvc params diff can show how params in the workspace differ vs. the last commit. dvc metrics diff does the same for metrics:
$ dvc metrics diff
Path         Metric    HEAD     workspace  Change
scores.json  avg_prec  0.52048  0.55259    0.03211
scores.json  roc_auc   0.9032   0.91536    0.01216
And finally, we can compare both precision-recall and ROC curves with a single command!
$ dvc plots diff
file:///Users/dvc/example-get-started/plots.html
See dvc plots diff for more info on its options.
All these commands also accept Git revisions (commits, tags, branch names) to compare.
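For example, if we had tagged the baseline commit as baseline (a hypothetical tag name), we could compare it against the workspace explicitly:

$ dvc params diff baseline
$ dvc metrics diff baseline
$ dvc plots diff baseline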
On the next page, you can learn advanced ways to track, organize, and compare more experiment iterations.