In this section, we'll add visualization to the example-dvc-experiments
project (explored previously). If you would like to
try these steps yourself, refer to the project's README for installation instructions.
A useful plot for showing classification performance is the confusion matrix. To produce it, DVC expects a CSV plots file of the form:
actual,predicted
0,0
0,2
...
To generate this file from the predictions, we added a loop that records the actual and the predicted label of each sample.
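As a minimal sketch of such a loop (not the project's actual code), the function below writes one actual,predicted row per sample. The function name and the sample labels are hypothetical; in the project, the labels would come from the model's predictions.

```python
import csv
import io


def write_confusion_rows(actual_labels, predicted_labels, out_file):
    """Write a header plus one `actual,predicted` row per sample."""
    if len(actual_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["actual", "predicted"])
    for actual, predicted in zip(actual_labels, predicted_labels):
        writer.writerow([int(actual), int(predicted)])


# Hypothetical labels, written to an in-memory buffer for illustration.
buffer = io.StringIO()
write_confusion_rows([0, 0, 1], [0, 2, 1], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write the real file, the same function can be given a handle opened with open("plots/confusion.csv", "w", newline="").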
Running the experiment with dvc exp run will produce plots/confusion.csv.
Use dvc plots show to present it as an HTML file, and open it in the browser:
$ dvc plots show plots/confusion.csv --template confusion \
-x actual -y predicted
file:///.../example-dvc-experiments/plots/confusion.json.html
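The confusion matrix is essentially a count of how often each (actual, predicted) pair occurs in the CSV file. The sketch below reproduces that counting in plain Python to show what the plot summarizes. It is an illustration only, not DVC's implementation, and the function name and sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def count_confusion_pairs(csv_file):
    """Count how often each (actual, predicted) pair occurs in the CSV."""
    reader = csv.DictReader(csv_file)
    return Counter((int(row["actual"]), int(row["predicted"])) for row in reader)


# Hypothetical file contents in the format shown earlier.
sample = io.StringIO("actual,predicted\n0,0\n0,2\n0,0\n1,1\n")
counts = count_confusion_pairs(sample)

# Pairs on the diagonal (actual == predicted) are correct predictions.
for (actual, predicted), count in sorted(counts.items()):
    print(actual, predicted, count)
```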
Let's produce another plot to see misclassified examples from each class. This procedure collects misclassified examples from the validation data and arranges them into a confusion table that shows, for each correct label, a sample that was misclassified. The code that builds this image is omitted here, but you can find it in the example project.
$ dvc plots show plots/misclassified.png
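The selection step behind such an image can be sketched as follows: for every (actual, predicted) pair that represents an error, keep the index of the first sample showing it. This is a hypothetical helper with made-up labels, not the project's code, and it leaves out composing the selected images into one picture.

```python
def first_misclassified(actual_labels, predicted_labels):
    """Map each wrong (actual, predicted) pair to the index of its first sample."""
    examples = {}
    pairs = zip(actual_labels, predicted_labels)
    for index, (actual, predicted) in enumerate(pairs):
        if actual != predicted:
            # setdefault keeps the first index seen for this pair.
            examples.setdefault((actual, predicted), index)
    return examples


# Hypothetical labels for five validation samples.
examples = first_misclassified([0, 0, 1, 2, 0], [0, 2, 1, 1, 2])
print(examples)
```

The resulting indices can then be used to look up the corresponding validation images.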
An important question in deep learning projects is at which epoch the training and validation losses start to differ. DVC helps here through DVCLive, which provides Python integrations with deep learning libraries.
The example project uses Keras to train a classifier, and we have a DVCLive callback that visualizes the training and validation loss for each epoch. We first import the callback from DVCLive.
from dvclive.keras import DvcLiveCallback
Then we add this callback to the fit method call.
model.fit(
    ...
    callbacks=[DvcLiveCallback()],
    ...)
With these two changes, the model metrics are automatically logged to dvclive.json and plotted in training_metrics/index.html.
DVCLive has other capabilities, like saving the model every epoch or modifying these default values.
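Once per-epoch losses are available, the epoch at which the training and validation curves separate can also be located programmatically. The sketch below uses a simple heuristic that is an assumption of this example, not a DVCLive feature: it reports the first epoch where the validation loss exceeds the training loss by more than a tolerance. The function name, the tolerance, and the loss values are hypothetical.

```python
def first_divergence_epoch(train_loss, val_loss, tolerance=0.05):
    """Return the first epoch index where val loss exceeds train loss
    by more than `tolerance`, or None if that never happens."""
    for epoch, (train, val) in enumerate(zip(train_loss, val_loss)):
        if val - train > tolerance:
            return epoch
    return None


# Hypothetical per-epoch losses (epoch indices start at 0).
train_loss = [0.90, 0.60, 0.40, 0.30]
val_loss = [0.92, 0.63, 0.50, 0.55]
epoch = first_divergence_epoch(train_loss, val_loss)
print(epoch)
```

The tolerance is arbitrary and would need tuning for a given model. Inspecting the plotted curves remains the more reliable check.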
In summary, DVC provides more than one option to use visualization in your workflow:
DVC can generate HTML files that include interactive plots from data series in JSON, YAML, CSV, or TSV format.
DVC can keep track of image files produced as plot outputs from the training/evaluation scripts.
DVCLive integrations can produce plots automatically during training.