plots

A set of commands to visualize and compare plot metrics in structured files (JSON, YAML, CSV, or TSV): show, diff, and modify.

Synopsis

usage: dvc plots [-h] [-q | -v] {show,diff,modify} ...

positional arguments:
  COMMAND
    show         Generate plot from a metrics file.
    diff         Plot differences in metrics between commits.
    modify       Modify plot properties associated with a target file.

Types of metrics

DVC has two concepts for metrics, which represent different results of machine learning training or data processing:

  1. dvc metrics represent scalar numbers such as AUC, true positive rate, etc.
  2. dvc plots can be used to visualize data series such as AUC curves, loss functions, confusion matrices, etc.

Description

DVC provides a set of commands to visualize certain metrics of machine learning experiments as plots. Typical examples include AUC curves, loss functions, and confusion matrices.

These metrics files are created by users or generated by user data processing code, and can optionally be defined in dvc.yaml (plots field) for tracking.

DVC generates plots as HTML files that can be opened with a web browser. These HTML files use Vega-Lite, a declarative grammar for defining plots using JSON. The plots can also be saved as SVG or PNG image files from the browser.

In contrast to dvc metrics, these metrics should be stored as data series. Unlike dvc metrics diff, dvc plots diff cannot calculate numeric differences between the metrics in different experiments.

Supported file formats

Plot metrics can be organized as data series in JSON, YAML 1.2, CSV, or TSV files. DVC expects to see an array (or multiple arrays) of objects (usually float numbers) in the file.

In tabular file formats such as CSV and TSV, each column is an array. dvc plots subcommands can produce plots for a specified column or a set of them. For example, epoch, AUC, and loss are the column names below:

epoch, AUC, loss
34, 0.91935, 0.0317345
35, 0.91913, 0.0317829
36, 0.92256, 0.0304632
37, 0.92302, 0.0299015
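As a minimal sketch of how user code might produce such a tabular file, the following uses Python's standard csv module. The rows list and the to_csv helper are hypothetical names chosen for illustration; only the column names and values come from the table above.

```python
import csv
import io

# Hypothetical per-epoch results; the values mirror the table above.
rows = [
    {"epoch": 34, "AUC": 0.91935, "loss": 0.0317345},
    {"epoch": 35, "AUC": 0.91913, "loss": 0.0317829},
    {"epoch": 36, "AUC": 0.92256, "loss": 0.0304632},
    {"epoch": 37, "AUC": 0.92302, "loss": 0.0299015},
]


def to_csv(records):
    """Serialize a list of dicts to CSV text; the header row holds the column names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)
# A training script would typically write csv_text to a file such as logs.csv.
```

Each key of the dictionaries becomes a column, so any of them can later be selected as a plot axis.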

Hierarchical file formats such as JSON and YAML consist of an array of consistent objects (sharing a common structure). All objects should contain the fields used for the X and Y axes of the plot (see DVC template anchors); extra elements are silently ignored.

dvc plots subcommands can produce plots for a specified field or a set of them, from the array's objects. For example, val_loss is one of the field names in the train array below:

{
  "train": [
    {"val_accuracy": 0.9665, "val_loss": 0.10757},
    {"val_accuracy": 0.9764, "val_loss": 0.07324},
    {"val_accuracy": 0.8770, "val_loss": 0.08136},
    {"val_accuracy": 0.8740, "val_loss": 0.09026},
    {"val_accuracy": 0.8795, "val_loss": 0.07640},
    {"val_accuracy": 0.8803, "val_loss": 0.07608},
    {"val_accuracy": 0.8987, "val_loss": 0.08455}
  ]
}
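A comparable sketch for the hierarchical case, assuming the training code collects one (accuracy, loss) pair per epoch. The history list is a hypothetical stand-in for real training output; the field names follow the example above.

```python
import json

# Hypothetical per-epoch validation results collected during training.
history = [
    (0.9665, 0.10757),
    (0.9764, 0.07324),
]

# Every object shares the same fields, so any of them can serve as a plot axis.
metrics = {
    "train": [
        {"val_accuracy": accuracy, "val_loss": loss} for accuracy, loss in history
    ]
}

json_text = json.dumps(metrics, indent=2)
print(json_text)
# A training script would typically write json_text to a metrics file.
```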

Plot templates

Users can change the way plots are displayed by modifying the Vega-Lite specification, generating plots in the style that best fits their needs. This keeps DVC projects programming language agnostic, as it's independent of user display configuration and visualization code.

Built-in plot templates are stored in the .dvc/plots/ directory. The default one is called default.json. It can be changed with the --template (-t) option of dvc plots show and dvc plots diff. For templates in the .dvc/plots/ directory, the path and the json extension are not required: you can specify only the base name, e.g. --template scatter.

DVC has the following built-in plot templates:

  • default - linear plot
  • scatter - scatter plot
  • smooth - linear plot with LOESS smoothing, see example
  • confusion - confusion matrix, see example
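The base-name shorthand for --template can be pictured with the sketch below. It is an illustration of the documented behavior only, not DVC's actual lookup code; the resolve_template function and its template_dir parameter are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_template(name, template_dir=".dvc/plots"):
    """Map a bare base name to a JSON file in the template directory.

    Anything that already looks like a path or carries an extension is
    returned unchanged. This is an assumption made for illustration.
    """
    path = PurePosixPath(name)
    if path.parent == PurePosixPath(".") and path.suffix == "":
        return str(PurePosixPath(template_dir) / f"{name}.json")
    return name


print(resolve_template("scatter"))
print(resolve_template("my/custom.json"))
```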

Custom templates

Plot template files are Vega-Lite files that use predefined DVC anchors as placeholders for DVC to inject the plot values. You can create a custom template from scratch, or modify an existing one from .dvc/plots/.

💡 Note that custom templates can be safely added to the template directory.

All metrics files given to dvc plots show and dvc plots diff as input are combined together into a single data array for injection into a template file. There are two important fields that DVC adds to the plot data:

  • index - zero-based counter for the data rows/values. In many cases it corresponds to a machine learning training epoch or step number.
  • rev - Git commit hash, tag, or branch of the metrics file. This helps distinguish between different versions when using the dvc plots diff command.

Note that in the case of CSV/TSV metrics files, column names from the table header (first row) are equivalent to field names.
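To make the index and rev fields concrete, here is a rough sketch of combining data series from several revisions into a single array. The combine function and the revision labels are hypothetical, and DVC's internal implementation may differ.

```python
def combine(series_by_rev):
    """Flatten {rev: [row, ...]} into one list, tagging each row with index and rev."""
    combined = []
    for rev, rows in series_by_rev.items():
        # index restarts at zero for every revision's series.
        for index, row in enumerate(rows):
            combined.append({**row, "index": index, "rev": rev})
    return combined


# Hypothetical loss values for two versions of the same metrics file.
data = combine(
    {
        "HEAD^": [{"loss": 0.20}, {"loss": 0.08}],
        "workspace": [{"loss": 0.18}, {"loss": 0.07}],
    }
)
for row in data:
    print(row)
```

Because every row carries its rev, a template can draw one line per revision from the single combined array.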

DVC template anchors

  • <DVC_METRIC_DATA> (required) - the plot data from any type of metrics files is converted to a single JSON array, and injected instead of this anchor. Two additional fields will be added: index and rev (explained above).
  • <DVC_METRIC_TITLE> (optional) - a title for the plot, which can be defined with the --title option of the dvc plots subcommands.
  • <DVC_METRIC_X> (optional) - field name of the data for the X axis. It can be defined with the -x option of the dvc plots subcommands. The auto-generated index field (explained above) is the default.
  • <DVC_METRIC_Y> (optional) - field name of the data for the Y axis. It can be defined with the -y option of the dvc plots subcommands. It defaults to the last header of the metrics file: the last column for CSV/TSV, or the last field for JSON/YAML.
  • <DVC_METRIC_X_LABEL> (optional) - field name to display as the X axis label
  • <DVC_METRIC_Y_LABEL> (optional) - field name to display as the Y axis label
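The anchors listed above can be thought of as placeholders that get swapped for concrete values. The sketch below illustrates that idea with a toy Vega-Lite-like template; it is not one of DVC's built-in templates, and the fill function is a hypothetical helper rather than DVC's own substitution code.

```python
import json

# A toy template (an assumption for illustration); anchors appear as JSON strings.
TEMPLATE = """{
  "title": "<DVC_METRIC_TITLE>",
  "data": {"values": "<DVC_METRIC_DATA>"},
  "mark": "line",
  "encoding": {
    "x": {"field": "<DVC_METRIC_X>", "type": "quantitative"},
    "y": {"field": "<DVC_METRIC_Y>", "type": "quantitative"}
  }
}"""


def fill(node, values):
    """Recursively replace any string that equals an anchor with its value."""
    if isinstance(node, dict):
        return {key: fill(child, values) for key, child in node.items()}
    if isinstance(node, list):
        return [fill(child, values) for child in node]
    if isinstance(node, str) and node in values:
        return values[node]
    return node


# Hypothetical metrics rows as they might look before injection.
raw_rows = [{"accuracy": 0.94, "loss": 0.20}, {"accuracy": 0.98, "loss": 0.08}]

# Mirror the documented defaults: X is the generated index, Y is the last field.
default_y = list(raw_rows[0])[-1]
rows = [{**row, "index": i, "rev": "workspace"} for i, row in enumerate(raw_rows)]

spec = fill(
    json.loads(TEMPLATE),
    {
        "<DVC_METRIC_DATA>": rows,
        "<DVC_METRIC_TITLE>": "Training loss",
        "<DVC_METRIC_X>": "index",
        "<DVC_METRIC_Y>": default_y,
    },
)
print(json.dumps(spec, indent=2))
```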

HTML templates

It's possible to supply an HTML file to dvc plots show and dvc plots diff by using the --html-template option. This allows you to customize the container where DVC will inject the plots it generates.

⚠️ This is a separate feature from custom Vega-Lite templates.

The only requirement for this HTML file is to specify the place to inject plots with a {plot_divs} marker. See an example that uses this feature to render DVC plots without an Internet connection, below.
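The role of the {plot_divs} marker can be sketched as a simple text substitution. The render function and the div markup below are hypothetical; the HTML that DVC actually injects may look different.

```python
# A hypothetical container page with extra content around the plots.
PAGE_TEMPLATE = """<html>
  <body>
    <h1>Experiment report</h1>
    {plot_divs}
  </body>
</html>"""


def render(template, divs):
    """Insert the plot markup at the {plot_divs} marker, which must be present."""
    if "{plot_divs}" not in template:
        raise ValueError("template must contain a {plot_divs} marker")
    # str.replace leaves any other braces in the page untouched.
    return template.replace("{plot_divs}", "\n".join(divs))


html = render(PAGE_TEMPLATE, ['<div id="plot0"></div>', '<div id="plot1"></div>'])
print(html)
```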

Options

  • -h, --help - prints the usage/help message, and exits.
  • -q, --quiet - does not write anything to standard output.
  • -v, --verbose - displays detailed tracing information.

Example: Tabular data

We'll use the tabular metrics file logs.csv for this example:

epoch,accuracy,loss,val_accuracy,val_loss
0,0.9418667,0.19958884770199656,0.9679,0.10217399864746257
1,0.9763333,0.07896138601688048,0.9768,0.07310650711813942
2,0.98375,0.05241111190887168,0.9788,0.06665669009438716
3,0.98801666,0.03681169906261687,0.9781,0.06697812260198989
4,0.99111664,0.027362171787042946,0.978,0.07385754839298315
5,0.9932333,0.02069501801203781,0.9771,0.08009233058886166
6,0.9945,0.017702101902437668,0.9803,0.07830339228538505
7,0.9954,0.01396906608727198,0.9802,0.07247738889862157

Let's plot the last column (default behavior):

$ dvc plots show logs.csv
file:///Users/usr/src/plots/logs.csv.html

Difference in this metric between the current project version and the previous commit:

$ dvc plots diff -d logs.csv HEAD^
file:///Users/usr/src/plots/logs.csv.html

Visualize a specific field:

$ dvc plots show -y loss logs.csv
file:///Users/usr/src/plots/logs.html

Example: Smooth plot

In some cases we would like to smooth our plot. In this example we will use a plot with 1000 data points:

$ dvc plots show data.csv
file:///Users/usr/src/plots/plots.html

We can use the -t option with the smooth template to make it less noisy:

$ dvc plots show -t smooth data.csv
file:///Users/usr/src/plots/plots.html

Example: Confusion matrix

We'll use classes.csv for this example:

actual,predicted
cat,cat
cat,cat
cat,cat
cat,dog
cat,dinosaur
cat,dinosaur
cat,bird
turtle,dog
turtle,cat
...

Let's visualize it:

$ dvc plots show classes.csv --template confusion -x actual -y predicted
file:///Users/usr/src/plots/classes.csv.html

A confusion matrix template is predefined in DVC (found in .dvc/plots/confusion.json).
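To see what a confusion matrix summarizes, the sketch below counts each (actual, predicted) pair with Python's standard library. It uses only the rows visible in the excerpt above (the elided rows are left out), and it is independent of how the confusion template performs its aggregation.

```python
import csv
import io
from collections import Counter

# Only the rows shown in the excerpt above; the elided remainder is omitted.
CLASSES_CSV = """actual,predicted
cat,cat
cat,cat
cat,cat
cat,dog
cat,dinosaur
cat,dinosaur
cat,bird
turtle,dog
turtle,cat
"""

reader = csv.DictReader(io.StringIO(CLASSES_CSV))
# Each cell of the matrix is the number of rows with that (actual, predicted) pair.
counts = Counter((row["actual"], row["predicted"]) for row in reader)

for (actual, predicted), n in sorted(counts.items()):
    print(f"actual={actual:<7} predicted={predicted:<9} count={n}")
```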

Example: Offline HTML Template

The plots generated by dvc plots use Vega-Lite JavaScript libraries, and by default these load online resources. There may be times when you need to produce plots without Internet access, or want to customize the plots output with extra content such as banners or text. DVC lets you replace the HTML file that contains the final plots.

Download the Vega-Lite libraries into the directory where you'll produce the dvc plots:

$ wget https://cdn.jsdelivr.net/npm/vega@5.20.2 -O my_vega.js
$ wget https://cdn.jsdelivr.net/npm/vega-lite@5.1.0 -O my_vega_lite.js
$ wget https://cdn.jsdelivr.net/npm/vega-embed@6.18.2 -O my_vega_embed.js

Create the following HTML file and save it in .dvc/plots/mypage.html:

<html>
  <head>
    <script src="../path/to/my_vega.js" type="text/javascript"></script>
    <script src="../path/to/my_vega_lite.js" type="text/javascript"></script>
    <script src="../path/to/my_vega_embed.js" type="text/javascript"></script>
  </head>
  <body>
    {plot_divs}
  </body>
</html>

Note that this is a standard HTML file with only {plot_divs} as a placeholder for DVC to inject plots. <script> tags in this file point to the local JavaScript libraries we downloaded above. We can use it like this:

$ dvc plots show --html-template .dvc/plots/mypage.html

You can also make it the default HTML template by setting the dvc config parameter plots.html_template:

$ dvc config plots.html_template plots/mypage.html

Note that the path supplied to dvc config plots.html_template is relative to the .dvc/ directory.