plots

Contains commands to visualize metrics stored in structured files (JSON, CSV, or TSV): show, diff.

Synopsis

usage: dvc plots [-h] [-q | -v] {show,diff} ...

positional arguments:
  COMMAND
    show         Generate a plot image file from a metrics file.
    diff         Plot differences in metrics between commits in the
                 DVC repository, or between the last commit and the
                 workspace.

Types of metrics

DVC has two concepts for metrics, which represent different results of machine learning training or data processing:

  1. dvc metrics represent scalar numbers such as AUC, true positive rate, etc.
  2. dvc plots can be used to visualize data series such as AUC curves, loss functions, confusion matrices, etc.

Description

DVC provides a set of commands to visualize metrics of machine learning experiments. Typical examples are AUC curves, loss functions, and confusion matrices.

Metric files of this kind are created by users or generated by user data processing code. dvc plots subcommands can work with metric files committed to the Git history of a repository, data files controlled by DVC, or any other file in the system.

DVC generates plots as HTML files that can be opened with a web browser. These HTML files use Vega-Lite. Vega is a declarative grammar for defining plots using JSON. The plots can also be saved as SVG or PNG image files from the browser.

In contrast to dvc metrics, these metrics should be stored as data series. Unlike its dvc metrics counterpart, dvc plots diff cannot calculate numeric differences between the metrics in different experiments.

Supported file formats

Continuous metrics can be organized as data series in JSON, CSV, or TSV files. DVC expects to see an array (or multiple arrays) of objects (usually float numbers) in the file.

In tabular file formats such as CSV and TSV, each column (or field) is an array. dvc plots show can generate visuals for a specified column or a set of columns, for example the AUC column in this file:

epoch, AUC, loss
34, 0.91935, 0.0317345
35, 0.91913, 0.0317829
36, 0.92256, 0.0304632
37, 0.92302, 0.0299015
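
A file like this is usually produced by the training code itself. The following is a minimal Python sketch of how that could look, assuming the per-epoch values are already available as dictionaries; the helper name write_metrics_csv and the sample values are illustrative and not part of DVC:

```python
import csv
import io

# Hypothetical per-epoch values; in practice they come from a training loop.
rows = [
    {"epoch": 34, "AUC": 0.91935, "loss": 0.0317345},
    {"epoch": 35, "AUC": 0.91913, "loss": 0.0317829},
]


def write_metrics_csv(rows, fh):
    """Write one header line followed by one line per epoch."""
    writer = csv.DictWriter(fh, fieldnames=["epoch", "AUC", "loss"])
    writer.writeheader()
    writer.writerows(rows)


# An in-memory buffer keeps the sketch side-effect free. To write a real
# file, pass a handle from open("logs.csv", "w", newline="") instead.
buf = io.StringIO()
write_metrics_csv(rows, buf)
csv_text = buf.getvalue()
```

Each column of the resulting file is one data series that dvc plots show can visualize.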

In hierarchical file formats such as JSON, an array of JSON objects is expected. The dvc plots show command can generate visuals for a specified field name or a set of fields from the array's objects, such as the val_loss field in the train array in this example:

{
  "train": [
    {"val_accuracy": 0.9665, "val_loss": 0.10757},
    {"val_accuracy": 0.9764, "val_loss": 0.07324},
    {"val_accuracy": 0.8770, "val_loss": 0.08136},
    {"val_accuracy": 0.8740, "val_loss": 0.09026},
    {"val_accuracy": 0.8795, "val_loss": 0.07640},
    {"val_accuracy": 0.8803, "val_loss": 0.07608},
    {"val_accuracy": 0.8987, "val_loss": 0.08455}
  ]
}
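
Training code often keeps each metric as its own list of values, so it has to be reshaped into an array of objects before it matches the layout above. A minimal Python sketch of that reshaping follows; the history dictionary and the helper name history_to_records are assumptions made for illustration:

```python
import json


def history_to_records(history):
    """Turn {"name": [v0, v1, ...]} into [{"name": v0, ...}, {"name": v1, ...}]."""
    names = list(history)
    return [dict(zip(names, values)) for values in zip(*history.values())]


# Hypothetical per-epoch history collected during training.
history = {
    "val_accuracy": [0.9665, 0.9764],
    "val_loss": [0.10757, 0.07324],
}

doc = {"train": history_to_records(history)}
json_text = json.dumps(doc, indent=2)  # text to save as the JSON metrics file
```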

Plot templates

DVC gives users the ability to change the Vega JSON schema and generate plots in the format that best fits their needs. This doesn't make DVC projects dependent on user visualization code, programming language, or specific environments, keeping DVC agnostic.

Built-in plot templates are stored in the .dvc/plots/ directory. The default one is called default.json. It can be changed with the --template (-t) option of dvc plots show and dvc plots diff. For templates in the .dvc/plots/ directory, the path and the .json extension are not required: you can specify only the base name, e.g. --template scatter.

Custom templates

Plot template files are just JSON specifications with predefined DVC anchors that help DVC inject the user's data properly. You can create a custom template from scratch or modify an existing one from .dvc/plots/. Custom templates can be added to the template directory.

All JSON files given to dvc plots show and dvc plots diff as input are combined together into a single data array for the injection to a template file. There are two important fields that DVC adds to the plot data:

  • index - self-incrementing, zero-based counter for the data rows/values. In many cases it corresponds to a machine learning training epoch or step number.
  • rev - Git commit hash, tag, or branch of the metrics file. This helps distinguish between different versions when using the dvc plots diff command.

DVC applies the same logic to all CSV/TSV files, but first transforms the data into JSON. DVC uses column names from a header for JSON conversion into fields.
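
The conversion described above can be pictured with a short Python sketch. It is not DVC's actual implementation: the function name csv_to_plot_data and the revision labels are hypothetical, and the sketch leaves cell values as strings instead of parsing numbers.

```python
import csv
import io


def csv_to_plot_data(csv_text, rev):
    """Convert CSV text into a list of records, adding index and rev fields."""
    reader = csv.DictReader(io.StringIO(csv_text))  # header row gives field names
    data = []
    for index, row in enumerate(reader):
        record = dict(row)       # cell values stay strings in this sketch
        record["index"] = index  # zero-based row counter
        record["rev"] = rev      # revision the file was read from
        data.append(record)
    return data


# Two versions of the same file, combined into a single data array.
old = csv_to_plot_data("epoch,loss\n0,0.25\n1,0.20\n", rev="HEAD^")
new = csv_to_plot_data("epoch,loss\n0,0.22\n1,0.15\n", rev="workspace")
combined = old + new
```

Because every record carries its own rev value, a template can use that field to draw one line per version.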

DVC template anchors:

  • <DVC_METRIC_DATA> - input data for the plotting command, from either CSV or JSON files, is converted to a JSON array and injected in place of this anchor. Two additional fields are added: index and rev (explained above).
  • <DVC_METRIC_TITLE> - a title for the plot, which can be set with the --title option.
  • <DVC_METRIC_Y> - a field name for the Y axis of the plot. It can be set with the -y option of the commands. The default is the last field found in the input file: the last column in a CSV file or the last field in the JSON array objects.
  • <DVC_METRIC_X> - a field name for the X axis of the plot. It can be set with the -x option. index is the default field for X.
  • <DVC_METRIC_Y_TITLE> - a displayed field label for Y.
  • <DVC_METRIC_X_TITLE> - a displayed field label for X.
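
To make the role of the anchors concrete, here is a minimal Python sketch of anchor substitution. It is an illustration under stated assumptions, not DVC's code: the TEMPLATE string is a simplified Vega-Lite-like specification written for this example, the function fill_template is hypothetical, and using the field names as axis titles is an assumption.

```python
import json

# Simplified, hypothetical template; each anchor sits inside a JSON string.
TEMPLATE = """{
  "title": "<DVC_METRIC_TITLE>",
  "data": {"values": "<DVC_METRIC_DATA>"},
  "mark": "line",
  "encoding": {
    "x": {"field": "<DVC_METRIC_X>", "title": "<DVC_METRIC_X_TITLE>"},
    "y": {"field": "<DVC_METRIC_Y>", "title": "<DVC_METRIC_Y_TITLE>"},
    "color": {"field": "rev", "type": "nominal"}
  }
}"""


def fill_template(template, data, title="", x="index", y=None):
    """Replace every quoted anchor with the JSON encoding of its value."""
    if y is None:
        # Default Y: the last field of the input, ignoring the injected ones.
        fields = [name for name in data[0] if name not in ("index", "rev")]
        y = fields[-1]
    values = {
        "<DVC_METRIC_DATA>": data,
        "<DVC_METRIC_TITLE>": title,
        "<DVC_METRIC_X>": x,
        "<DVC_METRIC_Y>": y,
        "<DVC_METRIC_X_TITLE>": x,  # assumption: label defaults to field name
        "<DVC_METRIC_Y_TITLE>": y,  # assumption: label defaults to field name
    }
    for anchor, value in values.items():
        # json.dumps(anchor) is the anchor including its surrounding quotes.
        template = template.replace(json.dumps(anchor), json.dumps(value))
    return template


data = [{"epoch": 0, "loss": 0.2, "index": 0, "rev": "HEAD"}]
spec = json.loads(fill_template(TEMPLATE, data, title="Loss"))
```

Replacing the quoted anchor as a whole keeps the result valid JSON, since the data array is inserted as an array and not as a string.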

Options

  • -h, --help - prints the usage/help message, and exits.
  • -q, --quiet - do not write anything to standard output.
  • -v, --verbose - displays detailed tracing information.

Example: Tabular data

We'll use the tabular metrics file logs.csv for this example:

epoch,accuracy,loss,val_accuracy,val_loss
0,0.9418667,0.19958884770199656,0.9679,0.10217399864746257
1,0.9763333,0.07896138601688048,0.9768,0.07310650711813942
2,0.98375,0.05241111190887168,0.9788,0.06665669009438716
3,0.98801666,0.03681169906261687,0.9781,0.06697812260198989
4,0.99111664,0.027362171787042946,0.978,0.07385754839298315
5,0.9932333,0.02069501801203781,0.9771,0.08009233058886166
6,0.9945,0.017702101902437668,0.9803,0.07830339228538505
7,0.9954,0.01396906608727198,0.9802,0.07247738889862157

Let's plot the last column (default behavior):

$ dvc plots show logs.csv
file:///Users/usr/src/plots/logs.csv.html

Plot the difference in this metric between the current project version and the previous commit:

$ dvc plots diff -d logs.csv HEAD^
file:///Users/usr/src/plots/logs.csv.html

Visualize a specific field:

$ dvc plots show -y loss logs.csv
file:///Users/usr/src/plots/logs.csv.html

Example: Confusion matrix

We'll use classes.csv for this example:

actual,predicted
cat,cat
cat,cat
cat,cat
cat,dog
cat,dinosaur
cat,dinosaur
cat,bird
turtle,dog
turtle,cat
...

Let's visualize it:

$ dvc plots show classes.csv --template confusion -x actual -y predicted
file:///Users/usr/src/plots/classes.csv.html

A confusion matrix template is predefined in DVC (found in .dvc/plots/confusion.json).
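
A confusion matrix tallies how often each (actual, predicted) pair occurs. The following Python sketch shows that tally for the rows listed above; it only illustrates the aggregation, the helper name count_pairs is hypothetical, and the elided rows of classes.csv are not included:

```python
import csv
import io
from collections import Counter

# The rows shown in the example above (the elided rest of the file is omitted).
CLASSES_CSV = """actual,predicted
cat,cat
cat,cat
cat,cat
cat,dog
cat,dinosaur
cat,dinosaur
cat,bird
turtle,dog
turtle,cat
"""


def count_pairs(csv_text):
    """Count occurrences of each (actual, predicted) combination."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["actual"], row["predicted"]) for row in reader)


counts = count_pairs(CLASSES_CSV)
```

Each key of counts corresponds to one cell of the matrix, and its value to the number shown in that cell.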
