How it Works

Directory structure

DVCLive will store the logged data under the directory (dir) passed to Live(). If not provided, dvclive will be used by default.

The contents of the directory will depend on the methods used:

| Method | Writes to |
| --- | --- |
| Live.log_artifact(path) | {path}.dvc |
| Live.log_metric() | dvclive/plots/metrics |
| Live.log_image() | dvclive/plots/images |
| Live.log_param() | dvclive/params.yaml |
| Live.log_sklearn_plot() | dvclive/plots/sklearn |
| Live.make_dvcyaml() | dvclive/dvc.yaml |
| Live.make_report() | dvclive/report.{md/html} |
| Live.make_summary() | dvclive/metrics.json |
| Live.next_step() | dvclive/dvc.yaml, dvclive/metrics.json, dvclive/report.{md/html} |
| Live.end() | dvclive/dvc.yaml, dvclive/metrics.json, dvclive/report.{md/html} |
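
As a rough illustration of how these locations follow the directory passed to Live(), the sketch below rebuilds a few of the paths listed above for an arbitrary directory. The helper expected_outputs is hypothetical and not part of DVCLive; it assumes the dvclive/ prefix shown above simply stands for the chosen directory.

```python
from pathlib import PurePosixPath


def expected_outputs(live_dir="dvclive"):
    """Map a few Live method names to the location each writes under live_dir.

    Hypothetical helper for illustration only; DVCLive does not provide it.
    Assumes every listed path is relative to the directory given to Live().
    """
    base = PurePosixPath(live_dir)
    return {
        "log_metric": base / "plots" / "metrics",
        "log_image": base / "plots" / "images",
        "log_param": base / "params.yaml",
        "log_sklearn_plot": base / "plots" / "sklearn",
        "make_dvcyaml": base / "dvc.yaml",
        "make_summary": base / "metrics.json",
    }
```

Calling expected_outputs("results") would, under that assumption, point log_param at results/params.yaml instead of dvclive/params.yaml.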

Example

To illustrate with an example, given the following code:

import random
from pathlib import Path

from dvclive import Live
from PIL import Image

EPOCHS = 2

with Live(save_dvc_exp=True) as live:
    live.log_param("epochs", EPOCHS)

    for i in range(EPOCHS):
        live.log_metric("metric", i + random.random())
        live.log_metric("nested/metric", i + random.random())
        live.log_image("img.png", Image.new("RGB", (50, 50), (i, i, i)))
        Path("model.pt").write_text(str(random.random()))
        live.next_step()

    live.log_artifact("model.pt")
    live.log_sklearn_plot("confusion_matrix", [0, 0, 1, 1], [0, 1, 0, 1])
    live.summary["additional_metric"] = 1.0
# live.end() has been called at this point

The resulting structure will be:

dvclive
├── dvc.yaml
├── metrics.json
├── params.yaml
├── plots
│   ├── images
│   │   └── img.png
│   ├── metrics
│   │   ├── metric.tsv
│   │   └── nested
│   │       └── metric.tsv
│   └── sklearn
│       └── confusion_matrix.json
└── report.html
model.pt
model.pt.dvc
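
Once a run has finished, these files can be read back with the standard library alone. The following is a minimal sketch, not part of DVCLive: load_summary and load_history are hypothetical helper names, and the code assumes that metrics.json is a plain JSON object and that each .tsv file starts with a header row. Column names are taken from that header rather than hard-coded, since they may differ between DVCLive versions.

```python
import csv
import json
from pathlib import Path


def load_summary(live_dir="dvclive"):
    """Return the contents of <live_dir>/metrics.json as a dict."""
    return json.loads((Path(live_dir) / "metrics.json").read_text())


def load_history(name, live_dir="dvclive"):
    """Return one dict per row of <live_dir>/plots/metrics/<name>.tsv.

    `name` may contain slashes, e.g. "nested/metric".
    Values are returned as strings, exactly as stored in the file.
    """
    tsv_path = Path(live_dir) / "plots" / "metrics" / f"{name}.tsv"
    with tsv_path.open(newline="") as f:
        return list(csv.DictReader(f, delimiter="\t"))
```

For the structure above, load_history("nested/metric") would read dvclive/plots/metrics/nested/metric.tsv.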

Track the results

DVCLive expects each run to be tracked by Git, so it will save each run to the same path and overwrite the results each time. Include save_dvc_exp=True to auto-track as a DVC experiment. DVC experiments are Git commits that DVC can find but that don't clutter your Git history or create extra branches.

Track large artifacts with DVC

Models and data are often large and aren't easily tracked in Git. Live.log_artifact("model.pt") will cache the model.pt file with DVC and make Git ignore it. It will generate a model.pt.dvc metadata file, which can be tracked in Git and becomes part of the experiment. With this metadata file, you can retrieve the versioned artifact from the Git commit.

Run with DVC

Experimenting in Python interactively (like in notebooks) is great for exploration, but eventually you may need a more structured way to run reproducible experiments (for example, running a parallelized hyperparameter search). By configuring DVC pipelines, you can run experiments with dvc exp run.

DVCLive prints instructions for how to configure a pipeline stage in dvc.yaml like:

stages:
  dvclive:
    cmd: <python my_code_file.py my_args>
    deps:
      - <my_code_file.py>
    outs:
      - model.pt

Add this pipeline stage into dvc.yaml, modifying it to fit your project. Then, run it with dvc exp run. This will track the inputs and outputs of your code, and also enable features like queuing, parameter tuning, and grid searches.

Add the stage to a dvc.yaml file at the base of your repository. Do not use dvclive/dvc.yaml, since DVCLive will overwrite it during each run.

If you already have a .dvc file like model.pt.dvc, DVC will not allow you to also track model.pt in dvc.yaml. You must run dvc remove model.pt.dvc before you can add model.pt as an output in dvc.yaml.
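
Before listing a file under outs, it can help to check whether a .dvc metadata file already sits next to it. The sketch below does this with pathlib only; has_dvc_file is a hypothetical helper, not a DVC or DVCLive function, and it only looks for the {path}.dvc naming described above.

```python
from pathlib import Path


def has_dvc_file(output_path):
    """Return True if '<output_path>.dvc' exists next to the output file.

    Hypothetical helper: it only checks for the metadata file on disk
    and does not ask DVC whether the output is actually tracked.
    """
    p = Path(output_path)
    return p.with_name(p.name + ".dvc").exists()
```

If has_dvc_file("model.pt") returns True, run dvc remove model.pt.dvc first.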
