How it Works
Directory structure
DVCLive stores the logged data under the directory (dir) passed to Live(). If
no directory is provided, dvclive is used by default.
The contents of the directory will depend on the methods used:
Method | Writes to |
---|---|
Live.log_artifact(path) | {path}.dvc |
Live.log_metric() | dvclive/plots/metrics |
Live.log_image() | dvclive/plots/images |
Live.log_param() | dvclive/params.yaml |
Live.log_sklearn_plot() | dvclive/plots/sklearn |
Live.make_dvcyaml() | dvclive/dvc.yaml |
Live.make_report() | dvclive/report.{md/html} |
Live.make_summary() | dvclive/metrics.json |
Live.next_step() | dvclive/dvc.yaml, dvclive/metrics.json, dvclive/report.{md/html} |
Live.end() | dvclive/dvc.yaml, dvclive/metrics.json, dvclive/report.{md/html} |
Example
To illustrate with an example, given the following code:
import random
from pathlib import Path  # needed for Path("model.pt") below

from dvclive import Live
from PIL import Image

EPOCHS = 2

with Live(save_dvc_exp=True) as live:
    live.log_param("epochs", EPOCHS)

    for i in range(EPOCHS):
        live.log_metric("metric", i + random.random())
        live.log_metric("nested/metric", i + random.random())
        live.log_image("img.png", Image.new("RGB", (50, 50), (i, i, i)))
        # Stand-in for saving a real model checkpoint
        Path("model.pt").write_text(str(random.random()))
        live.next_step()

    live.log_artifact("model.pt")
    live.log_sklearn_plot("confusion_matrix", [0, 0, 1, 1], [0, 1, 0, 1])
    live.summary["additional_metric"] = 1.0
# live.end() has been called at this point
The resulting structure will be:
dvclive
├── dvc.yaml
├── metrics.json
├── params.yaml
├── plots
│   ├── images
│   │   └── img.png
│   ├── metrics
│   │   ├── metric.tsv
│   │   └── nested
│   │       └── metric.tsv
│   └── sklearn
│       └── confusion_matrix.json
└── report.html
model.pt
model.pt.dvc
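The files in this tree are plain JSON and TSV, so they can be read back without DVCLive itself. The following is a minimal sketch using only the standard library. The helper names read_metric_history and read_summary are hypothetical, and the TSV layout (a header row with a step column plus one value column) is an assumption about the file format rather than something stated above.

```python
import csv
import json
from pathlib import Path


def read_metric_history(tsv_path):
    """Return (step, value) pairs from one metric history file.

    Assumption: the file has a header row containing a "step" column and
    one value column; a "timestamp" column, if present, is ignored.
    """
    with Path(tsv_path).open(newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        value_columns = [
            name for name in (reader.fieldnames or [])
            if name not in ("step", "timestamp")
        ]
        if not value_columns:
            raise ValueError(f"no value column found in {tsv_path}")
        value_column = value_columns[0]
        return [(int(row["step"]), float(row[value_column])) for row in reader]


def read_summary(live_dir="dvclive"):
    """Load the latest metric values from <live_dir>/metrics.json."""
    return json.loads((Path(live_dir) / "metrics.json").read_text())
```

For the example above, read_metric_history("dvclive/plots/metrics/metric.tsv") would return one pair per step, and read_summary() would return the contents of dvclive/metrics.json as a dictionary.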
Track the results
DVCLive expects each run to be tracked by Git, so it will save each run to the
same path and overwrite the results each time. Include save_dvc_exp=True to
auto-track each run as a DVC experiment. DVC experiments are Git commits that
DVC can find but that don't clutter your Git history or create extra branches.
Track large artifacts with DVC
Models and data are often large and aren't easily tracked in Git.
Live.log_artifact("model.pt") will cache the model.pt file with DVC and make
Git ignore it. It will generate a model.pt.dvc metadata file, which can be
tracked in Git and becomes part of the experiment. With this metadata file, you
can retrieve the versioned artifact from the Git commit.
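A metadata file can stay small enough for Git because it only needs to identify the artifact, not contain it. The sketch below illustrates that idea with the standard library by computing a content hash and size for a file. It is not DVC's implementation: describe_artifact is a hypothetical helper, and the choice of MD5 and the field names are assumptions about the kind of information such a file records.

```python
import hashlib
from pathlib import Path


def describe_artifact(path, chunk_size=1 << 20):
    """Summarize a file by name, content hash, and size (illustrative only)."""
    path = Path(path)
    digest = hashlib.md5()
    with path.open("rb") as f:
        # Read in chunks so large models do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "path": path.name,
        "md5": digest.hexdigest(),
        "size": path.stat().st_size,
    }
```

Because the summary depends only on the file contents, two commits that point at the same model produce the same record, and a changed model produces a different one.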
Run with DVC
Experimenting in Python interactively (like in notebooks) is great for
exploration, but eventually you may need a more structured way to run
reproducible experiments (for example, running a parallelized hyperparameter
search). By configuring DVC pipelines, you can run experiments with
dvc exp run.
DVCLive prints instructions for how to configure a pipeline stage in dvc.yaml,
like:
stages:
  dvclive:
    cmd: <python my_code_file.py my_args>
    deps:
      - <my_code_file.py>
    outs:
      - model.pt
Add this pipeline stage into dvc.yaml, modifying it to fit your project. Then,
run it with dvc exp run. This will track the inputs and outputs of your code,
and also enable features like queuing, parameter tuning, and grid searches.
Add the stage to a dvc.yaml file at the base of your repository. Do not use
dvclive/dvc.yaml, since DVCLive will overwrite it during each run.
If you already have a .dvc file like model.pt.dvc, DVC will not allow you to
also track model.pt in dvc.yaml. You must dvc remove model.pt.dvc before you
can add it to dvc.yaml.
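Before moving outputs into dvc.yaml, it can help to check which of them already have a .dvc file next to them. The following is a minimal sketch using only pathlib. find_conflicting_dvc_files is a hypothetical helper, it does not call DVC, and it assumes each metadata file sits beside its output as {path}.dvc.

```python
from pathlib import Path


def find_conflicting_dvc_files(outs, root="."):
    """Return the .dvc files that already track any of the given outputs."""
    root = Path(root)
    conflicts = []
    for out in outs:
        dvc_file = root / f"{out}.dvc"  # e.g. model.pt -> model.pt.dvc
        if dvc_file.is_file():
            conflicts.append(dvc_file)
    return conflicts
```

Calling find_conflicting_dvc_files(["model.pt"]) from the repository root lists the files that would need dvc remove before model.pt can be declared under outs.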