Hugging Face Transformers
DVCLive allows you to add experiment tracking capabilities to your Hugging Face Transformers projects.
If you are using Hugging Face Accelerate, check the DVCLive - Hugging Face Accelerate page.
Usage
If you have dvclive installed, the DVCLiveCallback will be used for tracking experiments and logging metrics, parameters, and plots automatically for transformers>=4.36.0.
To log the model, set HF_DVCLIVE_LOG_MODEL=true in your environment.
import os
os.environ["HF_DVCLIVE_LOG_MODEL"] = "true"
from transformers import TrainingArguments, Trainer
# optional, `report_to` defaults to "all"
args = TrainingArguments(..., report_to="dvclive")
trainer = Trainer(..., args=args)
To customize tracking, include the DVCLiveCallback in the callbacks list passed to your Trainer, along with a Live instance that carries any additional arguments:
from dvclive import Live
from transformers.integrations import DVCLiveCallback
...
trainer = Trainer(...)
trainer.add_callback(DVCLiveCallback(Live(dir="custom_dir")))
trainer.train()
For transformers<4.36.0, import the callback from dvclive instead of transformers:
from dvclive.huggingface import DVCLiveCallback
...
trainer = Trainer(...)
trainer.add_callback(DVCLiveCallback())
trainer.train()
dvclive.huggingface.DVCLiveCallback will be deprecated in DVCLive 4.0 in favor of transformers.integrations.DVCLiveCallback.
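The version split described above can be made explicit in code. The following is a minimal sketch, not part of DVCLive or transformers: parse_version and callback_module are hypothetical helper names, and the parser deliberately ignores pre-release suffixes such as .dev0.

```python
import re

# First transformers release that ships its own DVCLiveCallback (per the text above).
MIN_BUILTIN_VERSION = (4, 36, 0)


def parse_version(version):
    """Extract (major, minor, patch) from a string such as "4.36.0" or "4.37.0.dev0".

    Hypothetical helper: pre-release suffixes are ignored, a missing patch is 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def callback_module(transformers_version):
    """Return the module name to import DVCLiveCallback from."""
    if parse_version(transformers_version) >= MIN_BUILTIN_VERSION:
        return "transformers.integrations"
    return "dvclive.huggingface"
```

The returned name could then be passed to importlib.import_module and DVCLiveCallback looked up with getattr, so the same training script works on either side of the 4.36.0 boundary.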
Examples
Log model checkpoints
Use HF_DVCLIVE_LOG_MODEL=true or log_model=True to save the checkpoints (Live.log_artifact() is used internally to save them).
If true, DVCLive will save a copy of the last checkpoint to the dvclive/artifacts directory and annotate it with the name last, or best if args.load_best_model_at_end is set.
The annotated checkpoint can then be consumed by the model registry or in automation scenarios.
- Save the last checkpoint at the end of training:
import os
os.environ["HF_DVCLIVE_LOG_MODEL"] = "true"
from transformers import TrainingArguments, Trainer
args = TrainingArguments(..., report_to="dvclive")
trainer = Trainer(..., args=args)
- Save the best checkpoint at the end of training:
import os
os.environ["HF_DVCLIVE_LOG_MODEL"] = "true"
from transformers import TrainingArguments, Trainer
args = TrainingArguments(..., report_to="dvclive")
trainer = Trainer(..., args=args)
trainer.args.load_best_model_at_end = True
- Save updates to the checkpoints directory whenever a new checkpoint is saved:
import os
os.environ["HF_DVCLIVE_LOG_MODEL"] = "all"
from transformers import TrainingArguments, Trainer
args = TrainingArguments(..., report_to="dvclive")
trainer = Trainer(..., args=args)
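The three cases above differ only in the value of HF_DVCLIVE_LOG_MODEL and in whether load_best_model_at_end is set. The sketch below summarizes that decision table; it is an illustration of the behavior described in this section, not the actual callback code. The function name checkpoint_mode, the "none" return value, and the case-insensitive comparison are assumptions made for the example.

```python
import os


def checkpoint_mode(env=None, load_best_model_at_end=False):
    """Summarize which checkpoints would be saved, per the three cases above.

    Hypothetical helper for illustration only. Returns one of
    "none", "last", "best", or "all".
    """
    env = os.environ if env is None else env
    # Case-insensitive matching is an assumption of this sketch.
    value = env.get("HF_DVCLIVE_LOG_MODEL", "").strip().lower()
    if value == "all":
        # Updates are saved whenever a new checkpoint is written.
        return "all"
    if value == "true":
        # One checkpoint is saved at the end of training.
        return "best" if load_best_model_at_end else "last"
    return "none"
```

Passing a plain dict as env keeps the helper easy to check without touching the real process environment.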
Passing additional DVCLive arguments
Use live to pass an existing Live instance.
from dvclive import Live
from transformers.integrations import DVCLiveCallback
with Live("custom_dir") as live:
    trainer = Trainer(...)
    trainer.add_callback(DVCLiveCallback(live=live))
    trainer.train()
    # Log additional metrics after training
    live.log_metric("summary_metric", 1.0, plot=False)
Output format
Each metric will be logged to:
{Live.plots_dir}/metrics/{split}/{metric}.tsv
Where:
- {Live.plots_dir} is defined in Live.
- {split} can be either train or eval.
- {metric} is the name provided by the framework.
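To make the path template concrete, here is a small sketch that fills it in. The helper metric_tsv_path is hypothetical (it is not part of DVCLive), and the plots directory is passed in explicitly rather than read from a Live instance.

```python
from pathlib import PurePosixPath

VALID_SPLITS = ("train", "eval")


def metric_tsv_path(plots_dir, split, metric):
    """Fill in {Live.plots_dir}/metrics/{split}/{metric}.tsv.

    Hypothetical helper for illustration; plots_dir stands in for
    the plots_dir attribute of a Live instance.
    """
    if split not in VALID_SPLITS:
        raise ValueError(f"split must be one of {VALID_SPLITS}, got {split!r}")
    return str(PurePosixPath(plots_dir) / "metrics" / split / f"{metric}.tsv")
```

For example, with an illustrative plots directory of custom_dir/plots, metric_tsv_path("custom_dir/plots", "eval", "loss") gives custom_dir/plots/metrics/eval/loss.tsv.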