Hugging Face Transformers
DVCLive allows you to add experiment tracking capabilities to your Hugging Face Transformers projects.
If you are using Hugging Face Accelerate, check the DVCLive - Hugging Face Accelerate page.
Usage
If you have dvclive installed, the DVCLiveCallback will be used for
tracking experiments and logging metrics, parameters, and plots automatically
for transformers>=4.36.0.
To log the model, set HF_DVCLIVE_LOG_MODEL=true in your environment.
import os

os.environ["HF_DVCLIVE_LOG_MODEL"] = "true"

from transformers import TrainingArguments, Trainer

# optional, `report_to` defaults to "all"
args = TrainingArguments(..., report_to="dvclive")
trainer = Trainer(..., args=args)

To customize tracking, include the DVCLiveCallback in the callbacks list passed to your Trainer, along with a Live instance that carries additional arguments:
from dvclive import Live
from transformers.integrations import DVCLiveCallback
...
trainer = Trainer(...)
trainer.add_callback(DVCLiveCallback(Live(dir="custom_dir")))
trainer.train()

For transformers<4.36.0, import the callback from dvclive instead of transformers:
from dvclive.huggingface import DVCLiveCallback
...
trainer = Trainer(...)
trainer.add_callback(DVCLiveCallback())
trainer.train()

dvclive.huggingface.DVCLiveCallback will be deprecated in DVCLive 4.0 in favor of transformers.integrations.DVCLiveCallback.
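The version split described above can be handled in one place. The following is a minimal sketch and not part of either library: `parse_version` and `callback_module` are hypothetical helpers that pick the import path from a version string, assuming plain `major.minor.patch` strings (suffixes such as `.dev0` are simply dropped).

```python
import re


def parse_version(version: str) -> tuple:
    """Turn '4.36.0' into (4, 36, 0); non-numeric suffixes are dropped."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that '4.36' compares equal to '4.36.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def callback_module(transformers_version: str) -> str:
    """Return the module that provides DVCLiveCallback for this version."""
    if parse_version(transformers_version) >= (4, 36, 0):
        return "transformers.integrations"
    return "dvclive.huggingface"
```

In a real project, the returned name could be passed to `importlib.import_module` and the class fetched with `getattr(module, "DVCLiveCallback")`.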
Examples
Log model checkpoints
Use HF_DVCLIVE_LOG_MODEL=true or log_model=True to save checkpoints (Live.log_artifact() is used internally to save them).
If true, DVCLive saves a copy of the last checkpoint to the dvclive/artifacts directory and annotates it with the name last, or best if args.load_best_model_at_end is set.
The saved artifact can then be consumed in the model registry or in automation scenarios.
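As a reading aid, the settings documented on this page can be summarized in a small lookup. This is only a sketch of the described behavior, not the callback's actual implementation: `checkpoint_policy` is a hypothetical helper, and treating the value case-insensitively is an assumption.

```python
def checkpoint_policy(log_model, load_best_model_at_end=False):
    """Summarize what is saved for a given HF_DVCLIVE_LOG_MODEL / log_model value."""
    # Accept both the env-var string ("true", "all") and the Python bool True.
    value = str(log_model).strip().lower()
    if value == "all":
        return "track the checkpoints directory on every save"
    if value == "true":
        name = "best" if load_best_model_at_end else "last"
        return f"save a copy of the last checkpoint, annotated as '{name}'"
    return "do not log the model"
```

For example, `checkpoint_policy("true", load_best_model_at_end=True)` describes the case where the saved copy is annotated as `best`.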
- Save the last checkpoint at the end of training:
import os

os.environ["HF_DVCLIVE_LOG_MODEL"] = "true"
from transformers import TrainingArguments, Trainer
args = TrainingArguments(..., report_to="dvclive")
trainer = Trainer(..., args=args)

- Save the best checkpoint at the end of training:
import os

os.environ["HF_DVCLIVE_LOG_MODEL"] = "true"
from transformers import TrainingArguments, Trainer
args = TrainingArguments(..., report_to="dvclive")
trainer = Trainer(..., args=args)
trainer.args.load_best_model_at_end = True

- Save updates to the checkpoints directory whenever a new checkpoint is saved:
import os

os.environ["HF_DVCLIVE_LOG_MODEL"] = "all"
from transformers import TrainingArguments, Trainer
args = TrainingArguments(..., report_to="dvclive")
trainer = Trainer(..., args=args)

Passing additional DVCLive arguments
Use live to pass an existing Live instance.
from dvclive import Live
from transformers.integrations import DVCLiveCallback
with Live("custom_dir") as live:
    trainer = Trainer(...)
    trainer.add_callback(DVCLiveCallback(live=live))
    trainer.train()

    # Log additional metrics after training
    live.log_metric("summary_metric", 1.0, plot=False)

Output format
Each metric will be logged to:
{Live.plots_dir}/metrics/{split}/{metric}.tsv
Where:
- {Live.plots_dir} is defined in Live.
- {split} can be either train or eval.
- {metric} is the name provided by the framework.
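To make the path template concrete, here is a minimal sketch that builds the file location and reads a metric file back. `metric_file` and `read_metric` are hypothetical helpers written for illustration, not part of DVCLive. The reader makes no assumption about column names and returns whatever header the file carries.

```python
import csv
from pathlib import Path


def metric_file(plots_dir, split, metric):
    """Build {plots_dir}/metrics/{split}/{metric}.tsv."""
    if split not in ("train", "eval"):
        raise ValueError(f"unexpected split: {split!r}")
    return Path(plots_dir) / "metrics" / split / f"{metric}.tsv"


def read_metric(path):
    """Read a tab-separated metric file into a list of row dicts (values stay strings)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

Assuming a plots directory of `dvclive/plots` (an example value), `metric_file("dvclive/plots", "eval", "loss")` points at `dvclive/plots/metrics/eval/loss.tsv`, which can then be passed to `read_metric`.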