Assuming DVC is already installed, let's initialize it by running
dvc init inside a Git project:
dvc init creates a few internal files that should be added to Git:
$ git status
Changes to be committed:
new file: .dvc/.gitignore
new file: .dvc/config
$ git commit -m "Initialize DVC"
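The initialization steps above can be sketched end to end as a small script. This is only an illustration under stated assumptions: the scratch directory from mktemp and the placeholder Git identity are hypothetical stand-ins for your own project and settings, and the script skips the DVC steps if dvc is not on your PATH.

```shell
#!/bin/sh
# Sketch: initialize DVC inside a fresh Git repository.
set -eu

# Hypothetical scratch location; use your own project directory instead.
project_dir=$(mktemp -d)
cd "$project_dir"

# dvc init expects to run inside a Git repository.
git init -q

if command -v dvc >/dev/null 2>&1; then
    # Creates the .dvc/ directory and stages its internal files in Git.
    dvc init -q

    # The new .dvc/ files should be listed as added.
    git status --short

    # Placeholder identity (an assumption) so the commit also works on a
    # machine where Git has no user configured.
    git -c user.name="Example User" -c user.email="user@example.com" \
        commit -q -m "Initialize DVC"
else
    echo "dvc not found on PATH; install DVC first" >&2
fi
```

In a real project you would run only git status, review the staged files, and commit with your usual Git identity.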
Now you're ready to DVC!
DVC's features can be grouped into functional components. We'll explore them one
by one in the next few pages:
- Data and model versioning (try
this next) is the base layer of DVC for large files, datasets, and machine
learning models. Use a regular Git workflow, but without storing large files
in the repo (think "Git for data"). Data is stored separately, which allows
for efficient sharing.
- Data and model access shows how to use
data artifacts from outside of the project and how to import data artifacts
from another DVC project. This can help to download a specific version of an
ML model to a deployment server or import a model to another project.
- Data pipelines describe how models and other
data artifacts are built, and provide an efficient way to reproduce them.
Think "Makefiles for data and ML projects" done right.
- Metrics, parameters, and plots can
be attached to pipelines. These let you capture, navigate, and evaluate ML
projects without leaving Git. Think "Git for machine learning".
- Experiments enable exploration, iteration, and
comparison across many ML experiments. Track your experiments with automatic
versioning and checkpoint logging. Compare differences in parameters, metrics,
code, and data. Apply, drop, roll back, resume, or share any experiment.