Assuming DVC is already installed, let's initialize it by running dvc init inside a Git project:
$ dvc init
A few directories and files are created that should be added to Git:
$ git status
Changes to be committed:
        new file:   .dvc/.gitignore
        new file:   .dvc/config
$ git commit -m "Initialize DVC"
DVC features can be grouped into functional components. We'll explore them one
by one in the next few sections:
- Data versioning is the base layer of DVC for
large files, datasets, and machine learning models. It looks like a regular
Git workflow, but without storing large files in the repo (think "Git for
data"). Data is stored separately, which allows for efficient sharing.
- Data access shows how to use data artifacts from
outside of the project and how to import data artifacts from another DVC
project. This helps when downloading a specific version of an ML model to a
deployment server, or when importing a model into another project.
- Data pipelines describe how models and other
data artifacts are built, and provide an efficient way to reproduce them.
Think "Makefiles for data and ML projects" done right.
- Experiments attach parameters, metrics, and plots.
You can capture and navigate experiments without leaving Git. Think "Git for