Assuming DVC is already installed, let's initialize it by running
dvc init inside a Git project.
A few directories and files are created that
should be added to Git:
$ git status
Changes to be committed:
        new file:   .dvc/.gitignore
        new file:   .dvc/config
$ git commit -m "Initialize DVC"
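
The whole sequence, from an empty directory to the first commit, can be sketched as a short script. This is a minimal sketch, not the official setup procedure. It assumes git and dvc are both on your PATH. The temporary directory and the inline commit identity are hypothetical choices made so the script runs in a clean environment; in a real project you would use your own directory and your existing Git configuration.

```shell
#!/bin/sh
# Sketch: prepare a throwaway Git project and initialize DVC in it.
set -eu

# Hypothetical scratch location; any empty directory would do.
project_dir="$(mktemp -d)"
cd "$project_dir"

if ! command -v git >/dev/null 2>&1 || ! command -v dvc >/dev/null 2>&1; then
    # Without both tools there is nothing to demonstrate.
    echo "git and dvc are both required; skipping" >&2
else
    # DVC is initialized inside an existing Git repository.
    git init --quiet
    dvc init --quiet

    # Show the files that dvc init created and staged.
    git status --short

    # The identity is passed inline only so the sketch works without a
    # global Git configuration (an assumption of this example).
    git -c user.name="Example" -c user.email="example@example.com" \
        commit --quiet -m "Initialize DVC"
fi
```

The exact set of files listed by git status can vary between DVC versions, so treat the listing above as representative rather than exhaustive.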
Now you're ready to DVC!
DVC's features can be grouped into functional components. We'll explore them one
by one in the next few pages:
- Data versioning (try this next) is the base
layer of DVC for large files, datasets, and machine learning models. Use a
regular Git workflow, but without storing large files in the repo (think "Git
for data"). Data is stored separately, which allows for efficient sharing.
- Data access shows how to use data artifacts from
outside of the project and how to import data artifacts from another DVC
project. This can help to download a specific version of an ML model to a
deployment server or import a model to another project.
- Data pipelines describe how models and other
data artifacts are built, and provide an efficient way to reproduce them.
Think "Makefiles for data and ML projects" done right.
- Experiments attach parameters, metrics, and plots.
You can capture and navigate experiments without leaving Git. Think "Git for