Assuming DVC is already installed, let's initialize it by running dvc init inside a Git project:
$ dvc init
This creates a few internal directories and files that should be added to Git:
$ git status
Changes to be committed:
new file: .dvc/.gitignore
new file: .dvc/config
$ git commit -m "Initialize DVC"
DVC functionality can be split into several layers, which we'll explore one by one in the next few sections:
- Data versioning is the core of DVC: versioning large files, datasets, and ML
models, and sharing them efficiently. We'll show how to use a regular Git
workflow without storing the large files in Git itself. Think "Git for data".
- Data access shows how to use data artifacts from outside of the project and
how to import data artifacts from another DVC project. This is useful, for
example, to download a specific version of an ML model to a deployment server
or to import a model into another project.
- Data pipelines describe how models and other
data artifacts are built, and provide an efficient way to reproduce them.
Think "Makefiles for data and ML projects" done right.
- Experiments attach parameters, metrics, and plots. You can capture and
navigate experiments without leaving Git. Think "Git for ML".