Before we begin, settle on a directory for this guide. Everything we will do will be self contained there.
Inside your chosen directory, we will use our current working directory as a
DVC project. Let's initialize it by running
dvc init inside a Git
$ dvc init
A few internal files are created that should be added to Git:
$ git status Changes to be committed: new file: .dvc/.gitignore new file: .dvc/config ... $ git commit -m "Initialize DVC"
Now you're ready to DVC!
To help you understand and use DVC better, consider the following three use-cases: data management, experiment tracking and model management. You may pick any to start learning about how DVC helps you "solve" that scenario!
Choose a trail to jump into its first chapter:
Data Management - Track and version large amounts of data along with your code, and use DVC as a build system for reproducible, data driven pipelines.
Experiment Management - Easily track your experiments and their progress by only instrumenting your code, and collaborate on ML experiments like software engineers do for code.
Model Management - Use the DVC model registry to manage the lifecycle of your models in an auditable way. Easily access your models and integrate your model registry actions into CICD pipelines to follow GitOps best practices.
Feel free to "choose your own adventure" and follow the chapters which answer your specific needs. In case you're unsure where to start, we recommend starting with data management.