
Get Started with DVC

Imagine we want to build an ML project from scratch. Let's start by creating a Git repository:

$ mkdir example-get-started
$ cd example-get-started
$ git init

This directory name is the one used in our example-get-started repo.

Assuming DVC is already installed, let's initialize it by running dvc init inside the Git project:

$ dvc init

A few internal files are created that should be added to Git:

$ git status
Changes to be committed:
        new file:   .dvc/.gitignore
        new file:   .dvc/config
        ...
$ git commit -m "Initialize DVC"
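
As a quick sanity check, the two files listed in the status output above can be tested for directly. The following is a minimal sketch, not part of DVC: the function name is_dvc_project is hypothetical, and it assumes that the presence of .dvc/config and .dvc/.gitignore is enough to indicate an initialized project.

```shell
#!/bin/sh
# Hypothetical helper (not a DVC command): report whether a directory
# looks like an initialized DVC project by checking for the two files
# that `dvc init` is shown creating above.
is_dvc_project() {
    dir="${1:-.}"
    [ -f "$dir/.dvc/config" ] && [ -f "$dir/.dvc/.gitignore" ]
}

# Check the current directory by default.
if is_dvc_project .; then
    echo "DVC project detected"
else
    echo "No DVC project here; run 'dvc init' inside a Git repository"
fi
```

Run it from the project root. Passing a path, as in is_dvc_project path/to/project, checks a different directory.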

Now you're ready to DVC!

The value of DVC's several feature sets is best understood from different angles. Pick one of the two trails below to learn about DVC from that perspective:

Data Management Trail

  • Data and model versioning is the base layer of DVC for large files, datasets, and machine learning models. Use a standard Git workflow, but without storing large files in the repo. Data is cached by DVC, allowing for efficient sharing. Think "Git for data".

  • Data and model access covers how to use data artifacts from outside of the project and how to import them from another DVC project. This can help you download a specific version of an ML model to a deployment server or import a dataset into another project.

  • Data pipelines describe how models and other data artifacts are built, and provide an efficient way to reproduce them. Think "Makefiles for data and ML projects" done right.

  • Metrics, parameters, and plots can be attached to pipelines. These let you capture, evaluate, and visualize ML projects without leaving Git.

The steps and results of some of these chapters are captured in our example-get-started repo. Feel free to git clone/checkout any of its tags.
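
To make the trail above more concrete, here is a rough sketch of the kind of commands each chapter revolves around. It is an illustration under assumptions, not the contents of the trail pages: the file names, the stage name prepare, and the REPO_URL value are hypothetical placeholders, and the step wrapper is a made-up helper that only prints each command unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: print the command by default,
# execute it only when RUN=1 (inside an initialized DVC project).
step() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Placeholder URL of another DVC project (assumption, not a real repo).
REPO_URL="https://example.com/org/dvc-project.git"

# Data and model versioning: let DVC track a large file, and commit
# the small .dvc pointer file to Git instead of the data itself.
step dvc add data/data.xml
step git add data/data.xml.dvc data/.gitignore

# Data and model access: download or import an artifact from another project.
step dvc get "$REPO_URL" path/to/model.pkl
step dvc import "$REPO_URL" path/to/dataset

# Data pipelines: declare a stage with dependencies and outputs, then reproduce.
step dvc stage add -n prepare \
    -d src/prepare.py -d data/data.xml \
    -o data/prepared \
    python src/prepare.py data/data.xml
step dvc repro

# Metrics, parameters, and plots: inspect what the pipeline produced.
step dvc metrics show
step dvc params diff
step dvc plots show
```

Printing by default keeps the sketch safe to run anywhere. Replace the placeholder paths with real ones before setting RUN=1.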

Experiment Management Trail

  • Experiments enable exploration, iteration, and comparison across many trials in ML projects. Track your experiments with automatic versioning and checkpoint logging. Compare differences in parameters, metrics, code, and data. Apply, drop, roll back, resume, or share any experiment.

  • Visualization helps you compare experiment results visually, track your plots, and generate them with library integrations.

These are captured in our example-dvc-experiments repo (see its tags).
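
The experiment workflow described above can be sketched in the same spirit. This assumes a project that already has a pipeline and a parameters file. The parameter train.lr and the experiment name exp-example are hypothetical, and the step wrapper is a made-up helper that prints commands unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: print the command by default,
# execute it only when RUN=1 (inside a DVC project with a pipeline).
step() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Placeholder experiment name (assumption). Real names are shown by
# `dvc exp show` after an experiment has run.
EXP_NAME="exp-example"

# Run a trial with a changed parameter (train.lr is a hypothetical name).
step dvc exp run --set-param train.lr=0.01

# Compare parameters and metrics across experiments.
step dvc exp show
step dvc exp diff

# Apply one experiment to the workspace, or drop it.
step dvc exp apply "$EXP_NAME"
step dvc exp remove "$EXP_NAME"

# Visualization: compare plots between revisions.
step dvc plots diff
```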

Following the Get Started

Each page in the trails above is more or less independent, especially if you're only reading them to get a general idea of the features in question. For better learning, try each step yourself from the beginning of any trail. Some of the preparation steps may be inside collapsed sections that you can click to expand.

Move on by picking a page from the list above or from the left-side navigation.
