
Get Started

Assuming DVC is already installed, let's initialize it by running dvc init inside a Git project:

We'll be building an NLP project from scratch together. The end result is published on GitHub — feel free to clone the repo.

Let's start by creating a project directory, then run git init followed by dvc init:

$ mkdir example-get-started
$ cd example-get-started
$ git init
$ dvc init

This creates a few internal files that should be committed to Git:

$ git status
Changes to be committed:
        new file:   .dvc/.gitignore
        new file:   .dvc/config
        ...
$ git commit -m "Initialize DVC"

Now you're ready to DVC!

DVC's multiple feature sets are best understood from different angles. Pick a trail below to see an overview of all features from that perspective:

Data Management

  • Data and model versioning is the base layer of DVC for large files, datasets, and machine learning models. Use a standard Git workflow, but without storing large files in the repo. Data is cached by DVC, allowing for efficient sharing. Think "Git for data".

  • Data and model access shows how to explore, import, and access data artifacts from outside the project. For example, it can help you download a specific version of an ML model to a deployment server, or import a dataset into another project.

  • Data pipelines describe how models and other data artifacts are built, and provide an efficient way to reproduce them. Think "Makefiles for data and ML projects" done right.

  • Metrics, parameters, and plots can be attached to pipelines. These let you capture, evaluate, and visualize ML projects without leaving Git.

Experiment Management

  • Experiments enable exploration, iteration, and comparison across many ML experiments. Track your experiments with automatic versioning and checkpoint logging. Compare differences in parameters, metrics, code, and data. Apply, drop, roll back, resume, or share any experiment.

  • Visualization helps you compare experiment results visually, track your plots, and generate them with library integrations.
