Assuming DVC is already installed, let's initialize it by running dvc init inside a Git project:
We'll be building an NLP project from scratch together. The end result is published on GitHub — feel free to clone the repo.
Let's start with git init:
$ mkdir example-get-started
$ cd example-get-started
$ git init
$ dvc init
A few internal files are created that should be added to Git:
$ git status
Changes to be committed:
new file: .dvc/.gitignore
new file: .dvc/config
...
$ git commit -m "Initialize DVC"
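To confirm that both initialization steps took effect, a small check along these lines can help. This is only a sketch: is_dvc_project is a hypothetical helper, not a DVC command, and it assumes the simple layout shown above, where .git and .dvc/config sit in the same directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of DVC): succeeds if the given directory
# looks like a Git repository in which `dvc init` has been run.
# Assumption: DVC was initialized at the repository root.
is_dvc_project() {
    dir="${1:-.}"
    # `git init` creates .git; `dvc init` creates .dvc/config
    [ -e "$dir/.git" ] && [ -f "$dir/.dvc/config" ]
}

if is_dvc_project .; then
    echo "DVC project detected"
else
    echo "Not initialized: run 'git init' and 'dvc init' first"
fi
```

Run from inside example-get-started after the commit above, the check should take the first branch. It only inspects the file layout, so it says nothing about whether the DVC files were actually committed.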
Now you're ready to DVC!
New! Once you set up a DVC project, you can work on it from the VS Code IDE or online with Iterative Studio, the web UI that integrates all of our data science tools. Check out this live demo!
DVC's features can be grouped into functional components. You can explore them in two independent trails:
Data and model versioning (try this next) is the base layer of DVC for large files, datasets, and machine learning models. Use a regular Git workflow, but without storing large files in the repo (think "Git for data"). Data is stored separately, which allows for efficient sharing.
Data and model access shows how to use data artifacts from outside of the project and how to import data artifacts from another DVC project. This can help to download a specific version of an ML model to a deployment server or import a model to another project.
Data pipelines describe how models and other data artifacts are built, and provide an efficient way to reproduce them. Think "Makefiles for data and ML projects" done right.
Metrics, parameters, and plots can be attached to pipelines. These let you capture, navigate, and evaluate ML projects without leaving Git. Think "Git for machine learning".
Experiments enable exploration, iteration, and comparison across many ML experiments. Track your experiments with automatic versioning and checkpoint logging. Compare differences in parameters, metrics, code, and data. Apply, drop, roll back, resume, or share any experiment.
Visualization lets you compare experiment results visually, track your plots, and generate them with library integrations.