$ dvc init
A few internal files are created that should be added to Git:
$ git status
Changes to be committed:
        new file:   .dvc/.gitignore
        new file:   .dvc/config
        ...
$ git commit -m "Initialize DVC"
Now you're ready to DVC!
The value of DVC's feature sets is best understood from different angles. Pick one of the trails below to learn about DVC from that perspective:
Data and model versioning is the base layer of DVC for large files, datasets, and machine learning models. Use a standard Git workflow, but without storing large files in the repo. Data is cached by DVC, allowing for efficient sharing. Think "Git for data".
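As a minimal sketch of this workflow (the file name `data.xml` is a placeholder, and a DVC remote is assumed to be configured):

```shell
# Track a large file with DVC instead of Git
$ dvc add data.xml

# Git tracks only the small .dvc pointer file; the data itself goes to DVC's cache
$ git add data.xml.dvc .gitignore
$ git commit -m "Track dataset with DVC"

# Upload the cached data to remote storage for sharing
$ dvc push
```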
Data and model access covers using data artifacts from outside of the project and importing them from another DVC project. This can help you download a specific version of an ML model to a deployment server, or reuse a dataset in another project.
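For example, artifacts can be fetched from another DVC repository with `dvc get` or `dvc import` (the repository URL and path below point at DVC's public example project; any DVC repo works):

```shell
# Download a file from another DVC repository, without tracking it here
$ dvc get https://github.com/iterative/example-get-started data/data.xml

# Import it instead, keeping a link to the source project and its version
$ dvc import https://github.com/iterative/example-get-started data/data.xml
```

`dvc import` records where the data came from, so it can later be brought up to date with `dvc update`.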
Data pipelines describe how models and other data artifacts are built, and provide an efficient way to reproduce them. Think "Makefiles for data and ML projects" done right.
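A pipeline stage is defined by its command, dependencies, and outputs; a minimal sketch (stage, script, and file names are placeholders):

```shell
# Define a stage named "prepare" with one dependency and one output directory
$ dvc stage add -n prepare \
    -d prepare.py -d data.xml \
    -o prepared \
    python prepare.py

# Reproduce the pipeline; stages whose inputs are unchanged are skipped
$ dvc repro
```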
Metrics, parameters, and plots can be attached to pipelines. These let you capture, evaluate, and visualize ML projects without leaving Git.
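Assuming a pipeline stage that writes metrics, parameters, and plot files, they can be inspected like this:

```shell
# Show metric values tracked in the pipeline
$ dvc metrics show

# Compare parameter values between the workspace and the last commit
$ dvc params diff

# Render tracked plots (e.g. training curves) as HTML
$ dvc plots show
```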
Experiments enable exploration, iteration, and comparison across many trials in ML projects. Track your experiments with automatic versioning and checkpoint logging. Compare differences in parameters, metrics, code, and data. Apply, drop, roll back, resume, or share any experiment.
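A typical experiment loop looks roughly like this (the parameter name `train.lr` and the experiment ID are placeholders):

```shell
# Run an experiment with a modified hyperparameter
$ dvc exp run --set-param train.lr=0.01

# Compare experiments by their parameters and metrics
$ dvc exp show

# Promote a chosen experiment into the workspace
$ dvc exp apply exp-1dad0
```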
Visualization helps you compare experiment results visually, track your plots, and generate them with library integrations.
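For instance, plots from the workspace can be compared against the last committed version:

```shell
# Compare tracked plots between the workspace and HEAD
$ dvc plots diff
```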
Each page in the trails above is more or less independent, especially if you're only reading them to get a general idea of the features in question. To learn hands-on, try each step yourself from the beginning of whichever trail you choose. Some of the preparation steps may be inside collapsed sections that you can click to expand: