
Get Started: Data Management


  • Data and model versioning - Manage large files, datasets, and machine learning models. Track your data and tie its versions to your code versions, while keeping the data itself stored outside your Git repo.

  • Data pipelines - Use pipelines to describe how models and other data artifacts are built, and provide an efficient way to reproduce them. Think "Makefiles for data and ML projects" done right.

  • Metrics, parameters, and plots - These are first-class citizens in DVC pipelines. Capture, evaluate, and visualize ML projects without leaving Git.
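The ideas behind the first two bullets can be illustrated with a toy Python sketch. This is not DVC's implementation or API. It is a minimal model under our own assumptions: files are identified by an MD5 content hash, the data is copied into a hypothetical content-addressed cache directory, and only a tiny pointer file (a hypothetical `<name>.ptr` holding the hash) would be committed to Git. A pipeline stage counts as stale when any dependency's current hash differs from the one recorded in a hypothetical `lock` dictionary. The helpers `file_md5`, `cache_file`, and `stage_is_stale` are illustrative names only.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the MD5 hex digest of a file's contents, read in 1 MiB chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def cache_file(path: Path, cache_dir: Path) -> Path:
    """Copy `path` into a content-addressed cache and write a pointer file.

    Hypothetical layout: <cache_dir>/<first 2 hex chars>/<remaining hex chars>.
    The pointer file holds only the hash, so it is small enough to commit
    to Git while the data itself lives in the cache, outside the repo history.
    """
    digest = file_md5(path)
    target = cache_dir / digest[:2] / digest[2:]
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is stored only once
        shutil.copy2(path, target)
    pointer = path.with_name(path.name + ".ptr")
    pointer.write_text(digest + "\n")
    return pointer


def stage_is_stale(deps: list[Path], lock: dict[str, str]) -> bool:
    """A stage must rerun if any dependency's hash differs from the recorded one."""
    return any(lock.get(str(d)) != file_md5(d) for d in deps)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "data.csv"
        data.write_text("x,y\n1,2\n")

        ptr = cache_file(data, root / ".cache")
        print(ptr.read_text().strip() == file_md5(data))  # True

        lock = {str(data): file_md5(data)}
        print(stage_is_stale([data], lock))  # False: inputs unchanged, skip
        data.write_text("x,y\n1,3\n")
        print(stage_is_stale([data], lock))  # True: input changed, rerun
```

Hashing content rather than comparing timestamps is what lets this "Makefile for data" skip work reliably: a stage reruns only when the bytes of an input actually change.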

The steps and results of some of these chapters are captured in our example-get-started repo. Feel free to clone it and check out any of its tags.

Where To Go Next

Pick a page from the list above or from the left-side navigation bar.

