One of the main uses of DVC repositories is the versioning of data and model files. DVC also enables cross-project reusability of these data artifacts. This means that your projects can depend on data from other repositories — like a package management system for data science.
We can build a DVC project dedicated to versioning datasets (or data features, ML models, etc.). The repository contains the necessary metadata, as well as the entire change history. The data itself is stored in one or more DVC remotes. This is what we call a data registry — data management middleware between ML projects and cloud storage. Advantages:
dvc importcommands, similar to software package management like
👩💻 Intrigued? Try our registry tutorial to learn how DVC looks and feels firsthand.
See also Model Registry.