A model registry is a tool to catalog ML models and their versions. Models from your data science projects can be discovered, tested, shared, deployed, and audited from there. DVC, GTO, and MLEM enable these capabilities on top of Git, so you can stick to an existing software engineering stack. No more divide between ML engineering and operations!
ML model registries give your team key capabilities:
Many of these benefits are built into DVC: your modeling process and performance data are codified in Git-based DVC repositories, so models can be reproduced and managed with standard Git workflows, alongside the code. Large model files are stored separately and efficiently, and can be pushed to remote storage, which serves as a scalable access point for sharing.
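To make "stored separately" concrete, here is a toy content-addressed cache in plain Python. It is only an illustration in the spirit of DVC's cache. The function names (`md5_of`, `add_to_cache`, `checkout`), the two-character directory split, and the pointer fields are assumptions for this sketch, not DVC's actual implementation or file format. The point is that Git only needs to track a small pointer record, while identical model bytes are stored once under their hash.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large model files are never fully loaded into memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def cache_path(cache_dir: Path, digest: str) -> Path:
    # Illustrative layout (assumption): first two hex characters as a subdirectory.
    return cache_dir / digest[:2] / digest[2:]


def add_to_cache(model_path: Path, cache_dir: Path) -> dict:
    """Copy a file into the cache and return the small pointer record Git would track."""
    digest = md5_of(model_path)
    target = cache_path(cache_dir, digest)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is stored only once
        shutil.copyfile(model_path, target)
    return {"path": model_path.name, "md5": digest, "size": model_path.stat().st_size}


def checkout(pointer: dict, cache_dir: Path, workspace: Path) -> Path:
    """Restore a file into the workspace using only its pointer record."""
    out = workspace / pointer["path"]
    shutil.copyfile(cache_path(cache_dir, pointer["md5"]), out)
    return out


def demo() -> bytes:
    """Add a file, delete the working copy, then restore it from the cache."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        model = root / "model.pkl"
        model.write_bytes(b"fake model weights")
        pointer = add_to_cache(model, root / "cache")
        model.unlink()  # only the pointer and the cache copy remain
        return checkout(pointer, root / "cache", root).read_bytes()
```

In a real project you would run `dvc add` and `dvc push` instead of code like this; the sketch only shows why hashing lets the heavy files live outside Git.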
See also Data Registry.
To build a Git-native registry (with or without DVC), one option is GTO (Git Tag Ops). It marks model releases and stage promotions with Git tags, and links them to artifacts in the repo through versioned annotations. These abstractions let you manage each model's lifecycle directly from Git.
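The sketch below shows how a registry view can be derived purely from Git tags. It assumes tag formats modeled on GTO's conventions, `<artifact>@v<semver>` to register a version and `<artifact>#<stage>#<n>` to assign a stage; check the GTO documentation for the exact syntax your version uses. Tags are passed in as hypothetical `(tag, commit)` pairs rather than read from a real repository, and `registry_state` is an illustrative helper, not part of GTO.

```python
import re
from typing import Dict, List, Tuple

# Assumed tag formats, modeled on GTO's conventions (verify against the GTO docs):
#   "<artifact>@v<major>.<minor>.<patch>"  registers a version
#   "<artifact>#<stage>#<n>"               assigns a stage; n grows with each assignment
VERSION_TAG = re.compile(r"^(?P<name>[\w-]+)@v(?P<ver>\d+\.\d+\.\d+)$")
STAGE_TAG = re.compile(r"^(?P<name>[\w-]+)#(?P<stage>[\w-]+)#(?P<n>\d+)$")


def semver_key(ver: str) -> Tuple[int, int, int]:
    """Compare versions numerically, so 1.10.0 sorts after 1.9.0."""
    major, minor, patch = (int(part) for part in ver.split("."))
    return major, minor, patch


def registry_state(tags: List[Tuple[str, str]]) -> Dict[str, dict]:
    """Derive {artifact: {"latest": version, "stages": {stage: version}}} from (tag, commit) pairs."""
    version_at = {}  # (artifact, commit) -> version registered on that commit
    stage_tags = []  # (n, artifact, stage, commit)
    state: Dict[str, dict] = {}
    for tag, commit in tags:
        m = VERSION_TAG.match(tag)
        if m:
            name, ver = m["name"], m["ver"]
            version_at[(name, commit)] = ver
            entry = state.setdefault(name, {"latest": ver, "stages": {}})
            if semver_key(ver) > semver_key(entry["latest"]):
                entry["latest"] = ver
            continue
        m = STAGE_TAG.match(tag)
        if m:
            stage_tags.append((int(m["n"]), m["name"], m["stage"], commit))
    # Apply stage assignments oldest-first so the newest one per stage wins.
    for _, name, stage, commit in sorted(stage_tags):
        ver = version_at.get((name, commit))
        if ver is not None:  # ignore stage tags on commits with no registered version
            state[name]["stages"][stage] = ver
    return state
```

For example, `registry_state([("churn-model@v1.0.0", "a1"), ("churn-model@v1.1.0", "b2"), ("churn-model#prod#1", "a1")])` returns `{"churn-model": {"latest": "1.1.0", "stages": {"prod": "1.0.0"}}}`: version 1.1.0 is registered, but production still points at 1.0.0 until a newer `prod` tag is pushed.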
To productionize models, you can save and package them with the MLEM Python API or CLI, which automagically captures the context needed to distribute them. MLEM can store model files in the cloud (by itself or with DVC), list them and transfer them between locations, serve them as a local REST server, or even containerize them and deploy them to cloud providers!
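As a rough illustration of what "capturing context" means, the sketch below pickles a model and writes a JSON sidecar recording its type, the Python version, its public methods, and the type of a sample input. This is not MLEM's API or metadata format, and MLEM's own metadata is richer; `ThresholdModel`, `save_with_context`, and `load_with_context` are hypothetical names used only here.

```python
import inspect
import json
import pickle
import platform
import tempfile
from pathlib import Path


class ThresholdModel:
    """Hypothetical stand-in model: predicts 1 when a value exceeds a threshold."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def predict(self, xs: list) -> list:
        return [int(x > self.threshold) for x in xs]


def save_with_context(model, path: Path, sample_input) -> dict:
    """Pickle the model and write a JSON sidecar describing how to use it."""
    path.write_bytes(pickle.dumps(model))
    meta = {
        "object_type": f"{type(model).__module__}.{type(model).__qualname__}",
        "python_version": platform.python_version(),
        # Public callables on the object, e.g. ["predict"].
        "methods": sorted(
            name for name, _ in inspect.getmembers(model, callable)
            if not name.startswith("_")
        ),
        "sample_input_type": type(sample_input).__name__,
    }
    path.with_suffix(".json").write_text(json.dumps(meta, indent=2))
    return meta


def load_with_context(path: Path):
    """Load the model and its metadata, warning if the Python minor version differs."""
    meta = json.loads(path.with_suffix(".json").read_text())
    saved = meta["python_version"].split(".")[:2]
    current = platform.python_version().split(".")[:2]
    if saved != current:
        print(f"warning: saved with Python {'.'.join(saved)}, running {'.'.join(current)}")
    # Only unpickle files you trust: pickle can execute arbitrary code.
    return pickle.loads(path.read_bytes()), meta


def demo():
    """Save a model, reload it, and return both metadata copies plus a prediction."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.pkl"
        meta = save_with_context(ThresholdModel(0.5), path, [0.2, 0.9])
        model, loaded_meta = load_with_context(path)
        return meta, loaded_meta, model.predict([0.2, 0.9])
```

Keeping the metadata in a readable sidecar next to the binary is what lets other tools (a server, a build step, a registry UI) learn how to call the model without importing the training code.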
This ecosystem of tools from Iterative brings your ML process into GitOps. You can manage and deliver ML models with software engineering methods such as continuous integration and delivery (CI/CD), with pipelines that react to changes in the state of the artifacts in your registry.
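A CI pipeline can react to registry changes by inspecting the Git ref that triggered it. This minimal sketch assumes the GTO-style stage tag `<artifact>#<stage>#<n>` and a hypothetical mapping from stages to deployment targets; `plan_deployment` and `TARGETS` are illustrative names. In GitHub Actions, for instance, the triggering ref is available in the `GITHUB_REF` environment variable.

```python
import os
import re
from typing import Optional

# Assumed GTO-style stage tag "<artifact>#<stage>#<n>", seen as a full Git ref.
STAGE_REF = re.compile(r"^refs/tags/(?P<name>[\w-]+)#(?P<stage>[\w-]+)#\d+$")

# Hypothetical mapping from registry stages to deployment targets.
TARGETS = {"staging": "staging-cluster", "prod": "prod-cluster"}


def plan_deployment(git_ref: str) -> Optional[str]:
    """Return a deployment action for a stage-assignment tag, or None to skip."""
    m = STAGE_REF.match(git_ref)
    if m is None:
        return None  # branch push, version-only tag, or unrelated ref
    target = TARGETS.get(m["stage"])
    if target is None:
        return None  # stage with no deployment target configured
    return f"deploy {m['name']} to {target}"


if __name__ == "__main__":
    # GitHub Actions exposes the triggering ref as GITHUB_REF.
    print(plan_deployment(os.environ.get("GITHUB_REF", "")))
```

Returning `None` for anything that is not a known stage assignment keeps ordinary branch pushes and version registrations from triggering a deployment.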