Just as we use experiment tracking to manage model development, it is a good idea to keep a model registry to manage the lifecycle of the models we get from our experiments. Using DVC and DVC Studio, we will set up a model registry where we can discover, share, deploy, and audit all our models, and which will serve as the single source of truth for our model management. If you prefer to see how to start managing models in DVC Studio as quickly as possible without walking through an example, see Manage Models.
Behind the scenes, DVC Studio uses a command line tool called GTO for most model registry actions.
With GTO you can also set up the model registry locally without DVC Studio. You can see how this is done in the expandable "Under the hood" sections in this chapter.
Let's now train a model and add it to the model registry. We will use DVCLive to add the model from Python code. This will also automatically save the model to DVC.
We use the log_artifact method to cache the model with DVC and add it to the model registry. Open the training script src/train.py in our example repository and have a look at the following code under the with Live(...) context:
    with Live(...) as live:
        ...
        live.log_artifact(
            path="models/model.pkl",
            type="model",
            name="pool-segmentation",
            desc="This is a Computer Vision (CV) model that's segmenting out swimming pools from satellite images.",
            labels=["cv", "segmentation", "satellite-images"],
        )
The path parameter tells DVC that our model is to be found under models/model.pkl. The type parameter is "model", so the artifact will show up in DVC Studio as a model (you can show other artifact types using filters, but they are hidden by default). The rest of the parameters are descriptive and optional, and will also show up in DVC Studio.
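To make the excerpt above easier to try in isolation, here is a minimal sketch of a script that writes a model file and then registers it. It is not the contents of src/train.py: the helper names `save_model` and `train_and_register` are hypothetical, the model is whatever picklable object you pass in, and `Live()` is called with its defaults because the original arguments are elided.

```python
import pickle
from pathlib import Path

MODEL_PATH = Path("models/model.pkl")


def save_model(model, path=MODEL_PATH):
    """Serialize the model to disk so that DVC has a file to cache."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)
    return path


def train_and_register(model):
    """Hypothetical helper: save the model, then log it as a model artifact."""
    # Imported lazily so save_model stays usable without DVCLive installed.
    from dvclive import Live

    with Live() as live:
        path = save_model(model)
        # Only path, type, and name are passed here; desc and labels are optional.
        live.log_artifact(
            str(path),
            type="model",
            name="pool-segmentation",
        )
```

Calling `train_and_register(my_model)` from inside a DVC repository would be the equivalent of running the training step described here. Outside a DVC repository, only `save_model` is expected to be useful.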
Now we just need to run the Python script which includes this code to cache and register the model. If you are following our example repository then this has already been done and we can continue to the next section.
If you are building your own repository, you will need to run the script and push the result to your Git remote (e.g., GitHub) yourself.
In this guide, we will be using DVC Studio to manage our model registry. DVC Studio enables you to see models across all projects, manage their lifecycle, and download them with only a token. You can find out more about it here.
The Models tab in DVC Studio gives us an overview of all models, their latest versions, and the stages each model version is assigned to. We can get more details for each model by clicking on the model name.
You can check out our example model in DVC Studio to see what it will look like once we finish all the steps in this guide.
Now that we have added a model, you should see it in DVC Studio if you go to the Models tab and select the model.
You can also see the state of the project at this point captured in our example repository.
Now that we have our first model in the model registry, we can start registering model versions for the model. We do it by choosing a specific commit in our model development history and attaching a version to it to make it easier to keep track of it. You can now do that directly in the DVC Studio UI as follows.
Since we saved our model to DVC and added it to the model registry in the latest commit, we can just keep the commit which was selected by DVC Studio automatically. We will also keep the suggested version number v1.0.0.
For more details and other ways of registering model versions you can have a look at the corresponding documentation.
Once we register our first model version, DVC Studio will also automatically connect it to experiment tracking and all metrics which are tracked there will also show up in the model registry for each model version.
We have a first version of our model, and now is a good time to assign a model lifecycle stage to it. You can create any number of lifecycle stages with any names you wish, but in this example we will only create two stages called "dev" and "prod".
Stages are created whenever a model version is assigned to them. You can now assign the model version 1.0.0 to the "dev" stage as follows.
When we assign the model to a stage, it can automatically trigger actions in our CI/CD workflows, like deploying the model to a new environment (we will explore how this is done in the Using and Deploying models chapter).
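As a rough illustration of how a CI/CD job might react to a stage assignment, the sketch below inspects the Git ref that triggered the job and picks a deployment target. It rests on assumptions: that stage assignments are recorded as tags shaped like `<model>#<stage>#<counter>`, with a `!` after the stage when the stage is removed, and that the environment names in `DEPLOY_TARGETS` exist. Both the mapping and the `deployment_target` function are hypothetical.

```python
import re

# Assumed tag shape for stage assignments: <model>#<stage>#<counter>,
# with "!" after the stage name when the stage is being removed.
STAGE_TAG = re.compile(
    r"^(?P<model>[^#@]+)#(?P<stage>[^#!]+)(?P<removed>!)?#(?P<counter>\d+)$"
)

# Hypothetical mapping from registry stages to deployment environments.
DEPLOY_TARGETS = {"dev": "staging-cluster", "prod": "production-cluster"}


def deployment_target(git_ref):
    """Return the environment to deploy to, or None if the ref should be ignored."""
    tag = git_ref.removeprefix("refs/tags/")
    match = STAGE_TAG.match(tag)
    if match is None or match["removed"]:
        # Not a stage assignment, or a stage removal: nothing to deploy.
        return None
    return DEPLOY_TARGETS.get(match["stage"])
```

A job could call `deployment_target` with the ref provided by the CI system and skip the deployment step when the result is `None`.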
Let's say that we've decided to promote our model version 1.0.0 to production and denote that it is no longer in the "dev" stage. First, assign the model version to the "prod" stage just like we did with the "dev" stage in the previous section.
Now, to remove the "dev" stage from our model version 1.0.0 and assign it only to "prod", follow these steps:
It is also possible to de-register model versions or deprecate and remove models from the registry entirely. To see how, have a look at the documentation.
The detailed view of our model in the registry should now match what we see in our example.
Every action we performed in our model registry leaves a trace so that the model history can be audited. If you now look at the model details page of our model, you should see something like this:
As we noted above, DVC uses special Git tags to keep track of model registry actions, so all of this history is actually stored directly in your Git repository. DVC Studio can parse these tags and show them to us in a user-friendly way.
If you look at the tags in our example repository, you can see that all the model registry actions that we performed are captured by such tags.
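To give a feel for how such tags can be turned into a readable history, here is a small sketch that parses a list of tag names into registry events. The tag shapes are assumptions about the naming scheme (`<model>@<version>` for version registration and `<model>#<stage>#<counter>` for stage assignment, with `!` marking a removal). The tag names in the usage note are illustrative and not copied from the example repository, and `RegistryEvent`, `parse_tag`, and `registry_history` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class RegistryEvent(NamedTuple):
    model: str
    action: str
    detail: str


# Assumed tag shapes; "!" marks a removal (deregistration or unassignment).
VERSION_TAG = re.compile(
    r"^(?P<model>[^#@]+)@(?P<version>v\d+\.\d+\.\d+)(?P<removed>!)?$"
)
STAGE_TAG = re.compile(
    r"^(?P<model>[^#@]+)#(?P<stage>[^#!]+)(?P<removed>!)?#\d+$"
)


def parse_tag(tag: str) -> Optional[RegistryEvent]:
    """Translate one tag name into a registry event, or None if it is unrelated."""
    if m := VERSION_TAG.match(tag):
        action = "deregister version" if m["removed"] else "register version"
        return RegistryEvent(m["model"], action, m["version"])
    if m := STAGE_TAG.match(tag):
        action = "unassign stage" if m["removed"] else "assign stage"
        return RegistryEvent(m["model"], action, m["stage"])
    return None


def registry_history(tags):
    """Keep the input order and drop tags that are not registry actions."""
    return [event for event in map(parse_tag, tags) if event is not None]
```

Feeding it the output of `git tag --sort=creatordate`, for example `["pool-segmentation@v1.0.0", "pool-segmentation#dev#1", "pool-segmentation#prod#2", "pool-segmentation#dev!#3"]`, would give one event per step of this guide: registering the version, assigning "dev", assigning "prod", and unassigning "dev".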