
End-to-End Computer Vision API, Part 3: Remote Experiments & CI/CD For Machine Learning

This is the last part of a three-part series of posts:

In part 1, we talked about exploratory work in Jupyter Notebooks; versioning data in remote storage with DVC; and refactoring the code from Jupyter Notebooks into DVC pipeline stages.

Part 2 talked about the process of managing experiments with DVC pipelines, DVCLive and Iterative Studio.

In this final part, we will focus on leveraging cloud infrastructure with CML; enabling automatic reporting (graphs, images, reports and tables with performance metrics) for PRs; and the eventual deployment process.

  • Alex Kim
  • May 09, 2022
  • 9 min read

Leveraging Cloud Resources with CI/CD and CML

If you use the CML library in combination with CI/CD tools like GitHub Actions or GitLab CI/CD, you can quickly and easily:

  1. provision a powerful virtual machine (VM) in the cloud, since training Computer Vision (CV) models often requires GPUs that are rarely available on local machines
  2. submit your ML training job to it
  3. save the results (metrics, models and other training artifacts)
  4. automatically shut down the VM without having to worry about excessive cloud bills

Continuous Integration and Deployment for Machine Learning

We've configured three workflow files for GitHub Actions, each corresponding to a particular stage of the project's lifecycle:

1. Workflow for experimentation and hyperparameter tuning

In this stage, we'll create an experiment branch so that we can experiment with data preprocessing, change the model architecture, tune hyperparameters, etc. Once we think our experiment is ready to be run, we'll push our changes to a remote repository (in this case, GitHub). This push will trigger a CI/CD job in GitHub Actions, which in turn will:

a) provision an EC2 virtual machine with a GPU in AWS:

- name: Deploy runner on AWS EC2
  run: |
    cml runner \
        --cloud=aws \
        --cloud-region=us-east-1 \
        --cloud-type=g4dn.xlarge \

b) deploy our experiment branch to a Docker container on this machine:

  needs: deploy-runner
  runs-on: [self-hosted, cml-runner]
  container:
    image: iterativeai/cml:0-dvc2-base1
    options: --gpus all
  environment: cloud
  permissions:
    contents: read
    id-token: write
  steps:
    - uses: actions/checkout@v2

c) rerun the entire DVC pipeline and push metrics back to GitHub:

- name: dvc-repro-cml
  run: |
    # Install dependencies
    pipenv install --skip-lock
    pipenv run dvc pull
    pipenv run dvc exp run
    pipenv run dvc push

d) open a pull request and post a report to it that contains a table with metrics and model outputs on a few test images:

# Open a pull request
cml pr dvc.lock metrics.json training_metrics.json training_metrics_dvc_plots/**
# Create CML report
echo "## Metrics" >
pipenv run dvc metrics show --md >>
echo "## A few random test images" >>
# Publish 20 randomly chosen prediction images
for file in $(ls data/test_preds/ | sort -R | tail -20); do
  cml publish data/test_preds/$file --md >>
done
cml send-comment --pr --update

The report structure is fully customizable. Below is an example of what the PR and the CML report would look like in this case. The test images show (from left to right) input images, ground truth masks and prediction masks.

PR and CML report

At this point, we can assess the results in Iterative Studio and GitHub and decide whether we want to accept the PR or keep experimenting.
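The report built in step (d) can also be sketched in plain Python. The snippet below is only an illustration of the report structure (a metrics table followed by a random sample of test images), not the workflow's actual code: the function name `build_report`, the metric names, and the file names are all made up, and the images are linked by local path rather than uploaded the way `cml publish` does.

```python
import random


def build_report(metrics, image_paths, n_images=20, seed=None):
    """Return a Markdown report: a metrics table plus a random sample of image links."""
    lines = ["## Metrics", "", "| metric | value |", "| --- | --- |"]
    for name, value in sorted(metrics.items()):
        lines.append(f"| {name} | {value:.4f} |")

    # Pick at most n_images paths at random (seed makes the choice repeatable).
    rng = random.Random(seed)
    paths = list(image_paths)
    sample = rng.sample(paths, k=min(n_images, len(paths)))

    lines += ["", "## A few random test images", ""]
    lines += [f"![{path}]({path})" for path in sample]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up metric values and file names, for illustration only.
    demo_metrics = {"dice": 0.8731, "loss": 0.1042}
    demo_images = [f"data/test_preds/img_{i}.jpg" for i in range(50)]
    print(build_report(demo_metrics, demo_images, n_images=3, seed=0))
```

Keeping the report as a function of the metrics and the image list makes it easy to add or reorder sections without touching the workflow file.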

2. Workflow for deploying to the development environment

Once we are happy with our model's performance on the experiment branch, we can merge it into the development branch. This would trigger a different CI/CD job that will:

a) retrain the model if the dev branch contains changes not present in the exp branch; if it doesn't, DVC will skip this stage. This step looks almost identical to step (1.c) of the previous workflow (rerunning the pipeline and reporting metrics on GitHub).
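To make the "skip if nothing changed" behavior concrete, here is a simplified sketch of the general idea: compare content hashes of a stage's dependencies against previously recorded values, and rerun only on a mismatch. This is an illustration under our own assumptions, not DVC's actual implementation; the helper names `file_hash` and `stage_is_stale` and the demo file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash(path):
    """Hash a file's contents so that edits can be detected cheaply."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def stage_is_stale(dep_paths, recorded_hashes):
    """Return True if any dependency is new or differs from its recorded hash."""
    return any(recorded_hashes.get(str(p)) != file_hash(p) for p in dep_paths)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dep = Path(tmp) / "params.yaml"  # hypothetical dependency file
        dep.write_text("lr: 0.01\n")
        recorded = {str(dep): file_hash(dep)}

        print(stage_is_stale([dep], recorded))  # unchanged dependency: stage can be skipped
        dep.write_text("lr: 0.001\n")
        print(stage_is_stale([dep], recorded))  # edited dependency: stage must rerun
```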

b) deploy the web REST API application (that incorporates the new model) to a development endpoint on Heroku:

  needs: train-and-push
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v2
    - uses: actions/download-artifact@master
      with:
        name: model_pickle
        path: models
    - uses: akhileshns/heroku-deploy@v3.12.12
      with:
        heroku_api_key: ${{secrets.HEROKU_API_KEY}}
        heroku_app_name: demo-api-mag-tiles-dev
        heroku_email: ''
        team: iterative-sandbox
        usedocker: true

The development endpoint (the Heroku app demo-api-mag-tiles-dev; note the -dev suffix) is now accessible, and we can use it to assess the end-to-end performance of the overall solution. If we pick a random test image, exp3_num_258558.jpg,

Test image exp3_num_258558.jpg

we can send it to the endpoint using the curl command like this:

$ curl -F 'image=@data/MAGNETIC_TILE_SURFACE_DEFECTS/test_images/exp3_num_258558.jpg' \

This will return some HTTP header information and the body of the response, which contains the defect segmentation mask (0 for pixel locations without defects and 1 otherwise):

*   Trying
* Connected to ( port 443 (#0)

Alternatively, we can do a similar thing with a Python script that also saves the output mask to an exp3_num_258558_mask.png image:

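import json
from pathlib import Path

import matplotlib.cm as cm
import matplotlib.pyplot as plt
import numpy as np
import requests

url = ''  # set to the development endpoint URL
file_path = Path('data/MAGNETIC_TILE_SURFACE_DEFECTS/test_images/exp3_num_258558.jpg')
# Upload the image as multipart form data under the 'image' field
with open(file_path, 'rb') as image_file:
    files = {'image': (str(file_path), image_file, "image/jpeg")}
    response = requests.post(url, files=files)
# The response body is JSON; 'pred' holds the 0/1 segmentation mask
data = json.loads(response.content)
pred = np.array(data['pred'])
plt.imsave(f'{file_path.stem}_mask.png', pred, cmap=cm.gray)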

Below you can see what this mask looks like.

Output mask exp3_num_258558_mask.png
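Besides saving the mask as an image, it can be handy to reduce a response to a single number. The sketch below computes the share of pixels flagged as defective. It assumes, based on the script above, that the response body is JSON with a two-dimensional 0/1 mask under the `pred` key; the function name `defect_fraction` and the tiny hand-written mask are hypothetical.

```python
import json


def defect_fraction(response_body):
    """Return the share of pixels flagged as defective (value 1) in the mask."""
    mask = json.loads(response_body)["pred"]
    total = sum(len(row) for row in mask)
    defects = sum(1 for row in mask for pixel in row if pixel == 1)
    return defects / total if total else 0.0


if __name__ == "__main__":
    # A hand-written 2x3 mask standing in for a real response body.
    body = json.dumps({"pred": [[0, 1, 1], [0, 0, 0]]})
    print(f"{defect_fraction(body):.2%} of pixels flagged as defective")
```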

Before we merge the dev branch into the main branch, we would need to thoroughly test and monitor the application in the development environment. A good test could be duplicating real image requests to the dev endpoint for some period of time and assessing the quality of the returned segmentation masks.
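One possible way to assess the duplicated requests is to compare each mask from the dev endpoint with a reference mask for the same request (for example, the one returned by the current production model) and flag the pairs that disagree. The sketch below uses intersection over union (IoU) for this. Comparing against production, the 0.8 threshold, and the names `mask_iou` and `flag_disagreements` are our assumptions for illustration, not part of the original setup.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two equally sized binary masks (nested lists of 0/1)."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks agree perfectly, so treat them as IoU = 1.
    return intersection / union if union else 1.0


def flag_disagreements(pairs, threshold=0.8):
    """Return ids of requests whose two masks overlap less than the threshold."""
    return [req_id for req_id, a, b in pairs if mask_iou(a, b) < threshold]


if __name__ == "__main__":
    # Hand-written toy masks: (request id, reference mask, dev mask).
    pairs = [
        ("req-1", [[1, 1], [0, 0]], [[1, 0], [0, 0]]),
        ("req-2", [[0, 1], [0, 1]], [[0, 1], [0, 1]]),
    ]
    print(flag_disagreements(pairs))
```

Flagged requests are good candidates for manual inspection before the dev branch is merged.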

3. Workflow for deploying to the production environment


If there are no issues and we are confident in the quality of the new model, we can merge the development branch into the main branch of our repository. This triggers the third CI/CD workflow, which deploys the code from the main branch to the production API. It looks identical to the deployment into the development environment, except that the application is now deployed to the production endpoint (note the -prod suffix).


In this series of posts (see Part 1 and Part 2), we described how we addressed the problem of building a Computer Vision Web API for defect detection. We’ve chosen this approach because it addresses the common challenges that are shared across many CV projects: how to version datasets that consist of a large number of small- to medium-sized files; how to avoid triggering long-running stages of an ML pipeline when it’s not needed for reproducibility; how to run model training jobs on the cloud infrastructure without having to provision and manage everything yourself; and, finally, how to track progress in key metrics when you run many ML experiments.

We've talked about the following:

  • Common difficulties when building a Computer Vision Web API for defect detection
  • Pros and cons of exploratory work in Jupyter Notebooks
  • Versioning data in remote storage with DVC
  • Moving and refactoring the code from Jupyter Notebooks into DVC pipeline stages
  • Experiment management and versioning
  • Visualization of experiments and collaboration in Iterative Studio
  • Remote experiments, CI/CD, and production deployment (this post)

What to Try Next

Missed the previous parts of this post? See Part 1: Data Versioning and ML Pipelines and Part 2: Local Experiments.

  • Reproduce this solution by setting your own configs, tokens, and access keys for GitHub, AWS, and Heroku
  • Add a check to merge PRs automatically if the metrics have improved
  • Add a few simple unit tests and insert them into CML workflow files so they run before reproducing the pipeline
  • Apply this approach to a different Computer Vision problem using a different dataset or different problem type (image classification, object detection, etc.)
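As a starting point for the automatic-merge idea above, the check could be a small function that compares the metrics of the base branch with those of the PR. This is a minimal sketch under our own assumptions: the function name `metrics_improved`, the metric names, and the values are hypothetical, and actually merging the PR would still have to be wired into the CI/CD workflow.

```python
def metrics_improved(baseline, candidate, higher_is_better, min_delta=0.0):
    """Return True only if every tracked metric improved by more than min_delta."""
    for name, higher in higher_is_better.items():
        delta = candidate[name] - baseline[name]
        if not higher:
            # For metrics such as a loss, a decrease counts as an improvement.
            delta = -delta
        if delta <= min_delta:
            return False
    return True


if __name__ == "__main__":
    # Made-up metric names and values, for illustration only.
    base = {"dice": 0.80, "loss": 0.20}
    pr = {"dice": 0.85, "loss": 0.15}
    directions = {"dice": True, "loss": False}
    print("merge" if metrics_improved(base, pr, directions) else "keep experimenting")
```

Requiring every tracked metric to improve is a deliberately strict rule; a looser policy (for example, one primary metric plus tolerances on the others) may suit some projects better.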