
Transforming a Jupyter Notebook into a Reproducible Pipeline for Experiments with DVC

This blog post is an adaptation of Rob De Wit’s presentation on the subject using his Pokémon Generator project at PyData USA 2023. You can find the video here and the repo of the project here.

  • Rob de Wit
  • October 31, 2025 · 6 min read

Learn how to transform your Jupyter Notebook prototype into a production-ready DVC pipeline.

When we experiment with machine learning models, it’s easy to get lost in the cycle of trying new parameters, swapping datasets, or adjusting architectures. That’s how progress is made, but without structure, reproducibility, and tracking, you risk losing valuable results or being unable to explain why a model worked (or failed).

In this post, I use a Pokémon generator I created with LoRA (Low-Rank Adaptation) to demonstrate how I approach turning one-off prototypes into structured, reproducible pipelines using versioned data, parameters, and experiments.

Pokémon image generated with LoRA

Why Reproducibility Matters

Reproducibility is the backbone of science, and machine learning is no different. A reproducible experiment means:

  • The same combination of data, code, and parameters produces the same result.
  • You can trace back decisions: which dataset, which hyperparameters, which preprocessing steps.
  • Collaboration becomes achievable. Your colleagues (and your future self!) can understand and build upon your work.

When I worked on earlier projects, we often had to reconstruct models after the fact, trying to remember what went into training them. That experience convinced me that reproducibility should be built into every pipeline from day one. That’s why I treat every experiment as a deterministic combination of code + data + parameters, and I build pipelines that make this explicit.

Moving from Jupyter Notebook to a Reproducible Pipeline

Jupyter notebooks are a great tool for prototyping data science projects, but they are not the best choice when you need to reproduce your results. Much of their appeal lies in how easily you can change cells and re-run sections, visualize data inline with code, and share analysis with narrative text. On the flip side, those same conveniences can break down your ability to accurately reproduce results. Notebooks are also challenging to test, to scale, and to manage dependencies for. So how can we set up our pipeline for reproducible success?

Git plus DVC

Enter DVC

If you’ve ever tried to manage large files with Git, you know that Git by itself is not sufficient. DVC operates like Git, but for large data, models, and your machine learning experimentation process, versioning everything along with the code. Let’s see how this works under the hood.
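Getting started is straightforward: DVC installs as a Python package and initializes inside an existing Git repository. A minimal setup sketch:

$ pip install dvc
$ git init
$ dvc init
$ git commit -m "Initialize DVC"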

How DVC Tracks Your Data

DVC dataset tracking

In your Git repository, you have a main branch with commits, and alongside it a dataset: in this case, a dataset of Pokémon images.

DVC metadata

Because image data is large, we do not want to keep it in Git, so DVC replaces it with a metadata file. The metadata for the dataset contains a hash, the total size, the number of files, and other details.
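For example, the metadata file tracking the image directory is a small YAML file along these lines (the hash, size, and file count here are illustrative):

# data/external/pokemon.dvc
outs:
- md5: 22a1a2931c8370d3aeedd7183606fd7f.dir
  size: 64128504
  nfiles: 837
  path: pokemon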

DVC dataset hash

The hash in Git points to the .dvc/cache, which is where the physical images are actually stored on your file system.

DVC new dataset hash

If you create another commit with a different dataset (noted by a different font on the left in the Git repo), a new hash will point to the new dataset in the .dvc/cache. In this case, one image was removed and one was added, with two staying the same.
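In practice, this versioning happens with dvc add: DVC moves the data into its cache, writes the metadata file, and leaves it to Git to track that file. The paths here follow this project’s layout:

$ dvc add data/external/pokemon
$ git add data/external/pokemon.dvc data/external/.gitignore
$ git commit -m "Update Pokémon image dataset"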

How a DVC Pipeline Works

Below, the sections of the Jupyter Notebook appear on the left. Each of these stages produces the outputs listed on the right.

Machine learning stages and outputs

Now that these outputs are specified, they can be used as dependencies by downstream stages.

Pipeline dependencies

So if the train_lora stage depends on the processed images, DVC ensures the stage only runs when there are new images in the processed directory. Conversely, if we change only the train_lora stage, DVC will not re-run any of the earlier, unchanged stages, saving you development time.
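For example, after changing only the training stage, a dvc repro run will look roughly like this (illustrative, abbreviated output):

$ dvc repro
Stage 'scrape_pokemon_images' didn't change, skipping
Stage 'resize_pokemon_images' didn't change, skipping
Running stage 'train_lora':
...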

How Experiment Tracking Works with Git and DVC

In addition to your data and pipeline, DVC can version your experiments together with Git: DVC handles the larger files, while Git handles the smaller ones.

Experiment tracking with Git and DVC

All of these pieces together represent an experiment and can be recorded as a Git commit with a hash. That way, the experiment and all its modifications can be reproduced using a git checkout and a dvc checkout of that hash (see the experiment hash noted at the bottom).

Each experiment can receive a hash
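Concretely, restoring a past experiment takes just two commands, with <commit-hash> standing in for the hash you recorded:

$ git checkout <commit-hash>
$ dvc checkout

The git checkout restores the code, parameters, and metadata files; the dvc checkout then pulls the matching data and models out of the DVC cache.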

Converting from a Jupyter Notebook to a DVC Project

We will set up our Pokémon generator as a DVC project.

Building the Pipeline

Here’s the approach I’ve taken to bring structure into experimentation:

  1. Start with a Base Model. Use an off-the-shelf model as your foundation. Fine-tune it, adapt it, and make it your own, but always know what version you started from.
  2. Track Everything. Every dataset, parameter, and code change should be versioned. We can use DVC for this. Think of it like Git for your machine learning workflow: commits that point not just to code, but to data and model states.
  3. Modularize the Workflow. Break experiments into stages: data prep, training, evaluation, etc. That way, you can rerun only what changes instead of starting from scratch every time.
  4. Run Reproducible Experiments. Each experiment should be captured so you can roll back, compare results, and build confidence in the best-performing model.
  5. Move Toward Production. Once an experiment proves itself, package it into a pipeline that can run with a single command. That pipeline is what bridges the gap between “something interesting in a notebook” and “a reliable system in production.”

Step 1: Define the Pipeline

Start by breaking your workflow into stages. For example, in this project the dvc.yaml looks like this:

stages:
  set_up_diffusers:
    cmd: |
      git clone --depth 1 --branch v0.14.0 https://github.com/huggingface/diffusers.git diffusers
      pip3.10 install -r "diffusers/examples/dreambooth/requirements.txt"
      accelerate config default
    outs:
      - diffusers:
          cache: false
  scrape_pokemon_images:
    cmd: python3 src/scrape_pokemon_images.py --params params.yaml
    deps:
      - src/scrape_pokemon_images.py
    outs:
      - data/external/pokemon
  download_pokemon_stats:
    cmd:
      kaggle datasets download -d brdata/complete-pokemon-dataset-gen-iiv -f
      Pokedex_Cleaned.csv -p data/external/
    outs:
      - data/external/Pokedex_Cleaned.csv
  resize_pokemon_images:
    cmd: python3 src/resize_pokemon_images.py --params params.yaml
    deps:
      - src/resize_pokemon_images.py
      - data/external/pokemon
      - data/external/Pokedex_Cleaned.csv
    outs:
      - data/processed/pokemon
    params:
      - base
      - data_etl
  train_lora:
    cmd: >
      accelerate launch --mps
      "diffusers/examples/dreambooth/train_dreambooth_lora.py"
      --pretrained_model_name_or_path=${train_lora.base_model}
      --instance_data_dir=${data_etl.train_data_path}
      --output_dir=${train_lora.lora_path} --instance_prompt='a pkmnlora
      pokemon' --resolution=512 --train_batch_size=1
      --gradient_accumulation_steps=1 --checkpointing_steps=500
      --learning_rate=${train_lora.learning_rate} --lr_scheduler='cosine'
      --lr_warmup_steps=0 --max_train_steps=${train_lora.max_train_steps}
      --seed=${train_lora.seed}
    deps:
      - diffusers
      - data/processed/pokemon
    outs:
      - models/pkmnlora
    params:
      - data_etl
      - train_lora
  generate_text_to_image:
    cmd: python3 src/generate_text_to_image.py --params params.yaml
    outs:
      - outputs
    deps:
      - src/generate_text_to_image.py
      - models/pkmnlora
    params:
      - train_lora
      - generate_text_to_image

Each stage declares:

  • Command (cmd) – what to run
  • Dependencies (deps) – inputs the stage needs
  • Outputs (outs) – files the stage produces

This way, when you change a dependency (e.g., a new dataset or updated parameter), only the affected stages re-run.

Step 2: Track Parameters

Instead of hardcoding hyperparameters, keep them in a structured file like params.yaml:

base:
  train_pokemon_type: all

data_etl:
  external_data_path: 'data/external/'
  train_data_path: 'data/processed/pokemon'

train_lora:
  seed: 1337
  model_directory: 'models'
  base_model: 'runwayml/stable-diffusion-v1-5'
  lora_path: 'models/pkmnlora'
  learning_rate: 0.0001
  max_train_steps: 15000

generate_text_to_image:
  seed: 3000
  num_inference_steps: 35
  batch_size: 1
  batch_count: 20
  prompt: 'a pkmnlora pokemon'
  negative_prompt: ''
  output_directory: 'outputs'
  use_lora: True
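Inside the pipeline scripts, these values are then just a few lines of Python away (a minimal sketch using PyYAML; the actual scripts in the repo may structure this differently):

import argparse
import yaml

# Each stage is invoked with --params params.yaml (see dvc.yaml above)
parser = argparse.ArgumentParser()
parser.add_argument("--params", default="params.yaml")
args = parser.parse_args()

# Load the structured parameters and pick out this stage's section
with open(args.params) as f:
    params = yaml.safe_load(f)

learning_rate = params["train_lora"]["learning_rate"]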

Now you can run controlled experiments:

$ dvc exp run -S train_lora.learning_rate=0.01

This will execute the pipeline with the updated parameter, track the run, and save results.
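You can also queue several values and run them as one batch (these flags are current as of DVC 2.x; newer releases offer a dvc queue command as well):

$ dvc exp run --queue -S train_lora.learning_rate=0.0001
$ dvc exp run --queue -S train_lora.learning_rate=0.001
$ dvc exp run --run-all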

Step 3: Track Experiments

For this Pokémon project, comparing runs is less relevant because the results are images that have to be graded subjectively. But in projects where you track metrics, with pipelines defined and parameters externalized, you can compare experiments systematically:

$ dvc exp show

Example output:

Experiment   train.learning_rate   train.epochs   Accuracy   Loss
baseline     0.001                 10             0.82       0.41
exp-1234     0.01                  10             0.85       0.37
exp-5678     0.001                 20             0.84       0.39

This makes it easy to see how parameter changes affect performance—without losing reproducibility.
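When one run stands out, dvc exp apply restores its parameters and outputs to your workspace so you can commit it (exp-1234 here is the illustrative name from the table above):

$ dvc exp apply exp-1234
$ git add .
$ git commit -m "Promote best experiment"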

Step 4: Move Toward Production

Once you’re confident in a pipeline:

  1. Lock the configuration – commit your dvc.yaml and params.yaml.
  2. Version your data – every dataset version is tracked (no guessing which CSV was used).
  3. Promote a model – move the best checkpoint into a production/ folder or model registry.

Then your entire workflow can be reproduced with a single command:
$ dvc repro

That runs the whole pipeline—data prep, training, evaluation—with the exact same inputs and parameters.
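To share all of this beyond your machine, add a DVC remote and push; collaborators can then pull the exact data and models behind any commit (the S3 bucket here is a placeholder, and DVC supports many other storage backends):

$ dvc remote add -d storage s3://my-bucket/dvc-cache
$ dvc push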

Lessons Learned

  • Reproducibility = productivity. You spend less time debugging “mystery results.”
  • Experiment tracking is collaborative. Colleagues can see exactly what you tried, what worked, and what didn’t.
  • Pipelines scale. What starts as a notebook prototype can evolve into a production-ready workflow.

Final Thoughts

Experimentation will always be messy—but pipelines don’t have to be. By structuring workflows into reproducible pipelines, you get the freedom to explore while ensuring you can always reproduce and explain your results. If you’d like to try this yourself, check out the example pipeline repo and the docs for more info on building workflows specific to your project.


