params

Contains a command to show changes in parameters: diff.

Synopsis

usage: dvc params [-h] [-q | -v] {diff} ...

positional arguments:
  COMMAND
    diff         Show changes in params between commits in the
                 DVC repository, or between a commit and the workspace.

Description

To track the parameters and hyperparameters associated with machine learning experiments in DVC projects, DVC provides a different type of dependency: parameters. They usually have simple names like epochs, learning-rate, batch_size, etc.

To start tracking parameters, list them under the params field of dvc.yaml stages (manually or with the -p/--params option of dvc run). For example:

stages:
  learn:
    cmd: ./deep.py
    params:
      - epochs
      - tuning.learning-rate
      - myparams.toml:
          - batch_size
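To make the structure of the params field concrete, here is a minimal sketch that groups its entries by the file they refer to. It assumes the list has already been parsed from dvc.yaml into Python objects; the group_params helper is hypothetical and is not DVC's own code.

```python
DEFAULT_PARAMS_FILE = "params.yaml"


def group_params(entries):
    """Map each params file to the list of parameter names requested from it."""
    grouped = {}
    for entry in entries:
        if isinstance(entry, dict):
            # A mapping such as {"myparams.toml": ["batch_size"]} names a custom file.
            for path, names in entry.items():
                grouped.setdefault(path, []).extend(names)
        else:
            # A bare string refers to the default params file.
            grouped.setdefault(DEFAULT_PARAMS_FILE, []).append(entry)
    return grouped


# The params field of the "learn" stage above, as it would look once parsed.
params_field = ["epochs", "tuning.learning-rate", {"myparams.toml": ["batch_size"]}]
grouped = group_params(params_field)
print(grouped)
```

Bare names end up under params.yaml, while batch_size is attached to myparams.toml.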

In contrast to a regular dependency, a parameter dependency is not a file or directory. Instead, it consists of a parameter name (or key) in a parameters file, where the parameter value should be found. This allows you to define stage dependencies more granularly: changes to other parts of the params file will not affect the stage. Parameter dependencies also prevent situations where several stages share a regular dependency (e.g. a config file), and any change in it invalidates all of them (see dvc status), causing unnecessary re-executions upon dvc repro.

The default parameters file name is params.yaml, but any other YAML 1.2, JSON, TOML, or Python files can be used in addition (listed under params: with a sub-list of param names, as shown in the sample above). These files are typically written manually (or they can be generated) and they can be versioned directly with Git.

Parameter values should be organized in tree-like hierarchies (dictionaries) inside params files (see Examples). DVC will interpret param names as the tree path to find those values. Supported types are: string, integer, float, and arrays (groups of params). Note that DVC does not ascribe any specific meaning to these values.
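The idea of a param name acting as a tree path can be sketched in a few lines. This is an illustration only, assuming the params file has already been loaded into nested dictionaries; get_param is a hypothetical helper, not a DVC function.

```python
def get_param(params, name):
    """Follow a dotted name such as "train.epochs" through nested dictionaries."""
    value = params
    for key in name.split("."):
        value = value[key]  # raises KeyError if the path does not exist
    return value


# Values taken from the params.yaml used in the Examples section.
params = {
    "lr": 0.0041,
    "train": {"epochs": 70, "layers": 9},
}

epochs = get_param(params, "train.epochs")  # a single value
train_group = get_param(params, "train")    # a whole group of params
print(epochs, train_group)
```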

DVC saves parameter names and values to dvc.lock in order to track them over time. They will be compared to the latest params files to determine if the stage is outdated upon dvc repro (or dvc status).
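The comparison described above can be pictured roughly as follows. This is a simplified sketch under stated assumptions, not DVC's implementation: locked stands for the names and values recorded in dvc.lock, current for the freshly loaded params file, and changed_params, lookup and MISSING are hypothetical names.

```python
from functools import reduce

MISSING = object()  # marks a param that no longer exists in the params file


def lookup(params, name):
    """Resolve a dotted name through nested dictionaries."""
    return reduce(lambda node, key: node[key], name.split("."), params)


def changed_params(locked, current):
    """Return {name: (locked_value, current_value)} for tracked params that differ."""
    changes = {}
    for name, old in locked.items():
        try:
            new = lookup(current, name)
        except (KeyError, TypeError):
            new = MISSING
        if new != old:
            changes[name] = (old, new)
    return changes


locked = {"lr": 0.0041, "train.epochs": 70}
current = {
    "lr": 0.0041,
    "train": {"epochs": 100, "layers": 9},
    "process": {"thresh": 0.98},
}
changes = changed_params(locked, current)
print(changes)
```

Only train.epochs is reported. In this sketch, train.layers and the process group are not tracked, so edits to them would not mark the stage as outdated.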

Note that DVC does not pass the parameter values to stage commands. Each command executed by DVC has to load and parse the parameters file itself.

The dvc params diff command is available to show parameter changes, displaying their current and previous values.

💡 Parameters can also be used for templating dvc.yaml itself.

Options

  • -h, --help - prints the usage/help message, and exits.
  • -q, --quiet - do not write anything to standard output.
  • -v, --verbose - displays detailed tracing information.

Examples

First, let's create a simple parameters file in YAML format, using the default file name params.yaml:

lr: 0.0041

train:
  epochs: 70
  layers: 9

process:
  thresh: 0.98
  bow: 15000

Using dvc run, define a stage that depends on params lr, layers, and epochs from the params file above. Full paths should be used to specify layers and epochs from the train group:

$ dvc run -n train -d train.py -d users.csv -o model.pkl \
          -p lr,train.epochs,train.layers \
          python train.py

Note that we could use the same parameter addressing with JSON, TOML, or Python parameters files.

The train.py script will have some code to parse and load the needed parameters. For example:

import yaml

with open("params.yaml", 'r') as fd:
    params = yaml.safe_load(fd)

lr = params['lr']
epochs = params['train']['epochs']
layers = params['train']['layers']
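The same loading logic carries over to the other supported formats. As a minimal sketch, assuming the same values were stored in a hypothetical params.json instead of params.yaml, the script could read them with the standard library (load_train_params is an illustrative name):

```python
import json


def load_train_params(path="params.json"):
    """Read lr, epochs and layers from a JSON params file with the same layout."""
    with open(path, "r") as fd:
        params = json.load(fd)
    return params["lr"], params["train"]["epochs"], params["train"]["layers"]
```

A script would then call lr, epochs, layers = load_train_params() before training.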

Each parameter is now listed in dvc.yaml and saved to dvc.lock along with its value. These values are compared to the params files when dvc repro is used, to determine whether the parameter dependency has changed.

# dvc.yaml
stages:
  train:
    cmd: python train.py
    deps:
      - users.csv
    params:
      - lr
      - train.epochs
      - train.layers
    outs:
      - model.pkl

Alternatively, the entire group of parameters train can be referenced, instead of specifying each of the params separately:

$ dvc run -n train -d train.py -d users.csv -o model.pkl \
          -p lr,train \
          python train.py

# in dvc.yaml
params:
  - lr
  - train

In the examples above, the default parameters file name params.yaml was used. Note that this file name can be redefined using a prefix in the -p argument of dvc run. In our case:

$ dvc run -n train -d train.py -d logs/ -o users.csv -f \
          -p parse_params.yaml:threshold,classes_num \
          python train.py
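The shape of the -p value can be illustrated with a small parser. This is only a sketch of the syntax shown above, not DVC's argument handling. It assumes the file path itself contains no colon, and split_params_arg is a hypothetical name.

```python
DEFAULT_PARAMS_FILE = "params.yaml"


def split_params_arg(arg):
    """Split a -p value into (params file, list of parameter names)."""
    path, sep, names = arg.partition(":")
    if not sep:
        # No "file:" prefix, so the names refer to the default params file.
        path, names = DEFAULT_PARAMS_FILE, arg
    return path, names.split(",")


default_case = split_params_arg("lr,train.epochs,train.layers")
custom_case = split_params_arg("parse_params.yaml:threshold,classes_num")
print(default_case)
print(custom_case)
```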

Examples: Print all parameters

Following the previous example, we can use dvc params diff to list all of the param values available in the workspace:

$ dvc params diff
Path         Param           Old    New
params.yaml  lr              —      0.0041
params.yaml  process.bow     —      15000
params.yaml  process.thresh  —      0.98
params.yaml  train.epochs    —      70
params.yaml  train.layers    —      9

This command shows the difference in parameters between the workspace and the last committed version of the params.yaml file. In our example there's no previous version, which is why all Old values are —.
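To see where the dotted names in the Param column come from, the following sketch flattens a nested params structure and compares it with an empty previous version. It is an approximation for illustration only. flatten and diff_rows are hypothetical helpers, and a missing value is represented here by None.

```python
def flatten(params, prefix=""):
    """Turn nested dictionaries into a flat {"group.name": value} mapping."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def diff_rows(old, new):
    """Return sorted (param, old_value, new_value) rows for values that differ."""
    old_flat, new_flat = flatten(old), flatten(new)
    rows = []
    for name in sorted(old_flat.keys() | new_flat.keys()):
        before, after = old_flat.get(name), new_flat.get(name)
        if before != after:
            rows.append((name, before, after))
    return rows


workspace = {
    "lr": 0.0041,
    "train": {"epochs": 70, "layers": 9},
    "process": {"thresh": 0.98, "bow": 15000},
}
rows = diff_rows({}, workspace)  # no committed version yet
for name, before, after in rows:
    print(name, before, after)
```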

Examples: Python parameters file

Consider this Python parameters file named params.py:

# All standard variable types are supported.
BOOL = True
INT = 5
FLOAT = 0.001
STR = 'abc'
DICT = {'a': 1, 'b': 2}
LIST = [1, 2, 3]
SET = {4, 5, 6}
TUPLE = (10, 100)
NONE = None

# DVC can retrieve class constants and variables defined in __init__
class TrainConfig:

    EPOCHS = 70

    def __init__(self):
        self.layers = 5
        self.layers = 9  # TrainConfig.layers param will be 9
        self.sum = 1 + 2  # Will NOT be found due to the expression
        bar = 3  # Will NOT be found since it's locally scoped


class TestConfig:

    TEST_DIR = 'path'
    METRICS = ['metric']
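One way to read literal values out of such a file without executing it is to walk its syntax tree. The sketch below uses Python's ast module on a shortened version of the file. It is an assumption-laden illustration, not necessarily how DVC does it: it covers only module-level and class-level assignments (not attributes set in __init__), and literal_assignments is a hypothetical name.

```python
import ast

SOURCE = '''
INT = 5
LIST = [1, 2, 3]
SUM = 1 + 2  # an expression, not a literal

class TrainConfig:
    EPOCHS = 70
'''


def literal_assignments(body):
    """Collect NAME = <literal> assignments, recursing into class bodies."""
    found = {}
    for node in body:
        if isinstance(node, ast.ClassDef):
            found[node.name] = literal_assignments(node.body)
        elif (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        ):
            try:
                found[node.targets[0].id] = ast.literal_eval(node.value)
            except (ValueError, TypeError, SyntaxError):
                pass  # not a plain literal (e.g. 1 + 2), so it is skipped
    return found


py_params = literal_assignments(ast.parse(SOURCE).body)
print(py_params)
```

Because only literals are evaluated, SUM is skipped in this sketch, in the same spirit as the self.sum comment in the file above.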

The following stage depends on params BOOL, INT, as well as TrainConfig's EPOCHS and layers:

$ dvc run -n train -d train.py -d users.csv -o model.pkl \
          -p params.py:BOOL,INT,TrainConfig.EPOCHS,TrainConfig.layers \
          python train.py

Resulting dvc.yaml and dvc.lock files (notice the params lists):

stages:
  train:
    cmd: python train.py
    deps:
      - users.csv
    params:
      - params.py:
          - BOOL
          - INT
          - TrainConfig.EPOCHS
          - TrainConfig.layers
    outs:
      - model.pkl

# dvc.lock
schema: '2.0'
stages:
  train:
    cmd: python train.py
    deps:
      - path: users.csv
        md5: 23be4307b23dcd740763d5fc67993f11
    params:
      params.py:
        INT: 5
        BOOL: true
        TrainConfig.EPOCHS: 70
        TrainConfig.layers: 9
    outs:
      - path: model.pkl
        md5: 1c06b4756f08203cc496e4061b1e7d67

Alternatively, the entire TestConfig params group (class) can be referenced (dictionaries are also supported), instead of the parameters in it:

$ dvc run -n train -d train.py -d users.csv -o model.pkl \
          -p params.py:BOOL,INT,TestConfig \
          python train.py