
DVC Configuration

Once initialized in a project, DVC populates its installation directory (.dvc/) with internal files, which include .dvc/config, the default configuration file.

Config files can be composed manually (or programmatically), or managed with the helper command dvc config.

Config file locations

.dvc/config is meant to be tracked by Git and should not contain sensitive user info or secrets (passwords, SSH keys, etc).

DVC supports saving configuration outside of the repository, either in a Git-ignored file alongside the regular config file or in other places in your file system. These locations and their loading priority are detailed below:

| Priority | Type | macOS location | Linux location (typical*) | Windows location |
| --- | --- | --- | --- | --- |
| 1 | Local | .dvc/config.local | Same | Same |
| 2 | Project (default) | .dvc/config | Same | Same |
| 3 | Global | $HOME/Library/Application\ Support/dvc/config | $HOME/.config/dvc/config | %LocalAppData%\iterative\dvc\config |
| 4 | System | /Library/Application\ Support/dvc/config | /etc/xdg/dvc/config | %AllUsersProfile%\Application Data\iterative\dvc\config |

* For Linux, the global file may be found in $XDG_CONFIG_HOME, and the system file in $XDG_CONFIG_DIRS[0], if those env vars are defined.
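The Linux lookup described in the footnote can be sketched in Python as follows. This is only an illustration, not DVC's own code: the function name `linux_config_paths` is hypothetical, and it assumes that `$XDG_CONFIG_DIRS` is a colon-separated list (as in the XDG Base Directory convention) whose first entry is the one meant by `$XDG_CONFIG_DIRS[0]`.

```python
import os
from pathlib import PurePosixPath


def linux_config_paths(env=None):
    """Return (global_path, system_path) for the DVC config files on Linux.

    `env` is a mapping of environment variables (defaults to os.environ).
    """
    env = os.environ if env is None else env

    # Global file: $XDG_CONFIG_HOME if defined, otherwise $HOME/.config
    xdg_home = env.get("XDG_CONFIG_HOME")
    if xdg_home:
        global_base = PurePosixPath(xdg_home)
    else:
        global_base = PurePosixPath(env.get("HOME", "")) / ".config"

    # System file: first entry of $XDG_CONFIG_DIRS if defined, otherwise /etc/xdg
    # (assumption: the variable is a colon-separated list)
    xdg_dirs = env.get("XDG_CONFIG_DIRS") or "/etc/xdg"
    system_base = PurePosixPath(xdg_dirs.split(":")[0])

    return global_base / "dvc" / "config", system_base / "dvc" / "config"
```

With neither variable defined, the result matches the "typical" Linux column of the table.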

See also dvc config flags --local, --global, and --system.
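To illustrate the loading priority, here is a minimal Python sketch that merges the four levels so that higher-priority files win. The function name `merge_configs` and the plain-dict representation are hypothetical, and the sketch assumes that values are overridden option by option within each section; the text above only states the priority order, not the exact merge rules.

```python
def merge_configs(system, global_, project, local):
    """Merge config levels; later arguments have higher priority.

    Each level is a dict of {section: {option: value}}.
    """
    merged = {}
    # Apply from lowest priority (system) to highest (local), so that
    # higher-priority levels overwrite individual options.
    for level in (system, global_, project, local):
        for section, options in level.items():
            merged.setdefault(section, {}).update(options)
    return merged


project = {"core": {"remote": "temp", "analytics": "true"}}
local = {"core": {"analytics": "false"}}
effective = merge_configs({}, {}, project, local)
```

Here `effective["core"]` keeps `remote` from the project file, while `analytics` takes the value from the local file.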

Configuration sections

The following config sections are written by dvc config to the appropriate config file (.dvc/config by default), each supporting different config options:

  • core.remote - name of the default remote storage

  • core.interactive - whether to always ask for confirmation before reproducing each stage in dvc repro. (Normally, this behavior requires using the -i option of that command.) Accepts values: true and false.

  • core.analytics - used to turn off anonymized usage statistics. Accepts values true (default) and false.

  • core.checksum_jobs - number of threads for computing file hashes. Accepts positive integers. The default value is max(1, min(4, cpu_count() // 2)).

  • core.hardlink_lock - use hardlink file locks instead of the default ones, based on flock (i.e. project lock file .dvc/lock). Accepts values true and false (default). Useful when the DVC project is on a file system that doesn't properly support file locking (e.g. NFS v3 and older).

  • core.no_scm - tells DVC not to expect or integrate with Git (even if the project is initialized inside a Git repo). Accepts values true and false (default). Set with the --no-scm option of dvc init.

  • core.check_update - disable/enable DVC's automatic update checks, which notify the user when a new version is available. Accepts values true (default) and false.

  • core.autostage - if enabled, DVC will automatically stage (git add) DVC files created or modified by DVC commands. The files will not be committed. Accepts values true and false (default).
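As a worked example of the core.checksum_jobs default given above, the formula max(1, min(4, cpu_count() // 2)) can be evaluated in Python. The helper name `default_checksum_jobs` is hypothetical, and using `os.cpu_count()` (with a fallback of 1 when it returns None) is an assumption about where the CPU count comes from.

```python
import os


def default_checksum_jobs(cpu_count=None):
    """Evaluate max(1, min(4, cpu_count // 2)) for a given CPU count."""
    if cpu_count is None:
        # Assumption: os.cpu_count() may return None, so fall back to 1
        cpu_count = os.cpu_count() or 1
    return max(1, min(4, cpu_count // 2))


# CPU count -> default number of hashing threads
table = {n: default_checksum_jobs(n) for n in (1, 2, 4, 8, 64)}
```

The default is therefore always between 1 and 4: half the CPU count, capped at 4, and never below 1.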

Unlike most other sections, a config file may contain more than one remote section. Each one requires a unique name and a url value. They can also specify jobs, verify, and many platform-specific key/value pairs such as port and password.

See Remote Storage Configuration for more details.

For example, the following config file defines a temp remote in the local file system (located in /tmp/dvcstore) and marks it as the default (via the core section):

['remote "temp"']
    url = /tmp/dvcstore
[core]
    remote = temp

  • cache.dir - set/unset cache directory location. A valid value is either an absolute path or a path relative to the config file location. The default value is cache, which resolves to .dvc/cache (relative to the project config file location).

    See also the helper command dvc cache dir to intuitively set this config option, properly transforming paths relative to the current working directory into paths relative to the config file location.

  • cache.type - link type that DVC should use to link data files from cache to the workspace. Possible values: reflink, symlink, hardlink, copy, or an ordered combination of those separated by commas (e.g. reflink,hardlink,copy). Default: reflink,copy.

    There are pros and cons to different link types. Refer to File link types for a full explanation of each one.

    If you set cache.type to hardlink or symlink, manually modifying tracked data files in the workspace would corrupt the cache. To prevent this, DVC automatically protects those kinds of links (making them read-only). Use dvc unprotect to be able to modify them safely.

    To apply changes to this config option in the workspace, restore all file links/copies from cache with dvc checkout --relink.

  • cache.slow_link_warning - used to turn off the warnings about having a slow cache link type. These warnings are shown by dvc pull and dvc checkout when linking files takes longer than usual, to remind users that there are faster cache link types available than the defaults (reflink,copy; see cache.type). Accepts values true and false.

    These warnings are automatically turned off when cache.type is manually set.

  • cache.shared - permissions for newly created or downloaded cache files and directories. The only accepted value right now is group, which makes DVC use 664 (rw-rw-r--) for files and 775 (rwxrwxr-x) for directories. This is useful when sharing a cache among projects. The default permissions for cache files are system dependent. In Linux and macOS, for example, they're determined using os.umask.
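The ordered combination accepted by cache.type can be read as a fallback chain: try the first link type, and move on to the next one if it fails. The Python sketch below illustrates that idea only; it is not DVC's implementation. The names `link_from_cache` and `LINKERS` are hypothetical, and because the Python standard library has no portable reflink call, the sketch treats reflink as always unsupported.

```python
import os
import shutil


def _reflink(src, dst):
    # Assumption for this sketch: no reflink support is available, so the
    # caller always falls through to the next link type.
    raise OSError("reflink is not supported in this sketch")


LINKERS = {
    "reflink": _reflink,
    "hardlink": os.link,
    "symlink": os.symlink,
    "copy": shutil.copyfile,
}


def link_from_cache(src, dst, cache_type="reflink,copy"):
    """Try each link type in order; return the name of the one that worked."""
    for name in cache_type.split(","):
        name = name.strip()
        try:
            LINKERS[name](src, dst)
            return name
        except OSError:
            continue  # this link type failed; try the next one
    raise OSError(f"none of the link types worked: {cache_type}")
```

With the default value reflink,copy, this sketch always ends up copying, which mirrors why a slow-link warning may suggest configuring a faster type on file systems without reflink support.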

The following parameters allow setting an external cache location. A dvc remote name is used (instead of the URL) because it's often necessary to configure authentication or other connection settings, and configuring a remote is how that is done.

  • cache.local - name of a local remote to use as external cache. This will overwrite the value in cache.dir (see dvc cache dir).

  • cache.s3 - name of an Amazon S3 remote to use as external cache.

  • cache.gs - name of a Google Cloud Storage remote to use as external cache.

  • cache.ssh - name of an SSH remote to use as external cache.

  • cache.hdfs - name of an HDFS remote to use as external cache.

  • cache.webhdfs - name of an HDFS remote with WebHDFS enabled to use as external cache.

    Avoid using the same remote storage used for dvc push and dvc pull as external cache, because it may cause file hash overlaps: the hash of an external output could collide with that of a local file with different content.

The hydra section sets the defaults for experiment configuration via Hydra Composition.

  • hydra.enabled - enables Hydra config composition.
  • hydra.config_dir - location of the directory containing Hydra config groups. Defaults to conf.
  • hydra.config_name - the name of the file containing the Hydra defaults list (located inside hydra.config_dir). Defaults to config.yaml.
  • parsing.bool - Controls the templating syntax for boolean values when used in dictionary unpacking.

    Valid values are "store_true" (default) and "boolean_optional", named after Python argparse actions.

    Given the following params.yaml:

    dict:
      bool-true: true
      bool-false: false

    And corresponding dvc.yaml:

    stages:
      foo:
        cmd: python foo.py ${dict}

    When using store_true, cmd will be:

    python foo.py --bool-true

    Whereas when using boolean_optional, cmd will be:

    python foo.py --bool-true --no-bool-false

  • parsing.list - Controls the templating syntax for list values when used in dictionary unpacking.

    Valid values are "nargs" (default) and "append", named after Python argparse actions.

    Given the following params.yaml:

    dict:
      list: [1, 2, 'foo']

    And corresponding dvc.yaml:

    stages:
      foo:
        cmd: python foo.py ${dict}

    When using nargs, cmd will be:

    python foo.py --list 1 2 'foo'

    Whereas when using append, cmd will be:

    python foo.py --list 1 --list 2 --list 'foo'

  • state.row_limit - maximum number of entries in state databases. This affects the physical size of the state files, as well as the performance of certain DVC operations. The default is 10,000,000 rows. The bigger the limit, the longer the file hash history that DVC can keep, for example.

  • state.row_cleanup_quota - percentage of the state database to be deleted when it reaches the state.row_limit. The default quota is 50%. DVC removes the oldest entries (created when dvc status is used, for example).

  • state.dir - specify a custom location for the state databases (links/ and md5/ directories), by default in .dvc/tmp. This may be necessary when using DVC on NFS or other mounted volumes where SQLite encounters file permission errors.

  • index.dir - specify a custom location for the directory where remote index files will be stored, by default in .dvc/tmp/index. This may be necessary when using DVC on NFS or other mounted volumes.
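
The dictionary unpacking controlled by parsing.bool and parsing.list above can be sketched in Python to make the four documented outputs concrete. This is a rough illustration rather than DVC's templating code: the function name `unpack_dict` is hypothetical, quoting strings inside lists simply mirrors the 'foo' shown in the examples, and the rendering of other scalar values is an assumption.

```python
def unpack_dict(params, bool_mode="store_true", list_mode="nargs"):
    """Render a flat dict as command-line flags, as in ${dict} unpacking."""
    parts = []
    for key, value in params.items():
        flag = f"--{key}"
        if isinstance(value, bool):
            if value:
                parts.append(flag)
            elif bool_mode == "boolean_optional":
                parts.append(f"--no-{key}")
            # With store_true, false values produce no flag at all.
        elif isinstance(value, list):
            # Strings are single-quoted to match the documented output.
            items = [f"'{v}'" if isinstance(v, str) else str(v) for v in value]
            if list_mode == "append":
                for item in items:
                    parts += [flag, item]  # repeat the flag per item
            else:  # nargs: one flag followed by all items
                parts += [flag, *items]
        else:
            # Assumption: other scalars are rendered as "--key value".
            parts += [flag, str(value)]
    return " ".join(parts)


flags = {"bool-true": True, "bool-false": False}
cmd_store_true = "python foo.py " + unpack_dict(flags)
cmd_optional = "python foo.py " + unpack_dict(flags, bool_mode="boolean_optional")
cmd_nargs = "python foo.py " + unpack_dict({"list": [1, 2, "foo"]})
cmd_append = "python foo.py " + unpack_dict({"list": [1, 2, "foo"]}, list_mode="append")
```

The four resulting strings correspond to the cmd values shown for store_true, boolean_optional, nargs, and append respectively.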