Edit on GitHub

.dvc files

When you add a file or directory with dvc add, dvc import, or dvc import-url, a file ending with the .dvc extension ("dot DVC file") is created based on the data file name (e.g. data.xml.dvc). It contain the information needed to track the data with DVC.

They use a simple YAML format, meant to be easy to read, edit, or even created manually. Here is an example:

outs:
  - md5: a304afb96060aad90176268345e10355
    path: data.xml
    desc: Cats and dogs dataset

# Comments and user metadata are supported.
meta:
  name: 'John Doe'
  email: john@doe.com

Accepted fields

.dvc files can contain the following fields:

  • outs (always present): List of output entries (details below) that represent the files or directories tracked with DVC. Typically there is only one (but several can be added or combined manually).
  • deps: List of dependency entries (details below). Only present when dvc import or dvc import-url are used to generate this .dvc file. Typically there is only one (but several can be added manually).
  • wdir: Working directory for the outs and deps paths (relative to the .dvc file's location). It defaults to . (the file's location).
  • md5: (only for imports) MD5 hash of the import .dvc file itself.
  • meta (optional): Arbitrary metadata can be added manually with this field. Any YAML contents is supported. meta contents are ignored by DVC, but they can be meaningful for user processes that read .dvc files.

Note that comments can be added to .dvc files using the # comment format. meta fields and # comments are preserved among executions of the dvc repro and dvc commit commands, but not when a .dvc file is overwritten by dvc add, dvc move, dvc import, or dvc import-url.

Output entries

outs fields can contain these subfields:

  • path: Path to the file or directory (relative to wdir, which defaults to the file's location)
  • md5, etag, or checksum: Hash value for the file or directory being tracked with DVC. MD5 is used for most locations (local file system and SSH); ETag for HTTP, S3, or Azure external outputs; and a special checksum for HDFS and WebHDFS.
  • size: Size of the file or directory (sum of all files).
  • nfiles: If this output is a directory, the number of files inside (recursive).
  • isexec: Whether this is an executable file. DVC preserves execute permissions upon dvc checkout and dvc pull. This has no effect on directories, or in general on Windows.
  • cache: Whether or not this file or directory is cached (true by default). See the --no-commit option of dvc add.
  • persist: Whether the output file/dir should remain in place while dvc repro runs (false by default: outputs are deleted when dvc repro starts
  • desc (optional): User description for this output (supported in metrics and plots too). This doesn't affect any DVC operations.

Dependency entries

deps fields can contain these subfields:

  • path: Path to the dependency (relative to wdir, which defaults to the file's location)
  • md5, etag, or checksum: Hash value for the file or directory being tracked with DVC. MD5 is used for most locations (local file system and SSH); ETag for HTTP, S3, or Azure external dependencies; and a special checksum for HDFS and WebHDFS. See dvc import-url for more information.
  • size: Size of the file or directory (sum of all files).
  • nfiles: If this dependency is a directory, the number of files inside (recursive).
  • repo: This entry is only for external dependencies created with dvc import, and can contains the following fields:

    • url: URL of Git repository with source DVC project
    • rev: Only present when the --rev option of dvc import is used. Specific commit hash, branch or tag name, etc. (a Git revision) used to import the dependency from.
    • rev_lock: Git commit hash of the external DVC repository at the time of importing or updating the dependency (with dvc update)
Content

๐Ÿ› Found an issue? Let us know! Or fix it:

Edit on GitHub

โ“ Have a question? Join our chat, we will help you:

Discord Chat