.dvc
filesWhen you add a file or directory with dvc add
, dvc import
, or
dvc import-url
, a file ending with the .dvc
extension ("dot DVC file") is
created based on the data file name (e.g. data.xml.dvc
). It contain the
information needed to track the data with DVC.
They use a simple YAML format, meant to be easy to read, edit, or even created manually. Here is an example:
outs:
- md5: a304afb96060aad90176268345e10355
path: data.xml
desc: Cats and dogs dataset
# Comments and user metadata are supported.
meta:
name: 'John Doe'
email: john@doe.com
.dvc
files can contain the following fields:
outs
(always present): List of output entries (details
below) that represent the files or directories tracked with DVC. Typically
there is only one (but several can be added or combined manually).deps
: List of dependency entries (details below).
Only present when dvc import
or dvc import-url
are used to generate this
.dvc
file. Typically there is only one (but several can be added manually).wdir
: Working directory for the outs
and deps
paths (relative to the
.dvc
file's location). It defaults to .
(the file's location).md5
: (only for imports) MD5 hash of the import .dvc
file
itself.meta
(optional): Arbitrary metadata can be added manually with this field.
Any YAML contents is supported. meta
contents are ignored by DVC, but they
can be meaningful for user processes that read .dvc
files.Note that comments can be added to .dvc
files using the # comment
format.
meta
fields and #
comments are preserved among executions of the dvc repro
and dvc commit
commands, but not when a .dvc
file is overwritten by
dvc add
, dvc move
, dvc import
, or dvc import-url
.
outs
fields can contain these subfields:
path
: Path to the file or directory (relative to wdir
, which defaults to
the file's location)md5
, etag
, or checksum
: Hash value for the file or directory being
tracked with DVC. MD5 is used for most locations (local file system and SSH);
ETag for
HTTP, S3, or Azure external outputs;
and a special checksum for HDFS and WebHDFS.size
: Size of the file or directory (sum of all files).nfiles
: If this output is a directory, the number of files inside
(recursive).isexec
: Whether this is an executable file. DVC preserves execute
permissions upon dvc checkout
and dvc pull
. This has no effect on
directories, or in general on Windows.cache
: Whether or not this file or directory is cached (true
by default). See the --no-commit
option of dvc add
.persist
: Whether the output file/dir should remain in place while
dvc repro
runs (false
by default: outputs are deleted when dvc repro
startsdesc
(optional): User description for this output (supported in metrics and
plots too). This doesn't affect any DVC operations.deps
fields can contain these subfields:
path
: Path to the dependency (relative to wdir
, which defaults to the
file's location)md5
, etag
, or checksum
: Hash value for the file or directory being
tracked with DVC. MD5 is used for most locations (local file system and SSH);
ETag for
HTTP, S3, or Azure external dependencies; and a special
checksum for HDFS and WebHDFS. See dvc import-url
for more information.size
: Size of the file or directory (sum of all files).nfiles
: If this dependency is a directory, the number of files inside
(recursive).repo
: This entry is only for external dependencies created with
dvc import
, and can contains the following fields:
url
: URL of Git repository with source DVC projectrev
: Only present when the --rev
option of dvc import
is used.
Specific commit hash, branch or tag name, etc. (a
Git revision) used to import the
dependency from.rev_lock
: Git commit hash of the external DVC repository at
the time of importing or updating the dependency (with dvc update
)