For the status of tracked data, see dvc data status.
Searches for changes in the existing tracked data and pipelines. In local mode,
it shows which files or directories have changed in the workspace
(and thus could be added or reproduced again). In remote mode, it reports
the differences between the cache and remote storage
(dvc push or dvc pull can be used to synchronize these).
| local | none | Comparisons are made between data files in the workspace and the corresponding files in the cache directory. |
| remote | --remote | Comparisons are made between the cache and a given DVC remote. |
| remote | --cloud | Comparisons are made between the cache and the default remote. |
The targets given to this command limit which changes are shown. It accepts
paths to tracked files or directories (including paths inside tracked
directories), .dvc files, and stage names.
The --all-branches, --all-tags, and --all-commits options enable comparing
DVC-tracked files referenced in multiple Git commits at once.
If no differences are detected, dvc status prints
"Data and pipelines are up to date." or
"Cache and remote 'myremote' are in sync." (if the --cloud or -r option is
used). If differences are detected, the changes in dependencies
and/or outputs for each stage are listed. For each item listed,
either the file name or hash is shown, along with a state description, as
follows:
- changed checksum means that the .dvc file hash has changed (e.g. someone manually edited it).
- changed deps or changed outs means that there are changes in dependencies or outputs tracked by the stage or .dvc file. Depending on the use case, commands like dvc repro or dvc exp run can be used to update the file. Possible states are:
  - new: An output is found in the workspace, but there is no corresponding file hash saved in the .dvc file.
  - modified: An output or dependency is found in the workspace, but the corresponding file hash in the .dvc file is not up to date.
  - deleted: The output or dependency is referenced in a .dvc file, but does not exist in the workspace.
  - not in cache: An output exists in the workspace, and the corresponding file hash in the .dvc file is up to date, but there is no corresponding cache file or directory.
- update available means that an import stage is outdated (the original data source has changed). The imported data can be brought to its latest version by using dvc update.
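The four workspace states listed above (new, modified, deleted, not in cache) can be read as a small decision procedure. The following is a minimal sketch of that reading, not DVC's implementation: the function name `classify_local` and its three parameters are hypothetical, and it assumes the item is referenced by a stage or .dvc file or exists in the workspace.

```python
from typing import Optional


def classify_local(
    workspace_hash: Optional[str],
    recorded_hash: Optional[str],
    in_cache: bool,
) -> str:
    """Return the local-mode state for one output (illustrative only).

    workspace_hash: hash of the item in the workspace, or None if it is absent.
    recorded_hash:  hash saved in the .dvc file, or None if none is saved yet.
    in_cache:       whether the cache holds data for the recorded hash.
    """
    if workspace_hash is None:
        # Referenced in the .dvc file but missing from the workspace.
        return "deleted"
    if recorded_hash is None:
        # Present in the workspace, but no hash has been saved yet.
        return "new"
    if workspace_hash != recorded_hash:
        # The saved hash no longer matches the workspace contents.
        return "modified"
    if not in_cache:
        # Hashes agree, but the cache has no corresponding file or directory.
        return "not in cache"
    return "up to date"


print(classify_local("abc", "abc", in_cache=False))
```

Checking for absence from the workspace first mirrors the definitions above: the other three states all require the item to exist in the workspace.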
- new means that the file/directory exists in the cache but not in remote storage.
- deleted means that the file/directory doesn't exist in the cache, but exists in remote storage.
- missing means that the file/directory exists neither in the cache nor in remote storage.
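The three remote-mode states above depend only on where a given piece of data is present. As a rough illustration (not DVC's actual code), they can be expressed as set membership tests; the function `remote_status` and its arguments `referenced`, `cache`, and `remote` are hypothetical names for sets of file hashes.

```python
from typing import Dict, Set


def remote_status(
    referenced: Set[str], cache: Set[str], remote: Set[str]
) -> Dict[str, str]:
    """Map each referenced hash that is out of sync to its remote-mode state."""
    status = {}
    for file_hash in sorted(referenced):
        in_cache = file_hash in cache
        in_remote = file_hash in remote
        if in_cache and in_remote:
            continue  # in sync, nothing to report
        if in_cache:
            status[file_hash] = "new"      # candidate for dvc push
        elif in_remote:
            status[file_hash] = "deleted"  # candidate for dvc pull
        else:
            status[file_hash] = "missing"  # nothing to retrieve
    return status


print(remote_status({"a1", "b2", "c3", "d4"}, cache={"a1", "b2"}, remote={"b2", "c3"}))
```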
For missing data, there's nothing to retrieve from storage. This can happen,
for example, in fresh DVC repository clones if the data wasn't
uploaded from the original repo, or after certain uses of
dvc gc. You can try dvc repro to regenerate the output locally, and then
dvc push to upload it to remote storage.
The dvc remote used is determined in order, based on
- --all-branches - compares cache content against all Git branches, as well as the current workspace. This basically runs the same status command in every branch of this repo. The corresponding branches are shown in the status output. Applies only if a -r remote is specified. Note that this can be combined with -T below.
- --all-tags - compares cache content against all Git tags, as well as the workspace. Note that this can be combined with -a above.
- --all-commits - compares cache content against all Git commits, as well as the workspace. This compares the cache content for the entire commit history of the project.
- --with-deps - only meaningful when specifying targets. This determines the files to check by resolving all dependencies of the targets: DVC searches backward from the targets in the corresponding pipelines. This will not show changes occurring in stages later than the targets.
- --recursive - determines the files to check status for by searching each target directory and its subdirectories for stages or .dvc files to inspect. If there are no directories among the targets, this option has no effect.
- --json - prints the command's output in easily parsable JSON format, instead of a human-readable table.
- --jobs <number> - parallelism level for DVC to access data from remote storage. This only applies when the --cloud option is used, or a --remote is given. The default value is 4 * cpu_count(). Note that the default value can be set using the jobs config option with dvc remote modify. Using more jobs may speed up the operation.
- --help - prints the usage/help message, and exits.
- --quiet - do not write anything to standard output. Exit with 0 if data and pipelines are up to date, otherwise 1.
- --verbose - displays detailed tracing information.
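The exit status described for --quiet makes the command usable as a yes/no check in scripts. Below is a minimal sketch of such a check; the helper name `is_up_to_date` and its injectable `run` parameter are assumptions made here for illustration (the latter only so the function can be exercised without DVC installed), not part of DVC.

```python
import subprocess
from typing import Any, Callable, Sequence


def is_up_to_date(
    targets: Sequence[str] = (),
    run: Callable[..., Any] = subprocess.run,
) -> bool:
    """Return True if `dvc status --quiet` exits with 0 for the given targets.

    Any non-zero exit status (including failures of the command itself)
    is treated as "not up to date".
    """
    cmd = ["dvc", "status", "--quiet", *targets]
    result = run(cmd, check=False)
    return result.returncode == 0
```

A caller might use `is_up_to_date(["model.p"])` before deciding whether to run dvc repro.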
    $ dvc status
    baz.dvc:
        changed outs:
            modified:           baz
    dofoo:
        changed deps:
            modified:           baz
        changed outs:
            modified:           foo
    dobar:
        changed deps:
            modified:           foo
        changed outs:
            deleted:            bar

This shows that for stage dofoo, the dependency baz and the output foo
have changed. Likewise for stage dobar, the dependency foo has changed and
the output bar doesn't exist in the workspace. For baz.dvc, the file baz
tracked by it has changed.
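When this information is consumed by a script, the --json option is usually easier to handle than the text output. The sketch below flattens such output into rows. The JSON shape used here is an assumption, not taken from this page: each stage or .dvc file name is assumed to map to a list whose entries are either objects such as `{"changed deps": {"<path>": "<state>"}}` or plain strings such as `"changed checksum"`. The function name `flatten_status` and the `SAMPLE` data (modeled on the dofoo/dobar output above) are hypothetical.

```python
import json
from typing import List, Tuple


def flatten_status(raw: str) -> List[Tuple[str, str, str, str]]:
    """Flatten status JSON into (stage, kind, path, state) rows."""
    rows = []
    for stage, entries in json.loads(raw).items():
        for entry in entries:
            if isinstance(entry, dict):
                # e.g. {"changed deps": {"baz": "modified"}}
                for kind, items in entry.items():
                    for path, state in items.items():
                        rows.append((stage, kind, path, state))
            else:
                # Plain-string entries such as "changed checksum".
                rows.append((stage, str(entry), "", ""))
    return rows


# ASSUMED sample in the shape described above; not captured from a real run.
SAMPLE = json.dumps({
    "baz.dvc": [{"changed outs": {"baz": "modified"}}],
    "dofoo": [
        {"changed deps": {"baz": "modified"}},
        {"changed outs": {"foo": "modified"}},
    ],
    "dobar": [
        {"changed deps": {"foo": "modified"}},
        {"changed outs": {"bar": "deleted"}},
    ],
})

for row in flatten_status(SAMPLE):
    print(row)
```

If the real output differs from the assumed shape, only the two branches inside the loop would need adjusting.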
dvc status only checks the tracked data corresponding to any given targets:

    $ dvc status foo.dvc dobar
    foo.dvc:
        changed outs:
            modified:           foo
        changed checksum
    dobar:
        changed deps:
            modified:           foo
        changed outs:
            not in cache:       bar

Note that you can check data within tracked directories, such as the data/raw
directory (tracked with data/raw.dvc):

    $ tree data
    data
    ├── raw
    │   ├── partition.1.dat
    │   ├── ...
    │   └── partition.n.dat
    └── raw.dvc

    $ dvc fetch data/raw/partition.1.dat
    new:                data/raw

    $ vi code/featurization.py
    ... edit the code

    $ dvc status model.p
    Data and pipelines are up to date.

    $ dvc status model.p --with-deps
    matrix-train.p:
        changed deps:
            modified:           code/featurization.py

The dvc status command may be limited to a target that had no changes, but by
adding --with-deps, any change in a preceding stage will be found.
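The backward search that --with-deps performs can be pictured as a walk from the target to the stages that produce its dependencies. The following is only a sketch of that idea under simplifying assumptions, not DVC's implementation: the `Pipeline` dictionary layout, the stage names (featurize, train, evaluate), and the function `upstream_stages` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical pipeline description: stage name -> its deps and outs.
Pipeline = Dict[str, Dict[str, List[str]]]


def upstream_stages(pipeline: Pipeline, target: str) -> Set[str]:
    """Return the stage producing `target` plus every stage it depends on."""
    # Index each output path by the stage that produces it.
    producers = {
        out: name for name, stage in pipeline.items() for out in stage["outs"]
    }
    found: Set[str] = set()
    pending = [target]
    while pending:
        path = pending.pop()
        stage = producers.get(path)
        if stage is None or stage in found:
            continue  # a source file with no producing stage, or already visited
        found.add(stage)
        pending.extend(pipeline[stage]["deps"])
    return found


PIPELINE: Pipeline = {
    "featurize": {"deps": ["code/featurization.py"], "outs": ["matrix-train.p"]},
    "train": {"deps": ["matrix-train.p"], "outs": ["model.p"]},
    "evaluate": {"deps": ["model.p"], "outs": ["eval.txt"]},
}

print(sorted(upstream_stages(PIPELINE, "model.p")))  # ['featurize', 'train']
```

The evaluate stage is not returned because it comes after the target, which matches the note that changes in later stages are not shown.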
Let's now assume that we have a shared S3 dvc remote:

    $ dvc remote list
    mystorage s3://bucket/path

We would like to check which files we have generated but haven't pushed to the remote yet:

    $ dvc status --remote mystorage
    ...
        new:            data/model.p
        new:            data/eval.txt
        new:            data/matrix-train.p
        new:            data/matrix-test.p

The output shows the location of the remote storage, as well as any
differences between the cache and the mystorage remote.
Let's import a data file (data.csv) from a different DVC repository
into our current project using dvc import:

    $ dvc import different/repo/location data.csv

The resulting data.csv.dvc file is called an import stage. If the
original file or directory changes later, dvc status will show "update
available" as output:

    $ dvc status
    data.csv.dvc:
        changed deps:
            update available:   data.csv (different/repo/location)

The imported data can be brought to its latest version by using dvc update.