status
Show changes in the project pipelines, as well as file mismatches either between the cache and workspace, or between the cache and remote storage.
For the status of tracked data, see dvc data status (similar to git status).
Synopsis
usage: dvc status [-h] [-v] [-j <number>] [-q] [-c] [-r <name>] [-a] [-T]
[--all-commits] [-d] [-R] [--json] [--no-updates]
[targets [targets ...]]
positional arguments:
targets Limit command scope to these tracked files/directories,
.dvc files, or stage names.
Description
Searches for changes in the existing tracked data and pipelines. In local mode,
it shows which files or directories have changed in the workspace
(and thus could be added or reproduced again). In remote mode, it reports
the differences between the cache and remote storage (dvc push or dvc pull
can be used to synchronize these).
Mode | Option | Description |
---|---|---|
local | none | Comparisons are made between data files in the workspace and corresponding files in the cache directory (e.g. .dvc/cache) |
remote | --remote (-r) | Comparisons are made between the cache and a given DVC remote. |
remote | --cloud (-c) | Comparisons are made between the cache and the dvc remote default. |
Without arguments, this command checks all dvc.yaml
and .dvc
files to
rebuild and validate pipeline(s). It then compares outputs defined
in these files against the actual data in the workspace.
Any targets
given to this command limit what to show changes for. It accepts
paths to tracked files or directories (including paths inside tracked
directories), .dvc
files, and stage names (found in dvc.yaml
).
The --all-branches
, --all-tags
, and --all-commits
options enable comparing
DVC-tracked files referenced in multiple Git commits at once.
If no differences are detected, dvc status prints
"Data and pipelines are up to date", or
"Cache and remote 'myremote' are in sync" (if the -c or -r options are
used). If differences are detected, the changes in dependencies
and/or outputs for each stage are listed. For each item listed,
either the file name or hash is shown, along with a state description, as
detailed below.
Local workspace status
- changed checksum means that the .dvc file hash has changed (e.g. someone manually edited it).
- always changed means that this stage (in dvc.yaml) has neither dependencies nor outputs, or that the always_changed field is set to true (see dvc stage add --always-changed).
- changed deps or changed outs means that there are changes in dependencies or outputs tracked by the stage or .dvc file. Depending on the use case, commands like dvc commit, dvc repro, or dvc exp run can be used to update the file. Possible states are:
  - new: An output is found in the workspace, but there is no corresponding file hash saved in the dvc.lock or .dvc file yet.
  - modified: An output or dependency is found in the workspace, but the corresponding file hash in the dvc.lock or .dvc file is not up to date.
  - deleted: The output or dependency is referenced in a dvc.lock or .dvc file, but does not exist in the workspace.
  - not in cache: An output exists in the workspace, and the corresponding file hash in the dvc.lock or .dvc file is up to date, but there is no corresponding cache file or directory.
- update available means that an import stage is outdated (the original data source has changed). The imported data can be brought to its latest version by using dvc update.
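The states above can also be consumed programmatically through the --json option. The following is a minimal Python sketch, not part of DVC itself. It assumes that the JSON document maps each stage or .dvc file name to a list whose items are either plain strings (such as "changed checksum") or objects of the form {"changed outs": {"<path>": "<state>"}}. That shape, the helper name tally_states, and the sample data are assumptions for illustration; check the output of your own DVC version before relying on it.

```python
import json
from collections import Counter


def tally_states(status_json: str) -> Counter:
    """Count how often each state appears in a `dvc status --json` document.

    ASSUMPTION: the JSON maps stage/.dvc names to lists whose items are
    either strings ("changed checksum", "always changed") or objects like
    {"changed outs": {"<path>": "<state>"}}.
    """
    counts = Counter()
    for entries in json.loads(status_json).values():
        for entry in entries:
            if isinstance(entry, str):
                # Stage-level flags carry no per-file state.
                counts[entry] += 1
                continue
            for section, items in entry.items():
                for state in items.values():
                    counts[f"{section}: {state}"] += 1
    return counts


# Hypothetical sample data, written by hand to mirror the states above.
sample = json.dumps({
    "baz.dvc": [{"changed outs": {"baz": "modified"}}],
    "dobar": [
        {"changed deps": {"foo": "modified"}},
        {"changed outs": {"bar": "deleted"}},
    ],
    "foo.dvc": ["changed checksum"],
})
summary = tally_states(sample)
for label, count in sorted(summary.items()):
    print(f"{label}: {count}")
```

In practice the input string would come from running dvc status --json and capturing its standard output.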
Comparison against remote storage
- new means that the file/directory exists in the cache but not in remote storage.
- deleted means that the file/directory doesn't exist in the cache, but exists in remote storage.
- missing means that the file/directory exists neither in the cache nor in remote storage.
For new and deleted data, the cache is different from remote storage. Bringing
the two into sync requires dvc pull
or dvc push
.
For missing data, there's nothing to retrieve from storage. This can happen,
for example, in fresh DVC repository clones if the data wasn't
uploaded from the original repo, or after certain uses of dvc gc. You can try
dvc repro to regenerate the output locally, and then dvc push to upload it to
remote storage.
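The mapping from remote states to follow-up commands described above can be sketched as a small lookup. This is only an illustration in Python. The function name suggest_sync and the sample paths are hypothetical, and the suggested commands are the ones named in the text, not something DVC prints.

```python
def suggest_sync(state: str) -> str:
    """Return the follow-up command the text above suggests for a remote state."""
    actions = {
        "new": "dvc push",      # in the cache, not yet in remote storage
        "deleted": "dvc pull",  # in remote storage, not in the cache
        "missing": "dvc repro, then dvc push",  # in neither location
    }
    if state not in actions:
        raise ValueError(f"unknown remote state: {state!r}")
    return actions[state]


# Hypothetical per-file states, as a remote comparison might report them.
remote_states = {
    "data/model.p": "new",
    "data/eval.txt": "deleted",
    "data/matrix-train.p": "missing",
}
plan = {path: suggest_sync(state) for path, state in remote_states.items()}
for path, action in plan.items():
    print(f"{path}: {action}")
```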
Options
- -c, --cloud - enables comparison against a dvc remote. If the --remote option is not used, DVC will compare against the dvc remote default (see dvc config core.remote).
  The dvc remote used is determined in order, based on:
  - the remote fields in the dvc.yaml or .dvc files.
  - the value passed to the --remote option via CLI.
  - the value of the core.remote config option (see dvc remote default).
- -a, --all-branches - compares cache content against all Git branches, as well as the current workspace. This basically runs the same status command in every branch of this repo. The corresponding branches are shown in the status output. Applies only if --cloud or a -r remote is specified. Note that this can be combined with -T below, for example using the -aT flags.
- -T, --all-tags - compares cache content against all Git tags, as well as the workspace. Note that this can be combined with -a above, for example using the -aT flags.
- -A, --all-commits - compares cache content against all Git commits, as well as the workspace. This compares the cache content for the entire commit history of the project.
- -d, --with-deps - only meaningful when specifying targets. This determines files to check by resolving all dependencies of the targets: DVC searches backward from the targets in the corresponding pipelines. This will not show changes occurring in later stages than the targets.
- -R, --recursive - determines the files to check status for by searching each target directory and its subdirectories for stages (in dvc.yaml) and .dvc files to inspect. If there are no directories among the targets, this option has no effect.
- -r <name>, --remote <name> - name of the dvc remote to compare against (see dvc remote list). Implies --cloud.
- --json - prints the command's output in easily parsable JSON format, instead of a human-readable table.
- --no-updates - ignore updates to import stages. By default, dvc status will check whether there are updates available from the sources of the imports. --no-updates will skip these checks.
- -j <number>, --jobs <number> - parallelism level for DVC to access data from remote storage. This only applies when the --cloud option is used, or a --remote is given. The default value is 4 * cpu_count(). Note that the default value can be set using the jobs config option with dvc remote modify. Using more jobs may speed up the operation.
- -h, --help - prints the usage/help message, and exits.
- -q, --quiet - do not write anything to standard output. Exit with 0 if data and pipelines are up to date, otherwise 1.
- -v, --verbose - displays detailed tracing information.
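Two of the behaviors listed above lend themselves to a short illustration: the order in which the remote is determined, and the exit code returned with --quiet. The Python sketch below is not DVC's implementation. The names resolve_remote and is_up_to_date are hypothetical, and the only DVC behavior it relies on is what the option descriptions state (the precedence order, and exit code 0 when everything is up to date).

```python
import subprocess
from typing import Callable, Optional, Sequence


def resolve_remote(
    output_remote: Optional[str],
    cli_remote: Optional[str],
    core_remote: Optional[str],
) -> Optional[str]:
    """Pick a remote name using the precedence order listed above.

    output_remote: `remote` field from dvc.yaml or a .dvc file, if any
    cli_remote:    value passed with --remote, if any
    core_remote:   value of the core.remote config option, if any
    """
    for candidate in (output_remote, cli_remote, core_remote):
        if candidate:
            return candidate
    return None


def is_up_to_date(
    targets: Sequence[str] = (),
    run: Callable = subprocess.run,
) -> bool:
    """Run `dvc status --quiet` and interpret its exit code.

    The `run` parameter exists only so the call can be replaced in tests;
    by default it invokes the real `dvc` executable found on PATH.
    """
    result = run(["dvc", "status", "--quiet", *targets])
    return result.returncode == 0


print(resolve_remote(None, "mystorage", "default-storage"))
```

A script could call is_up_to_date() before a deployment step and stop early when it returns False. Injecting run keeps the function testable without a DVC repository.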
Examples
$ dvc status
baz.dvc:
        changed outs:
                modified:           baz
dofoo:
        changed deps:
                modified:           baz
        changed outs:
                modified:           foo
dobar:
        changed deps:
                modified:           foo
        changed outs:
                deleted:            bar
This shows that for stage dofoo
, the dependency baz
and the output foo
have changed. Likewise for stage dobar
, the dependency foo
has changed and
the output bar
doesn't exist in the workspace. For baz.dvc
, the file baz
tracked by it has changed.
Example: Specific files or directories
dvc status
only checks the tracked data corresponding to any given targets
:
$ dvc status foo.dvc dobar
foo.dvc:
        changed outs:
                modified:           foo
        changed checksum
dobar:
        changed deps:
                modified:           foo
        changed outs:
                not in cache:       bar
In this case, the target foo.dvc is a .dvc file to track the foo file, while
dobar is the name of a stage defined in dvc.yaml.
Note that you can check data within tracked directories, such as the data/raw
directory (tracked with data/raw.dvc):
$ tree data
data
├── raw
│   ├── partition.1.dat
│   ├── ...
│   └── partition.n.dat
└── raw.dvc
$ dvc status data/raw/partition.1.dat
new: data/raw
Example: Dependencies
$ vi code/featurization.py
... edit the code
$ dvc status model.p
Data and pipelines are up to date.
$ dvc status model.p --with-deps
matrix-train.p:
        changed deps:
                modified:           code/featurization.py
The dvc status
command may be limited to a target that had no changes, but by
adding --with-deps
, any change in a preceding stage will be found.
Example: Remote comparisons
Let's now assume that we have a shared S3 dvc remote and would like to check
which files we have generated but haven't pushed to it yet. First, list the
configured remotes:
$ dvc remote list
mystorage s3://bucket/path
Then compare the cache against that remote:
$ dvc status --remote mystorage
...
new: data/model.p
new: data/eval.txt
new: data/matrix-train.p
new: data/matrix-test.p
The output shows the location of the remote storage, as well as any
differences between the cache and the mystorage remote.
Example: Check imported data
Let's import a data file (data.csv
) from a different DVC repository
into our current project using dvc import
.
$ dvc import different/repo/location data.csv
The resulting data.csv.dvc
file is called an import stage. If the
original file or directory changes later, dvc status
will show "update
available" as output:
$ dvc status
data.csv.dvc:
        changed deps:
                update available:   data.csv (different/repo/location)
The imported data can be brought to its latest version by using dvc update
.
To skip this check (for example, to speed up status checks, or because you don't
have permission to access the original source data), use --no-updates
:
$ dvc status --no-updates
Data and pipelines are up to date.