Edit on GitHub

diff

Show added, modified, or deleted DVC-tracked files and directories between two commits in the DVC repository, or between a commit and the workspace.

Synopsis

usage: dvc diff [-h] [-q | -v]
                [--targets [<paths> [<paths> ...]]]
                [--json] [--show-hash] [--md]
                [a_rev] [b_rev]

positional arguments:
  a_rev      Old Git commit to compare (defaults to HEAD)
  b_rev      New Git commit to compare (defaults to current workspace)

Description

Prints a list of files and directories added, modified, renamed, or deleted in a Git commit b_rev as compared to another Git commit a_rev. Both a_rev and b_rev accept any Git revision - branch or tag name, Git commit hash, etc.

Note that for renames dvc diff only detects files which have been renamed but are otherwise unmodified between Git commits.

It defaults to comparing the current workspace and the last commit (HEAD), if arguments a_rev and b_rev are not specified.

Options --json and --show-hash can be used to modify format of the output of this command. See the Options and Examples sections below for more details.

dvc diff does not have an effect when the repository is not tracked by Git, for example when dvc init was used with the --no-scm option.

Note that current dvc diff implementation does not show the line-to-line comparison among the files in each revision, like git diff or GNU diff can. This is because the data data tracked by DVC can come in many possible formats e.g. structured text, or binary blobs, etc. For an example on how to create line-to-line text file comparison, refer to this comment.

Options

  • --targets <paths> - specific DVC-tracked files to compare.

    When specifying arguments for --targets before a_rev/b_rev, you should use -- after this option's arguments (POSIX terminals), e.g.:

    $ dvc diff --targets t1.json t2.yaml -- HEAD v1
  • --json - prints the command's output in easily parsable JSON format, instead of a human-readable table.

  • --md - prints the command's output in Markdown table format.

  • --show-hash - print file and directory hash values along with their path. Useful for debug purposes.

  • --hide-missing - do not list data missing from both workspace and cache (not in cache). Only list files and directories which have been explicitly added, modified, or deleted. This option does nothing when comparing two Git commits.

  • -h, --help - prints the usage/help message, and exit.

  • -q, --quiet - do not write anything to standard output. Exit with 0 if no problems arise, otherwise 1.

  • -v, --verbose - displays detailed tracing information.

Examples

For these examples we can use the Get Started project.

Start by cloning our example repo if you don't already have it:

$ git clone https://github.com/iterative/example-get-started
$ cd example-get-started

Download data using:

$ dvc fetch -T
Preparing to download data from 'https://remote.dvc.org/get-started'
...

With the -T option, dvc fetch makes sure that we have all the data files related to all existing Git tags in the repo. You may see the available tags of our example repo here.

Example: Checking workspace changes

The minimal dvc diff, run without arguments, defaults to comparing DVC-tracked files between HEAD (last Git commit) and the current workspace (uncommitted changes, if any):

$ dvc diff

Example: Comparing workspace with arbitrary commits

Let's checkout the 2-track-data tag, corresponding to the Data Versioning Get Started chapter, right after we added data.xml file with DVC:

$ git checkout 2-track-data
$ dvc checkout

To see the difference between the very previous commit of the project and the workspace, we can use HEAD^ as a_rev:

$ dvc diff HEAD^
Added:
    data/data.xml

files summary: 1 added, 0 deleted, 0 modified

Example: Comparing tags or branches

Our example repository has the baseline-experiment and bigrams-experiment tags tags, that reference two different modeling experiments.

Having followed the example's setup, move into the example-get-started/ directory. Then make sure that you have the latest code and data with the following commands:

$ git checkout master
$ dvc checkout
$ dvc diff baseline-experiment bigrams-experiment
Modified:
    data/features/
    data/features/test.pkl
    data/features/train.pkl
    model.pkl
    prc.json
    scores.json

files summary: 0 added, 0 deleted, 5 modified

The output from this command confirms that there's a difference in 5 files between the tags baseline-experiment and bigrams-experiment.

Example: Using different output formats

Let's use the same command as above, but with JSON output and including hash values:

$ dvc diff --json --show-hash \
           baseline-experiment bigrams-experiment

It outputs:

{
  "added": [],
  "deleted": [],
  "modified": [
    {
      "path": "data/features/",
      "hash": {
        "old": "3338d2c21bdb521cda0ba4add89e1cb0.dir",
        "new": "42c7025fc0edeb174069280d17add2d4.dir"
      }
    },
    ...
    {
      "path": "model.pkl",
      "hash": {
        "old": "43630cce66a2432dcecddc9dd006d0a7",
        "new": "662eb7f64216d9c2c1088d0a5e2c6951"
      }
    }
    ...
  ]
}

Example: Renamed files

Having followed the previous examples' setup, move into the example-get-started/ directory. Then make sure that you have the latest code and data with the following commands:

$ git checkout master
$ dvc checkout

dvc diff only detects files which have been renamed but are otherwise unmodified. If a file has been renamed and also has modified content, dvc diff will show that the original file has been deleted and the modified version has been added.

Let's see what happens when we rename data/data.xml in our workspace:

$ dvc move data/data.xml data/other_data.xml
$ dvc diff
Renamed:
    data/data.xml -> data/other_data.xml

files summary: 0 added, 0 deleted, 1 renamed, 0 modified, 0 not in cache

Now, let's modify data/other_data.xml and then see what happens again:

$ echo "<row Id=\"12345678\" Body=\"New row\" />" >> data/other_data.xml
$ dvc diff
Added:
    data/other_data.xml

Deleted:
    data/data.xml

files summary: 1 added, 1 deleted, 0 renamed, 0 modified, 0 not in cache

In this case, since the file content in data/other_data.xml has been changed, dvc diff can no longer detect that it is a renamed version of the original file data/data.xml.