Marks which files and/or directories should be excluded when traversing a DVC project.
Sometimes you might want DVC to ignore some files while working with the
project. For example, when working in a workspace directory with a
large number of data files, you might encounter extended execution time for
operations as simple as dvc status. In other cases you might want to omit files
or folders unrelated to the project (like .DS_Store on macOS). To address
these scenarios, DVC supports optional .dvcignore files. These can be placed in
the root of the project, or in any subdirectory (see the remarks below).
Ignored files are not saved in the cache; as far as DVC is concerned, they do not exist. This is worth remembering, especially when ignoring files inside DVC-handled directories.
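As a minimal sketch of this setup, the following creates a .dvcignore file in a scratch directory and adds the .DS_Store pattern mentioned earlier. The directory name dvcignore_demo is hypothetical, and the sketch assumes that .dvcignore holds one pattern per line and accepts # comments, in the style of .gitignore.

```shell
# Scratch directory standing in for a project root (hypothetical name).
mkdir -p dvcignore_demo

# One pattern per line; '>' overwrites, so re-running the sketch is safe.
# Assumption: '#' starts a comment line, as in .gitignore.
printf '%s\n' '# macOS Finder metadata' '.DS_Store' > dvcignore_demo/.dvcignore

# Show the resulting file.
cat dvcignore_demo/.dvcignore
```

In a real project the file would sit next to the .dvc/ directory, and patterns would normally be appended with >> rather than overwritten.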
Keep in mind that when you add
.dvcignore patterns that affect an existing
output, its status will change and DVC will behave as if the
affected files were deleted.
If DVC finds a
.dvcignore file inside a dependency or output directory, it
raises an error. Ignoring files inside such directories should be handled from a
.dvcignore in higher levels of the project tree.
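One way to follow this rule is to look for .dvcignore files below a tracked directory and move their patterns up to the project root. The sketch below does this for a hypothetical layout (nested_demo/ as the project root, data/ as the tracked directory, raw/tmp.csv as the file to ignore). It assumes that a pattern containing a slash is anchored to the directory of the .dvcignore file that holds it, as in .gitignore, so prefixing it with data/ keeps its meaning. Comments, negations, and slash-free patterns would need more care.

```shell
# Hypothetical project root with a .dvcignore misplaced inside data/.
mkdir -p nested_demo/data/raw
echo 'raw/tmp.csv' > nested_demo/data/.dvcignore

# List every .dvcignore file below the tracked directory.
find nested_demo/data -name .dvcignore

# Re-express each pattern relative to the project root.
# '>' keeps the demo repeatable; use '>>' to extend an existing file.
sed 's|^|data/|' nested_demo/data/.dvcignore > nested_demo/.dvcignore

# Remove the nested file now that its pattern lives at the root.
rm nested_demo/data/.dvcignore

cat nested_demo/.dvcignore
```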
Let's see what happens when we add a file to .dvcignore:

    $ mkdir data
    $ echo 1 > data/data1
    $ echo 2 > data/data2
    $ tree
    .
    └── data
        ├── data1
        └── data2

We created the
data/ directory with two data files. Let's ignore one of them,
and double check that it's being ignored by DVC:

    $ echo data/data1 >> .dvcignore
    $ cat .dvcignore
    data/data1
    $ dvc check-ignore data/*
    data/data1

Refer to dvc check-ignore for more details on that command.
Let's now track the directory with dvc add, and see what happens in the cache:

    $ dvc add data
    ...
    $ tree .dvc/cache
    .dvc/cache
    ├── 26
    │   └── ab0db90d72e28ad0ba1e22ee510510
    └── ad
        └── 8b0ddcf133a6e5833002ce28f97c5a.dir
    $ md5 data/*
    b026324c6904b2a9cb4b88d6d61c81d1 data/data1
    26ab0db90d72e28ad0ba1e22ee510510 data/data2

There are only 2 cache entries, and one of them (the one ending with .dir) is
for the data/ directory itself. Checking the hash values of the data files
manually, we can see that the other cache entry (the one starting with 26) is
for data2. There is no cache entry for data1 (whose hash value starts with b0).
This means that dvc add did ignore data1.
Refer to Structure of cache directory for more info.
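The mapping between a hash value and its cache entry in the listing above can be reproduced by hand: the first two characters of the hash name the subdirectory, and the remaining characters name the file. The sketch below rebuilds that layout in a scratch directory without involving DVC. The cache_demo/ directory is a hypothetical name, and the script assumes md5sum is available, falling back to md5 -q on macOS.

```shell
mkdir -p cache_demo
echo 2 > cache_demo/data2     # same content as data/data2 above

# Compute the MD5 hash of the file (md5sum on Linux, md5 -q on macOS).
if command -v md5sum >/dev/null 2>&1; then
  digest=$(md5sum cache_demo/data2 | cut -d' ' -f1)
else
  digest=$(md5 -q cache_demo/data2)
fi

# Split the hash: two-character directory, then the rest as the file name.
prefix=$(printf '%s' "$digest" | cut -c1-2)
rest=$(printf '%s' "$digest" | cut -c3-)

# Store a copy under that path, mimicking the layout shown above.
mkdir -p "cache_demo/cache/$prefix"
cp cache_demo/data2 "cache_demo/cache/$prefix/$rest"

echo "$prefix/$rest"
```

The printed path should be 26/ab0db90d72e28ad0ba1e22ee510510, the same entry that appears for data2 in the cache listing.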
Now, let's modify file data1 and see if it affects dvc status:

    $ dvc status
    Data and pipelines are up to date.
    $ echo "2345" >> data/data1
    $ dvc status
    Data and pipelines are up to date.

dvc status ignores data1. Similarly, deleting a dvc-ignored file does not affect
dvc status either:

    $ rm data/data1
    $ dvc status
    Data and pipelines are up to date.

Modifications/deletions on a tracked file produce a different output:

    $ echo "345" >> data/data2
    $ dvc status
    data.dvc:
        changed outs:
            modified:           data

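The difference between editing an ignored file and editing a tracked one can be pictured with a toy model: fingerprint only the files that are not listed in .dvcignore, then compare fingerprints before and after a change. This is an illustration only, not how DVC computes status. The status_demo/ directory and the fingerprint helper are hypothetical, and the helper treats each .dvcignore line as a literal path rather than a pattern.

```shell
mkdir -p status_demo/data
echo 1 > status_demo/data/data1
echo 2 > status_demo/data/data2
echo 'data/data1' > status_demo/.dvcignore

# Checksum every file under data/ whose path is not listed in .dvcignore.
# grep -v -x -F -f drops lines that exactly equal a line of .dvcignore.
fingerprint() {
  (cd status_demo && find data -type f | sort \
     | grep -v -x -F -f .dvcignore | xargs cksum)
}

fingerprint > status_demo/before.txt

echo 2345 >> status_demo/data/data1      # edit the ignored file
fingerprint > status_demo/after_ignored.txt

echo 345 >> status_demo/data/data2       # edit the tracked file
fingerprint > status_demo/after_tracked.txt

# Identical fingerprints mean the change was invisible to the model.
cmp -s status_demo/before.txt status_demo/after_ignored.txt \
  && echo "ignored file edited: no change detected"
cmp -s status_demo/before.txt status_demo/after_tracked.txt \
  || echo "tracked file edited: change detected"
```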
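Let's start from a fresh workspace in which data/data1 is ignored and the data/ directory is tracked:

    $ mkdir data
    $ echo data1 >> data/data1
    $ echo data2 >> data/data2
    $ tree .
    .
    └── data
        ├── data1
        └── data2
    $ echo data/data1 >> .dvcignore
    $ cat .dvcignore
    data/data1
    $ dvc add data
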
If we rename the ignored file to a new name inside the data directory (one that
is not dvc-ignored), DVC will behave as if we modified the directory by adding a
new file:

    $ dvc status
    Data and pipelines are up to date.
    $ mv data/data1 data/data3
    $ dvc status
    data.dvc:
        changed outs:
            modified:           data

Let's analyze an example workspace:

    $ mkdir dir1 dir2
    $ echo data1 >> dir1/data1
    $ echo data2 >> dir2/data2
    $ dvc add dir1/data1 dir2/data2
    $ tree .
    .
    ├── dir1
    │   ├── data1
    │   └── data1.dvc
    └── dir2
        ├── data2
        └── data2.dvc

Modify data files:

    $ echo mod > dir1/data1
    $ echo mod > dir2/data2

Check status:

    $ dvc status
    dir1/data1.dvc:
        changed outs:
            modified:           dir1/data1
    dir2/data2.dvc:
        changed outs:
            modified:           dir2/data2

Note that both data files are displayed as modified. Create a .dvcignore file
and insert a pattern matching one of the files:
$ echo 'dir1/*' >> .dvcignore
Check status again:

    $ dvc status
    dir2/data2.dvc:
        changed outs:
            modified:           dir2/data2

Only the second file is displayed because DVC now ignores the files matching dir1/*.
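To see why the pattern dir1/* hides the first file but not the second, the two paths can be tested against it with the shell's own glob matching. This is a rough stand-in for DVC's matcher, not a reimplementation: a shell case pattern agrees with it for these two paths, but the two may differ elsewhere (for example in how * treats slashes). The pattern_demo/ directory is a hypothetical scratch location for the report.

```shell
mkdir -p pattern_demo
pattern='dir1/*'

# $pattern is left unquoted inside 'case' so that it is treated as a glob.
for path in dir1/data1 dir2/data2; do
  case "$path" in
    $pattern) echo "$path ignored" ;;
    *)        echo "$path reported" ;;
  esac
done | tee pattern_demo/report.txt
```

For an authoritative answer inside a real project, dvc check-ignore remains the tool to use.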