If the cache.type config option is set to hardlink (not the default; see
dvc config cache for more info), updating tracked files has to be carried out
with caution to avoid data corruption. This is due to the way in which DVC
handles linking data files between the cache and the workspace (refer to
Large Dataset Optimization for details).
For an example of the cache corruption problem, see issue #599 in our GitHub repo.
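The underlying risk can be illustrated without DVC at all. The following is a minimal sketch that uses plain `ln` to mimic a hard-linked cache entry; the file names `cache_copy` and `workspace_file` are hypothetical, and DVC's real cache layout is different.

```shell
# Hypothetical stand-ins for a cache entry and its workspace link.
tmp=$(mktemp -d)
echo "original" > "$tmp/cache_copy"

# A hard link makes both names point at the same data on disk.
ln "$tmp/cache_copy" "$tmp/workspace_file"

# Editing the workspace file in place also changes the "cached" copy.
echo "new data item" >> "$tmp/workspace_file"
cat "$tmp/cache_copy"
```

Here the appended line shows up in `cache_copy` as well, which is the kind of silent change to cached data that the procedures in this guide are meant to prevent.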
If the data was tracked with dvc add, use one of the procedures below to
"unlink" it from the cache prior to updating it. We'll be working with a file
named train.tsv.
Unlink the file with dvc unprotect. This will make train.tsv safe to edit:
$ dvc unprotect train.tsv
Then edit the content of the file, for example with:
$ echo "new data item" >> train.tsv
Add the new version of the file back with DVC:
$ dvc add train.tsv
$ git add train.tsv.dvc
$ git commit -m "modify train data"
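To see why unlinking before editing protects the cache, here is a rough sketch in plain shell. It assumes that unlinking amounts to replacing the link with an independent copy; it is not DVC's actual implementation, and `cache_copy` is a hypothetical stand-in for the cached data.

```shell
# Hypothetical stand-in for a cache entry hard-linked into the workspace.
tmp=$(mktemp -d)
echo "original" > "$tmp/cache_copy"
ln "$tmp/cache_copy" "$tmp/train.tsv"

# "Unlink": swap the hard link for an independent copy of the data.
cp "$tmp/train.tsv" "$tmp/train.tsv.tmp"
mv "$tmp/train.tsv.tmp" "$tmp/train.tsv"

# The edit now touches only the workspace copy.
echo "new data item" >> "$tmp/train.tsv"
cat "$tmp/cache_copy"
```

After the swap, the edit lands in `train.tsv` only, and `cache_copy` keeps its original content.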
If you want to replace the file altogether, you can take the following steps.
First, stop tracking the file by running dvc remove on its .dvc file:
$ dvc remove train.tsv.dvc
Next, replace the file with new content:
$ echo new > train.tsv
And start tracking it again:
$ dvc add train.tsv
$ git add train.tsv.dvc .gitignore
$ git commit -m "new train data"