
Updating Tracked Files

Because of the way DVC links data files in the cache to their counterparts in the workspace (refer to Large Dataset Optimization), tracked files must be updated with care to avoid corrupting the cache when the DVC config option cache.type is set to hardlink and/or symlink. (See dvc config cache for more details on setting the cache file link types.)

For an example of the cache corruption problem, see issue #599 in our GitHub repository.

Assume train.tsv is tracked by DVC and you want to update it. Here, updating may mean either replacing train.tsv with a new file of the same name or editing the file's content.

If you use dvc repro, there is no need to manage generated (output) files manually: DVC removes them before executing the stage that generates them.

If you use DVC to track a file that your pipeline generates (e.g. an intermediate result or a final model file such as model.pkl) but you don't use dvc run and dvc repro to manage the pipeline, follow the procedure below (run dvc unprotect or dvc remove) to unlink the file from the DVC cache before executing the script that modifies it.

See also dvc unprotect and dvc config cache to learn more about protecting your data files.

Replacing a file

If you want to replace the file, you can take the following steps.

First, stop tracking the file. This removes train.tsv from the workspace:

$ dvc remove train.tsv.dvc

Next, replace the file with new content:

$ echo new > train.tsv

And start tracking it again:

$ dvc add train.tsv
$ git add train.tsv.dvc .gitignore
$ git commit -m "new train data"

Modifying content

"Unlink" the file with dvc unprotect. This will make train.tsv safe to edit:

$ dvc unprotect train.tsv

Edit the content of the file:

$ echo "new data item" >> train.tsv

Add the new version of the file back with DVC:

$ dvc add train.tsv
$ git add train.tsv.dvc
$ git commit -m "modify train data"