
How can I delete DVC-tracked files from cloud storage?
Thanks for the question @fireballpoint1!
You can find the best way to delete files from your cloud storage in our docs. Make sure you're super careful when deleting data from the cloud because it's an irreversible action. Here's an example of a deletion command that will clear out everything in your cloud storage except what is referenced in your workspace:
$ dvc gc --workspace --cloud

This option only keeps the files and directories referenced in the workspace and it removes everything else, including data in the cloud and cache. By default, this command will use the default remote you have set. You can specify a different remote storage with the --remote option like this:

$ dvc gc --workspace --cloud --remote name_of_remote

I'm using DVC experiments, but the Git index gets corrupted with large (4GB) files. What is the best workaround?
Great question from @charles.melby-thompson!
Experiment files may be tracked by Git or DVC. For large files, we generally recommend tracking them with DVC, in which case file size shouldn't be an issue.
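As a minimal sketch, moving a large file out of Git and under DVC tracking could look like this (data/large_file.bin is a hypothetical path):

$ dvc add data/large_file.bin   # creates data/large_file.bin.dvc and updates data/.gitignore
$ git add data/large_file.bin.dvc data/.gitignore
$ git commit -m "Track large file with DVC instead of Git"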
By default, experiments will track all other files with Git. However, Git can fail when handed too much data. If there are files you don't want to track at all (such as large temporary/intermediate files), you can add them to your .gitignore file, as shown below.
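For example, a few .gitignore entries along these lines would keep scratch outputs out of both Git and experiment tracking (the paths here are hypothetical):

# .gitignore — hypothetical large temporary/intermediate files
tmp/
scratch/
*.intermediate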
Check out this open issue with experiments for more details and to provide feedback.
Can CML self-hosted runners stop the instance after the idle timeout instead of terminating?
This is another fantastic question from @jotsif!
No, we deliberately terminate the instance to avoid unexpected costs. Stopped but unterminated instances can still cost the same as running ones.

It's best to let the CML runner terminate and create new instances, running dvc pull to restore your data each time.
However, if you're trying to preserve data (e.g. cache dependencies to speed up experimentation time) on an AWS EC2 instance, you could connect persistent AWS S3 remote storage.
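As a rough sketch, configuring such a remote and restoring data on a fresh instance could look like this (the remote name and bucket path are hypothetical):

$ dvc remote add -d storage s3://my-bucket/dvc-cache   # hypothetical bucket/path
$ dvc push   # from an instance that has the data, upload it to the remote
$ dvc pull   # on a new instance, restore the data from the remote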
How can I use one dvc.yaml file with multiple pipeline folders with different params.yaml files?
@louisv, thanks for this question!
It seems like you're looking for the parametrization functionality. You can learn more about how it works in this doc, but here's an example of what that might look like in the dvc.yaml:
stages:
  cleanups:
    foreach: # List of simple values
      - raw1
      - labels1
      - raw2
    do:
      cmd: clean.py "${item}"
      outs:
        - ${item}.cln
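If each pipeline folder needs its own parameter values, foreach also accepts a dictionary, which exposes both ${key} and ${item} inside do. A minimal sketch, assuming hypothetical folder names and a train.py script:

stages:
  train:
    foreach: # Dictionary: one entry per pipeline folder
      folder1:
        epochs: 3
      folder2:
        epochs: 10
    do:
      cmd: python train.py ${key} ${item.epochs}
      outs:
        - ${key}/model.pt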
At our March Office Hours Meetup, we will talk about how you can create, run, and benchmark DVC pipelines with ZnTrack! RSVP for the Meetup here to stay up to date with specifics as we get closer to the event!
Join us in Discord to get all your DVC and CML questions answered!
📰 Join our Newsletter to stay up to date with news and contributions from the Community!
