July '21 Community Gems

A roundup of technical Q&A's from the DVC community. This month: self-hosted runners, DVC commits, troubleshooting remotes, and more.

  • Milecia McGregor
  • July 27, 20215 min read
Hero Picture

Q: I'm trying to use the --reuse option of cml-runner. If I launch 2 CML experiments in parallel, will CML use the same runner or spin up another one if the existing one is in use?

If you don't reuse the runner and you have set up a deploy job, that deploy job will launch two cloud runners. With --reuse it will check if the runner with that tag exists and will not launch another one. Every runner will be listening for incomming jobs until the max idle time.

Let's say that you set up one runner with --reuse and launch multiple jobs. What will happen is that only one runner should be launched and will take all the jobs.

The runner that deploys the workflow is not tied specifically to the train job that it's going to be launched in the same workflow. You just add runners to the pool and they will be waiting until the idle time is done.

We're working on something like --reuse-idle that would be easy to implement. The idea would be to reuse only idle runners, so that if your job fails and the fix is pretty fast, you don't need to spin up another runner. You can track our progress on that through this GitHub issue.

A great question from @Corentin in the Discord community!

Q: How can I run self-hosted runners on an on-premise machine indefinitely?

You can achieve this by passing the --idle-timeout=0 option to cml-runner in order to disable the timeout.

Thanks @achbogga!

Q: How can I change the default VPC to a different one with cml-cloudrunner for AWS?

Great gem from @krish98409!

You could setting the security group via cloud-aws-security-group. It will pick the VPC that manages that precise security group.

We still don't provide a way of specifying VPCs other than the default one, but it's an issue that we're currently working on: https://github.com/iterative/terraform-provider-iterative/issues/107

Q: Is it possible to rename and modify a file inside a directory tracked by DVC in one commit/change?

If you modify the name and modify the file, you just need to run dvc commit and then commit the change into Git.

This was a good question for everyone. Thanks @snowpong!

Q: How can I list the experiments I've queued?

This is a great question to help us all understand something so thanks @adwivedi.

To look at your queued experiments, run dvc exp show. All of the queued experiments will be marked with an asterisk *.

Queued experiments are not shown with the dvc exp list command at the moment.

Q: I have two machines and a central remote. With my second machine, I want to pull the dataset from the first machine. How can I pull the data with DVC?

Make sure that you have configured a DVC remote and run dvc push from your first machine. You should be able to find the files on the remote storage where you pushed them to after running that command. Then you can run dvc pull on your second machine and this should give you the dataset you pushed from the first machine.

You will run into some issues if your remote isn't configured properly on the second machine. Check your .dvc/config file for the second machine to make sure there aren't any errors. It could be something as simple as a connection string without the necessary quotation marks!

Thanks so much for this question @raharth!

Q: dvc push says, "Everything is up to date." However, I modified my dataset and this is confirmed with dvc status, where it lists a "modified" entry on the changed outs. How can I force a push of my changes?

You need to run dvc commit to commit your changes to the cache.

Good question @BSVogler.

Q: I'm trying to use the DVC API in a Jupyter notebook. Can I simulate a dvc push command via the API?

Nice job working with the Python API @harry134!

You can use the Repo API like this.

from dvc.repo import Repo

repo = Repo()
repo.push()

The API isn't production ready, so documentation is lacking at the moment. Although, we do use it internally all the time, so you can use it with caution too.


At our August Office Hours Meetup, we'll be learning about DVC and Streamlit integration. RSVP for the Meetup here to stay up to date with specifics as we get closer to the event!

Join us in Discord to get answers for your DVC and CML questions!

Subscribe for updates. We won't spam you.