Remote Storage

DVC remotes provide access to external storage locations to track and share your data and ML models. Usually, those will be shared between devices or team members who are working on a project. For example, you can download data artifacts created by colleagues without spending time and resources to regenerate them locally. See also dvc push and dvc pull.

DVC remotes are similar to Git remotes (e.g. GitHub or GitLab hosting), but for cached data instead of code.

DVC does not provide or recommend a specific storage service (unlike code repos). You can bring your own platform from a wide variety of supported storage types.

Main uses of remote storage:

  • Synchronize large files and directories tracked by DVC.
  • Centralize or distribute data storage for sharing and collaboration.
  • Back up different versions of datasets and models (saving space locally).

Configuration

You can set up one or more storage locations with dvc remote commands. These read and write the remote section of the project's config file (.dvc/config), which you can also edit manually.

For example, let's define a remote storage location on an S3 bucket:

$ dvc remote add myremote s3://mybucket

DVC reads any existing configuration you have locally for major cloud providers (AWS, Azure, GCP), so often all you need to do is dvc remote add!

You may also need to customize authentication or other config with dvc remote modify:

$ dvc remote modify --local \
                    myremote credentialpath ~/.aws/alt
$ dvc remote modify myremote connect_timeout 300

The --local flag is needed to write sensitive user info to a Git-ignored config file (.dvc/config.local) so that no secrets are leaked (see dvc config). This means that these values have to be configured again in each copy of the DVC repository.

# .dvc/config
['remote "myremote"']
    url = s3://mybucket
    connect_timeout = 300

# .dvc/config.local
['remote "myremote"']
    credentialpath = ~/.aws/alt

# .gitignore
.dvc/config.local

Finally, you can git commit the changes to share the remote location with your team.
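
The split between shared and private settings can be checked with plain Git. The following is a minimal sketch, not part of DVC itself: it recreates the three files above by hand in a throwaway repository (the temporary directory, the commit message, and the placeholder Git identity are assumptions made for illustration), then confirms that only .dvc/config ends up in the commit.

```shell
# Throwaway repository so the sketch does not touch a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir .dvc

# Shared settings: safe to commit.
cat > .dvc/config <<'EOF'
['remote "myremote"']
    url = s3://mybucket
    connect_timeout = 300
EOF

# Per-user settings: must stay out of Git.
cat > .dvc/config.local <<'EOF'
['remote "myremote"']
    credentialpath = ~/.aws/alt
EOF

echo ".dvc/config.local" > .gitignore

# Exit status 0 means Git ignores the path.
git check-ignore -q .dvc/config.local && echo "config.local is ignored"

# Stage everything; the ignored file is skipped automatically.
git add .
git -c user.name="Example" -c user.email="example@example.com" \
    commit -q -m "Configure DVC remote storage"

# List what the commit actually contains.
tracked=$(git ls-files)
echo "$tracked"
```

In a real project you would normally stage only .dvc/config rather than running git add on the whole directory.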

Supported storage types

Cloud providers

Self-hosted / On-premises

File systems (local remotes)

Note that local remotes are not related to the --local option of dvc remote and dvc config!

You can also use system directories, mounted drives, network resources e.g. network-attached storage (NAS), and other external devices as storage. We call all these "local remotes".

Here, the word "local" refers to where the storage is found: typically another directory in the same file system. "Remote" is what we call storage for DVC projects.

Using an absolute path (recommended because it's saved as-is in DVC config):

$ dvc remote add -d myremote /tmp/dvcstore

# .dvc/config
['remote "myremote"']
    url = /tmp/dvcstore

A relative path is resolved against the current working directory, but it is saved in the config relative to the config file's location.

$ dvc remote add -d myremote ../dvcstore

# .dvc/config
['remote "myremote"']
    url = ../../dvcstore
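
To see why ../dvcstore turns into ../../dvcstore, the two steps can be reproduced with ordinary shell tools. This is only an illustration of the path arithmetic, not DVC's actual implementation. It assumes GNU coreutils realpath is available, that the command is run from the project root, and it uses a hypothetical project directory created under a temporary location.

```shell
# Hypothetical project layout, created in a temporary directory.
project="$(mktemp -d)/project"
mkdir -p "$project/.dvc"
cd "$project"

# 1. The path as typed is resolved against the current working directory.
#    -m lets realpath handle a directory that does not exist yet.
resolved=$(realpath -m ../dvcstore)

# 2. The resolved path is rewritten relative to the directory that
#    holds the config file (.dvc/), which sits one level deeper.
saved=$(realpath -m --relative-to="$project/.dvc" "$resolved")

echo "url = $saved"
```

Because the config file lives one directory below the project root, the saved value gains one extra ../ compared to what was typed.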