Remote Storage
DVC remotes provide access to external storage locations to track and share
your data and ML models. Usually, those will be shared between devices or team
members who are working on a project. For example, you can download data
artifacts created by colleagues without spending time and resources to
regenerate them locally. See also dvc push and dvc pull.
DVC remotes are similar to Git remotes (e.g. GitHub or GitLab hosting), but for cached data instead of code.
DVC does not provide or recommend a specific storage service (unlike code repos). You can bring your own platform from a wide variety of supported storage types.
Main uses of remote storage:
- Synchronize large files and directories tracked by DVC.
- Centralize or distribute data storage for sharing and collaboration.
- Back up different versions of datasets and models (saving space locally).
Configuration
You can set up one or more storage locations with dvc remote commands. These
read and write to the remote section of the project's config file
(.dvc/config), which you could edit manually as well.
For example, let's define a remote storage location on an S3 bucket:
$ dvc remote add myremote s3://mybucket
DVC reads existing configuration you may have locally for major cloud providers
(AWS, Azure, GCP), so often all you need to run is dvc remote add.
You may also need to customize authentication or other config with
dvc remote modify:
$ dvc remote modify --local \
myremote credentialpath ~/.aws/alt
$ dvc remote modify myremote connect_timeout 300
The --local flag is needed to write sensitive user info to a Git-ignored
config file (.dvc/config.local) so that no secrets are leaked (see
dvc config). This means that these values have to be configured again in
each copy of the DVC repository.
# .dvc/config
['remote "myremote"']
url = s3://mybucket
connect_timeout = 300
# .dvc/config.local
['remote "myremote"']
credentialpath = ~/.aws/alt
# .gitignore
.dvc/config.local
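The split between the two files can be illustrated with a minimal Python sketch. It assumes the file contents shown above and uses the standard library's configparser purely for illustration; this is not how DVC itself loads its configuration, and the helper name effective_remote_settings is hypothetical. It shows the settings for one remote being combined, with values from config.local taking precedence over the shared file.

```python
import configparser

# Contents mirror the two files shown above; in a real project they would
# be read from .dvc/config and .dvc/config.local.
SHARED_CONFIG = """\
['remote "myremote"']
url = s3://mybucket
connect_timeout = 300
"""

LOCAL_CONFIG = """\
['remote "myremote"']
credentialpath = ~/.aws/alt
"""


def effective_remote_settings(shared_text, local_text, remote_name):
    """Merge one remote's settings; the local file wins on conflicts.

    Hypothetical helper for illustration only, not part of DVC.
    """
    # Section names keep their quotes, e.g. 'remote "myremote"'
    section = f"'remote \"{remote_name}\"'"
    merged = {}
    # Later sources override earlier ones, so the local text goes last.
    for text in (shared_text, local_text):
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(text)
        if parser.has_section(section):
            merged.update(parser[section])
    return merged


settings = effective_remote_settings(SHARED_CONFIG, LOCAL_CONFIG, "myremote")
print(settings)
```

Only the shared file is committed to Git, so a fresh clone would see url and connect_timeout but not credentialpath until it is set again with --local.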
Finally, you can git commit the changes to share the remote location with your
team.
Supported storage types
Cloud providers
- Amazon S3 (AWS) and S3-compatible e.g. MinIO
- Microsoft Azure Blob Storage
- Google Cloud Storage (GCP)
- Google Drive
- Aliyun OSS
Self-hosted / On-premises
File systems (local remotes)
Not related to the --local option of dvc remote and dvc config!
You can also use system directories, mounted drives, network resources e.g. network-attached storage (NAS), and other external devices as storage. We call all these "local remotes".
Here, the word "local" refers to where the storage is found: typically another directory in the same file system. And "remote" is what we call storage for DVC projects.
Using an absolute path (recommended because it's saved as-is in DVC config):
$ dvc remote add -d myremote /tmp/dvcstore
# .dvc/config
['remote "myremote"']
url = /tmp/dvcstore
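A relative path is resolved against the current working directory, but it is saved relative to the config file location. Running the command below from the project root therefore stores a path with one extra "../", because the config file lives in .dvc/: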
$ dvc remote add -d myremote ../dvcstore
# .dvc/config
['remote "myremote"']
url = ../../dvcstore
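The rewrite from ../dvcstore to ../../dvcstore can be reproduced with a short Python sketch. The project location /home/user/project and the helper name config_relative_url are assumptions made for illustration; this is not DVC's own code, only the path arithmetic the example above implies.

```python
import posixpath  # POSIX-style paths, so the result is the same on any OS


def config_relative_url(user_path, cwd, config_dir):
    """Re-express a path typed relative to cwd as relative to config_dir.

    Hypothetical helper for illustration only, not part of DVC.
    """
    # Step 1: resolve what the user typed against the working directory.
    absolute = posixpath.normpath(posixpath.join(cwd, user_path))
    # Step 2: express that location relative to the config file's directory.
    return posixpath.relpath(absolute, start=config_dir)


project = "/home/user/project"  # assumed project root
config_dir = posixpath.join(project, ".dvc")

# Same situation as above: the command is run from the project root.
print(config_relative_url("../dvcstore", cwd=project, config_dir=config_dir))
```

Under these assumptions the sketch prints ../../dvcstore, matching the url value stored in .dvc/config.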