
Remote Storage

DVC remotes provide optional, additional storage to back up and share your data and ML models. For example, you can download data artifacts created by colleagues without spending time and resources to regenerate them locally. See dvc push and dvc pull.

DVC remotes are similar to Git remotes, but for cached data.

This is somewhat like GitHub or GitLab providing hosting for source code repositories. However, DVC does not provide or recommend a specific storage service. Instead, it adopts a bring-your-own-platform approach, supporting a wide variety of storage types.

The main uses of remote storage are:

  • Synchronize DVC-tracked data (previously cached).
  • Centralize or distribute large file storage for sharing and collaboration.
  • Back up different versions of your data and models.
  • Save space in your working environment (by deleting pushed files/directories).

Configuration

You can set up one or more remote storage locations, mainly with the dvc remote add and dvc remote modify commands. These read from and write to the remote sections of the project's configuration file (.dvc/config), which you can also edit manually.
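Because .dvc/config is a plain-text file, a hand-written section is an alternative to the dvc remote add command. The sketch below appends a section for the remote named mybucket (the example name used on this page) in the same format as the .dvc/config listing on this page. It assumes you run it from the root of a DVC project; using dvc remote add is still the more convenient route, since DVC then manages the file format for you.

```shell
# Sketch: register a remote by editing .dvc/config by hand.
# Assumes the current directory is the root of a DVC project.
# mkdir -p is a no-op there; it only matters in a scratch directory.
mkdir -p .dvc

# Append a remote section in the same format DVC uses.
# Run this only once: a duplicated section would make the config invalid.
cat >> .dvc/config <<'EOF'
['remote "mybucket"']
    url = s3://my-bucket
EOF

# Show the resulting configuration.
cat .dvc/config
```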

Typically, you'll first register a DVC remote by adding its name and URL (or file path), e.g.:

$ dvc remote add mybucket s3://my-bucket

Then you'll usually want to configure the remote's authentication credentials or other properties. For example:

$ dvc remote modify --local \
                    mybucket credentialpath ~/.aws/alt

$ dvc remote modify mybucket connect_timeout 300

Make sure to use the --local flag when writing secrets to the configuration. This creates a second config file, .dvc/config.local, that is ignored by Git, so your secrets don't end up in the repository. See dvc config for more info.

This also means that each copy of the DVC repository may need its remote storage authentication configured again.

# .dvc/config
['remote "mybucket"']
    url = s3://my-bucket
    connect_timeout = 300

# .dvc/config.local
['remote "mybucket"']
    credentialpath = ~/.aws/alt

# .gitignore
.dvc/config.local

Finally, you can git commit the changes to share the general configuration of your remote (.dvc/config) via the Git repo.
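To check that this split works as intended, you can ask Git which files it tracks. The self-contained sketch below recreates the two config files and the ignore rule from the listing on this page in a throwaway repository created with mktemp, commits the shared configuration, and lists the tracked files. The commit identity (Demo, demo@example.com) is a placeholder; in a real project you would run only the git check-ignore, git add, and git commit steps.

```shell
set -e
# Throwaway repository in a temporary directory.
demo=$(mktemp -d)
cd "$demo"
git init -q
mkdir -p .dvc

# Shared remote settings: safe to commit.
cat > .dvc/config <<'EOF'
['remote "mybucket"']
    url = s3://my-bucket
    connect_timeout = 300
EOF

# Secrets and machine-specific paths: must stay out of Git.
cat > .dvc/config.local <<'EOF'
['remote "mybucket"']
    credentialpath = ~/.aws/alt
EOF

# Ignore rule as shown in the listing on this page.
echo '.dvc/config.local' > .gitignore

# Prints the matching ignore rule; exits non-zero if the file is NOT ignored.
git check-ignore -v .dvc/config.local

# Share the general remote configuration via Git.
git add .gitignore .dvc/config
git -c user.name='Demo' -c user.email='demo@example.com' \
    commit -q -m 'Configure DVC remote storage'

# Only .dvc/config and .gitignore are tracked; .dvc/config.local is not.
git ls-files
```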

Supported storage types

See more details.

Cloud providers

  • Amazon S3 (AWS)
  • S3-compatible storage, e.g. MinIO
  • Microsoft Azure Blob Storage
  • Google Drive
  • Google Cloud Storage (GCP)
  • Aliyun OSS

Self-hosted / On-premises

  • SSH servers (like scp)
  • HDFS & WebHDFS
  • HTTP
  • WebDAV
  • Local directories and mounted drives (like rsync)

    This includes network resources such as network-attached storage (NAS) or other external devices.
