Once you install DVC, you can start using it (in its local setup) immediately.
However, a remote (see dvc remote) is required if you need to share data or
models outside the context of a single project, for example with other
collaborators or with yourself in a different computing environment. It's
similar to the way you would use GitHub or any other Git server to store and
share your code.
For simplicity, let's set up a local remote:
$ dvc remote add -d myremote /tmp/dvc-storage
$ git commit .dvc/config -m "Configure local remote"
We only use a local remote in this section for simplicity's sake as you learn to use DVC. For most use cases, other "more remote" types of remotes will be required.
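To check that the local remote was registered, something like the following sketch may help. It assumes the storage path `/tmp/dvc-storage` from the command above; the `STORAGE_DIR` variable and the guard around the DVC calls are illustrative additions, not part of the guide.

```shell
#!/bin/sh
set -eu

# Assumption: same storage path as in the guide; override via STORAGE_DIR.
STORAGE_DIR="${STORAGE_DIR:-/tmp/dvc-storage}"

# Creating the directory up front is harmless and makes the path easy to inspect.
mkdir -p "$STORAGE_DIR"

# Only query DVC when it is installed and we are inside a DVC project.
if command -v dvc >/dev/null 2>&1 && [ -d .dvc ]; then
    dvc remote list      # prints each configured remote with its URL
    dvc remote default   # prints the name of the default remote
else
    echo "dvc not found or not in a DVC project; skipping remote checks." >&2
fi
```

If the remote was added with `-d`, `dvc remote default` should report `myremote`.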
When adding a remote, specify both its type (protocol) and its path. DVC currently supports these types of remotes:
s3: Amazon Simple Storage Service
azure: Microsoft Azure Blob Storage
gdrive: Google Drive
gs: Google Cloud Storage
ssh: Secure Shell (requires SFTP)
hdfs: Hadoop Distributed File System
http: HTTP and HTTPS protocols
local: Directory in the local file system
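To illustrate how the type (protocol) is carried by the remote's URL, here is a minimal sketch that classifies a URL according to the list above. The `remote_type` function is a hypothetical helper written for this example; DVC performs its own detection internally, and this is not its implementation.

```shell
#!/bin/sh
# Hypothetical helper: guess the remote type from a URL or path,
# using the prefixes from the list of supported remote types.
remote_type() {
    case "$1" in
        s3://*)              echo "s3" ;;
        azure://*)           echo "azure" ;;
        gdrive://*)          echo "gdrive" ;;
        gs://*)              echo "gs" ;;
        ssh://*)             echo "ssh" ;;
        hdfs://*)            echo "hdfs" ;;
        http://*|https://*)  echo "http" ;;
        *)                   echo "local" ;;  # no known scheme: treat as a local path
    esac
}

remote_type s3://mybucket/myproject   # prints "s3"
remote_type /tmp/dvc-storage          # prints "local"
```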
If you installed DVC via pip and plan to use cloud services as remote storage, you might need to install optional dependencies such as [s3] or [ssh]. Alternatively, use [all] to include them all. The command should look like this: pip install "dvc[s3]". (This example installs the boto3 library along with DVC to support S3 storage.)
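As a small sketch of how the install command is assembled, the snippet below builds the pip requirement for a chosen extra and prints the command rather than running it. The `dvc_requirement` function is a hypothetical helper for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: print the pip requirement string for DVC,
# optionally with one extra such as "s3", "ssh", or "all".
dvc_requirement() {
    extra="${1:-}"
    if [ -n "$extra" ]; then
        printf 'dvc[%s]\n' "$extra"
    else
        printf 'dvc\n'
    fi
}

# Dry run: show the command instead of executing it.
echo "pip install \"$(dvc_requirement s3)\""   # prints: pip install "dvc[s3]"
```

The quotes around the requirement matter because square brackets are glob characters in some shells (zsh, for instance), so an unquoted `dvc[s3]` may fail to expand.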
For example, to set up an S3 remote we would use something like this (make sure
$ dvc remote add -d s3remote s3://mybucket/myproject
This command is only shown for informational purposes. No need to actually run it in order to continue with the Get Started.
You can see that DVC doesn't require installing any databases, servers, or warehouses. It can use bare S3 or SSH to store data, intermediate results, and models.