usage: dvc push [-h] [-q | -v] [-j <number>] [-r <name>] [-a] [-T]
                [--all-commits] [--glob] [-d] [-R]
                [--run-cache] [targets [targets ...]]

positional arguments:
  targets       Limit command scope to these tracked files/directories,
                .dvc files, or stage names.
The dvc push and dvc pull commands are the means for uploading and downloading data to and from remote storage (S3, SSH, GCS, etc.). These commands are similar to git push and git pull, respectively.
Sharing data across environments and preserving data versions (input datasets, intermediate results, models, metrics, etc.) remotely are the most common use cases for these commands.
Without arguments, dvc push uploads the files and directories referenced in the current workspace (found in all .dvc files) that are missing from the remote. Any targets given to this command limit what to push. It accepts paths to tracked files or directories (including paths inside tracked directories), .dvc files, and stage names (found in dvc.yaml). The --all-branches, --all-tags, and --all-commits options enable pushing files/dirs referenced in multiple Git commits.
For all outputs referenced in each target, DVC finds the
corresponding files and directories in the cache (identified by
hash values saved in
.dvc files). DVC then gathers a list of
files missing from the remote storage, and uploads them.
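The gathering step described above amounts to a set difference between the hashes held in the cache and those already present on the remote. The following is a minimal Python sketch of that idea only; the function name `files_to_upload` and the shortened hash strings are hypothetical placeholders, not DVC internals.

```python
def files_to_upload(cache_hashes, remote_hashes):
    """Return the hashes present in the cache but missing from the remote."""
    # Sorting only makes the result deterministic for display.
    return sorted(set(cache_hashes) - set(remote_hashes))


# Hypothetical, shortened hash values standing in for cached outputs.
cache = ["02423d", "0bd480", "4a8c47", "6c3074"]
remote = ["0bd480", "4a8c47"]

print(files_to_upload(cache, remote))
```

Anything already on the remote is skipped, which is why a repeated push of unchanged data has nothing to transfer.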
-a, --all-branches - determines the files to upload by examining .dvc files in all Git branches instead of just those present in the current workspace. It's useful if branches are used to track experiments or project checkpoints. Note that this can be combined with -T below.
-T, --all-tags - same as -a above, but applies to Git tags as well as the workspace. Useful if tags are used to track "checkpoints" of an experiment or project. Note that both options can be combined.
--all-commits - same as -T above, but applies to all Git commits as well as the workspace. This uploads tracked data for the entire commit history of the project.
-d, --with-deps - determines files to upload by tracking dependencies to the targets. If none are provided, this option is ignored. By traversing all stage dependencies, DVC searches backward from the target stages in the corresponding pipelines. This means DVC will not push files referenced in later stages than the targets.
-R, --recursive - determines the files to push by searching each target directory and its subdirectories for .dvc files to inspect. If there are no directories among the targets, this option is ignored.
-r <name>, --remote <name> - name of the remote storage to push to (see dvc remote list).
--run-cache - uploads all available history of stage runs to the remote repository.
-j <number>, --jobs <number> - parallelism level for DVC to upload data to remote storage. The default value is 4 * cpu_count(). For SSH remotes, the default is 4. Note that the default value can be set using the jobs config option with dvc remote modify. Using more jobs may improve the overall transfer speed.
--glob - allows pushing files and directories that match the pattern specified in targets. Shell-style wildcards are supported.
-h, --help - prints the usage/help message, and exits.
-q, --quiet - does not write anything to standard output. Exits with 0 if no problems arise, otherwise 1.
-v, --verbose - displays detailed tracing information.
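The default for --jobs stated in the option list can be written out as a small Python sketch. The function name `default_jobs` and its `scheme` parameter are assumptions made for illustration; how DVC computes the value internally may differ.

```python
import os


def default_jobs(scheme):
    """Illustrative default parallelism, following the rules stated above."""
    if scheme == "ssh":
        # SSH remotes use a fixed default of 4.
        return 4
    # os.cpu_count() can return None, so fall back to 1 in that case.
    return 4 * (os.cpu_count() or 1)


print(default_jobs("ssh"))
print(default_jobs("s3"))
```

On a machine with many cores the non-SSH default can be large, so setting the jobs config option explicitly may be worthwhile when bandwidth is the bottleneck.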
$ dvc remote add --default r1 \
      ssh://_username_@_host_/path/to/dvc/cache/directory
For existing projects, remotes are usually already set up. You can use dvc remote list to check them:
$ dvc remote list
r1    ssh://_username_@_host_/path/to/dvc/cache/directory
Push entire data cache from the current workspace to the default remote:
$ dvc push
Push files related to a specific
.dvc file only:
$ dvc push data.zip.dvc
Imagine the project has been modified such that the outputs of some of these stages need to be uploaded to remote storage.
$ dvc status --cloud
...
    new:            data/model.p
    new:            data/matrix-test.p
    new:            data/matrix-train.p
One could do a simple
dvc push to share all the data, but what if you only
want to upload part of the data?
$ dvc push --with-deps test-posts

# Do some work based on the partial update...
# Then push the rest of the data:

$ dvc push --with-deps matrix-train

$ dvc status --cloud
Cache and remote 'r1' are in sync.
We specified a stage in the middle of this pipeline (test-posts) with the first push. --with-deps caused DVC to start with that .dvc file, and search backwards through the pipeline for data files to upload. Because the matrix-train stage occurs later (it's the last one), its data was not pushed. However, we then specified it in the second push, so all remaining data was uploaded.
Finally, we used
dvc status to double check that all data had been uploaded.
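The backward search performed by --with-deps can be pictured as a walk over stage dependencies. The sketch below is a simplified Python model, not DVC's implementation: the stage names `test-posts` and `matrix-train` come from this example, while `extract`, `prepare`, the dependency map, and the function `stages_with_deps` are hypothetical.

```python
def stages_with_deps(target, upstream):
    """Collect the target stage plus every stage it depends on."""
    seen = []
    stack = [target]
    while stack:
        stage = stack.pop()
        if stage in seen:
            continue
        seen.append(stage)
        # Walk backward: only upstream stages are visited, never later ones.
        stack.extend(upstream.get(stage, []))
    return seen


# Hypothetical pipeline: each stage maps to the stages it depends on.
upstream = {
    "extract": [],
    "prepare": ["extract"],
    "test-posts": ["prepare"],
    "matrix-train": ["test-posts"],
}

print(stages_with_deps("test-posts", upstream))
print(stages_with_deps("matrix-train", upstream))
```

In this model the first call never reaches `matrix-train`, mirroring why its data stayed unpushed until it was named as a target.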
Let's take a detailed look at what happens to the cache directory as you run an experiment locally and push data to remote storage. To set up the example, assume you have created a workspace that contains some code and data, and have set up a remote.
Some work has been performed in the workspace, and new data is ready for
uploading to the remote.
dvc status --cloud
will list several files in
new state. We can see exactly what that means by
looking in the project's cache:
$ tree .dvc/cache
.dvc/cache
├── 02
│   └── 423d88d184649a7157a64f28af5a73
├── 0b
│   └── d48000c6a4e359f4b81285abf059b5
├── 38
│   └── 64e70211d3bdb367ad1432bfc14c1f.dir
├── 4a
│   └── 8c47036c79c01522e79ac0f518d0f7
├── 6c
│   └── 3074754e3a9b563b62c8f1a38670dc
├── 77
│   └── bea77463abe2b7c6b4d13f00d2c7b4
└── 88
    └── c3db1c257136090dbb4a7ddf31e678.dir

10 directories, 9 files

$ tree ~/vault/recursive
~/vault/recursive
├── 0b
│   └── d48000c6a4e359f4b81285abf059b5
├── 4a
│   └── 8c47036c79c01522e79ac0f518d0f7
└── 88
    └── c3db1c257136090dbb4a7ddf31e678.dir

5 directories, 5 files
.dvc/cache is the local cache, while ~/vault/recursive is a "local remote" (another directory in the local file system). This listing shows the cache having more files in it than the remote, which is what the new state means.
Refer to Structure of cache directory for more info.
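The tree listing suggests how entries are laid out: the first two characters of a hash name a subdirectory, and the remaining characters name the file inside it. A minimal Python sketch under that reading follows; the helper name `cache_relpath` is hypothetical, and the hash is copied from the listing.

```python
from pathlib import PurePosixPath


def cache_relpath(hash_value):
    """Split a hash into '<first two chars>/<rest>' as in the tree listing."""
    return PurePosixPath(hash_value[:2]) / hash_value[2:]


print(cache_relpath("02423d88d184649a7157a64f28af5a73"))
```

Because the cache and the "local remote" share this layout, comparing the two trees entry by entry shows which objects still need to be pushed.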
Next we can copy the remaining data from the cache to the remote using dvc push:
$ tree ~/vault/recursive
~/vault/recursive
├── 02
│   └── 423d88d184649a7157a64f28af5a73
├── 0b
│   └── d48000c6a4e359f4b81285abf059b5
├── 38
│   └── 64e70211d3bdb367ad1432bfc14c1f.dir
├── 4a
│   └── 8c47036c79c01522e79ac0f518d0f7
├── 6c
│   └── 3074754e3a9b563b62c8f1a38670dc
├── 77
│   └── bea77463abe2b7c6b4d13f00d2c7b4
└── 88
    └── c3db1c257136090dbb4a7ddf31e678.dir

10 directories, 10 files

$ dvc status --cloud
Cache and remote 'r1' are in sync.
With dvc status --cloud, DVC verifies that there are indeed no more files to push to remote storage.