get-url
Download a file or directory from a supported URL (for example s3://,
ssh://, and other protocols) into the local file system.
See `dvc get` to download data/model files or directories from other DVC repositories (e.g. hosted on GitHub).
Synopsis
usage: dvc get-url [-h] [-q | -v] [-j <number>] [-f] [--fs-config <name>=<value>] url [out]
positional arguments:
url (See supported URLs in the description.)
out Destination path to put files in.
Description
In some cases it's convenient to get a file or directory from a remote location
into the local file system. The dvc get-url command helps the user do just
that.
Note that unlike `dvc import-url`, this command does not track the downloaded data files (it does not create a `.dvc` file). For that reason, this command doesn't require an existing DVC project to run in.
The url argument should provide the location of the data to be downloaded,
while out can be used to specify the directory and/or file name desired for
the downloaded data. If an existing directory is specified, then the file or
directory will be placed inside.
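As a quick sketch of that behavior (the URL is a placeholder), downloading into an existing directory keeps the original file name, so the file below ends up as `downloads/data.csv`:

$ mkdir downloads
$ dvc get-url https://example.com/path/to/data.csv downloads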
See dvc list-url for a way to browse the external location for files and
directories to download.
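For example, you can browse a location first and then download a specific file from it (the S3 paths here are illustrative):

$ dvc list-url s3://bucket/path
$ dvc get-url s3://bucket/path/data.csv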
DVC supports several types of (local or) remote data sources (protocols):
| Type | Description | `url` format example |
|---|---|---|
| `s3` | Amazon S3 | `s3://bucket/data` |
| `azure` | Microsoft Azure Blob Storage | `azure://container/data` |
| `gs` | Google Cloud Storage | `gs://bucket/data` |
| `ssh` | SSH server | `ssh://user@example.com/path/to/data` |
| `hdfs` | HDFS to file* | `hdfs://user@example.com/path/to/data.csv` |
| `http` | HTTP to file* | `https://example.com/path/to/data.csv` |
| `webdav` | WebDAV to file* | `webdavs://example.com/endpoint/path` |
| `webhdfs` | HDFS REST API* | `webhdfs://user@example.com/path/to/data.csv` |
| `local` | Local path | `/path/to/local/data` |
If you installed DVC via `pip` and plan to use cloud services as remote storage, you might need to install these optional dependencies: `[s3]`, `[azure]`, `[gs]`, `[oss]`, `[ssh]`. Alternatively, use `[all]` to include them all. The command should look like this: `pip install "dvc[s3]"`. (This example installs the `boto3` library along with DVC to support S3 storage.)
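For example, to add S3 support only, or every optional dependency at once:

$ pip install "dvc[s3]"
$ pip install "dvc[all]"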
* Notes on remote locations:
- HDFS, HTTP, WebDAV, and WebHDFS do not support downloading entire directories, only single files (see the example below).
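For instance, fetching a single file over HTTPS works, while a whole directory has to come from one of the other protocols (both URLs are placeholders):

$ dvc get-url https://example.com/path/to/data.csv
$ dvc get-url s3://bucket/path/to/dir ./dir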
Another way to understand the `dvc get-url` command is as a tool for downloading data files. On GNU/Linux systems, for example, instead of `dvc get-url` with HTTP(S) it's possible to use:

$ wget https://example.com/path/to/data.csv

Options
- `-j <number>`, `--jobs <number>` - parallelism level for DVC to download data from the source. The default value is `4 * cpu_count()`. Using more jobs may speed up the operation (see the combined example after this list).
- `-f`, `--force` - when using `out` to specify a local target file or directory, the operation will fail if those paths already exist. This flag will force the operation, causing local files/dirs to be overwritten by the command.
- `--fs-config <name>=<value>` - `dvc remote` config options for the target `url`.
- `-h`, `--help` - prints the usage/help message and exits.
- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no problems arise, otherwise 1.
- `-v`, `--verbose` - displays detailed tracing information.
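As a combined sketch (the S3 path is a placeholder), the following overwrites an existing `./data` and uses 8 download jobs:

$ dvc get-url --force --jobs 8 s3://bucket/data ./data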
Examples
This command will copy an S3 object into the current working directory with the same file name:
$ dvc get-url s3://bucket/path

By default, DVC expects that the AWS CLI is already configured.
DVC will use the AWS credentials file to access S3. To override the
configuration, you can use the parameters described in dvc remote modify.
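For a one-off override, the same config options can be passed with `--fs-config`. For example, assuming an S3-compatible server at a placeholder endpoint, the `endpointurl` option (documented in `dvc remote modify`) can be given directly:

$ dvc get-url \
      --fs-config endpointurl=https://storage.example.com \
      s3://bucket/path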
We use the `boto3` library to communicate with AWS. The following API methods may be performed:

- `head_object`
- `download_file`

So make sure you have the `s3:GetObject` permission enabled.
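One way to sanity-check this beforehand, assuming the AWS CLI is configured (bucket and key names below are placeholders), is to issue the equivalent call yourself:

$ aws s3api head-object --bucket bucket --key path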
$ dvc get-url gs://bucket/path file

The above command downloads the `path` file (or directory) into `./file`.