get-url

Download a file or directory from a supported URL (for example s3://, ssh://, and other protocols) into the local file system.

See dvc get to download data/model files or directories from other DVC repositories (e.g. hosted on GitHub).

Synopsis

usage: dvc get-url [-h] [-q | -v] [-j <number>] [-f] [--fs-config <name>=<value>] url [out]

positional arguments:
  url            (See supported URLs in the description.)
  out            Destination path to put files in.

Description

In some cases it's convenient to get a file or directory from a remote location into the local file system. The dvc get-url command helps the user do just that.

Note that unlike dvc import-url, this command does not track the downloaded data files (does not create a .dvc file). For that reason, this command doesn't require an existing DVC project to run in.

The url argument should provide the location of the data to be downloaded, while out can be used to specify the directory and/or file name desired for the downloaded data. If an existing directory is specified, then the file or directory will be placed inside.
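
For example, assuming an existing local directory dir and a hypothetical bucket path, the following places the downloaded file inside dir:

$ dvc get-url s3://bucket/path/data.csv dir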

See dvc list-url for a way to browse the external location for files and directories to download.
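
For instance, to preview the contents of a hypothetical remote location before downloading:

$ dvc list-url s3://bucket/path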

DVC supports several types of (local or) remote data sources (protocols):

Type      Description                     url format example
s3        Amazon S3                       s3://bucket/data
azure     Microsoft Azure Blob Storage    azure://container/data
gs        Google Cloud Storage            gs://bucket/data
ssh       SSH server                      ssh://user@example.com/path/to/data
hdfs      HDFS to file*                   hdfs://user@example.com/path/to/data.csv
http      HTTP to file*                   https://example.com/path/to/data.csv
webdav    WebDAV to file*                 webdavs://example.com/endpoint/path
webhdfs   HDFS REST API*                  webhdfs://user@example.com/path/to/data.csv
local     Local path                      /path/to/local/data

If you installed DVC via pip and plan to use cloud services as remote storage, you might need to install these optional dependencies: [s3], [azure], [gs], [oss], [ssh]. Use [all] to include them all. For example, to add S3 support (this also installs the boto3 library, which DVC uses to communicate with S3):
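
$ pip install "dvc[s3]"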

* Notes on remote locations:

  • HDFS, HTTP, WebDAV, and WebHDFS do not support downloading entire directories, only single files.

Another way to understand the dvc get-url command is as a tool for downloading data files. On GNU/Linux systems, for example, a plain HTTP(S) download can be done with wget instead of dvc get-url:

$ wget https://example.com/path/to/data.csv

Options

  • -j <number>, --jobs <number> - parallelism level for DVC to download data from the source. The default value is 4 * cpu_count(). Using more jobs may speed up the operation.

  • -f, --force - when out points to an existing local file or directory, the operation fails by default. This flag forces the operation, overwriting existing local files/directories.

  • --fs-config <name>=<value> - dvc remote config options for the target url (see the combined example after this list).

  • -h, --help - prints the usage/help message and exits.

  • -q, --quiet - do not write anything to standard output. Exit with 0 if no problems arise, otherwise 1.

  • -v, --verbose - displays detailed tracing information.
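
As a sketch of how these options combine (the bucket, endpoint URL, and paths below are made-up placeholders for an S3-compatible service):

$ dvc get-url --jobs 8 --force \
      --fs-config endpointurl=https://storage.example.com \
      s3://bucket/path/data.csv data.csv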

Examples

This command will copy an S3 object into the current working directory with the same file name:

$ dvc get-url s3://bucket/path

By default, DVC expects that AWS CLI is already configured.

DVC will use the AWS credentials file to access S3. To override the configuration, you can use the parameters described in dvc remote modify.
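
For example, a specific profile from the AWS credentials file can be passed inline with --fs-config (the profile name here is a hypothetical placeholder):

$ dvc get-url --fs-config profile=myprofile s3://bucket/path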

We use the boto3 library to communicate with AWS. The following API methods may be performed:

  • head_object
  • download_file

So make sure you have the s3:GetObject permission enabled.

$ dvc get-url gs://bucket/path file

The above command downloads the /path file (or directory) into ./file.

$ dvc get-url ssh://user@example.com/path/to/data

Using default SSH credentials, the above command gets the data file (or directory).

$ dvc get-url hdfs://user@example.com/path/to/file

Both HTTP and HTTPS protocols are supported:

$ dvc get-url https://example.com/path/to/file

$ dvc get-url webhdfs://user@example.com/path/to/file

$ dvc get-url /local/path/to/data dir

The above command copies the /local/path/to/data file or directory into ./dir.
