
dvc.api.get_url()

Returns the URL to the storage location of a data file or directory tracked in a DVC project.

def get_url(path: str,
            repo: str = None,
            rev: str = None,
            remote: str = None) -> str

Usage:

import dvc.api

resource_url = dvc.api.get_url(
    'get-started/data.xml',
    repo='https://github.com/iterative/dataset-registry')

# resource_url is now "https://remote.dvc.org/dataset-registry/a3/04afb96060aad90176268345e10355"

Description

Returns the URL string of the storage location (in a DVC remote) where a target file or directory, specified by its path in a repo (DVC project), is stored.

The URL is formed by reading the project's remote configuration and the DVC-file where the given path is an output. The URL scheme returned depends on the type of the remote used (see the Parameters section).

If the target is a directory, the returned URL will end in .dir. Refer to Structure of cache directory and dvc add to learn more about how DVC handles data directories.
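As a small illustration of the .dir convention described above, the sketch below checks the suffix of a returned URL to tell whether the target is a directory. The helper name is_directory_url and the directory URL in the usage lines are made up for this example; only the file URL comes from this page.

```python
def is_directory_url(url: str) -> bool:
    """Return True if a URL returned by get_url() refers to a directory.

    Relies only on the documented convention that directory URLs
    end in ".dir"; it does not contact the remote.
    """
    return url.endswith(".dir")


# File URL taken from the usage example on this page.
file_url = "https://remote.dvc.org/dataset-registry/a3/04afb96060aad90176268345e10355"

# Hypothetical directory URL (the hash is invented for illustration).
dir_url = "https://remote.dvc.org/dataset-registry/ab/cdef0123456789abcdef0123456789.dir"

print(is_directory_url(file_url))
print(is_directory_url(dir_url))
```

A check like this can help decide whether the URL can be fetched as a single object or whether it describes the contents of a directory.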

⚠️ This function does not check for the actual existence of the file or directory in the remote storage.

💡 Once you have the resource's URL, you should be able to download it directly with an appropriate library, such as boto3 or paramiko.
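A minimal sketch of such a direct download, assuming the remote is served over HTTP(S) so that the Python standard library is enough. The helper names storage_scheme and download_http are hypothetical and not part of DVC; for other schemes (for example s3 or ssh) a library such as boto3 or paramiko would be needed instead.

```python
import shutil
import urllib.request
from urllib.parse import urlsplit


def storage_scheme(url: str) -> str:
    """Return the scheme of a storage URL, e.g. 'https' or 's3'."""
    return urlsplit(url).scheme


def download_http(url: str, dest: str) -> None:
    """Stream an HTTP(S) resource to a local file.

    urllib raises urllib.error.HTTPError if the object is missing,
    which matters because get_url() does not check that the file
    actually exists in the remote.
    """
    if storage_scheme(url) not in ("http", "https"):
        raise ValueError(f"not an HTTP(S) URL: {url}")
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
```

Usage would look like download_http(resource_url, 'data.xml'), where resource_url is the string returned by dvc.api.get_url().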

Parameters

  • path - location and file name of the file or directory in repo, relative to the project's root.
  • repo - specifies the location of the DVC project. It can be a URL or a file system path. Both HTTP and SSH protocols are supported for online Git repos (e.g. [user@]server:project.git). Default: The current project is used (the current working directory tree is walked up to find it).
  • rev - Git commit (any revision such as a branch or tag name, or a commit hash). If repo is not a Git repo, this option is ignored. Default: HEAD.
  • remote - name of the DVC remote to use to form the returned URL string. Default: The default remote of repo is used.

Exceptions

  • dvc.api.UrlNotDvcRepoError - repo is not a DVC project.
  • dvc.exceptions.NoRemoteError - no remote is found.

Example: Getting the URL to a DVC-tracked file

import dvc.api

resource_url = dvc.api.get_url(
    'get-started/data.xml',
    repo='https://github.com/iterative/dataset-registry')

print(resource_url)

The script above prints:

https://remote.dvc.org/dataset-registry/a3/04afb96060aad90176268345e10355

This URL represents the location where the data is stored. It is built by reading the corresponding DVC-file (get-started/data.xml.dvc), where the md5 file hash is stored,

outs:
  - md5: a304afb96060aad90176268345e10355
    path: get-started/data.xml

and the project configuration (.dvc/config) where the remote URL is saved:

['remote "storage"']
url = https://remote.dvc.org/dataset-registry
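To make the relationship between the two pieces concrete, here is a rough sketch that combines the remote URL and the md5 hash shown above into the printed URL. The function build_storage_url is hypothetical and is not DVC's implementation; the layout (first two hash characters as a directory, the rest as the file name) is inferred from this example and may differ for other DVC versions or remote types.

```python
def build_storage_url(remote_url: str, md5: str) -> str:
    """Join a remote URL and an md5 hash the way the example URL is laid out.

    Assumption: the first two hex characters of the hash form a
    subdirectory and the remaining characters form the object name.
    """
    return f"{remote_url.rstrip('/')}/{md5[:2]}/{md5[2:]}"


# Values copied from the DVC-file and .dvc/config excerpts above.
remote_url = "https://remote.dvc.org/dataset-registry"
md5 = "a304afb96060aad90176268345e10355"

print(build_storage_url(remote_url, md5))
# prints https://remote.dvc.org/dataset-registry/a3/04afb96060aad90176268345e10355
```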
