
dvc.api.read()

Returns the contents of a tracked file.

def read(path: str,
         repo: str = None,
         rev: str = None,
         remote: str = None,
         mode: str = "r",
         encoding: str = None)

Usage:

import dvc.api

modelpkl = dvc.api.read(
    'model.pkl',
    repo='https://github.com/example/project.git',
    mode='rb')

Description

This function wraps dvc.api.open() to provide a simple way to return the complete contents of a file tracked in a DVC project. The file can be tracked by DVC or by Git.

This is similar to the dvc get command in our CLI.

The returned contents are either a string (str) or a bytes object. They are loaded directly into memory, without using any disk space.

The type returned depends on the mode used: text mode (e.g. "r") returns str, and binary mode (e.g. "rb") returns bytes. For more details, please refer to Python's open() built-in, which is used under the hood.
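Because the mode parameter mirrors Python's builtin open(), the mode-to-type rule can be checked locally with the builtin alone. The sketch below does not call DVC at all; read_both_ways() is a hypothetical helper that writes some bytes to a temporary file and reads them back in text and binary mode.

```python
import os
import tempfile
from typing import Tuple


def read_both_ways(data: bytes, encoding: str = "utf-8") -> Tuple[str, bytes]:
    """Hypothetical helper: write `data` to a temp file, read it back in "r" and "rb" mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, mode="r", encoding=encoding) as f:
            text = f.read()  # text mode: contents decoded to str
        with open(path, mode="rb") as f:
            raw = f.read()  # binary mode: raw bytes, no decoding
    finally:
        os.remove(path)
    return text, raw


text, raw = read_both_ways("héllo".encode("utf-8"))
print(type(text).__name__, type(raw).__name__)  # str bytes
```

The same distinction applies to the value returned by dvc.api.read() for a given mode.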

Parameters

  • path - location and file name of the file in repo, relative to the project's root.
  • repo - specifies the location of the DVC project. It can be a URL or a file system path. Both HTTP and SSH protocols are supported for online Git repos (e.g. [user@]server:project.git). Default: The current project is used (the current working directory tree is walked up to find it).
  • rev - Git commit (any revision such as a branch or tag name, or a commit hash). If repo is not a Git repo, this option is ignored. Default: HEAD.
  • remote - name of the DVC remote to look for the target data. Default: The default remote of repo is used if a remote argument is not given. For local projects, the cache is tried before the default remote.
  • mode - specifies the mode in which the file is opened. Defaults to "r" (read). Mirrors the namesake parameter in builtin open().
  • encoding - codec used to decode the file contents to a string. This should only be used in text mode. Defaults to "utf-8". Mirrors the namesake parameter in builtin open().
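Passing rev explicitly pins the read to one Git revision instead of HEAD, which keeps results reproducible when the repo keeps changing. The sketch below is a hypothetical wrapper (read_pinned() is not part of DVC) that refuses an empty rev and forwards everything else to dvc.api.read() using only the parameters documented above. The tag 'v1.0' in the usage comment is an assumed example.

```python
from typing import Optional, Union


def read_pinned(path: str, repo: str, rev: str,
                remote: Optional[str] = None,
                mode: str = "r",
                encoding: Optional[str] = None) -> Union[str, bytes]:
    """Hypothetical helper: read `path` from `repo` at a fixed Git revision."""
    if not rev:
        # Do not silently fall back to HEAD; an explicit rev keeps results reproducible.
        raise ValueError("rev must name a Git commit, branch, or tag")
    import dvc.api  # imported here so the helper can be defined without DVC installed
    return dvc.api.read(path, repo=repo, rev=rev, remote=remote,
                        mode=mode, encoding=encoding)


# Usage (needs network access; 'v1.0' is an assumed tag name):
# blob = read_pinned('model.pkl',
#                    repo='https://github.com/example/project.git',
#                    rev='v1.0', mode='rb')
```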

Exceptions

  • dvc.exceptions.FileMissingError - file in path is missing from repo.
  • dvc.exceptions.PathMissingError - path cannot be found in repo.
  • dvc.api.UrlNotDvcRepoError - repo is not a DVC project.
  • dvc.exceptions.NoRemoteError - no remote is found.
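The exceptions listed above can be caught like any other Python exception. As a minimal sketch, the hypothetical helper read_if_exists() below treats a PathMissingError as "no such file" and returns None, while letting every other error (for example NoRemoteError) propagate to the caller. Whether None is a sensible fallback is an assumption that depends on your use case.

```python
from typing import Optional, Union


def read_if_exists(path: str,
                   repo: Optional[str] = None,
                   mode: str = "r") -> Optional[Union[str, bytes]]:
    """Hypothetical helper: return the file contents, or None if `path` is not in `repo`."""
    import dvc.api
    from dvc.exceptions import PathMissingError

    try:
        return dvc.api.read(path, repo=repo, mode=mode)
    except PathMissingError:
        # Only a missing path is mapped to None; other DVC errors still raise.
        return None
```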

Example: Load data from a DVC repository

Any data artifact hosted online can be loaded directly in your Python code with this API. For example, let's say that you want to load and deserialize a binary model from a repo on GitHub:

import pickle
import dvc.api

model = pickle.loads(
    dvc.api.read(
        'model.pkl',
        repo='https://github.com/example/project.git',
        mode='rb'
        )
    )

We're using 'rb' mode here because pickle.loads() expects bytes, not a string.
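To see why the mode matters, note that pickle.loads() accepts bytes but raises TypeError when given a str, which is what text mode would return. The hypothetical helper unpickle_contents() below makes that failure explicit with a clearer message. It is a local sketch: pickle.dumps() stands in for the bytes that dvc.api.read(..., mode='rb') would return.

```python
import pickle
from typing import Any, Union


def unpickle_contents(contents: Union[str, bytes]) -> Any:
    """Hypothetical helper: unpickle the value returned by dvc.api.read(..., mode='rb')."""
    if isinstance(contents, str):
        # Text mode decoded the file to str; pickle needs the raw bytes.
        raise TypeError("got str; read the file with mode='rb' so pickle receives bytes")
    return pickle.loads(contents)


# Stand-in for dvc.api.read('model.pkl', repo=..., mode='rb'):
payload = pickle.dumps({"weights": [0.1, 0.2]})
model = unpickle_contents(payload)
```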
