dvc.api.read()
Returns the contents of a tracked file.
def read(path: str,
         repo: str = None,
         rev: str = None,
         remote: str = None,
         mode: str = "r",
         encoding: str = None)
Usage
import dvc.api
modelpkl = dvc.api.read(
    'model.pkl',
    repo='https://github.com/iterative/example-get-started',
    mode='rb'
)
Description
This function wraps dvc.api.open(), providing a simple way to return the complete contents of a file tracked in a DVC project. The file can be tracked by DVC (as an output) or by Git.
This is similar to the dvc get command in our CLI.
The returned contents can be a string or a bytes object, and are loaded directly into memory (without using any disk space). The type returned depends on the mode used. For more details, refer to Python's open() built-in, which is used under the hood.
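To make the link between mode and return type concrete, the sketch below uses only Python's built-in open() on a temporary file, with no DVC involved. Since that built-in is what is used under the hood, the same distinction between text and binary results should be expected from dvc.api.read(). The file name and contents are made up for illustration.

```python
import os
import tempfile

# Create a small throwaway file (hypothetical contents, for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w", encoding="utf-8") as f:
        f.write("hello")

    # Text mode ("r") decodes the file and yields a str.
    with open(sample, mode="r", encoding="utf-8") as f:
        text = f.read()

    # Binary mode ("rb") yields the raw, undecoded bytes.
    with open(sample, mode="rb") as f:
        raw = f.read()

print(type(text).__name__, type(raw).__name__)  # str bytes
```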
Parameters
- path (required) - location and file name of the target to read, relative to the root of the project (repo).
- repo - specifies the location of the DVC project. It can be a URL or a file system path. Both HTTP and SSH protocols are supported for online Git repos (e.g. [user@]server:project.git). Default: The current project is used (the current working directory tree is walked up to find it).
- rev - Git commit (any revision such as a branch or tag name, commit hash, or experiment name). If repo is not a Git repo, this option is ignored. Default: None (the current working tree is used).
- remote - name of the DVC remote to look for the target data. Default: The default remote of repo is used if a remote argument is not given. For local projects, the cache is tried before the default remote.
- mode - specifies the mode in which the file is opened. Defaults to "r" (read). Mirrors the namesake parameter in the built-in open().
- encoding - codec used to decode the file contents to a string. This should only be used in text mode. Defaults to "utf-8". Mirrors the namesake parameter in the built-in open().
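As a rough illustration of how these parameters combine, the sketch below wraps a text-mode read pinned to a specific revision. The helper name read_text_at_revision and the path, repo URL, and tag in the usage comment are hypothetical placeholders, not part of DVC or of any real project.

```python
def read_text_at_revision(path, repo, rev, remote=None):
    """Return the text of `path` as stored in `repo` at revision `rev`.

    Hypothetical helper for illustration; only dvc.api.read() is DVC's own.
    """
    # Imported lazily so this sketch can be loaded without DVC installed.
    import dvc.api

    return dvc.api.read(
        path,
        repo=repo,
        rev=rev,            # branch, tag, or commit hash
        remote=remote,      # None falls back to the default remote of repo
        mode="r",           # text mode, so a str is returned
        encoding="utf-8",   # only meaningful in text mode
    )


# Hypothetical usage (path, repo URL, and tag are placeholders):
# text = read_text_at_revision(
#     "data/notes.txt",
#     repo="git@example.com:team/project.git",
#     rev="v1.0",
# )
```

Pinning rev keeps repeated reads reproducible even if the default branch of the repo moves on.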
Exceptions
- dvc.exceptions.FileMissingError - file in path is missing from repo.
- dvc.exceptions.PathMissingError - path cannot be found in repo.
- dvc.exceptions.NoRemoteError - no remote is found.
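One possible way to handle these exceptions is sketched below. The helper name read_or_none is hypothetical, and the exception classes are imported from dvc.exceptions exactly as listed above; whether they live at those import paths may depend on the installed DVC version, so treat that as an assumption.

```python
def read_or_none(path, **kwargs):
    """Return the file contents, or None if the target or remote is unavailable.

    Hypothetical helper for illustration; it is not part of DVC.
    """
    # Lazy imports keep the sketch loadable without DVC installed.
    import dvc.api
    from dvc.exceptions import (
        FileMissingError,
        NoRemoteError,
        PathMissingError,
    )

    try:
        return dvc.api.read(path, **kwargs)
    except (FileMissingError, PathMissingError) as exc:
        # The target is not present in the repo at the requested revision.
        print(f"{path!r} could not be found: {exc}")
    except NoRemoteError as exc:
        # No remote was given and the repo has no default remote.
        print(f"no remote available for {path!r}: {exc}")
    return None
```

Returning None is only one design choice; re-raising with more context may suit code that should fail loudly.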
Example: Load data from a DVC repository
Any file tracked in a DVC project (and stored remotely) can be loaded directly in your Python code with this API. For example, let's say that you want to load and unserialize a binary model from a repo on GitHub:
import pickle
import dvc.api

data = dvc.api.read(
    'model.pkl',
    repo='https://github.com/iterative/example-get-started',
    mode='rb'
)
model = pickle.loads(data)
We're using 'rb' mode here for compatibility with pickle.loads().
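The reason binary mode matters can be checked without DVC at all. The sketch below uses only the standard pickle module, with a small dictionary standing in for a model (an assumption made purely for illustration), to show that pickle.loads() accepts bytes and rejects a str.

```python
import pickle

# Stand-in for a model; any picklable object works for this illustration.
original = {"weights": [0.1, 0.2, 0.3]}

# pickle.dumps() produces bytes, which is what a binary-mode read returns.
payload = pickle.dumps(original)
restored = pickle.loads(payload)

# A str (what a text-mode read returns) is rejected by pickle.loads().
try:
    pickle.loads("not bytes")
    rejected = False
except TypeError:
    rejected = True

print(restored == original, rejected)  # True True
```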