List repository contents, including files, models, and directories tracked by DVC (data artifacts) and by Git.
usage: dvc list [-h] [-q | -v] [-R] [--dvc-only] [--rev <commit>]
                url [path]

positional arguments:
  url            Location of DVC or Git repository to list from
  path           Path to a file or directory within the repository
DVC, by effectively replacing data files, models, and directories with
.dvc files, hides their actual locations and names. This means that you
don't see data files when you browse a DVC repository on a Git hosting
service (e.g. GitHub); you just see the .dvc files. This makes it hard to
navigate the project to find data artifacts for use with commands such as
dvc get or dvc import.
dvc list prints a virtual view of a DVC repository, as if files and
directories tracked by DVC
were found directly in the remote Git repo. Only the root directory is listed by
default. The output of this command is equivalent to actually cloning the repo
and pulling its data like this:
$ git clone <url> example
$ cd example
$ dvc pull
$ ls <path>
The url argument specifies the address of the Git repository containing the
data source. Both HTTP and SSH protocols are supported for online repos.
url can also be a local file system path to an "offline" Git repo.
The optional path argument specifies the file or directory to list within
the source repository at url. It's similar to passing a path to commands
such as ls or aws s3 ls. As with ls, the -R option can be used to list
files recursively.
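The way url, path, and -R combine can be sketched as a small shell function. This is only an illustration: the name list_tree is hypothetical, not part of DVC, and the function simply forwards its arguments to dvc list.

```shell
# Hypothetical helper: recursively list a repository, or one directory in it.
# Arguments: <url> [path]; when path is omitted, listing starts at the root.
list_tree() {
  url=$1
  if [ "$#" -ge 2 ]; then
    dvc list -R "$url" "$2"
  else
    dvc list -R "$url"
  fi
}

# Example call (needs DVC installed and network access):
#   list_tree https://github.com/iterative/dataset-registry get-started
```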
Please note that dvc list doesn't check whether the listed data (tracked by
DVC) actually exists in remote storage, so there is no guarantee that it can
be downloaded with dvc get or dvc import.
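Because a listing alone does not prove the data is retrievable, one possible check is to attempt the download and look at the exit status. The sketch below assumes that dvc get exits with a non-zero status when the data cannot be fetched; the function name can_download is hypothetical. Note that it really downloads the data into a temporary directory, so it may be slow for large artifacts.

```shell
# Hypothetical helper: try to fetch <path> from <url> into a scratch directory.
# Returns 0 if dvc get succeeded, 1 if it failed, 2 if no scratch dir was made.
can_download() {
  url=$1
  path=$2
  scratch=$(mktemp -d) || return 2
  if dvc get -o "$scratch/out" "$url" "$path" >/dev/null 2>&1; then
    status=0
  else
    status=1
  fi
  # Always clean up the downloaded copy; only the exit status matters here.
  rm -rf "$scratch"
  return "$status"
}

# Example call (needs DVC installed and network access):
#   can_download https://github.com/iterative/example-get-started model.pkl \
#     && echo "model.pkl is retrievable"
```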
-R, --recursive - recursively prints the contents of all subdirectories.
--dvc-only - shows only DVC-tracked files and directories (outputs).
--rev <commit> - commit hash, branch or tag name, etc. (any Git revision) of the repository to list content for. The latest commit in master (tip of the default branch) is used by default when this option is not specified.
-h, --help - prints the usage/help message and exits.
-q, --quiet - does not write anything to standard output. Exits with 0 if no problems arise, otherwise 1.
-v, --verbose - displays detailed tracing information.
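As a sketch of how these options can be combined, the function below lists only the DVC-tracked outputs of a repository at a given Git revision. The name list_outputs_at is hypothetical; it uses only the --dvc-only and --rev options described above.

```shell
# Hypothetical helper: show only DVC-tracked outputs at a given Git revision.
# Arguments: <url> <rev>, where rev is any Git revision (hash, branch, tag).
list_outputs_at() {
  url=$1
  rev=$2
  dvc list --dvc-only --rev "$rev" "$url"
}

# Example call (needs DVC installed and network access):
#   list_outputs_at <url> <commit>
```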
We can use this command for getting information about a repository before using
other commands like
dvc get or
dvc import to reuse any file or directory
found in it. This includes files tracked by Git as well as data
artifacts tracked by DVC:
$ dvc list https://github.com/iterative/example-get-started
.gitignore
README.md
data
dvc.lock
dvc.yaml
model.pkl
params.yaml
prc.json
scores.json
src
If you open the project's page, you will see a similar list, except for the
model.pkl file. It's tracked by DVC and not visible to Git. It's exported in
the dvc.yaml file as an output of the train stage.
We can now, for example, download the model file with:
$ dvc get https://github.com/iterative/example-get-started model.pkl
Let's imagine a DVC repo used as a data registry, structured with different
datasets in separate directories. We can list all of its contents
recursively with the -R option:
$ dvc list -R https://github.com/iterative/dataset-registry
.gitignore
README.md
get-started/.gitignore
get-started/data.xml
get-started/data.xml.dvc
images/.gitignore
images/dvc-logo-outlines.png
images/dvc-logo-outlines.png.dvc
images/owl_sticker.png
...
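A recursive listing like this mixes data files with .gitignore and .dvc bookkeeping files. As a rough sketch, the output can be piped through a small filter to leave only the remaining entries. The function name only_artifacts is hypothetical, and it assumes the listing has one path per line, as shown above.

```shell
# Hypothetical filter: read a `dvc list -R` listing on standard input and
# drop .dvc files and .gitignore files, keeping every other entry.
only_artifacts() {
  grep -v -e '\.dvc$' -e '^\.gitignore$' -e '/\.gitignore$'
}

# Example call (needs DVC installed and network access):
#   dvc list -R https://github.com/iterative/dataset-registry | only_artifacts
```

Files tracked directly by Git, such as README.md, still pass through this filter; the --dvc-only option is the way to restrict the listing to DVC-tracked outputs.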