List repository contents, including files, models, and directories tracked by DVC (data artifacts) and by Git.
usage: dvc list [-h] [-q | -v] [-R] [--outs-only] [--rev [REV]] url [path]

positional arguments:
  url   Location of DVC or Git repository to list from
  path  Path to a file or directory within the repository
By replacing data files, models, and directories with DVC-files (.dvc), DVC
effectively hides their actual locations and names. This means that you don't
see the actual data when you view a DVC repository in the GitHub or GitLab UI
(you see .dvc files instead). This makes the project hard to navigate and makes
it hard to use commands and interfaces such as dvc.api, which all deal with the
actual path to a data file or directory.
This command prints a virtual view of a DVC repository, showing it the way it would look if the DVC-tracked files and directories were regular Git-tracked files.
Put another way, it prints a result similar to running:
$ git clone <url> example
$ cd example
$ dvc pull
$ ls <path>
The url argument is the address of the Git repository to list. The command
works for any Git repository, whether or not it contains a DVC project. Both
HTTP and SSH protocols are supported for online repositories. url can also be a
local file system path to a valid Git repository.
The path argument specifies a path within the source url. It's similar to
providing a path to commands such as ls or aws s3 ls. And, as with ls, the -R
option can be used to list the contents recursively.
-R, --recursive - recursively prints the repository contents.
--outs-only - shows only DVC-tracked files and directories (outputs).
--rev - commit hash, branch or tag name, etc. (any Git revision) of the repository to list content for. The latest commit in master (tip of the default branch) is used by default when this option is not specified.
-h, --help - prints the usage/help message and exits.
-q, --quiet - does not write anything to standard output. Exits with 0 if no problems arise, otherwise 1.
-v, --verbose - displays detailed tracing information.
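To make the relationship between the options and the usage line concrete, here is a minimal Python sketch that assembles a `dvc list` command line from the arguments described above. The helper name `build_dvc_list_command` and its parameter names are hypothetical (they are not part of DVC); only the flags themselves come from the usage message.

```python
from typing import List, Optional


def build_dvc_list_command(
    url: str,
    path: Optional[str] = None,
    rev: Optional[str] = None,
    recursive: bool = False,
    outs_only: bool = False,
) -> List[str]:
    """Assemble an argv list for `dvc list` (hypothetical helper, not a DVC API)."""
    cmd = ["dvc", "list"]
    if recursive:
        cmd.append("-R")            # list the repository contents recursively
    if outs_only:
        cmd.append("--outs-only")   # only DVC-tracked files and directories
    if rev is not None:
        cmd.extend(["--rev", rev])  # any Git revision: hash, branch, tag
    cmd.append(url)                 # required positional argument
    if path is not None:
        cmd.append(path)            # optional path within the repository
    return cmd


if __name__ == "__main__":
    cmd = build_dvc_list_command(
        "https://github.com/iterative/dataset-registry", recursive=True
    )
    print(" ".join(cmd))
```

Assuming DVC is installed, the resulting list could be passed to `subprocess.run(cmd, check=True)` to execute the command.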
We can use this command to get information about a remote repository, listing all of its files, directories, and data artifacts, including DVC-tracked ones:
$ dvc list https://github.com/iterative/example-get-started
.gitignore
README.md
auc.metric
data
evaluate.dvc
featurize.dvc
model.pkl
prepare.dvc
src
train.dvc
If you open the example-get-started project's page, you will see a similar
list, except that model.pkl will be missing. That's because it's tracked by DVC
and not visible to Git. You can find it specified as an output in one of the
project's DVC-files.
We can now, for example, run
$ dvc get https://github.com/iterative/example-get-started model.pkl
to download the model file.
Let's imagine a DVC repo used as a data registry, structured with different
datasets in separate directories. We can list its contents recursively with the
-R option:
$ dvc list -R https://github.com/iterative/dataset-registry
.gitignore
README.md
get-started/.gitignore
get-started/data.xml
get-started/data.xml.dvc
images/.gitignore
images/dvc-logo-outlines.png
images/dvc-logo-outlines.png.dvc
images/owl_sticker.png
...
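A recursive listing like this is easy to post-process. The Python sketch below groups the listed paths by top-level directory (one group per dataset) and picks out the entries that have a sibling `<name>.dvc` file. Both helper names are hypothetical. The pairing heuristic assumes the file was tracked with `dvc add`, which writes `<name>.dvc` next to it, so files such as model.pkl in the earlier example, which have no sibling model.pkl.dvc, would not be detected. The sample entries are copied from the truncated listing above.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_top_level(entries: List[str]) -> Dict[str, List[str]]:
    """Group listed paths by first path component ('.' for root-level files)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        head, sep, _ = entry.partition("/")
        groups[head if sep else "."].append(entry)
    return dict(groups)


def added_with_dvc(entries: List[str]) -> List[str]:
    """Return entries that have a sibling '<entry>.dvc' file in the listing.

    Heuristic only: assumes the file was tracked with `dvc add`.
    """
    listed = set(entries)
    return [entry for entry in entries if entry + ".dvc" in listed]


# Entries copied from the (truncated) listing above.
listing = [
    ".gitignore",
    "README.md",
    "get-started/.gitignore",
    "get-started/data.xml",
    "get-started/data.xml.dvc",
    "images/.gitignore",
    "images/dvc-logo-outlines.png",
    "images/dvc-logo-outlines.png.dvc",
    "images/owl_sticker.png",
]

if __name__ == "__main__":
    for directory, paths in group_by_top_level(listing).items():
        print(directory, len(paths))
    print(added_with_dvc(listing))
```

Because the sample stops where the listing above is truncated, images/owl_sticker.png is not reported here even though its DVC-file may appear later in the full output.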