Using DVC Commands

DVC is a command-line tool. A typical DVC workflow goes as follows:

  • In an existing Git repository, initialize a DVC project with dvc init.
  • Copy data files or dataset directories for modeling into the repository, and track them with DVC using the dvc add command.
  • Process raw data with your own source code, using dvc.yaml and/or the dvc run command, and specify any further outputs that DVC should also track once the code has been executed.
  • Sharing a DVC repository with the codified ML pipeline does not share the project's cache (the data tracked by DVC). To share that data as well, set up remote storage and upload the cache with dvc push.
  • Use dvc repro to automatically reproduce your full pipeline each time input data or source code changes.
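
The workflow above can be sketched as a short shell script. This is a minimal illustration, not a definitive recipe: the file names (data/raw.csv, prepare.py, data/prepared.csv), the stage name (prepare), and the remote name and location (storage, /tmp/dvc-storage) are hypothetical placeholders. It also assumes a DVC version that still provides dvc run; on a release where that command is unavailable, consult dvc --help for the equivalent way to define a stage. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of a typical DVC workflow. All paths, the stage name, and the
# remote below are hypothetical placeholders chosen for illustration.
# With DRY_RUN unset or set to 1, each command is printed but not executed.
# Set DRY_RUN=0 to actually run the commands.

run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@" || exit 1          # stop at the first failing command
    fi
}

dvc_workflow() {
    run git init                                    # DVC projects live inside a Git repository
    run dvc init                                    # initialize the DVC project
    run dvc add data/raw.csv                        # track a data file with DVC
    # Define and execute a stage: -d lists dependencies, -o lists tracked outputs
    run dvc run -n prepare -d prepare.py -d data/raw.csv -o data/prepared.csv python prepare.py
    run dvc remote add -d storage /tmp/dvc-storage  # configure a default remote (a local directory here)
    run dvc push                                    # upload the cached data to the remote
    run dvc repro                                   # re-run stages whose dependencies changed
}

dvc_workflow
```

To execute the commands for real, save the script, place it in an empty directory that contains the placeholder files, and invoke it with DRY_RUN=0. In practice you would also commit the files DVC generates (such as data/raw.csv.dvc and dvc.yaml) to Git between steps.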

The command references provide a precise specification, a complete description, and isolated usage examples for the dvc CLI tool. They are our most technical documentation pages, similar to man pages in Linux.

