Initialize a DVC project in the current working directory.
DVC works best in a Git repository. This enables all features, providing the
most value. For this reason,
dvc init (without flags) expects to run in a Git
repository root (a
.git/ directory should be present).
At DVC initialization, a new .dvc/ directory is created for configuration,
the default cache location, and other internal files and directories that are
hidden from the user. This directory is automatically staged with git add, so
it can be easily committed with Git.
The command options can be used to start an alternative workflow for advanced scenarios:
- Initializing DVC in subdirectories (--subdir) - for monorepos and nested DVC projects
- Initializing DVC without Git (--no-scm) - for very simple projects, version control systems other than Git, deployment automation, among other uses
Initializing DVC in subdirectories
--subdir must be provided to initialize DVC in a subdirectory of a Git
repository. DVC still expects to find a Git root (it checks all directories up
to the system root to find .git/). This option does not affect any config; the
.dvc/ directory is created the same way as in the default mode. This way,
multiple DVC projects can be initialized in a single Git repository, providing
isolation between projects.
This is mostly useful in the scenario of a monorepo (Git repo split into
several project directories), but can also be used with other patterns when such
isolation is needed.
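The upward search for a Git root described above can be sketched in plain shell. This is only an illustration under stated assumptions, not DVC's actual implementation: find_git_root is a hypothetical helper name, and it treats any .git entry (directory or file) as marking a Git root.

```shell
# Hypothetical helper (not part of DVC): print the nearest directory, starting
# at $1 (default: current directory) and walking upward, that contains a .git
# entry. Returns non-zero if the filesystem root is reached without a match.
find_git_root() {
    dir=$(cd "${1:-.}" && pwd -P) || return 1
    while :; do
        # -e rather than -d: in Git worktrees and submodules .git is a file.
        if [ -e "$dir/.git" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        # Stop once the filesystem root itself has been checked.
        [ "$dir" = "/" ] && return 1
        dir=$(dirname "$dir")
    done
}

# Usage sketch with a throwaway layout: a Git root with a nested project.
demo=$(mktemp -d)
mkdir -p "$demo/repo/.git" "$demo/repo/project-a"
find_git_root "$demo/repo/project-a"   # prints the resolved path of the repo directory
rm -rf "$demo"
```

Running the helper from a subdirectory such as project-a shows why --subdir can work there: the Git root is still discoverable from any directory below it.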
dvc init --subdir mitigates possible limitations of initializing DVC in the
Git repo root:
- Repository maintainers might not allow a top-level .dvc/ directory, especially if DVC is already being used by several sub-projects (monorepo).
- DVC internals (configuration, cache directory, remote storage, etc.) would be shared across different subdirectories.
- By default, DVC commands like dvc repro explore the whole DVC repository to find DVC-tracked data and pipelines to work with. This can be inefficient for large monorepos.
- Commands such as dvc metrics show would produce unexpected results if not constrained to a single project scope.
Initializing DVC without Git
In rare cases, the
--no-scm option might be desirable: to initialize DVC in a
directory that is not part of a Git repo, or to make DVC ignore Git. Examples:
- An SCM other than Git is being used. Even though some DVC features require DVC to run in a Git repo, DVC can work well with other version control systems. Since DVC relies on simple dvc.yaml files to manage pipelines, data, etc., these files can be added to any version control system, thus providing versioning for large data files and directories.
- There is no need to keep the history at all, e.g. having a deployment automation like running a data pipeline using
In this mode, DVC features related to versioning are not available. For
example, the automatic creation and updating of .gitignore files on dvc add or
dvc stage add is disabled, as are dvc diff and dvc metrics diff, which require
Git revisions to compare.
DVC sets the
core.no_scm config option value to
true in the DVC
configuration when initialized this way. This means that even if the project is
tracked by Git, or if Git is initialized in it later, DVC will keep operating
detached from Git in this project.
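As a rough way to see whether a project was initialized this way, the configuration file can be inspected directly. The sketch below rests on assumptions: is_no_scm_project is a hypothetical helper name, the project config is assumed to live at .dvc/config in an INI-like layout with a no_scm key, and the helper does not check that the key sits under the [core] section.

```shell
# Hypothetical helper (not part of DVC): succeed if the project at $1
# (default: current directory) has no_scm set to true in .dvc/config.
# Simplification: the enclosing [core] section header is not verified.
is_no_scm_project() {
    config="${1:-.}/.dvc/config"
    [ -f "$config" ] || return 1
    # Case-insensitive match, tolerant of indentation and spacing around "=".
    grep -Eiq '^[[:space:]]*no_scm[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$config"
}

# Usage sketch: report the mode of the project in the current directory.
if is_no_scm_project .; then
    echo "DVC operates detached from Git here"
else
    echo "no_scm is not enabled (or no .dvc/config found)"
fi
```

Because the setting is stored in the project configuration, running git init later does not change the result of this check.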
.dvc/ if it exists before initialization. Will remove any existing local cache. Useful when a previous dvc init has been corrupted.
--subdir - initialize the DVC project in the current working directory, even if it's not the Git repository root. (If run in a project root, this option is ignored.) It affects how other DVC commands behave afterwards; see Initializing DVC in subdirectories for more details.
--no-scm - initialize the DVC project detached from Git. This means that DVC doesn't try to find or use Git in the directory it's initialized in. Certain DVC features are not available in this mode. See Initializing DVC without Git for more details.
--help - print the usage/help message and exit.
--quiet - do not write anything to standard output. Exit with 0 if no problems arise, otherwise 1.
--verbose - display detailed tracing information.
Examples: Most common initialization workflow
Create a new DVC repository (requires running in the Git repository root):
$ mkdir mydvcrepo && cd mydvcrepo
$ git init
$ dvc init
$ git status
...
        new file:   .dvc/.gitignore
        new file:   .dvc/config
$ git commit -m "Init DVC"
Note that the cache directory (among others) is not tracked with Git. It contains data and model files, and will be managed by DVC.
$ cat .dvc/.gitignore
/config.local
/tmp
/cache
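To confirm which paths under .dvc/ Git will skip, git check-ignore can be pointed at them. The sketch below does not run dvc init; as an assumption for illustration, it recreates only the .dvc/.gitignore contents shown above inside a throwaway Git repository. dvc_path_ignored is a hypothetical helper name.

```shell
# Hypothetical helper: exit status 0 if path $2 is ignored by Git in the
# repository at $1, non-zero otherwise.
dvc_path_ignored() {
    git -C "$1" check-ignore -q "$2"
}

# Stand-in for an initialized project: only the ignore file matters here.
repo=$(mktemp -d)
git init -q "$repo"
mkdir "$repo/.dvc"
printf '%s\n' /config.local /tmp /cache > "$repo/.dvc/.gitignore"

dvc_path_ignored "$repo" .dvc/cache  && echo ".dvc/cache is ignored by Git"
dvc_path_ignored "$repo" .dvc/config || echo ".dvc/config is not ignored"
rm -rf "$repo"
```

The leading slash in each pattern anchors it to the .dvc/ directory, so only .dvc/cache, .dvc/tmp, and .dvc/config.local are matched.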
Examples: Initializing DVC in a subdirectory
Create a DVC repository in a subdirectory of a Git repository:
$ mkdir repo && cd repo
$ git init
$ mkdir project-a && cd project-a
$ dvc init --subdir
In this case, the Git repository is inside the repo directory, while the DVC
repository is inside a subdirectory of it:
$ tree repo -a
repo
├── .git
.
.
.
└── project-a
    └── .dvc