In this guide we explain the existing ways to set up Google Drive remote storage for your DVC projects, along with the different benefits each one brings.
DVC uses the Google Drive API to synchronize your DVC project data with this type of remote storage, so it's subject to certain usage limits and quotas, which by default are shared with other GDrive remote storage users. For heavy use, it's highly recommended to connect using a custom Google Cloud project, which puts you in control of these limits.
With your own GC project, you can also use a service account to automate tasks that need to establish GDrive remote connections (e.g. CI/CD).
To start using a Google Drive remote, you only need to add it with a valid URL format. Then use any DVC command that needs to connect to it (e.g. dvc pull or dvc push once there's tracked data to synchronize).
$ dvc add data
...
$ dvc remote add --default myremote \
      gdrive://0AIac4JZqHhKmUk9PDA/dvcstore
$ dvc push
Go to the following link in your browser:

    https://accounts.google.com/o/oauth2/auth # ... copy this link

Enter verification code: # <- enter resulting code
See Authorization for more details.
There are a few ways to construct a GDrive remote URL for different uses, such as a folder or subfolder in root, shared folders not owned by your account, etc. The URL is formed with a base, and an optional path to an existing folder: gdrive://<base>/path/to/folder. The base can be one of:
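Folder ID - the unique identifier of a Google Drive folder, including shared folders and shared drives*.
⚠️ The folder in question should be shared with specific users (or groups) so they can use it with DVC. "Anyone with a link" is not guaranteed to work.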
$ dvc remote add myremote gdrive://0AIac4JZqHhKmUk9PDA
$ dvc remote add myremote \
      gdrive://0AIac4JZqHhKmUk9PDA/Data/text
0AIac4JZqHhKmUk9PDA above is the folder ID, and it can be found in the web browser address bar.
* Please note the Shared drive limits on storage and uploads.
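As a rough illustration of how a folder ID maps to a remote URL, the helper below derives a gdrive:// URL from a folder link copied from the browser. It assumes that folder links end in the folder ID (for example https://drive.google.com/drive/folders/<ID>, optionally followed by a query string). The name gdrive_url_from_link is hypothetical and not part of DVC.

```shell
# Hypothetical helper (not part of DVC): build a gdrive:// URL from a
# Google Drive folder link, plus an optional path inside that folder.
# Assumes the link ends in the folder ID, possibly followed by "?...".
gdrive_url_from_link() {
  link=$1
  subpath=${2:-}
  id=${link%%\?*}   # drop any query string such as ?usp=sharing
  id=${id%/}        # drop a trailing slash, if present
  id=${id##*/}      # keep only the last path segment (the folder ID)
  if [ -n "$subpath" ]; then
    printf 'gdrive://%s/%s\n' "$id" "$subpath"
  else
    printf 'gdrive://%s\n' "$id"
  fi
}
```

The result can be passed straight to dvc remote add, e.g. dvc remote add myremote "$(gdrive_url_from_link "$link" Data/text)", where $link holds the copied folder link.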
root - indicates your topmost Google Drive folder ("My Drive").
⚠️ Only suitable for personal use, as sharing a remote configured this way would cause DVC to try synchronizing data to/from different Google Drives for every user.
$ dvc remote add myremote gdrive://root/dvcstore
We don't recommend using gdrive://root by itself, as it's likely used for many other reasons, and pushing data with DVC here can make it messy.
appDataFolder - a special hidden folder (unique per user) meant to store application-specific data. This is a good choice to prevent accidentally deleting remote storage data from the Google Drive web UI.
⚠️ Only suitable for personal use.
$ dvc remote add myremote gdrive://appDataFolder
Optionally, follow these steps to create your own Google Cloud project and generate OAuth credentials for your GDrive remotes to connect to Google Drive. We highly recommend this for heavy use and advanced needs, because it puts you in control of the API usage limits and makes it possible to use a service account.
Sign into the Google API Console.
Double check you're using the intended Google account (upper-right corner).
Select or Create a project for DVC remote connections.
Enable the Drive API: from the APIs & Services Dashboard (left sidebar), click + ENABLE APIS AND SERVICES. Find and select the "Google Drive API" in the API Library, and click the ENABLE button.
Go back to APIs & Services in the left sidebar, and select OAuth consent screen. Choose a User Type and click CREATE. On the next screen, enter an Application name e.g. "DVC remote storage", and click Save (scroll to the bottom).
From the left sidebar, select Credentials, and click the Create credentials dropdown to select OAuth client ID. Choose Desktop app and click Create to proceed with a default client name.
The newly generated client ID and client secret should be shown to you now, and you can always come back to Credentials to fetch them.
✅ It should be safe to share the client ID and client secret among your team. These credentials are only used to generate the authorization URL you'll need to visit later in order to connect to Google Drive.
Finally, use the dvc remote modify command to set the credentials (for each GDrive remote), for example:
$ dvc remote modify myremote gdrive_client_id 'client-id'
$ dvc remote modify myremote gdrive_client_secret 'client-secret'
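Because the credentials are set per remote, a small loop can save some typing when a project has several GDrive remotes. The function below is only a sketch: set_gdrive_oauth_client is a hypothetical name, and the remote names are whatever you pass in. It uses nothing beyond the two dvc remote modify options shown above.

```shell
# Hypothetical helper: apply one OAuth client ID/secret pair to several
# GDrive remotes, since these options are configured per remote.
# Usage: set_gdrive_oauth_client <client-id> <client-secret> <remote>...
set_gdrive_oauth_client() {
  client_id=$1
  client_secret=$2
  shift 2
  for remote in "$@"; do
    dvc remote modify "$remote" gdrive_client_id "$client_id" || return 1
    dvc remote modify "$remote" gdrive_client_secret "$client_secret" || return 1
  done
}
```

For example, set_gdrive_oauth_client 'client-id' 'client-secret' myremote configures the single remote used throughout this guide.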
This covers simple authentication, which gives DVC access to GDrive on behalf of a user account. This is ideal for running DVC locally, for example. If some automation is needed (e.g. CI/CD), we recommend using a service account instead.
On first use of a GDrive remote, for example when trying to dvc push tracked data for the first time, DVC will prompt you to visit a special Google authentication web page. There you'll need to sign in to a Google account with the needed access to the GDrive URL in question. The auth process will ask you to grant DVC the necessary permissions, and produce a verification code needed for DVC to complete the connection. On success, the necessary credentials will be saved in a Git-ignored file located at .dvc/tmp/gdrive-user-credentials.json, and they will be used automatically the next time you run DVC.
⚠️ In order to prevent unauthorized access to your Google Drive, do not share these credentials with others. Each team member should go through this process individually.
If you use multiple GDrive remotes, by default they will share the same .dvc/tmp/gdrive-user-credentials.json file. It can be overridden with the gdrive_user_credentials_file option:
$ dvc remote modify myremote --local \
      gdrive_user_credentials_file .dvc/tmp/myremote-credentials.json
⚠️ In order to prevent unauthorized access to your Google Drive, never commit this file with Git. Instead, add it to .gitignore and never share it with other people.
If you wish to change the user you have authenticated with, or for troubleshooting misc. token errors, simply remove the user credentials JSON file and authorize again.
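A minimal sketch of that reset follows. reset_gdrive_auth is a hypothetical helper name, and its default path is the credentials file location mentioned earlier. Pass a different path if the remote uses a custom gdrive_user_credentials_file.

```shell
# Hypothetical helper: forget the cached GDrive user credentials so that the
# next DVC command that connects to the remote starts authorization again.
reset_gdrive_auth() {
  creds_file=${1:-.dvc/tmp/gdrive-user-credentials.json}
  rm -f -- "$creds_file"
}
```

After running it from the project root, the next dvc push or dvc pull should prompt for authorization again.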
The GDRIVE_CREDENTIALS_DATA environment variable can be set to pass user credentials in CI/CD systems, production setups, read-only file systems, etc. The content of this variable should be a JSON string in the same format as the credentials files described above, and you usually obtain it by going through the same authentication process. If GDRIVE_CREDENTIALS_DATA is set, the gdrive_user_credentials_file value (if provided) is ignored.
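One way to populate the variable is to copy the content of a credentials file produced by an earlier interactive authorization. The sketch below assumes such a file exists. load_gdrive_credentials is a hypothetical helper name, and the default path is the one used for user credentials above.

```shell
# Hypothetical helper: export the content of a credentials file as
# GDRIVE_CREDENTIALS_DATA for the current shell session.
load_gdrive_credentials() {
  creds_file=${1:-.dvc/tmp/gdrive-user-credentials.json}
  if [ ! -r "$creds_file" ]; then
    echo "credentials file not readable: $creds_file" >&2
    return 1
  fi
  # The variable holds the JSON text itself, not a path to the file.
  GDRIVE_CREDENTIALS_DATA=$(cat "$creds_file")
  export GDRIVE_CREDENTIALS_DATA
}
```

In a CI/CD system you would more likely paste the same JSON into the platform's secret store and expose it to the job under this variable name, so the file itself never has to be present there.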
A service account is a Google account associated with your GCP project, and not a specific user. Service accounts are intended for scenarios where your code needs to access data on its own, e.g. running inside Compute Engine, automatic CI/CD, etc. No interactive user OAuth authentication is needed.
This requires having your own GC project as explained above.
To create a service account, navigate to IAM & Admin in the left sidebar, and select Service Accounts. Click + CREATE SERVICE ACCOUNT, enter a Service account name e.g. "My DVC project", and optionally provide a custom Service account ID and description. Then click CREATE AND CONTINUE. You can skip the two optional sections. Click DONE and you will be returned to the overview page. Select your service account and go to the Keys tab. Under Add key select Create new key, choose JSON, and click CREATE. Download the generated .json key file to a safe location.
⚠️ Be careful about sharing the key file with others.
Configure the remote to use the service account and tell it where to find the key file:
$ dvc remote modify myremote gdrive_use_service_account true
$ dvc remote modify myremote --local \
      gdrive_service_account_json_file_path path/to/file.json
The GDRIVE_CREDENTIALS_DATA environment variable can be set to pass the service account key in CI/CD systems, production setups, read-only file systems, etc. The content of this variable should be a JSON string in the same format as the key file described above. If both this variable and gdrive_service_account_json_file_path are provided, GDRIVE_CREDENTIALS_DATA takes priority and gdrive_service_account_json_file_path is ignored.
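Put together, an automated job might look roughly like the sketch below. ci_gdrive_push is a hypothetical function name, and it assumes the job environment already provides the key JSON in GDRIVE_CREDENTIALS_DATA (for example from a CI secret). The --local flag is used here so the setting stays in the local, Git-ignored configuration; drop the first dvc command if gdrive_use_service_account is already set in the shared config.

```shell
# Hypothetical CI step: push to a GDrive remote using a service account key
# supplied through the GDRIVE_CREDENTIALS_DATA environment variable.
ci_gdrive_push() {
  remote=$1
  if [ -z "${GDRIVE_CREDENTIALS_DATA:-}" ]; then
    echo "GDRIVE_CREDENTIALS_DATA is not set" >&2
    return 1
  fi
  # Enable service account mode for this checkout only (local config).
  dvc remote modify "$remote" --local gdrive_use_service_account true || return 1
  dvc push --remote "$remote"
}
```

Failing early when the variable is empty avoids a job that stalls waiting on an interactive authorization prompt it cannot answer.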
Share the Google Drive folders that you want to use with the service account. Navigate to your Google Drive folder's sharing options and add the service account as an editor (read/write) or viewer (read-only).
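The address to enter in the sharing dialog is the service account's email. If you don't have it at hand, it can usually be read from the downloaded key file. The sketch below assumes the key file is pretty-printed JSON with a "client_email" entry on its own line. service_account_email is a hypothetical helper, and a proper JSON parser is the safer choice for anything beyond a quick lookup.

```shell
# Hypothetical helper: print the service account email stored in a key file.
# Assumes one "client_email": "..." entry per line (pretty-printed JSON).
service_account_email() {
  sed -n 's/.*"client_email"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}
```

For example, service_account_email path/to/file.json prints the address to add as an editor or viewer.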