
Support for Shared Drives #40

@rhunwicks


Currently, gdrivefs doesn't support shared drives.

I have a setup like:

    import json
    import os

    root_folder: str = "gdrive://Discovery Folder/Worksheets"
    storage_options: dict = {
        "token": "service_account",
        "access": "read_only",
        # The env var holds the service account JSON itself, not a file path
        "creds": json.loads(os.environ["GOOGLE_APPLICATION_CREDENTIALS"]),
        "root_file_id": "0123456789ABCDEFGH",
    }

If I try to open a file under that path (using commit 2b48baa), I get this error:

    FileNotFoundError: Directory 0123456789ABCDEFGH has no child named Discovery Folder

      File "./pipelines/assets/base.py", line 210, in original_files
        with p.fs.open(p.path, mode="rb") as f:
      File "./lib/python3.10/site-packages/fsspec/spec.py", line 1295, in open
        f = self._open(
      File "./lib/python3.10/site-packages/gdrivefs/core.py", line 249, in _open
        return GoogleDriveFile(self, path, mode=mode, **kwargs)
      File "./lib/python3.10/site-packages/gdrivefs/core.py", line 270, in __init__
        super().__init__(fs, path, mode, block_size, autocommit=autocommit,
      File "./lib/python3.10/site-packages/fsspec/spec.py", line 1651, in __init__
        self.size = self.details["size"]
      File "./lib/python3.10/site-packages/fsspec/spec.py", line 1664, in details
        self._details = self.fs.info(self.path)
      File "./lib/python3.10/site-packages/fsspec/spec.py", line 662, in info
        out = self.ls(path, detail=True, **kwargs)
      File "./lib/python3.10/site-packages/gdrivefs/core.py", line 174, in ls
        files = self._ls_from_cache(path)
      File "./lib/python3.10/site-packages/fsspec/spec.py", line 372, in _ls_from_cache
        raise FileNotFoundError(path)

The root_file_id is set to the folder ID of a Google Drive shared drive (see https://support.google.com/a/users/answer/7212025?hl=en).

As described in https://developers.google.com/drive/api/guides/enable-shareddrives#:~:text=The%20supportsAllDrives%3Dtrue%20parameter%20informs,require%20additional%20shared%20drive%20functionality., we need to pass supportsAllDrives=True and includeItemsFromAllDrives=True when calling files.list so that the API returns files stored in shared drives.
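A minimal sketch of the keyword arguments for one files.list call with both flags set, assuming google-api-python-client and the Drive v3 API. The helper name build_list_kwargs and its default fields value are hypothetical; q, spaces, fields, pageToken, supportsAllDrives and includeItemsFromAllDrives are the actual files.list parameter names.

```python
def build_list_kwargs(file_id, page_token=None, trashed=False,
                      spaces="drive", fields="nextPageToken, files(id, name)"):
    """Build kwargs for a Drive v3 files.list call that can see shared drives.

    Hypothetical helper: mirrors the query built in gdrivefs's
    _list_directory_by_id and adds the two shared drive flags.
    """
    query = f"'{file_id}' in parents"
    if not trashed:
        # Skip files that have been moved to the trash
        query += " and trashed = false"
    return {
        "q": query,
        "spaces": spaces,
        "fields": fields,
        "pageToken": page_token,  # None on the first request
        # Declares that the caller understands shared drive semantics
        "supportsAllDrives": True,
        # Asks files.list to include items that live in shared drives
        "includeItemsFromAllDrives": True,
    }
```

With a Drive v3 service object from googleapiclient.discovery.build, the result could be passed as service.files().list(**build_list_kwargs(root_file_id)).execute().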

In my case, if I change the existing _list_directory_by_id method from:

    def _list_directory_by_id(self, file_id, trashed=False, path_prefix=None):
        all_files = []
        page_token = None
        afields = 'nextPageToken, files(%s)' % fields
        query = f"'{file_id}' in parents  "
        if not trashed:
            query += "and trashed = false "
        while True:
            response = self.service.list(q=query,
                                         spaces=self.spaces, fields=afields,
                                         pageToken=page_token,
                                         ).execute()
            for f in response.get('files', []):
                all_files.append(_finfo_from_response(f, path_prefix))
            more = response.get('incompleteSearch', False)
            page_token = response.get('nextPageToken', None)
            if page_token is None:
                break
        return all_files

to

    def _list_directory_by_id(self, file_id, trashed=False, path_prefix=None):
        all_files = []
        page_token = None
        afields = 'nextPageToken, files(%s)' % fields
        query = f"'{file_id}' in parents  "
        if not trashed:
            query += "and trashed = false "
        while True:
            response = self.service.list(
                q=query,
                spaces=self.spaces, fields=afields,
                pageToken=page_token,
                includeItemsFromAllDrives=True,  # Required for shared drive support
                supportsAllDrives=True,    # Required for shared drive support
            ).execute()
            for f in response.get('files', []):
                all_files.append(_finfo_from_response(f, path_prefix))
            more = response.get('incompleteSearch', False)
            page_token = response.get('nextPageToken', None)
            if page_token is None:
                break
        return all_files

(the only change is the two extra keyword arguments passed to self.service.list)

then my code works, and the filesystem can find the file and open it successfully.
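The change only helps if the two flags are sent on every page request, not just the first. A minimal sketch of checking that without network access, using hypothetical stand-ins (FakeFilesResource, FakeRequest) that imitate the list(...).execute() call pattern of a googleapiclient files() resource; list_children is likewise a hypothetical simplification of _list_directory_by_id.

```python
class FakeRequest:
    """Mimics a googleapiclient request: execute() returns a response dict."""

    def __init__(self, response):
        self._response = response

    def execute(self):
        return self._response


class FakeFilesResource:
    """Hypothetical stand-in for service.files(); serves canned pages."""

    def __init__(self, pages):
        self.pages = pages  # response dicts, returned in order
        self.calls = []     # kwargs received by each list() call

    def list(self, **kwargs):
        self.calls.append(kwargs)
        return FakeRequest(self.pages[len(self.calls) - 1])


def list_children(files_resource, file_id):
    """Collect all children of file_id, following nextPageToken."""
    children = []
    page_token = None
    while True:
        response = files_resource.list(
            q=f"'{file_id}' in parents and trashed = false",
            fields="nextPageToken, files(id, name)",
            pageToken=page_token,
            includeItemsFromAllDrives=True,  # return shared drive items
            supportsAllDrives=True,          # caller supports shared drives
        ).execute()
        children.extend(response.get("files", []))
        page_token = response.get("nextPageToken")
        if page_token is None:
            break
    return children
```

For example, a FakeFilesResource built from two canned pages, the first carrying a nextPageToken, should yield the files from both pages and record two list calls, each with supportsAllDrives and includeItemsFromAllDrives set.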

I am happy to prepare a pull request, but you would need to decide whether shared drive support should be enabled in all cases or controlled via storage_options. If it is controlled via storage_options, should it default to off (fully backwards compatible) or on (existing users with shared drives may start seeing files that gdrivefs does not currently return)?
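If the storage_options route is chosen, one possible shape is a boolean option that defaults to off for backwards compatibility and, when enabled, adds the two flags to every files.list call. The option name shared_drives and the helper split_shared_drive_option are hypothetical and not part of gdrivefs; this is a sketch of the design, not a proposed final API.

```python
def split_shared_drive_option(storage_options, default=False):
    """Pop the hypothetical 'shared_drives' flag from storage_options.

    Returns (extra_list_kwargs, remaining_options). remaining_options can be
    passed on unchanged; extra_list_kwargs would be merged into each
    files.list call.
    """
    options = dict(storage_options)  # copy so the caller's dict is untouched
    enabled = bool(options.pop("shared_drives", default))
    if enabled:
        # The same two flags used in the patch to _list_directory_by_id
        extra = {"supportsAllDrives": True, "includeItemsFromAllDrives": True}
    else:
        # Off: listings behave exactly as they do today
        extra = {}
    return extra, options
```

Leaving default=False keeps current listings unchanged for existing users; passing default=True would reproduce the always-on behaviour of the patched _list_directory_by_id.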
