[Python][FS][Azure] Pickling SubTreeFileSystem(base_path, AzureFileSystem(...)) is lossy #49078

@Tom-Newton

Describe the bug, including details regarding any error messages, version, and platform.

Reproduce:

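import pyarrow.fs

# Inspect the pickle state of a directly constructed AzureFileSystem.
azure_fs = pyarrow.fs.AzureFileSystem(account_name="test", sas_token="test")
print(azure_fs.__reduce__())

# Inspect the pickle state of the same filesystem when reached through a SubTreeFileSystem.
subtree_fs = pyarrow.fs.SubTreeFileSystem("/tmp", azure_fs)
print(subtree_fs.base_fs.__reduce__())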

Returns

(<cyfunction AzureFileSystem._reconstruct at 0x79d22010c940>, ({'account_name': 'test', 'account_key': '', 'blob_storage_authority': '.blob.core.windows.net', 'blob_storage_scheme': 'https', 'client_id': '', 'client_secret': '', 'dfs_storage_authority': '.dfs.core.windows.net', 'dfs_storage_scheme': 'https', 'sas_token': 'test', 'tenant_id': ''},))
(<cyfunction AzureFileSystem._reconstruct at 0x79d22010c940>, ({'account_name': 'test', 'account_key': '', 'blob_storage_authority': '.blob.core.windows.net', 'blob_storage_scheme': 'https', 'client_id': '', 'client_secret': '', 'dfs_storage_authority': '.dfs.core.windows.net', 'dfs_storage_scheme': 'https', 'sas_token': '', 'tenant_id': ''},))

Notice that sas_token is non-empty in the first result but empty in the second.

Cause:

The sas_token and a few of the other values returned by AzureFileSystem.__reduce__ are read from self on the Python-side AzureFileSystem object. When a SubTreeFileSystem is constructed, the Python-side AzureFileSystem object is discarded and the SubTreeFileSystem holds only a pointer to the CAzureFileSystem. It is therefore not possible to reconstruct a Python-side AzureFileSystem that includes the attributes stored on self of the original AzureFileSystem.
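
To illustrate the mechanism without needing an Azure-enabled build, here is a minimal pure-Python analogue. It is a sketch, not pyarrow's actual code: the names CHandle, PyFS, SubTree and wrap are hypothetical stand-ins for CAzureFileSystem, the Python-side AzureFileSystem, SubTreeFileSystem and whatever wrapping step rebuilds the Python object from the C++ pointer.

```python
class CHandle:
    """Hypothetical stand-in for the C++ filesystem; it does not expose sas_token."""

    def __init__(self, account_name):
        self.account_name = account_name


class PyFS:
    """Hypothetical stand-in for the Python-side wrapper."""

    def __init__(self, account_name, sas_token=""):
        self.handle = CHandle(account_name)
        # Stored only on the Python wrapper, not on the handle.
        self.sas_token = sas_token

    @classmethod
    def wrap(cls, handle):
        # Rebuild a wrapper from a bare handle: sas_token cannot be recovered.
        obj = cls.__new__(cls)
        obj.handle = handle
        obj.sas_token = ""
        return obj

    def __reduce__(self):
        # Part of the state comes from the handle, part from self.
        return (PyFS, (self.handle.account_name, self.sas_token))


class SubTree:
    """Hypothetical stand-in for a subtree filesystem that keeps only the handle."""

    def __init__(self, base_path, base_fs):
        self.base_path = base_path
        self._handle = base_fs.handle  # the original Python wrapper is dropped here

    @property
    def base_fs(self):
        # A fresh wrapper is created on every access.
        return PyFS.wrap(self._handle)


fs = PyFS("test", sas_token="test")
direct_state = fs.__reduce__()[1]

sub = SubTree("/tmp", fs)
subtree_state = sub.base_fs.__reduce__()[1]

print(direct_state)
print(subtree_state)
```

In this analogue the state taken directly from the wrapper keeps the token, while the state obtained through the subtree does not, which mirrors the two outputs above. Any fix would presumably need the token to be recoverable from the underlying C++ object rather than from self.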

Component(s)

Python
