
anon does not propagate across .save() boundaries #1778

@dmpetrov

Description

#1712 / #1763 made read_storage() auto-detect anonymous access for public buckets, so users no longer need to type anon=True for the listing call. But the anon decision is purely session-scoped: it is not stored on the saved dataset, on the listing dataset, or in any per-URI registry.

As soon as a downstream stage runs dc.read_dataset(...) in a fresh Python process and a UDF calls file.open() / file.read(), the new process issues a fresh S3 HeadObject request without anon and gets PermissionError: Forbidden. The original UX complaint of #1712 ("users / agents constantly make this mistake which slows down DC code generation") reappears at every stage boundary of a multi-script pipeline.
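To make the failure mode concrete, here is a minimal toy model of it in plain Python. It is a sketch, not DataChain code: `Session`, `catalog`, `save`, `run_pipeline`, and the bucket URI are all hypothetical names. The model assumes that a request to the public bucket succeeds only when the process has remembered an anon decision for that URI.

```python
# Toy model only: none of these names are DataChain APIs.
from dataclasses import dataclass, field

# Hypothetical bucket that only serves anonymous requests in this model.
PUBLIC_BUCKETS = {"s3://public-bucket"}


@dataclass
class Session:
    """Stands in for one Python process; the anon decision lives only here."""

    anon_by_uri: dict = field(default_factory=dict)

    def read_storage(self, uri):
        # Auto-detection: remember, for this session only, whether anon is needed.
        self.anon_by_uri[uri] = uri in PUBLIC_BUCKETS
        return {"uri": uri, "files": ["a.txt"]}

    def open_file(self, uri):
        # With no remembered anon decision, the request goes out non-anonymously
        # and the public bucket rejects it.
        if uri in PUBLIC_BUCKETS and not self.anon_by_uri.get(uri, False):
            raise PermissionError("Forbidden")
        return b"data"


# Stands in for the persistent dataset store shared by all processes.
catalog = {}


def save(name, dataset):
    # Only the dataset record is persisted; the session's anon decision is not.
    catalog[name] = dict(dataset)


def run_pipeline():
    # Stage 1: list the bucket, save the dataset, read a file in the same process.
    stage1 = Session()
    dataset = stage1.read_storage("s3://public-bucket")
    save("listing", dataset)
    same_process = stage1.open_file(dataset["uri"])

    # Stage 2: a fresh process loads the saved dataset and tries to read a file.
    stage2 = Session()
    loaded = catalog["listing"]
    try:
        stage2.open_file(loaded["uri"])
        outcome = "ok"
    except PermissionError as exc:
        outcome = f"PermissionError: {exc}"
    return same_process, outcome
```

In this model the read succeeds inside the process that did the listing and fails in the second one, because nothing in `catalog["listing"]` records that the URI needs anonymous access.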

Metadata

Labels: enhancement
