Description
#1712 / #1763 made `read_storage()` auto-detect anonymous access for public buckets, so users no longer need to type `anon=True` for the listing call. But the anon decision is purely session-scoped: it is not stored on the saved dataset, not stored on the listing dataset, and not kept in any per-URI registry.
The moment a downstream stage runs `dc.read_dataset(...)` in a fresh Python process and a UDF calls `file.open()` / `file.read()`, the new process makes a fresh S3 `HeadObject` request without anon and gets `PermissionError: Forbidden`. The original UX complaint of #1712 ("users / agents constantly make this mistake which slows down DC code generation") reappears at every stage boundary of a multi-script pipeline.
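The failure mode can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Session` class, the `PUBLIC_BUCKETS` set, the bucket URI, and the file names are all hypothetical and do not mirror DataChain's internals. It shows only why a decision cached per session is lost when a saved dataset is read from a new process.

```python
# Toy model only: every name here is hypothetical, not DataChain's real API.
from dataclasses import dataclass, field

PUBLIC_BUCKETS = {"s3://public-bucket"}  # hypothetical bucket allowing anonymous reads


@dataclass
class Session:
    """Stands in for one Python process; the anon cache lives and dies with it."""

    anon_by_uri: dict = field(default_factory=dict)

    def list_storage(self, uri: str) -> dict:
        # Listing probes the bucket and remembers the result for this session only.
        self.anon_by_uri[uri] = uri in PUBLIC_BUCKETS
        # The saved dataset records the source URI but not the anon decision.
        return {"source": uri, "files": ["a.txt", "b.txt"]}

    def read_file(self, dataset: dict, name: str) -> str:
        # Without a cached anon decision the request is rejected, as in the report.
        if not self.anon_by_uri.get(dataset["source"], False):
            raise PermissionError("Forbidden")
        return f"contents of {name}"


# Stage 1: listing and reading in the same process works.
stage1 = Session()
saved = stage1.list_storage("s3://public-bucket")
same_process = stage1.read_file(saved, "a.txt")

# Stage 2: a fresh process starts with an empty cache but the same saved dataset.
stage2 = Session()
try:
    stage2.read_file(saved, "a.txt")
    fresh_process_error = None
except PermissionError as exc:
    fresh_process_error = str(exc)
```

In this model, the second stage fails only because nothing on the saved dataset carries the anon decision forward, which is the gap described above.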