Description
Describe the bug
https://github.com/aws/aws-sdk-pandas/pull/2767/files
In this PR, the "s3_output" parameter was added to other create/alter functions used in the "athena.to_iceberg()" method.
However, in the (non-default) "overwrite" and "overwrite_partitions" modes, the first call to "_start_query_execution()" also needs to receive the "s3_output" parameter. Otherwise it tries to create a default Athena query results bucket instead of using the one specified in "s3_output" when calling "to_iceberg()". This leads to errors/timeouts in Lambda/Glue jobs that do not, and should not, have the "s3:CreateBucket" permission:
ERROR:awswrangler.athena._write_iceberg:Waiter BucketExists failed: Max attempts exceeded. Previously accepted state: Matched expected HTTP status code: 404
"s3_output=s3_output" should be added to the following calls.
For "overwrite_partitions":
delete_from_iceberg_table(
For "overwrite":
delete_query_execution_id: str = _start_query_execution(
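The failure mode described above can be illustrated with a small self-contained sketch. Every name in it (`start_query_execution_stub`, `create_default_bucket_stub`, the `delete_*` helpers, the bucket names) is a hypothetical stand-in and not the actual awswrangler code; it only assumes, as reported, that a query started without "s3_output" falls back to creating a default bucket.

```python
from typing import List, Optional

# Hypothetical stand-ins for illustration only; these are NOT awswrangler functions.
bucket_creation_attempts: List[str] = []


def create_default_bucket_stub() -> str:
    """Pretend to create the default Athena results bucket (would need s3:CreateBucket)."""
    bucket_creation_attempts.append("default-athena-bucket")
    return "s3://default-athena-bucket/"


def start_query_execution_stub(sql: str, s3_output: Optional[str] = None) -> str:
    """Return the output location the query would use."""
    if s3_output is None:
        # Assumed fallback: no output location given, so a default bucket is created.
        return create_default_bucket_stub()
    return s3_output


def delete_without_forwarding(s3_output: Optional[str]) -> str:
    # Mirrors the reported bug: the caller's s3_output is silently dropped.
    return start_query_execution_stub(sql="DELETE FROM my_table")


def delete_with_forwarding(s3_output: Optional[str]) -> str:
    # Mirrors the proposed fix: s3_output=s3_output is passed through.
    return start_query_execution_stub(sql="DELETE FROM my_table", s3_output=s3_output)
```

With forwarding, the user-supplied location is used and no bucket creation is attempted; without it, the fallback path runs even though the caller supplied a valid location.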
How to Reproduce
import awswrangler as wr

# This works when the ETL job has all the required standard read/write IAM permissions for Athena, S3 and Glue
wr.athena.to_iceberg(
    df=df,
    table_location=f"s3://{s3_bucket}/{table_id}",
    database=db,
    table=table_id,
    mode="append",
    partition_cols=["ts"],
    s3_output=f"s3://{athena_bucket}/iceberg-query-results/",
    temp_path=f"s3://{athena_bucket}/temp/",
)

# This does not work with the same IAM permissions as above, because it now tries to create the default Athena bucket
wr.athena.to_iceberg(
    df=df,
    table_location=f"s3://{s3_bucket}/{table_id}",
    database=db,
    table=table_id,
    mode="overwrite",  # "overwrite_partitions" yields the same error
    partition_cols=["ts"],
    s3_output=f"s3://{athena_bucket}/iceberg-query-results/",
    temp_path=f"s3://{athena_bucket}/temp/",
)
The error thrown will be
ERROR:awswrangler.athena._write_iceberg:Waiter BucketExists failed: Max attempts exceeded. Previously accepted state: Matched expected HTTP status code: 404
Expected behavior
If "s3_output" points to an existing S3 bucket and mode="overwrite", the target Glue table is overwritten/created without any attempt to create a default Athena bucket.
The call "delete_query_execution_id = _start_query_execution()" at line 590 of "to_iceberg()" succeeds without throwing exceptions.
(Analogously for mode="overwrite_partitions" and line 572 of "to_iceberg()".)
Your project
No response
Screenshots
No response
OS
Linux (AWS Glue Pythonshell)
Python version
3.9
AWS SDK for pandas version
3.13.0
Additional context
