athena.to_iceberg: Missing trickle down of "s3_output" parameter in write modes "overwrite" and "overwrite_partitions" #3218

@zhedu-del

Describe the bug

https://github.com/aws/aws-sdk-pandas/pull/2767/files

In this PR, the "s3_output" parameter was added to other create/alter functions used in the "athena.to_iceberg()" method.

However, in the (non-default) "overwrite" and "overwrite_partitions" modes, the first call to "_start_query_execution()" also needs to receive the "s3_output" parameter. Otherwise it tries to create a default Athena query bucket instead of using the one specified in "s3_output" when calling "to_iceberg()". This leads to errors/timeouts in Lambda/Glue jobs that do not, and should not, have the "s3:CreateBucket" permission:
ERROR:awswrangler.athena._write_iceberg:Waiter BucketExists failed: Max attempts exceeded. Previously accepted state: Matched expected HTTP status code: 404

"s3_output=s3_output" should be added

here for "overwrite_partitions" :

delete_from_iceberg_table(

and here for "overwrite" :

delete_query_execution_id: str = _start_query_execution(
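
To make the missing pass-through concrete, here is a minimal, self-contained sketch of the pattern. It is not the awswrangler code: "start_query", "overwrite_buggy", "overwrite_fixed", the default bucket name, and the DELETE statement are all hypothetical stand-ins, and the real "_start_query_execution()" signature may differ.

```python
from typing import List, Optional

# Records simulated bucket creations so the example runs without AWS access.
created_buckets: List[str] = []


def start_query(sql: str, s3_output: Optional[str] = None) -> str:
    """Hypothetical stand-in for _start_query_execution(); not the real signature."""
    if s3_output is None:
        # Fallback path: in a real job this step would need s3:CreateBucket.
        s3_output = "s3://default-athena-results/"  # made-up name for illustration
        created_buckets.append(s3_output)
    return s3_output


def overwrite_buggy(table: str, s3_output: Optional[str] = None) -> str:
    # s3_output is accepted but not forwarded, so the fallback is triggered.
    return start_query(f'DELETE FROM "{table}"')


def overwrite_fixed(table: str, s3_output: Optional[str] = None) -> str:
    # Forwarding the argument keeps the query results in the caller's bucket.
    return start_query(f'DELETE FROM "{table}"', s3_output=s3_output)
```

In this model, "overwrite_buggy" ends up on the default-bucket path even though the caller supplied "s3_output", while "overwrite_fixed" uses the supplied location and never touches the fallback.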

How to Reproduce

# This works when the ETL job has all the required standard read/write IAM permissions for Athena, S3, and Glue
wr.athena.to_iceberg(
    df=df,
    table_location=f"s3://{s3_bucket}/{table_id}",
    database=db,
    table=table_id,
    mode="append",
    partition_cols=["ts"],
    s3_output=f"s3://{athena_bucket}/iceberg-query-results/",
    temp_path=f"s3://{athena_bucket}/temp/",
)

# This does not work with the same IAM permissions as above, as now it tries to create the default athena bucket

wr.athena.to_iceberg(
    df=df,
    table_location=f"s3://{s3_bucket}/{table_id}",
    database=db,
    table=table_id,
    mode="overwrite",  # overwrite_partitions yields same error
    partition_cols=["ts"],
    s3_output=f"s3://{athena_bucket}/iceberg-query-results/",
    temp_path=f"s3://{athena_bucket}/temp/",
)

The error thrown is:
ERROR:awswrangler.athena._write_iceberg:Waiter BucketExists failed: Max attempts exceeded. Previously accepted state: Matched expected HTTP status code: 404

Expected behavior

If "s3_output" points to an existing S3 bucket and mode="overwrite", the target Glue table should be overwritten/created.
The "delete_query_execution_id = _start_query_execution()" call in "to_iceberg()" (line 590) should succeed without raising an exception.

(Analogous for mode="overwrite_partitions" and "to_iceberg()" line 572)
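
The expected behavior could be pinned down with a small regression check along these lines. This is only a sketch: "run_delete_step" is a hypothetical, simplified stand-in for the delete step of "to_iceberg()" (the real "overwrite_partitions" path goes through "delete_from_iceberg_table()"), and the table and bucket names are made up.

```python
from typing import Callable, Optional
from unittest.mock import MagicMock


def run_delete_step(
    start_query: Callable[..., str],
    table: str,
    mode: str,
    s3_output: Optional[str] = None,
) -> str:
    """Hypothetical stand-in for the delete step in both overwrite modes."""
    if mode not in ("overwrite", "overwrite_partitions"):
        raise ValueError(f"unsupported mode: {mode}")
    # The behavior under test: s3_output is forwarded to the query call.
    return start_query(sql=f'DELETE FROM "{table}"', s3_output=s3_output)


def check_s3_output_forwarded(mode: str) -> bool:
    """Return True if the query call received the caller's s3_output."""
    expected = "s3://my-athena-bucket/iceberg-query-results/"
    fake_start_query = MagicMock(return_value="query-id")
    run_delete_step(fake_start_query, "my_table", mode, s3_output=expected)
    return fake_start_query.call_args.kwargs["s3_output"] == expected
```

Calling "check_s3_output_forwarded()" for each of the two modes gives a quick signal that the parameter reaches the query call; a test against the real library would patch the actual internal function instead of this stand-in.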

Your project

No response

Screenshots

No response

OS

Linux (AWS Glue Pythonshell)

Python version

3.9

AWS SDK for pandas version

3.13.0

Additional context

Labels: bug