
PyIceberg - MetaException(message='java.lang.IllegalArgumentException: bucket is null/empty') #1165

Open
malopezh opened this issue Sep 11, 2024 · 7 comments

@malopezh

Apache Iceberg version

0.7.1 (latest release)

Please describe the bug 🐞

Problem:
I am trying to create a table in OCI Object Storage.
The metadata is created successfully, but the data is not.
Expected:
The full Iceberg structure (metadata and data) is created; currently only the metadata is created.
StackTrace:
Traceback (most recent call last):
  File "/home/marcolo/development/reorgParquets/.venv/lib/python3.10/site-packages/pyiceberg/catalog/__init__.py", line 418, in create_table_if_not_exists
    return self.create_table(identifier, schema, location, partition_spec, sort_order, properties)
  File "/home/marcolo/development/reorgParquets/.venv/lib/python3.10/site-packages/pyiceberg/catalog/hive.py", line 376, in create_table
    self._create_hive_table(open_client, tbl)
  File "/home/marcolo/development/reorgParquets/.venv/lib/python3.10/site-packages/pyiceberg/catalog/hive.py", line 325, in _create_hive_table
    open_client.create_table(hive_table)
  File "/home/marcolo/development/reorgParquets/.venv/lib/python3.10/site-packages/hive_metastore/ThriftHiveMetastore.py", line 3431, in create_table
    self.recv_create_table()
  File "/home/marcolo/development/reorgParquets/.venv/lib/python3.10/site-packages/hive_metastore/ThriftHiveMetastore.py", line 3457, in recv_create_table
    raise result.o3
hive_metastore.ttypes.MetaException: MetaException(message='java.lang.IllegalArgumentException: bucket is null/empty')

Iceberg Catalog:

`local_catalog = load_catalog(
    name="s3",
    uri="thrift://localhost:9083",
    warehouse="s3a://my_bucket",
    catalog_type="hadoop",
    **{
        "s3.endpoint": "https://my-endpoint.com",
        "s3.access-key-id": "myAccess-Key",
        "s3.secret-access-key": "mySecret-Key",
        "s3.session.token": "myToken",
        "bucket_name": "s3a://my_bucket",
        "hive.hive2-compatible": "true",
    },
)`
@kevinjqliu
Contributor

thanks for reporting this. can you add an example code of how you created the table?

@malopezh
Author

malopezh commented Sep 12, 2024

> thanks for reporting this. can you add an example code of how you created the table?

Hello!

Sure, here is the code:

`
from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.schema import Schema
from pyiceberg.table.sorting import SortField, SortOrder
from pyiceberg.transforms import DayTransform, IdentityTransform
from pyiceberg.types import DoubleType, FloatType, NestedField, StringType

# Four optional columns; "datetime" (field_id=1) is the partition source.
schema = Schema(
    NestedField(field_id=1, name="datetime", field_type=StringType(), required=False, current_schema=1),
    NestedField(field_id=2, name="symbol", field_type=StringType(), required=False, current_schema=1),
    NestedField(field_id=3, name="bid", field_type=FloatType(), required=False, current_schema=1),
    NestedField(field_id=4, name="ask", field_type=DoubleType(), required=False, current_schema=1),
)

# Partition by day of the "datetime" column.
partition_spec = PartitionSpec(
    PartitionField(source_id=1, field_id=1000, transform=DayTransform(), name="datetime_day")
)

# Sort rows by "symbol" (field_id=2).
sort_order = SortOrder(SortField(source_id=2, transform=IdentityTransform()))

identifier = ("iceberg", "default")

tbl = local_catalog.create_table_if_not_exists(
    identifier=identifier,
    schema=schema,
    location="s3a://my_oci_bucket/my_folder",
    partition_spec=partition_spec,
    sort_order=sort_order,
    properties={},
)

# df is the data to write; it is not defined in this snippet.
tbl.overwrite(df)
`

NOTE: metadata is being created successfully

Thanks!!

@kevinjqliu
Contributor

> line 3457, in recv_create_table raise result.o3 hive_metastore.ttypes.MetaException: MetaException(message='java.lang.IllegalArgumentException: bucket is null/empty')

This error does not come from PyIceberg; it possibly comes from your underlying (Hadoop) filesystem:
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java#L1102
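
For context, a check like this fails when no bucket can be extracted from the location URI. The sketch below is a rough client-side pre-check in Python, not a reproduction of the Hadoop code. `bucket_from_location` and `bucket_warnings` are hypothetical helpers, not part of PyIceberg or Hadoop. The underscore warning rests on an assumption worth verifying: Java's URI parser may report no host for names such as `my_bucket`, which would look like an empty bucket on the Hadoop side.

```python
from urllib.parse import urlparse

# Hypothetical helpers for illustration only; not a PyIceberg or Hadoop API.
S3_SCHEMES = {"s3", "s3a", "s3n"}


def bucket_from_location(location: str) -> str:
    """Return the bucket (URI authority) of an S3-style location, or raise ValueError."""
    parsed = urlparse(location)
    if parsed.scheme not in S3_SCHEMES:
        raise ValueError(f"unsupported scheme in {location!r}")
    if not parsed.netloc:
        raise ValueError(f"bucket is empty in {location!r}")
    return parsed.netloc


def bucket_warnings(bucket: str) -> list:
    """Return human-readable warnings about a bucket name (may be empty)."""
    warnings = []
    if "_" in bucket:
        # Assumption: Java's URI parsing may not treat this name as a valid host.
        warnings.append(f"bucket {bucket!r} contains an underscore")
    return warnings


bucket = bucket_from_location("s3a://my_oci_bucket/my_folder")
print(bucket, bucket_warnings(bucket))
```

If the real bucket name has no underscore and the location parses cleanly, the cause is more likely in the metastore's own S3A configuration.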

> catalog_type= "hadoop",

This is also not a valid catalog type:

AVAILABLE_CATALOGS: dict[CatalogType, Callable[[str, Properties], Catalog]] = {
    CatalogType.REST: load_rest,
    CatalogType.HIVE: load_hive,
    CatalogType.GLUE: load_glue,
    CatalogType.DYNAMODB: load_dynamodb,
    CatalogType.SQL: load_sql,
}

> "bucket_name": "s3a://my_bucket",

`bucket_name` is not a valid catalog parameter:
https://py.iceberg.apache.org/configuration/#catalogs
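
Both points can be checked before calling `load_catalog`. The sketch below is a hypothetical pre-flight helper, not a PyIceberg API. It hard-codes the five catalog types from the mapping quoted above. It assumes the type is normally passed under the `type` key, as in the configuration docs, and also looks at `catalog_type` because that is the key used in this thread. The set of unsupported keys is only an example.

```python
# Hypothetical pre-flight check; none of these names exist in PyIceberg.
KNOWN_CATALOG_TYPES = {"rest", "hive", "glue", "dynamodb", "sql"}
UNSUPPORTED_KEYS = {"bucket_name"}  # illustrative, taken from this thread


def check_catalog_properties(props: dict) -> list:
    """Return a list of problems found in catalog properties (empty if none)."""
    problems = []
    # Assumption: "type" is the documented key; "catalog_type" is checked as a fallback.
    catalog_type = props.get("type", props.get("catalog_type"))
    if catalog_type is not None and str(catalog_type).lower() not in KNOWN_CATALOG_TYPES:
        problems.append(f"unknown catalog type {catalog_type!r}")
    for key in props:
        if key in UNSUPPORTED_KEYS:
            problems.append(f"{key!r} is not a valid catalog property")
    return problems


for problem in check_catalog_properties(
    {"catalog_type": "hadoop", "bucket_name": "s3a://my_bucket"}
):
    print(problem)
```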

@kevinjqliu
Contributor

> uri="thrift://localhost:9083",

Is this a HMS? I think the error is from the HMS setup

@malopezh
Author

> > uri="thrift://localhost:9083",
>
> Is this a HMS? I think the error is from the HMS setup

Yes, it's HMS. I configured Hadoop, Hive, and the Hive Metastore service, and I also configured MySQL. I was able to create a new namespace with local_catalog.create_namespace("myNS"), but it's obvious that I missed something.

My intention is creating Iceberg Tables in OCI Object Storage. Is there any documentation I can check to achieve this?

Thanks for your responses.

@kevinjqliu
Contributor

> My intention is creating Iceberg Tables in OCI Object Storage. Is there any documentation I can check to achieve this?

I don't know of any OCI-related documentation. However, here is a guide on setting up a catalog and writing to it:
https://py.iceberg.apache.org/#connecting-to-a-catalog

I suggest getting that working and then replacing the catalog with your own.

Since you are using HMS, you should use the Hive catalog: https://py.iceberg.apache.org/configuration/#hive-catalog
or, similarly:

load_catalog(..., catalog_type= "hive")
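
Putting the suggestions in this thread together, a Hive catalog configuration might look like the sketch below. It reuses the placeholder endpoint, bucket, and credentials from the original post and drops the invalid `bucket_name` entry. Two assumptions should be checked against the configuration docs: that the catalog type is passed under the `type` key, and that the session token key is `s3.session-token` (the original post used `s3.session.token`). `hive_catalog_properties` and `make_catalog` are hypothetical helpers; `make_catalog` needs PyIceberg with Hive support installed and a reachable metastore.

```python
def hive_catalog_properties() -> dict:
    """Build catalog properties for a Hive Metastore backed catalog (placeholder values)."""
    return {
        "type": "hive",  # assumption: documented key for the catalog type
        "uri": "thrift://localhost:9083",
        "warehouse": "s3a://my_bucket",
        "s3.endpoint": "https://my-endpoint.com",
        "s3.access-key-id": "myAccess-Key",
        "s3.secret-access-key": "mySecret-Key",
        "s3.session-token": "myToken",  # assumption: hyphenated key, per the docs
        "hive.hive2-compatible": "true",
    }


def make_catalog(name: str = "s3"):
    """Load the catalog; imports PyIceberg lazily so the properties can be inspected without it."""
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **hive_catalog_properties())


print(sorted(hive_catalog_properties()))
```

This only covers the client side. The `bucket is null/empty` exception is raised inside the metastore, so its own Hadoop S3A settings likely need to be checked separately.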


This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

@github-actions github-actions bot added the stale label Mar 12, 2025