Question
While trying to partition my table, I get this error:
ValueError: Could not find in old schema: 2: {field}: identity(2)
I've drowned myself in the documentation, Stack Overflow, and Medium looking for an answer.
I even tried ChatGPT, but without success :D
I'm using a local SQLite catalog and a MinIO server to develop a proof of concept.
Here is the code to reproduce the issue:
from pyiceberg.catalog.sql import SqlCatalog
from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.transforms import DayTransform, IdentityTransform
import pyarrow as pa
warehouse_path = "local_s3"
catalog = SqlCatalog(
    "default",
    **{
        "uri": f"sqlite:///{warehouse_path}/catalog.db",
        "warehouse": "http://localhost:9001",
        "s3.endpoint": "http://localhost:9001",
        "s3.access-key-id": "minio_user",
        "s3.secret-access-key": "minio1234",
    },
)
catalog.create_namespace_if_not_exists('my_namespace')
ts_schema = pa.schema([
    pa.field('timestamp', pa.timestamp('s'), nullable=False),  # Assuming timestamp with seconds precision
    pa.field('campaign_id', pa.uint8(), nullable=False),
    pa.field('temperature', pa.float32()),
    pa.field('pressure', pa.float32()),
    pa.field('humidity', pa.int32()),
    pa.field('led_0', pa.bool_())
])
# Define partitioning spec for campaign_id
ts_partition_spec = PartitionSpec(
    PartitionField(
        field_id=2,
        source_id=2,
        transform=IdentityTransform(),
        name="campaign_id"
    )
)
time_series_table = catalog.create_table(
    'my_namespace.time_series',
    schema=ts_schema,
    partition_spec=ts_partition_spec  # <= raises error !!
)
My goal is to partition the table by campaign_id.
Is it possible? If yes, how?
How should I interpret the API documentation?
I also tried the timestamp field with DayTransform:
ts_partition_spec = PartitionSpec(
    PartitionField(
        source_id=1, field_id=100, transform=DayTransform(), name="timestamp_day"
    )
)
It raised the same kind of error:
ValueError: Could not find in old schema: 100: timestamp_day: Day(1)