I do not understand the partition error: ValueError: Could not find in old schema: 2: {field}: identity(2) #1100

Closed
@cfrancois7

Question

While trying to partition my table, I got this error:
ValueError: Could not find in old schema: 2: {field}: identity(2)
I've drowned myself in the documentation, Stack Overflow, and Medium looking for an answer.
I even tried ChatGPT, but without success :D

I'm using a local SQLite catalog and a MinIO server to develop a proof of concept.
Here is the code to reproduce the issue:

from pyiceberg.catalog.sql import SqlCatalog
from pyiceberg.partitioning import PartitionSpec, PartitionField
from pyiceberg.transforms import DayTransform, IdentityTransform
import pyarrow as pa

warehouse_path = "local_s3"
catalog = SqlCatalog(
    "default",
    **{
        "uri": f"sqlite:///{warehouse_path}/catalog.db",
        "warehouse": "http://localhost:9001",
        "s3.endpoint": "http://localhost:9001",
        "s3.access-key-id": "minio_user",
        "s3.secret-access-key": "minio1234",
    },
)
catalog.create_namespace_if_not_exists('my_namespace')



ts_schema = pa.schema([
    pa.field('timestamp', pa.timestamp('s'), nullable=False),  # Assuming timestamp with seconds precision
    pa.field('campaign_id', pa.uint8(), nullable=False),
    pa.field('temperature', pa.float32()),
    pa.field('pressure', pa.float32()),
    pa.field('humidity', pa.int32()),
    pa.field('led_0', pa.bool_())
])

# Define the partition spec: identity partitioning on campaign_id
ts_partition_spec = PartitionSpec(
    PartitionField(
        field_id=2,
        source_id=2,
        transform=IdentityTransform(),
        name="campaign_id"
    )
)
time_series_table = catalog.create_table(
    'my_namespace.time_series',
    schema=ts_schema,
    partition_spec=ts_partition_spec  # <= raises the ValueError
)

My goal is to partition the table by campaign_id.
Is that possible? If so, how?
And how should I read the API documentation on this point?

I also tried partitioning on the timestamp field with DayTransform:

ts_partition_spec = PartitionSpec(
    PartitionField(
        source_id=1, field_id=100, transform=DayTransform(), name="timestamp_day"
    )
)

It raised the same kind of error:
ValueError: Could not find in old schema: 100: timestamp_day: Day(1)
