bug: build_fallback_field_id_map produces incorrect column indices for schemas with nested types

### Describe the bug

`build_fallback_field_id_map` maps Iceberg field IDs to wrong Parquet leaf column indices when the schema contains nested types (struct, list, map). This causes predicate evaluation to crash on migrated Parquet files (files without embedded field IDs).

**Error:**
"Leave column id in predicates isn't a root column in Parquet schema"

This affects migrated tables where Parquet files were written by Spark/Hive without Iceberg field IDs, then imported via `add_files` or `importSparkTable()`.

### Root Cause

#### How fallback field IDs work

When a Parquet file lacks embedded field IDs, iceberg-rust assigns position-based fallback IDs. Two functions must agree on the mapping:

1. `add_fallback_field_ids_to_arrow_schema` — assigns field IDs 1, 2, 3... to **top-level** Arrow schema fields
2. `build_fallback_field_id_map` — maps those field IDs to Parquet **leaf** column indices for predicate evaluation

#### What goes wrong

`build_fallback_field_id_map` iterates over `parquet_schema.columns()` (leaf columns) instead of top-level fields. Nested types expand into multiple leaves, causing the mapping to diverge from the Arrow schema's field IDs.

**Example:** `name: string, address: struct(street: string, city: string), id: int`

| | Arrow top-level fields | Parquet leaf columns |
|---|---|---|
| Fields | name, address, id | name, street, city, id |
| Assigned field IDs | 1, 2, 3 | 1, 2, 3, 4 (bug) |

When a predicate references `id` (field_id=3 from Arrow), the column map returns leaf index 2 (`city`, inside the `address` group). `PredicateConverter::bound_reference` then calls `get_column_root(2).is_group()` → `true` → error.
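
The mismatch can be reproduced without Parquet at all. The following is a minimal sketch using a hypothetical `Field` enum as a stand-in for a Parquet schema node; `leaves`, `leaf_based_map`, and `example_schema` are illustrative names, not iceberg-rust or `parquet` crate APIs. It only models the behavior described above (one fallback ID per leaf column).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a Parquet schema node (not the `parquet` crate's type).
#[allow(dead_code)] // group names are kept only for readability
enum Field {
    Primitive(&'static str),
    Group(&'static str, Vec<Field>),
}

/// Flatten the schema into leaf column names, depth-first.
fn leaves(fields: &[Field]) -> Vec<&'static str> {
    let mut out = Vec::new();
    for f in fields {
        match f {
            Field::Primitive(name) => out.push(*name),
            Field::Group(_, children) => out.extend(leaves(children)),
        }
    }
    out
}

/// Models the buggy behavior: fallback ID `i + 1` is mapped to leaf index `i`.
fn leaf_based_map(fields: &[Field]) -> HashMap<i32, usize> {
    leaves(fields)
        .iter()
        .enumerate()
        .map(|(idx, _)| (idx as i32 + 1, idx))
        .collect()
}

/// The example schema: name, address(street, city), id.
fn example_schema() -> Vec<Field> {
    vec![
        Field::Primitive("name"),
        Field::Group(
            "address",
            vec![Field::Primitive("street"), Field::Primitive("city")],
        ),
        Field::Primitive("id"),
    ]
}
```

With `example_schema()`, looking up field ID 3 in `leaf_based_map` yields leaf index 2, and `leaves` shows that index 2 is `city` rather than `id`.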

### How Iceberg Java handles this

Java's [`ParquetSchemaUtil.addFallbackIds()`](https://github.com/apache/iceberg/blob/main/parquet/src/main/java/org/apache/iceberg/parquet/ParquetSchemaUtil.java#L174-L184) iterates **top-level fields**, not leaf columns:

```java
public static MessageType addFallbackIds(MessageType fileSchema) {
    MessageTypeBuilder builder = org.apache.parquet.schema.Types.buildMessage();
    int ordinal = 1;
    for (Type type : fileSchema.getFields()) {
        builder.addField(type.withId(ordinal));
        ordinal += 1;
    }
    return builder.named(fileSchema.getName());
}
```

Additionally, Java's [`ParquetMetricsRowGroupFilter`](https://github.com/apache/iceberg/blob/main/parquet/src/main/java/org/apache/iceberg/parquet/ParquetMetricsRowGroupFilter.java) handles nested types gracefully: predicates on nested columns return `ROWS_MIGHT_MATCH` instead of crashing.

### Proposed Fix

Change `build_fallback_field_id_map` to iterate over `parquet_schema.root_schema().get_fields()` (top-level fields) instead of `parquet_schema.columns()` (leaf columns).

For each top-level field:
- If primitive: map `ordinal` → `leaf_column_index`
- If group (struct/list/map): skip the mapping, advance the leaf counter past all leaves in that group

This makes `build_fallback_field_id_map` consistent with `add_fallback_field_ids_to_arrow_schema`, which already correctly iterates top-level Arrow fields.

`PredicateConverter::bound_reference` already validates that the resolved column is a root column and rejects groups, so no changes are needed there.
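
The proposed iteration could look roughly like the sketch below. It uses a hypothetical `Node` enum in place of the `parquet` crate's schema types, and `leaf_count` and `top_level_map` are illustrative names; the actual change in `reader.rs` would need to walk the real schema types instead.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a Parquet schema node (not the `parquet` crate's type).
#[allow(dead_code)] // names are kept only for readability
enum Node {
    Primitive(&'static str),
    Group(&'static str, Vec<Node>),
}

/// Number of leaf columns under a node.
fn leaf_count(node: &Node) -> usize {
    match node {
        Node::Primitive(_) => 1,
        Node::Group(_, children) => children.iter().map(leaf_count).sum(),
    }
}

/// One fallback ID per top-level field; only primitives get a leaf index.
fn top_level_map(fields: &[Node]) -> HashMap<i32, usize> {
    let mut map = HashMap::new();
    let mut leaf_idx = 0;
    for (pos, field) in fields.iter().enumerate() {
        let field_id = pos as i32 + 1; // ordinal, starting at 1
        match field {
            Node::Primitive(_) => {
                map.insert(field_id, leaf_idx);
                leaf_idx += 1;
            }
            // Skip the mapping, but advance past every leaf in the group.
            group @ Node::Group(..) => leaf_idx += leaf_count(group),
        }
    }
    map
}
```

For the `name, address(street, city), id` example, this maps field ID 1 to leaf 0 and field ID 3 to leaf 3, and leaves field ID 2 (`address`) unmapped.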

### Files to modify

1. `crates/iceberg/src/arrow/reader.rs`: `build_fallback_field_id_map`

### Related

- https://github.com/apache/datafusion-comet/issues/3860: Downstream issue in Comet
