Dynamic prefetch

### Description

Currently prefetch is fixed (default = 2), which works poorly across different file sizes and always requires manual tuning. It is especially painful for datasets with a large number of small files.

**Prefetch** can be derived automatically from file size.

**Parallelism** is more sensitive and should stay user-controlled. The computed prefetch can be treated as a total budget and split across workers.

Heuristic:
```python
import math

KiB, MiB = 1024, 1024 * 1024

if avg_file_size <= 256 * KiB:   # aggressive - latency/overhead is dominating
    # 4 MiB is the target total size of data in-flight; clamp the result to [8, 128]
    # (Python has no math.clamp, so use max/min)
    total_prefetch = max(8, min(128, 4 * MiB / avg_file_size))
elif avg_file_size <= 64 * MiB:  # moderate
    total_prefetch = 8
else:
    total_prefetch = 2  # conservative

# split the total budget across workers
prefetch = math.ceil(total_prefetch / parallel)
```

This should be computed before running a UDF that takes a File as input. `chain.avg("file.size")` is the only overhead. An explicit setting such as `settings(prefetch=5)` takes priority if defined. If the UDF takes multiple File inputs, use the first one for estimation.
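The rules above could be combined into one helper, roughly as sketched below. The function names `total_prefetch_budget` and `resolve_prefetch`, the `user_prefetch` parameter, and the fallback to the current default of 2 when no average size is available are all assumptions made for illustration, not existing API. The average size is passed in as a plain number, for example the result of `chain.avg("file.size")` on the first File input.

```python
import math
from typing import Optional

KiB = 1024
MiB = 1024 * KiB
DEFAULT_PREFETCH = 2  # current fixed default mentioned in this issue


def total_prefetch_budget(avg_file_size: float) -> int:
    """Total number of files to keep in flight across all workers."""
    if avg_file_size <= 256 * KiB:
        # Small files: aim for about 4 MiB in flight, clamped to [8, 128].
        return max(8, min(128, math.ceil(4 * MiB / avg_file_size)))
    if avg_file_size <= 64 * MiB:
        return 8  # moderate
    return 2  # conservative for large files


def resolve_prefetch(
    avg_file_size: Optional[float],
    parallel: int,
    user_prefetch: Optional[int] = None,
) -> int:
    """Per-worker prefetch; an explicit user setting always wins."""
    if user_prefetch is not None:
        return user_prefetch
    if avg_file_size is None or avg_file_size <= 0:
        # Assumption: with no usable size estimate, keep the current default.
        return DEFAULT_PREFETCH
    workers = max(1, parallel)
    return math.ceil(total_prefetch_budget(avg_file_size) / workers)


# Example: 32 KiB files with 4 workers give a budget of 128, i.e. 32 per worker.
print(resolve_prefetch(32 * KiB, parallel=4))
```

Because of the `ceil`, each worker always gets at least 1, so with many workers the effective total can exceed the computed budget (for example, 16 workers on 1 MiB files still prefetch 16 files in total, not 8).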

This removes the need for manual tuning and adapts to small vs large files.

Dynamic prefetch #1706
