
fix: Restore direct Arrow thread pool control inside Parquet format library #284

Merged
lxy-9602 merged 2 commits into alibaba:main from lxy-9602:fix-arrow-thread-pool
May 15, 2026

@lxy-9602
Collaborator

Purpose

Linked issue: #68

Motivation

When Arrow is statically linked into multiple shared libraries (libpaimon.so and libpaimon_parquet_file_format.so), each .so gets its own copy of Arrow's CpuThreadPool singleton.

Commit d3bb3a9 introduced paimon::SetArrowCpuThreadPoolCapacity() in libpaimon.so as a wrapper around arrow::SetCpuThreadPoolCapacity(), and removed the direct arrow::SetCpuThreadPoolCapacity() call from parquet_file_batch_reader.cpp (which lives in libpaimon_parquet_file_format.so). This means setting the thread pool capacity through the wrapper only affects the singleton inside libpaimon.so, never the one inside libpaimon_parquet_file_format.so — making Parquet read thread control completely ineffective.

On a 96-core machine, this caused Arrow to spawn 96 CpuThreadPool workers + 8 IOThreadPool workers inside libpaimon_parquet_file_format.so, regardless of any capacity setting by the user.
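
The effect described above can be modeled in a single translation unit. The following is a minimal sketch, not Arrow or Paimon code: `FakeCpuThreadPool`, `LibPaimonTag`, `LibParquetTag`, `WrapperSetCapacity`, `DirectSetCapacity`, and `ParquetReaderCapacity` are all hypothetical names, and the template tag only imitates the "one private singleton per shared library" situation that static linking produces. The default of 96 is taken from the 96-core machine mentioned above.

```cpp
#include <cassert>

// Hypothetical tags: each one stands for a shared library that carries its
// own statically linked copy of the thread pool code.
struct LibPaimonTag {};   // models libpaimon.so
struct LibParquetTag {};  // models libpaimon_parquet_file_format.so

// Hypothetical stand-in for a process-wide thread pool singleton.
// Each LibTag instantiation gets its own independent instance.
template <typename LibTag>
class FakeCpuThreadPool {
 public:
  static FakeCpuThreadPool& Instance() {
    static FakeCpuThreadPool pool;  // one instance per LibTag
    return pool;
  }
  int capacity() const { return capacity_; }
  void set_capacity(int n) { capacity_ = n; }

 private:
  int capacity_ = 96;  // assumed default: one worker per core on a 96-core host
};

// Models the wrapper that lives in libpaimon.so: it can only reach the
// singleton copy linked into that same library.
void WrapperSetCapacity(int n) {
  FakeCpuThreadPool<LibPaimonTag>::Instance().set_capacity(n);
}

// Models the direct call made from inside the Parquet format library.
void DirectSetCapacity(int n) {
  FakeCpuThreadPool<LibParquetTag>::Instance().set_capacity(n);
}

// Models what the Parquet reader observes when it uses its own pool.
int ParquetReaderCapacity() {
  return FakeCpuThreadPool<LibParquetTag>::Instance().capacity();
}

// Walks through the failure mode and the fix.
void Demo() {
  WrapperSetCapacity(4);
  assert(ParquetReaderCapacity() == 96);  // wrapper did not reach this copy
  DirectSetCapacity(4);
  assert(ParquetReaderCapacity() == 4);   // direct call hits the right copy
}
```

In this model, calling the wrapper leaves the Parquet-side pool at its default, and only the call made from the same "library" as the reader changes the capacity the reader sees.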

Changes

This PR reverts commit d3bb3a9 to restore the original behavior where parquet_file_batch_reader.cpp directly calls arrow::SetCpuThreadPoolCapacity(), ensuring the call targets the correct singleton within libpaimon_parquet_file_format.so.

Tests

API and Format

Documentation

Generative AI tooling

lxy-9602 added 2 commits May 15, 2026 11:22
…conf (alibaba#68)"

This reverts commit d3bb3a9.

The original commit moved arrow::SetCpuThreadPoolCapacity from
libpaimon_parquet_file_format.so (direct call) to libpaimon.so
(via paimon::SetArrowCpuThreadPoolCapacity wrapper). Since Arrow
is statically linked into both .so files, each has its own
CpuThreadPool singleton. Setting capacity through libpaimon.so
never affects the singleton inside libpaimon_parquet_file_format.so,
making thread control ineffective for Parquet reads.
@lxy-9602
Collaborator Author

@Eyizoha I came across some unexpected behavior related to the earlier change #68 and opened this PR to fix it.
The root cause and solution are detailed in the PR description.
Would you mind taking a quick look to see if you agree with the approach?

Collaborator

@lucasfang lucasfang left a comment

+1

@Eyizoha
Contributor

Eyizoha commented May 15, 2026

+1

@lxy-9602 lxy-9602 merged commit de9815b into alibaba:main May 15, 2026
9 checks passed

3 participants