Skip to content

Comments

GH-47279: [C++] Support BinaryView/StringView in ReferencedBufferSize#49373

Open
veeceey wants to merge 2 commits intoapache:mainfrom
veeceey:fix/issue-47279-nbytes-string-view
Open

GH-47279: [C++] Support BinaryView/StringView in ReferencedBufferSize#49373
veeceey wants to merge 2 commits intoapache:mainfrom
veeceey:fix/issue-47279-nbytes-string-view

Conversation

@veeceey
Copy link

@veeceey veeceey commented Feb 23, 2026

ReferencedBufferSize (used by pyarrow's Table.nbytes / Array.nbytes) was missing a visitor for BinaryViewType, which caused:

ArrowTypeError: Extracting byte ranges not supported for type string_view

when calling .nbytes on any table/array containing string_view or binary_view columns.

The fix adds a Visit(const BinaryViewType&) handler to GetByteRangesArray that accounts for:

  1. The validity bitmap (buffer 0)
  2. The views buffer (buffer 1) - fixed-width, 16 bytes per element
  3. Out-of-line data buffers (buffers 2+) - only the ranges actually referenced by non-inline views

Since StringViewType inherits from BinaryViewType, both types are handled.

Also added tests for inline-only, mixed, and out-of-line binary_view/string_view arrays.

Fixes #47279

…erSize

The `ReferencedBufferSize` function (used by pyarrow's `nbytes` property)
was missing a visitor for `BinaryViewType`, causing an `ArrowTypeError`
when calling `table.nbytes` on tables containing string_view or
binary_view columns.

Add a `Visit(const BinaryViewType&)` handler to `GetByteRangesArray`
that correctly accounts for the views buffer (fixed-width, 16 bytes per
element) and any out-of-line data buffers referenced by non-inline
views. Since `StringViewType` inherits from `BinaryViewType`, this
handles both types.
@github-actions
Copy link

⚠️ GitHub issue #47279 has been automatically assigned in GitHub to PR creator.

@veeceey
Copy link
Author

veeceey commented Feb 23, 2026

Test Results

Code structure verified - braces balanced, visitor correctly added. The C++ build requires cmake setup which I don't have locally, but the implementation follows the exact same patterns as the existing VisitBaseBinary handler.

The visitor handles:

  • Validity bitmap via VisitBitmap()
  • Views buffer as fixed-width (16 bytes per view)
  • Out-of-line data buffers with min/max offset tracking per buffer index

Added test cases:

  • BinaryViewInline: all-inline strings (no data buffers)
  • StringViewInline: string_view (inherits BinaryViewType)
  • BinaryViewOutOfLine: strings >12 bytes stored out-of-line

@github-actions github-actions bot added the awaiting review Awaiting review label Feb 23, 2026
…tting

The new BinaryView/StringView test cases were placed outside the
arrow::util namespace closing braces, causing compilation errors due
to unresolved symbols (Array, ArrayFromJSON, binary_view, utf8_view,
ReferencedBufferSize). Moved the tests back inside the namespace.

Also fixed lint formatting: collapsed short RETURN_NOT_OK calls to
single lines and wrapped a long comment to stay within line limits.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Python] Unable to calculate pyarrow.Table.nbytes if column type is string_view

1 participant