
feat(inspect): add ArrowRowBuilder for materializing Arrow batches#780

Open
WZhuo wants to merge 1 commit into apache:main from WZhuo:arrow-row-builder

WZhuo (Contributor) commented Jun 25, 2026

Summary

Adds ArrowRowBuilder (inspect/row_builder_internal), a schema-driven helper that materializes in-memory rows into an Arrow C Data Interface ArrowArray (one batch) for an arbitrary Iceberg schema. It wraps the nanoarrow boilerplate and exposes per-column access plus typed append helpers, so metadata tables (snapshots, history, manifests, …) can emit rows without each re-implementing that boilerplate.
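The row-at-a-time pattern described here can be illustrated without nanoarrow. The following is a minimal, self-contained sketch, not the PR's code: ToyType, ToyField, ToyColumn, and ToyRowBuilder are hypothetical names, plain std::vector storage stands in for nanoarrow buffers, and having required columns reject AppendNull is an assumption made for illustration. It shows the general shape: the schema fixes the columns up front, callers append exactly one typed value per column, and FinishRow closes the row only when every column has advanced by one.

```cpp
// Hypothetical sketch of a schema-driven row builder. ToyType, ToyField,
// ToyColumn, and ToyRowBuilder are illustrative only, NOT the iceberg-cpp API;
// a real implementation would write into nanoarrow-managed buffers instead.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class ToyType { kInt64, kString };

// One schema field: name, physical type, and whether nulls are allowed.
struct ToyField {
  std::string name;
  ToyType type;
  bool optional;
};

// One column being built: a validity flag per row plus typed values.
struct ToyColumn {
  ToyType type;
  bool optional;
  std::vector<bool> valid;
  std::vector<int64_t> ints;
  std::vector<std::string> strings;

  size_t size() const { return valid.size(); }
};

class ToyRowBuilder {
 public:
  // Schema-driven: one column per field, created up front.
  explicit ToyRowBuilder(const std::vector<ToyField>& schema) {
    for (const auto& field : schema) {
      columns_.push_back(ToyColumn{field.type, field.optional, {}, {}, {}});
    }
  }

  // Per-column access; returns nullptr for an out-of-range index.
  ToyColumn* column(size_t index) {
    return index < columns_.size() ? &columns_[index] : nullptr;
  }

  // Typed helpers return false instead of corrupting the column.
  static bool AppendNull(ToyColumn& col) {
    if (!col.optional) return false;  // assumption: required columns reject nulls
    col.valid.push_back(false);
    if (col.type == ToyType::kInt64) {
      col.ints.push_back(0);  // placeholder slot keeps indices aligned
    } else {
      col.strings.emplace_back();
    }
    return true;
  }

  static bool AppendInt(ToyColumn& col, int64_t value) {
    if (col.type != ToyType::kInt64) return false;
    col.valid.push_back(true);
    col.ints.push_back(value);
    return true;
  }

  static bool AppendString(ToyColumn& col, const std::string& value) {
    if (col.type != ToyType::kString) return false;
    col.valid.push_back(true);
    col.strings.push_back(value);
    return true;
  }

  // Closes the current row: every column must have received exactly one value.
  bool FinishRow() {
    for (const auto& col : columns_) {
      if (col.size() != rows_ + 1) return false;
    }
    ++rows_;
    return true;
  }

  // Returns the batch's row count; calling it mid-row is a programming error.
  size_t Finish() const {
    for (const auto& col : columns_) assert(col.size() == rows_);
    return rows_;
  }

 private:
  std::vector<ToyColumn> columns_;
  size_t rows_ = 0;
};
```

Writing a placeholder value for null slots keeps every column's value index equal to its row index, which mirrors how Arrow fixed-width columns still reserve space for null entries.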

This is the first of a series splitting metadata-table support into focused PRs; the InMemoryBatchReader and the SnapshotsTable::Scan integration are intended to follow in separate PRs that build on this.

What's included

  • ArrowRowBuilder with Make, column, FinishRow, and Finish, plus the typed helpers AppendNull, AppendBoolean, AppendInt, AppendString, and AppendStringMap.
  • The implementation lives in the core iceberg library. It depends only on nanoarrow and ToArrowSchema (not on Apache Arrow), matching peers such as manifest_adapter and arrow_c_data_util.
  • Unit tests in row_builder_test.cc covering typed appends (int32/string/int64/boolean/map), null handling for optional columns, multi-entry/empty string maps, zero-row batches, and column-index bounds. Compiled into the metadata_table_test target.
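Because the result is an Arrow ArrowArray, the typed helpers ultimately fill buffers laid out per the Arrow columnar format. The sketch below shows roughly what AppendString, AppendNull, and AppendStringMap amount to for a utf8 column and a map<string, string> column: a validity bitmap (LSB bit order, 1 = valid), int32 offsets with one more entry than there are rows, and, for maps, offsets into separate key and value children. StringColumnBuffers and StringMapColumnBuffers are hypothetical names; in the PR, nanoarrow manages these buffers, and map-level nulls are left out here for brevity.

```cpp
// Hypothetical sketch of Arrow columnar buffers. StringColumnBuffers and
// StringMapColumnBuffers are illustrative names, not nanoarrow or iceberg-cpp types.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// A utf8 column: validity bitmap + int32 offsets (rows + 1 entries) + data bytes.
struct StringColumnBuffers {
  std::vector<uint8_t> validity;    // bit i set => row i is non-null (LSB first)
  std::vector<int32_t> offsets{0};  // value i spans [offsets[i], offsets[i + 1])
  std::string data;                 // all values concatenated
  int64_t length = 0;
  int64_t null_count = 0;

  void AppendString(const std::string& value) {
    data += value;
    offsets.push_back(static_cast<int32_t>(data.size()));
    SetValidity(true);
  }

  // A null slot repeats the previous offset, so it consumes no data bytes.
  void AppendNull() {
    offsets.push_back(offsets.back());
    SetValidity(false);
  }

  std::string Value(int64_t i) const {
    assert(i >= 0 && i < length);
    return data.substr(offsets[i], offsets[i + 1] - offsets[i]);
  }

 private:
  void SetValidity(bool valid) {
    if (length % 8 == 0) validity.push_back(0);  // start a new bitmap byte
    if (valid) {
      validity.back() |= static_cast<uint8_t>(1u << (length % 8));
    } else {
      ++null_count;
    }
    ++length;
  }
};

// A map<utf8, utf8> column, laid out as list<struct<key, value>>: per-row
// offsets into shared key/value children. Map-level nulls are omitted here.
struct StringMapColumnBuffers {
  std::vector<int32_t> offsets{0};  // row i owns entries [offsets[i], offsets[i + 1])
  StringColumnBuffers keys;         // Arrow requires map keys to be non-null
  StringColumnBuffers values;
  int64_t length = 0;

  void AppendStringMap(const std::map<std::string, std::string>& entries) {
    for (const auto& entry : entries) {  // std::map yields keys in sorted order
      keys.AppendString(entry.first);
      values.AppendString(entry.second);
    }
    offsets.push_back(offsets.back() + static_cast<int32_t>(entries.size()));
    ++length;
  }
};
```

An empty map repeats the previous offset just as a null map would, so only a map-level validity bitmap (not modeled in this sketch) distinguishes the two cases.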

Testing

  • CMake (Ninja): built with cmake --build build --target metadata_table_test and ran the resulting binary; 9/9 tests pass (4 existing MetadataTableTest + 5 new ArrowRowBuilderTest), and ctest is green.
  • The test verifies the output by importing the produced C data into Apache Arrow (arrow::ImportRecordBatch), so its test target is registered with USE_BUNDLE.

Notes

  • The test is registered under CMake's bundle build only. The meson build (which has no Apache Arrow/bundle layer) is left unchanged; the core-only metadata_table_test.cc continues to build there.
  • Developed with AI-assisted tooling, reviewed by the author.

Add ArrowRowBuilder (inspect/row_builder_internal) to materialize
in-memory rows into an ArrowArray for an arbitrary Iceberg schema,
with typed append helpers (AppendNull/Boolean/Int/String/StringMap)
for reuse by later metadata tables.

WZhuo force-pushed the arrow-row-builder branch from b0a8615 to b42f0da on June 25, 2026 03:38