GH-50247: Reuse abstraction for null partitions in sorting functions #50248

Open
taepper wants to merge 12 commits into apache:main from taepper:better-null-partitions

taepper (Contributor) commented Jun 25, 2026

Rationale for this change

@pitrou mentioned this as a follow-up in #46926.

What changes are included in this PR?

This PR refactors the sorting functions to reuse the existing null-partition helper methods, so that we no longer maintain two separate abstractions for null partitions. Adopting the helpers was straightforward in most places, but a few spots required some care.
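The idea of a null partition that keeps the null/NaN distinction can be sketched as follows. This is a minimal, self-contained C++ model, not Arrow's actual code: `NullLikePartition` and `PartitionNullsAndNaNs` are hypothetical names, a nullable double column is modeled as `std::vector<std::optional<double>>`, and only the layout with nulls placed at the end is shown. The partition records where the NaN range and the null range begin, so later stages can reuse those boundaries instead of recomputing nullness.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Simplified model of a nullable double column: std::nullopt is a null.
using NullableDouble = std::optional<double>;

// Hypothetical partition result. Row indices are laid out as
//   [non-null, non-NaN | NaN | null]
// and both boundaries are kept, so the null/NaN distinction does not
// have to be recomputed by later stages.
struct NullLikePartition {
  std::vector<uint64_t> indices;
  std::size_t nans_begin = 0;   // position of the first NaN row
  std::size_t nulls_begin = 0;  // position of the first null row
};

// Stable three-way partition of row indices (nulls placed at the end).
NullLikePartition PartitionNullsAndNaNs(const std::vector<NullableDouble>& values) {
  NullLikePartition p;
  p.indices.resize(values.size());
  for (uint64_t i = 0; i < values.size(); ++i) p.indices[i] = i;

  // First split off the nulls, then split the non-null prefix into
  // ordinary values followed by NaNs. Both steps preserve input order.
  auto nulls_it = std::stable_partition(
      p.indices.begin(), p.indices.end(),
      [&](uint64_t i) { return values[i].has_value(); });
  auto nans_it = std::stable_partition(
      p.indices.begin(), nulls_it,
      [&](uint64_t i) { return !std::isnan(*values[i]); });

  p.nans_begin = static_cast<std::size_t>(nans_it - p.indices.begin());
  p.nulls_begin = static_cast<std::size_t>(nulls_it - p.indices.begin());
  assert(p.nans_begin <= p.nulls_begin);
  return p;
}
```

In this model only the range `[0, nans_begin)` would still need a comparison-based sort; the NaN and null ranges already hold their rows in input order.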

In particular, the following functions were significantly simplified by the new abstraction:

  • MarkDuplicates: duplicate nulls and NaNs were detected by checking every row for null a second time, even though that nullness information had already been computed (and then discarded).
  • GenericMergeImpl: merging null ranges required repartitioning the null and NaN values on every merge invocation. Now the null/NaN distinction is tracked, so no merge function is needed for the null and NaN blocks.
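The two simplifications listed above can be illustrated with a hedged sketch under a similar simplified model. `SortedRun`, `MergeRuns` and `MarkDuplicateRows` are hypothetical names; this is not Arrow's `GenericMergeImpl` or `MarkDuplicates`. Each run keeps its indices laid out as [sorted non-null | NaN | null] with both boundaries stored. Merging then only compares values in the non-null ranges and concatenates the NaN and null blocks, and duplicate detection for NaNs and nulls follows from the boundaries alone. The sketch assumes that all NaNs count as equal to each other, as do all nulls.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

// Simplified model of a nullable double column: std::nullopt is a null.
using NullableDouble = std::optional<double>;

// Hypothetical sorted run: indices are laid out as
//   [sorted non-null, non-NaN | NaN | null]
// with both boundaries recorded.
struct SortedRun {
  std::vector<uint64_t> indices;
  std::size_t nans_begin = 0;
  std::size_t nulls_begin = 0;
};

// Iterator to position `pos` of a run's index vector.
static std::vector<uint64_t>::const_iterator At(const SortedRun& run, std::size_t pos) {
  return run.indices.begin() + static_cast<std::ptrdiff_t>(pos);
}

// Merge two runs. Only the non-null ranges are compared; the NaN and null
// blocks are concatenated, so they are never repartitioned.
SortedRun MergeRuns(const SortedRun& a, const SortedRun& b,
                    const std::vector<NullableDouble>& values) {
  assert(a.nans_begin <= a.nulls_begin && a.nulls_begin <= a.indices.size());
  assert(b.nans_begin <= b.nulls_begin && b.nulls_begin <= b.indices.size());
  SortedRun out;
  auto less = [&](uint64_t x, uint64_t y) { return *values[x] < *values[y]; };
  std::merge(At(a, 0), At(a, a.nans_begin), At(b, 0), At(b, b.nans_begin),
             std::back_inserter(out.indices), less);
  out.nans_begin = out.indices.size();
  out.indices.insert(out.indices.end(), At(a, a.nans_begin), At(a, a.nulls_begin));
  out.indices.insert(out.indices.end(), At(b, b.nans_begin), At(b, b.nulls_begin));
  out.nulls_begin = out.indices.size();
  out.indices.insert(out.indices.end(), At(a, a.nulls_begin), a.indices.end());
  out.indices.insert(out.indices.end(), At(b, b.nulls_begin), b.indices.end());
  return out;
}

// Mark each position whose value repeats the previous one. Only non-null
// values are compared; in the NaN and null blocks every element after the
// first is a duplicate, which the recorded boundaries already tell us.
std::vector<bool> MarkDuplicateRows(const SortedRun& run,
                                    const std::vector<NullableDouble>& values) {
  std::vector<bool> dup(run.indices.size(), false);
  for (std::size_t i = 1; i < run.nans_begin; ++i) {
    dup[i] = *values[run.indices[i]] == *values[run.indices[i - 1]];
  }
  for (std::size_t i = run.nans_begin + 1; i < run.nulls_begin; ++i) dup[i] = true;
  for (std::size_t i = run.nulls_begin + 1; i < run.indices.size(); ++i) dup[i] = true;
  return dup;
}
```

`std::merge` takes equivalent elements from the first run before the second, so the merged non-null range stays stable, and the concatenated NaN and null blocks keep their run order as well.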

Are these changes tested?

Yes, the existing compute test suite passes as before.

Are there any user-facing changes?

No.

taepper requested a review from pitrou as a code owner June 25, 2026 01:48
github-actions
⚠️ GitHub issue #50247 has been automatically assigned in GitHub to PR creator.
