GH-50247: Reuse abstraction for null partitions in sorting functions #50248
Open
taepper wants to merge 12 commits into
Rationale for this change
@pitrou mentioned this as a follow-up in #46926
What changes are included in this PR?
This PR refactors the sorting functions to reuse the existing helper methods, so that we avoid maintaining two abstractions for null partitions. The new abstraction was straightforward to apply in most cases, but a few spots required some care.
In particular, these functions were severely simplified by the new abstraction:
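To illustrate the general idea of a null partition, here is a minimal, self-contained sketch. It is not the Arrow implementation: `PartitionSketch`, `PartitionNullsAndNaNs`, and `SortNullsLast` are hypothetical names, and `std::optional<double>` stands in for a nullable column. The point it shows is that nullness and NaN-ness are inspected once, and the resulting ranges are kept so later stages do not have to re-check them.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical result type: three contiguous ranges inside one index buffer.
//   [0, non_null_end)        ordinary values
//   [non_null_end, nan_end)  NaNs
//   [nan_end, size)          nulls
struct PartitionSketch {
  std::size_t non_null_end;
  std::size_t nan_end;
};

// Partition the indices once, recording where the NaN and null ranges begin.
PartitionSketch PartitionNullsAndNaNs(
    const std::vector<std::optional<double>>& values,
    std::vector<std::size_t>& indices) {
  // Move indices of nulls to the end, preserving relative order.
  auto nulls_begin = std::stable_partition(
      indices.begin(), indices.end(),
      [&](std::size_t i) { return values[i].has_value(); });
  // Within the non-null prefix, move indices of NaNs to the end.
  auto nans_begin = std::stable_partition(
      indices.begin(), nulls_begin,
      [&](std::size_t i) { return !std::isnan(*values[i]); });
  return {static_cast<std::size_t>(nans_begin - indices.begin()),
          static_cast<std::size_t>(nulls_begin - indices.begin())};
}

// Sort only the ordinary values; the NaN and null ranges need no comparisons.
PartitionSketch SortNullsLast(const std::vector<std::optional<double>>& values,
                              std::vector<std::size_t>& indices) {
  PartitionSketch p = PartitionNullsAndNaNs(values, indices);
  std::stable_sort(indices.begin(), indices.begin() + p.non_null_end,
                   [&](std::size_t a, std::size_t b) {
                     return *values[a] < *values[b];
                   });
  return p;
}
```

Because the returned ranges already say which rows are NaN or null, a caller such as a duplicate-marking or merge step can, under this sketch's assumptions, treat those ranges as whole blocks instead of testing each row again.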
- MarkDuplicates: duplicate nulls and NaNs were detected by checking every single row for Null one additional time, after we already had (and discarded) the nullness information.
- GenericMergeImpl: merging of null ranges involved repartitioning null and NaN values in every merge invocation. Now we track this distinction and do not need any merge function for null and NaN blocks.
Are these changes tested?
Yes, the compute test suite passes as before.
Are there any user-facing changes?
No.