[python] Fix upsert by key to update all rows matching an existing key#8318
Open
XiaoHongbo-Hope wants to merge 8 commits into
Open
[python] Fix upsert by key to update all rows matching an existing key#8318XiaoHongbo-Hope wants to merge 8 commits into
XiaoHongbo-Hope wants to merge 8 commits into
Conversation
When the table already holds multiple rows sharing an upsert key (append-only tables allow duplicate keys), TableUpsertByKey updates only the last-scanned matching row and leaves the others stale. Add a test asserting the intended behavior (all matching rows updated); it currently fails. Fix to follow.
When the table already holds multiple rows sharing an upsert key, TableUpsertByKey updated only the last-scanned row and left the others stale. Collect all row ids per key (key -> [row_id, ...]) and expand each matched input row to every matching row id, so all of them are updated. Turns the previously failing test_upsert_for_existing_table_duplicate_keys green.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Purpose
When updating an append-only table via
upsert_by_key, every existing row with a matching key should be updated. But if the table already has multiple rows sharing the same key (append-only tables allow duplicates), only one of them is updated — the rest silently keep their old values, leaving inconsistent rows for that key.This PR updates all matching rows.
update_colsbehavior and the single-match case are unchanged.Tests
table_upsert_by_key_test.py:test_upsert_for_existing_table_duplicate_keystest_existing_duplicate_keys_partial_update_colstest_existing_duplicate_keys_partitionedtest_multiple_keys_each_with_duplicatestest_composite_key_upsertto expect all duplicate matches updated