feat(bigframes): UDF transpiler handles some control flow#17558
feat(bigframes): UDF transpiler handles some control flow#17558TrevorBergeron wants to merge 12 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces control flow graph (CFG) construction and topological sorting to transpile Python bytecode with conditional branches and jumps into BigFrames expressions, supported by extensive unit tests. The review feedback highlights several critical improvements, including handling the JUMP_BACKWARD_NO_INTERRUPT instruction to prevent loops from being silently compiled, optimizing CFG building by precomputing instruction offsets to avoid O(N^2) sorting overhead, removing dead code for STORE_FAST, and correcting a misleading stack underflow error message.
| notnull_op = NotNullOp() | ||
|
|
||
|
|
||
| CoerceToBoolOp = base_ops.create_unary_op( |
There was a problem hiding this comment.
Let's clarify that the semantics here are meant to match those of Python if statements, which accept "truthy" and "falsey" objects. https://docs.python.org/3/library/stdtypes.html#truth-value-testing
There was a problem hiding this comment.
added docstring
| starts = {0} | ||
| for i, inst in enumerate(instructions): | ||
| opname = inst.opname | ||
| if "JUMP" in opname: |
There was a problem hiding this comment.
Are there different kinds of jumps that we need to be concerned with? I'm a bit worried that this "in" check will have false positives. Note: the other direction in the other branch is okay, since that's doing equality checks on those specific values.
Looks like we could refactor the jump types listed in get_block_successors as a common variable.
There was a problem hiding this comment.
removed all inexact matching and now enumerate all matching instructions exactly
| starts.add(inst.argval) | ||
| if i + 1 < len(instructions): | ||
| starts.add(instructions[i + 1].offset) | ||
| elif opname in ("RETURN_VALUE", "RETURN_CONST"): |
There was a problem hiding this comment.
Nit: common variable/constant for these return ops too, since they are used more than once.
| ): | ||
| return [last_inst.argval] | ||
|
|
||
| if "JUMP" in opname and ("IF" in opname or "OR_POP" in opname): |
There was a problem hiding this comment.
Same here. This direction for "in" comparisons scares me. Let's find an alternative.
There was a problem hiding this comment.
removed all inexact matching and now enumerate all matching instructions exactly
| queue.append(succ) | ||
|
|
||
| if len(order) != len(blocks): | ||
| raise ValueError( |
There was a problem hiding this comment.
Not needed right now, but I wonder if we add a TODO and a backlog item to support a subset of loops using generate array?
Then again, one couldn't modify state inside the loop for a natural translation to SQL, so maybe not?
There was a problem hiding this comment.
created issue 521549179. Loops should be possible to handle for simpler cases. Mutation is not so much the problem so much as the loop count being irreducible.
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕