Skip to content

feat(bigframes): UDF transpiler handles some control flow#17558

Open
TrevorBergeron wants to merge 12 commits into
mainfrom
tbergeron_more_py_funcs
Open

feat(bigframes): UDF transpiler handles some control flow#17558
TrevorBergeron wants to merge 12 commits into
mainfrom
tbergeron_more_py_funcs

Conversation

@TrevorBergeron

Copy link
Copy Markdown
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces control flow graph (CFG) construction and topological sorting to transpile Python bytecode with conditional branches and jumps into BigFrames expressions, supported by extensive unit tests. The review feedback highlights several critical improvements, including handling the JUMP_BACKWARD_NO_INTERRUPT instruction to prevent loops from being silently compiled, optimizing CFG building by precomputing instruction offsets to avoid O(N^2) sorting overhead, removing dead code for STORE_FAST, and correcting a misleading stack underflow error message.

Comment thread packages/bigframes/bigframes/core/bytecode.py Outdated
Comment thread packages/bigframes/bigframes/core/bytecode.py Outdated
Comment thread packages/bigframes/bigframes/core/bytecode.py
Comment thread packages/bigframes/bigframes/core/bytecode.py Outdated
Comment thread packages/bigframes/bigframes/core/bytecode.py Outdated
Comment thread packages/bigframes/bigframes/core/bytecode.py Outdated
Comment thread packages/bigframes/bigframes/core/bytecode.py Outdated
@TrevorBergeron TrevorBergeron requested a review from tswast June 24, 2026 19:42
@TrevorBergeron TrevorBergeron marked this pull request as ready for review June 24, 2026 19:42
@TrevorBergeron TrevorBergeron requested review from a team as code owners June 24, 2026 19:42
notnull_op = NotNullOp()


CoerceToBoolOp = base_ops.create_unary_op(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's clarify that the semantics here are meant to match those of Python if statements, which accept "truthy" and "falsey" objects. https://docs.python.org/3/library/stdtypes.html#truth-value-testing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added docstring

starts = {0}
for i, inst in enumerate(instructions):
opname = inst.opname
if "JUMP" in opname:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there different kinds of jumps that we need to be concerned with? I'm a bit worried that this "in" check will have false positives. Note: the other direction in the other branch is okay, since that's doing equality checks on those specific values.

Looks like we could refactor the jump types listed in get_block_successors as a common variable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed all inexact matching and now enumerate all matching instructions exactly

starts.add(inst.argval)
if i + 1 < len(instructions):
starts.add(instructions[i + 1].offset)
elif opname in ("RETURN_VALUE", "RETURN_CONST"):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: common variable/constant for these return ops too, since they are used more than once.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

):
return [last_inst.argval]

if "JUMP" in opname and ("IF" in opname or "OR_POP" in opname):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. This direction for "in" comparisons scares me. Let's find an alternative.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed all inexact matching and now enumerate all matching instructions exactly

queue.append(succ)

if len(order) != len(blocks):
raise ValueError(

@tswast tswast Jun 25, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed right now, but I wonder if we add a TODO and a backlog item to support a subset of loops using generate array?

Then again, one couldn't modify state inside the loop for a natural translation to SQL, so maybe not?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created issue 521549179. Loops should be possible to handle for simpler cases. Mutation is not so much the problem so much as the loop count being irreducible.

@TrevorBergeron TrevorBergeron requested a review from tswast June 26, 2026 01:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants