Skip to content

feat: add by:name to append transform#6037

Draft
kgutwin wants to merge 5 commits into
PRQL:mainfrom
bioteam:kg/5165/append-by-name
Draft

feat: add by:name to append transform#6037
kgutwin wants to merge 5 commits into
PRQL:mainfrom
bioteam:kg/5165/append-by-name

Conversation

@kgutwin

@kgutwin kgutwin commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Closes #5165.

This PR adds the named argument by to the append transform, supporting by:name and by:position.

prql target:sql.duckdb

from foo | append by:name bar

produces:

SELECT
  *
FROM
  foo
UNION
ALL BY NAME
SELECT
  *
FROM
  bar

A review of documentation for currently supported dialects suggests that BigQuery, Snowflake, and DuckDB support UNION ALL BY NAME.

@kgutwin

kgutwin commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator Author

@max-sixty @prql-bot why is tend-review failing?

@prql-bot

Copy link
Copy Markdown
Collaborator

It's not your PR — tend-review is hitting an infra issue that affects every fork PR right now.

actions/checkout@v7.0.0 (2026-06-18) added a guard (PR #2454) that refuses to check out fork-PR code under pull_request_target unless allow-unsafe-pr-checkout: true is set. tend-review.yaml is on: pull_request_target and passes the fork ref straight to actions/checkout, so since this PR is from a fork (bioteam/prql) the checkout step fails before the review ever runs:

##[error]Refusing to check out fork pull request code from a 'pull_request_target' workflow. ... set 'allow-unsafe-pr-checkout: true' on the actions/checkout step.

This is already fixed upstream in tend (max-sixty/tend#723#725, shipped in 0.1.7; this repo is still on 0.1.6). I've opened #6038 with the matching one-line stopgap so fork-PR reviews work again ahead of the full 0.1.7 regen.

None of this gates your PR's actual test suite — the tests matrix is what matters for merge, and tend-review is advisory.

@max-sixty

Copy link
Copy Markdown
Member

thanks @prql-bot . please re-run now

@prql-bot prql-bot closed this Jun 24, 2026
@prql-bot prql-bot reopened this Jun 24, 2026
@prql-bot

Copy link
Copy Markdown
Collaborator

Done — tend-review is running cleanly again (run).

A note for next time: GitHub's "Re-run" button (and gh run rerun) doesn't help here. Re-running a pull_request_target run reuses the workflow file from the original trigger, so the rerun still had allow-unsafe-pr-checkout: false and failed at the same checkout guard. The fix only takes effect on a fresh pull_request_target event, so I closed→reopened this PR to fire a reopened event against the now-fixed workflow on main. The actions/checkout@v7 step passed this time, and the review proceeded.

@prql-bot prql-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing as a draft — flagging anything that looks worth a quick fix. Mark ready for a full review.

The structure looks solid: the AppendBy enum threads cleanly through the PL/RQ folds and lowering, the by:position default keeps existing append calls working, and the set_quantifier match is exhaustive over (distinct, set_ops_distinct, by). Two things to consider before marking ready:

  • Docs aren't updated. web/book/src/reference/stdlib/transforms/append.md still describes append as a plain UNION ALL with no mention of by:name / by:position. Worth documenting the new argument (and the dialect support matrix) there.

  • The unsupported-dialect error leaks the internal dialect struct name. format!("The dialect {:?} does not support UNION BY NAME", ctx.dialect) renders as GenericDialect — the Rust variant name rather than the user-facing target (sql.generic). The rest of sql/ phrases these as "Target dialect does not support ..." (e.g. the *-position error in gen_expr.rs). Aligning with that convention would be more consistent — note the append_by_name_unsupported_dialect snapshot in error_messages.rs would need updating too. Attaching a span (as that * error does with .with_span) would also point the user at the by:name token if one's reachable at that point in preprocess::union.

One thing I couldn't verify: the Snowflake test_append_by_name_distinct snapshot emits bare UNION BY NAME (no ALL/DISTINCT). The snapshot only checks generated SQL, not that Snowflake accepts it — worth confirming against a live instance before relying on it.

@prql-bot prql-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing as a draft — flagging anything that looks worth a quick fix. Mark ready for a full review.

The error-message wording from the last round is sorted (now Target dialect does not support UNION BY NAME + hint). Two things on the incremental changes:

  • The todo!() in append_by_name is what's breaking the lineage path. test_append_by_name_distinct uses bare tables (from foo | append by:name bar), whose lineage is a single LineageColumn::All wildcard rather than named Single columns — so the LineageColumn::All { .. } => todo!() arm panics (not yet implemented at transforms.rs, the append_by_name fn). That's likely the "stuck on lineage" symptom: the by-name merge only handles the explicit-column case, and falls over as soon as either side carries a *. Per CLAUDE.md, this also shouldn't ship as a todo!() since it's reachable from user input.

  • cargo clippy will fail on the no-arg format! (inline suggestion below) — clippy::useless_format, on by default. Quick fix since the string has no interpolation.

The commented-out SqlTransform::Super(Transform::Select(...)) block in preprocess.rs is presumably the in-progress fallback (it's why test_append_by_name_fallback doesn't emit the NULL AS … alignment yet) — noting it so it doesn't get left in.

Comment on lines +316 to +318
return Err(Error::new_simple(format!(
"Target dialect does not support UNION BY NAME"
))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

format! with no interpolation trips clippy::useless_format (a default-on lint, which the CI denies). new_simple takes Into<String>, so pass the literal directly:

Suggested change
return Err(Error::new_simple(format!(
"Target dialect does not support UNION BY NAME"
))
return Err(Error::new_simple(
"Target dialect does not support UNION BY NAME",
))

@kgutwin

kgutwin commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator Author

I'm actually going to step back from this PR for the moment, as it's getting much more complex than I had initially anticipated.

I started working on a "fallback" implementation for by:name, when the dialect does not support UNION BY NAME. If we do know the full set of columns for both top and bottom, we should be able to generate a PQ structure that resembles what would be generated by the PRQL:

from foo
select { x, y, z = null }
append (from bar | select { x = null, y, z })

However, following that path made me realize that the base implementation (even without the fallback) requires updating the lineage process, which also had the previous assumption that both top and bottom relations had the same number of columns. I started working on that challenge, but got stuck. The code in the PR right now, given the following PRQL:

prql target sql:duckdb

from foo
select {x, y}
append by:name (from bar | select {y, z})

generates the SQL:

SELECT x, y, z FROM foo
UNION ALL BY NAME
SELECT y, z FROM bar

I'm not quite following why that z is showing up in the first SELECT, and I've run out of time to chase this down now. Perhaps I will pick it up again later...

@prql-bot

Copy link
Copy Markdown
Collaborator

The z is leaking across the union boundary because the by-name merge collapses both sides into a single output lineage, and the lowerer then applies that one column list to the top branch.

Walking it through with prqlc debug lineage on your example, the Append frame ends up with three columns — foo.x, foo.y, and bar.z (the bar.y was deduped out by the name-collision check, and bar.z was appended):

Lineage of the append by:name frame
- - 1:23-64                # the Append
  - columns:
    - !Single { name: [foo, x], target_id: 134 }
    - !Single { name: [foo, y], target_id: 135 }
    - !Single { name: [bar, z], target_id: 124 }   # <- belongs to bar, not foo
    inputs:
    - { id: 132, name: foo }
    - { id: 121, name: bar }

That merged lineage is what append_by_name builds. The lowerer's push_select then derives a single final projection for the whole top-level pipeline from it — [134, 135, 124] = [foo.x, foo.y, bar.z]. When SQL-gen splits the pipeline at the Append into two UNION branches, that unified list is applied to the top branch. CID 124 is bar.z, but rendered against foo it just emits the bare name z — hence SELECT x, y, z FROM foo.

The reason position-based append never hits this: it requires both relations to have matching columns (that's the cannot combine relations with different numbers of columns error), so the merged lineage is identical to each branch's own columns and applying one unified select to both is correct. by:name deliberately relaxes that invariant, so the unified select now contains a column (bar.z) that the top relation doesn't physically have.

For the DB-native UNION ALL BY NAME path the column alignment is the database's job, so each branch wants to keep its own native projection (SELECT x, y FROM foo UNION ALL BY NAME SELECT y, z FROM bar) rather than being forced through one merged push_select. So the fix probably can't live purely in the lineage merge — the merged lineage is a fine description of the output, but something at the union-lowering boundary needs to stop that single projection from crossing into the individual branches. That's also why the bare-table case (from foo | append by:name bar) is a separate problem: there the per-branch columns are * wildcards (LineageColumn::All), which is the todo!() at transforms.rs:929 — at HEAD that one panics rather than mis-projecting.

No rush given you're stepping back — just wanted to leave the trace here so it's not cold if you pick it up again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Should append sometimes or always create UNION ALL BY NAME?

3 participants