feat: add `by:name` to `append` transform by kgutwin · Pull Request #6037 · PRQL/prql

kgutwin · 2026-06-24T18:50:21Z

Closes #5165.

This PR adds the named argument by to the append transform, supporting by:name and by:position.

prql target:sql.duckdb

from foo | append by:name bar

produces:

SELECT
  *
FROM
  foo
UNION
ALL BY NAME
SELECT
  *
FROM
  bar

A review of documentation for currently supported dialects suggests that BigQuery, Snowflake, and DuckDB support UNION ALL BY NAME.

kgutwin · 2026-06-24T19:09:54Z

@max-sixty @prql-bot why is tend-review failing?

prql-bot · 2026-06-24T19:15:48Z

It's not your PR — tend-review is hitting an infra issue that affects every fork PR right now.

actions/checkout@v7.0.0 (2026-06-18) added a guard (PR #2454) that refuses to check out fork-PR code under pull_request_target unless allow-unsafe-pr-checkout: true is set. tend-review.yaml is on: pull_request_target and passes the fork ref straight to actions/checkout, so since this PR is from a fork (bioteam/prql) the checkout step fails before the review ever runs:

##[error]Refusing to check out fork pull request code from a 'pull_request_target' workflow. ... set 'allow-unsafe-pr-checkout: true' on the actions/checkout step.

This is already fixed upstream in tend (max-sixty/tend#723 → #725, shipped in 0.1.7; this repo is still on 0.1.6). I've opened #6038 with the matching one-line stopgap so fork-PR reviews work again ahead of the full 0.1.7 regen.

None of this gates your PR's actual test suite — the tests matrix is what matters for merge, and tend-review is advisory.

max-sixty · 2026-06-24T20:23:13Z

thanks @prql-bot . please re-run now

prql-bot · 2026-06-24T20:28:56Z

Done — tend-review is running cleanly again (run).

A note for next time: GitHub's "Re-run" button (and gh run rerun) doesn't help here. Re-running a pull_request_target run reuses the workflow file from the original trigger, so the rerun still had allow-unsafe-pr-checkout: false and failed at the same checkout guard. The fix only takes effect on a fresh pull_request_target event, so I closed→reopened this PR to fire a reopened event against the now-fixed workflow on main. The actions/checkout@v7 step passed this time, and the review proceeded.

prql-bot

Reviewing as a draft — flagging anything that looks worth a quick fix. Mark ready for a full review.

The structure looks solid: the AppendBy enum threads cleanly through the PL/RQ folds and lowering, the by:position default keeps existing append calls working, and the set_quantifier match is exhaustive over (distinct, set_ops_distinct, by). Two things to consider before marking ready:

Docs aren't updated. web/book/src/reference/stdlib/transforms/append.md still describes append as a plain UNION ALL with no mention of by:name / by:position. Worth documenting the new argument (and the dialect support matrix) there.
The unsupported-dialect error leaks the internal dialect struct name. format!("The dialect {:?} does not support UNION BY NAME", ctx.dialect) renders as GenericDialect — the Rust variant name rather than the user-facing target (sql.generic). The rest of sql/ phrases these as "Target dialect does not support ..." (e.g. the *-position error in gen_expr.rs). Aligning with that convention would be more consistent — note the append_by_name_unsupported_dialect snapshot in error_messages.rs would need updating too. Attaching a span (as that * error does with .with_span) would also point the user at the by:name token if one's reachable at that point in preprocess::union.

One thing I couldn't verify: the Snowflake test_append_by_name_distinct snapshot emits bare UNION BY NAME (no ALL/DISTINCT). The snapshot only checks generated SQL, not that Snowflake accepts it — worth confirming against a live instance before relying on it.

prql-bot

Reviewing as a draft — flagging anything that looks worth a quick fix. Mark ready for a full review.

The error-message wording from the last round is sorted (now Target dialect does not support UNION BY NAME + hint). Two things on the incremental changes:

The todo!() in append_by_name is what's breaking the lineage path. test_append_by_name_distinct uses bare tables (from foo | append by:name bar), whose lineage is a single LineageColumn::All wildcard rather than named Single columns — so the LineageColumn::All { .. } => todo!() arm panics (not yet implemented at transforms.rs, the append_by_name fn). That's likely the "stuck on lineage" symptom: the by-name merge only handles the explicit-column case, and falls over as soon as either side carries a *. Per CLAUDE.md, this also shouldn't ship as a todo!() since it's reachable from user input.
cargo clippy will fail on the no-arg format! (inline suggestion below) — clippy::useless_format, on by default. Quick fix since the string has no interpolation.

The commented-out SqlTransform::Super(Transform::Select(...)) block in preprocess.rs is presumably the in-progress fallback (it's why test_append_by_name_fallback doesn't emit the NULL AS … alignment yet) — noting it so it doesn't get left in.

prql-bot · 2026-06-25T14:03:07Z

+                return Err(Error::new_simple(format!(
+                    "Target dialect does not support UNION BY NAME"
+                ))


format! with no interpolation trips clippy::useless_format (a default-on lint, which the CI denies). new_simple takes Into<String>, so pass the literal directly:

Suggested change

return Err(Error::new_simple(format!(

"Target dialect does not support UNION BY NAME"

))

return Err(Error::new_simple(

"Target dialect does not support UNION BY NAME",

))

kgutwin · 2026-06-25T14:04:24Z

I'm actually going to step back from this PR for the moment, as it's getting much more complex than I had initially anticipated.

I started working on a "fallback" implementation for by:name, when the dialect does not support UNION BY NAME. If we do know the full set of columns for both top and bottom, we should be able to generate a PQ structure that resembles what would be generated by the PRQL:

from foo
select { x, y, z = null }
append (from bar | select { x = null, y, z })

However, following that path made me realize that the base implementation (even without the fallback) requires updating the lineage process, which also had the previous assumption that both top and bottom relations had the same number of columns. I started working on that challenge, but got stuck. The code in the PR right now, given the following PRQL:

prql target sql:duckdb

from foo
select {x, y}
append by:name (from bar | select {y, z})

generates the SQL:

SELECT x, y, z FROM foo
UNION ALL BY NAME
SELECT y, z FROM bar

I'm not quite following why that z is showing up in the first SELECT, and I've run out of time to chase this down now. Perhaps I will pick it up again later...

prql-bot · 2026-06-25T14:11:09Z

The z is leaking across the union boundary because the by-name merge collapses both sides into a single output lineage, and the lowerer then applies that one column list to the top branch.

Walking it through with prqlc debug lineage on your example, the Append frame ends up with three columns — foo.x, foo.y, and bar.z (the bar.y was deduped out by the name-collision check, and bar.z was appended):

Lineage of the append by:name frame

- - 1:23-64                # the Append
  - columns:
    - !Single { name: [foo, x], target_id: 134 }
    - !Single { name: [foo, y], target_id: 135 }
    - !Single { name: [bar, z], target_id: 124 }   # <- belongs to bar, not foo
    inputs:
    - { id: 132, name: foo }
    - { id: 121, name: bar }

That merged lineage is what append_by_name builds. The lowerer's push_select then derives a single final projection for the whole top-level pipeline from it — [134, 135, 124] = [foo.x, foo.y, bar.z]. When SQL-gen splits the pipeline at the Append into two UNION branches, that unified list is applied to the top branch. CID 124 is bar.z, but rendered against foo it just emits the bare name z — hence SELECT x, y, z FROM foo.

The reason position-based append never hits this: it requires both relations to have matching columns (that's the cannot combine relations with different numbers of columns error), so the merged lineage is identical to each branch's own columns and applying one unified select to both is correct. by:name deliberately relaxes that invariant, so the unified select now contains a column (bar.z) that the top relation doesn't physically have.

For the DB-native UNION ALL BY NAME path the column alignment is the database's job, so each branch wants to keep its own native projection (SELECT x, y FROM foo UNION ALL BY NAME SELECT y, z FROM bar) rather than being forced through one merged push_select. So the fix probably can't live purely in the lineage merge — the merged lineage is a fine description of the output, but something at the union-lowering boundary needs to stop that single projection from crossing into the individual branches. That's also why the bare-table case (from foo | append by:name bar) is a separate problem: there the per-branch columns are * wildcards (LineageColumn::All), which is the todo!() at transforms.rs:929 — at HEAD that one panics rather than mis-projecting.

No rush given you're stepping back — just wanted to leave the trace here so it's not cold if you pick it up again.

kgutwin added 4 commits June 24, 2026 14:39

First pass - add append-by named argument

9dd27b8

Update tests

02361f1

Merge branch 'main' into kg/5165/append-by-name

dec0079

More test coverage

533dfb9

prql-bot mentioned this pull request Jun 24, 2026

ci: opt into v7 fork-PR checkout so tend-review runs on fork PRs #6038

Merged

prql-bot closed this Jun 24, 2026

prql-bot reopened this Jun 24, 2026

prql-bot reviewed Jun 24, 2026

View reviewed changes

Partial implementation, getting stuck on lineage

f53ca52

prql-bot reviewed Jun 25, 2026

View reviewed changes

prql-bot mentioned this pull request Jun 26, 2026

review-runs-tracking: 2026-06 #5972

Open

kgutwin mentioned this pull request Jun 26, 2026

feat: alternate PRQL-native implementation of append by:name #6046

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add `by:name` to `append` transform#6037

feat: add `by:name` to `append` transform#6037
kgutwin wants to merge 5 commits into
PRQL:mainfrom
bioteam:kg/5165/append-by-name

kgutwin commented Jun 24, 2026

Uh oh!

kgutwin commented Jun 24, 2026

Uh oh!

prql-bot commented Jun 24, 2026

Uh oh!

max-sixty commented Jun 24, 2026

Uh oh!

prql-bot commented Jun 24, 2026

Uh oh!

prql-bot left a comment

Uh oh!

prql-bot left a comment

Uh oh!

prql-bot Jun 25, 2026

Uh oh!

kgutwin commented Jun 25, 2026

Uh oh!

prql-bot commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

kgutwin commented Jun 24, 2026

Uh oh!

kgutwin commented Jun 24, 2026

Uh oh!

prql-bot commented Jun 24, 2026

Uh oh!

max-sixty commented Jun 24, 2026

Uh oh!

prql-bot commented Jun 24, 2026

Uh oh!

prql-bot left a comment

Choose a reason for hiding this comment

Uh oh!

prql-bot left a comment

Choose a reason for hiding this comment

Uh oh!

prql-bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

kgutwin commented Jun 25, 2026

Uh oh!

prql-bot commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants