feat: add by:name to append transform#6037
Conversation
|
@max-sixty @prql-bot why is tend-review failing? |
|
It's not your PR —
This is already fixed upstream in tend (max-sixty/tend#723 → #725, shipped in 0.1.7; this repo is still on 0.1.6). I've opened #6038 with the matching one-line stopgap so fork-PR reviews work again ahead of the full 0.1.7 regen. None of this gates your PR's actual test suite — the |
|
thanks @prql-bot . please re-run now |
|
Done — A note for next time: GitHub's "Re-run" button (and |
prql-bot
left a comment
There was a problem hiding this comment.
Reviewing as a draft — flagging anything that looks worth a quick fix. Mark ready for a full review.
The structure looks solid: the AppendBy enum threads cleanly through the PL/RQ folds and lowering, the by:position default keeps existing append calls working, and the set_quantifier match is exhaustive over (distinct, set_ops_distinct, by). Two things to consider before marking ready:
-
Docs aren't updated.
web/book/src/reference/stdlib/transforms/append.mdstill describesappendas a plainUNION ALLwith no mention ofby:name/by:position. Worth documenting the new argument (and the dialect support matrix) there. -
The unsupported-dialect error leaks the internal dialect struct name.
format!("The dialect {:?} does not support UNION BY NAME", ctx.dialect)renders asGenericDialect— the Rust variant name rather than the user-facing target (sql.generic). The rest ofsql/phrases these as"Target dialect does not support ..."(e.g. the*-position error ingen_expr.rs). Aligning with that convention would be more consistent — note theappend_by_name_unsupported_dialectsnapshot inerror_messages.rswould need updating too. Attaching a span (as that*error does with.with_span) would also point the user at theby:nametoken if one's reachable at that point inpreprocess::union.
One thing I couldn't verify: the Snowflake test_append_by_name_distinct snapshot emits bare UNION BY NAME (no ALL/DISTINCT). The snapshot only checks generated SQL, not that Snowflake accepts it — worth confirming against a live instance before relying on it.
prql-bot
left a comment
There was a problem hiding this comment.
Reviewing as a draft — flagging anything that looks worth a quick fix. Mark ready for a full review.
The error-message wording from the last round is sorted (now Target dialect does not support UNION BY NAME + hint). Two things on the incremental changes:
-
The
todo!()inappend_by_nameis what's breaking the lineage path.test_append_by_name_distinctuses bare tables (from foo | append by:name bar), whose lineage is a singleLineageColumn::Allwildcard rather than namedSinglecolumns — so theLineageColumn::All { .. } => todo!()arm panics (not yet implementedattransforms.rs, theappend_by_namefn). That's likely the "stuck on lineage" symptom: the by-name merge only handles the explicit-column case, and falls over as soon as either side carries a*. PerCLAUDE.md, this also shouldn't ship as atodo!()since it's reachable from user input. -
cargo clippywill fail on the no-argformat!(inline suggestion below) —clippy::useless_format, on by default. Quick fix since the string has no interpolation.
The commented-out SqlTransform::Super(Transform::Select(...)) block in preprocess.rs is presumably the in-progress fallback (it's why test_append_by_name_fallback doesn't emit the NULL AS … alignment yet) — noting it so it doesn't get left in.
| return Err(Error::new_simple(format!( | ||
| "Target dialect does not support UNION BY NAME" | ||
| )) |
There was a problem hiding this comment.
format! with no interpolation trips clippy::useless_format (a default-on lint, which the CI denies). new_simple takes Into<String>, so pass the literal directly:
| return Err(Error::new_simple(format!( | |
| "Target dialect does not support UNION BY NAME" | |
| )) | |
| return Err(Error::new_simple( | |
| "Target dialect does not support UNION BY NAME", | |
| )) |
|
I'm actually going to step back from this PR for the moment, as it's getting much more complex than I had initially anticipated. I started working on a "fallback" implementation for However, following that path made me realize that the base implementation (even without the fallback) requires updating the lineage process, which also had the previous assumption that both top and bottom relations had the same number of columns. I started working on that challenge, but got stuck. The code in the PR right now, given the following PRQL: generates the SQL: SELECT x, y, z FROM foo
UNION ALL BY NAME
SELECT y, z FROM barI'm not quite following why that |
|
The Walking it through with Lineage of the
|
Closes #5165.
This PR adds the named argument
byto theappendtransform, supportingby:nameandby:position.produces:
A review of documentation for currently supported dialects suggests that BigQuery, Snowflake, and DuckDB support
UNION ALL BY NAME.