Unified/swift: new AST spec and Swift mappings #22016
Open
asgerf wants to merge 29 commits into
After a {expr} or {..expr} placeholder, an optional chain of
.<builtin>() calls may follow. Currently the only builtin is:
.map(param -> template)
which applies the template to each element of the iterable and
collects the resulting node IDs. A chain auto-splices into the
enclosing field/child position.
Example:
path: {parts}.map(p -> (identifier #{p}))
The framework is extensible: additional builtins can be added by
matching on the method name in parse_chain_suffix.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
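The behaviour described above can be modelled in plain Rust. This is only a sketch of the semantics, not the yeast implementation: `Ast`, `Id`, `map_chain`, and `build_path` are hypothetical names invented for illustration, and the `identifier:<text>` node kind is a stand-in for `(identifier #{p})`.

```rust
/// Hypothetical node id type; the real yeast `Id` may differ.
type Id = usize;

/// Minimal stand-in for an AST arena: each node is a kind plus child ids.
#[derive(Default)]
struct Ast {
    nodes: Vec<(String, Vec<Id>)>,
}

impl Ast {
    /// Allocate a node and return its id.
    fn add(&mut self, kind: &str, children: Vec<Id>) -> Id {
        self.nodes.push((kind.to_string(), children));
        self.nodes.len() - 1
    }
}

/// Model of `{items}.map(p -> template)`: run the template once per element
/// and collect the resulting node ids in order.
fn map_chain<T>(
    ast: &mut Ast,
    items: &[T],
    mut template: impl FnMut(&mut Ast, &T) -> Id,
) -> Vec<Id> {
    items.iter().map(|item| template(&mut *ast, item)).collect()
}

/// Model of `path: {parts}.map(p -> (identifier #{p}))`.
fn build_path(ast: &mut Ast, parts: &[&str]) -> Id {
    let ids = map_chain(ast, parts, |ast, p| ast.add(&format!("identifier:{p}"), vec![]));
    // The chain result is spliced directly into the enclosing child list.
    ast.add("path", ids)
}
```

Calling `build_path` with `["a", "b"]` yields a `path` node whose two children are the identifier nodes, in source order.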
A left fold over an iterable where the first element seeds the accumulator:
- first -> init : converts the first element to the initial accumulator
- acc, elem -> fold : fold step; acc = current accumulator, elem = next element
- Empty iterable produces nothing (0-element splice)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
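As a rough illustration of these fold semantics, here is a minimal Rust sketch. The function names `reduce_left` and `member_chain` and the string-based accumulator are assumptions made for the example; the actual builtin operates on templates and node IDs.

```rust
/// Left fold where the first element seeds the accumulator.
/// Returns `None` for an empty iterable (the "0-element splice" case).
fn reduce_left<T, A>(
    items: impl IntoIterator<Item = T>,
    init: impl FnOnce(T) -> A,
    fold: impl FnMut(A, T) -> A,
) -> Option<A> {
    let mut iter = items.into_iter();
    // No first element means there is nothing to seed the accumulator with.
    let first = iter.next()?;
    Some(iter.fold(init(first), fold))
}

/// Hypothetical usage: build a left-nested member access from path parts.
fn member_chain(parts: &[&str]) -> Option<String> {
    reduce_left(
        parts.iter().copied(),
        |first| first.to_string(),
        |acc, elem| format!("({acc}.{elem})"),
    )
}
```

With parts `a`, `b`, `c` the accumulator goes `a`, then `(a.b)`, then `((a.b).c)`, which is the left-nested shape a left fold produces.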
- Ensure the full wildcard _ supports quantifiers
- Also rewrite unnamed nodes in one-shot phases
When a field pattern has a bare capture with no preceding pattern atom (i.e. `foo: @bar`), implicitly use a true wildcard (`_`, match_unnamed: true) as the node pattern, making it equivalent to `foo: _ @bar`.
This is a convenience shorthand: in practice every `field: _ @cap` in the Swift rules can now be written more concisely as `field: @cap`.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
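The shorthand amounts to defaulting a missing pattern atom to the wildcard. The sketch below shows that idea with hypothetical types (`NodePattern`, `FieldPattern`, `field_pattern`); it does not reflect how the yeast macro parser is actually structured.

```rust
/// Hypothetical pattern atoms; not the real yeast types.
#[derive(Debug, PartialEq)]
enum NodePattern {
    /// `_`: wildcard; `match_unnamed` controls whether unnamed nodes match.
    Wildcard { match_unnamed: bool },
    /// `(kind ...)`: a node of a specific kind.
    Kind(String),
}

/// A field pattern such as `foo: _ @bar`.
#[derive(Debug, PartialEq)]
struct FieldPattern {
    field: String,
    node: NodePattern,
    capture: Option<String>,
}

/// Build a field pattern; a missing atom defaults to the true wildcard,
/// so `foo: @bar` and `foo: _ @bar` produce the same value.
fn field_pattern(field: &str, atom: Option<NodePattern>, capture: Option<&str>) -> FieldPattern {
    FieldPattern {
        field: field.to_string(),
        node: atom.unwrap_or(NodePattern::Wildcard { match_unnamed: true }),
        capture: capture.map(str::to_string),
    }
}
```

Under this model the two spellings are indistinguishable after parsing, which is why existing rules can be shortened without changing behaviour.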
Previously, a synthesized node always took its location from the node
that matched the current rule, which resulted in overly broad locations.
For (foo #{bar}) we now take the location of the 'bar' node.
For non-leaf nodes we merge the locations of all their child nodes.
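Merging child locations can be pictured as taking the union of byte ranges. The following is a minimal sketch under assumptions: `Span` and `synthesized_span` are hypothetical names, and falling back to the matched node's span when there are no children is a guess for illustration, not something the commit states.

```rust
/// Hypothetical byte-offset span; yeast's real location type may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

impl Span {
    /// Smallest span covering both `self` and `other`.
    fn union(self, other: Span) -> Span {
        Span {
            start: self.start.min(other.start),
            end: self.end.max(other.end),
        }
    }
}

/// Location for a synthesized non-leaf node: the union of its children's
/// spans. The `fallback` for a node without children is an assumption.
fn synthesized_span(children: &[Span], fallback: Span) -> Span {
    children.iter().copied().reduce(Span::union).unwrap_or(fallback)
}
```

For children covering bytes 4..7 and 10..15 inside a rule match covering 0..40, the synthesized node gets 4..15 rather than the whole match.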
The switch_entry rule was capturing switch_pattern wrapper nodes instead of drilling into them to extract the actual pattern nodes. This caused patterns from switch cases to be lost during desugaring.
Changed the pattern match from:
(switch_entry pattern: (switch_pattern)* @pats ...)
to:
(switch_entry pattern: (switch_pattern pattern: @pats)* ...)
This now correctly extracts the pattern field from each switch_pattern node, ensuring that patterns from cases like 'case 1:' and 'case .circle(let r):' are preserved in the switch_case AST nodes.
Updated control-flow.txt corpus outputs to reflect the new behavior.
…tuple_pattern
Changed the desugaring rules to properly map case patterns with binding (e.g., 'case .circle(let r):') to constructor_pattern nodes instead of tuple_pattern.
New rules added:
- tuple_pattern_item → pattern_element (preserves optional name/key)
- pattern.kind: binding_pattern → name_pattern (extracts bound identifier)
- pattern.kind: case_pattern → constructor_pattern (creates proper constructor with bound arguments as pattern_elements)
This provides a more semantically correct AST representation:
- Constructor name: name_expr identifier 'circle'
- Elements: pattern_element containing name_pattern identifier 'r'
Instead of the previous tuple_pattern string representation.
Updated control-flow.txt corpus outputs.
Adds a test case 'Switch with labeled case pattern arguments' covering:
- case .implicit(isAcknowledged: false): labeled bool literal
- case .thread(threadRowId: _, let rowId): labeled wildcard + binding
The current output contains type errors: pattern_element::key is being produced as name_expr instead of identifier. These will be fixed in the following commit.
Patterns have an unusual parse tree, but now the matching should at least be a bit easier to follow. The TODO regarding not being able to pass down context to handle var/let is still relevant, and can't be solved in the mapping alone.
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
Rerun has been triggered: 2 restarted 🚀
Pull request overview
This pull request expands the unified AST schema and implements substantially richer Swift extraction by updating the Swift tree-sitter grammar, the Swift→unified AST mapping rules, and the supporting “yeast” rewrite engine features needed to preserve locations and enable more expressive rule templates.
Changes:
- Redesign and expand the unified AST (new node kinds for blocks, patterns, declarations, operators, types, control flow, etc.) and update the corresponding QL/dbscheme bindings.
- Improve the Swift parser surface (new named fields like `scoped_import_kind` and `dot`) and implement a much more complete Swift mapping in `swift.rs`, with extensive new corpus expectations.
- Add yeast features for better source range handling and richer template/query capabilities (capture sugar, chaining methods like `.map`/`.reduce_left`, and source-location preservation for `#{...}`).
Show a summary per file
| File | Description |
|---|---|
| unified/ql/test/library-tests/BasicTest/test.expected | Updates expected unified AST test output for BasicTest. |
| unified/ql/lib/unified.dbscheme | Large expansion/refactor of the unified database schema to support the new AST shape. |
| unified/ql/lib/codeql/unified/Ast.qll | Updates the QL AST wrapper classes to match the expanded unified schema. |
| unified/extractor/tree-sitter-swift/node-types.yml | Reflects Swift grammar field changes (e.g., dot, scoped_import_kind) for review/regeneration. |
| unified/extractor/tree-sitter-swift/grammar.js | Adds/renames Swift grammar fields to enable more precise downstream mapping. |
| unified/extractor/tests/corpus/swift/variables.txt | Regenerated Swift corpus expectations for variable-related constructs. |
| unified/extractor/tests/corpus/swift/types.txt | Regenerated Swift corpus expectations for type/class-like declarations and members. |
| unified/extractor/tests/corpus/swift/optionals-and-errors.txt | Regenerated Swift corpus expectations for optionals, try/do-catch, and related operators. |
| unified/extractor/tests/corpus/swift/operators.txt | Regenerated Swift corpus expectations for operator expressions. |
| unified/extractor/tests/corpus/swift/loops.txt | Regenerated Swift corpus expectations for loops and labeled flow control. |
| unified/extractor/tests/corpus/swift/literals.txt | Regenerated Swift corpus expectations for literal forms. |
| unified/extractor/tests/corpus/swift/functions.txt | Regenerated Swift corpus expectations for functions/calls and leading-dot constructs. |
| unified/extractor/tests/corpus/swift/desugar.txt | Adds/updates desugaring-focused Swift corpus expectations (imports, etc.). |
| unified/extractor/tests/corpus/swift/control-flow.txt | Regenerated Swift corpus expectations for if/guard/switch patterns and flow control. |
| unified/extractor/tests/corpus/swift/collections.txt | Regenerated Swift corpus expectations for collection literals and indexing-like parses. |
| unified/extractor/tests/corpus/swift/closures.txt | Regenerated Swift corpus expectations for closures/capture lists and shorthand params. |
| unified/extractor/src/languages/swift/swift.rs | Major rewrite of Swift translation rules to produce the new unified AST nodes. |
| unified/extractor/ast_types.yml | Updates the unified AST type definitions (supertypes, node shapes, new constructs). |
| unified/AGENTS.md | Updates contributor guidance around Swift parser, AST mapping, and regeneration workflows. |
| shared/yeast/tests/test.rs | Adds tests for new query matching behavior and #{capture} location behavior. |
| shared/yeast/src/lib.rs | Adds source-range unioning for synthesized nodes and enables rewriting unnamed nodes. |
| shared/yeast/src/build.rs | Adds helpers for source-range-aware literals and field prepending. |
| shared/yeast-macros/src/parse.rs | Enhances query/template parsing (capture sugar, chaining, and fixed capture multiplicity parsing). |
| shared/yeast-macros/src/lib.rs | Documents new template chain features (.map, .reduce_left). |
Copilot's findings
- Files reviewed: 24/24 changed files
- Comments generated: 2
Comment on lines +489 to +493
    /// Prepend a child id to the given field of the given node.
    pub fn prepend_field_child(&mut self, node_id: Id, field_id: FieldId, value_id: Id) {
        let node = self.nodes.get_mut(node_id).expect("prepend_field_child: invalid node id");
        node.fields.entry(field_id).or_default().insert(0, value_id);
    }
    # A literal backed by a keyword such as `nil`, `null`, or `nullptr`.
    #
    # Although nil/null are keyword literals in many languages there should be
This PR rewrites the unified AST and fleshes out the corresponding Swift mappings. It also adds several yeast features that the new mappings depend on.
Some TODO comments are left in the mappings for now, for features that can be implemented separately in later PRs. Most notably, patterns are not yet translated correctly; fixing that needs either a parser change or the ability to pass down contextual information in yeast.