DRILL-8543: Add Support for Materialized Views#3036
Conversation
letian-jiang
left a comment
There was a problem hiding this comment.
LGTM. Materialized view is a powerful feature for analytic engine. 🥳
letian-jiang
left a comment
There was a problem hiding this comment.
We could also add a plan-asserting test to ensure the query is correctly rewrite using MV.
|
@letian-jiang I believe I addressed your review comments. Could you please mark the review as complete so we can merge? |
rymarm
left a comment
There was a problem hiding this comment.
@cgivre Thank you for implementing this feature! It looks great overall. I found a few issues from my point of view - please check them out. They relate to:
- MV
dataStoragePath: making its format strict and accessing it from the object instead of relying on duck typing. - Using hardcoded backticks during query building.
- Optimizing code syntax.
The MV rewriter had three bugs preventing query rewriting from working: 1. Schema discovery used lazy-loaded schema tree (always empty) - now iterates StoragePluginRegistry for FileSystemPlugin instances 2. SubstitutionVisitor arguments were swapped - now uses Calcite's RelOptMaterializations.useMaterializedViews() API which handles normalization and correct argument order internally 3. buildMvScanRel used SELECT * causing DYNAMIC_STAR type mismatch - now selects explicit columns from the MV field definitions Also adds plan verification tests to both TestMaterializedViewSupport and TestMaterializedViewRewriting to assert that query plans actually reference _mv_data (Parquet) or region.json as expected.
- Make MaterializedView.dataStoragePath the single source of truth for the
data directory ({name}_mv_data) and use getDataStoragePath() everywhere
instead of hardcoding the _mv_data suffix.
- Quote generated identifiers using the session's configured quoting
character so MV data scans work when planner.parser.quoting_identifiers
is not the default backtick.
- createMaterializedViewDataWriter now takes the MaterializedView object.
- Simplify isTable check and use a declarative findFirst in
getMaterializedView.
- Use pattern-matching instanceof in RecordCollector.
- Bump maven.compiler release/source/target 11 -> 17 (Drill no longer
supports Java 11; matches the existing enforcer rule and CI matrix).
|
@rymarm Thanks for the review. I believe I addressed all your comments. |
|
@cgivre Thank you for the changes! Everything looks good. The only thing I wrote about that didn't seem to be understood correctly. Why do you use the full class name: org.apache.calcite.sql.SqlNode parsedNode = sqlConverter.parse(scanSql.toString());
org.apache.calcite.sql.SqlNode validatedNode = sqlConverter.validate(parsedNode);instead of import: import org.apache.calcite.sql.SqlNode;Here: |
Oops.. I fixed that. |
DRILL-8543: Add Support for Materialized Views
Description
This PR adds materialized view support to Apache Drill, enabling users to store pre-computed query results for improved query performance.
Features
Implementation
Configuration
planner.enable_materialized_view_rewrite(default: true) - Controls automatic query rewritingDocumentation
Added docs/dev/MaterializedViews.md with complete feature documentation
Testing
Added additional unit tests.