Query performance optimizations #82

zhousun · 2025-01-15T22:36:28Z

Description

Query performance is critical for any analytics engine. This epic tracks known issues to optimize pg_mooncake's performance.

ClickBench is an excellent starting point for performance engineering. While pg_mooncake already performs great, there is still plenty of room for improvement.

Issues in this epic

Query-specific optimization

[ClickBench-Q29] DuckDB performs poorly on aggregation queries over multiple Parquet files, as also seen in "DuckDB (Parquet, partitioned)" and "ParadeDB (Parquet, partitioned)".

SELECT SUM(ResolutionWidth), SUM(ResolutionWidth + 1), SUM(ResolutionWidth + 2), SUM(ResolutionWidth + 3), ..., SUM(ResolutionWidth + 89) FROM hits;

DuckDB doesn't push down OR and INLIST filter to table scan.

SELECT * FROM t WHERE a=1 OR a=2;

[ClickBench-Q23] DuckDB doesn't support late projection, i.e. initially projecting only the columns used in ORDER BY, sorting to find the top rows, and then projecting additional columns only for those top rows. This depends on resolving the previous issue.

SELECT * FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime LIMIT 10;

To implement these query optimizations, we could: (1) upstream them to DuckDB, (2) implement the optimizations using DuckDB's OptimizerExtension, or (3) implement the optimizations within Postgres and only send optimized queries to DuckDB.

General query execution overhead

pg_duckdb has a constant per-query overhead due to running DuckdbPrepare() twice before executing queries with ExecuteQuery().
pg_mooncake may also experience per-file overhead, which will require further investigation and optimization.

The text was updated successfully, but these errors were encountered:

zhousun pinned this issue Jan 15, 2025

zhousun changed the title ~~Query Performance Optimization~~ Query Performance Optimizations Jan 15, 2025

dpxcc added the epic label Jan 15, 2025

dpxcc changed the title ~~Query Performance Optimizations~~ Query performance optimizations Jan 16, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Query performance optimizations #82

Query performance optimizations #82

zhousun commented Jan 15, 2025 •

edited by dpxcc

Loading

Query performance optimizations #82

Query performance optimizations #82

Comments

zhousun commented Jan 15, 2025 • edited by dpxcc Loading

Description

Issues in this epic

Query-specific optimization

General query execution overhead

zhousun commented Jan 15, 2025 •

edited by dpxcc

Loading