Query performance is critical for any analytics engine. This epic tracks known issues to optimize pg_mooncake's performance.
ClickBench is an excellent starting point for performance engineering. While pg_mooncake already performs well, there is still plenty of room for improvement.
Issues in this epic
Query-specific optimization
[ClickBench-Q29] DuckDB performs poorly on aggregation queries over multiple Parquet files, as also seen in "DuckDB (Parquet, partitioned)" and "ParadeDB (Parquet, partitioned)".
DuckDB doesn't push down OR and INLIST filters to the table scan. For example:
SELECT * FROM t WHERE a = 1 OR a = 2;
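To see why this pushdown matters, here is a minimal sketch (plain Python, not DuckDB internals): `a = 1 OR a = 2` is equivalent to `a IN (1, 2)`, and evaluating that predicate inside the scan avoids materializing every row only to discard most of them above the scan.

```python
def scan(rows, pushed_filter=None):
    """Toy table scan; returns only rows that pass the pushed filter."""
    out = []
    for row in rows:
        if pushed_filter is None or pushed_filter(row):
            out.append(row)
    return out

rows = [{"a": i} for i in range(1_000)]

# Without pushdown: the scan surfaces all 1,000 rows; the OR filter runs above it.
no_pushdown = [r for r in scan(rows) if r["a"] == 1 or r["a"] == 2]

# With pushdown: the OR is normalized to an IN-list and evaluated inside the scan,
# so only the 2 matching rows ever leave the scan operator.
in_list = {1, 2}
with_pushdown = scan(rows, pushed_filter=lambda r: r["a"] in in_list)

assert no_pushdown == with_pushdown
```

The results are identical; the difference is how many rows flow out of the scan, which dominates cost on wide Parquet files.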
[ClickBench-Q23] DuckDB doesn't support late projection, i.e., initially projecting only the columns used in ORDER BY, sorting to find the top rows, and then projecting the remaining columns only for those top rows. This depends on resolving the previous issue.
SELECT * FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime LIMIT 10;
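The late-projection idea can be sketched as a two-pass plan (a hypothetical illustration in plain Python, not DuckDB's implementation): pass 1 reads only the narrow sort column plus a row id to find the top-k rows, and pass 2 fetches the wide remaining columns for just those k rows.

```python
import heapq

def top_k_late_projection(num_rows, k, read_sort_key, read_full_row):
    """Top-k with late projection: sort on a narrow key, then fetch wide rows."""
    # Pass 1: scan only the sort column (cheap), tagging each value with its row id.
    keys = ((read_sort_key(rid), rid) for rid in range(num_rows))
    top = heapq.nsmallest(k, keys)  # k smallest keys, i.e. ORDER BY ... ASC LIMIT k
    # Pass 2: project all columns, but only for the k winning rows.
    return [read_full_row(rid) for _, rid in top]

# Toy "storage": a narrow EventTime column next to a wide payload.
table = [{"EventTime": (i * 37) % 1000, "URL": f"u{i}", "payload": "x" * 100}
         for i in range(10_000)]

result = top_k_late_projection(
    len(table), 10,
    read_sort_key=lambda rid: table[rid]["EventTime"],
    read_full_row=lambda rid: table[rid],
)
assert result == sorted(table, key=lambda r: r["EventTime"])[:10]
```

In a columnar engine, pass 1 touches one column and pass 2 touches k rows, instead of decoding every column of every candidate row before the sort.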
To implement these query optimizations, we could: (1) upstream them to DuckDB, (2) implement the optimizations using DuckDB's OptimizerExtension, or (3) implement the optimizations within Postgres and only send optimized queries to DuckDB.
General query execution overhead
pg_duckdb has a constant per-query overhead due to running DuckdbPrepare() twice before executing queries with ExecuteQuery().
pg_mooncake may also experience per-file overhead, which will require further investigation and optimization.