Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support approx_count_distinct #499

Merged
merged 1 commit into from
Jan 2, 2025
Merged

Support approx_count_distinct #499

merged 1 commit into from
Jan 2, 2025

Conversation

dpxcc
Copy link
Contributor

@dpxcc dpxcc commented Dec 17, 2024

Add support for approx_count_distinct()

@@ -0,0 +1,9 @@
CREATE FUNCTION @[email protected]_count_distinct(a anyelement)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should no be added as a function, but as a function but as an aggregate. Currently the following query fails on your test schema because of that:

> SELECT approx_count_distinct(a) FROM t group by b;
ERROR:  42803: column "t.a" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT approx_count_distinct(a) FROM t group by b;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch!!

@@ -0,0 +1,9 @@
CREATE FUNCTION @[email protected]_count_distinct(a anyelement)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this should be created in @extschema@ (i.e. the public schema) or the duckdb schema. I think @extschema@ is probably fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion. I was just following the existing functions

@dpxcc dpxcc force-pushed the cc branch 2 times, most recently from 5a816ad to 70231b2 Compare December 18, 2024 22:37
@dpxcc
Copy link
Contributor Author

dpxcc commented Dec 18, 2024

Change approx_count_distinct from a user-defined function to a user-defined aggregate

@dpxcc dpxcc force-pushed the cc branch 2 times, most recently from 2352ea2 to 3d4f730 Compare December 18, 2024 23:31
@dpxcc
Copy link
Contributor Author

dpxcc commented Dec 20, 2024

@JelteF Could you please take a look again? Thanks!

@JelteF JelteF merged commit 6941855 into duckdb:main Jan 2, 2025
5 checks passed
@dpxcc dpxcc deleted the cc branch January 2, 2025 19:48
JelteF added a commit that referenced this pull request Jan 10, 2025
I noticed two issues with the new `approx_count_distinct`
implementation:

1. If no FROM clause was used it was not possible to use it
2. It would not be detected correctly as duckdb-only without
   `duckdb.force_execution = true` (or some  other mechanism). This
   fixes both of those issues.

Related to #499
ritwizsinha pushed a commit to ritwizsinha/pg_duckdb that referenced this pull request Jan 11, 2025
I noticed two issues with the new `approx_count_distinct`
implementation:

1. If no FROM clause was used it was not possible to use it
2. It would not be detected correctly as duckdb-only without
   `duckdb.force_execution = true` (or some  other mechanism). This
   fixes both of those issues.

Related to duckdb#499
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants