Should dbt know / have an opinion on data types (& related functions)? #5778
jpmmcneill
started this conversation in
Ideas
Replies: 1 comment 1 reply
-
This is of interest to me because I did some related work for dbt_artifacts to help with cross-db compatibility. I'm wondering if you're aware of the macros in https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/utils/data_types.sql which seem like they do some of what you're asking. I also opened a related issue (which I plan to work on) #5739 |
Beta Was this translation helpful? Give feedback.
1 reply
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Overview
dbt has some notion of standardising sql across database adapters.
There is a pretty clear pattern of "multi adapter" macros being moved into dbt-core: #4813, #5265). There are also multi adapter materializations, which afaik have been present in dbt for as close to "forever" as is a reasonable definition.
I believe there is a great opportunity for dbt to handle and be opinionated on data types, offering a set of multi adapter data types and functions that would make dbt code more likely to run in multiple warehouses.
Example Use Case 1
Note
This isn't a super precise example (as snowflake aliases
string
asvarchar
), but this is a feature of snowflake and I'm sure I could find an example where the below code wouldn't work between two databases.Let's imagine I have a model,
example_model.sql
, that converts a column from being an integer to being a string:The above would work in snowflake, but fail in bigquery:
Adding a catalogue of data types as well as a dbt native caster would solve this:
I think that the ideal case is what I've outlined above, where
string
is not some text passed to jinja, but anobject rather than a text, meaning it could error out if it wasn't in the supported lists.Example Use Case 2
Let's imagine I have a model,
example_model.sql
, that uses a column that is a key / value store:Again, the above would work in snowflake, but fail in bigquery:
An alternative could be:
Where
json_from_string
is be a macro that is aware of both the dbtjson
andstring
types and will convert between these two with an adapter specific function. In principle, this function could error out if it was called by non string data types / enforce this - for example under the hood it could be (psuedocode):{{ adapter_specific_function( dbt_cast('data', string) ) }}
ie. enforce that it's being called on a
string
typeWhy should dbt do this
This would allow a dbt native way of having project code being much closer to be adapter independent. For example, this could allow a snowflake / bigquery user to run dbt in CI on duckDB without huge effort in many cases
The way I would envision this is having a list of "dbt supported" data types, that could each have a bunch of relevant functions (macros) associated with them for operations.
It also introduces the (in my opinion, compelling) of strengthening "what is a data type" in schema yaml (ie. make a more formal & on rails version of the current free text). This could possibly even be extended to exceptions being raised pre-compilation stage if data types don't match yaml.
However, not using this feature would still be completely possible.
How would I see / use this as a user?
TODO
Beta Was this translation helpful? Give feedback.
All reactions