Does dbt support models with two different implementations? #8863
philippefutureboy asked this question in Q&A
Hey there!
Does anyone know how to write models with two different implementations (i.e. different parent nodes)?
We provide standardized analytics across multiple ERPs for our specific industry.
We take data from these ERPs and standardize it into a common model, from which we then extract BI analytics.
The issue we are facing is that, since each of these ERPs has different tables and schemas, our intermediate, standardized models rely on different source schemas depending on the ERP the client uses. More visually:
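(Sketched roughly below; model names are illustrative.)

```text
ERP A source schemas --> ERP A staging/intermediate models --+
                                                             +--> standardized common models --> BI marts
ERP B source schemas --> ERP B staging/intermediate models --+
```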
We've pondered solutions to this issue, and here's what we've come up with so far:
Solutions
Solution 1 – Env Var Branching
Description: Generate an intermediate, standardized model for each provider; in the common trunk model, switch which implementation is used based on the env var that specifies the provider.
Example:
int_fct_orders
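For illustration, here's a minimal sketch of what the trunk model could look like (the provider-specific model names are made up):

```sql
-- models/intermediate/int_fct_orders.sql
-- Minimal sketch: provider-specific model names are illustrative.
{% set provider = env_var('PROVIDER') %}

{% if provider == 'erp_a' %}
    select * from {{ ref('int_fct_orders__erp_a') }}
{% elif provider == 'erp_b' %}
    select * from {{ ref('int_fct_orders__erp_b') }}
{% else %}
    {{ exceptions.raise_compiler_error("Unknown PROVIDER: " ~ provider) }}
{% endif %}
```

Because only the selected branch is rendered, only that branch's ref() ends up in the compiled SQL and manifest, which is the static-analysis limitation described in the cons below.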
Pros: Very simple to implement, and quite easy for future developers to understand.
Cons: Whenever dbt compile is run, the produced manifest and SQL will only reflect the branch associated with the value of the PROVIDER env var. As such, static analysis of the manifest (for things such as Airflow DAG authorship) becomes more complex/convoluted.
Additional notes: The same issue arises if the intermediate provider model comes from a private dbt package, as it will still require switching between the different PROVIDER values to obtain the correct model.
Solution 2 – Separate dbt projects for source standardization and further intermediate/mart models
Description: For each competing ERP provider, create a dedicated dbt project. Each of these projects covers everything from staging to the standardized common intermediate model. These projects can then be loaded as sources by the common-trunk dbt project that handles downstream intermediate and mart models.
Example:
Folder structure:
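Something along these lines (provider project names are hypothetical):

```text
erp_a_standardization/        # one dbt project per ERP provider
└── models/
    ├── staging/
    └── standardized/
        └── int_fct_orders.sql
erp_b_standardization/
└── models/
    ├── staging/
    └── standardized/
        └── int_fct_orders.sql
analytics/                    # common-trunk project
└── models/
    ├── intermediate/
    └── marts/
```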
Pros: Since the providers are separated into their own dbt projects, the analytics dbt project can rely on the providers' standardized models as sources. Since they all expose standardized models with the same contract, there is no need for branching logic. Furthermore, the produced manifest will always contain all the relations between nodes.
Airflow DAG authorship is not significantly impacted either, as the provider standardization process can be run in DAGs separate from the analytics generation DAG.
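For instance, the analytics project could declare the standardized output as a source (schema and table names below are hypothetical):

```yaml
# analytics/models/sources.yml (hypothetical names)
version: 2

sources:
  - name: erp_standardized        # schema populated by whichever provider project ran
    schema: erp_standardized
    tables:
      - name: int_fct_orders      # same contract regardless of the ERP
```

Downstream models then select from {{ source('erp_standardized', 'int_fct_orders') }} rather than ref()-ing a provider-specific model.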
Cons: We lose some observability on the lineage, or we need to implement logic that merges the providers' manifests with that of the analytics dbt project and then exports the result to whichever tool is used for data lineage and observability.
Invitation to discuss
These are the solutions we have come up with, but as you can see, both come with their own cons.
At present, we intend to go with the second solution.
That being said, we'd love to hear your opinion on the matter.
Whether or not you have something to contribute, enjoy the write-up; hopefully it can serve someone facing the same issue :)