Does dbt support models with two different implementations? #8863
philippefutureboy asked this question in Q&A
Hey there!
Does anyone know how to write models with two different implementations (i.e. different parent nodes)?
We provide standardized analytics across multiple ERPs for our specific industry.
We take data from these ERPs and standardize it into a common model, from which we then extract BI analytics.
The issue we are facing is that, since each of these ERPs has different tables and schemas, our intermediate, standardized models rely on different source schemas depending on the ERP the client uses. More visually:
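(Sketched roughly below; model names are illustrative.)

```text
ERP A source schemas --> ERP A staging/intermediate models --+
                                                             +--> standardized common models --> BI marts
ERP B source schemas --> ERP B staging/intermediate models --+
```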
We've pondered solutions to this issue, and here's what we've come up with so far:
Solutions
Solution 1 – Env Var Branching
Description: Generate an intermediate, standardized model for each provider; in the common trunk model, switch which implementation is used based on the env var that specifies the provider.
Example:
int_fct_orders
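For illustration, here's a minimal sketch of what the trunk model could look like (the provider-specific model names are made up):

```sql
-- models/intermediate/int_fct_orders.sql
-- Minimal sketch: provider-specific model names are illustrative.
{% set provider = env_var('PROVIDER') %}

{% if provider == 'erp_a' %}
    select * from {{ ref('int_fct_orders__erp_a') }}
{% elif provider == 'erp_b' %}
    select * from {{ ref('int_fct_orders__erp_b') }}
{% else %}
    {{ exceptions.raise_compiler_error("Unknown PROVIDER: " ~ provider) }}
{% endif %}
```

Because only the selected branch is rendered, only that branch's ref() ends up in the compiled SQL and manifest, which is the static-analysis limitation described in the cons below.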
Pros: Very simple to implement, and quite easy for future developers to understand.
Cons: Whenever dbt compile is run, the produced manifest and SQL will only reflect the branch associated with the value of the PROVIDER env var. As such, static analysis of the manifest (for things such as Airflow DAG authorship) becomes more complex/convoluted.
Additional notes: The same issue arises if the intermediate provider model comes from a private dbt package, as it will still require switching between the different PROVIDER values to obtain the correct model.
Solution 2 – Separate dbt projects for source standardization and further intermediate/mart models
Description: For each competing ERP provider, create a dedicated dbt project. Each of these projects covers everything from staging to the standardized common intermediate model. These projects can then be loaded as sources by the common-trunk dbt project that handles downstream intermediate and mart models.
Example:
Folder structure:
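Something along these lines (provider project names are hypothetical):

```text
erp_a_standardization/        # one dbt project per ERP provider
└── models/
    ├── staging/
    └── standardized/
        └── int_fct_orders.sql
erp_b_standardization/
└── models/
    ├── staging/
    └── standardized/
        └── int_fct_orders.sql
analytics/                    # common-trunk project
└── models/
    ├── intermediate/
    └── marts/
```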
Pros: Since the providers are separated into their own dbt projects, the analytics dbt project can rely on the providers' standardized models as sources. Since they all expose standardized models with the same contract, there is no need for branching logic. Furthermore, the produced manifest will always contain all the relations between nodes.
Airflow DAG authorship is not significantly impacted either, as the provider standardization process can be run in DAGs separate from the analytics generation DAG.
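For instance, the analytics project could declare the standardized output as a source (schema and table names below are hypothetical):

```yaml
# analytics/models/sources.yml (hypothetical names)
version: 2

sources:
  - name: erp_standardized        # schema populated by whichever provider project ran
    schema: erp_standardized
    tables:
      - name: int_fct_orders      # same contract regardless of the ERP
```

Downstream models then select from {{ source('erp_standardized', 'int_fct_orders') }} rather than ref()-ing a provider-specific model.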
Cons: We lose some observability on the lineage, or we need to implement logic that merges the providers' manifests with that of the analytics dbt project and then exports the result to whichever tool is used for data lineage and observability.
Invitation to discuss
These are the solutions we have come up with, but as you can see, both come with their own cons.
At present, we intend to go with the second solution.
That being said, we'd love to hear your opinion on the matter.
Whether or not you have something to contribute, enjoy the write-up; hopefully it can serve someone facing the same issue :)