Merge branch 'nfiann-prerelease' into new-branch-name-VE
mirnawong1 authored Jan 24, 2025
2 parents 8ee97f3 + fae1f10 commit 6c84676
Showing 114 changed files with 1,956 additions and 1,432 deletions.
4 changes: 2 additions & 2 deletions website/blog/2023-11-14-specify-prod-environment.md
@@ -14,8 +14,8 @@ is_featured: false

---

-:::note You can now use a Staging environment!
-This blog post was written before Staging environments. You can now use dbt Cloud can to support the patterns discussed here. Read more about [Staging environments](/docs/deploy/deploy-environments#staging-environment).
+:::note You can now specify a Staging environment too!
+This blog post was written before dbt Cloud added full support for Staging environments. Now that they exist, you should mark your CI environment as Staging as well. Read more about [Staging environments](/docs/deploy/deploy-environments#staging-environment).
:::

:::tip The Bottom Line:
@@ -0,0 +1,67 @@
---
title: "Why I wish I had a control plane for my renovation"
description: "When I think back to my renovation, I realize how much smoother it would've been if I’d had a control plane for the entire process."
slug: wish-i-had-a-control-plane-for-my-renovation

authors: [mark_wan]

tags: [analytics craft, data_ecosystem]
hide_table_of_contents: false

date: 2025-01-21
is_featured: true
---

When my wife and I renovated our home, we chose to take on the role of owner-builder. It was a bold (and mostly naive) decision, but we wanted control over every aspect of the project. What we didn’t realize was just how complex and exhausting managing so many moving parts would be.

<Lightbox src="/img/blog/2024-12-22-why-i-wish-i-had-a-control-plane-for-my-renovation/control-plane.png" width="70%" title="My wife pondering our sanity" />

We had to coordinate multiple elements:

- The **architects**, who designed the layout, interior, and exterior.
- The **architectural plans**, which outlined what the house should look like.
- The **builders**, who executed those plans.
- The **inspectors**, **councils**, and **energy raters**, who checked whether everything met the required standards.

<!--truncate-->

Each piece was critical &mdash; without the plans, there’s no shared vision; without the builders, the plans don’t come to life; and without inspections, mistakes go unnoticed.

But as an inexperienced project manager, I was also the one responsible for stitching everything together:
- Architects handed me detailed plans, and builders asked for clarifications.
- Inspectors flagged issues that were often too late to fix without extra costs or delays.
- On top of all this, I don’t speak "builder".

So what should have been quick, collaborative conversations turned into drawn-out processes because there was no unified system to keep everyone on the same page.

## In many ways, this mirrors how data pipelines operate

- The **architects** are the engineers &mdash; designing how the pieces fit together.
- The **architectural plans** are your dbt code &mdash; the models, tests, and configurations that define what your data should look like.
- The **builders** are the compute layers (for example, Snowflake, BigQuery, or Databricks) that execute those transformations.
- The **inspectors** are the monitoring tools, which focus on retrospective insights like logs, job performance, and error rates.

Here’s the challenge: monitoring tools, by their nature, look backward. They’re great at telling you what happened, but they don’t help you plan or declare what should happen. And when these roles (plans, execution, and monitoring) are siloed, teams are left trying to manually stitch them together, often wasting time troubleshooting issues or coordinating workflows.

## What makes dbt Cloud different

[dbt Cloud](https://www.getdbt.com/product/dbt-cloud) unifies these perspectives into a single [control plane](https://www.getdbt.com/blog/data-control-plane-introduction), bridging proactive and retrospective capabilities:

- **Proactive planning**: In dbt, you declare the desired [state](https://docs.getdbt.com/reference/node-selection/syntax#state-selection) of your data before jobs even run &mdash; your architectural plans are baked into the pipeline.
- **Retrospective insights**: dbt Cloud surfaces [job logs](https://docs.getdbt.com/docs/deploy/run-visibility), performance metrics, and test results, providing the same level of insight as traditional monitoring tools.

But the real power lies in how dbt integrates these two perspectives. Transformation logic (the plans) and monitoring (the inspections) are tightly connected, creating a continuous feedback loop where issues can be identified and resolved faster, and pipelines can be optimized more effectively.

## Why does this matter?

1. **The silo problem**: Many organizations rely on separate tools for transformation and monitoring. This fragmentation creates blind spots, making it harder to identify and resolve issues.
2. **Integrated workflows**: dbt Cloud eliminates these silos by connecting transformation and monitoring logic in one place. It doesn’t just report on what happened; it ties those insights directly to the proactive plans that define your pipeline.
3. **Operational confidence**: With dbt Cloud, you can trust that your data pipelines are not only functional but aligned with your business goals, monitored in real-time, and easy to troubleshoot.

## Why I wish I had a control plane for my renovation

When I think back to my renovation, I realize how much smoother it would have been if I’d had a control plane for the entire process. There are firms that specialize in design-and-build projects, with in-house architects, engineers, and contractors. The beauty of these firms is that everything is under one roof, so you know they’re communicating seamlessly.

In my case, though, my architect, builder, and engineer were all completely separate, which meant I was the intermediary. I was the pigeon service shuttling information between them, and it was exhausting. Discussions that should have taken minutes stretched into weeks, and sometimes even months, because there was no centralized communication.

dbt Cloud is like having that design-and-build firm for your data pipelines. It’s the control plane that unites proactive planning with retrospective monitoring, eliminating silos and inefficiencies. With dbt Cloud, you don’t need to play the role of the pigeon service &mdash; it gives you the visibility, integration, and control you need to manage modern data workflows effortlessly.
150 changes: 150 additions & 0 deletions website/blog/2025-01-23-levels-of-sql-comprehension.md
@@ -0,0 +1,150 @@
---
title: "The Three Levels of SQL Comprehension: What they are and why you need to know about them"
description: "Parsers, compilers, executors, oh my! What it means when we talk about 'understanding SQL'."
slug: the-levels-of-sql-comprehension

authors: [joel_labes]

tags: [data ecosystem]
hide_table_of_contents: false

date: 2025-01-23
is_featured: true
---


Ever since [dbt Labs acquired SDF Labs last week](https://www.getdbt.com/blog/dbt-labs-acquires-sdf-labs), I've been head-down diving into their technology and making sense of it all. The main thing I knew going in was "SDF understands SQL". It's a nice pithy quote, but the specifics are *fascinating.*

For the next era of Analytics Engineering to be as transformative as the last, dbt needs to move beyond being a [string preprocessor](https://en.wikipedia.org/wiki/Preprocessor) and into fully comprehending SQL. **For the first time, SDF provides the technology necessary to make this possible.** Today we're going to dig into what SQL comprehension actually means, since it's so critical to what comes next.

<!-- truncate -->

## What is SQL comprehension?

Let’s call any tool that can look at a string of text, interpret it as SQL, and extract some meaning from it a *SQL Comprehension tool.*

Put another way, SQL Comprehension tools **recognize SQL code and deduce more information about that SQL than is present in the [tokens](https://www.postgresql.org/docs/current/sql-syntax-lexical.html) themselves**. Here’s a non-exhaustive set of behaviors and capabilities that such a tool might have for a given [dialect](https://blog.sdf.com/p/sql-dialects-and-the-tower-of-babel) of SQL:

- Identify constituent parts of a query.
- Create structured artifacts for their own use or for other tools to consume in turn.
- Check whether the SQL is valid.
- Understand what will happen when the query runs: things like what columns will be created, what datatypes they will have, and what DDL is involved.
- Execute the query and return data (unsurprisingly, your database is a tool that comprehends SQL!)

By building on top of tools that truly understand SQL, it is possible to create systems that are much more capable, resilient and flexible than we’ve seen to date.

## The Levels of SQL Comprehension

When you look at the capabilities above, you can imagine some of those outcomes being achievable with [one line of regex](https://github.com/joellabes/mode-dbt-exposures/blob/main/generate_yaml.py#L52) and some that are only possible if you’ve literally built a database. Given that range of possibilities, we believe that “can you comprehend SQL” is an insufficiently precise question.

A better question is “to what level can you comprehend SQL?” To that end, we have identified different levels of capability. Each level deals with a key artifact (or, more precisely, a specific "[intermediate representation](https://en.wikipedia.org/wiki/Intermediate_representation)"). And in doing so, each level unlocks specific capabilities and more in-depth validation.

| Level | Name | Artifact | Example Capability Unlocked |
| --- | --- | --- | --- |
| 1 | Parsing | Syntax Tree | Know what symbols are used in a query. |
| 2 | Compiling | Logical Plan | Know what types are used in a query, and how they change, regardless of their origin. |
| 3 | Executing | Physical Plan + Query Results | Know how a query will run on your database, all the way to calculating its results. |

At Level 1, you have a baseline comprehension of SQL. By parsing the string of SQL into a Syntax Tree, it’s possible to **reason about the components of a query** and identify whether you've **written syntactically legal code**.

At Level 2, the system produces a complete Logical Plan. A logical plan knows about every function that’s called in your query, the datatypes being passed into them, and what every column will look like as a result (among many other things). Static analysis of this plan makes it possible to **identify almost every error before you run your code**.

Finally, at Level 3, the system can actually **execute a query and modify data**, because it understands all the complexities involved in answering the question of how the exact data passed into this query gets transformed or mutated.

## Can I see an example?

This can feel pretty theoretical based on descriptions alone, so let’s look at a basic Snowflake query.

A system at each level of SQL comprehension understands progressively more about the query, and that increased understanding enables it to **say with more precision whether the query is valid**.

To tools at lower levels of comprehension, some elements of a query are effectively a black box: their syntax tree contains the contents of the query, but they cannot validate whether everything makes sense. **Remember that comprehension is deducing more information than is present in the plain text of the query; by comprehending more, you can validate more.**

### Level 1: Parsing

<Lightbox src="/img/blog/2025-01-23-levels-of-sql-comprehension/level_1.png" width="100%" />

A parser recognizes that a function called `dateadd` has been called with three arguments, and knows the contents of those arguments.

However, without knowledge of the [function signature](https://en.wikipedia.org/wiki/Type_signature#Signature), it has no way to validate whether those arguments are valid types, whether three is the right number of arguments, or even whether `dateadd` is an available function. This also means it can’t know what the datatype of the created column will be.

Parsers are intentionally flexible in what they will consume: their purpose is to make sense of what they’re seeing, not to nitpick. Most parsers describe themselves as “non-validating”, because true validation requires compilation.
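
To make this less abstract, here’s a minimal sketch of Level 1 behavior using the open-source [sqlglot](https://github.com/tobymao/sqlglot) library in Python. To be clear, sqlglot is just a convenient stand-in for “a parser” here, not SDF’s engine, and the exact tree output depends on the library version.

```python
# A minimal Level 1 sketch: turn a SQL string into a syntax tree.
# sqlglot is an open-source parser used purely for illustration here;
# it is not SDF's technology, and tree node names vary by version.
import sqlglot

sql = "select dateadd('day', 1, getdate()) as tomorrow"

# Parse the string using the Snowflake dialect. At this level we know
# which symbols appear and how they nest -- nothing about signatures,
# types, or whether `dateadd` even exists.
tree = sqlglot.parse_one(sql, read="snowflake")

print(repr(tree))                     # the syntax tree
print(tree.sql(dialect="snowflake"))  # round-trip back to SQL text
```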

### Level 2: Compiling

<Lightbox src="/img/blog/2025-01-23-levels-of-sql-comprehension/level_2.png" width="100%" />

Extending beyond a parser, a compiler *does* know the function signatures. It knows that on Snowflake, `dateadd` is a function which takes three arguments: a `datepart`, an `integer`, and an `expression` (in that order).

A compiler also knows what types a function can return without actually running the code (this is called [static analysis](https://en.wikipedia.org/wiki/Static_program_analysis); we’ll get into that another day). In this case, because `dateadd`’s return type depends on the input expression and our expression isn’t explicitly cast, the compiler knows only that the `new_day` column can be [one of three possible datatypes](https://docs.snowflake.com/en/sql-reference/functions/dateadd#returns).
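
For a rough feel of what this looks like in code, here’s a sketch that uses sqlglot’s optimizer to annotate a parsed tree with inferred types. It’s only an approximation of a real compiler: how precisely the types resolve depends on how much of Snowflake’s function catalog the library models, which is exactly why an explicit cast narrows the answer.

```python
# A rough Level 2 sketch: statically annotate a parsed query with types.
# Illustrative only -- the precision of the inferred types depends on how
# completely the library models Snowflake's function signatures.
import sqlglot
from sqlglot.optimizer.annotate_types import annotate_types

for sql in (
    "select dateadd('day', 1, getdate()) as tomorrow",          # uncast input
    "select dateadd('day', 1, cast(getdate() as date)) as d",   # cast input
):
    tree = annotate_types(sqlglot.parse_one(sql, read="snowflake"))
    for projection in tree.expressions:
        # Each projected column carries a type inferred without running anything.
        print(projection.alias_or_name, "->", projection.type)
```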

### Level 3: Executing

<Lightbox src="/img/blog/2025-01-23-levels-of-sql-comprehension/level_3.png" width="100%" />

A tool with execution capabilities knows everything about this query and the data that is passed into it, including how functions are implemented. Therefore it can perfectly represent the results as run on Snowflake. Again, that’s what databases do. A database is a Level 3 tool.

### Review

Let’s review the increasing validation capabilities unlocked by each level of comprehension, and notice that over time **the black boxes completely disappear**:

<Lightbox src="/img/blog/2025-01-23-levels-of-sql-comprehension/validation_all_levels.png" width="100%" />

In a toy example like this one, the distinctions between the different levels might feel subtle. As you move away from a single query and into a full-scale project, the functionality gaps become more pronounced. That’s hard to demonstrate in a blog post, but fortunately there’s another, easier option: look at some failing queries. How a query is broken determines what level of tool is needed to recognize the error.

## So let’s break things

As the great analytics engineer Tolstoy [once noted](https://en.wikipedia.org/wiki/Anna_Karenina_principle), “All correctly written queries are alike; each incorrectly written query is incorrect in its own way”.

Consider these three invalid queries:

- `selecte dateadd('day', 1, getdate()) as tomorrow` (Misspelled keyword)
- `select dateadd('day', getdate(), 1) as tomorrow` (Wrong order of arguments)
- `select cast('2025-01-32' as date) as tomorrow` (Impossible date)

Tools that comprehend SQL can catch errors. But they can't all catch the same errors! Each subsequent level will catch more subtle errors in addition to those from *all prior levels*. That's because the levels are additive — each level contains and builds on the knowledge of the ones below it.

Each of the above queries requires progressively greater SQL comprehension abilities to identify the mistake.

### Parser (Level 1): Capture Syntax Errors

Example: `selecte dateadd('day', 1, getdate()) as tomorrow`

Parsers know that `selecte` is **not a valid keyword** in Snowflake SQL, and will reject it.

### Compiler (Level 2): Capture Compilation Errors

Example: `select dateadd('day', getdate(), 1) as tomorrow`

To a parser, this looks fine: all the parentheses and commas are in the right places, and we’ve spelled `select` correctly this time.

A compiler, on the other hand, recognizes that the **function arguments are out of order** because:

- It knows that the second argument (`value`) needs to be a number, but that `getdate()` returns a `timestamp_ltz`.
- Likewise, it knows that a number is not a valid date/time expression for the third argument.

### Executor (Level 3): Capture Data Errors

Example: `select cast('2025-01-32' as date) as tomorrow`

Again, the parser signs off on this as valid SQL syntax.

But this time the compiler also thinks everything is fine! Remember that a compiler checks the signature of a function. It knows that `cast` takes a source expression and a target datatype as arguments, and it has checked that both of these arguments are of the correct type.

It even has an overload that knows that strings can be cast into dates, but since it can’t do any validation of those strings’ *values*, it doesn’t know that **January 32nd isn’t a valid date**.

To actually know whether some data can be processed by a SQL query, you have to, well, process the data. Data errors can only be captured by a Level 3 system.
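
If it helps to see the split in practice, here’s a small illustrative sketch. sqlglot again stands in for a Level 1 parser, and DuckDB stands in for a Level 3 engine (in real life, your warehouse plays that role); neither is SDF, and a lenient parser may accept more than you expect.

```python
# A rough sketch of "different levels catch different breakage".
# sqlglot stands in for a Level 1 parser and DuckDB for a Level 3 engine;
# both are open-source stand-ins used for illustration, not SDF.
import duckdb
import sqlglot
from sqlglot.errors import ParseError

# 1. Syntax error: a parser alone can reject the misspelled keyword.
try:
    sqlglot.parse_one("selecte dateadd('day', 1, getdate()) as tomorrow", read="snowflake")
    print("Parsed -- a lenient parser may accept more than you expect.")
except ParseError as exc:
    print("Level 1 catches the syntax error:", exc)

# 2. Wrong argument order: the text parses cleanly, so a parser stays silent.
#    Catching this needs function signatures -- a Level 2 (compiler) check.
sqlglot.parse_one("select dateadd('day', getdate(), 1) as tomorrow", read="snowflake")
print("Level 1 is silent about the argument order.")

# 3. Impossible date: only evaluating the cast reveals the problem.
try:
    duckdb.sql("select cast('2025-01-32' as date) as tomorrow").fetchall()
except Exception as exc:
    print("Level 3 catches the data error:", exc)
```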

## Conclusion

Building your mental model of the levels of SQL comprehension – why they matter, how they're achieved and what they’ll unlock for you – is critical to understanding the coming era of data tooling.

In introducing these concepts, we’re still just scratching the surface. There's a lot more to discuss:

- Going deeper on the specific nuances of each level of comprehension
- How each level actually works, including the technologies and artifacts that power each level
- How this is all going to roll into a step change in the experience of working with data
- What it means for doing great data work

Over the coming days, you'll be hearing more about all of this from the dbt Labs team: both familiar faces and our new friends from SDF Labs.

This is a special moment for the industry and the community. It's alive with possibilities, with ideas, and with new potential. We're excited to navigate this new frontier with all of you.
8 changes: 8 additions & 0 deletions website/blog/authors.yml
@@ -623,3 +623,11 @@ yu_ishikawa:
url: https://www.linkedin.com/in/yuishikawa0301
name: Yu Ishikawa
organization: Ubie
mark_wan:
image_url: /img/blog/authors/mwan.png
job_title: Senior Solutions Architect
links:
- icon: fa-linkedin
url: https://www.linkedin.com/in/markwwan/
name: Mark Wan
organization: dbt Labs
5 changes: 5 additions & 0 deletions website/blog/ctas.yml
@@ -30,3 +30,8 @@
subheader: Catch up on Coalesce 2024 and register to access a select number of on-demand sessions.
button_text: Register and watch
url: https://coalesce.getdbt.com/register/online
- name: sdf_webinar_2025
header: Accelerating dbt with SDF
subheader: Join leaders from dbt Labs and SDF Labs for insights and a live Q&A.
button_text: Save your seat
url: https://www.getdbt.com/resources/webinars/accelerating-dbt-with-sdf
2 changes: 1 addition & 1 deletion website/blog/metadata.yml
@@ -2,7 +2,7 @@
featured_image: ""

# This CTA lives in right sidebar on blog index
featured_cta: "coalesce_2024_catchup"
featured_cta: "sdf_webinar_2025"

# Show or hide hero title, description, cta from blog index
show_title: true
5 changes: 5 additions & 0 deletions website/docs/docs/build/custom-aliases.md
@@ -157,3 +157,8 @@ If these models should indeed have the same database identifier, you can work ar

By default, dbt will create versioned models with the alias `<model_name>_v<v>`, where `<v>` is that version's unique identifier. You can customize this behavior just like for non-versioned models by configuring a custom `alias` or re-implementing the `generate_alias_name` macro.

## Related docs

- [Customize dbt models database, schema, and alias](/guides/customize-schema-alias?step=1) to learn how to customize a dbt model's database, schema, and alias
- [Custom schema](/docs/build/custom-schemas) to learn how to customize a dbt model's schema
- [Custom database](/docs/build/custom-databases) to learn how to customize a dbt model's database
6 changes: 6 additions & 0 deletions website/docs/docs/build/custom-databases.md
@@ -98,3 +98,9 @@ See docs on macro `dispatch`: ["Managing different global overrides across packa
### BigQuery

When dbt opens a BigQuery connection, it will do so using the `project_id` defined in your active `profiles.yml` target. This `project_id` will be billed for the queries that are executed in the dbt run, even if some models are configured to be built in other projects.

## Related docs

- [Customize dbt models database, schema, and alias](/guides/customize-schema-alias?step=1) to learn how to customize a dbt model's database, schema, and alias
- [Custom schema](/docs/build/custom-schemas) to learn how to customize a dbt model's schema
- [Custom aliases](/docs/build/custom-aliases) to learn how to customize a dbt model's alias