Merge branch 'current' into dbt-teradata-1.8.x
mirnawong1 authored Jul 2, 2024
2 parents cc648bc + 487bed4 commit b728d4d
Showing 71 changed files with 1,257 additions and 912 deletions.
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
title: "Intro to MetricFlow"
description: Getting started with the dbt and MetricFlow
hoverSnippet: Learn how to get started with the dbt and MetricFlow
title: "Intro to the dbt Semantic Layer"
description: Getting started with the dbt Semantic Layer
hoverSnippet: Learn how to get started with the dbt Semantic Layer
pagination_next: "best-practices/how-we-build-our-metrics/semantic-layer-2-setup"
pagination_prev: null
---
@@ -12,25 +12,23 @@ Flying cars, hoverboards, and true self-service analytics: this is the future we

- ❓ Understand the **purpose and capabilities** of the **dbt Semantic Layer**, particularly MetricFlow as the engine that powers it.
- 🧱 Familiarity with the core components of MetricFlow — **semantic models and metrics** — and how they work together.
- 🛠️ Hands-on **experience building** semantic models and metrics in dbt Cloud.
- 🔁 Know how to **refactor** models for MetricFlow.
- 🏅 Aware of new **best practices** to take maximum advantage of the Semantic Layer.
- 🔁 Know how to **refactor** dbt models for the Semantic Layer.
- 🏅 Aware of **best practices** to take maximum advantage of the Semantic Layer.

## Guide structure overview

We'll work through our learning goals via an [example project](https://github.com/dbt-labs/jaffle-sl-template). We encourage you to follow along and try the code out for yourself on the `start-here` branch, or you can follow along with the completed state of the codebase on the `main` branch.
1. Getting **set up** in your dbt project.
2. Building a **semantic model** and its fundamental parts: **entities, dimensions, and measures**.
3. Building a **metric**.
4. Defining **advanced metrics**: `ratio` and `derived` types.
5. **File and folder structure**: establishing a system for naming things.
6. **Refactoring** marts and roll-ups for the Semantic Layer.
7. Review **best practices**.

1. Getting **set up** with MetricFlow in your dbt project.
2. Building your first **semantic model** and its fundamental parts: **entities, dimensions, and measures**.
3. Building your first **metric**.
4. **Refactoring** a mart into the Semantic Layer.
5. Defining **advanced metrics**: `ratio` and `derived` types.
6. Review **best practices**.

If you're ready to ship your users more power with less code, let's dive in!
If you're ready to ship your users more power and flexibility with less code, let's dive in!

:::info
MetricFlow is a new way to define metrics in dbt and one of the key components of the [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-sl). It handles SQL query construction and defines the specification for dbt semantic models and metrics.
MetricFlow is the engine for defining metrics in dbt and one of the key components of the [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-sl). It handles SQL query construction and defines the specification for dbt semantic models and metrics.

To fully experience the dbt Semantic Layer, including the ability to query dbt metrics via external integrations, you'll need a [dbt Cloud Team or Enterprise account](https://www.getdbt.com/pricing/). Refer to [dbt Semantic Layer FAQs](/docs/use-dbt-semantic-layer/sl-faqs) for more information.
:::
@@ -1,62 +1,26 @@
---
title: "Set up MetricFlow"
description: Getting started with the dbt and MetricFlow
hoverSnippet: Learn how to get started with the dbt and MetricFlow
title: "Set up the dbt Semantic Layer"
description: Getting started with the dbt Semantic Layer
hoverSnippet: Learn how to get started with the dbt Semantic Layer
pagination_next: "best-practices/how-we-build-our-metrics/semantic-layer-3-build-semantic-models"
---

## Getting started

First, if you want to follow along, you'll need to clone the [example project](https://github.com/dbt-labs/jaffle-sl-template). For the time being, you'll need access to a Snowflake, BigQuery, Databricks, or Postgres warehouse. The project is our classic Jaffle Shop, a simulated chain restaurant serving [jaffles](https://en.wikipedia.org/wiki/Pie_iron) and tasty beverages.
There are two options for developing a dbt project, including the Semantic Layer:

```shell
git clone [email protected]:dbt-labs/jaffle-sl-template.git
cd path/to/project
```
- [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) — MetricFlow commands are embedded in the dbt Cloud CLI under the `dbt sl` subcommand. This is the easiest, most full-featured way to develop dbt Semantic Layer code for the time being. You can use the editor of your choice and run commands from the terminal.

Next, before you start writing code, you need to install MetricFlow:
- [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud) — You can create semantic models and metrics in the dbt Cloud IDE. However, support for running `dbt sl` commands isn't available there just yet (it's in active development), which means you won't be able to validate your code, so we recommend working with the Cloud CLI and a local editor for now.

<Tabs>

<TabItem value="cloud" label="dbt Cloud">

- [dbt Cloud CLI](/docs/cloud/cloud-cli-installation) &mdash; MetricFlow commands are embedded in the dbt Cloud CLI. You can immediately run them once you install the dbt Cloud CLI. Using dbt Cloud means you won't need to manage versioning — your dbt Cloud account will automatically manage the versioning.

- [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud) &mdash; You can create metrics using MetricFlow in the dbt Cloud IDE. However, support for running MetricFlow commands in the IDE will be available soon.

</TabItem>

<TabItem value="core" label="dbt Core">

- Download MetricFlow as an extension of a dbt adapter from PyPI (dbt Core users only). MetricFlow is compatible with Python versions 3.8 through 3.11.
- **Note**: You'll need to manage versioning between dbt Core, your adapter, and MetricFlow.
- Beginning in v1.8, installing an adapter does not automatically install `dbt-core`. This is because adapters and dbt Core versions have been decoupled from each other so we no longer want to overwrite existing dbt-core installations.
- Use pip to install MetricFlow and the dbt adapter.

```shell
# activate a virtual environment for your project,
# if you don't have a name you like to use we suggest .venv
python -m venv [virtual environment name]
source [virtual environment name]/bin/activate
# install dbt and MetricFlow
python -m pip install "dbt-metricflow[adapter name]"
# e.g. python -m pip install "dbt-metricflow[snowflake]"
```
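Once the install finishes, it's worth a quick sanity check that the CLIs made it into your virtual environment before moving on. A minimal sketch (exact version output will vary by release and adapter):

```shell
# confirm the dbt CLI is on your PATH and see the installed adapter version
dbt --version
# confirm the MetricFlow CLI was installed alongside it
mf --help
```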

</TabItem>
</Tabs>

- Now that you're ready to use MetricFlow, get to the pre-Semantic Layer starting state by checking out the `start-here` branch:
## Basic commands

```shell
git checkout start-here
```
- 🔍 A less common command that will come in handy with the Semantic Layer is `dbt parse`. This will parse your project and generate a **semantic manifest**, a representation of meaningful connections described by your project. This is uploaded to dbt Cloud, and used for running `dbt sl` commands in development. This file gives MetricFlow a **state of the world from which to generate queries**.
- 🧰 `dbt sl query` is your other best friend: it will execute a query against your Semantic Layer and return a sample of the results. This is great for testing your semantic models and metrics as you build them. For example, if you're building a revenue metric you can run `dbt sl query --metrics revenue --group-by metric_time__month` to validate that monthly revenue is calculating correctly.
- 📝 Lastly, `dbt sl list dimensions --metrics [metric name]` will list all the dimensions available for a given metric. This is useful for checking that you're increasing dimensionality as you progress. You can `dbt sl list` other aspects of your Semantic Layer as well, run `dbt sl list --help` for the full list of options.
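Put together, a typical development loop with these commands might look like the sketch below (the `order_total` metric name is just an illustration; substitute a metric from your own project):

```shell
# regenerate the semantic manifest after editing your YAML
dbt parse

# spot-check a metric by querying it grouped by month
dbt sl query --metrics order_total --group-by metric_time__month

# see which dimensions are available for that metric
dbt sl list dimensions --metrics order_total
```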

For more information, refer to the [MetricFlow commands](/docs/build/metricflow-commands) or the [quickstart guides](/guides) to get more familiar with setting up a dbt project.
For more information on the available commands, refer to the [MetricFlow commands](/docs/build/metricflow-commands) reference, or use `dbt sl --help` and `dbt sl [subcommand] --help` on the command line. If you need to set up a dbt project first, check out the [quickstart guides](/docs/get-started-dbt).

## Basic commands
## Onward!

- 💻 This package will install both `dbt` and `mf` as CLIs in our virtual environment. All the regular `dbt` commands like `run`, `build`, and `test` are available.
- 🔍 A less common one that will come in handy with the Semantic Layer is `dbt parse`. This will parse your project and generate a **semantic manifest**, a representation of meaningful connections described by your project. This file gives MetricFlow a **state of the world from which to generate queries**.
- 🧰 In addition to `dbt`, you'll have access to `mf` commands like `query` and `validate-configs`, which operate based on that semantic manifest. We'll dig more into all of these as we go along.
- 🛠️ Let's start off by running a `dbt build` to get the **starting state** of our project built.
Throughout the rest of the guide, we'll show example code based on the Jaffle Shop project, a fictional chain of restaurants. You can check out the code yourself and try things out in the [Jaffle Shop repository](https://github.com/dbt-labs/jaffle-shop). If you see us calculating metrics like `food_revenue` later in this guide, this is why!
@@ -1,56 +1,51 @@
---
title: "Building semantic models"
description: Getting started with the dbt and MetricFlow
hoverSnippet: Learn how to get started with the dbt and MetricFlow
description: Getting started with the dbt Semantic Layer
hoverSnippet: Learn how to get started with the dbt Semantic Layer
pagination_next: "best-practices/how-we-build-our-metrics/semantic-layer-4-build-metrics"
---

## How to build a semantic model

A semantic model is the MetricFlow equivalent to a logical layer model (what historically has just been called a 'model' in dbt land). Just as configurations for models are defined on the `models:` YAML key, configurations for semantic models are housed under `semantic models:`. A key difference is that while a logical model consists of configuration and SQL or Python code, a **semantic model is defined purely via YAML**. Rather than encoding a specific dataset, a **semantic model describes relationships** that let your end users select and refine their own datasets reliably.
A semantic model is the Semantic Layer equivalent to a logical layer model (what historically has just been called a 'model' in dbt land). Just as configurations for models are defined on the `models:` YAML key, configurations for semantic models are housed under `semantic models:`. A key difference is that while a logical model consists of configuration and SQL or Python code, a **semantic model is defined purely via YAML**. Rather than encoding a specific dataset, a **semantic model describes relationships and expressions** that let your end users select and refine their own datasets dynamically and reliably.

- ⚙️ Semantic models are **made up of three components**:
- 🫂 **entities**: these describe the **relationships** between various semantic models (think ids)
- 🪣 **dimensions**: these are the columns you want to **slice, dice, group, and filter by** (think timestamps, categories, booleans).
- 🔪 **dimensions**: these are the columns you want to **slice, dice, group, and filter by** (think timestamps, categories, booleans).
- 📏 **measures**: these are the **quantitative values you want to aggregate**
- 📚 We define **columns as being an entity, dimension, or measure**.

:::tip
**File per model**. Given the interdependence of logical and semantic models, and semantic models and metrics, we've updated our best practice recommendation to a one YAML file per model approach if you're using the Semantic Layer. This houses everything related to a model in one place and preserves unique file names for quickly getting to the code you want.
:::
- 🪣 We define **columns as being an entity, dimension, or measure**. Columns will typically fit into one of these three buckets; a column that requires a complex aggregation expression might instead constitute a metric.

## Defining orders

- 🥪 The semantic model we're going to define is _orders_.
- 📗 We define it as a **YAML dictionary in the semantic models list**.
Let's zoom in on how we might define an _orders_ semantic model.

- 📗 We define it as a **YAML dictionary in the `semantic_models` list**.
- 📑 It will have a **name, entities list, dimensions list, and measures list**.
- ⏬ We recommend defining them **in this order consistently** as a style best practice.

```YAML
<File name="models/marts/orders.yml" />

```yaml
semantic_models:
- name: orders
entities:
...
dimensions:
...
measures:
...
entities: ... # we'll define these later
dimensions: ... # we'll define these later
measures: ... # we'll define these later
```
- Next we'll point to the corresponding logical model by supplying a [`ref`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref) in the `model:` property, and a `description` for documentation.

```YAML
<File name="models/marts/orders.yml" />

```yml
semantic_models:
- name: orders
description: |
Model containing order data. The grain of the table is the order id.
model: ref('stg_orders')
entities:
...
dimensions:
...
measures:
...
entities: ...
dimensions: ...
measures: ...
```

## Establishing our entities
@@ -64,9 +59,11 @@

### Entities in action

If we look at the staging model for orders, we see that it has 3 id columns, so we'll need three entities.
If we look at an example staging model for orders, we see that it has 3 id columns, so we'll need three entities.

```SQL
<File name="models/staging/stg_orders.sql" />

```sql
renamed as (
select
@@ -88,9 +85,11 @@

- 👉 We add them with a **`name`, `type`, and optional `expr`** (expression). The expression can be any valid SQL expression on your platform.
- 📛 If you **don't add an expression**, MetricFlow will **assume the name is equal to the column name** in the underlying logical model.
- 👍 Our best practices pattern is to, whenever possible, provide a `name` that is the singular form of the subject or grain of the table, and use `expr` to specify the precise column name (with `_id` etc). This will let us write **more readable metrics** on top of these semantic models.
- 👍 Our best practices pattern is to, whenever possible, provide a `name` that is the singular form of the subject or grain of the table, and use `expr` to specify the precise column name (with `_id` etc). This will let us write **more readable metrics** on top of these semantic models. For example, we'll use `location` instead of `location_id`.

<File name="models/marts/orders.yml" />

```YAML
```yml
semantic_models:
- name: orders
...
@@ -109,7 +108,6 @@
...
measures:
...
```
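The collapsed portion of the diff hides the full entities list, so here's a sketch of how those three entities might be spelled out, following the naming pattern above (singular `name`, precise column in `expr`). The `primary`/`foreign` types shown are standard MetricFlow entity types, but the exact list here is our illustration, not a verbatim excerpt from the project:

```yml
entities:
  # one primary entity at the grain of the table...
  - name: order
    type: primary
    expr: order_id
  # ...and foreign entities for the ids that join out to other models
  - name: customer
    type: foreign
    expr: customer_id
  - name: location
    type: foreign
    expr: location_id
```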

## Defining our dimensions
@@ -125,7 +123,9 @@

- 👀 Let's look at our staging model again and see what fields we have available.

```SQL
<File name="models/staging/stg_orders.sql" />

```sql
select
---------- ids -> entities
@@ -143,18 +143,19 @@
from source
```

- ⏰ For now the only dimension to add is a **time dimension**.
- ⏰ For now the only dimension to add is a **time dimension**: `ordered_at`.
- 🕰️ At least one **primary time dimension** is **required** for any semantic models that **have measures**.
- 1️⃣ We denote this with the `is_primary` property, or if there is only one time dimension supplied it is primary by default. Below we only have `ordered_at` as a timestamp so we don't need to specify anything except the maximum granularity we're bucketing to (in this case, day).
- 1️⃣ We denote this with the `is_primary` property, or if there is only one time dimension supplied it is primary by default. Below we only have `ordered_at` as a timestamp so we don't need to specify anything except the _minimum granularity_ we're bucketing to (in this case, day). By this we mean that we're not going to be looking at orders at a finer granularity than a day.

<File name="models/marts/orders.yml" />

```YAML
```yml
dimensions:
- name: ordered_at
expr: date_trunc('day', ordered_at)
# use date_trunc(ordered_at, DAY) if using [BigQuery](/docs/build/dimensions#time)
type: time
type_params:
time_granularity: day
- name: ordered_at
expr: date_trunc('day', ordered_at)
type: time
type_params:
time_granularity: day
```

:::tip
@@ -166,19 +167,18 @@ We'll discuss an alternate situation, dimensional tables that have static numeri
- 🔢 We can also **make a dimension out of a numeric column** that would typically be a measure.
- 🪣 Using `expr` we can **create buckets of values that we label** for our dimension. We'll add one of these to label 'large orders': any order with a total over $50.

```YAML
...
<File name="models/marts/orders.yml" />

```yml
dimensions:
- name: ordered_at
expr: date_trunc('day', ordered_at)
# use date_trunc(ordered_at, DAY) if using BigQuery
type: time
type_params:
time_granularity: day
- name: is_large_order
type: categorical
expr: case when order_total > 50 then true else false end
...
```

## Making our measures
@@ -191,7 +191,9 @@

- 👀 Let's look at **our staging model** one last time and see what **fields we want to measure**.

```SQL
<File name="models/staging/stg_orders.sql" />

```sql
select
---------- ids -> entities
@@ -211,10 +213,12 @@

- ➕ Here `order_total` and `tax_paid` are the **columns we want as measures**.
- 📝 We can describe them via the code below, specifying a **name, description, aggregation, and expression**.
- 👍 As before MetricFlow we default to the **name being the name of a column when no expression is supplied**.
- 👍 As before MetricFlow will default to the **name being the name of a column when no expression is supplied**.
- 🧮 [Many different aggregations](https://docs.getdbt.com/docs/build/measures#aggregation) are available to us. Here we just want sums.

```YAML
<File name="models/marts/orders.yml" />

```yml
measures:
- name: order_total
description: The total amount for each order including taxes.
@@ -226,18 +230,22 @@

- 🆕 We can also **create new measures using expressions**, for instance adding a count of individual orders as below.

```YAML
- name: order_count
description: The count of individual orders.
expr: 1
agg: sum
<File name="models/marts/orders.yml" />

```yml
- name: order_count
description: The count of individual orders.
expr: 1
agg: sum
```
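Other aggregations follow the same shape. As one more sketch (our illustration, not part of the example project's actual code), a distinct count of customers could be defined with the `count_distinct` aggregation:

```yml
- name: customers_with_orders
  description: The distinct count of customers placing orders.
  agg: count_distinct
  expr: customer_id
```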

## Validating configs
## Reviewing our work

Our completed code should look like this: our first semantic model!
Our completed code will look like this: our first semantic model!

```orders
<File name="models/marts/orders.yml" />

```yml
semantic_models:
- name: orders
defaults:
@@ -281,12 +289,7 @@
agg: sum
```

- 🦺 We can check that it's a valid configuration and works with the real data our dbt project is generating by using the `mf validate-configs` command. This will:
1. **Parse the semantic manifest** our configuration describes out of the dbt project.
2. Validate the **internal semantics** of the manifest as described by our code.
3. Validate the **external semantics** of the manifest against your data warehouse (e.g. making sure that a column specified as a dimension exists on the proper table)

## Review and next steps
## Next steps

Let's review the basics of semantic models:
